Mighty Minions: Unleashing domain-specific GenAI via SLMs
Large language models (LLMs) are enabling development of highly
capable generative AI applications. But these generic models can be
expensive and energy-intensive to run, prompting growing interest in
bespoke smaller language models (SLMs) that promise greater
cost-efficiency, more flexible deployment and tighter privacy control.
While fine-tuning LLMs on smaller datasets for specific use cases is a
prolonged and resource-intensive process, fine-tuning pre-trained SLMs
with domain-specific data can be accomplished swiftly. For example, an
insurance company could fine-tune a pre-trained SLM with its policy
documents in just two to three hours. Because an SLM needs relatively
little processing power and memory, a generative AI model can run on
modest devices, reducing overall cost of ownership by around 30%.
As they can draw on customer, network, operations and billing data,
communications service providers (CSPs) could build SLMs both for
internal use and for enterprise customers, opening up new revenue
streams. This Catalyst plans to
introduce an architectural framework wherein all CSP data is securely
centralized on a single platform, facilitating the creation of clean
and pre-processed datasets.
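To make "clean and pre-processed datasets" concrete, the following is a minimal sketch of one possible cleaning pass, not the Catalyst's actual pipeline. It assumes the records are free-text strings; the function names, the e-mail masking rule and the sample records are all hypothetical and chosen only for illustration.

```python
import re
import unicodedata

# Assumption: e-mail addresses are the only sensitive field masked here.
# A real platform would need a far broader redaction policy.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def clean_record(text: str) -> str:
    """Normalize Unicode, mask e-mail addresses and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    text = EMAIL_RE.sub("[EMAIL]", text)
    return " ".join(text.split())


def build_dataset(records):
    """Return cleaned, non-empty records, dropping case-insensitive duplicates."""
    seen = set()
    dataset = []
    for raw in records:
        cleaned = clean_record(raw)
        key = cleaned.casefold()
        if cleaned and key not in seen:
            seen.add(key)
            dataset.append(cleaned)
    return dataset


if __name__ == "__main__":
    # Hypothetical sample records, for illustration only.
    sample = [
        "  Billing dispute   raised by jane.doe@example.com ",
        "billing dispute raised by JANE.DOE@example.com",
        "",
        "Cell site 4411 reported packet loss.",
    ]
    for row in build_dataset(sample):
        print(row)
```

Masking before deduplication means two records that differ only in the customer's address collapse into one, which keeps personal data out of the training set while also shrinking it.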
This end-to-end framework would also let CSPs offer the same
capability to other enterprises, which could use their own proprietary
data to create generative AI models efficiently.
CSPs could expose pre-trained SLMs arising from the framework as APIs
so that enterprises can access and use them seamlessly, without
needing a team of technical experts.
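As a rough illustration of exposing a model as an API, the sketch below wraps a placeholder model behind a small HTTP endpoint using only the Python standard library. Everything specific in it is an assumption made for the example: the `/v1/generate` path, the `prompt` and `completion` field names, and the `stub_slm` function, which stands in for real inference on a fine-tuned SLM.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def stub_slm(prompt: str) -> str:
    """Hypothetical placeholder; a real service would run SLM inference here."""
    return f"[stub answer for: {prompt}]"


def handle_generate(body: bytes):
    """Validate a JSON request body and return (HTTP status, response bytes)."""
    try:
        payload = json.loads(body)
    except (json.JSONDecodeError, UnicodeDecodeError):
        return 400, json.dumps({"error": "body must be valid JSON"}).encode()
    prompt = payload.get("prompt") if isinstance(payload, dict) else None
    if not isinstance(prompt, str) or not prompt.strip():
        error = {"error": "'prompt' must be a non-empty string"}
        return 400, json.dumps(error).encode()
    return 200, json.dumps({"completion": stub_slm(prompt.strip())}).encode()


class SLMHandler(BaseHTTPRequestHandler):
    """Routes POST requests on the assumed /v1/generate path to the model."""

    def do_POST(self):
        if self.path != "/v1/generate":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, data = handle_generate(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(host: str = "127.0.0.1", port: int = 8080) -> None:
    """Start the demo server; blocks until interrupted."""
    HTTPServer((host, port), SLMHandler).serve_forever()
```

Calling `serve()` starts the endpoint locally, after which an enterprise client only needs to POST a JSON body such as `{"prompt": "Summarise my policy"}`. A production service would also need authentication, rate limiting and TLS, all of which are omitted here.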
Until now, implementing generative AI has required specialized skills
in machine learning, data science and AI development. Enterprises may
struggle with a shortage of talent or expertise in these areas, making
it challenging to develop and deploy AI solutions, while also
addressing ethical concerns and regulatory requirements. Once
complete, this Catalyst project will help CSPs and businesses overcome
these challenges by enabling them to harness pre-trained,
domain-specific models that perform better than generic LLMs on their
target tasks, while offering lower latency and reduced power
consumption.