Five Stages Of LLM Implementation | by Cobus Greyling | Dec, 2023 | Medium – Medium
Cobus Greyling
Follow
—
Listen
Share
What is clear from market adoption in terms of LLMs, the LLM tooling development is far ahead of LLM implementation; with real-world customer facing execution lagging.
From a tooling perspective most of the focus and consideration is given to LLM Stage 4 but soon organisations will learn that any successful AI implementation requires a successful data strategy. And Hence LLM Stage 5 will should receive much more focus.
When LLM implementations moved from Stage 1 to Stage 2, from design-time use to run-time use, there was a realisation that data needs to be delivered to the LLM at inference.
The importance of In-Context Learning (ICL) has been highlighted by numerous studies, and hence the importance of injecting prompts with highly succinct, concise and contextually relevant data.
Stage 1 of LLM implementations were focussed on the bot development process, and in specific accelerating NLU development.
What really enabled the LLM Stage 1 disruption was the fact that LLM functionality was introduced at design time as opposed to run time.
This meant that elements like inference latency, high volume use, cost and LLM response aberrations could be confined to development and not exposed to customers in a production setting.
LLMs were introduced to assist with the development of NLU in the form of clustering existing customer utterances in semantically similar groupings for intent detection. Once intent labels and descriptions were defined, intent training utterances could be defined.
LLMs could also be used for entity detection and more.
This accelerated the NLU development process and also made for more accurate NLU design data.
The next phase of LLM disruption was to use LLMs / Generative AI for chatbot & Voicebot copy writing and to improve bot response messages.
This approach again was introduced at design time as opposed to run time, acting as an assistant to bot developers in crafting and improving their bot response copy.
Designers could also describe to the LLM a persona, tone and other personality traits of the bot in order to craft a consistent and succinct UI.
This is the tipping point where LLM assistance extended from design time to run time.
The LLM was used to generate responses on the fly and present it to the user. The first implementations used LLMs to answer out-of-domain questions, or craft succinct responses from document search and QnA.
LLMs were leveraged for the first time for:
Stage 1 was very much focused on leveraging LLMs and Gen-AI at design time which has a number of advantages in terms of mitigating bad UX, cost, latency and any aberrations at inference.
The introduction of LLMs at design time was a safe avenue in terms of the risk of customer facing aberrations or UX failures. It was also a way to mitigate cost and not face the challenges of customer and PII data being sent into the cloud.
What followed was a more advanced implementation of LLMs and Generative AI (Gen-AI) with a developer describing to the bot how to develop a UI and what the requirements are for a specific piece of functionality.
And subsequently the development UI went off, leveraging LLMs and Generative AI, it generated the flow, with API place holders, variables required and NLU components.
Stage two saw LLMs being used to edit text prior to sending the bot response to the user. For instance, on different chatbot mediums the appropriate message size differs. Hence bot responses could be easily controlled by asking the LLM to summarise, extract key points and change the tone of the response based on user sentiment.
This meant that the hard requirement for a message abstraction layer was deprecated to some degree. In any chatbot / Conversational AI development framework the job of the message abstraction layer is to hold a whole array of bot response messages.
These bot response messages had placeholders which needed to be filled with context specific data to respond to the user with.
Different sets of responses had to be defined for each modality and medium. LLMs made the crafting of responses on the fly easier. This was the NLG (Natural Language Generation) tool we were all waiting for.
Chatbots can be given a document, piece of information at inference, this allowed for the LLM to have a frame of reference for the conversation. Scaling this approach had two impediments, the first is the impediment of limited LLM context windows, and also scaling this approach.
Rag served as a solution to the problems mentioned above. Read more about RAG here.
Prompt Chaining found its way into Conversational AI development UIs, with the ability to create flow nodes consisting of one or more prompts being passed to a LLM.
Longer dialog turns could be strung together with a sequence of prompts, where the output of one prompt serves as the input for another prompt.
Between these prompt nodes are decision and data processing nodes…so prompt nodes are very much analogous to traditional dialog flow creation, but with the flexibility so long yearned for.
Technology suppliers started creating their own custom playgrounds with extra features and acting as an IDE and collaboration space.
This moved users beyond using LLM-based playgrounds only. Custom playgrounds offered access to multiple models for experimentation, collaboration and various starter code generation options.
Both Haystack and LangChain have launched open community-based prompt hubs.
Prompt hubs help to encode and aggregate best practices for different approaches to Prompt Engineering. The vision is for Gen-Apps to become LLM agnostic where different models are to be used at different stages in the application.
While fine-tuning changes the behaviour of the LLM and RAG provides a contextual reference for inference, fine-tuning has not received the attention it should have in the recent past. One can argue that this is due to a few reasons…read more here…
In Machine Learning a pipeline can be described as an end-to-end construct, which orchestrates a flow of events and data.
The pipeline is kicked-off or initiated by a trigger; and based on certain events and parameters, a flow is followed which results in an output.
In the case of a prompt pipeline, the flow is in most cases initiated by a user request. The request is directed to a specific prompt template.
Read more here.
Agents make use of pre-assigned tools in an autonomous fashion to perform one or more actions. Agents follow a chain-of-thought reasoning approach.
The concept of autonomous agents can be daunting at first, read more here…
From this point onwards, the market has not really caught up…orchestration refers to orchestrating multiple LLMs for an application.
Most of the ailments plaguing LLM implementations are related to LLMs not being self-hosted, or hosted in a private data centre / cloud.
Delayed responses at inference, model drift, data governance and more are all factors which are solved if LLMs are self-hosted and managed.
Data Discovery is the process of identifying any data within an enterprise which can be used for LLM fine-tuning. The best place to start is with existing customer conversations from the contact centre which can be voice or text based. Other good sources of data to discover are customer emails, previous chats and more.
This data should be discovered via an AI accelerated data productivity tool (latent space) where customer utterances are grouped according to semantic similarity, These clusters can be visually represented as seen below, which are really intents or classification; and classifications are still important for LLMs.
Data design is the next step where the discovered data is transformed into the format required for LLM fine-tuning. The data needs to be structured and formatted in a specific way to serve as optional training data. The design phase compliments the discovery phase, at this state we know what data is important and will have the most significant impact on the users and customers.
Hence data design has two sides, the actual technical formatting of data and also the actual content and semantics of the training data.
This step entails the operational side of continuous monitoring and observing of customer behaviour and data performance. Data can be developed by augmenting training data with observed vulnerabilities in the model.
Data Delivery can be best described as the process of imbuing one or more models with data relevant to the use-case, industry and specific user context at inference.
The contextual chunk of data injected into the prompt, is referenced by the LLM to deliver accurate responses in each and every instance.
Often the various methods of data delivery are considered as mutually exclusive with one approach being considered as the ultimate solution.
This point of view is often driven by ignorance, a lack of understanding, organisation searching for a stop-gap solution or a vendor pushing their specific product as the silver bullet.
The truth is that for an enterprise implementation flexibility and manageability will necessitate complexity.
This holds true for any LLM implementation and the approach followed to deliver data to the LLM. The answer is not one specific approach, for instance RAG, or Prompt Chaining; but rather a balanced multi-pronged approach.
⭐️ Follow me on LinkedIn for updates on Large Language Models ⭐️
I’m currently the Chief Evangelist @ Kore AI. I explore & write about all things at the intersection of AI & language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces & more.
cobusgreyling.medium.com
—
—
I explore and write about all things at the intersection of AI & language; LLMs/NLP/NLU, Chat/Voicebots, CCAI. www.cobusgreyling.com
Help
Status
About
Careers
Blog
Privacy
Terms
Text to speech
Teams