All environments require additional Conda packages, which can be installed with either the `conda/environments/all_cuda-125_arch-$(arch).yaml` or the `conda/environments/examples_cuda-125_arch-$(arch).yaml` environment file. This example also requires the VDB upload pipeline to have been run previously.
Environment | Supported | Notes |
---|---|---|
Conda | ✔ | |
Morpheus Docker Container | ✔ | Requires launching Milvus on the host |
Morpheus Release Container | ✔ | Requires launching Milvus on the host |
Dev Container | ✘ | |
The purpose of this example is to illustrate how a user could build a Retrieval Augmented Generation pipeline that integrates informational feeds and an LLM service into a Morpheus pipeline. This example builds on the previous Completion Pipeline example by adding the ability to augment LLM queries with context information from a knowledge base. Appending this context helps improve the LLM's responses by providing additional background and factual information that the LLM can draw on.
- In order for this pipeline to function correctly, a Vector Database must already have been populated with information that can be retrieved.
- An example of populating a database is illustrated in VDB upload
- This example assumes that pipeline has already been run to completion.
- Any vector database can be used to store the resulting embedding and corresponding metadata.
- It would be trivial to update the example to use Chroma or FAISS if needed.
- For this example, we will be using Milvus since it is the default VDB used in the VDB upload pipeline.
In order to cater to the unique requirements of the Retrieval Augmented Generation (RAG) mechanism, the following steps were incorporated:
- Embedding Retrieval: Before the LLM can make a completion, relevant context is retrieved from the Vector Database. This context is in the form of embeddings that represent pieces of information closely related to the query.
- Context Augmentation: The retrieved context is then appended to the user's query, enriching it with the necessary background to assist the LLM in generating a more informed completion.
- LLM Query Execution: The augmented query is then sent to the LLM, which generates a response based on the combination of the original query and the appended context.
- Using Milvus as VDB: Milvus offers scalable and efficient vector search capabilities, making it a natural choice for retrieving embeddings in real time.
- Flexible LLM integration: The LLM is integrated into the pipeline as a standalone component, which allows for easy swapping of models and ensures that the pipeline can be easily extended to support multiple LLMs.
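
The retrieval, augmentation, and query steps listed above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the bag-of-words `embed` function, the in-memory `KNOWLEDGE_BASE`, the `PROMPT_TEMPLATE`, and the `fake_llm` stub are all hypothetical stand-ins for the embedding model, the Milvus collection, the prompt template, and the LLM service used by the actual example.

```python
import math
from collections import Counter

# Hypothetical toy knowledge base standing in for documents stored in a vector database.
KNOWLEDGE_BASE = [
    "Morpheus is a framework for building GPU-accelerated data pipelines.",
    "Milvus is a vector database used for similarity search.",
    "Retrieval Augmented Generation adds retrieved context to an LLM prompt.",
]

# Hypothetical prompt template; the real example supplies its own.
PROMPT_TEMPLATE = "Context:\n{context}\n\nQuestion: {question}\nAnswer:"


def embed(text):
    """Toy bag-of-words embedding; a real pipeline would call an embedding model."""
    cleaned = text.lower().replace(".", "").replace("?", "")
    return Counter(cleaned.split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, top_k=1):
    """Embedding retrieval: rank stored documents by similarity to the query."""
    query_vec = embed(query)
    ranked = sorted(KNOWLEDGE_BASE, key=lambda doc: cosine(query_vec, embed(doc)), reverse=True)
    return ranked[:top_k]


def augment(question, contexts):
    """Context augmentation: place the retrieved context alongside the question."""
    return PROMPT_TEMPLATE.format(context="\n".join(contexts), question=question)


def fake_llm(prompt):
    """Placeholder for the LLM service call; it only reports the prompt size."""
    return f"[stub response to a {len(prompt)}-character prompt]"


def answer(question, llm=fake_llm, top_k=1):
    """LLM query execution: retrieve, augment, then send the prompt to the LLM."""
    return llm(augment(question, retrieve(question, top_k)))
```

Because the LLM is passed in as a plain callable, swapping in a different model only means passing a different `llm` argument, which mirrors the standalone-component design described in the last bullet.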
The standalone Morpheus pipeline is built using the following components:
- An `InMemorySourceStage` to hold the LLM queries in a DataFrame.
  - We supply a fixed set of questions in a `source_df` which are then processed by the `LLMEngineStage`.
- A `DeserializeStage` to convert `MessageMeta` objects into `ControlMessage` objects as needed by the `LLMEngine`.
  - New functionality was added to the `DeserializeStage` to support `ControlMessage`s and add a default task to each message.
- An `LLMEngineStage` then wraps the core `LLMEngine` functionality.
  - An `ExtracterNode` pulls the questions out of the DataFrame.
  - A `RAGNode` performs the retrieval, adds the context to the query using the supplied template, and executes the LLM.
  - Finally, the responses are put back into the `ControlMessage` using a `SimpleTaskHandler`.
- The pipeline concludes with an `InMemorySink` stage to store the results.
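
The flow of data through these components can be mimicked without Morpheus installed. The sketch below is not the Morpheus API: `Message`, `source`, `deserialize`, `engine`, and the other names are hypothetical stand-ins chosen to mirror the roles of the source stage, the deserialization step that attaches a default task, the engine's extract/RAG/task-handler nodes, and the sink.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """Hypothetical stand-in for a control message: a payload plus a list of tasks."""
    payload: dict                       # column name -> list of values
    tasks: list = field(default_factory=list)


def source(questions):
    """Stand-in for the in-memory source: emits one table holding the questions."""
    yield {"questions": list(questions)}


def deserialize(tables, default_task):
    """Wrap each table in a Message and attach a copy of the default task."""
    for table in tables:
        yield Message(payload=table, tasks=[dict(default_task)])


def extract(message, task):
    """Pull the input column named by the task out of the payload."""
    return message.payload[task["input_key"]]


def rag(questions, retriever, llm, template):
    """Retrieve context for each question, fill the template, and call the LLM."""
    return [llm(template.format(context=retriever(q), question=q)) for q in questions]


def handle_task(message, task, responses):
    """Write the responses back into the message under the task's output column."""
    message.payload[task["output_key"]] = responses
    return message


def engine(messages, retriever, llm, template):
    """Run extract -> rag -> handle_task for every task on every message."""
    for message in messages:
        for task in message.tasks:
            questions = extract(message, task)
            responses = rag(questions, retriever, llm, template)
            handle_task(message, task, responses)
        yield message


def run(questions, retriever, llm):
    """Chain the stand-in stages and collect the results in a list acting as the sink."""
    task = {"input_key": "questions", "output_key": "response"}
    template = "Context: {context}\nQuestion: {question}"
    return list(engine(deserialize(source(questions), task), retriever, llm, template))
```

For instance, `run(["q1"], retriever=lambda q: "ctx", llm=str.upper)` returns a single message whose payload has gained a `response` column next to the original `questions` column.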
Note: For this to function correctly, the VDB upload pipeline must have been run previously.
Before running the pipeline, we need to obtain API keys for the following services:
- Follow the instructions here to obtain an NGC API key.
- We'll refer to your NGC API key as `${NGC_API_KEY}` for the rest of this document.
- Follow the instructions here to obtain an OpenAI API key.
- We'll refer to your OpenAI API key as `${OPENAI_API_KEY}` for the rest of this document.
Before running the pipeline, we need to ensure that the following services are running:
- Follow the instructions here to install and run a Milvus service.
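
Before launching the pipeline, it can help to confirm that the API key is exported and that something is listening where Milvus is expected. The following is a minimal sketch and not part of the example's code: the `REQUIRED_KEYS` labels and the helper names are hypothetical, and the host and port defaults assume a local Milvus service on its default port 19530.

```python
import os
import socket

# Hypothetical mapping from a service label to the environment variable it needs.
REQUIRED_KEYS = {"NGC": "NGC_API_KEY", "OpenAI": "OPENAI_API_KEY"}


def api_key_present(service, env=os.environ):
    """Return True if the variable for the given service is set and non-empty."""
    return bool(env.get(REQUIRED_KEYS[service], "").strip())


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def preflight(service, host="localhost", port=19530, env=os.environ, port_check=port_open):
    """Collect human-readable problems; an empty list means the checks passed."""
    problems = []
    if not api_key_present(service, env):
        problems.append(f"{REQUIRED_KEYS[service]} is not set")
    if not port_check(host, port):
        problems.append(f"nothing is listening on {host}:{port}")
    return problems
```

A successful TCP connection only shows that a process is listening on that port, not that the Milvus collection has been populated, so the VDB upload pipeline still needs to have been run.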
The top-level entrypoint to each of the LLM example pipelines is `examples/llm/main.py`. This script accepts a set of Options and a Pipeline to run. Baseline options are below, and for the purposes of this document we'll assume a pipeline option of `rag`:
Using NGC NeMo LLMs
export NGC_API_KEY=[YOUR_KEY_HERE]
python examples/llm/main.py rag pipeline
Using OpenAI LLM models
export OPENAI_API_KEY=[YOUR_KEY_HERE]
python examples/llm/main.py rag pipeline --llm_service=OpenAI --model_name=gpt-3.5-turbo