This sample shows how to use an Azure Cosmos DB trigger in Azure Functions (C# or Python) to automatically generate embeddings on new or updated data.

AzureCosmosDB/cosmos-embeddings-generator

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

41 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Azure Cosmos DB Embeddings Generator

This sample shows how to use an Azure Cosmos DB Trigger and Output Binding in Azure Functions to automatically generate Azure OpenAI embeddings on new or updated data.

This sample demonstrates the following concepts:

  • Azure Cosmos DB Trigger and Output Bindings for Azure Functions in both C# and Python
  • Embedding generation using the Azure OpenAI SDK with the text-embedding-3-small embedding model
  • Preventing endless loops in Functions triggers caused by in-place document updates, by comparing hash values generated on the document
  • Keyless deployment of Azure Functions, Azure Cosmos DB, and Azure OpenAI with managed identities and RBAC
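
The hash comparison in the third concept can be sketched as follows. This is a minimal illustration rather than the sample's actual code: the helper names `compute_hash` and `process_document` are hypothetical, SHA-256 is an assumed choice of hash, and `embed` stands in for whatever call produces the embedding (for example, a call to Azure OpenAI). The property names mirror the defaults in the sample settings.

```python
import hashlib
from typing import Callable, Optional

# Property names mirror the sample settings; the helper names are hypothetical.
HASH_PROPERTY = "hash"
VECTOR_PROPERTY = "vectors"
PROPERTY_TO_EMBED = "customerNotes"


def compute_hash(text: str) -> str:
    """Return a stable hex digest of the text to embed (SHA-256 is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def process_document(doc: dict, embed: Callable[[str], list]) -> Optional[dict]:
    """Return an updated copy of doc, or None when no write-back is needed."""
    text = doc.get(PROPERTY_TO_EMBED)
    if not text:
        return None  # nothing to embed

    new_hash = compute_hash(text)
    if doc.get(HASH_PROPERTY) == new_hash:
        # The stored hash matches, so this change came from our own write-back.
        # Skipping it is what breaks the trigger loop.
        return None

    updated = dict(doc)
    updated[HASH_PROPERTY] = new_hash
    updated[VECTOR_PROPERTY] = embed(text)
    return updated
```

When the function writes the updated document back to the container, the trigger fires again, but the second pass finds a matching hash and returns `None`, so no further write occurs.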

Getting Started:

Deployment

  1. Open a terminal, navigate to where you would like to clone this solution, and clone the repository.

  2. Navigate to either the csharp or python directory in this solution.

  3. Run the following command to ensure correct permissions to write a local.settings.json file locally.

    • If using Windows, open a second Terminal as Administrator and run the following PowerShell command:

      Set-ExecutionPolicy RemoteSigned
    • If using Mac or Linux, open Bash and run the following command. This likely requires sudo.

      chmod +x ./infra/scripts/*.sh 

      Note: This sample deploys using azd. To enable local debugging, a local.settings.json file is created in the project directory with the values for the deployed sample in Azure.

  4. From within the csharp or python directory, deploy the sample to Azure.

azd up

Post Deployment

  1. Check for a local.settings.json file in the directory you deployed from. If it does not exist, create a new one using the sample.settings.json in the same directory.
  2. Replace the placeholder text for the Cosmos DB and OpenAI account names below, and set FUNCTIONS_WORKER_RUNTIME to the value that matches the version you deployed (python or dotnet-isolated).
{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "UseDevelopmentStorage=true",
    "FUNCTIONS_WORKER_RUNTIME": "python", //"dotnet-isolated"
    "COSMOS_CONNECTION__accountEndpoint": "https://{my-cosmos-account}.documents.azure.com:443/",
    "OPENAI_ENDPOINT": "https://{my-open-ai-account}.openai.azure.com/",
    "COSMOS_DATABASE_NAME": "embeddings-db",
    "COSMOS_CONTAINER_NAME": "customer",
    "COSMOS_VECTOR_PROPERTY": "vectors",
    "COSMOS_HASH_PROPERTY": "hash",
    "COSMOS_PROPERTY_TO_EMBED": "customerNotes",
    "OPENAI_DEPLOYMENT_NAME": "text-3-small",
    "OPENAI_DIMENSIONS": "1536"
  }
}
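
Before running locally, it can help to confirm that no placeholder text remains in these values. The following is a minimal sketch, assuming the values have already been loaded into a dictionary. The helper name `find_setting_problems` is hypothetical and not part of the sample. It treats any value that still contains curly braces as an unreplaced placeholder.

```python
from typing import Mapping

# Keys taken from the settings shown above.
REQUIRED_SETTINGS = (
    "FUNCTIONS_WORKER_RUNTIME",
    "COSMOS_CONNECTION__accountEndpoint",
    "OPENAI_ENDPOINT",
    "COSMOS_DATABASE_NAME",
    "COSMOS_CONTAINER_NAME",
    "COSMOS_VECTOR_PROPERTY",
    "COSMOS_HASH_PROPERTY",
    "COSMOS_PROPERTY_TO_EMBED",
    "OPENAI_DEPLOYMENT_NAME",
    "OPENAI_DIMENSIONS",
)
ALLOWED_RUNTIMES = ("python", "dotnet-isolated")


def find_setting_problems(values: Mapping[str, str]) -> list[str]:
    """Return a human-readable list of problems; an empty list means none found."""
    problems = []
    for key in REQUIRED_SETTINGS:
        value = values.get(key, "")
        if not value:
            problems.append(f"{key} is missing or empty")
        elif "{" in value or "}" in value:
            # Curly braces mark the {my-...} placeholders in the template.
            problems.append(f"{key} still contains placeholder text")

    runtime = values.get("FUNCTIONS_WORKER_RUNTIME")
    if runtime and runtime not in ALLOWED_RUNTIMES:
        problems.append("FUNCTIONS_WORKER_RUNTIME must be python or dotnet-isolated")

    dimensions = values.get("OPENAI_DIMENSIONS")
    if dimensions and not dimensions.isdigit():
        problems.append("OPENAI_DIMENSIONS must be a whole number")
    return problems
```

Calling `find_setting_problems` on the unedited template values reports the two endpoint settings, since both still contain `{my-...}` placeholders.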

Quick-Start:

  1. Open a browser to the Azure portal. Locate the resource group for the deployed sample.
  2. Open the deployed Azure Cosmos DB account and navigate to the customer container in Cosmos Data Explorer.
  3. Create a new document with the same schema as the one below and save it.

Example document:

{
   "id": "00001",
   "customerId": "10001",
   "customerNotes": "lorem ipsum."
}
  4. After clicking Save, the document should reappear with a number of system properties.

  5. Press F5 or refresh the browser window and it should then reappear with a hash property and vectors array stored in the document as shown below.

    Note: The Functions start-up may miss the first trigger execution. If the embeddings do not appear as shown below, make a small change to the same document and save it to re-execute the trigger.

Sample Embeddings Document
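
If you copy the refreshed document out of Data Explorer, a quick check like the one below can confirm that the trigger ran. This is a sketch only: `has_embedding` is a hypothetical helper, and it assumes the vectors array holds one number per dimension configured in OPENAI_DIMENSIONS.

```python
def has_embedding(doc: dict, dimensions: int,
                  hash_property: str = "hash",
                  vector_property: str = "vectors") -> bool:
    """Return True when doc carries a non-empty hash and a vector of the expected length."""
    doc_hash = doc.get(hash_property)
    vectors = doc.get(vector_property)
    return (
        isinstance(doc_hash, str)
        and bool(doc_hash)
        and isinstance(vectors, list)
        and len(vectors) == dimensions
        and all(isinstance(v, (int, float)) for v in vectors)
    )
```

With the settings shown earlier you would call `has_embedding(doc, 1536)`. A document that has just been created and not yet processed returns `False`.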

Run locally:

Run the sample locally in a debugger for either the Python or CSharp version that was deployed.

Python

Pre-reqs

  1. Python 3.11
  2. Azure Functions Core Tools 4.0.6610 or higher
  3. Azurite

Setup

MacOS/Linux/WSL
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Windows
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt

Run the Sample:

  1. Ensure you have the Python extension installed. If not, install it from the Extensions view (Ctrl+Shift+X).
  2. Open the Command Palette (Ctrl+Shift+P) and type Python: Select Interpreter. Choose the appropriate Python interpreter for your project.
  3. Open the function_app.py Python file you want to debug.
  4. Open the Run and Debug view by clicking the Run icon on the sidebar or pressing Ctrl+Shift+D.
  5. Click the create a launch.json file link to create a new launch configuration.
  6. Select Python from the list of environments.
  7. Press F5 to start debugging. The Azure Function will start, and execution will pause at any breakpoints you've set.

CSharp

Pre-reqs

  1. .NET 8
  2. Azure Functions Core Tools 4.0.6610 or higher
  3. Azurite

Run the Sample:

  1. Open the csharp project folder in VS Code.
  2. Ensure you have the C# extension installed. If not, install it from the Extensions view (Ctrl+Shift+X).
  3. Open the CosmosEmbeddingGenerator.cs file to debug.
  4. Set breakpoints by clicking in the gutter to the left of the line numbers.
  5. Open the Run and Debug view by clicking the Run icon on the sidebar or pressing Ctrl+Shift+D.
  6. Click the create a launch.json file link to create a new launch configuration.
  7. Press F5 to start debugging. The Azure Function will start, and execution will pause at any breakpoints you've set.
