May 21, 2025
Vector Search
Vector search enables querying large datasets by comparing vector representations of data points. It is particularly useful in applications such as recommendation systems, image search, and natural language processing. Reranking the search results can further improve the relevance and accuracy of the retrieved information.
Why Rerank Results?
Reranking enhances the quality of search outcomes by adjusting the order of results based on additional criteria such as relevance or user preferences. This ensures that the most pertinent results are prioritised, improving user engagement and satisfaction.
What Is a Rerank Model?
A rerank model is an algorithm that refines initial search results by evaluating various factors beyond the basic query. These models often use machine learning to assess the relevance of each result, delivering a more tailored and accurate search experience.
Value Proposition and Problem Solving
Implementing vector search and rerank models in Azure Cosmos DB offers several advantages:
• Improved Search Accuracy: Vector representations capture semantic similarities, leading to more relevant results.
• Scalability: The NoSQL API in Azure Cosmos DB efficiently handles large datasets, ensuring fast and reliable performance.
• Customisation: Rerank models can be adapted to specific business needs, improving the overall user experience.
Example: Vectorising Data Using text-embedding-3-small and Cosmos DB
The following example uses the text-embedding-3-small model to vectorise data stored in Azure Cosmos DB.
Process Overview:
The HotpotQA dataset was used as test data. This dataset is designed for multi-hop reasoning, where each question requires synthesising information from multiple documents. A reduced version containing 100,000 documents was used. Sample questions were selected, and relevant corpora were retained to maintain dataset integrity while making it more manageable.
HotpotQA includes:
- A corpus dataset (with identifiers, titles, and text),
- A list of questions,
- A mapping dataset linking questions to relevant corpora.
Example corpus structure (Python dictionary):
{
  '12': {'text': 'Anarchism is a political philosophy …', 'title': 'Anarchism'},
  '25': {'text': 'Autism is a neurodevelopmental disorder …', 'title': 'Autism'},
  '39': {'text': 'Albedo (…) is a measure for …', 'title': 'Albedo'}
}
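One way to produce a corpus dictionary of this shape is to load the public BeIR mirror of HotpotQA from Hugging Face and trim it, roughly as sketched below; the dataset identifiers, column names, and reduction logic are assumptions for illustration, not the exact preparation code used for the tests.

from datasets import load_dataset

# Public BeIR mirrors of HotpotQA: corpus, questions, and question-to-corpus mapping (assumed source).
corpus = load_dataset("BeIR/hotpotqa", "corpus", split="corpus")
queries = load_dataset("BeIR/hotpotqa", "queries", split="queries")
qrels = load_dataset("BeIR/hotpotqa-qrels", split="test")

# Keep the corpora required by the sampled questions, then cap the corpus at 100,000 documents.
sample_question_ids = set(queries["_id"][:100])
required_corpus_ids = {str(row["corpus-id"]) for row in qrels if row["query-id"] in sample_question_ids}

reduced = {}
for doc in corpus:
    if doc["_id"] in required_corpus_ids or len(reduced) < 100_000:
        reduced[doc["_id"]] = {"text": doc["text"], "title": doc["title"]}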
Document design is straightforward: use the corpus ID as the document ID, include fields for text and title, and vectorise the concatenated title and text. Example document in the database:
{
  "id": "25",
  "text": "Autism is a neurodevelopmental disorder …",
  "title": "Autism",
  "vectorized_text": [0.00988, -0.00505, 0.05237, 0.01458, -0.03818, 0.00907]
}
Evaluating Vector Search
In a typical RAG scenario, the top n results from a search are used. If documents are chunked, the top n chunks closest to the input question are selected. However, language models have token limits, so typically only 3–10 chunks are included—sometimes up to 100 if feasible.
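A search of this kind can be written directly in the NoSQL query language using the VectorDistance system function. The helper below is a sketch built on the placeholder setup above and returns the top n corpora closest to the embedded question:

def vector_search(question: str, top_n: int = 10) -> list[dict]:
    # Embed the question with the same model used for the documents.
    question_vector = embed(question)
    query = (
        f"SELECT TOP {top_n} c.id, c.title, c.text, "
        "VectorDistance(c.vectorized_text, @embedding) AS score "
        "FROM c ORDER BY VectorDistance(c.vectorized_text, @embedding)"
    )
    return list(container.query_items(
        query=query,
        parameters=[{"name": "@embedding", "value": question_vector}],
        enable_cross_partition_query=True,
    ))

results = vector_search("The director of the romantic comedy Big Stone Gap is based in what New York city?")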
Evaluation Example 1
Question: The director of the romantic comedy Big Stone Gap is based in what New York city?
Required corpora:
- Big Stone Gap (film): mentions Adriana Trigiani as the director.
- Adriana Trigiani: states she is based in Greenwich Village, NYC.
The Big Stone Gap corpus appears first, but Adriana Trigiani ranks 16th—outside the top 10—preventing a correct answer unless more results are included.
Evaluation Example 2
Question: What government position was held by the woman who portrayed Corliss Archer in the film Kiss and Tell?
Required corpora:
- Kiss and Tell (1945 film)
- Shirley Temple
While the first corpus ranks first, Shirley Temple ranks 273rd—far too low to be included in typical result sets. As the knowledge base grows, retrieving the right information becomes increasingly difficult.
Rerank to the rescue
Reranking improves accuracy by reordering results based on relevance. To use Cohere Rerank 3.5, provision it as a Pay-As-You-Go API in Azure AI Foundry. This provides an endpoint and API key for integration.
In the optimised RAG application, the vector search results are passed through the rerank model before the top chunks are handed to the language model.
For the evaluation, the top 300 vector search results were reranked with the Cohere Python SDK.
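A minimal sketch of that call, assuming the Cohere Python SDK is pointed at the Azure AI Foundry endpoint; the environment variable names and the model identifier are placeholder assumptions:

import os
import cohere

# Endpoint and key come from the Azure AI Foundry deployment of Cohere Rerank 3.5 (placeholder names).
co = cohere.ClientV2(
    base_url=os.environ["AZURE_COHERE_RERANK_ENDPOINT"],
    api_key=os.environ["AZURE_COHERE_RERANK_KEY"],
)

def rerank(question: str, candidates: list[dict], top_n: int = 10) -> list[dict]:
    # Rerank the candidate documents by relevance to the question.
    response = co.rerank(
        model="rerank-v3.5",
        query=question,
        documents=[f'{doc["title"]} {doc["text"]}' for doc in candidates],
        top_n=top_n,
    )
    # Each result carries the index of the original candidate and a relevance score.
    return [candidates[result.index] for result in response.results]

# Retrieve a large candidate set with the vector_search helper sketched earlier,
# then let the reranker pick the best documents.
question = "What government position was held by the woman who portrayed Corliss Archer in the film Kiss and Tell?"
candidates = vector_search(question, top_n=300)
top_documents = rerank(question, candidates)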
Rerank Evaluation Results
Question 1: Big Stone Gap director’s NYC location
- Adriana Trigiani moved from 16th to 12th position—making it more likely to be included.
Question 2: Shirley Temple’s government role
- Shirley Temple moved from 273rd to 5th position—making a correct answer feasible.
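As a rough sketch of how these positions can be inspected, the helper below reports where a known relevant document lands in a result list; shirley_temple_id is a placeholder for the corpus ID taken from the HotpotQA mapping, and the candidate list comes from the snippet above:

def rank_of(doc_id: str, results: list[dict]) -> int | None:
    # 1-based position of the document in a result list, or None if it is absent.
    for position, doc in enumerate(results, start=1):
        if doc["id"] == doc_id:
            return position
    return None

shirley_temple_id = "..."  # placeholder: the corpus ID of the "Shirley Temple" document
print(rank_of(shirley_temple_id, candidates))                               # vector search alone
print(rank_of(shirley_temple_id, rerank(question, candidates, top_n=300)))  # after reranking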
Conclusion
Integrating vector search and reranking models in Azure Cosmos DB using the NoSQL API can enhance search accuracy and user satisfaction. By leveraging advanced techniques such as text embeddings and machine learning, organisations can deliver more relevant and personalised search experiences.
Additional resources
- Notebooks used for the presented tests: pauldj54/RAGCosmosDBReRank
- Master Reranking with Cohere Models — Cohere
- Get started with Azure Cosmos DB for NoSQL using Python | Microsoft Learn