May 21, 2025
Vector Search
Vector search enables querying large datasets by comparing vector representations of data points. It is particularly useful in applications such as recommendation systems, image search, and natural language processing. Reranking the search results can further improve the relevance and accuracy of the retrieved information.
Why Rerank Results?
Reranking enhances the quality of search outcomes by adjusting the order of results based on additional criteria such as relevance or user preferences. This ensures that the most pertinent results are prioritised, improving user engagement and satisfaction.
What Is a Rerank Model?
A rerank model is an algorithm that refines initial search results by evaluating various factors beyond the basic query. These models often use machine learning to assess the relevance of each result, delivering a more tailored and accurate search experience.
Value Proposition and Problem Solving
Implementing vector search and rerank models in Azure Cosmos DB offers several advantages:
• Improved Search Accuracy: Vector representations capture semantic similarities, leading to more relevant results.
• Scalability: The NoSQL API in Azure Cosmos DB efficiently handles large datasets, ensuring fast and reliable performance.
• Customisation: Rerank models can be adapted to specific business needs, improving the overall user experience.
Example: Vectorising Data Using text-embedding-3-small and Cosmos DB
The following example uses the text-embedding-3-small model to vectorise data stored in Azure Cosmos DB.
Process Overview:
The HotpotQA dataset was used as test data. This dataset is designed for multi-hop reasoning, where each question requires synthesising information from multiple documents. A reduced version containing 100,000 documents was used. Sample questions were selected, and relevant corpora were retained to maintain dataset integrity while making it more manageable.
HotpotQA includes:
- A corpus dataset (with identifiers, titles, and text),
- A list of questions,
- A mapping dataset linking questions to relevant corpora.
Example corpus structure (Python dictionary):
{
  '12': {'text': 'Anarchism is a political philosophy …', 'title': 'Anarchism'},
  '25': {'text': 'Autism is a neurodevelopmental disorder …', 'title': 'Autism'},
  '39': {'text': 'Albedo (…) is a measure for …', 'title': 'Albedo'}
}
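One way to produce a corpus dictionary of this shape is to load the public BeIR mirror of HotpotQA from Hugging Face and trim it, roughly as sketched below; the dataset identifiers, column names, and reduction logic are assumptions for illustration, not the exact preparation code used for the tests.

from datasets import load_dataset

# Public BeIR mirrors of HotpotQA: corpus, questions, and question-to-corpus mapping (assumed source).
corpus = load_dataset("BeIR/hotpotqa", "corpus", split="corpus")
queries = load_dataset("BeIR/hotpotqa", "queries", split="queries")
qrels = load_dataset("BeIR/hotpotqa-qrels", split="test")

# Keep the corpora required by the sampled questions, then cap the corpus at 100,000 documents.
sample_question_ids = set(queries["_id"][:100])
required_corpus_ids = {str(row["corpus-id"]) for row in qrels if row["query-id"] in sample_question_ids}

reduced = {}
for doc in corpus:
    if doc["_id"] in required_corpus_ids or len(reduced) < 100_000:
        reduced[doc["_id"]] = {"text": doc["text"], "title": doc["title"]}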
Document design is straightforward: use the corpus ID as the document ID, include fields for text and title, and vectorise the concatenated title and text. Example document in the database:
{
  "id": "25",
  "text": "Autism is a neurodevelopmental disorder …",
  "title": "Autism",
  "vectorized_text": [0.00988, -0.00505, 0.05237, 0.01458, -0.03818, 0.00907]
}
Evaluating Vector Search
In a typical RAG scenario, the top n results from a search are used. If documents are chunked, the top n chunks closest to the input question are selected. However, language models have token limits, so typically only 3–10 chunks are included—sometimes up to 100 if feasible.
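A search of this kind can be written directly in the NoSQL query language using the VectorDistance system function. The helper below is a sketch built on the placeholder setup above and returns the top n corpora closest to the embedded question:

def vector_search(question: str, top_n: int = 10) -> list[dict]:
    # Embed the question with the same model used for the documents.
    question_vector = embed(question)
    query = (
        f"SELECT TOP {top_n} c.id, c.title, c.text, "
        "VectorDistance(c.vectorized_text, @embedding) AS score "
        "FROM c ORDER BY VectorDistance(c.vectorized_text, @embedding)"
    )
    return list(container.query_items(
        query=query,
        parameters=[{"name": "@embedding", "value": question_vector}],
        enable_cross_partition_query=True,
    ))

results = vector_search("The director of the romantic comedy Big Stone Gap is based in what New York city?")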
Evaluation Example 1
Question: The director of the romantic comedy Big Stone Gap is based in what New York city?
Required corpora:
- Big Stone Gap (film): mentions Adriana Trigiani as the director.
- Adriana Trigiani: states she is based in Greenwich Village, NYC.
The Big Stone Gap corpus appears first, but Adriana Trigiani ranks 16th—outside the top 10—preventing a correct answer unless more results are included.
Evaluation Example 2
Question: What government position was held by the woman who portrayed Corliss Archer in the film Kiss and Tell?
Required corpora:
- Kiss and Tell (1945 film)
- Shirley Temple
While the first corpus ranks first, Shirley Temple ranks 273rd—far too low to be included in typical result sets. As the knowledge base grows, retrieving the right information becomes increasingly difficult.
Rerank to the rescue
Reranking improves accuracy by reordering results based on relevance. To use Cohere Rerank 3.5, provision it as a Pay-As-You-Go API in Azure AI Foundry. This provides an endpoint and API key for integration.
In the optimised RAG application, the vector search results are passed through the rerank model before the top chunks are handed to the language model.
For the evaluation, the top 300 vector search results were reranked with the Cohere Python SDK.
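A minimal sketch of that call, assuming the Cohere Python SDK is pointed at the Azure AI Foundry endpoint; the environment variable names and the model identifier are placeholder assumptions:

import os
import cohere

# Endpoint and key come from the Azure AI Foundry deployment of Cohere Rerank 3.5 (placeholder names).
co = cohere.ClientV2(
    base_url=os.environ["AZURE_COHERE_RERANK_ENDPOINT"],
    api_key=os.environ["AZURE_COHERE_RERANK_KEY"],
)

def rerank(question: str, candidates: list[dict], top_n: int = 10) -> list[dict]:
    # Rerank the candidate documents by relevance to the question.
    response = co.rerank(
        model="rerank-v3.5",
        query=question,
        documents=[f'{doc["title"]} {doc["text"]}' for doc in candidates],
        top_n=top_n,
    )
    # Each result carries the index of the original candidate and a relevance score.
    return [candidates[result.index] for result in response.results]

# Retrieve a large candidate set with the vector_search helper sketched earlier,
# then let the reranker pick the best documents.
question = "What government position was held by the woman who portrayed Corliss Archer in the film Kiss and Tell?"
candidates = vector_search(question, top_n=300)
top_documents = rerank(question, candidates)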
Rerank Evaluation Results
Question 1: Big Stone Gap director’s NYC location
- Adriana Trigiani moved from 16th to 12th position—making it more likely to be included.
Question 2: Shirley Temple’s government role
- Shirley Temple moved from 273rd to 5th position—making a correct answer feasible.
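As a rough sketch of how these positions can be inspected, the helper below reports where a known relevant document lands in a result list; shirley_temple_id is a placeholder for the corpus ID taken from the HotpotQA mapping, and the candidate list comes from the snippet above:

def rank_of(doc_id: str, results: list[dict]) -> int | None:
    # 1-based position of the document in a result list, or None if it is absent.
    for position, doc in enumerate(results, start=1):
        if doc["id"] == doc_id:
            return position
    return None

shirley_temple_id = "..."  # placeholder: the corpus ID of the "Shirley Temple" document
print(rank_of(shirley_temple_id, candidates))                               # vector search alone
print(rank_of(shirley_temple_id, rerank(question, candidates, top_n=300)))  # after reranking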
Conclusion
Integrating vector search and reranking models in Azure Cosmos DB using the NoSQL API can enhance search accuracy and user satisfaction. By leveraging advanced techniques such as text embeddings and machine learning, organisations can deliver more relevant and personalised search experiences.
Additional resources
- Notebooks used for the presented tests: pauldj54/RAGCosmosDBReRank
- Master Reranking with Cohere Models — Cohere
- Get started with Azure Cosmos DB for NoSQL using Python | Microsoft Learn