Get real-time task notifications for Project Manager agent in Planner
May 8, 2025IAMCP Profiles in Partnership Ep 10 | Sweet Success: Insights from Suresh Ramani
May 8, 2025AI distillation is a powerful technique in machine learning that involves transferring knowledge from a large, complex model (often called the “teacher”) to a smaller, more efficient “student” model. The goal is to retain the performance and accuracy of the larger model while drastically reducing computational requirements, making AI systems faster, cheaper, and more deployable especially in real-time or resource-constrained environments. In this post, we’ll explore what AI distillation is, why it’s gaining traction, and how it’s being used to bring the power of advanced AI models to everyday applications.
Stored completions in Azure OpenAI’s AI Foundry provide a structured way to capture and reuse high-quality model responses, streamlining the model distillation process. By logging curated prompts and their ideal completions, teams can efficiently train smaller or specialized models that emulate the behavior of larger foundation models. This approach not only accelerates fine-tuning and evaluation workflows but also ensures consistency and control in model behavior that are critical for enterprise applications. Stored completions become a key asset in the distillation pipeline, enabling developers to systematically build domain-specific AI models while reducing cost and inference time.
First let’s create the completions. You’ll have to run snippets of code in your application that can connect to your models in Azure OpenAI. Currently you can connect to your models for Stored Completions using Entra, API Keys, or even Rest APIs with a Bearer token. The code below creates and stores a completion using an Azure OpenAI API Key. You’ll need a minimum of 10 Stored Completions to enabled Distillation on AI Foundry. For this example we’ll use the gpt-4o model to generate responses(completions) about analyzing a company’s quarterly earnings report. The user questions are manually added in the code and can be updated at anytime.
Generate a Stored Completion with GPT-4o
import os
from openai import AzureOpenAI
client = AzureOpenAI(
api_key=os.getenv(“AZURE_OPENAI_API_KEY”),
api_version=”2025-02-01-preview”,
azure_endpoint = os.getenv(“AZURE_OPENAI_ENDPOINT”)
)
completion = client.chat.completions.create(
model=”gpt-4o”, # replace with model deployment name
store= True,
metadata = {
“user”: “admin”,
“category”: “docs-test”,
},
messages = [
{“role”: “system”, “content”: “You are a financial analyst with expertise in analyzing quarterly earning reports. Your task is to assist users in understanding and interpreting these reports.”},
{“role”: “user”, “content”: “What information would you extract from a earnings call transcript to determine a companies financial health?”}
]
)
print(completion.choices[0].message)
Once at least 10 completions are done you’ll see in AI Foundry a list of these completions along with the “Distill” button enabled. Let’s run through a quick example.
Next we’ll select the training data(generated from the selected input/output), and upload the validation data(helps ensure that the model performs well on unseen data and maintains high accuracy and reliability).
Validation Dataset
{“messages”: [{“role”: “system”, “content”: “You are a financial analyst chatbot that provides detailed and insightful analysis of companies’ quarterly earnings and financials.”}, {“role”: “user”, “content”: “Can you help me analyze the latest quarterly earnings report for XYZ Corporation?”}, {“role”: “assistant”, “content”: “Absolutely! Let’s dive into XYZ Corporation’s latest quarterly earnings report. First, we’ll look at their revenue, net income, and earnings per share (EPS). Then, we’ll analyze key financial ratios and trends. What specific aspects are you most interested in?”}]}
{“messages”: [{“role”: “system”, “content”: “You are a financial analyst chatbot that provides detailed and insightful analysis of companies’ quarterly earnings and financials.”}, {“role”: “user”, “content”: “What are the key takeaways from ABC Inc.’s latest quarterly earnings?”}, {“role”: “assistant”, “content”: “Sure, let’s review ABC Inc.’s latest quarterly earnings. We’ll focus on their revenue growth, profit margins, and any significant changes in their balance sheet. Are there any specific metrics or trends you want to highlight?”}]}
{“messages”: [{“role”: “system”, “content”: “You are a financial analyst chatbot that provides detailed and insightful analysis of companies’ quarterly earnings and financials.”}, {“role”: “user”, “content”: “How did DEF Corporation perform in their latest quarter?”}, {“role”: “assistant”, “content”: “Let’s take a look at DEF Corporation’s performance. We’ll examine their revenue, operating income, and net income. Additionally, we’ll review their cash flow statement and any notable changes in their financial position. What specific details are you interested in?”}]}
{“messages”: [{“role”: “system”, “content”: “You are a financial analyst chatbot that provides detailed and insightful analysis of companies’ quarterly earnings and financials.”}, {“role”: “user”, “content”: “Can you summarize the latest earnings report for GHI Ltd.?”}, {“role”: “assistant”, “content”: “Certainly! We’ll summarize GHI Ltd.’s latest earnings report by focusing on their key financial metrics such as revenue, net income, and EPS. We’ll also highlight any significant events or trends that impacted their performance. Is there a particular area you want to delve into?”}]}
{“messages”: [{“role”: “system”, “content”: “You are a financial analyst chatbot that provides detailed and insightful analysis of companies’ quarterly earnings and financials.”}, {“role”: “user”, “content”: “What were the major highlights from JKL Corporation’s quarterly earnings?”}, {“role”: “assistant”, “content”: “Let’s review the major highlights from JKL Corporation’s quarterly earnings. We’ll look at their revenue growth, profitability, and any significant changes in their financial ratios. Are there any specific metrics or trends you want to focus on?”}]}
{“messages”: [{“role”: “system”, “content”: “You are a financial analyst chatbot that provides detailed and insightful analysis of companies’ quarterly earnings and financials.”}, {“role”: “user”, “content”: “How did MNO Inc. perform financially in the last quarter?”}, {“role”: “assistant”, “content”: “Let’s analyze MNO Inc.’s financial performance in the last quarter. We’ll examine their revenue, net income, and EPS. Additionally, we’ll review their balance sheet and cash flow statement for any notable changes. What specific details are you interested in?”}]}
{“messages”: [{“role”: “system”, “content”: “You are a financial analyst chatbot that provides detailed and insightful analysis of companies’ quarterly earnings and financials.”}, {“role”: “user”, “content”: “Can you provide an overview of PQR Ltd.’s latest quarterly earnings?”}, {“role”: “assistant”, “content”: “Sure, let’s provide an overview of PQR Ltd.’s latest quarterly earnings. We’ll focus on their revenue, net income, and EPS. We’ll also highlight any significant events or trends that impacted their performance. Is there a particular area you want to delve into?”}]}
{“messages”: [{“role”: “system”, “content”: “You are a financial analyst chatbot that provides detailed and insightful analysis of companies’ quarterly earnings and financials.”}, {“role”: “user”, “content”: “What are the key financial metrics from STU Corporation’s latest earnings report?”}, {“role”: “assistant”, “content”: “Let’s review the key financial metrics from STU Corporation’s latest earnings report. We’ll look at their revenue, net income, and EPS. Additionally, we’ll analyze their profit margins and any significant changes in their balance sheet.”}]}
{“messages”: [{“role”: “system”, “content”: “You are a financial analyst chatbot that provides detailed and insightful analysis of companies’ quarterly earnings and financials.”}, {“role”: “user”, “content”: “How did VWX Ltd. perform in their latest quarter?”}, {“role”: “assistant”, “content”: “Let’s take a look at VWX Ltd.’s performance. We’ll examine their revenue, operating income, and net income. Additionally, we’ll review their cash flow statement and any notable changes in their financial position. What specific details are you interested in?”}]}
{“messages”: [{“role”: “system”, “content”: “You are a financial analyst chatbot that provides detailed and insightful analysis of companies’ quarterly earnings and financials.”}, {“role”: “user”, “content”: “Can you summarize the latest earnings report for YZA Corporation?”}, {“role”: “assistant”, “content”: “Certainly! We’ll summarize YZA Corporation’s latest earnings report by focusing on their key financial metrics such as revenue, net income, and EPS. We’ll also highlight any significant events or trends that impacted their performance.”}]}
Although validation data is optional it’s important to see how your models will perform against data that will be used in your solution. This also helps determine how much additional data or fine-tuning is required after the initial run. For the following steps, leave Task Parameters as Defaults, then Review and Submit.
After clicking submit, a job for Distillation will be queued and then begin running which you can view that status on the left menu in Fine-tuning(the tuning job may run for a while). You can also toggle between the Details, Logs, and Checkpoints tabs to view more details.
In the Metrics tab we are able to see the model’s accuracy and training loss where the validation data was used.
Observing the fine-tuned model’s performance
From above we see the training loss go down as the model trains. It’s the difference between the predicted output and the actual output. Lower training loss indicates better model performance. Additionally the mean token accuracy is slightly higher at the end than it was before training. This metric represents the percentage of tokens in the training batch that the model correctly predicts. However the differences between training loss and mean token accuracy are only minimal so additional fine-tuning would be required. With Continual Fine-tuning this is possible. This model can also be deployed and will be ready for use in your AI Solution.
Conclusion
Stored completions and fine-tuning in Azure AI Foundry represent significant advancements in the customization and optimization of AI models. This allows for the capture of conversation histories from chat completion sessions that occur in your AI application where these generated datasets can be used for evaluation and further fine-tuning enabling models to be tailored to specific use cases. These features empower organizations to build robust, high-performing AI solutions that will drive innovation and efficiency across various applications.
Additional Resources
How to use Azure OpenAI Service stored completions & distillation – Azure OpenAI | Microsoft Learn
Customize a model with Azure OpenAI Service – Azure OpenAI | Microsoft Learn