RFT Observability (“Auto-Evals”)
August 1, 2025
Reinforcement Finetuning (RFT) observability provides real-time, in-depth visibility into your RFT job by automatically kicking off evaluations (“auto-evals”) that show detailed finetuning progress at each checkpoint.
What is RFT?
RFT adjusts model weights using a reward model (grader) to score outputs against reference data. The grader’s results are then used to reward or penalize the model response, steering the model’s reasoning direction and quality towards desired outcomes. The benefit of using RFT over supervised fine-tuning (SFT) is that RFT incorporates these reward signals in real time during training, though it is slower and more expensive than SFT.
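To make the grading idea concrete, here is a minimal, illustrative sketch of how a grader can score a model response against a reference and yield a reward signal. This is not Azure's RFT implementation or any Foundry API; the function name and the token-overlap scoring rule are assumptions chosen only for illustration.

```python
# Illustrative only: a toy grader and reward signal, not Azure's RFT training loop.

def toy_grader(response: str, reference: str) -> float:
    """Score a response against a reference answer in [0, 1].

    This toy rule uses simple word overlap; a real grader (string-check,
    model-based, or Python grader) would encode the task's actual success
    criteria.
    """
    response_tokens = set(response.lower().split())
    reference_tokens = set(reference.lower().split())
    if not reference_tokens:
        return 0.0
    return len(response_tokens & reference_tokens) / len(reference_tokens)


# During RFT, each scored response contributes a reward signal that steers the
# model toward higher-scoring behavior (the weight update itself is handled by
# the training service, not by user code).
reward = toy_grader(
    response="Paris is the capital of France.",
    reference="The capital of France is Paris.",
)
print(f"reward signal: {reward:.2f}")
```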
Previously, customers could only evaluate RFT results after a training run had completed. This meant that choosing the wrong grader type or hitting unexpected model behavior (e.g. reward hacking) could waste significant time and budget.
How does RFT Observability help?
Today, we’re making it easier to see what’s happening under the hood for RFT. When an RFT job starts, we automatically create a linked evaluation job. The evaluation exposes intermediate results (prompts and responses) and grades from RFT at each checkpoint step, giving you more visibility into and control over training while the RFT job is running. This means you can now monitor, debug, and steer your RFT model while it is being trained, significantly reducing wasted time and cost.
Where can I find RFT Observability?
On Azure AI Foundry’s ‘Fine-tuning’ page, a new ‘Evaluation’ section appears under ‘Details’ when the customization method is ‘Reinforcement’. This is the linked evaluation job that is automatically created for every RFT finetuning job.
When you click ‘View Report’, you are taken to the evaluation job page, where you will see the grader scores for each checkpoint step. You can also toggle back to the RFT job by clicking ‘Linked finetuning job’ at the top of the page.
When you click ‘Logs’ on the finetuning page for the RFT job, you can track the progress of RFT Observability from its initial creation, including a link to the evaluation job.
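If you prefer to follow the same progress programmatically, a minimal sketch along the following lines can poll the finetuning job and list its events and checkpoints with the openai Python SDK. The calls shown are the SDK's standard fine-tuning methods rather than an RFT-observability-specific API, and the endpoint, API version, and job ID are placeholders you would replace with your own values.

```python
# Sketch: polling an RFT finetuning job's status, events, and checkpoints
# with the openai Python SDK. Endpoint, api_version, and job ID are
# placeholders; grader scores per checkpoint are shown in the linked
# evaluation job's report in the Foundry UI, as described above.
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],  # e.g. https://<resource>.openai.azure.com
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-10-21",  # assumption: use the API version your resource supports
)

job_id = "ftjob-..."  # your RFT finetuning job ID

# Overall job status.
job = client.fine_tuning.jobs.retrieve(job_id)
print(job.status)

# Recent training events (the 'Logs' view surfaces the same stream in the UI).
for event in client.fine_tuning.jobs.list_events(fine_tuning_job_id=job_id, limit=10):
    print(event.created_at, event.message)

# Checkpoints produced so far.
for checkpoint in client.fine_tuning.jobs.checkpoints.list(job_id):
    print(checkpoint.step_number, checkpoint.fine_tuned_model_checkpoint)
```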
Quick Evaluation (Quick Evals) for Stored Completions
Quick Evaluation (Quick Evals) is a feature on Stored Completions that lets users rapidly assess model outputs with minimal set-up. With a single click, users can assess model responses without launching a full evaluation job; there is no need to go through the traditional Evaluation flow of customizing test criteria or preparing prompts/responses, reducing manual work.
Users can compare and evaluate model responses across multiple models without adding separate runs for each model. Quick Evals provides immediate feedback by showing scores and highlights for each output, helping users spot issues promptly. It is ideal for fast iteration, giving instant insight into completion quality without navigating away from the Stored Completions interface.
Python Grader for Evaluation
We are introducing a new type of evaluation test criteria: the Python grader. A Python grader is a custom evaluation function that scores model outputs automatically. The customizable Python code inspects specific parts of the model’s response, such as tool usage, structure, or content, and returns a numeric score (typically between 0 and 1) based on the user’s pre-defined logic.
A Python grader is especially helpful for providing structured, programmable assessments of model outputs, and can support holistic model assessments when combined with other model-based or traditional graders.
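As an illustration, here is a minimal sketch of what such a grader could look like. The grade(sample, item) signature and the field names accessed below are assumptions made for this example only; check the Azure AI Foundry evaluation documentation for the exact contract your Python grader must follow.

```python
# Illustrative sketch of a Python grader. The function name, its (sample, item)
# arguments, and the field names below are assumptions for this example,
# not a documented contract.

def grade(sample: dict, item: dict) -> float:
    """Return a score in [0, 1] for one model response.

    sample: the model output being graded (assumed to expose the text
            under 'output_text').
    item:   the dataset row it was generated from (assumed to carry the
            expected answer under 'reference_answer').
    """
    response = (sample.get("output_text") or "").strip().lower()
    reference = (item.get("reference_answer") or "").strip().lower()

    if not response:
        return 0.0          # empty output fails outright
    if reference and reference in response:
        return 1.0          # reference answer appears verbatim
    # Partial credit for word overlap with the reference.
    overlap = set(response.split()) & set(reference.split())
    return min(len(overlap) / max(len(reference.split()), 1), 1.0)
```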
Developer Tier Now Generally Available
We’re excited to announce the promotion of Developer Tier to General Availability! Since our launch in Public Preview this past //Build (2025), we’ve worked to bring affordable fine-tuned model candidate hosting to all our Azure OpenAI regions as part of AI Foundry.
Today, in conjunction with Global Training (still in Public Preview), AI Foundry customers can complete the full train/test life cycle from over 25 different regions and with all the GPT-4.1-series models.
We’ve seen customers ship new models more confidently, knowing they have a deployment type supporting pre-production workloads, and are excited to continue building new features into the Fine Tuning “Developer Tier” experience in the coming months. Stay tuned because General Availability doesn’t mean we’re done!
In the meantime, learn how to take advantage of Developer Tier via AI Foundry, the Cognitive Services SDK, or the REST API.
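For orientation, here is a rough sketch of what deploying a fine-tuned candidate to Developer Tier could look like through the Azure Cognitive Services management REST API. The subscription, resource group, account, model name, api-version, and especially the sku name "DeveloperTier" are assumptions for illustration; confirm the exact values in the Azure AI Foundry fine-tuning deployment documentation before using this.

```python
# Sketch: deploying a fine-tuned candidate model via the Cognitive Services
# management REST API. All identifiers below are placeholders, and the sku
# name "DeveloperTier" and api-version are assumptions to verify in the docs.
import requests
from azure.identity import DefaultAzureCredential

subscription = "<subscription-id>"
resource_group = "<resource-group>"
account = "<azure-openai-resource>"
deployment_name = "my-ft-candidate"
fine_tuned_model = "gpt-4.1-mini-2025-04-14.ft-<job-suffix>"  # from your finetuning job

token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token

url = (
    "https://management.azure.com"
    f"/subscriptions/{subscription}/resourceGroups/{resource_group}"
    f"/providers/Microsoft.CognitiveServices/accounts/{account}"
    f"/deployments/{deployment_name}?api-version=2024-10-01"
)

body = {
    "sku": {"name": "DeveloperTier", "capacity": 1},  # assumed sku name for Developer Tier
    "properties": {
        "model": {"format": "OpenAI", "name": fine_tuned_model, "version": "1"}
    },
}

response = requests.put(url, json=body, headers={"Authorization": f"Bearer {token}"})
print(response.status_code, response.json())
```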