July 8, 2025

With all our big announcements at Build, you might think we’d kick back and take a break for a few weeks… but fine-tuning never stops! We’re wrapping up June with the release of Direct Preference Optimization (DPO) for the 4.1 family of models, fine-tuning available in more regions than ever before, and Responses API support for fine-tuned models.
GPT-4.1, GPT-4.1-mini support Direct Preference Optimization (DPO) 😍
GPT-4.1 and GPT-4.1-mini now support Direct Preference Optimization (DPO). DPO is a fine-tuning technique that adjusts model weights based on human preferences. You provide a prompt along with two responses, one preferred and one non-preferred; using this data, you can align a fine-tuned model with your own style, preferences, or safety requirements.
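To make the data shape concrete, here’s a minimal sketch of one preference pair, assuming the OpenAI-style JSONL preference format with input, preferred_output, and non_preferred_output fields (check the fine-tuning docs for the exact schema your endpoint expects); the prompt and responses are invented examples:

```python
import json

# One DPO training example: a prompt plus a preferred and a non-preferred response.
# Field names follow the OpenAI-style preference format; verify against the current docs.
example = {
    "input": {
        "messages": [
            {"role": "user", "content": "Summarize our refund policy in one sentence."}
        ]
    },
    "preferred_output": [
        {"role": "assistant",
         "content": "Refunds are available within 30 days of purchase with proof of receipt."}
    ],
    "non_preferred_output": [
        {"role": "assistant",
         "content": "You can probably get your money back if you ask nicely."}
    ],
}

# Training files are JSON Lines: one preference pair per line.
with open("dpo_train.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")
```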
Unlike Reinforcement Learning from Human Feedback (RLHF), DPO does not require fitting a reward model and uses binary preferences for training. This makes DPO computationally lighter and faster than RLHF while being equally effective at alignment.
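Once your preference file is ready, kicking off a DPO job is a single call. The sketch below assumes the openai Python SDK’s fine_tuning.jobs.create with a method block selecting DPO; the snapshot name, file name, and beta value are illustrative rather than prescriptive:

```python
from openai import OpenAI

client = OpenAI()  # for Azure OpenAI, construct AzureOpenAI with your endpoint and API version

# Upload the preference-pair file from the previous sketch.
train_file = client.files.create(file=open("dpo_train.jsonl", "rb"), purpose="fine-tune")

# Start a DPO fine-tuning job; the `method` block selects DPO instead of supervised tuning,
# and beta (assumed hyperparameter) controls how strongly preferred outputs are favored.
job = client.fine_tuning.jobs.create(
    model="gpt-4.1-mini-2025-04-14",  # illustrative snapshot; use one available in your region
    training_file=train_file.id,
    method={"type": "dpo", "dpo": {"hyperparameters": {"beta": 0.1}}},
)
print(job.id, job.status)
```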
Global Training, now available globally 🌎
Since Build, we’ve significantly expanded the availability of Global Training (public preview). If you’ve been waiting for support in your region, we’ve added another 12 regions! Look for additional features (pause/resume and continuous fine-tuning) and models (gpt-4.1-nano) in the coming weeks.
Newly available regions:
- East US
- East US 2
- North Central US
- South Central US
- Spain Central
- Sweden Central
- Switzerland North
- Switzerland West
- UK South
- West Europe
- West US
- West US 3
Responses API now supports fine-tuned models ☎️
Training is great, but inference is what matters most when you actually want to use your models! The Responses API is the newest inference API, purpose-built for agentic workflows: it supports stateful, multi-turn conversations and seamless tool calling, automatically stitching everything together in the background.
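As a rough sketch of what this looks like with a fine-tuned model, the snippet below assumes the openai Python SDK’s Responses API against an Azure OpenAI resource; the endpoint, key, API version, and the deployment name my-ft-gpt-4-1 are all placeholders for your own setup:

```python
from openai import AzureOpenAI

# Placeholders: point these at your own resource and a Responses-capable API version.
client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",
    api_key="YOUR_API_KEY",
    api_version="preview",
)

# Call the Responses API against a fine-tuned deployment (hypothetical name).
response = client.responses.create(
    model="my-ft-gpt-4-1",
    input="Draft a status update for the payments migration.",
)
print(response.output_text)
```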
With this update, you can make better use of fine-tuned models in multi-agent workflows: after teaching your model which tools to use, and when, the Responses API keeps track of the conversation so the model can remember context, surfaces how the model is reasoning through its answers, and lets users check progress while a response is being generated. It also supports background processing (so you don’t have to wait) and works with tools like web search and file lookup, making it great for building smarter, more interactive AI experiences.
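For multi-turn use, here’s a sketch of chaining a follow-up request under the same assumptions (hypothetical deployment name, placeholder credentials): previous_response_id lets the service carry the conversation state forward, and background mode, where supported, returns immediately so you can poll for the result:

```python
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",
    api_key="YOUR_API_KEY",
    api_version="preview",  # placeholder: use a Responses-capable API version
)

# First turn against the fine-tuned deployment (hypothetical name).
first = client.responses.create(
    model="my-ft-gpt-4-1",
    input="Draft a status update for the payments migration.",
)

# Follow-up turn: chain to the first response instead of resending the whole history.
follow_up = client.responses.create(
    model="my-ft-gpt-4-1",
    previous_response_id=first.id,
    input="Shorten that to two sentences.",
)
print(follow_up.output_text)

# Background processing (where supported): the call returns right away and you poll for status.
job = client.responses.create(
    model="my-ft-gpt-4-1",
    input="Write a detailed migration runbook.",
    background=True,
)
status = client.responses.retrieve(job.id)
print(status.status)  # e.g. "queued", "in_progress", "completed"
```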