August 12, 2025

OpenAI has released the open-weight models gpt-oss-20b and gpt-oss-120b. Enterprises and developers can now run these models locally, including on edge devices, without relying on cloud APIs.
The AI Toolkit for VS Code extension gives developers an end-to-end workflow, from model testing and local deployment to building intelligent agent applications. In this post, we use the AI Toolkit together with gpt-oss-20b to build local AI applications.
Understanding gpt-oss
OpenAI has released gpt-oss-120b and gpt-oss-20b, its first open-weight language models since GPT-2. Both models use a mixture-of-experts (MoE) architecture with MXFP4 quantization, delivering strong reasoning capabilities and tool use. gpt-oss-120b has 117 billion parameters with 5.1B active per token, runs on a single H100 GPU (80GB memory), and matches OpenAI o4-mini performance. gpt-oss-20b has 21 billion parameters with 3.6B active per token and requires only 16GB of memory, making it ideal for consumer hardware and edge devices.
Both models support a 128k context length, full chain-of-thought reasoning, structured outputs, and agentic workflows. Released under the Apache 2.0 license, they allow free commercial use, modification, and redistribution. They are compatible with multiple inference frameworks, including vLLM, Ollama, and Transformers, as well as cloud platforms such as Azure AI Foundry and Hugging Face.
Rigorously safety-tested across biological, chemical, and cybersecurity domains, these models give developers and enterprises flexible, controllable AI solutions for local deployment, cloud hosting, or edge computing. Read more at https://openai.com/index/introducing-gpt-oss/
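The memory figures above follow from the 4-bit MXFP4 quantization. As a rough back-of-envelope sketch (weights only, ignoring activations, KV cache, and runtime overhead, so real VRAM usage will be higher):

```python
def approx_weight_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB for a quantized model.

    Counts only the weights; activation memory, KV cache, and
    framework overhead come on top of this.
    """
    return n_params * bits_per_param / 8 / 1e9

# gpt-oss-20b: 21B params at ~4 bits (MXFP4) -> roughly 10.5 GB of weights,
# which fits the 16 GB requirement once runtime overhead is added.
print(round(approx_weight_gb(21e9, 4), 1))   # -> 10.5

# gpt-oss-120b: 117B params at ~4 bits -> roughly 58.5 GB,
# consistent with a single 80 GB H100.
print(round(approx_weight_gb(117e9, 4), 1))  # -> 58.5
```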
Developers can build applications around gpt-oss-20b entirely in a local environment using the AI Toolkit for Visual Studio Code extension: you can deploy models, test them, create agents, and more. Let's walk through these scenarios.
Deployment
Deploy gpt-oss-20b from the AI Toolkit Model Catalog to your local environment
System Requirements
Before beginning deployment, ensure your development environment meets these requirements:
- Hardware: GPU with 16GB+ VRAM
- AI Toolkit for Visual Studio Code Extension
Deployment Steps
1. Access the AITK Model Catalog
After installing the AI Toolkit for VS Code extension, open the Model Catalog through the Command Palette (Ctrl+Shift+P). Locate gpt-oss-20b in the catalog and click the “Add Model” button.
2. Initialize Deployment
AI Toolkit automatically downloads the model files and performs local deployment. The entire process typically takes 15-30 minutes.
3. Verify Deployment
Once deployment is complete, you can view the gpt-oss-20b runtime status in AI Toolkit’s model management interface.
Note: CPU-only deployment will be available in future releases. Currently, only GPU-accelerated deployment is supported.
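Once the model is running, AI Toolkit exposes it through an OpenAI-compatible REST endpoint. The sketch below is a minimal example of calling it with only the standard library; the port (5272) and the model identifier are assumptions, so check the model's details page in AI Toolkit for the actual values on your machine.

```python
import json
import urllib.request

# Assumed endpoint: AI Toolkit serves locally deployed models through an
# OpenAI-compatible API. Port and model name below are assumptions --
# verify them in the AI Toolkit UI.
AITK_URL = "http://localhost:5272/v1/chat/completions"

def build_chat_request(model: str, prompt: str, temperature: float = 0.7) -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

def ask(prompt: str) -> str:
    """Send a prompt to the locally deployed gpt-oss-20b
    (requires the model to be running in AI Toolkit)."""
    payload = build_chat_request("gpt-oss-20b", prompt)
    req = urllib.request.Request(
        AITK_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example (only works while the model is running locally):
#   print(ask("Explain mixture-of-experts in one sentence."))
```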
Local deployment using Ollama and AI Toolkit
In addition to deploying gpt-oss-20b in ONNX format from the AI Toolkit’s Model Catalog, you can also deploy gpt-oss-20b in GGUF format using Ollama. Ollama provides a flexible API and integrates with a variety of development frameworks, so developers can quickly test and call Ollama-hosted models from the AI Toolkit. The steps to integrate Ollama with the AI Toolkit are:
1. Install Ollama
Follow the standard Ollama installation process for your operating system.
2. Run the gpt-oss-20b model:
ollama run gpt-oss
3. Add the Ollama-hosted gpt-oss-20b in AI Toolkit’s My Resources.
Once added successfully, gpt-oss-20b (via Ollama) appears under AI Toolkit Resources.
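Ollama also serves the model through its own local REST API (port 11434 by default), which the AI Toolkit connects to. A minimal standard-library sketch of calling it directly; the model tag "gpt-oss" matches the `ollama run gpt-oss` command above:

```python
import json
import urllib.request

# Ollama's local REST API; 11434 is the default port.
OLLAMA_URL = "http://localhost:11434/api/chat"

def build_ollama_request(model: str, prompt: str) -> dict:
    """Build a payload for Ollama's /api/chat endpoint (non-streaming)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def chat(prompt: str) -> str:
    """Call the gpt-oss model served by Ollama
    (Ollama must be running with the model pulled)."""
    payload = build_ollama_request("gpt-oss", prompt)
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]

# Example (only works while Ollama is running):
#   print(chat("Write a haiku about local inference."))
```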
Testing gpt-oss-20b in AI Toolkit
As mentioned above, AI Toolkit is not just for local model deployment; we can also test the model there. In business scenarios, the quality of a model’s generated content matters. Using AI Toolkit’s Playground, we can compare the results of different models. For example, in a programming scenario, we can compare gpt-oss-20b and Qwen3-Coder.
- Configure the Comparison Experiment
Enable “Model Comparison” mode in the Playground and select:
- gpt-oss-20b (locally deployed)
- Qwen3-Coder (locally deployed)
- Code Generation Test Case
Test Prompt: “Create an HTML5 Tetris application”
Creating Agents with gpt-oss-20b
AI agents are a popular technology. Besides building applications on cloud-based LLMs, we can also create agents locally. Especially in development scenarios, local models make it convenient to prototype and build AI agent applications.
AITK’s Agent Builder is a visual agent construction tool that enables developers to rapidly create agent applications powered by gpt-oss-20b. You can combine MCP (Model Context Protocol) servers to build sophisticated agents based on gpt-oss-20b.
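Agent Builder itself is visual, but the tool-calling loop it assembles can be sketched in plain Python. This is a conceptual illustration, not the Toolkit's internals: the model emits a structured tool call, and the agent dispatches it to a registered function (in a real setup, the MCP server supplies the tool registry; `get_weather` here is a hypothetical stub).

```python
import json
from typing import Callable

# Hypothetical tool registry illustrating the dispatch pattern an agent
# uses with MCP tools; in Agent Builder the MCP server supplies the tools.
TOOLS: dict[str, Callable[..., str]] = {}

def tool(fn: Callable[..., str]) -> Callable[..., str]:
    """Register a function as a callable tool."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def get_weather(city: str) -> str:
    # Stub data; a real tool would call an external service.
    return f"Sunny in {city}"

def dispatch(tool_call_json: str) -> str:
    """Execute a model-emitted tool call of the form
    {"name": ..., "arguments": {...}}."""
    call = json.loads(tool_call_json)
    return TOOLS[call["name"]](**call["arguments"])

# Simulate gpt-oss-20b emitting a structured tool call:
print(dispatch('{"name": "get_weather", "arguments": {"city": "Seattle"}}'))
# -> Sunny in Seattle
```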
Conclusion
The AI Toolkit enables local deployment, testing, and application evaluation of newly released models such as gpt-oss-20b. This shortens the path from new models to application scenarios, bringing the latest intelligent applications to diverse enterprise needs.
Resources
- Learn more about AITK https://aka.ms/aitoolkit
- Learn more about gpt-oss https://openai.com/index/introducing-gpt-oss/
- OpenAI’s open-weight model: gpt-oss on Azure AI Foundry and Windows AI Foundry https://azure.microsoft.com/en-us/blog/openais-open-source-model-gpt-oss-on-azure-ai-foundry-and-windows-ai-foundry/