August 12, 2025

OpenAI has released the open-weight models gpt-oss-20b and gpt-oss-120b. Enterprises and developers can now run these models locally, including on edge devices, without relying on cloud APIs.
The AI Toolkit for VS Code extension gives developers an end-to-end workflow, from model testing and local deployment to building intelligent agent applications. In this post, we use the AI Toolkit together with gpt-oss-20b to build local AI applications.
Understanding gpt-oss
OpenAI has released gpt-oss-120b and gpt-oss-20b, its first open-weight language models since GPT-2. Both models use a mixture-of-experts (MoE) architecture with MXFP4 quantization, delivering strong reasoning capabilities and tool use. gpt-oss-120b has 117 billion parameters with 5.1B active per token, runs on a single H100 GPU (80GB memory), and matches OpenAI o4-mini performance. gpt-oss-20b has 21 billion parameters with 3.6B active per token and requires only 16GB of memory, making it ideal for consumer hardware and edge devices.
Both models support a 128k context length, full chain-of-thought reasoning, structured outputs, and agentic workflows. Released under the Apache 2.0 license, they allow free commercial use, modification, and redistribution. They are compatible with multiple inference frameworks, including vLLM, Ollama, and Transformers, as well as cloud platforms such as Azure AI Foundry and Hugging Face.
Rigorously safety-tested across biological, chemical, and cybersecurity domains, these models give developers and enterprises flexible, controllable AI solutions for local deployment, cloud hosting, or edge computing. Read more at https://openai.com/index/introducing-gpt-oss/
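The memory figures above follow from the 4-bit MXFP4 quantization. As a rough back-of-envelope sketch (weights only, ignoring activations, KV cache, and runtime overhead, so real VRAM usage will be higher):

```python
def approx_weight_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB for a quantized model.

    Counts only the weights; activation memory, KV cache, and
    framework overhead come on top of this.
    """
    return n_params * bits_per_param / 8 / 1e9

# gpt-oss-20b: 21B params at ~4 bits (MXFP4) -> roughly 10.5 GB of weights,
# which fits the 16 GB requirement once runtime overhead is added.
print(round(approx_weight_gb(21e9, 4), 1))   # -> 10.5

# gpt-oss-120b: 117B params at ~4 bits -> roughly 58.5 GB,
# consistent with a single 80 GB H100.
print(round(approx_weight_gb(117e9, 4), 1))  # -> 58.5
```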
Developers can build applications around gpt-oss-20b entirely in a local environment using the AI Toolkit for Visual Studio Code extension: you can deploy models, test them, create agents, and more. Let's walk through these scenarios.
Deployment
Deploy gpt-oss-20b from the AI Toolkit Model Catalog to your local environment
System Requirements
Before beginning deployment, ensure your development environment meets these requirements:
- Hardware: GPU with 16GB+ VRAM
- AI Toolkit for Visual Studio Code Extension
Deployment Steps
1. Access the AITK Model Catalog
After installing the AI Toolkit for VS Code extension, open the Model Catalog through the Command Palette (Ctrl+Shift+P). Locate gpt-oss-20b in the catalog and click the “Add Model” button.
2. Initialize Deployment
AI Toolkit automatically downloads the model files and performs local deployment. The entire process typically takes 15-30 minutes.
3. Verify Deployment
Once deployment is complete, you can view the gpt-oss-20b runtime status in AI Toolkit’s model management interface.
Note: CPU-only deployment will be available in future releases. Currently, only GPU-accelerated deployment is supported.
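Once the model is running, AI Toolkit exposes it through an OpenAI-compatible REST endpoint. The sketch below is a minimal example of calling it with only the standard library; the port (5272) and the model identifier are assumptions, so check the model's details page in AI Toolkit for the actual values on your machine.

```python
import json
import urllib.request

# Assumed endpoint: AI Toolkit serves locally deployed models through an
# OpenAI-compatible API. Port and model name below are assumptions --
# verify them in the AI Toolkit UI.
AITK_URL = "http://localhost:5272/v1/chat/completions"

def build_chat_request(model: str, prompt: str, temperature: float = 0.7) -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

def ask(prompt: str) -> str:
    """Send a prompt to the locally deployed gpt-oss-20b
    (requires the model to be running in AI Toolkit)."""
    payload = build_chat_request("gpt-oss-20b", prompt)
    req = urllib.request.Request(
        AITK_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example (only works while the model is running locally):
#   print(ask("Explain mixture-of-experts in one sentence."))
```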
Local deployment using Ollama and AI Toolkit
In addition to deploying gpt-oss-20b in ONNX format from the AI Toolkit’s Model Catalog, you can also deploy gpt-oss-20b in GGUF format using Ollama. Ollama provides a flexible API and integrates with a variety of development frameworks, so developers can quickly test and call Ollama-hosted models from the AI Toolkit. The steps to integrate Ollama with the AI Toolkit are:
1. Install Ollama
Follow the standard Ollama installation process for your operating system.
2. Run the gpt-oss-20b model:
ollama run gpt-oss
3. Add the Ollama-hosted gpt-oss-20b in AI Toolkit’s My Resources.
Once added successfully, gpt-oss-20b (via Ollama) appears under AI Toolkit Resources.
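Ollama also serves the model through its own local REST API (port 11434 by default), which the AI Toolkit connects to. A minimal standard-library sketch of calling it directly; the model tag "gpt-oss" matches the `ollama run gpt-oss` command above:

```python
import json
import urllib.request

# Ollama's local REST API; 11434 is the default port.
OLLAMA_URL = "http://localhost:11434/api/chat"

def build_ollama_request(model: str, prompt: str) -> dict:
    """Build a payload for Ollama's /api/chat endpoint (non-streaming)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def chat(prompt: str) -> str:
    """Call the gpt-oss model served by Ollama
    (Ollama must be running with the model pulled)."""
    payload = build_ollama_request("gpt-oss", prompt)
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]

# Example (only works while Ollama is running):
#   print(chat("Write a haiku about local inference."))
```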
Testing gpt-oss-20b in AI Toolkit
As mentioned above, AI Toolkit is not just for local model deployment; we can also test the model there. In business scenarios, the quality of a model’s generated content matters. Using AI Toolkit’s Playground, we can compare the results of different models. For example, in a programming scenario, we can compare gpt-oss-20b and Qwen3-Coder.
- Configure the Comparison Experiment
Enable “Model Comparison” mode in the Playground and select:
- gpt-oss-20b (locally deployed)
- Qwen3-Coder (locally deployed)
- Code Generation Test Case
Test Prompt: “Create an HTML5 Tetris application”
Creating Agents with gpt-oss-20b
AI agents are a popular technology. Besides building applications on cloud-based LLMs, we can also create agents locally. Especially in development scenarios, local models make it convenient to prototype and build AI agent applications.
AITK’s Agent Builder is a visual agent construction tool that enables developers to rapidly create agent applications powered by gpt-oss-20b. You can combine MCP (Model Context Protocol) servers to build sophisticated agents based on gpt-oss-20b.
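Agent Builder itself is visual, but the tool-calling loop it assembles can be sketched in plain Python. This is a conceptual illustration, not the Toolkit's internals: the model emits a structured tool call, and the agent dispatches it to a registered function (in a real setup, the MCP server supplies the tool registry; `get_weather` here is a hypothetical stub).

```python
import json
from typing import Callable

# Hypothetical tool registry illustrating the dispatch pattern an agent
# uses with MCP tools; in Agent Builder the MCP server supplies the tools.
TOOLS: dict[str, Callable[..., str]] = {}

def tool(fn: Callable[..., str]) -> Callable[..., str]:
    """Register a function as a callable tool."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def get_weather(city: str) -> str:
    # Stub data; a real tool would call an external service.
    return f"Sunny in {city}"

def dispatch(tool_call_json: str) -> str:
    """Execute a model-emitted tool call of the form
    {"name": ..., "arguments": {...}}."""
    call = json.loads(tool_call_json)
    return TOOLS[call["name"]](**call["arguments"])

# Simulate gpt-oss-20b emitting a structured tool call:
print(dispatch('{"name": "get_weather", "arguments": {"city": "Seattle"}}'))
# -> Sunny in Seattle
```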
Conclusion
The AI Toolkit enables local deployment, testing, and application evaluation of newly released models such as gpt-oss-20b. This shortens the path from new models to application scenarios, bringing the latest intelligent applications to diverse enterprise needs.
Resources
- Learn more about AITK https://aka.ms/aitoolkit
- Learn more about gpt-oss https://openai.com/index/introducing-gpt-oss/
- OpenAI’s open-weight model: gpt-oss on Azure AI Foundry and Windows AI Foundry https://azure.microsoft.com/en-us/blog/openais-open-source-model-gpt-oss-on-azure-ai-foundry-and-windows-ai-foundry/