June 17, 2025

Welcome to Agent Support—a developer advice column for those head-scratching moments when you’re building an AI agent! Each post answers a question inspired by real conversations in the AI developer community, offering practical advice and tips.
To kick things off, we’re tackling a common challenge for anyone experimenting with multimodal agents: working with image input.
Let’s dive in!
Dear Agent Support,
I’m building an AI agent, and I’d like to include screenshots or product photos as part of the input. But I’m not sure if that’s even possible, or if I need to use a different kind of model altogether. Can I actually upload an image and have the agent process it?
Great question, and one that trips up a lot of people early on! The short answer is: yes, some models can process images—but not all of them.
Let’s break that down a bit.
🧠 Understanding Image Input
When we talk about image input or image attachments, we’re talking about the ability to send a non-text file (like a .png, .jpg, or screenshot) into your prompt and have the model analyze or interpret it. That could mean describing what’s in the image, extracting text from it, answering questions about a chart, or giving feedback on a design layout.
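To make this concrete, here’s a minimal sketch of what an image attachment looks like on the wire for many chat-style APIs: the image is base64-encoded into a data URL and paired with a text prompt in a single message. The exact schema shown (the OpenAI-style `image_url` content part) varies between providers, so treat this as an illustration rather than a universal format.

```python
import base64

def image_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Pair a text prompt with an inline image in one chat message.

    Uses the OpenAI-style "image_url" content-part shape; other providers
    use similar but not identical schemas, so check your SDK's docs.
    """
    data_url = f"data:{mime};base64," + base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }

# In practice image_bytes would be a real screenshot or product photo.
msg = image_message("Describe the contents of this image.", b"\x89PNG...")
```

The point is that the image travels inside the prompt itself, which is exactly why the model has to be trained to interpret it.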
🚫 Not All Models Support Image Input
That said, this isn’t something every model can do. Most base language models are trained on text data only; they’re not designed to interpret non-text inputs like images. In most tools and interfaces, the option to upload an image only appears if the selected model supports it, since platforms typically hide or disable features that aren’t compatible with a model’s capabilities. So, if your current chat interface doesn’t mention anything about vision or image input, it’s likely because the model itself isn’t equipped to handle it.
That’s where multimodal models come in. These are models that have been trained (or extended) to understand both text and images, and sometimes other data types too. Think of them as being fluent in more than one language, except in this case, one of those “languages” is visual.
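The capability gating described above can be sketched in a few lines. The `VISION_CAPABLE` table and model names below are illustrative assumptions for the sketch; a real platform reads this from model metadata (which is what catalog filters like the one described next surface for you).

```python
# Illustrative capability table: a real platform reads this from model
# metadata rather than hard-coding it. Names here are examples only.
VISION_CAPABLE = {"gpt-4o", "gpt-4o-mini", "phi-3.5-vision"}

def supports_images(model_name: str) -> bool:
    """Return True if the (illustrative) table lists the model as multimodal."""
    return model_name.lower() in VISION_CAPABLE

def require_image_support(model_name: str) -> None:
    """Fail fast before wiring an image pipeline to a text-only model."""
    if not supports_images(model_name):
        raise ValueError(
            f"{model_name} is text-only; choose a multimodal model for image input."
        )
```

This is also why the upload button disappears in most UIs: the interface runs the same kind of check and simply hides the feature when it would fail.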
🔎 How to Find Image-Supporting Models
If you’re trying to figure out which models support images, the AI Toolkit is a great place to start! The extension includes a built-in Model Catalog where you can filter models by Feature—like Image Attachment—so you can skip the guesswork.
Here’s how to do it:
- Open the Model Catalog from the AI Toolkit panel in Visual Studio Code.
- Click the Feature filter near the search bar.
- Select Image Attachment.
- Browse the filtered results to see which models can accept visual input.
Once you’ve got your filtered list, you can check out the model details or try one in the Playground to test how it handles image-based prompts.
🧪 Test Before You Build
Before you plug a model into your agent and start wiring things together, it’s a good idea to test how the model handles image input on its own. This gives you a quick feel for the model’s behavior and helps you catch any limitations before you’re deep into building.
You can do this in the Playground, where you can upload an image and pair it with a simple prompt like:
- “Describe the contents of this image.” OR
- “Summarize what’s happening in this screenshot.”
If the model supports image input, you’ll be able to attach a file and get a response based on its visual analysis. If you don’t see the option to upload an image, double-check that the model you’ve selected has image capabilities—this is usually a model issue, not a UI bug.
🔁 Recap
Here’s a quick rundown of what we covered:
- Not all models support image input—you’ll need a multimodal model specifically built to handle visual data.
- Most platforms won’t let you upload an image unless the model supports it, so if you don’t see that option, it’s probably a model limitation.
- You can use the AI Toolkit’s Model Catalog to filter models by capability—just check the box for Image Attachment.
- Test the model in the Playground before integrating it into your agent to make sure it behaves the way you expect.
📺 Want to Go Deeper?
Check out my latest video on how to choose the right model for your agent—it’s part of the Build an Agent Series, where I walk through the building blocks of turning an idea into a working AI agent.
And if you’re looking to sharpen your model instincts, don’t miss Model Mondays—a weekly series that helps developers like you build your Model IQ, one spotlight at a time. Whether you’re just starting out or already building AI-powered apps, it’s a great way to stay current and confident in your model choices.
👉 Explore the series and catch the next episode: aka.ms/model-mondays/rsvp
And for all your general AI and AI agent questions, join us in the Azure AI Foundry Discord! You can find me hanging out there answering your questions about the AI Toolkit. I’m looking forward to chatting with you there!
Whatever you’re building, the right model is out there—and with the right tools, you’ll know exactly how to find it.