August 7, 2025
Welcome back to Agent Support, a developer advice column for those head-scratching moments when you’re building an AI agent! Each post answers a real question from the community with simple, practical guidance to help you build smarter agents.
Today’s question comes from someone who just discovered that “garbage-in, garbage-out” is very real.
💬 Dear Agent Support
I’m training an agent on a large CSV file, but I keep running into weird predictions. I suspect my data has missing values and other issues, but I don’t have time to spin up a Jupyter notebook just to poke around. Is there a faster way to explore and clean the data?
This looks like a job for quick data exploration! Before you feed any dataset into an agent (or a model, or a pipeline), you need to know exactly what’s in there. A few silent NaNs or out-of-range numbers can snowball into flaky evaluations, unexpected reasoning chains, or flat-out errors downstream.
🧠 What’s the Big Deal with Dirty Data?
When an agent relies on data that’s incomplete, inconsistent, or plain wrong, every downstream step inherits that problem. You will waste time debugging hallucinations that are actually caused by a stray “NULL” string, or re-running fine-tunes because of invisible whitespace in a numeric column. Even small quality issues can:
- Skew model evaluation metrics.
- Trigger exceptions in your application code.
- Undermine user trust when answers look obviously off.
The bottom line is that a five-minute inspection up front can save hours later.
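To make the “silent” part concrete, here is a minimal Python sketch using only the standard library and made-up values. Every cell a CSV reader returns is a string, so a stray “NULL” raises nothing; it simply turns comparisons lexicographic. `to_float_strict` is a hypothetical helper that fails loudly instead of letting the bad value through.

```python
import math

# Hypothetical sample: a price column exactly as a CSV reader returns it (all strings).
prices = ["120", " 95", "NULL", "80"]

# String comparison is lexicographic: "N" sorts after every digit and " " before them,
# so this prints "NULL" without raising any error.
print(max(prices))


def to_float_strict(values):
    """Convert every value to float, failing loudly on the first unusable cell."""
    numbers = []
    for row_index, raw in enumerate(values):
        try:
            # float() tolerates surrounding whitespace, so " 95" parses as 95.0.
            value = float(raw)
        except ValueError:
            raise ValueError(f"row {row_index}: {raw!r} is not numeric") from None
        # float() also accepts "nan" and "inf", which are rarely valid prices.
        if not math.isfinite(value):
            raise ValueError(f"row {row_index}: {raw!r} is not a finite number")
        numbers.append(value)
    return numbers


try:
    to_float_strict(prices)
except ValueError as err:
    print(err)
```

Failing on the first bad cell is a deliberate choice: it stops an expensive job before it starts, rather than quietly coercing “NULL” into something that skews your metrics.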
🧩 When Should You Inspect Your Dataset?
Not every update calls for a full audit, but you should always explore your data when:
- You ingest a brand-new source (a CSV from a partner, a freshly landed Parquet file, etc.).
- You notice sudden drops in agent performance.
- You’re about to run an expensive job (fine-tuning, batch inference, evals).
Catch the gremlins early, and your agent (or your boss) will thank you.
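Those triggers are easy to back with an automatic check. As a minimal sketch, assuming a hypothetical `EXPECTED_COLUMNS` schema and plain Python with only the standard library, a pre-flight gate can refuse to start an expensive job when a new file’s header drifts or the file has no data:

```python
import csv
import io

# Hypothetical schema: the columns your downstream job expects, spelled exactly.
EXPECTED_COLUMNS = {"id", "host_location", "price"}


def preflight_check(csv_text, expected):
    """Return a list of problems; an empty list means the file passed the gate."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Header names are compared verbatim, so " price" (leading space) is caught too.
    header = set(reader.fieldnames or [])
    problems = []
    if expected - header:
        problems.append(f"missing columns: {sorted(expected - header)}")
    if header - expected:
        problems.append(f"unexpected columns: {sorted(header - expected)}")
    # Make sure at least one data row follows the header.
    if not any(True for _ in reader):
        problems.append("no data rows")
    return problems


# A hypothetical partner file whose header has a stray space before "price".
new_file = 'id,host_location, price\n1,"New York, NY",120\n'
problems = preflight_check(new_file, EXPECTED_COLUMNS)
if problems:
    print("Refusing to start the job:", problems)
```

Returning a list instead of raising on the first issue lets you report every problem with a new source in one pass.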
📄 Diagnose Data Issues Fast
At a minimum, you want to answer three questions:
- Completeness: Which columns have nulls, blanks, or “N/A” strings?
- Distribution: Do numeric columns have outliers or impossible values?
- Consistency: Are text categories spelled the same way every time?
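If you ever need to answer these three questions in a script (for example, in CI), a short standard-library Python sketch is enough. The sample rows, the set of placeholder strings treated as missing, and the allowed price range are all assumptions you would replace with your own.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical sample rows; replace with the contents of your own file.
RAW = """id,host_location,price
1,"New York, NY",120
2,new york ny,95
3,"New York, NY",N/A
4,"Brooklyn, NY",-5
5,,80
"""

# Assumption: these strings (after trimming and lowercasing) all mean "missing".
NULL_TOKENS = {"", "na", "n/a", "null", "none", "nan"}


def profile(csv_text, numeric_ranges, text_columns):
    """Answer the completeness, distribution, and consistency questions for one CSV."""
    rows = csv.DictReader(io.StringIO(csv_text), restval="")
    missing = Counter()             # column -> number of blank/placeholder cells
    suspicious = defaultdict(list)  # column -> unparseable or out-of-range values
    spellings = defaultdict(lambda: defaultdict(set))  # column -> key -> raw variants

    for row in rows:
        for column, value in row.items():
            if column is None:
                # DictReader puts cells beyond the header under the key None.
                suspicious["<extra cells>"].extend(value)
                continue
            # Completeness: blanks and placeholder strings.
            if value.strip().lower() in NULL_TOKENS:
                missing[column] += 1
            elif column in numeric_ranges:
                # Distribution: values that are not numbers or fall outside the range.
                low, high = numeric_ranges[column]
                try:
                    in_range = low <= float(value) <= high
                except ValueError:
                    in_range = False
                if not in_range:
                    suspicious[column].append(value)
            elif column in text_columns:
                # Consistency: group spellings that differ only in case, spaces, or punctuation.
                key = "".join(ch for ch in value.lower() if ch.isalnum())
                spellings[column][key].add(value)

    variants = {
        column: [sorted(group) for group in groups.values() if len(group) > 1]
        for column, groups in spellings.items()
    }
    return dict(missing), dict(suspicious), variants


# Assumption: a valid price lies between 0 and 10,000.
missing, suspicious, variants = profile(RAW, {"price": (0, 10_000)}, {"host_location"})
print("missing:", missing)
print("suspicious:", suspicious)
print("spelling variants:", variants)
```

On this sample, the report flags one missing price (“N/A”), one missing location, the negative price, and the two spellings of New York.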
If you spot problems here, you can decide whether to drop rows, impute values, or standardize formats before you press “Train.”
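Each of those three fixes is only a few lines in plain Python. Here is a minimal sketch, assuming a hypothetical canonical-spelling table and median imputation. The median is one reasonable choice among several, picked here because it is less sensitive to outliers than the mean. Whether to drop or impute is a judgment call for your data.

```python
import csv
import io
import math
import statistics

# Hypothetical sample with one of each problem: variant spelling, placeholder, blanks.
RAW = """id,host_location,price
1,"New York, NY",120
2,new york ny,95
3,"New York, NY",N/A
4,"Brooklyn, NY",
5,,80
"""

# Hypothetical lookup from a normalized key to one canonical spelling.
CANONICAL = {"newyorkny": "New York, NY", "brooklynny": "Brooklyn, NY"}


def parse_number(value):
    """Return a finite float, or None for blank, placeholder, or non-numeric cells."""
    try:
        number = float(value)
    except ValueError:
        return None
    return number if math.isfinite(number) else None


def normalize_key(value):
    # Ignore case, spaces, and punctuation when matching spellings.
    return "".join(ch for ch in value.lower() if ch.isalnum())


def clean(csv_text):
    rows = list(csv.DictReader(io.StringIO(csv_text), restval=""))

    # Drop: a row without a location is treated as unusable (an assumption).
    rows = [row for row in rows if row["host_location"].strip()]

    # Standardize: collapse spelling variants onto one canonical value.
    for row in rows:
        key = normalize_key(row["host_location"])
        row["host_location"] = CANONICAL.get(key, row["host_location"].strip())

    # Impute: replace unusable prices with the median of the valid ones.
    valid = [n for n in (parse_number(row["price"]) for row in rows) if n is not None]
    if valid:
        median = statistics.median(valid)
        for row in rows:
            if parse_number(row["price"]) is None:
                row["price"] = str(median)
    return rows


for row in clean(RAW):
    print(row)
```

Standardizing before imputing means any per-group logic you add later sees one spelling per location.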
🔧 Fix it in a Flash with VS Code Data Wrangler
What if you could do all that without leaving your editor? Enter the Data Wrangler extension for Visual Studio Code. It opens your CSV, Parquet, or Excel file in a no-code grid and shows column statistics instantly. Thanks to its automated operations and intuitive interface, removing empty rows or columns and filtering out bad data take just a few seconds.
Here’s how to do it:
- Install Data Wrangler from the Extensions Marketplace (or simply click the link and reload VS Code after the install).
- Right-click your data.csv (or .parquet, .xls, or .jsonl) file and choose Open in Data Wrangler.
- Scan the Column Insights bar to spot nulls, errors, and uniques instantly.
- Filter rows (Add step → Filter), e.g., where host_location == ‘New York, NY’.
- Drop rows missing data in one click, e.g., where price is empty.
- Use Quick Aggregations to confirm ranges (mean, min, max) without writing code.
- Click Export As File to save the validated dataset.
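If you ever need to repeat these steps outside the editor (say, in a scheduled job), they map onto a few lines of code. Here is a minimal standard-library Python sketch of the same filter, drop, aggregate, and export sequence on a tiny made-up sample. It is not the code Data Wrangler itself produces; the column names simply reuse the examples from the steps.

```python
import csv
import io
import statistics

# Hypothetical sample that reuses the column names from the steps above.
RAW = """id,host_location,price
1,"New York, NY",120
2,"Brooklyn, NY",80
3,"New York, NY",
4,"New York, NY",95
"""


def wrangle(csv_text):
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames
    rows = list(reader)

    # Filter rows: keep only host_location == "New York, NY".
    rows = [row for row in rows if row["host_location"] == "New York, NY"]

    # Drop rows where price is empty.
    rows = [row for row in rows if row["price"].strip()]

    # Quick aggregations. float() raises ValueError on any non-numeric price left over,
    # and statistics.mean() raises StatisticsError if no rows survive the filters.
    prices = [float(row["price"]) for row in rows]
    summary = {"mean": statistics.mean(prices), "min": min(prices), "max": max(prices)}

    # Export: serialize the validated rows as CSV text.
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return summary, buffer.getvalue()


summary, cleaned_csv = wrangle(RAW)
print(summary)
```

To save to disk instead, write `cleaned_csv` to a file opened with `open(path, "w", newline="")`, which is how Python’s `csv` module expects output files to be opened.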
Now you’ve got a clean, documented dataset you can trust, and your agent’s next fine-tune won’t trip over bad data.
🔁 Recap
- Bad data can sabotage your agent long before you see an error.
- A quick exploration pass to check completeness, distributions, and consistency catches most issues.
- The Data Wrangler extension in VS Code lets you explore, filter, aggregate, and fix data intuitively and quickly.
📺 Want to Go Deeper?
Watch “Mastering your data with Data Wrangler in VS Code” on YouTube for a tour of all the features Data Wrangler has to offer, and read more about Data Wrangler in VS Code’s Data Science docs.
Happy wrangling, and see you in the next installment of Agent Support!