
June 21, 2025
Selecting the right model for your AI application is more than a technical decision—it’s a foundational step in ensuring trust, compliance, and governance in AI. Today, we are excited to announce the public preview of safety leaderboards within Foundry model leaderboards, helping customers incorporate model safety as a first-class criterion alongside quality, cost, and throughput. This feature introduces three key components to support responsible AI development:
- A dedicated safety leaderboard highlighting the safest models;
- A quality–safety trade-off chart to balance performance and risk;
- Five new scenario-specific leaderboards supporting diverse responsible AI scenarios.
Prioritize safety with the new leaderboard
The safety leaderboard ranks the top models based on their robustness against generating harmful content. This is especially valuable in regulated or high-risk domains—such as healthcare, education, or financial services—where model outputs must meet high safety standards.
To ensure benchmark rigor and relevance, we apply a structured filtering and validation process to select benchmarks. A benchmark qualifies for onboarding if it addresses high-priority risks. For the safety and responsible AI leaderboards, we consider benchmarks that are reliable enough to provide meaningful signals on the targeted safety-related areas of interest. Our current safety leaderboard uses the HarmBench benchmark, which includes prompts designed to elicit harmful behaviors from models. The benchmark covers 7 semantic categories of behaviors:
- Cybercrime & Unauthorized Intrusion
- Chemical & Biological Weapons/Drugs
- Copyright Violations
- Misinformation & Disinformation
- Harassment & Bullying
- Illegal Activities
- General Harm
These 7 categories are organized into three broader functional groupings:
- Standard Harmful Behaviors
- Contextual Harmful Behaviors
- Copyright Violations
Each grouping is featured in a separate responsible AI scenario leaderboard. We use the prompts and evaluators from HarmBench to calculate the Attack Success Rate (ASR) and aggregate it across the functional groupings as a proxy for model safety. Lower ASR values mean that a model is more robust against attacks that attempt to elicit harmful content.
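As a minimal illustration of how such a metric can be aggregated, the sketch below computes a per-grouping Attack Success Rate from hypothetical judge verdicts. The record structure and field names are assumptions made for the example; they are not the HarmBench or Foundry implementation.

```python
from collections import defaultdict

# Hypothetical attack records: each entry notes the functional grouping of the
# prompt and whether an automated judge flagged the model response as harmful.
results = [
    {"grouping": "standard", "attack_succeeded": False},
    {"grouping": "standard", "attack_succeeded": True},
    {"grouping": "contextual", "attack_succeeded": False},
    {"grouping": "copyright", "attack_succeeded": False},
]

def attack_success_rate(records):
    """ASR = successful attacks / total attack attempts (lower is safer)."""
    if not records:
        return 0.0
    return sum(r["attack_succeeded"] for r in records) / len(records)

# Aggregate ASR per functional grouping, mirroring the three scenario leaderboards.
by_grouping = defaultdict(list)
for r in results:
    by_grouping[r["grouping"]].append(r)

for grouping, records in by_grouping.items():
    print(f"{grouping}: ASR = {attack_success_rate(records):.2%}")
```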
We understand and acknowledge that model safety is a complex topic with several dimensions. No single current open-source benchmark can test or represent the full spectrum of model safety in different scenarios. Additionally, most of these benchmarks suffer from saturation or from misalignment between benchmark design and the risk definition, and they can lack clear documentation on how the target risks are conceptualized and operationalized, making it difficult to assess whether the benchmark accurately captures the nuances of those risks. This can lead to either overestimating or underestimating model performance in real-world safety scenarios. While the HarmBench dataset covers a limited set of harmful topics, it can still provide a high-level understanding of safety trends.
Navigate trade-offs with the quality-safety chart
Model selection often involves compromise across multiple criteria. Our new quality–safety trade-off chart helps you make informed decisions by comparing models based on their performance in safety and quality. You can:
- Identify the safest model measured by Attack Success Rate (lower is better) at a given level of quality performance;
- Or choose the highest-performing model in quality (higher is better) that still meets a defined safety threshold.
Together with the quality-cost trade-off chart, you can find the best trade-off between quality, safety, and cost when selecting a model.
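To make the selection logic concrete, here is a small sketch of the two decision rules described above applied to hypothetical leaderboard rows. The field names, scores, and thresholds are illustrative assumptions, not the Foundry leaderboard schema.

```python
# Hypothetical leaderboard rows: quality index (higher is better) and
# Attack Success Rate (lower is better). Values are made up for illustration.
models = [
    {"name": "model-a", "quality": 0.82, "asr": 0.12},
    {"name": "model-b", "quality": 0.78, "asr": 0.04},
    {"name": "model-c", "quality": 0.90, "asr": 0.25},
]

def safest_at_quality(models, min_quality):
    """Rule 1: among models meeting a quality bar, pick the lowest ASR."""
    eligible = [m for m in models if m["quality"] >= min_quality]
    return min(eligible, key=lambda m: m["asr"]) if eligible else None

def best_quality_under_asr(models, max_asr):
    """Rule 2: among models meeting a safety threshold, pick the highest quality."""
    eligible = [m for m in models if m["asr"] <= max_asr]
    return max(eligible, key=lambda m: m["quality"]) if eligible else None

print(safest_at_quality(models, min_quality=0.80))   # -> model-a
print(best_quality_under_asr(models, max_asr=0.15))  # -> model-a
```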
Scenario-based responsible AI leaderboards
To support customers’ diverse responsible AI scenarios, we have added 5 new leaderboards to rank the top models in safety and broader responsible AI scenarios. Each leaderboard is powered by industry-standard public benchmarks covering:
- Model robustness against harmful behaviors using HarmBench in 3 scenarios, targeting standard harmful behaviors, contextually harmful behaviors, and copyright violations:
Consistent with the safety leaderboard, lower ASR scores for a model mean better robustness against generating harmful content.
- Model ability to detect toxic content using the Toxigen benchmark:
This benchmark targets adversarial and implicit hate speech detection. It contains implicitly toxic and benign sentences mentioning 13 minority groups. A higher F1-based accuracy score means a model is better at detecting toxic content (see the sketch after this list).
- Model knowledge of sensitive domains including cybersecurity, biosecurity, and chemical security, using the Weapons of Mass Destruction Proxy benchmark (WMDP):
A higher accuracy score for a model denotes more knowledge of dangerous capabilities.
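As a rough illustration of the F1-based scoring mentioned for the toxic-content detection scenario, the sketch below computes an F1 score from hypothetical toxic/benign labels and model predictions. The labels and predictions are invented for the example and do not reflect any real benchmark data.

```python
def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the 'toxic' (1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical ground-truth labels (1 = toxic, 0 = benign) and model predictions.
labels      = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = [1, 0, 0, 1, 0, 1, 1, 0]

# Higher F1 indicates the model is better at flagging implicitly toxic content.
print(f"F1 = {f1_score(labels, predictions):.2f}")
```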
These scenario leaderboards allow developers, compliance teams, and AI governance stakeholders to align model selection with organizational risk tolerance and regulatory expectations.
Building Trustworthy AI Starts with the Right Tools
With safety leaderboards now available in public preview, Foundry model leaderboards offer a unified, transparent, and data-driven foundation for selecting models that align with your safety requirements. This addition empowers teams to move from ad hoc evaluation to principled model selection—anchored in industry-standard benchmarks and responsible AI practices.
To learn more, explore the methodology documentation and start building AI solutions you—and your stakeholders—can trust.