Introduction
With the rapid advancement of AI in healthcare, ensuring the quality and consistency of medical imaging datasets is crucial. This blog explores outlier detection in medical imaging, demonstrating how advanced embedding techniques and machine learning can uncover anomalies. By leveraging study-level embeddings, we present a proof of concept for anomaly detection among clinical studies.
Outlier detection plays a vital role in improving AI performance, enhancing diagnostic accuracy, and maintaining high data quality. As medical imaging datasets grow in complexity, these tools become indispensable. Ensuring high-quality data is not just a technical requirement; it directly impacts patient outcomes, research integrity, and the scalability of AI-driven systems. The process of detecting anomalies also contributes to the refinement of imaging protocols and promotes uniformity across multi-center studies, addressing challenges like equipment variability and differing imaging standards.
In this blog we describe an approach to building an outlier detection system using a powerful foundation model available in the Azure AI Foundry Model Catalog for Healthcare. The full working code sample is available in our GitHub repository.
Outlier Detection Walkthrough
The outlier detection process for medical imaging begins with leveraging the powerful MedImageInsight model, which is hosted on an AzureML managed endpoint. The entire pipeline, shown in Figure 1, starts by generating image-level embeddings for individual medical images, merges those embeddings into a single study-level vector, and then applies outlier detection methods to see which studies deviate from a reference set. Specifically, we feed images into the MedImageInsight model to obtain image-level embeddings, use statistical methods to obtain study-level embeddings, and then use a K-Nearest Neighbors (KNN) model to find outliers.
Collecting the Image-Level Embeddings
Our initial task is generating image-level embeddings. This is accomplished using the MedImageInsight model, which is deployed on an AzureML managed endpoint (deployment documentation). We use the “ParallelSubmitter” class created from the MedImageInsight client, which is particularly useful because it manages batching, retry logic, parallelization, and preprocessing when submitting images to the endpoint. Moreover, it accepts a variety of file formats, which can be passed as file names, byte arrays, or NumPy arrays.
Below is a code snippet demonstrating how to set up the submitter, submit images, and then capture results as they stream back in a generator:
import json
import os

# Client helper from the accompanying sample repository.
client = MedImageInsightClient(endpoint)
submitter = client.create_submitter(return_as="generator_unordered")

rel_from = dicom_dir
with open(output_file, "w") as f:
    for index, result in submitter.submit(image_list=files, total=len(files)):
        path = os.path.relpath(files[index], rel_from)
        row = {
            "path": path,
            "ref": path.startswith("ref"),
            "test": path.startswith("test"),
            "outlier": path.startswith("test/outlier"),
            "StudyInstanceUID": path.split(os.path.sep)[-3],
            "SeriesInstanceUID": path.split(os.path.sep)[-2],
            **result,  # embedding fields returned by the endpoint
        }
        print(json.dumps(row), file=f)
Each time an image is processed and returned by the client, the generator yields a new result that we can handle immediately. This generator approach enables extra processing steps on the fly (in this case, writing results to a JSONL file). The best part is that the MedImageInsight client abstracts away much of the complexity of using the model endpoint, handling all the parallelization, retrying, and image preprocessing for you.
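Before converting to study-level features, we can load the per-image results back from the JSONL file and group them by study. Below is a minimal sketch of that step; it assumes each result row stores its embedding under an image_features key, which may be named differently in your deployment:

import json
from collections import defaultdict

import numpy as np

study_embeddings = defaultdict(list)
with open(output_file) as f:
    for line in f:
        row = json.loads(line)
        # Group image-level embeddings under their parent study;
        # "image_features" is an assumed key name for the embedding.
        study_embeddings[row["StudyInstanceUID"]].append(row["image_features"])

# One (num_images, embedding_dim) array per study.
study_embeddings = {uid: np.asarray(v) for uid, v in study_embeddings.items()}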
Converting the Image-Level Embeddings to Study-Level
Next, we need to transform these individual image embeddings into a single, comprehensive vector per study. The main issue is that each study can contain a different number of images, so the studies end up with representations of varying lengths if we use the image-level embeddings directly. Traditional distance metrics typically assume all vectors are the same size, so we solve this with a study-level embedding: we compute basic statistics from the image-level embeddings across each study and concatenate those values into one fixed-length vector.
In the code snippet below, we define a helper function called compute_study_features that calculates the mean, standard deviation, minimum, maximum, percentiles, and median across all images in a study:
import numpy as np

def compute_study_features(x):
    # x: (num_images, embedding_dim) array of image-level embeddings.
    stats = {
        "mean": np.nanmean(x, axis=0),
        "std": np.nanstd(x, axis=0, ddof=1),
        "min": np.nanmin(x, axis=0),
        "25%": np.nanpercentile(x, 25, axis=0),
        "50%": np.nanmedian(x, axis=0),
        "75%": np.nanpercentile(x, 75, axis=0),
        "max": np.nanmax(x, axis=0),
    }
    # Concatenate the statistics into one fixed-length study-level vector.
    return np.hstack(list(stats.values()))
The result is a single fixed-length vector per study, which resolves the varying-length issue. Figure 2 is a visualization of the study-level embeddings. In this figure each row corresponds to a different study and each pixel corresponds to one scalar value in the study-level vector. The vertical bands indicate the different statistical measures concatenated into the overall vector.
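As a usage sketch, we can apply compute_study_features to each study’s stack of image-level embeddings (grouped as in the earlier snippet) to build the feature matrix used in the rest of the walkthrough:

study_uids = sorted(study_embeddings)
# Stack one fixed-length feature vector per study into a single matrix.
study_features = np.vstack(
    [compute_study_features(study_embeddings[uid]) for uid in study_uids]
)
print(study_features.shape)  # (num_studies, 7 * embedding_dim)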
Exploring the Data
Let’s explore the data a bit to get a better sense of what the images look like before we jump right into using the embeddings for outlier detection. First, we examine the inlier set. This set includes only CT-Abdomen studies, and Figure 3 shows a handful of examples. Each row corresponds to a single study, with one representative image per series. We notice some variations: some have a SCOUT series, others have multiple axial or coronal series. The volumes also cover slightly different anatomical sections, even though they are all CT-Abdomen studies.
Figure 3. Example studies from the CT-Abdomen reference set, one representative image per series.
Next, we have a test set that’s much more diverse. Figure 4 displays a sample of these images. The test set spans everything from MRI to radiographs to nuclear medicine and includes other CT body parts like CT-Chest. Some studies are similar to the reference set, but others are drastically different in modality or view. The end goal is to identify which of these are “outliers” compared to the reference CT-Abdomen data.
Figure 4. Sample studies from the test set, spanning multiple modalities and body parts.
Finding the Outliers
With the reference set embeddings established and the test set embeddings ready, we fit a simple K-Nearest Neighbors (KNN) model (from scikit-learn) on the inlier reference data. We use this model to calculate the distance between a new study and its nearest neighbors in the reference set (here, the two closest reference studies). If that distance is above a certain threshold (the “operating point”), we classify the input study as an outlier.
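Below is a minimal sketch of this step using scikit-learn’s NearestNeighbors; ref_features and test_features are illustrative names for the study-level feature matrices of the reference and test sets:

from sklearn.neighbors import NearestNeighbors

# Fit the KNN model on the inlier reference studies only.
knn = NearestNeighbors(n_neighbors=2).fit(ref_features)

# Outlier score: mean distance from each test study to its two
# nearest reference studies.
distances, _ = knn.kneighbors(test_features)
scores = distances.mean(axis=1)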
Our test set contains both inliers and outliers, so we can compute metrics like the ROC curve to see how well the method separates them. By finding the optimal threshold on the ROC curve, we choose the maximum allowed distance for a study to be labeled an inlier (equivalently, the minimum distance at which it is labeled an outlier). Using the distances calculated for each study in the test set, we can construct a confusion matrix to evaluate the accuracy of our detection model.
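The sketch below shows one common way to pick the operating point, maximizing Youden’s J statistic (TPR minus FPR) along the ROC curve, and to build the confusion matrix; the accompanying notebook may use a different criterion. It assumes y_true marks the known outliers in the test set:

import numpy as np
from sklearn.metrics import confusion_matrix, roc_curve

# y_true: 1 for known outliers in the test set, 0 for inliers (assumed labels).
fpr, tpr, thresholds = roc_curve(y_true, scores)

# One common choice of operating point: maximize Youden's J (TPR - FPR).
best = np.argmax(tpr - fpr)
operating_point = thresholds[best]

# Confusion matrix at the chosen operating point.
y_pred = (scores > operating_point).astype(int)
print(confusion_matrix(y_true, y_pred))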
Below in Figure 5, you can see an example of the ROC curve with the chosen operating point, alongside the resulting confusion matrix that breaks down true positives, false positives, and so on.
Figure 5. ROC curve with the chosen operating point, and the resulting confusion matrix.
Visualizing Results
A great way to visualize the relationships among our embeddings is to use a 2D projection technique like Uniform Manifold Approximation and Projection (UMAP), which effectively highlights clustering patterns and the distribution of data points. Figure 6 shows the UMAP projection of the study-level vectors into two dimensions. You can see how the reference studies (blue diamonds) form a tight cluster, with most inlier test studies falling near them and outliers scattered further away.
Figure 6. UMAP of study features showing reference points, accepted inliers, rejected outliers, and missed outliers.
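A minimal sketch of this projection, assuming the umap-learn package is installed and reusing the feature matrices and predictions from the earlier snippets (the plotting is simplified relative to Figure 6):

import matplotlib.pyplot as plt
import numpy as np
import umap

# Project the high-dimensional study-level vectors down to 2D.
reducer = umap.UMAP(n_components=2, random_state=42)
projected = reducer.fit_transform(np.vstack([ref_features, test_features]))

n_ref = len(ref_features)
plt.scatter(projected[:n_ref, 0], projected[:n_ref, 1], marker="D", label="reference")
# Color test studies by their predicted label from the KNN step.
plt.scatter(projected[n_ref:, 0], projected[n_ref:, 1], c=y_pred, label="test")
plt.legend()
plt.show()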
Interestingly, there’s one “missed” outlier that appears close to the reference cluster. Let’s take a closer look.
Figure 7 shows images from this study, while Figure 8 highlights the nearest neighbors of the missed outlier, illustrating the classification challenge. The outlier resembles two reference studies with less abdominal coverage, which likely caused the misclassification. Curating a reference set with more consistent anatomical coverage, so that it clusters more tightly, could resolve this issue.
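To reproduce this kind of inspection, the fitted KNN model can return the closest reference studies for any test study. In the sketch below, missed_idx is an illustrative index of the missed outlier within test_features:

# Retrieve the reference studies closest to the missed outlier for inspection.
distances, neighbor_idx = knn.kneighbors(test_features[missed_idx : missed_idx + 1])
for dist, idx in zip(distances[0], neighbor_idx[0]):
    print(f"closest reference study index {idx} at distance {dist:.3f}")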
Hopefully, this walkthrough gave a clear sense of how embeddings and clustering can help identify patterns and anomalies in medical imaging, providing a solid starting point for outlier detection. While this approach lays the groundwork, it’s important to validate these methods with your own data and workflows. With that in mind, let’s consider how these techniques could be applied to real-world scenarios and help tackle unique challenges in research.
Practical Considerations
After exploring the detailed steps and results from the walkthrough, how can these findings be applied to real-world scenarios, and what challenges might you encounter along the way? Outlier detection has practical implications across research, AI workflows, and clinical settings, each with unique considerations and opportunities for improvement.
Research Settings
Automating quality control for imaging datasets is a critical step in modern medical research. Using outlier detection, researchers can ensure only high-quality images are included in analyses, significantly reducing variability and enhancing reproducibility. By identifying and addressing inconsistencies across datasets, such as differences in imaging protocols or equipment, this approach promotes standardization in multi-center studies. Moreover, automating these processes reduces the manual workload, allowing researchers to focus on deeper insights rather than mundane tasks.
However, challenges remain. Ensuring robust embeddings that work across diverse imaging modalities requires careful selection of reference sets. Additionally, scaling workflows to handle large, multi-center datasets without compromising speed or accuracy is essential for long-term success. If you are using these models on Azure Machine Learning or AI Foundry, you can easily scale the endpoints to handle high throughput while maintaining minimal downtime.
AI Workflows
Pre-filtering low-quality images is one of the most effective ways to enhance AI model performance. By eliminating noisy or irrelevant data at the outset, the workflow ensures only meaningful inputs are processed, leading to more accurate predictions. Automated quality checks minimize the need for reprocessing, saving time and computational resources. Categorizing data by quality or relevance further optimizes resource allocation, preventing bottlenecks and improving overall efficiency. With this strategy, AI pipelines can scale seamlessly to accommodate larger datasets without compromising performance.
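As an illustration, a hypothetical quality gate could reuse the fitted KNN model and operating point from the walkthrough to admit only in-distribution studies into a downstream pipeline:

def passes_quality_gate(features_row, knn, operating_point):
    """Return True when a study is close enough to the reference distribution."""
    distances, _ = knn.kneighbors(features_row.reshape(1, -1))
    return distances.mean() <= operating_point

# Admit only in-distribution studies into the downstream AI pipeline.
accepted = [i for i, row in enumerate(test_features)
            if passes_quality_gate(row, knn, operating_point)]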
Striking the right balance between sensitivity and specificity remains an ongoing challenge. Detecting all outliers without flagging false positives is critical, particularly in clinical workflows where incorrect classifications could delay patient care or lead to unnecessary investigations.
Clinical Validation
In clinical environments, ensuring protocol compliance is important. Outlier detection can flag studies that deviate from standardized protocols, helping technicians make real-time corrections. This reduces variability in imaging datasets, ensuring consistency and accuracy in diagnostic processes. Moreover, prioritizing high-quality studies streamlines clinical workflows, enhancing patient throughput and reducing delays. By automating these validations, healthcare providers can focus more on patient care while maintaining rigorous quality standards.
Future advancements in adaptive feedback loops could allow real-time learning from new data, further improving workflow efficiency and accuracy. Integrating multi-modal data, such as combining imaging data with clinical and genomic information, has the potential to offer a more holistic view of anomalies, paving the way for groundbreaking insights in precision medicine.
Conclusion
Outlier detection in medical imaging offers transformative potential for healthcare AI. By combining advanced embedding techniques with robust machine learning methods, this workflow ensures high data quality, enhances diagnostic accuracy, and optimizes clinical operations. This approach is not limited to research or operational settings; it also holds promise for real-time clinical applications, such as monitoring imaging protocols and prioritizing patient studies. The combination of automation and interpretability makes this workflow an asset in modern medical imaging.
We encourage readers to explore the accompanying GitHub notebook to implement these concepts and adapt them for their projects. By leveraging the outlined methods and tools, you can contribute to advancing the state of medical imaging AI, ensuring better outcomes for patients and more efficient workflows for healthcare providers.