June 19, 2025

Introduction
The NVIDIA NeMo Framework is a scalable, cloud-native generative AI framework designed to support the development of Large Language Models (LLMs) and Multimodal Models (MMs). The NeMo Framework provides comprehensive tools for efficient training, including Supervised Fine-Tuning (SFT) and Parameter Efficient Fine-Tuning (PEFT).
One notable tool within the framework is NeMo-Run, which offers an interface to streamline the configuration, execution, and management of experiments across various computing environments. This includes launching jobs locally on a workstation or on large clusters managed by Slurm or Kubernetes in a cloud environment.
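As a minimal sketch of that interface (assuming the `nemo_run` package is installed; the script name is a placeholder, not from the repository example), the same task can be routed to different environments simply by swapping executors:

```python
# Sketch: a NeMo-Run task targets different environments by swapping
# the executor. Requires the nemo_run package; "train.py" is a placeholder.
import nemo_run as run

# Wrap an arbitrary training script as a NeMo-Run task.
task = run.Script("train.py")  # placeholder script

# Run locally on a workstation...
run.run(task, executor=run.LocalExecutor())

# ...or on a cluster, by passing a SlurmExecutor instead (shown later).
```

The executor abstraction is what makes the same notebook code portable from a laptop test run to a multi-node CCWS job.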
In this blog post, we will demonstrate how you can use Azure CycleCloud Workspace for Slurm (CCWS) for distributed fine-tuning and inference with NeMo-Run.
Why Azure CycleCloud Workspace for Slurm?
- CCWS clusters come with enroot and pyxis pre-configured, facilitating the execution of workloads in containerized environments.
- CCWS integrates with Open OnDemand and includes an application for running VS Code against the HPC cluster in the browser. This application can be used to run Jupyter Notebooks.
- Using Jupyter Notebooks with NeMo-Run allows for real-time experimentation, visual insight into results, and structured logging—enhancing collaboration and ensuring reproducible workflows.
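Because enroot and pyxis come pre-configured, a containerized job can be launched directly with pyxis's `srun` flags. The snippet below is an illustrative submission fragment, not part of the repository example; the container image tag and mount paths are placeholders.

```shell
# Submit a containerized job via Slurm + pyxis from a CCWS login node.
# The image reference and mount paths are illustrative placeholders.
srun --nodes=1 \
     --gpus-per-node=8 \
     --container-image=nvcr.io#nvidia/nemo:24.07 \
     --container-mounts=/shared/data:/data \
     nvidia-smi
```

Note the pyxis image syntax: the registry host is separated from the image path with `#` rather than `/`. NeMo-Run generates equivalent submissions for you, but knowing the underlying mechanism helps when debugging a job.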
Get Started
The full example and setup instructions are available in the recently announced AI Infrastructure on Azure repository. The NeMo-Run example includes:
- A script to create a Python virtual environment with the required packages to use NeMo-Run.
- Detailed instructions and guidance for a deployment of CCWS with Open OnDemand integration.
- A custom module for a NeMo-Run Slurm Executor.
- A Jupyter Notebook with simple examples that highlight how NeMo-Run recipes can be used to run, track, evaluate, and reproduce experiments.
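To give a flavor of the notebook, the sketch below shows what configuring a recipe-based fine-tuning experiment can look like. It assumes the `nemo_run` and NeMo packages are installed and a Slurm cluster is reachable; the account, partition, and container image values are placeholders, not the repository's actual configuration.

```python
# Sketch: configuring and launching a NeMo-Run fine-tuning recipe on Slurm.
# Requires the nemo_run and nemo packages; all cluster-specific values
# (account, partition, container image) are illustrative placeholders.
import nemo_run as run
from nemo.collections import llm

# A predefined fine-tuning recipe for Llama 3 8B.
recipe = llm.llama3_8b.finetune_recipe(
    name="llama3-peft-demo",
    num_nodes=1,
    num_gpus_per_node=8,
)

# Executor describing how jobs are submitted to the Slurm cluster.
executor = run.SlurmExecutor(
    account="my-account",          # placeholder
    partition="gpu",               # placeholder
    nodes=1,
    ntasks_per_node=8,
    container_image="nvcr.io#nvidia/nemo:24.07",  # placeholder tag
    tunnel=run.LocalTunnel(),      # submitting from inside the cluster
)

# Track and launch the run as an experiment for reproducibility.
with run.Experiment("llama3-peft-demo") as exp:
    exp.add(recipe, executor=executor)
    exp.run(detach=True)
```

The experiment context is what provides the tracking, logging, and reproducibility described above: each run is recorded under its experiment name and can be inspected or reproduced later.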
This example, highlighted in the attached video, demonstrates the use of NeMo-Run recipes for fine-tuning and inference within a Slurm cluster environment. Since NeMo-Run also provides recipes for pre-training, this guide can serve as a reference for those tasks as well. Note that the included inference example is intended solely for quick model evaluation; for production-scale or batch inference, it is recommended to adapt the model into a dedicated inference pipeline or service.
Conclusion
By combining NVIDIA’s NeMo-Run framework with Azure CycleCloud Workspace for Slurm, you gain a powerful, flexible, and repeatable setup for training and fine-tuning large language models at scale. This example offers a practical and extensible foundation that can be used in combination with the recommendations from the AI Infrastructure on Azure repository to build large-scale and reproducible AI workflows today.