New data lake in Microsoft Sentinel

Published by azurefeeds on July 30, 2025

Store security data for up to 12 years.

Perfect for long-term investigations and compliance. Check out our new data lake in Microsoft Sentinel.

Streamline your data strategy.

Send high-volume logs to the new low-cost data lake tier and control retention per table. See it here.

Detect threats and trigger blocks.

Schedule automated queries using Microsoft Sentinel jobs and notebooks. Start here.

QUICK LINKS:

00:00 — Microsoft Sentinel Data Lake

01:49 — Data Management

02:46 — Table Management

03:36 — Data Lake exploration

04:17 — Advanced Hunting

05:23 — Query retention data

06:16 — Automate threat detection

07:18 — Move from reactive to predictive

08:50 — Wrap up

Link References

Check out https://aka.ms/SentinelDataLake

Unfamiliar with Microsoft Mechanics?

Microsoft Mechanics is Microsoft’s official video series for IT. You can watch and share valuable content and demos of current and upcoming tech from the people who build it at Microsoft.

Subscribe to our YouTube: https://www.youtube.com/c/MicrosoftMechanicsSeries

Talk with other IT Pros, join us on the Microsoft Tech Community: https://techcommunity.microsoft.com/t5/microsoft-mechanics-blog/bg-p/MicrosoftMechanicsBlog

Watch or listen from anywhere, subscribe to our podcast: https://microsoftmechanics.libsyn.com/podcast

Keep getting this insider knowledge, join us on social:

Share knowledge on LinkedIn: https://www.linkedin.com/company/microsoft-mechanics/

Enjoy us on Instagram: https://www.instagram.com/msftmechanics/

Loosen up with us on TikTok: https://www.tiktok.com/@msftmechanics

Video Transcript:

-Microsoft Sentinel, our industry-leading SIEM, now has a brand-new unified data lake, and this changes everything. That’s because your ability to detect and respond to security threats in your organization is only as good as the visibility and longevity of your data. It means being able to look across your digital estate at scale as it produces terabytes of logs, assets, and alerts, and then correlate signals going as far back in time as needed to pinpoint security events. The trick is being able to do this efficiently without being constrained by storage costs or siloed tools, and that’s what we’ve solved for.

-Our new data lake takes an open-format approach to bring all your security data together centrally, leveraging our expanding set of connectors, and you can even mirror data from on-premises and non-Microsoft cloud data sources at hyperscale without migrating it. Because storage and compute are decoupled, you can store massive volumes of data affordably for up to 12 years, and query it using experiences such as KQL and Notebooks to run advanced analytics, machine learning, and forensic investigations, all from a single copy of data.

-This means you can now bring in high-volume, low-fidelity, and long-term data, like firewall logs, that previously was not practical to keep. You now have the data to discover low and slow attacks happening under the radar, as attackers take weeks to months before striking, and disrupt them with Microsoft Sentinel’s built-in threat intelligence, automation, and AI, all in one single solution. Let me start by showing you the new core capabilities. From Microsoft Defender, you can now connect to a Sentinel Workspace, bringing everything you need to investigate security threats in one place. Once connected, Microsoft Sentinel appears in your left navigation pane with a full set of new experiences, including our unified data lake capabilities.

-Let’s start with Data Management, which is now available directly in the Defender portal. This is where you’ll typically begin by connecting your logging sources using our expanding set of connectors. It works with your existing connectors and gives you new flexibility to cost-effectively store data well beyond the previous 90-day default. And, for the first time, you can mirror data from Microsoft sources, alongside external sources like the AWS S3 and Cisco network logs shown here, into the data lake. And as mentioned, using Table Management, you can manage precisely where and how you store data and manage retention. From the new Table Management page, you can continue sending data to the Analytics tier as before, and it will now automatically be mirrored to the new lake.

-If I click on Manage Table I can change retention settings, so now it’s easy to send what you want directly to the data lake tier, which is ideal for high-volume, low-fidelity data, like these firewall logs. And from here, I can also stipulate the retention period for this table. And soon, you’ll also be able to split the data between the two tiers, giving you full control over cost and performance, which gives you more flexibility and is more streamlined than using siloed approaches. Next, Data Lake exploration is where you’ll interact with your data in the lake. Using the KQL queries tab, you can run KQL queries against any data in the lake. And from here, you can view all of the tables and schema to help you author your queries.

-Then, if I move over to the Jobs tab, this lets you use automation with your KQL and Notebook jobs to run them on a schedule. And if I filter to the job that I want, and select this Password Spray Analyzer job, this will let me query the lake and even store its output in a table and promote those insights into the Analytics tier where I can create alerts and detections. Now, let me show you how these capabilities help improve your threat investigation and response, and I’ll stay in the context of a password spray attack, where the initial activity could have happened months ago. In the Microsoft Defender portal, I’ll start in Advanced Hunting, and I’ll use Security Copilot.

-To save time, I’ll paste in my prompt: “To create a KQL query to detect slow password spray attacks within the last 90 days.” And I’ve added a few more instructions on other attributes that I’m looking for so that I can assess if the attack is affecting multiple users. And Copilot takes a moment to generate the query. And I can move its query over to the editor, where I can make changes like this one to the failed attempts threshold. I’ll run it, and based on these rows, like this first one where 39 users had 807 failed attempts over 90 days using this same IP address, we can see that this is definitely a password spray pattern, as are the instances in the rows below that.
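The detection logic described above (group failed sign-ins by source IP, then look for one IP touching many users with a modest number of failures each) can be sketched in plain Python. This is only an illustration of the pattern, not the KQL that Copilot generated in the video. The event fields (`time`, `ip`, `user`, `success`), the function name, and the threshold defaults are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_password_spray(events, now, lookback_days=90, min_users=5, min_failures=20):
    """Flag source IPs with failed sign-ins spread across many distinct users.

    `events` is a list of dicts with hypothetical keys: time, ip, user, success.
    """
    cutoff = now - timedelta(days=lookback_days)
    users_by_ip = defaultdict(set)
    failures_by_ip = defaultdict(int)

    for event in events:
        # Only failed attempts inside the lookback window count.
        if event["success"] or event["time"] < cutoff:
            continue
        users_by_ip[event["ip"]].add(event["user"])
        failures_by_ip[event["ip"]] += 1

    findings = [
        {"ip": ip, "users": len(users), "failures": failures_by_ip[ip]}
        for ip, users in users_by_ip.items()
        if len(users) >= min_users and failures_by_ip[ip] >= min_failures
    ]
    # Noisiest source first.
    return sorted(findings, key=lambda f: f["failures"], reverse=True)


if __name__ == "__main__":
    now = datetime(2025, 7, 30)
    # Made-up events: one IP tries 6 users 4 times each over several weeks.
    sample = [
        {"time": now - timedelta(days=7 * attempt), "ip": "203.0.113.7",
         "user": f"user{n}", "success": False}
        for n in range(6) for attempt in range(4)
    ]
    print(find_password_spray(sample, now))
```

Requiring both a minimum number of distinct users and a minimum number of failures separates a spray from a single-account brute force, where one user accounts for all the failures. Widening `lookback_days` mirrors the step of going further back in time with lake data.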

-So far, I’ve only gone back 90 days, but I want to better understand when the attack first started. This time, I’ll change my query to go further back in time by leveraging the data lake. I’ll use Data Lake exploration and start in KQL Queries. This is where I can query our longer-term retention data, which again can be up to 12 years old. I’ll paste in the same query. And in this case, we worked on a similar incident 12 months ago, so let’s see if they are connected. So I’ll adjust the dates to the timing of that incident. I have a custom range from August 15, 2024 to September 10, 2024, and I’ll save that. And now I’ll go ahead and run it. And that takes a moment.

-Now, I see a clear pattern. Again, multiple accounts targeted from the same network infrastructure, each with a low volume of failed attempts. So I can see the attack was active even then and indeed was the same attacker. It looks like this attacker is persistent, so we should set up logic to see which new IPs and domains they’ll be using moving forward, then update our protections to block new attempts as they move to different infrastructure. To do that, in Jobs, first we need to capture Threat Intelligence matching data against our Cisco network and sign-in logs in the lake, so I’ll create a Job. I just need to give it a name and description, then select a workspace, and create a new table in this case, and I’ll call it “CiscoDailyLog.”
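The threat intelligence matching step above can be pictured as a lookup of each log row's remote IP or domain against a set of known indicators. The sketch below is a simplified stand-in for whatever query the job in the video runs. The row fields (`remote_ip`, `domain`), the indicator shape (`value`, `type`), and the function name are hypothetical.

```python
def match_indicators(log_rows, indicators):
    """Return log rows whose remote IP or domain appears in the indicator list.

    Field names are hypothetical; matching is exact and case-insensitive.
    """
    lookup = {ind["value"].lower(): ind for ind in indicators}
    matches = []
    for row in log_rows:
        for field in ("remote_ip", "domain"):
            value = (row.get(field) or "").lower()
            if value and value in lookup:
                matches.append({
                    **row,
                    "matched_field": field,
                    "indicator_type": lookup[value]["type"],
                })
                break  # one match per row is enough
    return matches


if __name__ == "__main__":
    # Made-up indicators and log rows for illustration only.
    indicators = [
        {"value": "203.0.113.7", "type": "ip"},
        {"value": "evil.example", "type": "domain"},
    ]
    rows = [
        {"user": "a", "remote_ip": "203.0.113.7", "domain": None},
        {"user": "b", "remote_ip": "10.0.0.5", "domain": "contoso.example"},
    ]
    for hit in match_indicators(rows, indicators):
        print(hit)
```

Writing the matched rows to their own table, as the job in the transcript does, keeps the output small enough to promote for alerting.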

-Next, I’ll paste in my query. Here, I can choose to run it once, or automatically run it as a scheduled job. This will promote the output to the analytics tier. And from there, we’ll be able to use those insights to automate blocks in Microsoft Entra and our firewall. To get that going, I’ve selected my start time, then I just need to confirm, and submit. So I traced post-breach data to find the root cause, enabling dynamic and proactive defenses.

-Next, by applying data science and machine learning to the data lake, SOC analysts can work together with data scientists to move from reactive to predictive insights. As a data scientist, in VS Code, I’ve installed the new Microsoft Sentinel extension from the marketplace. This uses the same single copy of data that’s in the lake. Here, at the Notebook, I can author queries in-line, and in this case, I even used GitHub Copilot for that. I don’t need to worry about provisioning compute, since it’s all managed. I’ve also already run this Python query using the Microsoft Sentinel Provider library. I’ve used popular machine learning libraries to train a user sign-in anomaly insights model.

-Notebooks are also great for in-line visualizations, and I’ve created a scatter plot chart to see deviations from baseline user sign-in behavior, like sign-ins from unusual IP ranges, login attempts outside normal hours, or unexpected device types. In this case, the red dots represent significant deviation from expected user login behavior. And if I scroll down a little more, I can see that they are captured in this output. Now I have the insights that I need. And of course, from there, I’d just put those anomalous sign-ins into the analytics tier and use that information to generate predictive blocks. And as I showed before, Notebooks can also be rerun as Jobs to automate the process.
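The three deviation signals named above (unusual IP range, sign-in outside normal hours, unexpected device type) can be illustrated with a simple rule-based baseline. This is deliberately not the machine learning model trained in the video. It is a minimal sketch that counts how many signals differ from a user's history, with hypothetical field names and a crude two-octet notion of "IP range".

```python
def ip_prefix(ip):
    """Use the first two octets of an IPv4 address as a crude 'IP range'."""
    return ".".join(ip.split(".")[:2])


def build_baseline(history):
    """Collect per-user IP prefixes, sign-in hours, and device types."""
    baseline = {}
    for s in history:
        b = baseline.setdefault(
            s["user"], {"prefixes": set(), "hours": [], "devices": set()}
        )
        b["prefixes"].add(ip_prefix(s["ip"]))
        b["hours"].append(s["hour"])
        b["devices"].add(s["device"])
    return baseline


def deviation_score(signin, baseline):
    """Count how many of the three signals deviate from the user's baseline (0-3)."""
    b = baseline.get(signin["user"])
    if b is None:
        return 3  # no history: treat every signal as deviating
    checks = [
        ip_prefix(signin["ip"]) not in b["prefixes"],
        not (min(b["hours"]) <= signin["hour"] <= max(b["hours"])),
        signin["device"] not in b["devices"],
    ]
    return sum(checks)


if __name__ == "__main__":
    # Made-up history and sign-ins for illustration only.
    history = [
        {"user": "alice", "ip": "10.1.2.3", "hour": 9, "device": "Windows"},
        {"user": "alice", "ip": "10.1.8.4", "hour": 17, "device": "Windows"},
    ]
    baseline = build_baseline(history)
    suspicious = {"user": "alice", "ip": "185.220.101.4", "hour": 3, "device": "Linux"}
    print(deviation_score(suspicious, baseline))
```

A trained model would weigh these signals and learn thresholds from data. The fixed count here only shows what "deviation from baseline" means for a single sign-in.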

-So that’s how Microsoft Sentinel, our industry-leading SIEM, and its brand-new unified data lake expands your visibility so that you can act on new and existing threats, helping you to detect, mitigate, and disrupt them faster. To learn more, check out aka.ms/sentineldatalake. Keep checking back to Microsoft Mechanics for the latest tech updates, and thanks for watching.
