[In preview] Public Preview: Encrypt Premium SSD v2 and Ultra Disks with Cross Tenant Customer Managed Keys
June 13, 2025MS-721 ILT Course Update – July 2025
June 13, 2025
1. Introduction
SharePoint uses a known object hierarchy to store your files, including, in order:
- Site Collection (site, SPSite)
- Web (site, subsite, SPWeb)
- List (SPList, Document Library, SPDocumentLibrary)
- Folder (SPFolder)
- List Item (documents, files, SPListItem, SPFile)
If you are not familiar with all those, you can read more about them at What is in the Permissions dataset?
2.What we have in MGDC for SharePoint
The SharePoint Sites dataset in Microsoft Graph Data Connect offers a list of all sites (site collections) in your tenant. The SharePoint Files will give you details about each file (document). However, there are no MGDC datasets that will give you one object per SharePoint Web (subsite) or SharePoint List (library).
There is a workaround, though. If you pay attention to the schema for the SharePoint Sites and SharePoint Files datasets in MGDC, you will find that Sites includes a few interesting details about the Root Web (the main web in the site collection) and the total number of webs (subsites) in each site. Also, the Files dataset includes details about the web (subsite) and the list (library) where the file lives. You could group on these columns to expose more information about webs and lists.
3. One Web
These days, if you create a modern SharePoint site, you will get one web (called the Root Web) and no obvious mechanism to create subsites or additional webs. For any sites created in the last few years, you likely have no additional webs outside the Root Web.
Your SharePoint might have existed for many years and your company could have sites using old templates with multiple subsites. If you’re trying to find out if that’s the case for your tenant, the information is available in MGDC, if you are willing to dig a little deeper.
4. Finding Old Subsites
The main task here is to create a custom query to join the Sites and Files datasets. You would then group that in a particular way to enumerate each subsites (Webs). You might also tag the Webs that are a Root Web, so that you can identify subsites that are not at the top of the Site hierarchy.
If you pulled the Sites and Files datasets into a SQL Server or a Lakehouse in Microsoft Fabric, this is what that query could look like:
SELECT
Webs.SiteId,
Sites.RootWeb.Title AS SiteTitle,
Sites.Url AS SiteUrl,
Sites.WebCount,
Webs.WebId,
CASE
WHEN Root.RootWeb.Id IS NOT NULL
THEN 1
ELSE 0
END AS IsRootWeb,
Webs.FileCount,
Webs.TotalSize,
Webs.TotalSizeWithVersions
FROM (
SELECT
SiteId,
WebId,
COUNT(*) AS FileCount,
SUM(SizeInBytes) AS TotalSize,
SUM(SizeInBytesWithVersions) AS TotalSizeWithVersions
FROM Files
GROUP BY SiteId, WebId
) AS Webs
LEFT JOIN Sites AS Sites
ON Webs.SiteId = Sites.Id
LEFT JOIN Sites AS Root
ON Webs.SiteId = Root.Id
AND Webs.WebId = Root.RootWeb.Id
4. Empty Webs, Web properties
This technique will get you a nice list of Webs, but there are a few problems.
First, if a specific web has no files, we won’t see that web in the Files dataset and therefore the query will fail to list that particular web. You would still see that Web counted in the WebCount property of the Sites dataset, but there would be no listing of that web in the query proposed above, since it has no files.
Another problem is that there is no way to find in MGDC the details properties of the Web, like the Title. You get some of those for the Root Web in the Sites dataset, but not for the other Webs. For that, you would have to query the Web object using another source like the Graph API or PowerShell.
5. Conclusion
I hope this article helped you find details about enumerating SharePoint Webs using the Microsoft Graph Data Connect. For recent deployments of SharePoint Online, you should have only one web per site and the Sites dataset (with Root Web details) should give you plenty of details. However, this might come in handy if you want to manage some of the old subsites of a slightly older SharePoint deployment.
Finally, if you’re looking for a challenge, I would suggest adapting the SQL code above to enumerate all the document libraries using the Files dataset. It can be done following a similar pattern…