by Wayne Walter Berry via SQL Azure Team Blog on 7/15/2010 2:57:42 PM
When you horizontally partitioning data across multiple SQL Azure databases or using Data Sync Server for SQL Azure, there might come a time when you need to write to a member database without causing primary key merge conflicts. In this case you need to be able to generate a primary key that is unique across all databases. In this article we will discuss different techniques to generate primary keys and their advantages and disadvantage.
One way to generate a unique primary keys is to use the NEWID() function in Transact-SQL, which generates a GUID as a uniqueidentifier data type. The GUID is guaranteed to be unique across all databases.
Advantages:
Disadvantage:
Another option is to use a bigint data type in place of an int. In this technique, the primary key is generated from being an identity column; however each identity in each database starts at a different offset. Different offset create the non-conflicting primary keys.
The first question most people ask, is bigint data type big enough to represent all the primary keys need. The bigInt data type can be as large as 9,223,372,036,854,775,807 because it is stored in 8 bytes. This is 4,294,967,298 times bigger than the maximum size of an int data type: 2,147,483,647. This means that you could potentially have 4 billion SQL Azure databases horizontally partitioned with tables of around 2 billion rows. More information about data types and sizes can be found here.
On the first SQL Azure database you would create the table like this:
CREATE TABLE TEST(x bigint PRIMARY KEY IDENTITY (1,1))
On the second SQL Azure database you would create the table like this:
CREATE TABLE TEST(x bigint PRIMARY KEY IDENTITY (2147483648,1))
And continue incrementing the seed value for each database in the horizontal partitioning.
Disadvantages:
In this technique a single identity database is built where all the primary keys are stored, however none of the data. This identity database just has a set of matching tables that contain a single column of integers (int data type) as an auto incrementing identity. When an insert is needed on any of the tables across the whole partition, the data tier code inserts into the identity database and fetches the @@IDENTITY. This primary key from the identity database is used as the primary key to insert into the member database or the partition. Because the identity database is generating the keys there is never a conflict.
So how many integers can a 50 Gigabyte SQL Azure database hold? This is a good question, because if you run out of space on your database acting as the primary key pool, then you can’t insert anymore rows. If all your tables where single column integers in the primary key database you could have 25,000 tables with two million rows (table size of 2 Megabytes) in a 50 Gigabyte SQL Azure database. 50 Gigabytes is currently the largest SQL Azure database you could use for your primary key database. Or some combination of that, like 12,000 tables of 4 million rows, or 6,000 tables of 8 million rows.
Another technique is to use two columns to represent the primary key. The first column is an integer that specifies the partition or the member database. The second column is an int IDENTITY, that auto increments. With multiple member or partition databases the second column would have conflicts, however together the two columns would create a unique primary key.
Here is an example of a CREATE TABLE statement with a double column primary key:
CREATE TABLE TEST ([partition] int, [id] int IDENTITY, CONSTRAINT PK_TEST PRIMARY KEY([partition], [id]));
Remember you need to add a second column for all the primary keys, and a second column to all foreign key references.
Do you have a favorite technique for solving the same problem; share it with us in the comments below. Or give us your opinion about what technique is the best. Have questions, concerns, comments? Post them below and we will try to address them.
Original Post: Creating Primary Keys Across Databases
The content of the postings is owned by the respective author. AzureFeeds is not responsible for the contents of the postings. This site is automatically generated and cannot be reviewed for abusive content. If you find abusive content on AzureFeeds, please contact us. Designated trademarks and brands are the property of their respective owners. All rights reserved.