What should be done to maintain distributed database keys? (More specifically for example, a replicated database that is updated on more than one machine or a table that is not always connected.) GUIDs have their issues, although they're a possibility. Is there a better solution? This is not an emergency. I'm just interested. ;-)
I concur with Fatherjack and that is generally the solution I have seen used when you want a surrogate key. Another possible answer is to look for a natural key. Not everything has a natural key that is appropriate, but when they do it tends to remove the problem entirely.
Generally with replication you get to set a range of IDs per subscriber so that they are exclusive they get to use up that range between syncs. When they sync and have used thier range they get given the next range after the current largest in use. EG subsvr1 gets ID range 1 - 10000 subsvr2 gets ID range 10001 - 20000 they both use IDs subsvr1 gets to the end of its range 1st and then gets issued 20001 - 30000 and so on. setting the range sizes is where you come in. You have to estimate the suitable range to allow offline functionality. beware large transactions that bump into the end of the range - the whole job gets rolled back ...
Starting from SQL Server 2000, there is a way to manage identity keys by letting the replication allocate different ranges to publishers and subscribers. The problem with this approach is the clustering. Identity keys are ideal clustering candidates because they are both ever increasing and never changing. In the distributed environment, the replication of such data will inevitably lead to inconsistent page splits. The same is applied to GUIDs. I believe that the page split damage can and should be minimized by not allowing front ends to feed the database objects with guids but rather by opting to implement newsequentialid() column defaults. This is not a bullet-proof approach of course because the sequential guids are reseeded at reboot, so they do lead to page splits even locally, and certainly in distributed environment. Another approach is the manufacturing of the pseudo-identity values by borrowing some info from other identifiers. This sounds attractive, but is inherently flawed if the subscribers move from one parent host to another. I happen to have practically worked with GUIDS and the last option, never with seeding identities for subscribers. Any scenario will require one to consider fillfactors very carefully. GUID approach is the easiest to maintain, but it cost some extra space because of the size (16 byte guid is twice as wide as 8-byte bigint, which matters when multiplied by all indexes referencing 8 extra bytes on every leaf pages index record of every non-clustered index in the table). The space is cheap enough though, so GUID combined with appropriately chosen fillfactor (usually 80 to 90) seems to be the easiest way to go. Oleg
I'm going for door number three! We are using homegrown key management for our system. Ids generated on the fly using number ranges based on location. This is a potential bottleneck for inserts - namely when lots of people try to get the next Id, but our system is not constrained on that side of things, so it is not a problem. The other option we looked at was a composite primary key, where a single tinyint column is prepended to the "real" key and hard-set per node in the replication. Don't really know why we didn't go down that route - as we wouldn't have any further problems with number range crashes down the road.
For what it's worth, I've done this before using identity seed and increment for surrogate keys - so, in a 5 node system, I might have: * Node1.Table1 - IDENTITY(1, 10) * Node2.Table1 - IDENTITY(2, 10) * Node3.Table1 - IDENTITY(3, 10) * Node4.Table1 - IDENTITY(4, 10) * Node5.Table1 - IDENTITY(5, 10) This left some space for expansion, and made sure there were no issues 'down the line'. However, as Oleg points out (+1), clustering can become an issue if one node accumulates rows at a different rate to the others.
A natural key. A guid makes a good surrogate but natural keys (I prefer the term "business keys", which means the same thing) are just as important in a distributed database as in a non-distributed one - potentially more so.