Primary Keys and Distributed Environments

September 22, 2010 14:24 by wjchristenson2

In architecting a new application or enhancing an existing one, important considerations need to be made when choosing how to implement primary keys/IDs.  From my own personal research and experience; unique, clustered, auto-incrementing, integer identity columns as the PK perform the best when it comes to DB performance.  They result in more compact clustered indexes, less IO when the table is queried, and they yield more efficient joins.  However, they have inherent problems when used in a distributed environment.

 

Auto-Increment (Identity Columns)
Auto-Increment columns on a SQL Server have seed and and increment properties.  Because these properties are fixed and can be selected from a finite number of possible values, the probability that you’ll get a PK collision is very high.  Therefore an auto-incrementing integer identity column should only be used in non-distributed/download-only situations.  No other server would be creating PK values and thus, you would not have PK conflicts.

There is one way where you could use this type of PK.  Each node which generates PKs could be assigned an identity range to avoid conflicts.  I can see some negatives to this solution: 
1) Have to manage nodes/ranges
2) Page splitting
3) Increased IO on new record inserts as the next identity value isn’t necessarily the the max
4) Loss of PK value possibilities as ranges may not be fully utilized

 

GUIDs - NEWID()
Using a GUID or UNIQUEIDENTIFIER in SQL Server is attractive because the generation algorithm provides a sufficiently high degree of probability that the same key would never be generated twice on any 2 nodes at any given time.

Although GUIDs meet the uniqueness required of a distributed environment, they have some glaring drawbacks that should be considered:
1) Huge in size (4x larger than an integer / 16 bytes). 
2) Large size negatively affects indexes.
3) Because they are random, they would be inserted into random locations within the clustered index resulting in fragmented clustered indexes which yields poor IO performance when the table is queried.
4) Generation of a new GUID in SQL Server does cost more resources than auto-incrementing an integer.  When bulk inserting, this has some glaring performance problems.
5) NEWID() uses the the server’s network card’s ID number plus a unique number from the CPU clock to generate the GUID.  The server would need to have a NIC to ensure GUID uniqueness.

 

GUIDs – NEWSEQUENTIALID()
The answer to the randomness problems with generating a new GUID is to generate sequential GUIDs.  SQL Server has the NEWSEQUENTIALID() function which will generate a GUID that will be greater in value than the previously generated one.  This solves the problems of fragmented clustered indexes.  It also solves the performance problems with generating a new GUID.  Because the new “sequential” GUID takes the previous GUID and increments it per se, the overhead is minimal and performs almost as well as an auto-incremented integer.

So what are the drawbacks to a sequential GUID?
1) Still Huge in size (4x larger than an integer / 16 bytes).
2) Large size negatively affects indexes.
3) NEWSEQUENTIALID() uses the the server’s network card’s MAC address to assist in the GUID generation process.  If privacy is a concern, then this probably isn’t a viable option for you.  Again, the server would need to have a NIC to ensure GUID uniqueness.

 

Natural Keys
In this strategy, you use business keys to uniquely identify records (e.g. social security number for a person).  The main drawbacks with this approach is:
1) Many columns may be needed to create uniqueness of the record.
2) Inconsistent PK columns/data types.
3) Not all records may have a glaring natural key to setup a PK for.

 

Node Identifiers
For this approach, the PK will combine a value that is unique to the node and a value that is unique to the topology.  For instance, you could have a PK that combines an auto-increment identity column and a unique id for the node/server column.  You could also create a custom system to generate values for each inserted row based on the rowID/nodeID.

 

Summary
In summary, there are many ways to architect a distributed database environment in regard to primary keys.  Personally sequential GUIDs and node identifiers are my top 2 picks.  If I had to choose between the two, I think I’d opt for node identifiers.  GUID size, privacy concerns, and ease of use appear to be a bit too much in my opinion.

Bookmark and Share

Silverlight Performance Tips

November 19, 2008 04:30 by wjchristenson2

1)  Do set IsWindowless=False.  There is a high performance price in rendering windowless controls.  However I've found that if you do this, you might run into some problems.  First, if you want your background of your SL application to be transparent, then you must set IsWindowless to True.  Also, if you have HTML overlays (modals) over your SL application, you'll also need to set IsWindowless = True.

2)  Do NOT set the Silverlight's HTML control background property to transparent or any variation thereof (make it opaque).   If the background property is set to such, each render call will go through a blending sequence which adds to a higher CPU cost.

3)  Do change the Silverlight's application default MaxFrameRate.  The default value is 60.  Most SL applications will look/run fine anywhere between 15 to 30.  You can change the MaxFrameRate programmatically or simply markup the Silverlight HTML control.

4)  Do NOT do text size animations.  When you animate the size of text in SL, it uses hinting to smooth each text glyph.  When animating text size, SL may drop frames due to this.  If you can, use a vector graphic to represent large text animations.

5)  Do use Visibility instead of Opacity whenever possible.  Even if an object's opacity is set to 0, SL will still account for it and its still technically rendered.  Setting the object's visibility to Collapsed will cause SL to ignore rendering the object.

6)  When using the MediaElement object, do not specify its Width and Height.  Let SL render the object at its natural size.

7)  Do not set the Width and Height on path objects.  Rely on the points defined for the path.

8)  When displaying a double's value, do use Double.ToString(CultureInfo.InvariantCulture) rather than Double.ToString().  This will alleviate the need for SL to acquire the culture setting before displaying the double and CurltureInfo.InvariantCulture is optimized for perormance.

9)  If your application is very large, consider loading pieces of it "on demand".

Bookmark and Share