Sequential GUIDS in Python

computermagic

5.00/5 (1 vote)

6 Sep 2014CPOL1 min read

12.9K

Example of generating sequential GUIDs from Python

Introduction

Example of generating sequential GUIDs from Python.

The original article explains in detail why and how we use sequential guids: http://www.codeproject.com/Articles/388157/GUIDs-as-fast-primary-keys-under-multiple-database

Our previous post shows how to do the same thing in C++ with Qt: http://www.codeproject.com/Tips/594304/Sequential-GUIDs-in-Cplusplus-with-Qt

Background

Databases insert sequential information quickly. When you insert information out of order it requires the database to rebuild indexes which can be time consuming. Get the advantages of sequential numbering for you database records while getting the benefit of unique ids that a GUID gives.

This code was written in the Web2Py environment but should work for any python project using 2.6 or higher.

The Python Class

Here is the class we use. You can put it in a module or right into your source file.

###### SequentialGUID
import os
import datetime
import sys
from binascii import unhexlify, hexlify
import uuid

class SequentialGUID:
    SEQUENTIAL_GUID_AS_STRING = 0
    SEQUENTIAL_GUID_AS_BINARY = 1
    SEQUENTIAL_GUID_AT_END = 2
    
    def __init__(self):
        pass
    
    @staticmethod
    def NewGUID(guid_type = SEQUENTIAL_GUID_AS_STRING):
        # What type of machine are we runing on?
        endian = sys.byteorder # will be 'little' or 'big'
        # Need some random info
        rand_bytes = bytearray()
        rand_bytes += os.urandom(10) #Get 10 random bytes
        
        # Get the current timestamp in miliseconds - makes this sequential
        ts = long((datetime.datetime.utcnow() - datetime.datetime(1970, 1, 1)).total_seconds() * 1000)
        tsbytes = bytearray()
        # NOTE: we don't pass endian into long_to_bytes
        tsbytes += long_to_bytes(ts) # Convert long to byte array
        while (len(tsbytes) < 8):  # Make sure to padd some 0s on the front so it is 64 bits
            tsbytes.insert(0, 0) # Python will most likely make it a byte array
        
        guid_bytes = bytearray(16) # 16 bytes is 128 bit
        
        
        # Combine the random and timestamp bytes into a GUID
        if(guid_type != SequentialGUID.SEQUENTIAL_GUID_AT_END):
            guid_bytes[0] = tsbytes[2]  # Copy timestamp into guid
            guid_bytes[1] = tsbytes[3]
            guid_bytes[2] = tsbytes[4]
            guid_bytes[3] = tsbytes[5]
            guid_bytes[4] = tsbytes[6]
            guid_bytes[5] = tsbytes[7]
            
            guid_bytes[6] = rand_bytes[0]  # Copy rand bytes into guid
            guid_bytes[7] = rand_bytes[1]
            guid_bytes[8] = rand_bytes[2]
            guid_bytes[9] = rand_bytes[3]
            guid_bytes[10] = rand_bytes[4]
            guid_bytes[11] = rand_bytes[5]
            guid_bytes[12] = rand_bytes[6]
            guid_bytes[13] = rand_bytes[7]
            guid_bytes[14] = rand_bytes[8]
            guid_bytes[15] = rand_bytes[9]
        else:
            # Same as above, but different order - timestamp at end not beginning
            guid_bytes[10] = tsbytes[2]  # Copy timestamp into guid
            guid_bytes[11] = tsbytes[3]
            guid_bytes[12] = tsbytes[4]
            guid_bytes[13] = tsbytes[5]
            guid_bytes[14] = tsbytes[6]
            guid_bytes[15] = tsbytes[7]
            
            guid_bytes[0] = rand_bytes[0]  # Copy rand bytes into guid
            guid_bytes[1] = rand_bytes[1]
            guid_bytes[2] = rand_bytes[2]
            guid_bytes[3] = rand_bytes[3]
            guid_bytes[4] = rand_bytes[4]
            guid_bytes[5] = rand_bytes[5]
            guid_bytes[6] = rand_bytes[6]
            guid_bytes[7] = rand_bytes[7]
            guid_bytes[8] = rand_bytes[8]
            guid_bytes[9] = rand_bytes[9]
            pass
        
        # Create the guid and return it
        guid = uuid.UUID(hex=hexlify(guid_bytes))
        return guid
    
    
def long_to_bytes (val, endianness='big'):
    """ Pulled from http://stackoverflow.com/questions/8730927/convert-python-long-int-to-fixed-size-byte-array
    Use :ref:`string formatting` and :func:`~binascii.unhexlify` to
    convert ``val``, a :func:`long`, to a byte :func:`str`.

    :param long val: The value to pack

    :param str endianness: The endianness of the result. ``'big'`` for
      big-endian, ``'little'`` for little-endian.

    If you want byte- and word-ordering to differ, you're on your own.

    Using :ref:`string formatting` lets us use Python's C innards.
    """

    # one (1) hex digit per four (4) bits
    width = val.bit_length()

    # unhexlify wants an even multiple of eight (8) bits, but we don't
    # want more digits than we need (hence the ternary-ish 'or')
    width += 8 - ((width % 8) or 8)

    # format width specifier: four (4) bits per hex digit
    fmt = '%%0%dx' % (width // 4)

    # prepend zero (0) to the width, to zero-pad the output
    s = unhexlify(fmt % val)

    if endianness == 'little':
        # see http://stackoverflow.com/a/931095/309233
        s = s[::-1]

    return s

### Usage
### guid = SequentialGUID.NewSequentialGUID(SequentialGUID.SEQUENTIAL_GUID_AS_STRING)
### Use String for most dbs, and At End for MSSQL if you use their GUID field type
### REQUIRES: Python 2.6+ with bytearray support

###### End SequentailGUID

Using the code

To use the module, calll the NewGUID method as a static member. Remember if you put it into a module you may need to import it first.

C++

# Get a sequential GUID
g = SequentialGUID.NewGUID(SequentialGUID.SEQUENTIAL_GUID_AS_STRING)
# Turn it into a string
string_g = str(g)

# Get a guid for MSSql
SequentialGUID.NewGUID(SequentialGUID.SEQUENTIAL_GUID_AT_END)

Points of Interest

I did the array copies by hand so the code is a touch longer. I don't do Python full time and it was easier to write it out than to look up slicing of arrays. With only a few elementes in the arrays there should be no noticible pefformance impact from doing so.

History

Original article: http://www.codeproject.com/Articles/388157/GUIDs-as-fast-primary-keys-under-multiple-database

QT rewrite: http://www.codeproject.com/Tips/594304/Sequential-GUIDs-in-Cplusplus-with-Qt

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)