ID Obfuscation Part II

Last week, I wrote a simple function for obfuscating a string that can be used to shorten URLs. I got a few emails from people who would actually like to obfuscate an ID key (e.g. a numeric primary key) of a large size (e.g. a PostgreSQL ‘bigserial’ type, which can go up to 2^63 − 1). There are many examples out there, but they seem to convert the input to integers first, which can lead to loss of precision, especially in PHP.
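To make the precision problem concrete, here's a minimal sketch comparing the integer route with string arithmetic. It assumes the bcmath extension is loaded, and the variable names are just for illustration.

```php
<?php
// Largest value a PostgreSQL bigserial can hold, kept as a string
$id = '9223372036854775807';

// Integer route: on 64-bit PHP, adding 1 to the cast value overflows
// into a float and the low digits are lost
$viaInt = (string) ( (int) $id + 1 );

// String route: bcmath works on the digits directly, so nothing is rounded
$viaBc = bcadd( $id, '1' );

echo $viaInt . ' vs ' . $viaBc . PHP_EOL;
```

The two results should disagree, which is why the function below keeps everything in bcmath strings.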

I use Postgres too and I’ve worked around the big number problem by prepending a random digit or two to the front and then encoding the whole thing. So when I need the original, I just decode it and remove the front digit(s). This does two things: it obfuscates the ID (no one needs to know 10001 and 10002 are neighbors) and makes sure each one is unique as long as the key given to it is unique. Of course, if it’s a primary key from a database, you won’t have to worry too much about uniqueness; it already is. And since I’m always prepending the same number of digits as I’ll remove when decoding, it doesn’t matter how large the number gets.

So here’s a function that will create a shortened ID from a given numeric key in PHP (it needs the bcmath extension):

function ConvertKey( $k, $create = false ) {
	
	$range = str_split( '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ' );
	$il = strlen( $k );
	$cl = 62; // count( $range ) is redundant;
	$out = '';
	
	// Get number from key
	if( $create ) {
		
		$out = 0;
		
		// Letter/number to array key swap
		$a = array_flip( $range );
		for( $i = 0; $i < $il; $i++ ) {
			
			$c = $k[$i];
			$n = bcpow( $cl, $il - $i - 1 );
			$out = bcadd( $out, bcmul( $a[$c], $n ) );
		}
		
		// Strip front two random digits (prepended below)
		$out = substr( $out, 2 );
		
	} else {
		
		// Prepend two random digits
		// (NOT added, just attached to the front)
		$k = mt_rand( 10, 99 ) . $k . '.0';
		
		do {
			$c = bcmod( $k, $cl );
			$out .= $range[$c];
			$k = bcdiv( bcsub( $k, $c ), $cl );
			
		} while( bccomp( $k, 0 ) > 0 );
		
		// We worked from back to front
		$out = strrev( $out );
		
	}
	
	return $out;
}

You can test this out by sticking it in a loop :

for( $i = 5000; $i < 6000; $i++ ) {
	
	$kConverted = ConvertKey( $i );
	$kOriginal = ConvertKey( $kConverted, true );
	echo $i . ' - ' . $kConverted . ' - ' . $kOriginal . '<br />';
}

Of course, you’ll need to keep in mind that the generated key will be different each time you run it. However, the end result after decoding will be the same.
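If you want to see that on a bigserial-sized value, here's a minimal sketch of the same base-62 scheme split into two functions. The names encodeId() and decodeId() are hypothetical (they aren't from the function above), bcmath is assumed to be loaded with its default scale of 0, and the decoder assumes it's handed a valid key.

```php
<?php
// Same character pool as ConvertKey(): digits, then a-z, then A-Z
const ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';

// Hypothetical helper: numeric string in, base-62 key out
function encodeId( string $id ): string {
	// Attach two random digits to the front (10-99, so never a leading zero)
	$n = mt_rand( 10, 99 ) . $id;
	$out = '';
	do {
		$r = bcmod( $n, '62' );
		$out = ALPHABET[(int) $r] . $out; // build the key from back to front
		$n = bcdiv( $n, '62', 0 );        // integer division
	} while ( bccomp( $n, '0' ) > 0 );
	return $out;
}

// Hypothetical helper: base-62 key in, original numeric string out
function decodeId( string $key ): string {
	$n = '0';
	foreach ( str_split( $key ) as $c ) {
		// Multiply the running total by 62 and add the next digit
		$n = bcadd( bcmul( $n, '62' ), (string) strpos( ALPHABET, $c ) );
	}
	// Drop the two random digits from the front
	return substr( $n, 2 );
}

$id = '9223372036854775807';
$a = encodeId( $id );
$b = encodeId( $id );

// $a and $b will usually differ, but both decode to the same ID
echo $a . ' / ' . $b . ' -> ' . decodeId( $a ) . ' / ' . decodeId( $b ) . PHP_EOL;
```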

I also wrote a post on encryption with… *ahem*… colorful comments and, thankfully, most people stuck to the actual code itself when contacting me about it. Yes, I did change the encryption mode from CFB to CBC. CFB doesn’t need padding so I wasn’t lying about the sleep-deprivation. Thanks to those who wrote to me about it.

An ID obfuscation function that fits into a Tweet

If you’re bothered about long IDs in your app or want to shorten existing large number keys, there are many examples on the web that take your numeric ID and perform some witchcraft with a bunch of numbers, letters and a database to turn something like http://www.mydirtysocks.com into 3Q8zk. What’s basically happening is that the URL is stored in a database and a unique key is generated for it (usually by using the ID field of the table). Some services check if the URL already exists and, if it does, return the existing key.

When someone visits the shortening service with the key, the service looks it up and redirects them to the original URL.

The basic premise of shortening is that once you’ve exhausted numbers 0 – 9, you then move on to a – z and then A – Z. So instead of a character pool of just 36 with numbers + lower case letters, you now have a pool of 62 to represent a long ID. A shorter ID is almost never why people use these functions, though, and most don’t care that their ID is shorter.
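To put rough numbers on the pool sizes, here's a quick sketch that estimates how many characters a big ID needs in each base. The digitsInBase() helper is made up for this example and relies on floating point logarithms, so treat it as an estimate rather than an exact count.

```php
<?php
// Hypothetical helper: estimated number of characters needed to write $n in $base
function digitsInBase( float $n, int $base ): int {
	return (int) floor( log( $n, $base ) ) + 1;
}

// Roughly the largest value a bigserial can hold
$n = 9.2e18;

foreach ( array( 10, 36, 62 ) as $base ) {
	echo 'base ' . $base . ': ' . digitsInBase( $n, $base ) . ' characters' . PHP_EOL;
}
```

For a number that size, going from 36 to 62 characters only saves a couple of characters.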

What they’re really using it for is hiding that ID 10001 came before 10002, and for this, a lot of the shortening mechanisms out there are serious overkill. And then there’s the fact that you’re using both upper and lower case letters: if some genius decides to turn all uppercase letters in a URL to lowercase in their forum/blog/email-service or some other kitchen sink application, your whole shortening scheme is hosed. This happens far more often than you think.
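Here's a tiny sketch of the lowercase problem. Both keys are made-up values; the point is only that two different mixed-case keys become indistinguishable once something lowercases the URL.

```php
<?php
// Two distinct (made-up) base-62 keys that differ only by letter case
$keyA = '3Q8zk';
$keyB = '3q8zK';

// As generated, they point at different rows
var_dump( $keyA === $keyB );

// After a helpful application lowercases the URL, they collide
var_dump( strtolower( $keyA ) === strtolower( $keyB ) );
```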

If you just wanted to generate a unique key (provided the original ID was unique) instead of showing the original ID, then there’s an alternative that’s exactly 140 characters. I checked :

All this does is take that initial ID in string format, split it into 9 character chunks, generate a CRC32 hash of each chunk, convert it to base 36 (the maximum value allowed for the base_convert function in PHP) and append it to the output.

Of course, if you want a more nice-ified version of that…

function shortenID($k){

	$out='';
	$p = str_split($k,9);
	
	for( $i = 0; $i < count( $p ); $i++ ) {
		// crc32() returns a decimal integer; %u keeps it unsigned on 32-bit builds
		$out .= base_convert( sprintf( '%u', crc32( $p[$i] ) ), 10, 36 );
	}
	
	return $out;
}
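As a rough walk-through of the chunking step, here's a small sketch with a made-up 20 digit ID. It only shows how the input is split and how long one converted chunk can get; it doesn't call the function above.

```php
<?php
// A made-up 20 digit ID, as a string
$id = '12345678901234567890';

// Split into chunks of at most 9 characters; the last one is shorter
$chunks = str_split( $id, 9 );
print_r( $chunks );

// CRC32 is a 32-bit value, so 4294967295 is the largest it can be;
// this is how long that gets in base 36
$longest = base_convert( '4294967295', 10, 36 );
echo 'Longest chunk output: ' . strlen( $longest ) . ' characters' . PHP_EOL;
```

So three chunks can produce up to 21 characters, which is another reminder that this is about hiding the ID rather than shortening it.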

Now, generating the ID is only half of the implementation. Since the hash cannot be reversed, you should generate it once for that particular ID and store it in a separate field in the same row of the DB table. This is what almost all URL shortening services do, but of course, they use that upper and lower case character malarkey.
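Here's a minimal sketch of the "generate once, store alongside the row" idea. Everything about the schema is an assumption for illustration: the links table, its short_key column and the in-memory SQLite database (which needs the pdo_sqlite extension) are all made up, and the shortenID() here is a stand-in that feeds crc32()'s decimal output into base_convert() as base 10. Since CRC32 is only 32 bits, the sketch puts a UNIQUE constraint on the column so a rare collision fails loudly instead of silently pointing two rows at one key.

```php
<?php
// Stand-in for the function above
function shortenID( $k ) {
	$out = '';
	foreach ( str_split( (string) $k, 9 ) as $chunk ) {
		$out .= base_convert( sprintf( '%u', crc32( $chunk ) ), 10, 36 );
	}
	return $out;
}

// Hypothetical schema: the key lives in the same row as the ID it hides
$db = new PDO( 'sqlite::memory:' );
$db->setAttribute( PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION );
$db->exec( 'CREATE TABLE links ( id INTEGER PRIMARY KEY, url TEXT, short_key TEXT UNIQUE )' );

// Insert the row first, then derive the key from its new ID and store it once
$db->prepare( 'INSERT INTO links ( url ) VALUES ( ? )' )
	->execute( array( 'http://www.mydirtysocks.com' ) );
$id = $db->lastInsertId();
$key = shortenID( $id );
$db->prepare( 'UPDATE links SET short_key = ? WHERE id = ?' )
	->execute( array( $key, $id ) );

// Later: look the row up by the stored key, never by reversing the hash
$find = $db->prepare( 'SELECT url FROM links WHERE short_key = ?' );
$find->execute( array( $key ) );
$url = $find->fetchColumn();

echo $key . ' -> ' . $url . PHP_EOL;
```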

Of course, if you do still want that malarkey, Lalit has created a class that does it too. Stop searching now and go download that class.

However…

Please try to be sane with your URL shortening shenanigans. It’s really quite stupid to shorten an already short URL, not to mention very irritating.