Pseudo language generator

The usual method of placing filler text on page or on a web design was to copy the Lorem Ipsum text found all over the place. The one down side to this was making sure the text wasn’t too recognizable, considering everyone uses it, and making sure you have enough of it for a particularly big layout or dummy text range.

For some reason this didn’t seem particularly elegant, especially since the afore mentinoed commonality and the only other alternative was to create actual posts with, you know, a post. Well that didn’t seem elegant either since those posts tend to look contrived and stupid… a lot like a most of the “real” posts on the internet, sadly.

So I set about creating my own pseudo language generator in JavaScript that I can plug anywhere and generate as many as needed and not worry about repeating myself. I’m home today thanks to doctor’s orders and bored to death of walking up and down the apartment so might as well do it now…

First step : Wikipedia

Apparently,  in English, ETAON RISHD LFCMU GYPWB VKXJQ Z is the alphabet arranged from the most frequently used letter to the least. While this seems logical to me, it didn’t make sense to mix the vowels in with the consonants since I needed to match a consonant with at least one vowel and in my own cursory observation of Lorem Ipsum, the vowels tended to have even distribution, but since I’m trying to match English as much as possible, I sorted the vowels by decreasing frequency: eaoiu and the consonants as well : tnrshdlfcmgypwbvkxjqz.

To make it easier to pair them with a consonant, I figured I’d put all the vowels together and randomly pick one from the pool, but that leaves their frequency.

The easiest way to make sure the most frequent letters are picked more than the least frequent was to multiply the occurence of the most frequent letters in the pool. I could have repeated the letter a number of times in the pool, but that felt hackish, so for this I wrote a simple multiply function.

function multFrequency(chars) {
	var cn = '';
	for(var i = chars.length; i > 0; i--) {
		var ch = chars.charAt(chars.length - i);
		for(var j = i; j >= 0; j--) {
			cn += ch;
		}
	}
	return cn;
}

What’s going on here? Well, we first we iterate through the pool starting with the character length and decrement by one. Each letter is selected using charAt using i. At each letter, we repeatedly add it to the new pool, “cn”, i number of times. Since i decreases as the loop continues, characters further down the pool get added fewer number of times. We want to add up to and including the last letter, which is why the inner loop is set to j >= 0. (Remember, the index is 1 less than the length.)

Pairing ending letters

According to the Wikipedia article, English also has common pairings. TH HE AN RE ER IN ON AT ND ST ES EN OF TE ED OR TI HI AS TO.

I figured it will be easier to again split these up by pairings that start with vowels and those that start with consonants. So that leaves :

Consonant pairs : “th”, “he”, “nd”, “st”, “te”, “ti”, “hi”, “to”.
Vowel pairs : “an”, “er”, “in”, “on”, “at”, “es”, “en”, “of”, “ed”, “or”, “as”

So we need another function that will take the first letter of each pool and randomly select a pairing ending letter. This is actually a lot easier than it sounds. First part of this problem is making a little helper function that will randomly give a number between a minimum and maximum.

// Random number helper
function rnd(min, max) {
	return Math.floor(Math.random() * (max - min)) + min
}

Now we’re going to create the pair matching helper.

function fromPair(pairs, p) {
	var nc = '';
	for(var i = 0; i < pairs.length - 1; i++) {
		if (pairs[i].charAt(0) == p)
			nc += pairs[i].charAt(1);
	}
	if (nc == '')
		return nc;
	
	// Or else...
	return fromRange(nc);
}

What this does is iterate through each pair in the pool and take the second letter of each pair matching the first letter and create a new pool, “nc”. If “nc” is empty, then it didn’t find a matching pair and returns an empty string, but if at least one pair was found, it will randomly select from this pool… using the function below.

We need a function that will avoid letter duplications. I could be wrong, but in the original Lorem Ipsum, I don’t recall seeing double vowels. I think this makes sense in our new pseudo language.

function fromRange(chars, p) {
	var c;
	if (arguments.length > 1) {
		do {
			c = chars.charAt(rnd(0, chars.length -1));
		} while(c == p);
	} else {
		c = chars.charAt(rnd(0, chars.length -1));
	}
	
	return c;
}

This function is what we’ll use to randomly select characters from any pool. It also doubles as a duplicate remover if the second parameter is specified. Basically, it will retry a random pick from the pool until it skips the given parameter.

Building the language

To give this the look and feel of true randomness, I put all the above constants (vowels, consonants, pairings) into variables. I also created a minimum and maximum value set for word length, sentence sizes and paragraph sizes to give the impression of random entries.

var vowels = "eaoiu";

// The consonants are placed in the order of their appearence
var consonants = "tnrshdlfcmgypwbvkxjqz";

// Letters commonly paired (with consonants first and vowels next)
var consonantPairs = ["th", "he", "nd", "st", "te", "ti", "hi", "to"];
var vowelPairs = ["an", "er", "in", "on", "at", "es", "en", "of", "ed", "or", "as"];

var wMin = 2;		// Minimum word length
var wMax = 10;		// Maximum word length
var sMin = 4;		// Minimum sentence size
var sMax = 20;		// Maximum sentence size
var pMin = 1;		// Minimum sentences per paragraph
var pMax = 3;		// Maximum sentences per paragraph
var vFreq = 3;		// Every x characters must be a vowel

That last variable, vFreq, is what I think will really make or break this; I think having every 3rd character a vowel will make this seem realistic.

Now we need a function to generate a realistic sounding word…

function getWord(u) {
	if(arguments.length > 1)
		u = true;
	
	var r = rnd(wMin, wMax);
	
	var w = '';	// Completed word holder
	var c = '';	// Generated letter holder
	
	for(var i = 0; i < r; i++) {
		// Every x characters is a vowel
		if (i % vFreq == 0) {
			c = fromRange(consonants);
		} else {
			c = fromRange(vowels, c);
		}
			
		 // First letter of the word requested in uppercase
		if(u == true && i == 0)
			c = c.toUpperCase();
		w +=  c;
	}
	
	// Commonly paired letters
	if (consonants.indexOf(c) > -1 ) {
		w += fromPair(consonantPairs, c);
	} else {
		w += fromPair(vowelPairs, c);
	}
	
	return w;
}

This function has an argument to make the first letter upper case for use in the beginning of a sentence. Note the wMin and wMax variables we declared earlier between which the word lengths alternate. Also note in the for loop, we’re using that fromRange function with the second parameter (to skip duplicates) specified for vowels. I’m also making use of the fromPair function depending on whether the last character in the word ends in a consonant or vowel.

Now that we have the word generator we need a function that creates a sentence by repeatedly calling the above getWord function. Note the sMin and sMax variables that allow the sentence length to fluctuate.

// Creates a sentence (bunch of words ending in '. ');
function getSentence() {
	var r = rnd(sMin, sMax);
	var s = '';
	for(var i = 0; i < r; i++) {
		if(i == 0) // First letter in first word is uppercase
			s += getWord(true) + ' ';
		else
			s += getWord() + ' ';
	}
	
	return s.substring(0, s.length - 1) + '. ';
}

Finally a very simple paragraph generator that calls getSentence between pMin and pMax.

// Creates a paragraph (bunch of sentences wrapped in <p>)
function getParagraph() {
	var r = rnd(pMin, pMax);
	var p =  '<p>';
	
	for(var i = 0; i < r; i++) {
		p += getSentence();
	}
	
	return p + '</p>';
}

Putting these functions together, I created a paragraph that looks less like faux Latin and more like a Scandanavian language…

Buedaehain ges seist gieneof yauteof moareon noisoin daeceolan peobuohen rieyeiher sieqeawof cuekuaxeof deuliukan roapen teahan noifaogu liacon. Daogeadan rin xaegiehin can qeoviof dairin toefoatean rion teiceivean naijaeton riof rain hiakeof weawean.

But, oh well. For what it is, it does a well enough job, I think. Here’s a running demo of everything together.

For some reason, Modernizr kept throwing an error which means it doesn’t work in Firefox.

Update

Thanks to a very helpful comment by Lin, the code now works on Firefox. Turned out to be an encoding issue (Firefox doesn’t like ANSI and UTF dancing together).

Also a minor imporvement:
I changed the following line in the multFrequency function..

for(var j = i; j >= 0; j--)

Into…

for(var j = (i * i + 1); j >= 0; j--)

This yielded much better distribution of letters for both vowels and consonants.

Advertisements