Pseudo language generator

The usual method of placing filler text on a page or in a web design was to copy the Lorem Ipsum text found all over the place. The one downside to this was making sure the text wasn’t too recognizable, considering everyone uses it, and making sure you have enough of it for a particularly big layout or dummy text range.

For some reason this didn’t seem particularly elegant, especially given the aforementioned commonality, and the only other alternative was to create actual posts with, you know, a post. Well, that didn’t seem elegant either since those posts tend to look contrived and stupid… a lot like most of the “real” posts on the internet, sadly.

So I set about creating my own pseudo language generator in JavaScript that I can plug in anywhere to generate as much text as needed without worrying about repeating myself. I’m home today thanks to doctor’s orders and bored to death of walking up and down the apartment so might as well do it now…

First step : Wikipedia

Apparently, in English, ETAON RISHD LFCMU GYPWB VKXJQ Z is the alphabet arranged from the most frequently used letter to the least. While this seems logical to me, it didn’t make sense to mix the vowels in with the consonants since I needed to match a consonant with at least one vowel. In my own cursory observation of Lorem Ipsum, the vowels tended to have an even distribution, but since I’m trying to match English as much as possible, I sorted the vowels by decreasing frequency (eaoiu) and the consonants as well (tnrshdlfcmgypwbvkxjqz).
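As a quick sanity check, the split can be reproduced mechanically. This is only a sketch: splitByFrequency is a name made up here for illustration, and it assumes the frequency-ordered string quoted above with the spaces removed.

```javascript
// Hypothetical helper (not part of the generator): splits a frequency-ordered
// alphabet into vowels and consonants while keeping the original order.
function splitByFrequency(ordered) {
	var result = { vowels: '', consonants: '' };
	var letters = ordered.toLowerCase();
	for (var i = 0; i < letters.length; i++) {
		var ch = letters.charAt(i);
		if ('aeiou'.indexOf(ch) > -1) {
			result.vowels += ch;
		} else {
			result.consonants += ch;
		}
	}
	return result;
}

var split = splitByFrequency('ETAONRISHDLFCMUGYPWBVKXJQZ');
console.log(split.vowels);     // eaoiu
console.log(split.consonants); // tnrshdlfcmgypwbvkxjqz
```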

To make it easier to pair them with a consonant, I figured I’d put all the vowels together and randomly pick one from the pool, but that still leaves the matter of their frequency.

The easiest way to make sure the most frequent letters are picked more often than the least frequent was to multiply the occurrences of the most frequent letters in the pool. I could have repeated each letter a number of times in the pool by hand, but that felt hackish, so for this I wrote a simple multiply function.

function multFrequency(chars) {
	var cn = '';
	for(var i = chars.length; i > 0; i--) {
		var ch = chars.charAt(chars.length - i);
		for(var j = i; j >= 0; j--) {
			cn += ch;
		}
	}
	return cn;
}

What’s going on here? Well, first we iterate through the pool, starting i at the pool’s length and decrementing by one on each pass. Each letter is selected with charAt at position chars.length - i, so we still walk the pool from front to back. For each letter, the inner loop repeatedly adds it to the new pool, “cn”. Since i decreases as the loop continues, characters further down the pool get added fewer times. The inner loop is set to j >= 0, so it counts from i down to and including 0, which means each letter actually lands in the pool i + 1 times and even the last letter gets added twice.
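To see what the weighted pool actually looks like, here is the same function run against the vowel pool. The function is copied as-is so the snippet runs on its own; the values in the comments come from tracing the loops by hand.

```javascript
// Copy of multFrequency from above, so this snippet is self-contained
function multFrequency(chars) {
	var cn = '';
	for (var i = chars.length; i > 0; i--) {
		var ch = chars.charAt(chars.length - i);
		for (var j = i; j >= 0; j--) {
			cn += ch;
		}
	}
	return cn;
}

var pool = multFrequency('eaoiu');
console.log(pool);        // eeeeeeaaaaaooooiiiuu
console.log(pool.length); // 20
```

So “e” takes 6 of the 20 slots and “u” only 2, which is the bias we were after.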

Pairing ending letters

According to the Wikipedia article, English also has common pairings. TH HE AN RE ER IN ON AT ND ST ES EN OF TE ED OR TI HI AS TO.

I figured it would be easier to again split these up into pairings that start with vowels and those that start with consonants. So that leaves :

Consonant pairs : “th”, “he”, “nd”, “st”, “te”, “ti”, “hi”, “to”.
Vowel pairs : “an”, “er”, “in”, “on”, “at”, “es”, “en”, “of”, “ed”, “or”, “as”
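The same split can be done in code rather than by hand. A rough sketch, where splitPairs is a hypothetical helper that is not used anywhere in the generator itself:

```javascript
// Hypothetical helper: groups letter pairs by whether they start with a vowel
function splitPairs(pairs) {
	var groups = { consonantPairs: [], vowelPairs: [] };
	for (var i = 0; i < pairs.length; i++) {
		var pair = pairs[i].toLowerCase();
		if ('aeiou'.indexOf(pair.charAt(0)) > -1) {
			groups.vowelPairs.push(pair);
		} else {
			groups.consonantPairs.push(pair);
		}
	}
	return groups;
}

var common = 'TH HE AN RE ER IN ON AT ND ST ES EN OF TE ED OR TI HI AS TO'.split(' ');
var groups = splitPairs(common);
console.log(groups.consonantPairs.join(' ')); // th he re nd st te ti hi to
console.log(groups.vowelPairs.join(' '));     // an er in on at es en of ed or as
```

Note that a mechanical split also picks up “re”, which the hand-sorted consonant list above leaves out.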

So we need another function that will take the first letter of each pool and randomly select a pairing ending letter. This is actually a lot easier than it sounds. First part of this problem is making a little helper function that will randomly give a number between a minimum and maximum.

// Random number helper (returns a whole number from min to max, both included)
function rnd(min, max) {
	return Math.floor(Math.random() * (max - min + 1)) + min;
}

Now we’re going to create the pair matching helper.

function fromPair(pairs, p) {
	var nc = '';
	for(var i = 0; i < pairs.length; i++) {
		if (pairs[i].charAt(0) == p)
			nc += pairs[i].charAt(1);
	}
	if (nc == '')
		return nc;
	
	// Or else...
	return fromRange(nc);
}

What this does is iterate through each pair in the pool and take the second letter of each pair matching the first letter and create a new pool, “nc”. If “nc” is empty, then it didn’t find a matching pair and returns an empty string, but if at least one pair was found, it will randomly select from this pool… using the function below.
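To make the pool-building step concrete, here is a small sketch of just that part, without the random pick. endingPool is a name invented for this example; it loops over every pair in the list.

```javascript
var vowelPairs = ["an", "er", "in", "on", "at", "es", "en", "of", "ed", "or", "as"];

// Hypothetical helper: collects the second letter of every pair starting with `first`
function endingPool(pairs, first) {
	var pool = '';
	for (var i = 0; i < pairs.length; i++) {
		if (pairs[i].charAt(0) == first)
			pool += pairs[i].charAt(1);
	}
	return pool;
}

console.log(endingPool(vowelPairs, 'e')); // rsnd
console.log(endingPool(vowelPairs, 'u')); // (empty string, no pair starts with "u")
```

So a word ending in “e” can pick up an “r”, “s”, “n” or “d”, while a word ending in “u” is left alone.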

We need a function that will avoid letter duplications. I could be wrong, but in the original Lorem Ipsum, I don’t recall seeing double vowels. I think this makes sense in our new pseudo language.

function fromRange(chars, p) {
	var c;
	if (arguments.length > 1) {
		do {
			c = chars.charAt(rnd(0, chars.length -1));
		} while(c == p);
	} else {
		c = chars.charAt(rnd(0, chars.length -1));
	}
	
	return c;
}

This function is what we’ll use to randomly select characters from any pool. It also doubles as a duplicate remover if the second parameter is specified. Basically, it will retry a random pick from the pool until it gets a character other than the given one.
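One thing to watch with a retry loop is a pool that contains nothing but the excluded character, since the loop would then never end. A possible alternative, sketched here under a made-up name (pickExcluding) and not what the generator above uses, is to filter the pool first and pick once. The random source is passed in as an optional argument purely so the pick can be checked.

```javascript
// Hypothetical variant of fromRange: removes the excluded character from the
// pool first, then picks once. `random` defaults to Math.random.
function pickExcluding(chars, exclude, random) {
	random = random || Math.random;
	var pool = '';
	for (var i = 0; i < chars.length; i++) {
		if (chars.charAt(i) != exclude)
			pool += chars.charAt(i);
	}
	if (pool == '')
		return '';	// Nothing left to pick from
	return pool.charAt(Math.floor(random() * pool.length));
}

console.log(pickExcluding('eaoiu', 'e')); // One of a, o, i or u
console.log(pickExcluding('eee', 'e'));   // Empty string instead of an endless loop
```

Filtering keeps the relative weights of the remaining letters intact, so a pool built with multFrequency should behave the same way as with the retry approach.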

Building the language

To give this the look and feel of true randomness, I put all the above constants (vowels, consonants, pairings) into variables. I also created a minimum and maximum value set for word length, sentence sizes and paragraph sizes to give the impression of random entries.

var vowels = "eaoiu";

// The consonants are placed in decreasing order of frequency
var consonants = "tnrshdlfcmgypwbvkxjqz";

// Letters commonly paired (with consonants first and vowels next)
var consonantPairs = ["th", "he", "nd", "st", "te", "ti", "hi", "to"];
var vowelPairs = ["an", "er", "in", "on", "at", "es", "en", "of", "ed", "or", "as"];

var wMin = 2;		// Minimum word length
var wMax = 10;		// Maximum word length
var sMin = 4;		// Minimum sentence size
var sMax = 20;		// Maximum sentence size
var pMin = 1;		// Minimum sentences per paragraph
var pMax = 3;		// Maximum sentences per paragraph
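var vFreq = 3;		// Every x-th character is a consonant; the rest are vowels

That last variable, vFreq, is what I think will really make or break this; as used below, it puts a consonant at every 3rd position (starting with the first) and fills the rest with vowels, which I think will make this seem realistic.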

Now we need a function to generate a realistic sounding word…

function getWord(u) {
	if(arguments.length < 1)
		u = false;	// No argument means no uppercase first letter
	
	var r = rnd(wMin, wMax);
	
	var w = '';	// Completed word holder
	var c = '';	// Generated letter holder
	
	for(var i = 0; i < r; i++) {
		// Every vFreq-th character is a consonant, the rest are vowels
		if (i % vFreq == 0) {
			c = fromRange(consonants);
		} else {
			c = fromRange(vowels, c);
		}
			
		 // First letter of the word requested in uppercase
		if(u == true && i == 0)
			c = c.toUpperCase();
		w +=  c;
	}
	
	// Commonly paired letters
	if (consonants.indexOf(c) > -1 ) {
		w += fromPair(consonantPairs, c);
	} else {
		w += fromPair(vowelPairs, c);
	}
	
	return w;
}

This function has an argument to make the first letter upper case for use at the beginning of a sentence. Note the wMin and wMax variables we declared earlier, between which the word lengths vary. Also note that in the for loop, we’re using the fromRange function with the second parameter (to skip duplicates) specified for vowels. I’m also making use of the fromPair function, choosing the pair list based on whether the word ends in a consonant or a vowel.
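The letter layout that the loop produces is easier to see with the random picks swapped for placeholders. A sketch only, with wordPattern being a made-up name: “C” marks a position filled from the consonant pool and “V” one filled from the vowel pool.

```javascript
var vFreq = 3;

// Hypothetical helper: shows which pool each position of a word draws from
function wordPattern(length) {
	var pattern = '';
	for (var i = 0; i < length; i++) {
		pattern += (i % vFreq == 0) ? 'C' : 'V';
	}
	return pattern;
}

console.log(wordPattern(2)); // CV
console.log(wordPattern(7)); // CVVCVVC
```

Any ending letter from fromPair is appended after this pattern, so a two letter “CV” word can come out three letters long.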

Now that we have the word generator we need a function that creates a sentence by repeatedly calling the above getWord function. Note the sMin and sMax variables that allow the sentence length to fluctuate.

// Creates a sentence (bunch of words ending in '. ');
function getSentence() {
	var r = rnd(sMin, sMax);
	var s = '';
	for(var i = 0; i < r; i++) {
		if(i == 0) // First letter in first word is uppercase
			s += getWord(true) + ' ';
		else
			s += getWord() + ' ';
	}
	
	return s.substring(0, s.length - 1) + '. ';
}

Finally, a very simple paragraph generator that calls getSentence between pMin and pMax times.

// Creates a paragraph (bunch of sentences wrapped in <p>)
function getParagraph() {
	var r = rnd(pMin, pMax);
	var p =  '<p>';
	
	for(var i = 0; i < r; i++) {
		p += getSentence();
	}
	
	return p + '</p>';
}

Putting these functions together, I created a paragraph that looks less like faux Latin and more like a Scandinavian language…

Buedaehain ges seist gieneof yauteof moareon noisoin daeceolan peobuohen rieyeiher sieqeawof cuekuaxeof deuliukan roapen teahan noifaogu liacon. Daogeadan rin xaegiehin can qeoviof dairin toefoatean rion teiceivean naijaeton riof rain hiakeof weawean.

But, oh well. For what it is, it does a good enough job, I think. Here’s a running demo of everything together.

For some reason, Modernizr kept throwing an error which means it doesn’t work in Firefox.

Update

Thanks to a very helpful comment by Lin, the code now works on Firefox. Turned out to be an encoding issue (Firefox doesn’t like ANSI and UTF dancing together).

Also a minor improvement:
I changed the following line in the multFrequency function…

for(var j = i; j >= 0; j--)

Into…

for(var j = (i * i + 1); j >= 0; j--)

This yielded much better distribution of letters for both vowels and consonants.
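To put numbers on the difference, the two loop variants can be run side by side. In this sketch the starting value of the inner loop is passed in as a function, which is my own tweak for the comparison and not how the generator itself is written; countLetters is likewise a made-up helper.

```javascript
// multFrequency with the inner loop's starting value made configurable
function multFrequency(chars, start) {
	var cn = '';
	for (var i = chars.length; i > 0; i--) {
		var ch = chars.charAt(chars.length - i);
		for (var j = start(i); j >= 0; j--) {
			cn += ch;
		}
	}
	return cn;
}

// Hypothetical helper: counts how often each letter of `chars` appears in `pool`
function countLetters(pool, chars) {
	var counts = [];
	for (var i = 0; i < chars.length; i++) {
		var ch = chars.charAt(i);
		counts.push(ch + ':' + (pool.split(ch).length - 1));
	}
	return counts.join(' ');
}

var before = multFrequency('eaoiu', function(i) { return i; });
var after = multFrequency('eaoiu', function(i) { return i * i + 1; });

console.log(countLetters(before, 'eaoiu')); // e:6 a:5 o:4 i:3 u:2
console.log(countLetters(after, 'eaoiu'));  // e:27 a:18 o:11 i:6 u:3
```

With the original loop “e” fills 6 of 20 slots (30%); with the update it fills 27 of 65 (roughly 42%), while “u” drops from 10% to under 5%.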

Why can’t you just answer the question?

Instead of going through an explication todo-list that you must finish unless that train of thought gets derailed.

I was on a conference call Saturday when our boss, Mr. Dick Hardass, asked one of my colleagues a seemingly simple question about a deadline. “This couldn’t take more than 5 seconds to answer” we all thought.

Mr. Shakes-like-a-Twig (nice fellow, neat hair) went on to take 2 minutes to say “a week”.

You see, Mr. Twig had to go through how he was examining three months of business intelligence reports to figure out the right algorithms to sort through it all. Then he had to describe our current layout (which we were all familiar with), that he had just moved, that our report templates are “adequate”, that he just had 2 cups of coffee, that we had set up a second database server to mirror the original data (also something we all knew) and that his parents are from Ohio.

Mr. Hardass is a new boss, so we didn’t get a chance to tell him to never ask Mr. Twig for a status report unless it’s by email. He also has the patience of a fruit fly, the attention span of a hummingbird and the temper of a wounded leopard being poked with a stick, while being forced to watch a Dharma and Greg rerun.

And Mr. Twig, being a nice fellow, isn’t capable of understanding that silence on the other end is often a sign of a boil-over to come.

“I don’t have the bandwidth to deal with all that, I just want to know how we’re doing!”
etc… etc…

Mr. Hardass

The flustered Mr. Twig replied with a wavering voice, “a week”.

Answer the question!

I’ve heard this said to many people who travel down the winding path of explanations, touching all subjects except the point.

The older I get the more I realize that these people are actually answering a long list of questions in their head accumulated through the hours, days, years and even a lifetime. The one they just heard is actually tacked on at the bottom… which they will get to eventually, time be damned.

Moreover, as I get older, I feel that I’m turning into one of these explanation adventurers touring the treacherous waters of societal impatience. It’s not that I want to delay, frustrate or otherwise confuse the listener, but this is more of an external monologue that I go through to build that very important answer. I’m thinking and I desperately want to give an answer you’re happy with as soon as I can compile a program that displays it — to me first — so I can relay it to you.

Comments are necessary in the code of this program so I can keep track of what I’m doing.

Of course, that’s not to say that you should be wasting your time listening to a dissertation on the consistency of yak dung (which is actually different from cow and buffalo dung, but not many people know this) when you just need a two word answer.

Diatribes, while personally satisfying, don’t really help people like us. A sudden change in facial expression (raised eyebrows and a tilted head work) to show that you’re waiting is usually the most reasonable thing you can do to get us back on track.

Damn you, brain! Why can’t YOU run spellcheck?!

I just came back from a late night coffee run and decided to sit down to work a little on my discussion forum before going to bed (I need coffee to sleep… don’t ask).

It was all fine and dandy until I decided to add a little spellcheck option to the input form. I’m not expecting many people to use it, since this is also meant to be mobile friendly and a lot of posts will likewise be txtspeak gibberish, but I thought it would be nice to have the feature anyway.

Let me preface this by saying that I have never been good at spelling, or even OK at spelling for that matter. I was even rubbish at spelling in Sinhalese when I was a little kid so this isn’t just an English thing. I don’t know if it’s some undiagnosed form of dyslexia or maybe I’m typing faster than the throughput of my cerebral plumbing or vice versa; either way, I just can’t spell.

So when I started writing the spellcheck functionality, I thought it would be a simple, straightforward affair. A dictionary source, a backend response generator and some client-side jQuery witchcraft to make this work without any added burden to the UI.

The burden, it turns out, was to my prefrontal cortex.

E.g., this was meant to be just a simpler version of the spellcheck plugin which comes with TinyMCE. I wasn’t using an IDE for the JS side of this, so I figured I’d be fine with just Notepad.

What’s wrong with this?

(function() {
	tinymce.create('tinymce.plugins.SpelchekcPlugin', {
		inti : function(ed, url) {
			// Some stuff will happen here
		},
		createControl : function(n, cm) {
			return null;
		},
		getInfo : function() {
			return {
				longname: 'Spellcheck Plugin',
				author : 'eksith'
			};
		}
	});
});

Sometimes, I feel like a construction worker who’s always safe with equipment, always wears a helmet, always on time and always forgets his pants.

Site of the Week: CosmicOS

What do you use when you need to initiate first contact with intelligent life-forms? An Open Source Contact Message of course!

Logic gates and Simple Expressions giving birth to a universal language

If ET is listening out there, then this is probably the code to use to communicate. Nothing says “hello” like inviting them over to conquer us that much quicker, eh?

The language was actually inspired by Lincos by Hans Freudenthal and Carl Sagan’s Contact.

[insert name] Language sucks

So it isn’t just me then. There are many people out there who have their own favorite hatred of [insert name] programming, scripting or query language for various reasons.

Here’s a small list courtesy of Google CodeSearch :

Of course, some of the problems mentioned could have been alleviated by using programming best practices from the outset of the project. A lot of the frustration I see is a result of trying to force and wrangle the language to make it do something in a way that could have been accomplished with a different method.

Why fight it when you can maneuver it instead?