Autocomplete with jQuery and MVC

This is just a prelude to a complete spellcheck add-on for the discussion forum. I figured I’d start with a basic autocomplete that ties into the wordlist.

All spellcheckers essentially refer to a global wordlist in the specified language, and any words that aren’t in it get flagged.
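As a rough illustration of that idea, here is a minimal sketch in plain JavaScript. The function name `flagUnknownWords` and the tiny sample list are made up for this example and are not part of the forum code; a real list would be the Ispell one described further down.

```javascript
// Hypothetical helper: returns the words in `text` that are missing from `wordlist`
function flagUnknownWords(text, wordlist) {
	// A Set gives fast lookups, even with 100,000+ entries
	const known = new Set(wordlist.map(w => w.toLowerCase()));
	// Treat any run of ASCII letters/apostrophes as a word (good enough for a sketch)
	const words = text.toLowerCase().match(/[a-z']+/g) || [];
	return words.filter(w => !known.has(w));
}

const sample = ["the", "cat", "sat", "on", "mat"];
console.log(flagUnknownWords("The cat saat on the mat", sample));
// logs [ 'saat' ]
```

The lookup is case-insensitive here, which is the same reason the table below keeps a separate lowercase copy of every word.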

The hardest part of this turned out to be finding a decent wordlist. I was actually surprised at the delicate balance between finding a “good enough” list and one that’s “too good”. Too good? Yes, apparently a list with too many words produces a lot of misses: an actual misspelling happens to match some obscure word, so it never gets flagged… and you didn’t mean to use obscure words.

The final list I settled on has a word count of 125,346 and comes from the Ispell project, which also includes common acronyms. Note: This is not the same as Iespell (written ieSpell), although if you Google “Ispell”, you’ll get “ieSpell” as the first result. Ispell lists are available for download at the Kevin’s Wordlist page. I have also combined the 4 main English lists into one file (MS Word). WordPress, strangely, won’t allow plain text files to be uploaded, but allows rich text documents. Email me if you want the plaintext version.

I started with a simple DB table to store all the entries. Since I may also be adding more languages, I also have a WordLang field which can be something small like “en”, “de”, “fr” etc…

Wordentries table (columns as used in the code: WordId, WordText, WordLowercase, WordLang)

 

I then created an MVC app and loaded each of the wordlist files into the db using a simple function (this can take a while depending on filesize):

public List<Wordentry> GetWords(string p) {
	// One Wordentry per line of the wordlist file
	var query = from line in File.ReadAllLines(p)
			let word = NormalizeString(line)
			select new Wordentry
			{
				WordText = word,
				WordLowercase = word.ToLower(),
				WordLang = "en"
			};
	return query.ToList();
}

 

After feeding it a HostingEnvironment.MapPath to the filename, I can use this to load all entries into a list and call db.Wordentries.InsertAllOnSubmit on the result. NormalizeString is another helper function which I will list below.

I’m using a Spellword model instead of directly using the Wordentry object since I may want to extend the returned result in the future and changing the columns in the DB wouldn’t be practical.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;

namespace Spellcheck.Models
{
	public class Spellword
	{
		public int Id { get; set; }
		public string Spelling { get; set; }
		public string Lowercase { get; set; }
		public string Lang { get; set; }
	}
}

 

And we’re using a SpellRepository class to keep the controllers free of too much data access stuff.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.IO;
using System.Text;
using System.Globalization;

namespace Spellcheck.Models
{
	public class SpellRepository
	{
		// DataContext global
		private readonly CMDataContext db;

		public SpellRepository(CMDataContext _db)
		{
			db = _db;
		}

		/// <summary>
		/// Counts the total number of word entries
		/// </summary>
		/// <returns>Wordcount int</returns>
		public int GetCount()
		{
			return (from w in db.Wordentries
			 select w.WordText).Count();
		}

		/// <summary>
		/// Searches a given word or word fragment
		/// </summary>
		/// <param name="word">Search word/fragment</param>
		/// <param name="limit">Number of returned results. Defaults to 10</param>
		/// <param name="lang">Language to search. Defaults to "en"</param>
		/// <param name="lower">Search lowercase field only</param>
		/// <returns>List of spellwords</returns>
		public List<Spellword> GetWords(string word, int limit = 10,
			string lang = "en", bool lower = true)
		{
			word = (lower) ?
				NormalizeString(word.ToLower()) :
				NormalizeString(word);

			var query = from w in db.Wordentries
						select w;

			// Get only unique entries in case we have
			// duplicates in the db (Edited from an earlier "GroupBy").
			// Sorting happens once, after filtering, further down
			query = query.Distinct();

			// If a language code was specified
			if (!string.IsNullOrEmpty(lang))
				query = query.Where(w=>w.WordLang == lang);

			// Lowercase?
			query = (lower) ?
				query.Where(w => w.WordLowercase.StartsWith(word)) :
				query.Where(w => w.WordText.StartsWith(word));

			// Order alphabetically
			query = query.OrderBy(w => w.WordLowercase);

			return (from w in query
					select new Spellword
					{
						Id = w.WordId,
						Spelling = w.WordText,
						Lowercase = w.WordLowercase,
						Lang = w.WordLang
					}).Take(limit).ToList();
		}
		/// <summary> 
		/// Inserts a new list of words into the spellcheck library
		/// </summary>
		public void SaveWords(List<Spellword> Words)
		{
			var query = Words.GroupBy(w => w.Spelling)
				.Select(w => w.First())
				.OrderBy(w => w.Spelling).ToList();

			List<Wordentry> Entries = (from w in query
									   orderby w.Spelling ascending
									   select new Wordentry
									   {
										   WordText = w.Spelling,
										   WordLowercase = w.Lowercase,
										   WordLang = w.Lang
									   }).ToList();

			db.Wordentries.InsertAllOnSubmit(Entries);
			db.SubmitChanges();
		}

		/// <summary>
		/// Helper function: decomposes a word (Unicode form D) and strips
		/// non-spacing marks such as accents, so "café" becomes "cafe"
		/// </summary>
		/// <param name="txt">Raw word</param>
		/// <returns>Normalized word</returns>
		private static string NormalizeString(string txt)
		{
			// Nothing to normalize; also avoids calling Normalize() on null
			if (String.IsNullOrEmpty(txt))
				return txt;

			StringBuilder sb = new StringBuilder();

			// Decompose once, then keep everything except the combining marks
			sb.Append(
				txt.Normalize(NormalizationForm.FormD).Where(
					c => CharUnicodeInfo.GetUnicodeCategory(c)
					!= UnicodeCategory.NonSpacingMark).ToArray()
				);

			return sb.ToString().Normalize(NormalizationForm.FormD);
		}
	}
}
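
If you ever need the same normalization on the client side, the idea behind NormalizeString carries over to JavaScript fairly directly. This is only a sketch: `stripDiacritics` is a made-up name, and unlike the C# helper it returns an empty string for null or undefined input.

```javascript
// Hypothetical client-side counterpart of NormalizeString
function stripDiacritics(txt) {
	if (!txt) return "";
	// NFD splits "é" into "e" + a combining accent;
	// \p{Mn} matches those non-spacing marks so they can be removed
	return txt.normalize("NFD").replace(/\p{Mn}/gu, "");
}

console.log(stripDiacritics("café"));     // logs cafe
console.log(stripDiacritics("Ångström")); // logs Angstrom
```

The `\p{Mn}` class needs the `u` flag and a reasonably modern browser or Node version.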

To use this, we’ll just add a JsonResult action to our controller. I just created a Suggestions action in the default Home controller since this is just an example.

public JsonResult Suggestions(string word, int limit = 10, string lang="en")
{
	List<Spellword> Words = new List<Spellword>();
	if (!string.IsNullOrEmpty(word))
	{
		using (CMDataContext db = new CMDataContext())
		{
			SpellRepository repository = new SpellRepository(db);
			// 10 results is usually enough
			Words = repository.GetWords(word, limit, lang);
		}
	}
	// AllowGet is required to return JSON from a GET request; without it we'd have to use POST
	return Json(Words, JsonRequestBehavior.AllowGet);
}
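
To get a feel for what the action should hand back for a given fragment, here is a small in-memory imitation of the lookup in JavaScript. The `entries` array and the `suggest` function are invented for this sketch; the real work is done by SpellRepository.GetWords against the database.

```javascript
// Hypothetical stand-in for a few rows of the Wordentries table
const entries = [
	{ Id: 1, Spelling: "Apple", Lowercase: "apple", Lang: "en" },
	{ Id: 2, Spelling: "apply", Lowercase: "apply", Lang: "en" },
	{ Id: 3, Spelling: "Apfel", Lowercase: "apfel", Lang: "de" },
	{ Id: 4, Spelling: "banana", Lowercase: "banana", Lang: "en" }
];

// Same steps as the repository: filter by language, match the prefix
// against the lowercase field, sort alphabetically, take `limit` rows
function suggest(fragment, limit = 10, lang = "en") {
	const prefix = fragment.toLowerCase();
	return entries
		.filter(e => !lang || e.Lang === lang)
		.filter(e => e.Lowercase.startsWith(prefix))
		.sort((a, b) => (a.Lowercase < b.Lowercase ? -1 : a.Lowercase > b.Lowercase ? 1 : 0))
		.slice(0, limit);
}

console.log(suggest("Ap").map(e => e.Spelling));
// logs [ 'Apple', 'apply' ]
```

The objects use the same property names as the Spellword model, so this is also the shape of the JSON array the browser receives.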

 

… And that pretty much covers the backend for now.

To test whether the word suggestion works, we’ll set up one autocomplete textbox. Add the jQuery and jQuery UI script files and the jQuery UI CSS to your layout first, then add this to the default view:

<script type="text/javascript">
	$(function () {
		var searchtext = $("#search");
		searchtext.autocomplete({
			source: function (request, response) {
				$.ajax({
					url: "/Home/Suggestions", // Or your controller
					dataType: "json",
					data: { word: request.term },
					success: function (data) {
						// Returned data follows the Spellword model
						response($.map(data, function (item) {
							return {
								id: item.Id,
								label: item.Spelling,
								value: item.Lowercase
							}
						}))
					}
				});
			},
			minLength: 3 // option names are case-sensitive; "minlength" is ignored
		});
	});
</script>
<form action="/" method="post">
<input id="search" type="text" name="search" />
</form>
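
The mapping step in the success callback is the part that’s easiest to get wrong, so here it is on its own as plain JavaScript, without jQuery. `toAutocompleteItems` is a name made up for this sketch, and it uses Array.prototype.map as a stand-in for $.map. jQuery UI shows `label` in the dropdown and puts `value` into the textbox when an item is picked.

```javascript
// Hypothetical helper: converts the Spellword JSON into autocomplete items
function toAutocompleteItems(data) {
	return data.map(item => ({
		id: item.Id,
		label: item.Spelling,  // shown in the suggestion list
		value: item.Lowercase  // inserted into the input on select
	}));
}

const json = [{ Id: 7, Spelling: "NASA", Lowercase: "nasa", Lang: "en" }];
console.log(toAutocompleteItems(json));
// logs [ { id: 7, label: 'NASA', value: 'nasa' } ]
```

If you’d rather keep the original casing in the textbox, use item.Spelling for both label and value.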

 

Fun fact : Total misspellings as I was writing this (excluding Ispell/ieSpell names and code) before running spellcheck = 12.

Yeah, I really can’t spell.

Damn you, brain! Why can’t YOU run spellcheck?!

I just came back from a late night coffee run and decided to sit down to work a little on my discussion forum before going to bed (I need coffee to sleep… don’t ask).

It was all fine and dandy until I decided to add a little spellcheck option to the input form. I’m not expecting many people to use it, since this is also meant to be mobile friendly and a lot of posts will likewise be txtspeak gibberish, but I thought it would be nice to have the feature anyway.

Let me preface this by saying that I have never been good at spelling, or even OK at spelling for that matter. I was even rubbish at spelling in Sinhalese when I was a little kid so this isn’t just an English thing. I don’t know if it’s some undiagnosed form of dyslexia or maybe I’m typing faster than the throughput of my cerebral plumbing or vice versa; either way, I just can’t spell.

So when I started writing the spellcheck functionality, I thought it would be a simple, straightforward affair. A dictionary source, a backend response generator and some client-side jQuery witchcraft to make this work without any added burden to the UI.

The burden, it turns out, was to my prefrontal cortex.

E.g. this was meant to be just a simpler version of the spellcheck plugin which comes with TinyMCE. I wasn’t using an IDE for the JS side of this, so I figured I’d be fine with just Notepad.

What’s wrong with this?

(function() {
	tinymce.create('tinymce.plugins.SpelchekcPlugin', {
		inti : function(ed, url) {
			// Some stuff will happen here
		},
		createControl : function(n, cm) {
			return null;
		},
		getInfo : function() {
			return {
				longname: 'Spellcheck Plugin',
				author : 'eksith'
			};
		}
	});
});

Sometimes, I feel like a construction worker who’s always safe with equipment, always wears a helmet, always on time and always forgets his pants.