04 March 2010

Just in case you’ve been reading the RSS and Twitter feeds you subscribe to instead of your email, you might have missed the post announcing that Build 49 of the VCL Subscription has just been released. So if you’re a subscriber, nip over to the Client Center and start downloading.

The big feature of Build 49 is the release of ExpressSpellChecker 2, with Hunspell support and some quite remarkable improvements in both performance and memory use.

The Hunspell features that ExpressSpellChecker now supports include:

  • Compound words and options for compounding. (Compound words are words formed by joining two other words.)
  • Twofold suffix stripping. (A good one, this: to spell check words you can either add every single word to the dictionary, with every possible tense, plural, or other derivation present, which would make it huge, or you can mark words as being able to take certain suffixes. ExpressSpellChecker could cope with one suffix before; now it can cope with two. Think of care, careless, carelessness as an example of twofold suffixes.)
  • Extended affix classes. (Affixes are morphemes that can be tacked onto a word to form another word. Think of them as prefixes, suffixes, and circumfixes. In Hunspell, affixes are defined in a separate file (the affix file) from the dictionary file.)
  • Homonyms. (A homonym is a word that is spelled or pronounced the same as another word but has a different meaning. This could be something as simple as post the verb versus post the noun.)
  • Prefix-suffix dependencies. (A tough one, this: as soon as you allow prefixes and suffixes on a word, you have to allow for the possibility that certain combinations are not valid for certain root words. So you can have member, dismember, and dismemberment, but you can't have memberment.)
  • Circumfixes. (Applying a prefix and a suffix at the same time: the circumfix wraps the word. Wikipedia gives the example of the German past participle, where ge- and -t are wrapped around the stem of a regular verb, so spielen becomes gespielt.)
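
To make the twofold suffix idea a little more concrete, here is a minimal Python sketch of the care, careless, carelessness case. It is a toy model only, not ExpressSpellChecker's or Hunspell's implementation: the `Suffix` record, the suffix classes A and B, and the `is_valid` function are all hypothetical names. It assumes, loosely in the spirit of Hunspell's continuation classes, that each suffix rule can list which further suffix classes may follow it, and it allows at most two suffixes to be stripped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suffix:
    """One suffix rule (hypothetical toy structure, not a real Hunspell API)."""
    add: str                        # text appended to the stem, e.g. "less"
    strip: str = ""                 # text removed from the stem before appending (unused here)
    cont: frozenset = frozenset()   # continuation classes: suffix flags allowed after this one


# Hypothetical suffix classes keyed by flag. Class "A" adds "less" and lets
# class "B" follow it; class "B" adds "ness" and allows nothing further.
SUFFIXES = {
    "A": [Suffix(add="less", cont=frozenset({"B"}))],
    "B": [Suffix(add="ness")],
}

# Toy dictionary: root word -> suffix flags the root accepts.
DICTIONARY = {"care": frozenset({"A"})}


def is_valid(word, needed=None, suffixes_left=2):
    """Return True if `word` is a root, or a root plus at most `suffixes_left` suffixes.

    `needed` is the suffix flag this form must accept because an outer suffix
    of that class has already been stripped (None for the full word).
    """
    # Base case: a dictionary root that accepts the needed flag.
    flags = DICTIONARY.get(word)
    if flags is not None and (needed is None or needed in flags):
        return True
    if suffixes_left == 0:
        return False
    # Try stripping one suffix, outermost first.
    for flag, rules in SUFFIXES.items():
        for rule in rules:
            if not word.endswith(rule.add):
                continue
            # If an outer suffix of class `needed` was already stripped, this
            # inner suffix must list `needed` among its continuation classes.
            if needed is not None and needed not in rule.cont:
                continue
            stem = word[: len(word) - len(rule.add)] + rule.strip
            if is_valid(stem, flag, suffixes_left - 1):
                return True
    return False


for w in ["care", "careless", "carelessness", "careness", "carenessless"]:
    print(w, is_valid(w))
```

Running it prints True for care, careless, and carelessness, and False for careness and carenessless: careness fails because the root care only accepts class A, and carenessless fails because the ness rule allows no further suffix after it. A real affix file also carries conditions, prefixes, and stripping rules that this sketch leaves out.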

The great thing is that you don't have to worry about any of this: the spell-checking engine takes care of it all when making correction suggestions. And the suggestions it makes will now be of higher quality than before.

You can find more information about ExpressSpellChecker 2 here, and more about other changes in the overall VCL Subscription here.

