Monthly Archives: February 2013

Aspell Custom Dictionary

I’ve been playing around a bit with Google Go and the aspell package.  It’s been working great, except I haven’t found a good way to tell aspell to exclude or include words as needed.

I was able to find ways to add words to a custom dictionary, but I did not find a good way to exclude words.  In this particular project, I found that many two-letter combinations were marked as correct even though they are not defined as words in a regular dictionary.  Perhaps they are abbreviations, but I did not want them marked as correct for this project.

Instead of messing with the existing aspell dictionary, I decided to create a new language dictionary in aspell.

The command “aspell dicts” lists the installed dictionaries so you can see what already exists.  I chose the name rv_EN for mine.  The dictionary files are kept in /usr/lib/aspell on my system.

First, I created the file rv_EN.multi, which contains only a single line: “add rv_EN.rws”.  The command “aspell dicts” will confirm that aspell can now see the rv_EN dictionary.

Now we need to create the rv_EN.rws file that defines our dictionary.  This is essentially a three-step process.

  1. Dump existing dictionary into a text file
    /usr/bin/aspell -d en dump master | aspell -l en expand > /home/ryan/cust_dict/words.txt
  2. Add or remove words as needed
    I created remove_bad.py for this
  3. Convert text file into rv_EN.rws
    sudo aspell --lang=en create master /usr/lib/aspell/rv_EN.rws < /home/ryan/cust_dict/goodwords.txt

I’ve scripted this process, and have put all necessary files in /home/ryan/cust_dict/.  When running the scripts, I have three files:

  • exclude.txt – A list of the words I would like to remove from the dictionary, one word per line.
  • remove_bad.py – A Python script that generates a new word list.
  • update.sh – A shell script that executes all the commands.  It should be run as root, since you need root privileges to write to /usr/lib/aspell/.
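
Since the entries I wanted gone were mostly two-letter combinations, one way to seed exclude.txt might be to pull every short entry out of the word dump and review the list by hand. Here is a rough sketch, assuming one word per line in the input. The names short_words and max_len are made up for this example and are not part of my scripts.

```python
def short_words(lines, max_len=2):
    """Return the sorted, de-duplicated entries of at most max_len characters.

    lines is any iterable of strings (for example, an open words.txt);
    surrounding whitespace is stripped and blank lines are skipped.
    """
    found = set()
    for line in lines:
        word = line.strip()
        if 0 < len(word) <= max_len:
            found.add(word)
    return sorted(found)


if __name__ == "__main__":
    # Small in-memory sample standing in for the real dictionary dump.
    sample = ["aa\n", "cat\n", "ab\n", "I\n", "aa\n"]
    for word in short_words(sample):
        print(word)
```

Not everything it returns is necessarily wrong (real words such as "an" or "to" will show up too), so the output is a starting point for review, not a finished exclude list.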

Here are my scripts:

#! /bin/bash
# update.sh - run this as root.
# This could be entered into cron, but I have not done so, as I just run
# the script manually after editing the excluded words text file.
#
# creates words.txt by dumping the english dictionary from aspell
# calls remove_bad.py, which generates goodwords.txt
#    goodwords.txt is all words in words.txt except for those listed in
#    exclude.txt
# creates /usr/lib/aspell/rv_EN.rws from goodwords.txt
# rv_EN.multi is already configured to load rv_EN.rws only

# export english dictionaries to words.txt
echo "Exporting words to text file."
/usr/bin/aspell -d en dump master | aspell -l en expand > /home/ryan/cust_dict/words.txt

# remove the bad words
/home/ryan/cust_dict/remove_bad.py

echo "Converting word list into dictionary file."
aspell --lang=en create master /usr/lib/aspell/rv_EN.rws < /home/ryan/cust_dict/goodwords.txt

echo "Cleaning up!"
rm /home/ryan/cust_dict/words.txt
rm /home/ryan/cust_dict/goodwords.txt

 

#! /usr/bin/python
# remove_bad.py - this script generates a text file containing a list of
# good words to include into the aspell dictionary.
# remove_bad.py is called by update.sh

# open up list of words to remove from aspell dictionary.
# File should contain one word per line.
f = open("/home/ryan/cust_dict/exclude.txt")
# strip line endings so a missing final newline cannot hide a match,
# skip blank lines, and use a set to keep the lookups below fast
badw = set(line.strip() for line in f if line.strip())
f.close()

# status message showing how many words are in exclude list
# (print with one formatted string works under Python 2 and 3)
print("%d words in the exclude list." % len(badw))

# opens up the text dump of existing dictionary
f = open("/home/ryan/cust_dict/words.txt")
lines = f.readlines()
f.close()

# number of words in original dictionary
print("%d words in the original dictionary." % len(lines))

# create file of good words
f = open("/home/ryan/cust_dict/goodwords.txt", "w")

# this will write the dictionary words into goodwords.txt
# if they do not exist on the exclude list
for line in lines:
    if line.strip() not in badw:
        f.write(line)

# a function could be added here to add words to goodwords.txt
# if desired.

f.close()
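
The comment near the end of remove_bad.py mentions adding words as well as removing them. Here is a minimal sketch of how that might look, assuming a separate list of words to include. The function name build_word_list and the include_words argument are hypothetical and do not come from my original scripts. In this sketch the exclude list wins if a word appears on both lists, which is a design choice you may want to reverse.

```python
def build_word_list(dictionary_words, exclude_words, include_words=()):
    """Return dictionary words minus exclusions, plus any extra inclusions.

    Each argument is an iterable of words (for example, lines of a file).
    Surrounding whitespace is stripped, blank entries are ignored, and
    duplicates are dropped while keeping first-seen order.
    """
    excluded = set(w.strip() for w in exclude_words if w.strip())
    seen = set()
    result = []
    # Dictionary words come first, followed by the extra words to include.
    for raw in list(dictionary_words) + list(include_words):
        word = raw.strip()
        if word and word not in excluded and word not in seen:
            seen.add(word)
            result.append(word)
    return result
```

Writing the result out as goodwords.txt would then be a matter of joining the returned words with newlines before handing the file to “aspell create master”.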

Yes, I realize that I could probably do this with just a shell script, but I prefer coding the file operations in Python.  Either way, I have my new dictionary, so I just specify it when I’m creating my speller in Go:

speller, err := aspell.NewSpeller(map[string]string{"lang": "rv_EN",})
// code from the go-aspell documentation.

Reference links: golang.org | go-aspell on GitHub | aspell

Edit: I found that my custom localization of English wasn’t working as expected, so I created a new language/locale titled rv_EN.  I’ve updated my post to reflect my changes.