Check spelling on multiple LaTeX files using multiple languages

Suppose you have several directories and subdirectories, e.g., a thesis in which each chapter has its own directory containing several .tex files. Your goal is to check the spelling of all the .tex files in this directory tree, recursively. Moreover, you want to check the spelling in more than one language, which some documents require; for instance, the thesis abstract may be written in several languages. Here is my approach, explained step by step:

First, to check the spelling we are going to use aspell, so we rely on its dictionaries, which Debian packages as aspell-[language]; e.g., aspell-fr is the package for the French dictionary. Install the packages for the dictionaries you want. To check which language dictionaries you have installed, run:

aspell dump dicts

Take note of the ones you are going to use; we will need them later on.
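If you script this check, a small guard can confirm up front that the dictionaries you noted are actually installed. This is a minimal bash sketch: the function name have_dicts is hypothetical, and it assumes aspell dump dicts prints one dictionary name per line.

```shell
#!/bin/bash
# Hypothetical helper: return success only if every dictionary named in the
# arguments appears in the output of `aspell dump dicts` (one name per line).
have_dicts() {
  local installed lang missing=0
  installed=$(aspell dump dicts)
  for lang in "$@"; do
    # -x: the whole line must match, so "en" does not count as "en_US"
    # -F: treat the language code as a fixed string, not a regex
    if ! printf '%s\n' "$installed" | grep -qxF -- "$lang"; then
      echo "missing aspell dictionary: $lang" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, have_dicts en_US pt_BR || echo "install the missing aspell-* packages first" stops you before running the spell check with a dictionary that is not there.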

Now, back to our .tex spell check. From here on, let’s assume you are in the root directory of your document’s tree. First, let’s get the content of all the .tex files:

find . -name '*.tex' -exec cat {} \;

Second, to spell check the resulting text against several languages at once, for example US English and Brazilian Portuguese, we chain two aspell calls:

aspell --lang=en_US -t list | aspell --lang=pt_BR -t list

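The -t flag (short for --mode=tex) tells aspell that the input is LaTeX, so it skips commands and other markup, and the list action prints the words that are not found in the dictionary. Because the second aspell reads only the words the first one rejected, a word is reported only if it is missing from both dictionaries.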
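The same chaining generalizes to any number of languages: each aspell stage passes on only the words its dictionary rejected. Below is a minimal bash sketch of that idea; spell_multi is a hypothetical function name, and the language codes are whichever dictionaries you noted with aspell dump dicts.

```shell
#!/bin/bash
# Hypothetical helper: read text on stdin and print the words that none of the
# given aspell dictionaries accept. Each stage filters the previous one's output.
spell_multi() {
  # No languages left: pass the surviving words through unchanged.
  if [ "$#" -eq 0 ]; then
    cat
    return
  fi
  local lang="$1"
  shift
  # -t: treat input as LaTeX; list: print words missing from this dictionary
  aspell --lang="$lang" -t list | spell_multi "$@"
}
```

With it, the full pipeline becomes find . -name '*.tex' -exec cat {} + | spell_multi en_US pt_BR | sort -u > typo.txt; the {} + form of -exec hands many files to a single cat call instead of starting one cat per file.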

As a final touch, write to a file only the unique misspelled words, sorted alphabetically. The whole command is then:

find . -name '*.tex' -exec cat {} \; | aspell --lang=en_US -t list | aspell --lang=pt_BR -t list | sort -u > typo.txt

Now you can proofread the typo.txt file and see if there are problematic words in there.
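Once you have deleted the false positives from typo.txt, grep can point you to where the remaining words occur. A minimal sketch, assuming GNU grep (for -r and --include) and a hypothetical function name locate_typos; it prints file:line:text for each matching line.

```shell
#!/bin/bash
# Hypothetical helper: print file:line:text for every line of a .tex file under
# DIR (default: current directory) containing a word listed in TYPO_FILE.
locate_typos() {
  local typo_file="$1" dir="${2:-.}"
  # -r recurse, -n line numbers, -w whole words only,
  # -F fixed strings, -f read the words from typo_file
  grep -rnwF --include='*.tex' -f "$typo_file" -- "$dir"
}
```

Run it from the root of the document tree as locate_typos typo.txt.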

Using mirrors in SED

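It took me a long time to discover mirrors in SED. They can help you a lot with some replacement problems. Mirrors (more commonly called backreferences) are written \1, \2, \3, ..., \9; each one refers to the text matched by the corresponding parenthesized group in the pattern, so you can cut a part of the input string and reuse it in the replacement string, like a temporary variable. To write the groups with plain parentheses you need extended regular expressions, enabled with the -r flag in GNU sed (-E is the more portable spelling); with basic regular expressions the groups must be written as \( and \). Let’s see some applications: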
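To see that extended mode only changes how the groups are written, here is the same swap in both syntaxes. The function names are hypothetical; -E is accepted by both GNU and BSD sed.

```shell
#!/bin/bash
# Swap the two words around a hyphen.
# Basic regular expressions: groups need escaped parentheses.
swap_bre() { sed 's/\([a-z]*\)-\([a-z]*\)/\2-\1/'; }
# Extended regular expressions (-E, or -r in GNU sed): plain parentheses.
swap_ere() { sed -E 's/([a-z]*)-([a-z]*)/\2-\1/'; }

echo 'left-right' | swap_bre   # prints: right-left
echo 'left-right' | swap_ere   # prints: right-left
```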

Changing a date format:

$ echo 12/31/2004 | sed -r 's@([0-9][0-9])/([0-9]{2})/([0-9]{4})@\1.\2.\3@'
$ echo 12/31/2004 | sed -r 's@([0-9][0-9])/([0-9]{2})/([0-9]{4})@\2.\1.\3@'
$ echo 12/31/2004 | sed -r 's@([0-9][0-9])/([0-9]{2})/([0-9]{4})@\2-\1-\3@'
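The commands above rewrite only the first date on each line. Adding the g flag applies the replacement to every match, which is handy when converting a whole file; the function name us_to_dotted is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: rewrite every MM/DD/YYYY date on stdin as DD.MM.YYYY.
# \1 = month, \2 = day, \3 = year; the trailing g repeats it for each match.
us_to_dotted() {
  sed -E 's@([0-9]{2})/([0-9]{2})/([0-9]{4})@\2.\1.\3@g'
}

echo 'from 12/31/2004 to 01/15/2005' | us_to_dotted
# prints: from 31.12.2004 to 15.01.2005
```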

Inverting field order:

$ echo abcdefg | sed -r 's/(a)(b)(c)/\3:\2:\1/'
$ echo abcdefg | sed -r 's/(a)(b)(c).*/\1:\2:\3/'
$ echo abcdefg | sed -r 's/(a)(b)(c).*/\3:\2:\1/'
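Groups do not have to match literal characters. A common application is swapping two fields around a separator, e.g. turning "Last, First" into "First Last"; a minimal sketch with a hypothetical function name:

```shell
#!/bin/bash
# Hypothetical helper: turn "Last, First" lines into "First Last".
# ([^,]+) captures everything before the comma, (.*) everything after
# the comma and any following spaces; lines without a comma are unchanged.
first_last() {
  sed -E 's/^([^,]+), *(.*)$/\2 \1/'
}

printf 'Wells, Stuart\nDoe, Jane\n' | first_last
# prints:
# Stuart Wells
# Jane Doe
```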

See ya!

How to generate a bash script with an embedded tar.gz (self-extracting)

Suppose you need to perform a routine on a remote server, where you must decompress an archive and execute a list of commands on its contents. One alternative is to send the tar.gz file to the remote server via FTP or scp, then log in to the remote server and either run a shell script or type the commands manually. Recall the Java JRE setup: it ships as a script.bin file with an embedded tar.gz, which extracts itself at the beginning of the script’s execution. To build the self-extracting script I followed a tutorial published by Stuart Wells, which consists of four steps:

1) Create/identify a tar.gz file that you want to make self-extracting.

2) Create the self-extracting script. A sample script is shown below:

> cat
#!/bin/bash
echo "Extracting file into $(pwd)"
# search for the line number where the script ends and the tar.gz starts
SKIP=$(awk '/^__TARFILE_FOLLOWS__/ { print NR + 1; exit 0; }' "$0")
# remember our file name
THIS="$0"
# take the tarfile and pipe it into tar
tail -n +"$SKIP" "$THIS" | tar -xz
# Any commands here run after the archive has been extracted.
echo "Finished"
# exit before the shell reaches the binary data after the marker
exit 0
# NOTE: The marker below must be the last line, followed by exactly one
# newline (no blank lines after it); the tar.gz data is appended after it.
__TARFILE_FOLLOWS__

3) Concatenate the script and the tar file together.

> cat example.tar.gz >>
> chmod +x

4) Now test in another directory.

> cp /tmp
> cd /tmp
> ./
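The four steps can also be wrapped in one command. The following is a minimal bash sketch, not part of Stuart Wells’s tutorial: make_selfextract is a hypothetical name, and the generated header is a trimmed version of the sample script from step 2 that reads its own payload through $0 and has the archive appended (>>) right after the marker line.

```shell
#!/bin/bash
# Hypothetical helper: build OUTPUT as a self-extracting script that unpacks
# ARCHIVE (a .tar.gz) into the directory it is run from.
make_selfextract() {
  local archive="$1" output="$2"
  # Quoted 'EOF' keeps $0 and $(...) literal, so they expand when OUTPUT runs.
  cat > "$output" <<'EOF'
#!/bin/bash
echo "Extracting file into $(pwd)"
SKIP=$(awk '/^__TARFILE_FOLLOWS__/ { print NR + 1; exit 0; }' "$0")
tail -n +"$SKIP" "$0" | tar -xz
echo "Finished"
exit 0
__TARFILE_FOLLOWS__
EOF
  # Append the archive right after the marker line (which ends in one newline).
  cat "$archive" >> "$output"
  chmod +x "$output"
}
```

For example, make_selfextract example.tar.gz install.sh produces install.sh, which you can then copy to another directory or server and run as ./install.sh.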

See ya!