anagramm(1) Manual Page

NAME

anagramm - anagram generator

SYNOPSIS

anagramm [OPTIONS] INPUT

DESCRIPTION

anagramm generates anagrams from INPUT using a dictionary. Anagrams are short pieces of text, usually a few words long, that can be formed from another piece of text by rearranging its letters. For example, “karma manager” is an anagram of “anagram maker”.
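To make the definition concrete, here is a small brute-force Python sketch of the problem anagramm solves. It is not anagramm's own algorithm, which this page does not describe, and it ignores most options. It assumes only the documented default ignore characters (see -i) and case-insensitive letters (see LETTERS). The names letters() and anagrams() are hypothetical.

```python
from collections import Counter

# Assumption: the documented -i default (space, period, comma, dash,
# apostrophe); letters are compared case-insensitively.
DEFAULT_IGNORE = " .,-'"


def letters(text, ignore=DEFAULT_IGNORE):
    """Count the significant letters of text, folding case."""
    return Counter(c.lower() for c in text if c not in ignore)


def anagrams(text, words):
    """Yield every combination of dictionary words that uses exactly the
    letters of text. Each combination is yielded once, in dictionary
    order, similar to running without -p."""
    target = letters(text)
    # Keep words that have letters and fit inside the input at all.
    candidates = [(w, letters(w)) for w in words]
    candidates = [(w, c) for w, c in candidates if c and not c - target]

    def search(remaining, start, chosen):
        if not remaining:  # every letter of the input has been used
            yield " ".join(chosen)
            return
        for i in range(start, len(candidates)):
            word, need = candidates[i]
            if not need - remaining:  # the word fits in the letters left
                # Restarting at i (not 0) lets a word repeat but never
                # revisits earlier words, so reorderings are skipped.
                yield from search(remaining - need, i, chosen + [word])

    yield from search(target, 0, [])


print(list(anagrams("anagram maker",
                    ["karma", "manager", "maker", "anagram", "ram"])))
# ['karma manager', 'maker anagram']
```

Because every combination of fitting words is tried, the number of results grows very quickly with the input length, which is why the HINTS section below matters.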

The default dictionary file is /usr/share/dict/words.

OPTIONS

Informative Options

--help: display help on program usage and exit
--license: display full license text and exit
--version: display version information and exit

Usage Options

Short options that do not require a parameter (-a, -d, -p and -s) can be grouped after a single dash. For example, -sd would turn on options -s and -d.
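Grouping follows the usual POSIX getopt(3) convention. Purely as an illustration (this is not anagramm's source), Python's standard getopt module applies the same rules; the option string below is simply transcribed from the list that follows, and the parse() helper is hypothetical.

```python
import getopt

# Transcribed from OPTIONS; a trailing ':' marks an option that takes
# a parameter. -a, -d, -p and -s take none, so they can be grouped.
SHORT = "ac:de:E:f:i:l:L:m:pr:sS:w:W:x:"
LONG = ["help", "license", "version"]


def parse(argv):
    """Split argv into (options, operands) as getopt would."""
    return getopt.getopt(argv, SHORT, LONG)


opts, args = parse(["-sd", "-w", "3", "I produce anagrams"])
print(opts)  # [('-s', ''), ('-d', ''), ('-w', '3')]
print(args)  # ['I produce anagrams']
```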

-a: print anagrams. This is the default operation (see also the -d and -s options).
-c STRING: print only the anagrams which contain the letters in STRING. For the -d and -s options this works as if the letters in STRING were first removed from the input.
-d: print the dictionary. Only the words which can be formed from INPUT are printed. The -c option may affect the output.
-e WORDS: add the words in WORDS to the dictionary. The words must be separated by commas. Word length limits (as set by the -l and -L options) do not apply to these words.
-E CODING: set the dictionary character encoding to CODING. By default the dictionary is assumed to use the system encoding; if CODING is something else, the dictionary is converted on the fly. Any encoding supported by iconv(3) is valid.
-f FILE: use FILE as the dictionary. The default is /usr/share/dict/words.
-i STRING: ignore the characters of STRING in input and dictionary words. This can be used to ignore punctuation, such as dashes, so that they do not affect the anagram generation, but the characters are still present in the output. The default is to ignore space, period, comma, dash and apostrophe.
-l LEN: set the minimum word length to LEN letters. Words shorter than this are not considered for anagrams. The default is 1.
-L LEN: set the maximum word length to LEN letters. Words longer than this are not considered for anagrams. There is no limit by default.
-m MODE: set the mode. Modes are defined as sections in the configuration file; see THE CONFIGURATION FILE below for details. The default mode is ‘main’.
-p: print all permutations of word sequences instead of just the first. Without this option anagrams that contain the same words as some other, earlier anagram, but in different order, are not printed. Note that this may increase the output length enormously!
-r STRING: remove the characters in STRING from INPUT and from dictionary words. These characters do not affect anagram generation and are not present in the output. The default is to remove line feed, carriage return, dollar sign and backslash. Line feed and carriage return are always removed, even if STRING is empty.
-s: print letter count for INPUT. The -c option may affect the output.
-S CODING: set the system character encoding to CODING. This should only be used if anagramm fails to detect the encoding correctly!
-w COUNT: set the minimum word count to COUNT. Anagrams containing fewer words than this will not be printed. The default is 1.
-W COUNT: set the maximum word count to COUNT. Anagrams containing more words than this will not be printed. The default is input length (in letters) divided by three, meaning that the average word length has to be at least three letters.
-x WORDS: exclude the words in WORDS from anagrams. The words must be separated by commas. Anagrams that contain even a single word from the list are not printed. If a word is given to both -e and -x, -x takes precedence and the word is not added.

If more than one of the options -a, -d or -s is given, all the relevant information is output.
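The following Python sketch restates a few of the rules above in code: how -i differs from -r, what -c does for -d and -s, the default for -W, and how -e and -x interact. All function names are hypothetical and this is not how anagramm is implemented; in particular, rounding the -W default down is an assumption.

```python
from collections import Counter

# Documented defaults: -i ignores space, period, comma, dash and
# apostrophe; -r removes line feed, carriage return, '$' and '\'.
DEFAULT_IGNORE = " .,-'"
DEFAULT_REMOVE = "\n\r$\\"


def prepare(word, ignore=DEFAULT_IGNORE, remove=DEFAULT_REMOVE):
    """Return (shown, key): the text as printed and its letter counts."""
    drop = set(remove) | {"\n", "\r"}  # LF and CR are always removed
    shown = "".join(c for c in word if c not in drop)  # -r: gone everywhere
    # -i: the character is still printed but does not count as a letter
    key = Counter(c.lower() for c in shown if c not in ignore)
    return shown, key


def default_max_words(text):
    """Default -W: letter count of the input divided by three (rounded down)."""
    return sum(prepare(text)[1].values()) // 3


def apply_contain(text, contain):
    """-c with -d/-s: the input's letters minus those of contain,
    or None if the input does not have all of them."""
    have, need = prepare(text)[1], prepare(contain)[1]
    return None if need - have else have - need


def word_list(dictionary, extra=(), exclude=()):
    """-e adds words, -x removes them; -x wins for words in both lists.
    Dropping an excluded word from the list has the same effect as
    dropping every anagram that contains it."""
    words = list(dictionary)
    known = set(words)
    words += [w for w in extra if w not in known]
    excluded = set(exclude)
    return [w for w in words if w not in excluded]
```

For example, prepare("maker's") keeps the apostrophe in the printed word but gives the same letter counts as "makers", while passing remove="'" (like -r "'") drops it from the printed word as well.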

EXAMPLES

These examples are enough to get you started with anagramm, but you should also read the “Hints” section below.

anagramm "I produce anagrams": generate all possible anagrams from the input, using the default dictionary. Note that the output will be very long if you have a large dictionary!
anagramm -r "'" "I produce anagrams": use the default dictionary and generate all the anagrams, but remove apostrophes from input and dictionary words. This will make most genitives and plurals equal in English and the output will be shorter, because words like ‘makers’ and ‘maker’s’ will not be considered separate.
anagramm -d "I produce anagrams": print the dictionary words that can be formed from the input
anagramm -f words -w 3 -W 4 "I produce anagrams": use the dictionary file words in the current directory and output all anagrams of the input that contain at least three and at most four words
anagramm -x anagram,anagrams "I produce anagrams": print all the anagrams which do not contain the words ‘anagram’ or ‘anagrams’
anagramm -c dinosaur -l 2 -e a,i "I produce anagrams": use the default dictionary, output only anagrams that contain the word ‘dinosaur’, don’t consider words shorter than two letters and add the extra words ‘a’ and ‘i’
anagramm -E ISO-8859-1 -f /usr/share/dict/swedish "anagram på svenska": use the ISO-8859-1 encoded Swedish dictionary for generating the anagrams.
anagramm -m fi "sanankääntökoje": use the configuration of the fi mode, defined in the configuration file.

HINTS

anagramm can generate enormous numbers of anagrams from even short inputs, but as a computer program it does not understand language and cannot tell whether any of these anagrams make sense. It is left to the user to filter the output and find the sensible anagrams.

Here are a few pointers on how to filter the output efficiently:

use a custom dictionary: The -d option is very useful for producing a custom dictionary, which only contains the words that can be formed from the input. This dictionary can then be edited to remove words that are not desired in the output (at least my dictionary files contain a lot of nonsensical words, names, etc. which are not very useful for most anagrams). You can also set a ‘tone’ for the anagrams by removing words that you do not like.
use the -c option: If there is some interesting word, or even several words, that you think might generate good anagrams, you can use the -c option to output only the anagrams that contain these words.
use a minimum word length: Especially with long inputs, it is often useful to initially set the minimum word length (the -l option) to three or four, or even five, letters. This will make the output a lot shorter and you can more easily pick out some candidate words or word combinations for the -c option.
use a maximum word count: This is similar to the above tip, but the output is a little bit different and more useful in some situations, because using a minimum word length completely removes some common words like ‘a’ and ‘I’ from the output. With a maximum word count (the -W option) these words may be present in the output when they are accompanied by long enough words.
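The difference between the last two tips is where the filter acts: -l and -L shrink the dictionary before any anagram is built, while -w and -W discard finished anagrams. A minimal Python sketch with hypothetical function names; for simplicity it measures word length in characters, whereas anagramm counts letters.

```python
def word_filter(words, min_len=1, max_len=None):
    """-l/-L: drop dictionary words outside the length limits."""
    return [w for w in words
            if len(w) >= min_len and (max_len is None or len(w) <= max_len)]


def count_filter(anagrams, min_words=1, max_words=None):
    """-w/-W: drop finished anagrams (lists of words) with too few
    or too many words."""
    return [a for a in anagrams
            if len(a) >= min_words and (max_words is None or len(a) <= max_words)]


# -l 2 removes 'a' and 'i' from every anagram ...
print(word_filter(["a", "i", "anagram", "producer"], min_len=2))
# ... while -W 3 still allows 'a' next to long words.
print(count_filter([["a", "producer", "anagram"], ["i", "am", "a", "cod"]],
                   max_words=3))
```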

When you run anagramm for the first few times, you may be horrified by the length of its output and by how almost all of the anagrams are complete nonsense, but don’t let this scare you. After you learn to use the program a bit, you should be able to filter out the vast majority of the output in a few quick steps.

Example: the words “I produce anagrams” generate almost two million anagrams with a ‘small’ English dictionary. By removing apostrophes with the -r option, and by using a slightly filtered dictionary, produced by the -d option, the anagram count drops to half a million. Setting the maximum word count to three further reduces the output to just over 3000 anagrams!

After that the output is short enough to peruse manually, and you can pick words or word combinations to use with the -c option. If you read it through, you may even find some good anagrams in the output itself.

THE CONFIGURATION FILE

The configuration file is searched for in $XDG_CONFIG_HOME if that environment variable is set, or in $HOME/.config/ otherwise. The file must be called anagramm.conf. The file must also use the same encoding as the system: if your system uses UTF-8, the configuration file must be in UTF-8 too.
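The lookup rule fits in a few lines. This Python sketch (the function name is hypothetical) returns the path described above; treating an empty XDG_CONFIG_HOME like an unset one is an assumption borrowed from the XDG Base Directory specification, not something this page states.

```python
import os


def config_path(environ=None):
    """Return the path of anagramm.conf according to the rules above."""
    env = os.environ if environ is None else environ
    base = env.get("XDG_CONFIG_HOME")
    if not base:  # unset, or empty (assumption: treated as unset)
        base = os.path.join(env["HOME"], ".config")
    return os.path.join(base, "anagramm.conf")


print(config_path({"HOME": "/home/user"}))
# /home/user/.config/anagramm.conf
```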

An example file is provided with the project and will be installed with the program (usually either to /usr/share/doc/anagramm/ or to /usr/local/share/doc/anagramm).

Note that if an option is set both in the configuration file and on the command line, the command-line value takes precedence.
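Together with -m, the precedence rule can be sketched as follows, assuming the configuration has already been parsed into a mapping of sections. The function name, the use of None for an option that was not given, and the error for an undefined mode are assumptions for illustration. Whether other modes inherit values from [main] is not stated here, so the sketch does not inherit.

```python
def effective_settings(sections, mode="main", cli=None):
    """Settings for one run: the -m section, overridden by command-line values.

    sections maps section names to {name: value}; cli maps the same
    variable names to values, with None meaning "not given"."""
    if mode not in sections:
        raise KeyError(f"mode {mode!r} is not defined in the configuration file")
    settings = dict(sections[mode])
    # Only options actually given on the command line override the file.
    settings.update({k: v for k, v in (cli or {}).items() if v is not None})
    return settings
```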

The configuration file can contain comments, section headers and variables. Any line that begins with a hash (‘#’) is considered a comment and is skipped. Empty lines are also ignored.

The file is divided into sections (which can be selected with the -m option), and a valid configuration file must contain at least one section, ‘main’. A section is defined by a section header followed by the variables that define it. A section header consists of the section name placed inside square brackets; for example, [main] starts the section main.

After a section header, the section is defined with variables. All variables up to the next section header or the end of the file belong to that section. The syntax for setting a variable is:

NAME = VALUE

The VALUE is a string that can be either a C-style quoted string with C escape sequences (\n, \t, \", \' or \\) or a non-quoted string that contains no whitespace.
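A minimal Python sketch of a reader for this format, to show how the rules combine. It is not anagramm's parser, and the names are hypothetical. Assumptions: leading whitespace is allowed (the example below indents variables), a line whose first non-blank character is ‘#’ is a comment, and a variable before any section header is an error.

```python
import re

SECTION = re.compile(r"\[([^\]]+)\]$")
# NAME = "quoted value"  or  NAME = bare-value-without-whitespace
ASSIGN = re.compile(r'(\w+)\s*=\s*(?:"((?:[^"\\]|\\.)*)"|([^\s"]\S*))$')
ESCAPES = {"n": "\n", "t": "\t", '"': '"', "'": "'", "\\": "\\"}


def unescape(body):
    """Decode the C escapes the configuration syntax allows."""
    out, chars = [], iter(body)
    for c in chars:
        if c == "\\":
            esc = next(chars)  # the regex guarantees a following character
            if esc not in ESCAPES:
                raise ValueError(f"unsupported escape \\{esc}")
            c = ESCAPES[esc]
        out.append(c)
    return "".join(out)


def parse_config(text):
    """Return {section: {name: value}} for configuration file text."""
    sections, current = {}, None
    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.strip()
        if not line or line.startswith("#"):  # empty line or comment
            continue
        m = SECTION.match(line)
        if m:
            current = sections.setdefault(m.group(1), {})
            continue
        m = ASSIGN.match(line)
        if m is None or current is None:
            raise ValueError(f"line {lineno}: cannot parse {raw!r}")
        quoted, bare = m.group(2), m.group(3)
        current[m.group(1)] = unescape(quoted) if quoted is not None else bare
    if "main" not in sections:
        raise ValueError("a configuration file needs a [main] section")
    return sections
```

Values are kept as strings; converting min_word_len and similar variables to numbers is left to the caller.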

The valid variable names are:

dict: sets the dictionary file
coding: sets the dictionary encoding
min_words: sets the minimum number of words in an anagram
max_words: sets the maximum number of words in an anagram
min_word_len: sets the minimum word length
max_word_len: sets the maximum word length
ignore: sets the ignore characters
remove: sets the remove characters. Line feed and carriage return are always added to these.
extra: adds extra words to the dictionary. Words must be separated by commas. The minimum and maximum word lengths are ignored for these words.
exclude: sets the words to exclude. Words must be separated by commas.
system_coding: sets the system encoding. This should only be set if anagramm fails to detect the encoding correctly!

A short example config file might look like this:

[main]
  dict = /usr/share/dict/british-english-large
  min_word_len = 2
  remove = "'"
  extra = i,a

This would use the given dictionary, set the minimum word length to two, add the apostrophe to the remove characters and add the extra words ‘i’ and ‘a’ to the dictionary (these are added despite the minimum word length limit).

THE DICTIONARY FILES

The dictionary files should be just plain lists of words, one word on each line of the file. Empty lines are ignored and you can set some characters to be removed from words by the -r option (for example, I’ve seen $\ used to mark declensions of a word), but otherwise anagramm merely reads the dictionary and assumes that each line is a word.

If the dictionary does not use the default system encoding, you should tell anagramm the encoding with -E. The dictionary is then converted on the fly to the same encoding as used by the system. The conversion is done with iconv(3) and any supported encoding can be used.
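Reading a dictionary under these rules can be sketched in Python as below. The function name is hypothetical, and Python's codec names mostly, but not always, match iconv's names. The caller opens the file in binary mode, so the decoding step plays the role of -E.

```python
import io


def load_dictionary(stream, coding=None, remove="\n\r"):
    """Yield words from a binary stream holding one word per line.

    coding plays the role of -E (None: the locale's preferred encoding);
    characters in remove are deleted, and line feed and carriage return
    are always deleted. Lines left empty are skipped."""
    drop = set(remove) | {"\n", "\r"}
    text = io.TextIOWrapper(stream, encoding=coding, newline="")
    for line in text:
        word = "".join(c for c in line if c not in drop)
        if word:
            yield word
```

For example, with open("/usr/share/dict/swedish", "rb") as f: words = list(load_dictionary(f, "ISO-8859-1")) would read the Swedish dictionary from the EXAMPLES section.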

LETTERS

Every character that is not on the remove or ignore lists (see the -r and -i options) is considered a “letter” and will be significant for anagram creation. All different characters are considered different letters with the exception of differences in case; uppercase and lowercase versions of the same alphabetic characters are considered the same letter, if the character is recognized as being alphabetic.
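The rule can be expressed as a letter-count key: two texts are anagrams of each other exactly when their keys are equal. A Python sketch using the documented default ignore and remove lists; the function name is hypothetical, and Python's idea of which characters are alphabetic may differ from anagramm's.

```python
from collections import Counter


def letter_key(text, ignore=" .,-'", remove="$\\"):
    """Multiset of the letters in text, using the documented defaults.

    Characters on the remove or ignore lists (and LF/CR) are skipped;
    alphabetic characters are compared without case; anything else,
    digits and symbols included, counts as a letter of its own."""
    drop = set(ignore) | set(remove) | {"\n", "\r"}
    return Counter(c.lower() if c.isalpha() else c
                   for c in text if c not in drop)
```

Note that ‘a’ and ‘à’ still produce different keys, which is the limitation described under BUGS.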

BUGS

There are no known bugs in the current release of anagramm, but the program is lacking some important features — the lack of which may well be considered bugs:

Characters cannot be grouped. For some languages it would be useful to be able to group several characters as a single letter, so that, for example, ‘a’ and ‘à’ would be considered the same letter.

These features will be added in future versions.

EXIT STATUS

0: The program was executed successfully.
1: An error occurred.

ENVIRONMENT

XDG_CONFIG_HOME: configuration file directory. If the variable is not set, $HOME/.config/ will be used instead.

FILES

/usr/share/dict/words: the default dictionary
$XDG_CONFIG_HOME/anagramm.conf: configuration file. May also be located in $HOME/.config/anagramm.conf, if $XDG_CONFIG_HOME is not set.

AUTHOR

Written by Joni Toivanen, <jomiolto@gmail.com>

RESOURCES

Program home page: http://anagramm.sourceforge.net/