Synopsis


words [options] [files]

-c,--count	report the number of occurrences of each word
-s,--sum	report only the total number of words
-f,--fold	convert input to lower case before detecting words
-p,--pattern=RE	set pattern defining the word separators
	default: `[^[:alpha:]_]`
-V,--version	print version and exit
-h	print short help and exit
--help	print full documentation via less and exit

Description

words lists, and optionally counts, the words occurring in a list of files or, if no file arguments are given, in standard input. Words are defined as character sequences separated by the regular expression set with the --pattern option. By default, any character other than an underscore or an alphabetic character (including accented characters) acts as a separator.
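
The splitting rule can be illustrated with a minimal Python sketch. This is not the implementation of words; the name split_words is hypothetical, and because Python's re module has no [[:alpha:]] class, the default separator is approximated here as `[\W\d]` (anything that is not a letter-like character or an underscore).

```python
import re

# Approximation of the default separator `[^[:alpha:]_]`: Python's re has no
# POSIX classes, so "not a letter or underscore" is written as [\W\d].
DEFAULT_SEPARATOR = r"[\W\d]"

def split_words(text, separator=DEFAULT_SEPARATOR):
    """Split text on runs of separator characters, dropping empty strings."""
    return [w for w in re.split(f"(?:{separator})+", text) if w]

line = "The Pr\u00eat-\u00e0-porter robe is priced at \u20ac 77.50,"
print(split_words(line))
# A custom separator that keeps hyphens inside words (ASCII letters only here):
print(split_words("avant-garde art", r"[^a-zA-Z-]"))
```

Passing a different separator plays the role of the --pattern option: the pattern describes what lies between words, not the words themselves.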

Without the --count option, the output is a single column of words, sorted in case-insensitive order. With the --count option, two tab-separated columns appear, with the counts in column 1 and the words in column 2; rows are sorted by descending count and, within equal counts, by word.

The --fold option converts all input to lowercase.
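
The ordering and folding rules described above can be sketched in Python as follows. The function names list_words and count_words are hypothetical, and two details are assumptions: ties between words that differ only in case are broken with the lower-case form first (mirroring the example output in this document), and sorting uses plain string comparison rather than locale collation, so accented characters may sort differently than in the real tool.

```python
from collections import Counter

def list_words(words):
    """Unique words in case-insensitive order; lower case first on ties (assumed)."""
    return sorted(set(words), key=lambda w: (w.casefold(), w.swapcase()))

def count_words(words, fold=False):
    """(count, word) rows: highest count first, then words in ascending order."""
    if fold:
        # Corresponds to --fold: lower-case everything before counting.
        words = [w.lower() for w in words]
    counts = Counter(words)
    return sorted(((n, w) for w, n in counts.items()),
                  key=lambda row: (-row[0], row[1]))

sample = ["The", "robe", "at", "the", "shoes", "at"]
print("\n".join(list_words(sample)))
for n, w in count_words(sample, fold=True):
    print(f"{n}\t{w}")
```

Without folding, The and the are counted as separate words; with folding they collapse into a single entry with count 2.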

Examples

Given an input file test containing:

  The Prêt-à-porter robe is priced at € 77.50,
  the shoes (ladies' only) at € 255.

To show the words in it:

    words test #=> 
    à
    at
    is
    ladies
    only
    porter
    priced
    Prêt
    robe
    shoes
    the
    The

To count the words, after folding upper to lower case:

    words --count --fold test #=>
    2	at
    2	the
    1	à
    1	is
    1	ladies
    1	only
    1	porter
    1	prêt
    1	priced
    1	robe
    1	shoes

To treat - as a possible word character, thus finding words like avant-garde:

    words -p '[^[:alpha:]-]' test #=>
    at
    is
    ladies
    only
    priced
    Prêt-à-porter
    robe
    shoes
    the
    The

Note that the - must come at the end of the bracket expression so that it is not interpreted as a range operator.
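
The effect of the hyphen's position can be checked with a short Python sketch. Python's re module is used here only for illustration: it treats a hyphen between two characters of a character class as a range and a trailing hyphen as a literal, which matches the behaviour described above. The regex engine used by words itself may differ in other respects.

```python
import re

text = "a-c b"

# Hyphen between two characters: the range a..c, so "-" itself does not match.
print(re.findall(r"[a-c]", text))

# Hyphen last: a literal "-" alongside the literals "a" and "c".
print(re.findall(r"[ac-]", text))
```

The first call matches the letters a, c and b; the second matches a, the hyphen and c, but not b.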

To count the number of backslashes in a TeX file:

    words --pattern='[^\\]' -c test #=>

but, of course, this is a lot faster:

    tr -dc '\\' <test |wc -c

Author

Wybo Dekker

Copyright

Released under the GNU General Public License