words [options] files
-c,--count | Report the number of occurrences of each word. |
-s,--sum | Report only the total number of words. |
-f,--fold | Convert input to lower case before detecting words. |
-p,--pattern=RE | Set the pattern defining the word separators (default: [^[:alpha:]_]). |
-V,--version | Print version and exit. |
-h | Print short help and exit. |
--help | Print full documentation via less and exit. |
The word separators are set with the --pattern option. By default, any character other than an underscore or an alphabetic character (including accented characters) acts as a separator.
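The default separator rule can be approximated with standard tools. The following is a minimal sketch, not the implementation of words: `split_words` is a hypothetical helper, and `LC_ALL=C` is an assumption made so the result is reproducible, which means accented letters act as separators here, unlike in words itself.

```shell
# Hypothetical helper, not part of words: split stdin into words using the
# default separator class [^[:alpha:]_].
# LC_ALL=C limits [:alpha:] to ASCII letters (an assumption of this sketch).
split_words() {
    # -c complements the set, so every character that is NOT a letter or an
    # underscore becomes a newline; -s squeezes runs of newlines into one.
    LC_ALL=C tr -cs '[:alpha:]_' '\n' | sed '/^$/d'
}

printf '%s\n' 'foo_bar, baz-qux 42 times!' | split_words
# prints:
# foo_bar
# baz
# qux
# times
```

The underscore keeps `foo_bar` together, while the hyphen, the digits, and the punctuation all separate words.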
Without the --count option, the output is a single column of words, sorted case-insensitively. With the --count option, two tab-separated columns appear, with the counts in column 1 and the words in column 2; the rows are sorted in descending numeric order on column 1, and rows with equal counts are sorted normally on column 2.
The --fold option converts all input to lowercase.
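The combined effect of --count and --fold, including the output ordering described above, can be sketched as a pipeline of standard tools. This is an illustration under stated assumptions, not how words is implemented: `count_words` is a hypothetical name, and `LC_ALL=C` restricts letters to ASCII to keep the sort order reproducible.

```shell
# Hypothetical helper, not part of words: emulate "words --count --fold"
# for ASCII input.
count_words() {
    LC_ALL=C tr '[:upper:]' '[:lower:]' |     # fold upper case to lower case
    LC_ALL=C tr -cs '[:alpha:]_' '\n' |       # one word per line
    sed '/^$/d' |                             # drop a possible empty first line
    LC_ALL=C sort |                           # group identical words
    uniq -c |                                 # prefix each word with its count
    awk '{ printf "%s\t%s\n", $1, $2 }' |     # count TAB word
    LC_ALL=C sort -t "$(printf '\t')" -k1,1nr -k2,2   # count descending, then word
}

printf '%s\n' 'The cat saw the dog; the dog ran.' | count_words
# prints (columns separated by a tab):
# 3	the
# 2	dog
# 1	cat
# 1	ran
# 1	saw
```

Without the first `tr`, "The" and "the" would be counted as two different words.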
Suppose the file test contains the following line:
The Prêt-à-porter robe is priced at € 77.50, the shoes (ladies' only) at € 255.
To show the words in it:
words test #=> à at is ladies only porter priced Prêt robe shoes the The
To count the words, after folding upper to lower case:
words --count --fold test #=> 2 at 2 the 1 à 1 is 1 ladies 1 only 1 porter 1 prêt 1 priced 1 robe 1 shoes
To include - as a possible word character, thus finding words like avant-garde:
words -p '[^[:alpha:]-]' test #=> at is ladies only priced Prêt-à-porter robe shoes the The
Note that the - is placed at the end of the bracket expression so that it is not interpreted as a range operator.
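The difference between a literal hyphen and a range can be checked with grep. This sketch uses the positive class (the characters that make up a word) rather than the complemented separator class passed to --pattern. The function names are hypothetical, and `grep -o` is assumed to be available (it is in GNU and BSD grep, but it is not required by POSIX).

```shell
# Hypothetical helpers for illustration only.

# Hyphen last: it is a literal member of the set, so hyphenated words stay whole.
hyphen_literal() {
    LC_ALL=C grep -oE '[[:alpha:]-]+'
}

# Hyphen between two characters: it denotes the range a..c and no longer
# matches "-" itself.
hyphen_range() {
    LC_ALL=C grep -oE '[a-c]+'
}

printf '%s\n' 'avant-garde art' | hyphen_literal
# prints:
# avant-garde
# art

printf '%s\n' 'a-c b' | hyphen_range
# prints:
# a
# c
# b
```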
To count the number of backslashes in a TeX file:
words --pattern='[^\\]' -c test #=>
but, of course, this is a lot faster:
tr -dc '\\' <test |wc -c