It is now available from John Trenkle's homepage as papers/sdr94ps.gz (http://msen.com/~wei/JT-homepage.html, http://spd.erim.org/jt_papers/).

I have applied the technique to implement a written language identification program. At the moment, the system knows about 69 natural languages (counting Esperanto as a natural language).
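The technique in that paper is N-gram-based text categorization. Each language is represented by a ranked list of its most frequent character N-grams, and a document is assigned to the language whose list is closest under an "out-of-place" rank distance. The Python sketch below illustrates the idea under stated assumptions. It is not the TextCat code itself. The function names (`ngram_profile`, `out_of_place`, `classify`) are hypothetical, and so are the parameter values: N-grams up to length 5, the top 400 N-grams per profile, underscore padding at word boundaries, and a maximum penalty equal to the profile length for N-grams missing from a language profile.

```python
from __future__ import annotations

from collections import Counter


def ngram_profile(text: str, max_n: int = 5, top_k: int = 400) -> list[str]:
    """Return the top_k most frequent character n-grams (n = 1..max_n), most frequent first.

    Assumption: words are lowercased and padded with '_' to mark word boundaries.
    """
    counts: Counter[str] = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending frequency; break ties alphabetically so results are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc: list[str], lang: list[str]) -> int:
    """Sum of rank differences between the document profile and a language profile.

    An n-gram absent from the language profile gets the (assumed) maximum penalty len(lang).
    """
    rank = {gram: i for i, gram in enumerate(lang)}
    max_penalty = len(lang)
    return sum(
        abs(i - rank[gram]) if gram in rank else max_penalty
        for i, gram in enumerate(doc)
    )


def classify(text: str, profiles: dict[str, list[str]]) -> str:
    """Return the name of the language profile with the smallest out-of-place distance."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda name: out_of_place(doc, profiles[name]))


if __name__ == "__main__":
    # Toy training texts, far too short for real use; a real system trains on much more text.
    profiles = {
        "english": ngram_profile("the quick brown fox jumps over the lazy dog"),
        "dutch": ngram_profile("de snelle bruine vos springt over de luie hond"),
    }
    print(classify("the dog jumps over the fox", profiles))
```

Ranking N-grams rather than comparing raw frequencies makes profiles of texts with different lengths directly comparable, which is one reason this approach works reasonably well even on short inputs.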

I no longer actively maintain the TextCat programme. However, the SpamAssassin spam filter includes a version of TextCat that its developers have improved further, so you may want to get their version from http://spamassassin.apache.org.

Installation

Usage

Remotely related links

Interesting test cases

  • Staat men perplex, wil men eerst wat thee, of direct op visite in een bastion (erg vitaal detail: is er concreet theatraal of eerder absurd, abstract amusement)? (Hilverd Reker)
  • Shy Pakistani chaps' wan kin always aim to put bananas away and wink at (or chat up) hepatitic llama mamas at Jesus' pita chip snack shack by a quay in China. (Jon Azose)