Re: [Sheflug] Convert JPEG to ASCII
> If you want to OCR look at using gocr
> (http://jocr.sourceforge.net/) ... I use this within my mail
> delivery chain to remove the image-based spam. Works quite well :-)
Hmmmm.... how to decode the help file ;) .....
$ gocr -h
Optical Character Recognition --- gocr 0.41
using: gocr [options] pnm_file_name # use - for stdin
options (see gocr manual pages for more details):
-h - get this help
-i name - input image file (pnm,pgm,pbm,ppm,pcx,...)
-o name - output file (redirection of stdout)
-e name - logging file (redirection of stderr)
-x name - progress output to fifo (see manual)
-p name - database path including final slash (default is ./db/)
-f fmt - output format (ISO8859_1 TeX HTML XML UTF8 ASCII)
-l num - threshold grey level 0<160<=255 (0 = autodetect)
-d num - dust_size (remove small clusters, -1 = autodetect)
-s num - spacewidth/dots (0 = autodetect)
-v num - verbose (see manual page)
-c string - list of chars (debugging, see manual)
-C string - char filter (ex. hexdigits: 0-9A-Fx, only ASCII)
-m num - operation modes (bitpattern, see manual)
-a num - value of certainty (in percent, 0..100, default=95)
examples:
gocr -m 4 text1.pbm # do layout analyzis
gocr -m 130 -p ./database/ text1.pbm # extend database
djpeg -pnm -gray text.jpg | gocr - # use jpeg-file via pipe
Not sure about output file formats. Have to have a think.
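
For the original JPEG-to-ASCII question, a minimal sketch that chains the
last example in the help text with the -f and -o options listed there might
look like the following. It assumes djpeg and gocr are both on the PATH, and
jpeg_to_ascii is only a name made up for illustration, not something gocr
provides.

```shell
#!/bin/sh
# Sketch only: convert a JPEG to plain text with djpeg + gocr.
# jpeg_to_ascii is a hypothetical helper name, not part of gocr.
jpeg_to_ascii() {
    in=$1     # path to the JPEG to read
    out=$2    # path of the text file to write

    # Bail out early if the input file is missing or unreadable.
    if [ ! -r "$in" ]; then
        echo "cannot read input: $in" >&2
        return 2
    fi

    # Both tools are assumed to be installed; check rather than guess.
    for tool in djpeg gocr; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing tool: $tool" >&2
            return 3
        fi
    done

    # djpeg -pnm -gray : decode the JPEG to greyscale PNM on stdout
    #                    (same invocation as in gocr's own help example)
    # gocr -f ASCII    : ask for ASCII output (one of the listed formats)
    # gocr -o FILE -   : write to FILE, read the image from stdin
    djpeg -pnm -gray "$in" | gocr -f ASCII -o "$out" -
}
```

Calling it as "jpeg_to_ascii text.jpg text.txt" should leave the recognised
text in text.txt. Swapping ASCII for UTF8 or HTML would be the obvious way
to try the other output formats.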
--
Richard
_______________________________________________
Sheffield Linux User's Group
http://www.sheflug.org.uk/mailfaq.html
GNU - The choice of a complete generation