Discussion:
[sane-devel] help with improving text scans
gobo
2008-12-19 16:34:09 UTC
for some time now i've been using homemade scripts with scanimage and
scanadf to scan my paper documents. most of my documents are plain
text. the results have always been poor, marginally acceptable at best. i'm
using suse 10.3 and an hp aio j6450 or psc1210xi.

recently i obtained a canon scanner w/adf for use at work where i must
use windows. to get around the image compatibility issues of microsoft
document imaging (office 2003) i simply print the scanned image to pdf
with acrobat. the results obtained with mdi are far superior to
anything i've ever been able to achieve with sane apps.

i've spent hours fumbling around with scanimage options, imagemagick
convert to resize the images and ps2pdf to produce the pdf files.
while i have made some slight improvements over the default settings,
i've never been able to get even close to the mdi output. in the few
places where i must have a good scan, i use resolutions of 150 or 300,
but getting prints of the image becomes a real pain: i must load the
image in gimp, fiddle around resizing it, and then print it.


my standard scanimage script would contain:
scanimage -x 215.9 -y 297 \
  -d 'hpaio:/net/Officejet_J6400_series?ip=192.168.1.103' \
  -pv --mode gray > "$FILE"


pieces from a perl script using the adf:
# this is the scan device
my $scanr = "hpaio:/net/Officejet_J6400_series?ip=192.168.1.103";
# these are the command line options for scanadf
my @opts = qw(-x 215.9 -y 297 -v --mode=gray --source ADF --batch-scan=no -e 1);

# scan page; the list form of system() bypasses the shell, so the '?'
# in the device name is passed through untouched
system("scanadf", @opts, "-d", $scanr, "-o", $fnamepg) == 0
    or die "scanadf failed: $?";

adding --resolution=150 or 300 does produce a larger image with fewer
artifacts that is much more readable, but difficult to print.

the answer must be one of two things -- either i'm missing something
real simple about producing hi-res 8.5x11" images (that is right in
front of my nose) or we are just not there yet with linux scanning.

can someone correct me, or put me on a better path?

thanks.
m. allan noah
2008-12-19 16:51:35 UTC
The gimp probably defaults images to something like 72dpi. if you have
scanned at a higher dpi, it won't know, and will print it huge. just
change the print dpi in the gimp to match that at which you scanned.
look in the Image -> Print Size menu option

allan
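
The size mismatch described above is plain arithmetic: printed length is the pixel count divided by the print dpi. A minimal sketch, assuming a letter-width scan at 150 dpi (1275 pixels wide); the function name print_size is made up for illustration, and the 72 dpi figure is only the guess from the message above:

```shell
#!/bin/sh
# print_size PIXELS DPI -> printed length in inches, two decimals.
# (hypothetical helper, not part of SANE or the GIMP)
print_size() {
    LC_ALL=C awk -v px="$1" -v dpi="$2" 'BEGIN { printf "%.2f\n", px / dpi }'
}

# 8.5 in scanned at 150 dpi gives 1275 pixels across.
print_size 1275 150   # printed at the scan dpi: original width
print_size 1275 72    # printed at an assumed 72 dpi: more than twice as wide
```

So the scan itself is fine; the program doing the printing only needs to be told the dpi the image was scanned at.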
--
http://lists.alioth.debian.org/mailman/listinfo/sane-devel
Unsubscribe: Send mail with subject "unsubscribe your_password"
--
"The truth is an offense, but not a sin"
gobo
2008-12-19 17:58:16 UTC
the default for scanimage is 75dpi, and that is barely readable. i'll
certainly give your gimp setting change a try.

but, can i also do this in the shell? can i take an image scanned
with --resolution=150, funnel it through another utility (or two) and
end up with a pdf containing 8.5x11" hi-res pages?

what i do now when i need a print from a scanned image:
convert filename.pgm filename.ps
ps2pdf filename.ps

with anything larger than 75dpi, the pdf is junk. i can add the
-resize option to convert, and that helps, but not much. the document
is still hard to read.
m. allan noah
2008-12-19 18:27:45 UTC
Post by gobo
the default for scanimage is 75dpi, and that is barely readable. i'll
certainly give your gimp setting change a try.
but, can i also do this in the shell? can i take an image scanned
with --resolution=150, funnel it through another utility (or two) and
end up with a pdf containing 8.5x11" hi-res pages?
you are confusing the dpi of the scan and the dpi of the print. If you
scan at 150, you should print at 150, if you want the size to match.
Since the scans as they come from sane likely won't have the dpi
embedded, you will have to tell one of the programs in your chain what
the original scan dpi was. try the -density flag to convert.

allan
David Poole
2008-12-19 18:36:54 UTC
Post by m. allan noah
try the -density flag to convert.
-density 300 -units PixelsPerInch

is what I use when scanning to TIF or PNG. (Substitute the appropriate scan DPI for '300'.)
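
Applied to the convert / ps2pdf steps mentioned earlier in the thread, those flags might be used roughly as follows. This is a sketch only, not tested against the scanners discussed here: the helper names density_opts and pgm_to_pdf and the file names are assumptions, and ps2pdf may still need a paper size option such as -sPAPERSIZE=letter to place the page correctly.

```shell
#!/bin/sh
# Hypothetical helper: prints the ImageMagick options that record the scan dpi.
density_opts() {
    printf '%s %s %s %s\n' -density "$1" -units PixelsPerInch
}

# Hypothetical helper: scan.pgm -> scan.ps -> scan.pdf, keeping the scan dpi.
# Usage: pgm_to_pdf scan.pgm 150
pgm_to_pdf() {
    base=${1%.pgm}
    # $(density_opts ...) is left unquoted on purpose so it splits into words.
    convert $(density_opts "$2") "$1" "$base.ps" &&
        ps2pdf "$base.ps" "$base.pdf"
}

# Show the options that would be passed for a 150 dpi scan.
density_opts 150
```

The dpi passed to pgm_to_pdf has to be the same value that was given to scanimage with --resolution.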

Jeremy Johnson
2008-12-19 18:36:27 UTC
I use 2 bash scripts for document scanning, bscan and scans2pdf,
located at http://www.acjlaw.net:8080/~jeremy/Ricoh/scripts/
The scripts are based on simpler versions I found on the net (I forget where)

The bscan (batch scan) script acquires pnm images from the scanner using
scanimage and then processes those images into a multipage pdf using pnmtools.

The scans2pdf script takes sequential pnm images from xsane
(e.g. file.%04d.pnm) and converts them into a multipage pdf.

The processing logic in scans2pdf is exactly the same as in bscan.
I never got around to replacing the processing logic in bscan
with a call to scans2pdf. It's mainly just a matter of repackaging
the arguments to bscan to work with scans2pdf: e.g. the option
"-gray nshades" both enables grayscale scanning and sets the number
of gray shades to keep in the final processed pdf.


To facilitate one-key scanning it's convenient to define some aliases:
alias B='bscan -gray 2'
alias BL='bscan -gray 2 -page Legal'
alias CL='bscan -color 32 -page Legal'
alias b='bscan -s 0'
alias bl='bscan -s 0 -page Legal'
alias c='bscan -color 32'

Thus to scan a letter-sized document in grayscale, and then convert to
black+white using adaptive/dynamic thresholding/binarization
I would simply use the command "B -bw filename" which will create
filename.pdf
To scan legal sheets in lineart mode: "bl filename" or in color "CL filename"
I have here a 13-page legal -sized document which was scanned in
grayscale and converted to b/w. It is 749K or 57K/pg which is reasonable.
I could have scanned in b/w but it would not have saved all that much space.

The bscan program accepts many options for
changing the default behavior:

SCANNER OPTIONS:
-d "device name" eg. HS2P or SP15C
-source ADF= Y | N
-page legal | letter
-color number_of_colors (enables color scanning & set max # colors)
-gray number_of_gray_shades
-res resolution
-duplex enables duplex
-s (user settings defaults)

PROCESSING OPTIONS
-bw (convert to black+white using adaptive thresholding)
-dither (eg. atkinson, see pamdither)
-color (remap colorspace to number_of_colors)
-gray (downsample to nshades of gray)
-flip r180 (rotates 180 degrees)

OUTPUT OPTION:
-pnm (don't convert to pdf)


Some documents which don't have enough contrast to still be readable
after conversion to b/w are simply scanned in gray or color mode:
"B filename" or "c filename"

The large filename.pdf can then be reduced in size by conversion to djvu:
pdf2djvu filename.pdf -o filename.djv
djview4 filename.djv -> print to ps
ps2pdf14 filename.ps filename.pdf (now much smaller ~1/50 original size)

Some documents may need user-interaction to set cropping,
brightness/contrast/gamma, etc. using xsane.
The scans (file.0001.pnm, file.0002.pnm, ...) can then be converted to pdf:
"scans2pdf -bw file" which will convert all the file*.pnm to a single
multipage file.pdf containing b/w images.
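
The sequential naming used by xsane (file.0001.pnm, file.0002.pnm, ...) is easy to walk in order from the shell. The following is not the scans2pdf script, only a minimal sketch of the file-collection step; page_name and list_pages are hypothetical helpers, and the base name "file" is just the example from the text above.

```shell
#!/bin/sh
# page_name BASE N -> BASE.NNNN.pnm, matching the file.%04d.pnm pattern.
page_name() {
    printf '%s.%04d.pnm\n' "$1" "$2"
}

# list_pages BASE -> existing pages in order, stopping at the first gap.
list_pages() {
    n=1
    while [ -f "$(page_name "$1" "$n")" ]; do
        page_name "$1" "$n"
        n=$((n + 1))
    done
}

# Print the name of the third page for the base name "file".
page_name file 3
```

The ordered list from list_pages can then be handed to whatever tool builds the multipage pdf.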


It should be straightforward to modify this script to recognize your scanners'
options and device names.

There is also a promising gui-program gscan2pdf on sourceforge:
http://gscan2pdf.sourceforge.net/
There was a bug in the program which would not let me change my SP15C
scanner's options. I submitted a bug report to the author, but he hadn't been
able to fix/work around the problem. But the program may work for you.
Jeffrey Ratcliffe
2008-12-19 20:13:39 UTC
Post by Jeremy Johnson
http://gscan2pdf.sourceforge.net/
There was a bug in the program which would not let me change my SP15C
scanner's options. I submitted a bug report to the author, but he hadn't
been able to fix/work around the problem. But the program may work for you.
Which bug was this? I can't find the one to which you are referring.

Regards

Jeff
m. allan noah
2008-12-20 00:30:13 UTC
I can't speak for Jeremy, but there is the general bug that gscan2pdf
only shows options that it knows, which hides a lot of nice
compression, or image processing, or imprinter options provided by
office scanners.

allan

Jeffrey Ratcliffe
2008-12-20 15:13:43 UTC
Post by m. allan noah
I can't speak for Jeremy, but there is the general bug that gscan2pdf
only shows options that it knows, which hides alot of nice
compression, or image processing, or imprinter options provided by
office scanners.
Nobody has ever reported this "general bug" to me. I have implemented
any requests for specific options I have received.

Having written Perl bindings for SANE, I will be able to do a more
general interface, but please file the bug on Sourceforge or against
the Debian package in the mean time.

Perhaps you have some ideas on what to do for those scanners which
offer so many options that the option dialog window is so tall that it
must be scrolled - maybe some way of showing/hiding certain options
that you don't use.

Regards

Jeff
m. allan noah
2008-12-20 15:40:27 UTC
On Sat, Dec 20, 2008 at 10:13 AM, Jeffrey Ratcliffe
Post by Jeffrey Ratcliffe
Post by m. allan noah
I can't speak for Jeremy, but there is the general bug that gscan2pdf
only shows options that it knows, which hides alot of nice
compression, or image processing, or imprinter options provided by
office scanners.
Nobody has ever reported this "general bug" to me. I have implemented
any requests for specific options I have received.
And that is the problem: a user that only uses gscan2pdf won't know
the other options are there, and so won't report them missing.
Post by Jeffrey Ratcliffe
Having written Perl bindings for SANE, I will be able to do a more
general interface, but please file the bug on Sourceforge or against
the Debian package in the mean time.
Yes- your Perl bindings are most useful, hopefully you will find the
time to convert the app to use them :)
Post by Jeffrey Ratcliffe
Perhaps you have some ideas on what to do for those scanners which
offer so many options that the option dialog window is so tall that it
must be scrolled - maybe some way of showing/hiding certain options
that you don't use.
Hopefully the backend will break up the options into reasonably sized
chunks with SANE_TYPE_GROUP options, and you could add a tab for each
group. Also, some options will have SANE_CAP_ADVANCED set, and you can
hide those until a user requests to see advanced options.

allan
Jeffrey Ratcliffe
2008-12-20 16:05:49 UTC
Post by Jeffrey Ratcliffe
Having written Perl bindings for SANE, I will be able to do a more
general interface, but please file the bug on Sourceforge or against
the Debian package in the mean time.
Now I remember the other problem with a general interface -
internationalisation. scanimage, for instance, doesn't seem to have
been translated:

LANGUAGE=de scanimage --help

produces English output.

If I look in saneopts.h, the macro SANE_I18N is used for all the
titles, but how do I get at the translations?

The option titles for the options that are currently supported in
gscan2pdf are translated in gscan2pdf (currently 26 languages have
been started).

Regards

Jeff
abel deuring
2008-12-20 17:57:22 UTC
Post by Jeffrey Ratcliffe
Post by Jeffrey Ratcliffe
Having written Perl bindings for SANE, I will be able to do a more
general interface, but please file the bug on Sourceforge or against
the Debian package in the mean time.
Now I remember the other problem with a general interface -
internationalisation. scanimage, for instance, doesn't seem to have
been translated:
LANGUAGE=de scanimage --help
produces English output.
If I look in saneopts.h, the macro SANE_I18N is used for all the
titles, but how do I get at the translations?
Just open sane's .mo file with gettext :) (OK, I'm more involved with
the "competition" -- Python --, so I don't know if a gettext
implementation for Perl exists, but I would be really surprised if it
doesn't.)

Abel
Jeffrey Ratcliffe
2008-12-20 22:15:38 UTC
Post by abel deuring
Just open sane's .mo file with gettext :) (OK, I'm more involved with
the "competition" -- Python --, so I don't know if a gettext
implementation for Perl exists, but I would be really suprised if it
doesn't.)
Locale::gettext is the Perl module, which gscan2pdf uses for its own
localisation.

#!/usr/bin/perl
use warnings;
use strict;
use Locale::gettext;
use POSIX; # Needed for setlocale()
setlocale(LC_MESSAGES, "");
my $d = Locale::gettext->domain("sane-backends");
print $d->get("Preview"), "\n";

works.
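
The same lookup can probably be done from the shell with the gettext command-line tool, which takes a text domain and a message id. A minimal sketch, assuming the sane-backends catalog is installed; the wrapper name translate is made up, and it falls back to the untranslated string when gettext is missing. Note that GNU gettext ignores LANGUAGE when the locale is C or POSIX, so a real locale (for example LANG=de_DE.UTF-8) has to be set as well.

```shell
#!/bin/sh
# translate MSGID: look MSGID up in the sane-backends message catalog.
# (hypothetical wrapper around the gettext command-line tool)
translate() {
    if command -v gettext >/dev/null 2>&1; then
        gettext sane-backends "$1"
    else
        # no gettext available: return the English string unchanged
        printf '%s' "$1"
    fi
}

# gettext prints no trailing newline, so add one here.
translate "Preview"
echo
```

When no translation is found for the current locale, gettext simply prints the English message id back.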

What I really meant is that scanimage doesn't seem to use SANE's
translations, which makes it difficult to provide a consistent
interface for the different frontends scanimage/scanadf/libsane-perl.

But you are right - I can get the English strings from scanimage, and
feed them through gettext. Unfortunately scanimage --help only gives
the option titles for the groups.
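
For reference, pulling the group titles out of the help text could look something like this. It is a sketch under assumptions: the sample text below only imitates the usual layout of scanimage --help (group titles indented by two spaces and ending in a colon, options indented by four), real backends may format things differently, and sample_help and group_options are invented names.

```shell
#!/bin/sh
# Illustrative sample only, imitating the layout of "scanimage --help".
sample_help() {
    cat <<'EOF'
Options specific to device `test:0':
  Scan Mode:
    --mode Gray|Color [Gray]
        Selects the scan mode.
    --resolution 75..600dpi [75]
        Sets the resolution of the scanned image.
  Geometry:
    -x 0..215.9mm [215.9]
        Width of scan-area.
EOF
}

# group_options: read help text on stdin, print "group<TAB>option" pairs.
group_options() {
    awk '
        /^  [^ -].*:$/ { group = $0; sub(/^ +/, "", group); sub(/:$/, "", group); next }
        /^    -/       { print group "\t" $1 }
    '
}

sample_help | group_options
```

The group titles in the first column are the strings that can then be fed through gettext.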

Can the backends be induced to give the option titles in some debug mode?

Regards

Jeff
m. allan noah
2008-12-21 02:40:41 UTC
On Sat, Dec 20, 2008 at 5:15 PM, Jeffrey Ratcliffe
Post by Jeffrey Ratcliffe
What I really meant is that scanimage doesn't seem to use SANE's
translations, which makes it difficult to provide a consistent
interface for the different frontends scanimage/scanadf/libsane-perl.
actually, your life would be harder if scanimage DID use translations,
cause you would have to reverse the translation to figure out the
meaning. ;)
Post by Jeffrey Ratcliffe
But you are right - I can get the English strings from scanimage, and
feed them through gettext. Unfortunately scanimage --help only gives
the option titles for the groups.
Can the backends be induced to give the option titles in some debug mode?
IIRC, the groups are not required to have a description?

allan
abel deuring
2008-12-21 12:14:22 UTC
Post by Jeffrey Ratcliffe
What I really meant is that scanimage doesn't seem to use SANE's
translations, which makes it difficult to provide a consistent
interface for the different frontends scanimage/scanadf/libsane-perl.
Ah, I missed that point.
Post by Jeffrey Ratcliffe
But you are right - I can get the English strings from scanimage, and
feed them through gettext. Unfortunately scanimage --help only gives
the option titles for the groups.
Can the backends be induced to give the option titles in some debug mode?
Well, since you have now libsane-perl -- do you really need to call
scanimage or scanadf from gscan2pdf anymore ;) ?

Abel
Jeffrey Ratcliffe
2008-12-21 20:50:06 UTC
Post by abel deuring
Well, since you have now libsane-perl -- do you really need to call
scanimage or scanadf from gscan2pdf anymore ;) ?
It is nice to have them to fall back on - there was one point a couple
of Ubuntu releases ago where Xsane, etc. had not quite kept up with
libsane, and so the only scanning programs that worked with one
particular set of scanners were scanimage, and therefore gscan2pdf.

But if it were possible to do everything with scanimage, then I
wouldn't have written libsane-perl...

Regards

Jeff