<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>http://www.codesrc.com/mediawiki/index.php?action=history&amp;feed=atom&amp;title=ScanningToPDF</id>
	<title>ScanningToPDF - Revision history</title>
	<link rel="self" type="application/atom+xml" href="http://www.codesrc.com/mediawiki/index.php?action=history&amp;feed=atom&amp;title=ScanningToPDF"/>
	<link rel="alternate" type="text/html" href="http://www.codesrc.com/mediawiki/index.php?title=ScanningToPDF&amp;action=history"/>
	<updated>2026-05-02T04:29:57Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.42.7</generator>
	<entry>
		<id>http://www.codesrc.com/mediawiki/index.php?title=ScanningToPDF&amp;diff=179&amp;oldid=prev</id>
		<title>Michael: Created page with &quot;HowTo improve the quality of scanned documents.  == Step 0: Generate PNM files == If you already have a set of PNM files from [http://www.sane-project.org/man/scanadf.1.html scan…&quot;</title>
		<link rel="alternate" type="text/html" href="http://www.codesrc.com/mediawiki/index.php?title=ScanningToPDF&amp;diff=179&amp;oldid=prev"/>
		<updated>2013-03-12T11:48:23Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;HowTo improve the quality of scanned documents.  == Step 0: Generate PNM files == If you already have a set of PNM files from [http://www.sane-project.org/man/scanadf.1.html scan…&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;How to improve the quality of scanned documents.&lt;br /&gt;
&lt;br /&gt;
== Step 0: Generate PNM files ==&lt;br /&gt;
If you already have a set of PNM files from [http://www.sane-project.org/man/scanadf.1.html scanadf(1)] you can skip this step.&lt;br /&gt;
&lt;br /&gt;
If you have a PDF document, split it into one PNM file per page, rendered at 300 DPI, with:&lt;br /&gt;
 pdftoppm -r 300 scanned.pdf pages&lt;br /&gt;
&lt;br /&gt;
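pdftoppm appends the page number to the output prefix, so the input pattern given to unpaper in the next step has to match those names.  As a rough sketch, assuming a recent poppler pdftoppm that zero-pads the page number to the width of the total page count (an assumption, not something stated above), the hypothetical helper page_pattern prints a matching pattern:&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper, not part of pdftoppm or unpaper.
# Assumption: pdftoppm zero-pads page numbers to the number of digits
# in the total page count (e.g. pages-01.ppm for a 12-page document).
page_pattern() {
    prefix=$1                # output prefix given to pdftoppm, e.g. "pages"
    count=$2                 # total number of pages in the PDF
    width=${#count}          # number of digits in the page count
    # %% emits a literal percent sign, %d inserts the digit width
    printf '%s-%%0%dd.ppm\n' "$prefix" "$width"
}

page_pattern pages 12
```
&lt;br /&gt;
For a 12-page document this prints pages-%02d.ppm.  Check the actual file names in the directory before relying on it.&lt;br /&gt;
&lt;br /&gt;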
== Step 1: Deskew the pages ==&lt;br /&gt;
Run [http://www.flameeyes.eu/projects/unpaper unpaper] to correct page rotation and remove noise at the edges of each page.  It&amp;#039;s amazing how much each page is rotated when fed through an automatic document feeder (ADF).&lt;br /&gt;
&lt;br /&gt;
 unpaper -v -ni 2 -ms 100,100 -dn bottom,right --no-border-align --no-grayfilter pages-%02d.ppm fixed-%03d.ppm&lt;br /&gt;
&lt;br /&gt;
Options:&lt;br /&gt;
* -ni 2: Dot your I&amp;#039;s.  The default &amp;quot;noise&amp;quot; filter intensity of 4 lonely pixels may remove the dots from the &amp;#039;i&amp;#039; characters; lowering it to 2 keeps them.&lt;br /&gt;
* -ms 100,100: Try not to truncate headers and footers where text extends beyond the rest of the body text.&lt;br /&gt;
* -dn bottom,right: Detect skew angles by scanning from the right-hand side and the bottom.  Which sides to use depends on the scanned content: choose the side(s) where the content starts at a clearly defined edge, such as a text column.&lt;br /&gt;
* --no-border-align: Don&amp;#039;t reposition content to the centre of the image.  Without this option, pages with content only at the top of the page (e.g. a single paragraph) can have that content moved to the centre.&lt;br /&gt;
* --no-grayfilter: Don&amp;#039;t remove large blocks of gray, such as those used for table headings.&lt;br /&gt;
&lt;br /&gt;
== Step 2: Recombine images into PDF ==&lt;br /&gt;
The [http://www.imagemagick.org/script/convert.php ImageMagick convert] utility can be used to re-create a PDF document from a set of images.&lt;br /&gt;
&lt;br /&gt;
 convert fixed-0*.ppm -colorspace Gray -unsharp 0.5x0.5+0.5+0.008 -page A4 -quality 100 &amp;quot;Output.pdf&amp;quot;&lt;/div&gt;</summary>
		<author><name>Michael</name></author>
	</entry>
</feed>