Cleaning Up Letters From Scanned Pages

RW
Posted By
Richard_Wright
Jul 7, 2007
I am working on creating an electronic version of an old manual that has graphics and letters. I have copies of the book and am scanning these pages in.
I scan at 1200 dpi in grayscale and save as TIFF. The file is about 131 MB when finished. I convert this to JPG, which reduces it to about 350 KB, but the letters have jagged edges and just don’t look as good as I would like. I don’t want to spend a lot of time on each page because there are over 650.
Suggestions?
The documents are black and white only.
I use CS2

P
Phosphor
Jul 7, 2007
650 pages is a heavyweight project, and one where an upfront investment of time, money and effort would pay dividends on the back end.

To that end I might suggest you take a hard look at the Fujitsu ScanSnap (PCMag.com review) <http://www.pcmag.com/article2/0,1895,1992786,00.asp>.

Scan directly to PDF, with a document handler that can do upwards of 18 images-per-minute one side/30 i.p.m. duplex.

My brother uses one of these for his real estate business and loves it. Fast, accurate for OCR and well made.
RW
Richard_Wright
Jul 8, 2007
It’s a personal project, not one that I would sell and make any money on, or I would definitely look at your recommendations.
I’m trying to get the file size to 1 MB or less so I can fit the project onto a CD but I may need to save as .gif and put the project on a DVD instead.

Thanks for the suggestion.
RW
Richard_Wright
Jul 8, 2007
I should clarify that – I meant each page file size should be 1MB or less so it would fit on a CD. Approx 650 files would be less than a CD holds which is 700MB.
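The budget above can be checked with a few lines of arithmetic. This is only a rough sketch in Python: it assumes exactly 650 pages (the first post says "over 650", so the real margin is a little tighter) and ignores any overhead from the disc file system. The function names are made up for illustration.

```python
# Rough CD budget check using the figures quoted in this thread.
PAGES = 650            # assumed page count; the first post says "over 650"
CD_CAPACITY_MB = 700   # nominal CD capacity quoted above


def max_avg_page_mb(pages: int, capacity_mb: float) -> float:
    """Largest average page size (MB) that still fits on the disc."""
    return capacity_mb / pages


def total_mb(pages: int, page_mb: float) -> float:
    """Total size (MB) of the project at a given average page size."""
    return pages * page_mb


budget = max_avg_page_mb(PAGES, CD_CAPACITY_MB)
print(f"Average page budget: {budget:.2f} MB")
print(f"Total at 1 MB per page: {total_mb(PAGES, 1.0):.0f} MB")
print(f"Total at 0.35 MB per page: {total_mb(PAGES, 0.35):.1f} MB")
```

At 1 MB per page the project uses most of the disc, so any format that averages well under 1 MB per page leaves comfortable headroom.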
DM
Don_McCahill
Jul 8, 2007
the letters have jagged edges and just don’t look as good as I would like.

At what enlargement? An image scanned at 1200 dpi should look fine when printed at a normal resolution. It will look blocky at 100% in Photoshop, because at that zoom each image pixel takes up one screen pixel and a screen is roughly 72-100 ppi. Your 10 point letters will appear about 120 points high on a 100 ppi screen. If you look at a view where those letters again appear 10 points high, they should seem sharp.
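To put numbers on the enlargement, here is a small Python sketch. It assumes that a 100% view maps one image pixel to one screen pixel and that the screen is somewhere in the 72-100 ppi range mentioned above; the function names are illustrative only.

```python
def apparent_point_size(true_pt: float, scan_ppi: float, screen_ppi: float) -> float:
    """Apparent height (points) of type viewed at 100%, one image pixel per screen pixel."""
    return true_pt * scan_ppi / screen_ppi


def zoom_for_true_size(scan_ppi: float, screen_ppi: float) -> float:
    """Zoom percentage at which the scan appears at its physical size on screen."""
    return 100.0 * screen_ppi / scan_ppi


for screen_ppi in (72, 100):
    size = apparent_point_size(10, 1200, screen_ppi)
    zoom = zoom_for_true_size(1200, screen_ppi)
    print(f"{screen_ppi} ppi screen: 10 pt type looks like {size:.0f} pt at 100%; "
          f"true size is at about {zoom:.1f}% zoom")
```

So a 1200 dpi scan has to be viewed at well under 10% zoom before the type appears at its printed size.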
B
Bigguy
Jul 9, 2007
Richard_Wright wrote:
I am working on creating an electronic version of an old manual that has graphics and letters. I have copies of the book and am scanning these pages in.
I scan at 1200 dpi in grayscale and save as TIFF. The file is about 131 MB when finished. I convert this to JPG, which reduces it to about 350 KB, but the letters have jagged edges and just don’t look as good as I would like. I don’t want to spend a lot of time on each page because there are over 650.
Suggestions?
The documents are black and white only.
I use CS2
Use an OCR program to extract the text…
it can then be reformatted, edited etc. (if required) – also your file sizes will be dramatically smaller.

Guy
RW
Richard_Wright
Jul 9, 2007
I did some testing and it looks like converting to .jpg is what causes the poor quality. Other file types produce better looking images, but the files are quite large, so I will need to see if I can compress them further. Any recommendations for compressing .gif files?

Here is what I am doing, so please correct my process if it is not the best: Using CS2, I scan the image in grayscale at a resolution of about 10200 x 13000 pixels.

I then make my corrections, flatten the layers, and save as .tiff. The resultant files are approximately 131MB each.

I then use a program called SnagIt to convert to 1683 x 2175 (approx) .jpg and it reduces the 131MB file down to approx 500k.

Thanks for the suggestions!
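For reference, the sizes in this workflow can be worked out from the pixel dimensions alone. The sketch below is plain Python arithmetic; it assumes the 1200 ppi scan resolution from the first post and 8-bit grayscale (one byte per pixel), and the variable names are illustrative.

```python
# Pixel dimensions quoted above (approximate).
SCAN_W, SCAN_H = 10200, 13000   # grayscale scan
JPG_W, JPG_H = 1683, 2175       # SnagIt output
SCAN_PPI = 1200                 # assumed from the first post

# 8-bit grayscale stores one byte per pixel, so raw size equals pixel count.
raw_bytes = SCAN_W * SCAN_H
raw_mb = raw_bytes / 1_000_000

# Physical page width implied by the scan, and the resolution left in the JPG.
page_width_in = SCAN_W / SCAN_PPI
effective_ppi = JPG_W / page_width_in
linear_reduction = SCAN_W / JPG_W

print(f"Uncompressed scan: {raw_mb:.1f} MB")
print(f"Page width: {page_width_in:.1f} in")
print(f"JPG resolution: about {effective_ppi:.0f} ppi "
      f"({linear_reduction:.1f}x fewer pixels per inch)")
```

The uncompressed figure is close to the 131 MB TIFF size reported. The resized JPG keeps only about one sixth of the scan's linear resolution, so some of the roughness in the letters may come from the downsampling as well as from JPG compression.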
B
Bernie
Jul 9, 2007
My recommendation:

Scan at 1200 ppi (100%) in greyscale, adjust using curves and levels, then convert to bitmap mode (using 50% threshold), keeping the resolution the same. Save as LZW TIFF.
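To show what the 50% threshold step does to the pixel data, here is a minimal pure-Python sketch. It is not how Photoshop implements the conversion; it assumes 8-bit gray values, treats level 128 as the 50% cutoff, and uses a made-up eight-pixel sample. The function names are illustrative.

```python
def threshold_to_bitmap(gray_rows, threshold=128):
    """Map 8-bit gray values (0-255) to 1-bit values: 1 = white, 0 = black."""
    return [[1 if value >= threshold else 0 for value in row] for row in gray_rows]


def pack_row(bits):
    """Pack a row of bits into bytes, most significant bit first, zero-padded."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        chunk = bits[i:i + 8]
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | bit
        byte <<= 8 - len(chunk)   # pad a short final chunk on the right
        out.append(byte)
    return bytes(out)


# Hypothetical sample: two rows crossing the soft edge of a dark letter stroke.
sample = [
    [250, 240, 130, 40, 10, 35, 127, 245],
    [255, 200, 128, 60, 0, 90, 180, 255],
]

bitmap = threshold_to_bitmap(sample)
packed = [pack_row(row) for row in bitmap]

gray_bytes = sum(len(row) for row in sample)      # one byte per pixel
bitmap_bytes = sum(len(row) for row in packed)    # eight pixels per byte
print(bitmap)
print(f"{gray_bytes} bytes as grayscale, {bitmap_bytes} bytes as 1-bit")
```

Each pixel becomes pure black or pure white, and eight pixels fit in one byte, so the raw data is one eighth the size of the grayscale version before LZW compression is applied.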
RW
Richard_Wright
Jul 9, 2007
I’ll try that tonight thanks.

I don’t know what you mean by "curves" and I use "auto levels". What should I change these to?
JO
Jim_Oblak
Jul 9, 2007
Why not just scan as bitmap at 600 ppi? The manual is probably already screened, so scanning in grayscale and converting to bitmap seems like an extra step. You don’t need to worry so much about levels if you are grabbing the exact screen from the previous print.

….and why JPG? Wouldn’t a manual be best in Acrobat PDF where you could run Acrobat’s OCR to make the thing have searchable text?
RW
Richard_Wright
Jul 9, 2007
I will try this too. As you have probably deduced, I am not proficient in graphic editing so all tips and suggestions are appreciated. I will have to research the Acrobat OCR function tho. I don’t know what version we have.

I use jpg because of the file size. I have about 650 pages and was hoping to fit it all as a pdf onto a CD. If this is not going to work then I would consider a DVD.

I take the images and plug into MS Word then print as a pdf. It sure seems like a lot of steps too, but I do not know a better way to do it.

Thanks for your input.
JO
Jim_Oblak
Jul 9, 2007
The 8-bit grayscale JPG image may be larger than what you get with a 1-bit image in a PDF file.

Mixing Microsoft Word into this is certainly going to increase your file size for no reason. If you do not have Acrobat Pro, download the 30-day trial to do this (although most scanner software already has a scan to PDF function).
RW
Richard_Wright
Jul 9, 2007
The reason for using Word was so that I could have all of the images in one file and create one .pdf. However, I can combine several .pdf files into one, so that may be what I will need to do. I just don’t know if I can combine 650 .pdf files into one. I may need to create ‘chapter’ .pdf files.
You are right, my scanner can scan to pdf.
B
Bernie
Jul 9, 2007
The manual is probably already screened so scanning in grayscale and converting to bitmap seems like an extra step.

I’ve found scanning in greyscale and converting to bitmap afterwards gave me better control on small details in line illustrations. YMMV
TG
Tom Glowka
Jul 9, 2007
You can scan to PDF, combine files and use File/Reduce File Size in Acrobat Distiller to make a smaller file.
BD
Brett Dalton
Jul 10, 2007
Just a note: most graphics compression in PDFs IS JPG, so you’re just wrapping a JPG in another file type.

Have you considered using OCR software to extract the text to a REAL PDF? This would be massively smaller. More work, but the end result would be searchable etc.
RW
Richard_Wright
Jul 10, 2007
I didn’t know that… Thanks.

I have considered OCR but this document is a historical manual that I would like to preserve in its native format (font, design, graphics, etc.)

I have not ruled that out tho.

Thanks!
JO
Jim_Oblak
Jul 10, 2007
Just a note: most graphics compression in PDFs IS JPG, so you’re just wrapping a JPG in another file type.

Scanners that scan to 1-bit will be using a type of TIFF in the PDF. JPG format is typically for 8 and 24 bit images.

I have considered OCR but this document is a historical manual that I would like to preserve in its native format (font, design, graphics, etc.)

OCR in Acrobat allows different options. What you probably want is a visual copy of the original but with searchable (invisible) text on top. This is possible with Acrobat.

I just did a 148 page historic document like this and it came out to be 11 MB. Do the math and note that you could fit thousands of pages on a CD-ROM.
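The math suggested above can be sketched in a few lines of Python. This assumes the 650-page manual would compress about as well as the 148-page example, which is only a guess since the two documents differ; the variable names are illustrative.

```python
# Figures quoted above.
DOC_PAGES = 148      # pages in the example document
DOC_MB = 11          # size of the resulting PDF
CD_MB = 700          # nominal CD capacity quoted earlier in the thread
PROJECT_PAGES = 650  # assumed page count; the first post says "over 650"

mb_per_page = DOC_MB / DOC_PAGES

# Integer arithmetic avoids float rounding when counting whole pages.
pages_per_cd = CD_MB * DOC_PAGES // DOC_MB

project_mb = PROJECT_PAGES * mb_per_page

print(f"About {mb_per_page * 1000:.0f} KB per page")
print(f"About {pages_per_cd} pages per CD")
print(f"650 pages would be about {project_mb:.0f} MB")
```

At that rate the whole manual would take up well under a tenth of a CD.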
RW
Richard_Wright
Jul 10, 2007
You are correct, that’s exactly what I want.
I have Acrobat 5 but have not used it in a long time. Do you know if the feature you mentioned is included in that version?
JO
Jim_Oblak
Jul 10, 2007
I believe OCR is in there but I cannot tell you if the option to retain the original scan and add searchable text is possible.

It should be an easy find in the help files under ‘Capture’ and/or ‘OCR’.
