DIY High-Speed Book Scanner from Trash and Cheap Cameras by daniel_reetz

Step 78: Download Page Builder

Aaron Clarke wrote the software to process the output of this book scanning system. It reads in all the images, allows you to set a crop, corrects for irregular lighting, and outputs PDF.

Currently, this is alpha software. It makes a number of assumptions, and it requires a powerful machine to work. You will be best off with at least 2 GB of RAM and up to 10 GB of free hard disk space. At some point, this will change, but likely not very soon.

While it is very easy to tell the software what to do, it takes a while to process so much image data. Page Builder may take more than 3 hours to process a 300-page book. Currently, we have to split a book into a couple of smaller PDFs because of the way we make PDFs from Matlab. If anyone has Matlab code for good PDF printing, please contact us.

Download Page Builder for XP here.
Download Page Builder for Vista here.
Mac users will need a copy of Matlab, as we can't get a standalone version to work. The XP and Vista copies both include the source script, which was developed on a Mac and works fine.

UPDATE 2009/04/20: If you are having page order issues, please try this version, which also includes some imaging enhancements.

Only the XP version has been extensively tested.

Page Builder is Free Software. The sources are available. We are graduate students and have extremely limited time to support this software. So little time that we actually have no time. It is our hope that other people will help shoulder some of the development costs of this software.

 
you1 says: May 24, 2009. 12:20 PM
Unfortunately, the software requires MATLAB, which is not free.
Consider the following alternative, until we get a free solution:
1) Using XnView to crop, resize, adjust brightness/contrast as a batch job. Download XnView from http://www.xnview.com/en/screenshots.html
As I was searching for a free solution, I found XnView. I really liked its simplicity and speed. After the initial learning curve, I was able to create a batch process under the Tools menu. At times, I found it even more convenient than a Photoshop automated script (I'm not a graphics guru).
2) Rename the files for the left and the right camera, and merge them into one folder.
I first used a spreadsheet to aid in renaming the files; later, I created a script for this process. See my instructions here: http://www.mind2b.com/component/content/article/9-info/8-renaming-or-renumbering-camera-or-image-files.
3) I used Adobe Acrobat to create my PDF.
Perhaps someone else can suggest a good free alternative.
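
The renaming and merging in step 2 can be sketched as a short Python script. This is only a minimal sketch, not the script linked above: the function name `interleave_scans`, the `page_0001.jpg` naming scheme, and the assumption that the left camera shot the first page of every spread are all hypothetical choices made for illustration.

```python
import shutil
from pathlib import Path

JPEG_SUFFIXES = {".jpg", ".jpeg"}


def list_jpegs(folder):
    """Return the JPEG files in a folder, sorted by file name."""
    return sorted(
        p for p in Path(folder).iterdir() if p.suffix.lower() in JPEG_SUFFIXES
    )


def interleave_scans(left_dir, right_dir, out_dir, start_page=1):
    """Copy left/right camera images into out_dir as alternating pages.

    Assumption: file names in each folder sort into shooting order, and the
    left camera captured the first page of every spread.
    """
    left = list_jpegs(left_dir)
    right = list_jpegs(right_dir)
    if len(left) != len(right):
        raise ValueError(f"{len(left)} left images but {len(right)} right images")

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    written = []
    page = start_page
    for left_img, right_img in zip(left, right):
        for src in (left_img, right_img):
            dest = out / f"page_{page:04d}.jpg"
            shutil.copy2(src, dest)  # copy, so the camera originals stay untouched
            written.append(dest)
            page += 1
    return written
```

Calling `interleave_scans("left", "right", "book")` would leave the originals in place and fill `book` with `page_0001.jpg`, `page_0002.jpg`, and so on. If your right camera shoots the first page of each spread, swap the two folder arguments.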

rjwarpath says: Oct 1, 2009. 8:20 PM
A good Free PDF printer program is Primo-PDF.
daniel_reetz (author) says: May 24, 2009. 12:33 PM
Are you on a Mac? Because on PC, you don't need Matlab, and we may be able to get it compiled for Mac, too. I hope to get the new version out by Monday.
you1 says: May 24, 2009. 1:41 PM
I had trouble running the software (on XP); the README stated that it requires MATLAB! I will retry the software on XP and report my findings. Perhaps I made a mistake.
daniel_reetz (author) says: May 24, 2009. 1:53 PM
The EXE supplied should work on XP without Matlab. Keep in mind that there is a Vista version and an XP version. The XP version is here:

http://danreetz.com/book_scanner/PgBldr2_XP.zip
you1 says: May 24, 2009. 9:24 PM
When I run PgBldr2.exe on XP with Service Pack 3, I get the following error dialog:

Title: PgBldr4.exe - Unable To Locate Component
Message: This application has failed to start because mclmcrrt77.dll was not found. Re-installing the application may fix this problem

The readme.txt has the following to say:

...Ensure that the MATLAB Component
Runtime (MCR) is installed on target machines, and ensure you
have installed the correct version...

daniel_reetz (author) says: May 24, 2009. 10:13 PM
Thanks a lot for the detailed error message. I'll see what I can figure out when Aaron and I get together tomorrow.
daniel_reetz (author) says: May 24, 2009. 1:54 PM
Also, we should have a version by tomorrow that outputs nice JPGs instead of PDF... it will be a lot faster and will work well with Acrobat and ABBYY.
moddi says: May 27, 2009. 8:04 AM
I have finished building the bookscanner (thanks!), but also get the mclmcrrt77.dll error message and can't use page builder. Which is disappointing!
daniel_reetz (author) says: May 27, 2009. 8:32 AM
PS. Hey, please post some pictures of your scanner!
daniel_reetz (author) says: May 27, 2009. 8:10 AM
Hey moddi, we are working on this. Sorry for the delay. I'll get a new version together ASAP... should be this afternoon.
daniel_reetz (author) says: May 27, 2009. 8:21 AM
It turns out that we need to include the Matlab Component Runtime with our compiled Matlab code. I am uploading the Matlab Component Runtime to my webserver now. I will also be uploading a new version of PageBuilder that can output JPG (which is about a zillion times faster than our current PDF method).
daniel_reetz (author) says: May 27, 2009. 8:31 AM
Here's the new version of PageBuilder. I'll update the Instructable itself later.

http://danreetz.com/book_scanner/PgBldr2_JPEG_XP.zip

This version gives you the option of JPG or PDF output. I strongly recommend using JPG output for the time being, and putting those JPGs together into a book using another application.

You'll need the Matlab Component Runtime to make it work. Unfortunately, it's 175 MB, but I have no control over that. I'm uploading it here:

http://danreetz.com/book_scanner/MCRInstaller.exe

It will be finished uploading in about 20 minutes.
daniel_reetz (author) says: May 27, 2009. 8:51 AM
It's finished uploading. Please download the MCR and install and let me know if it works for you. I'm very sorry for these problems. My systems all have the Student version of Matlab on them, so of course everything works perfectly over here.
moddi says: May 27, 2009. 11:29 AM
Thanks! The new combo of the Matlab Runtime and the JPG output version worked fine for me. I tried it on a few sample pages. Now I have to figure out how to do the photos better. It is not clear to me if I should be zooming in on the pages so they fill the screen (and how to get both sides the same size), and how the focus should be operating. I'll try to send some photos of my book scanner tomorrow. I used drawer glides for the bookholder and for attaching the platen to the main post. Also, I have a handheld camera switch on a cord; it is made out of a glue stick container.
daniel_reetz (author) says: May 27, 2009. 12:08 PM
*wipes sweat from forehead* Great! I can't help you with your cameras until you tell me which ones you have. Most of the modern Powershots have discrete zoom -- by this, I mean that if you tap the zoom button once, the zoom advances a set amount. That makes it easy to get the zoom the same for all images: just tap the zoom button on the left and right cameras the same number of times. You want to get the image as large as possible on the screen, because that gives you more resolution. It pays to give yourself a little slop in both directions, though. Focus should operate as normal unless you go into manual mode. The bkrpr people posted great instructions on manual mode in this thread.
moddi says: May 27, 2009. 2:06 PM
They are the Canon A590 and I'm not a camera person. The tapping sounds good, I hope it works on my cameras.

Re "manual mode", that is what you say to use in the instructables. I made all the settings as shown. But re "Focus should operate as normal unless you go into manual mode" -- I don't understand that, is there an alternative mode to use for this? Or maybe you can be in manual mode and choose either manual focus or auto focus?

A final question for now - when I push the switch, what should happen? Should the cameras take the photos, simultaneously, on that single click? Or should I have to click twice, once to focus and then again to shoot?

Thanks for your patience.
daniel_reetz (author) says: May 27, 2009. 3:13 PM
Hey, no problem. Manual mode on the camera doesn't necessarily mean that it will be focusing manually. Manual mode allows you to control the 3 important variables -- shutter speed, ISO, and aperture -- but it can still auto focus or manually focus. I think for your case the easiest thing to do is just to let the camera autofocus while in manual mode. The click can be a bit deceiving. Both things that you mention can happen. If you click really fast, both cameras may only attempt to focus, and then you have to press again to fire their shutters. The desired behavior happens every time if you kind of click-hold for just a fraction of a second. Both cameras should focus and fire nearly simultaneously.
you1 says: May 27, 2009. 9:49 AM
Success (the program launched). I'll give it a full test run this weekend. Thank you.
daniel_reetz (author) says: May 27, 2009. 10:05 AM
w00t! Please let me know if the new JPG output works for you. I know the PDF output is very, very slow... we're working on it. Thanks for the report.
spamsickle says: May 31, 2009. 11:54 AM
I'm using an open-source product called ImageMagick to do the conversions. I've written a Perl script which accepts information about my scans (the names and ranges of the left- and right-page scans, the offsets and sizes of the portions of the scan I want to save, the page number to start with for output, etc.) which generates the script to run ImageMagick. The script generated looks something like this:

convert.exe PICT2283.JPG -crop 2850x1760+200+120 -rotate 270 1.pdf
convert.exe CIMG0001.JPG -crop 2700x1850+200+180 -rotate 90 2.pdf
convert.exe PICT2284.JPG -crop 2850x1760+200+120 -rotate 270 3.pdf
convert.exe CIMG0002.JPG -crop 2700x1850+200+180 -rotate 90 4.pdf
convert.exe PICT2285.JPG -crop 2850x1760+200+120 -rotate 270 5.pdf
convert.exe CIMG0003.JPG -crop 2700x1850+200+180 -rotate 90 6.pdf
convert.exe PICT2286.JPG -crop 2850x1760+200+120 -rotate 270 7.pdf
convert.exe CIMG0004.JPG -crop 2700x1850+200+180 -rotate 90 8.pdf
convert.exe PICT2287.JPG -crop 2850x1760+200+120 -rotate 270 9.pdf
convert.exe CIMG0005.JPG -crop 2700x1850+200+180 -rotate 90 10.pdf

The names of the scans for the left and right pages are different in my setup because I'm using two different models of camera. While it COULD be done with two similar cameras, you'd either have to guarantee that the left range and the right range didn't overlap, or keep them in separate directories until they were converted.

I get the area to crop by loading a couple of images into Photoshop, but any image software that will tell you where your cursor is (in pixels) and the dimensions of your selection could be used. ImageMagick crops, rotates, and converts the JPG to PDF in less than 2 seconds per image. I can convert 100 images in about 3 minutes.

Once I have all the images converted to PDFs, I use another free tool called PDFTK to stitch them together into a book. Once again, to spare myself typing, I have a Perl script to generate the command line for me. It works for me.

I can convert a 1500-page book in less than an hour (once the scanning is done, and the images are loaded on my computer), and (after I get my numbers from Photoshop and generate the script) it runs unattended. I've found ImageMagick and PDFTK are handy tools to have, and (as you might be able to tell from my bare-bones bookscanner) I'm a fan of using what I already have.
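
The Perl generator itself isn't posted, so here is a rough Python sketch of the same idea, for anyone who wants a starting point. It is not spamsickle's script: the `Camera` class, the function names, and the `book.pdf` output name are hypothetical. The generated `convert.exe` lines follow the pattern quoted above, and the PDFTK line uses its `cat output` operation to join the single-page PDFs.

```python
from dataclasses import dataclass


@dataclass
class Camera:
    prefix: str   # file name prefix, e.g. "PICT"
    first: int    # number of the first scan in this camera's range
    digits: int   # zero-padded width of the number in the file name
    crop: str     # ImageMagick geometry, WIDTHxHEIGHT+X+Y
    rotate: int   # rotation in degrees, passed to -rotate


def convert_commands(first_cam, second_cam, spreads, start_page=1):
    """Return one ImageMagick command per page, alternating between cameras."""
    commands = []
    page = start_page
    for i in range(spreads):
        for cam in (first_cam, second_cam):
            src = f"{cam.prefix}{cam.first + i:0{cam.digits}d}.JPG"
            commands.append(
                f"convert.exe {src} -crop {cam.crop} -rotate {cam.rotate} {page}.pdf"
            )
            page += 1
    return commands


def pdftk_command(first_page, last_page, output="book.pdf"):
    """Return a PDFTK command line that joins N.pdf files in page order."""
    pages = " ".join(f"{n}.pdf" for n in range(first_page, last_page + 1))
    return f"pdftk {pages} cat output {output}"


if __name__ == "__main__":
    # Values taken from the example commands in the comment above.
    pict = Camera("PICT", 2283, 4, "2850x1760+200+120", 270)
    cimg = Camera("CIMG", 1, 4, "2700x1850+200+180", 90)
    for line in convert_commands(pict, cimg, spreads=5):
        print(line)
    print(pdftk_command(1, 10))
```

Redirecting the printed lines into a .bat file gives a script that can run unattended. For a very long book, the single PDFTK line may get too long for the Windows command prompt, so joining in batches could be necessary.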
daniel_reetz (author) says: Jun 2, 2009. 8:53 AM
I wonder if a good software solution might be just a modular set of programs glued together with a little scripting, something like Hugin. We could have one program that does photometric correction, like lens distortion and luminance problems, one program for deskew, and then one program for OCR. It seems doable. Can you share an example output image from your software? I can supply hosting if you need it.
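
To make the "modular programs glued together with a little scripting" idea concrete, here is a minimal Python sketch of the glue. Everything in it is hypothetical: the stage names are placeholders, and `tagging_stage` only renames a path to show how one stage's output feeds the next. A real stage would call an actual correction, deskew, or OCR program instead.

```python
from typing import Callable, Iterable

# A stage takes the path of its input file and returns the path of its output.
Stage = Callable[[str], str]


def make_pipeline(stages: Iterable[Stage]) -> Stage:
    """Chain stages so that each one receives the previous stage's output."""
    stage_list = list(stages)

    def run(path: str) -> str:
        for stage in stage_list:
            path = stage(path)
        return path

    return run


def tagging_stage(tag: str) -> Stage:
    """Placeholder stage: does no image work, only tags the file name."""

    def stage(path: str) -> str:
        stem, dot, ext = path.rpartition(".")
        if not dot:  # no extension present
            return f"{path}.{tag}"
        return f"{stem}.{tag}.{ext}"

    return stage


# Hypothetical stage order, following the comment above.
process_page = make_pipeline(
    [tagging_stage("photometric"), tagging_stage("deskew"), tagging_stage("ocr")]
)
```

Because every stage has the same shape (path in, path out), any one of them could be swapped for a better program without touching the others.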