Recently, I bought a Chumby (aka Insignia Infocast) to display web pages, and it works great for that. But then I also wanted to be able to read pdf files (ebooks). The browser I am using on the Chumby does not really support pdf files, so there had to be a way around this issue. A web browser is good at displaying web pages, so why not convert my pdf files to html? You should be able to do this with any touchpad or internet viewer. We are in the process of converting our instructables from pdf to html. We are also converting our personal ebooks to html so that they can be read anywhere without the need for an expensive proprietary ebook reader (e.g. Nook, Kindle, etc.). An existing laptop, nettop, or desktop will do just fine. Enjoy!
Linux web server (once the files are converted, they should also be usable on Apple Mac or MS Windows servers)
Touchpad or equivalent with access to the web server.
Step 1: Software Setup on the Server.
Use your GUI (graphical user interface) package manager, or install from the command line:
$ sudo apt-get update
$ sudo apt-get install poppler-utils
That is all there is to installing the utilities.
Step 2: Simple Conversion.
You have the utility installed. Now, how do you use it?
On the server, from the command line, you would type:
$ pdftohtml filetoconvertfilename.pdf convertedtohtmlfilename.html
In my case, I made a pdf of an instructables page using pdfit (a Firefox add-on). So the command to convert the file was:
$ pdftohtml Pudding\ from\ Scratch.pdf pfs.html
You could probably put this in a script to convert lots of files at once, but for now we are keeping it simple.
Then you want to move the html file (pfs.html) to a directory (/var/www/htmlfiles/) that will be easily accessible from the web. You may also want to set up a menu system to access the files easily without having to remember all the file names. See: (https://www.instructables.com/id/Introduction-to-installing-web-apps/) for more info.
Now point your web browser to that directory and have fun!
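For converting lots of files at once, as mentioned above, here is a minimal sketch of what such a script might look like. The function names html_name and convert_all are made up for this example and are not part of poppler-utils. It assumes a POSIX shell and that pdftohtml is installed as in Step 1.

```shell
#!/bin/sh
# Sketch only: convert every pdf in one directory to html with pdftohtml.
# html_name and convert_all are hypothetical helper names, not poppler commands.

# Derive the output name: "Pudding from Scratch.pdf" -> "Pudding from Scratch.html"
html_name() {
    printf '%s\n' "${1%.pdf}.html"
}

# Convert each *.pdf found in directory $1, writing the results into directory $2.
convert_all() {
    src_dir=$1
    out_dir=$2
    mkdir -p "$out_dir" || return 1
    for pdf in "$src_dir"/*.pdf; do
        # If there are no pdf files, the pattern is left unexpanded; skip it.
        [ -e "$pdf" ] || continue
        out="$out_dir/$(html_name "$(basename "$pdf")")"
        # -q keeps pdftohtml quiet; report any file that fails and carry on.
        pdftohtml -q "$pdf" "$out" || echo "failed: $pdf" >&2
    done
}
```

If this were saved in a file and loaded with the shell's "." command, running convert_all . /var/www/htmlfiles should convert everything in the current directory. Writing to /var/www/htmlfiles may need sudo, depending on how the web server directory is set up.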
Step 3: Appendix: Poppler Utility Commands.
pdftohtml [options] [pdf file] [html file]
A summary of the options is included below.
-h, -help - Show summary of options.
-f - first page to print
-l - last page to print
-q - don’t print any messages or errors
-v - print copyright and version info
-p - exchange .pdf links with .html
-c - generate complex output
-i - ignore images
-noframes - generate no frames. Not supported in complex output mode.
-stdout - use standard output
-zoom - zoom the pdf document (default 1.5)
-xml - output for XML post-processing
-enc - output text encoding name
-opw - owner password (for encrypted files)
-upw - user password (for encrypted files)
-hidden - force hidden text extraction
-dev - output device name for Ghostscript (png16m, jpeg etc)
-nomerge - do not merge paragraphs
-nodrm - override document DRM settings
pdftohtml test.pdf test.html
This command gives you a simple HTML file suitable for reading or copying the textual content of the PDF file. You can actually grab the text from your browser and paste it into other applications. It doesn’t produce any PNG files, so you won’t be able to see any embedded graphics. It’s a great utility if you just want to extract the text from an Adobe file.
If you want to see graphics, you’ll need to use the -c (as in “complex”) option:
pdftohtml -c test.pdf test.html
This option produces individual HTML files, one for each page of the PDF file, with the PNG references mixed in. The graphics in the original PDF file show up in a browser and the text part can be cut and pasted. The total size of the HTML and PNG files generated with the -c option tends to be roughly equivalent to that of the original PDF.
Step 4: Extract Images From Pdf.
I thought it might be a good idea to add this feature to the instructable: extracting jpg files from an instructable.
$ pdfimages -j foo.pdf bar
Extract JPEG images from a PDF document
This will extract all DCT format images from foo.pdf and save them in JPEG format (option -j) to bar-000.jpg, bar-001.jpg, bar-002.jpg, etc.
pdfimages - Portable Document Format (PDF) image extractor (version
pdfimages [options] PDF-file image-root
Pdfimages saves images from a Portable Document Format (PDF) file as Portable Pixmap (PPM), Portable Bitmap (PBM), or JPEG files.
Pdfimages reads the PDF file PDF-file, scans one or more pages, and writes one PPM, PBM, or JPEG file for each image, image-root-nnn.xxx, where nnn is the image number and xxx is the image type (.ppm, .pbm, or .jpg).
Options:
-f number - Specifies the first page to scan.
-l number - Specifies the last page to scan.
-j - Normally, all images are written as PBM (for monochrome images) or PPM (for non-monochrome images) files. With this option, images in DCT format are saved as JPEG files. All non-DCT images are saved in PBM/PPM format as usual.
-opw password - Specify the owner password for the PDF file. Providing this will bypass all security restrictions.
-upw password - Specify the user password for the PDF file.
-q - Don't print any messages or errors.
-v - Print copyright and version information.
-h - Print usage information. (-help and --help are equivalent.)
The Xpdf tools use the following exit codes:
0 No error.
1 Error opening a PDF file.
2 Error opening an output file.
3 Error related to PDF permissions.
99 Other error.
The pdfimages software and documentation are copyright 1998-2004 Glyph & Cog, LLC.
See also: pdftops(1), pdftotext(1), pdfinfo(1), pdffonts(1), pdftoppm(1)
22 January 2004 pdfimages(1)
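The exit codes listed in the man page above can be handy in a script. Here is a small sketch that turns the code into the matching message from that list. The name explain_exit is made up for this example.

```shell
# Sketch only: map the documented exit codes to their messages.
# explain_exit is a hypothetical helper name, not part of the pdf tools.
explain_exit() {
    case $1 in
        0)  echo "No error." ;;
        1)  echo "Error opening a PDF file." ;;
        2)  echo "Error opening an output file." ;;
        3)  echo "Error related to PDF permissions." ;;
        99) echo "Other error." ;;
        *)  echo "Unknown exit code: $1" ;;
    esac
}

# Possible use, right after running the tool:
#   pdfimages -j foo.pdf bar
#   explain_exit $?
```

The last branch is there only as a catch-all for any code that is not in the list.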
Nowadays, newer browsers can view pdf files without converting them to html. So all you have to do is copy the files to the server and then make an html file of the directory listing. A batch file for converting the file listing into an html file is attached.
Batch file notes:
The first step is to make a list of the files.
But then you may not want the pdf extension as part of the descriptions.
So now to create a file called descripts without the extensions:
Now we need to start building the file parts with the first part of the href. We are assuming the pdf files will be in the same directory as the html file.
Now we need to close the href.
Let's add the description and create the html file:
And then finally to close up the reference:
Cool. Add a bit of window dressing and we are done.
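As a rough illustration of the steps in these notes, here is one possible way to build the html listing in a single pass. This is a sketch and not the attached batch file. The name make_index is made up, and it assumes, as above, that the pdf files sit in the same directory as the html file.

```shell
# Sketch only: print an html page that links to every pdf in a directory.
# make_index is a hypothetical name; this is not the attached batch file.
make_index() {
    dir=${1:-.}
    echo "<html><body><ul>"
    for pdf in "$dir"/*.pdf; do
        # If there are no pdf files, the pattern is left unexpanded; skip it.
        [ -e "$pdf" ] || continue
        file=$(basename "$pdf")
        # Description: the file name without the .pdf extension.
        desc=${file%.pdf}
        # Link target: spaces replaced with %20 so the href stays valid.
        href=$(printf '%s' "$file" | sed 's/ /%20/g')
        printf '<li><a href="%s">%s</a></li>\n' "$href" "$desc"
    done
    echo "</ul></body></html>"
}
```

Something like make_index /var/www/htmlfiles > /var/www/htmlfiles/index.html should then write the listing next to the pdf files. Only spaces are handled here; file names containing characters such as &, < or " would need proper escaping.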