Insert PDF images into text, from pdftotext and pdfimages?

I was able to install the pdftotext and pdfimages utilities (they come with Linux, I guess) on a Mac to convert PDFs into text and extract their images:

# install poppler, xpdf, and imagemagick
brew install imagemagick
brew install poppler # not sure if this worked, had to install `xpdf` from online .dmg
pdftotext sample.pdf output.txt
pdfimages sample.pdf pdf-images
# then convert .ppm to .jpg
# one at a time:
# convert pdf-images-001.ppm pdf-images-001.jpg
# batch:
mogrify -format jpg *.ppm

So now I have an output.txt with the (impressively well formatted) text from the PDF, and a bunch of images which I had to convert from .ppm to .jpg with ImageMagick.

Question is, is there any way to now insert references to these images in the right places in the output.txt document? Or, is there a way to combine those two commands so it extracts both text and images and creates links in the text to the images, all at once? Wondering if I have to manually write the parsing code to insert images into the text myself.
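
One way to approximate this without writing a full parser is to work at page granularity. pdftotext ends each page with a form feed character, and poppler's `pdfimages -p` puts the page number into each image file name, so the images found on a page can be listed right after that page's text. The sketch below assumes that naming scheme (`pdf-images-PPP-NNN.ext`); `insert_image_refs` and the `[image: ...]` marker are made-up names for illustration, and the images land at the end of their page, not at their exact position within it.

```shell
# Sketch only. Hypothetical usage, after re-extracting the images with -p:
#   pdftotext sample.pdf output.txt
#   pdfimages -p sample.pdf pdf-images
#   mogrify -format jpg *.ppm
#   insert_image_refs output.txt pdf-images jpg > output-with-images.txt

insert_image_refs() {
    text_file=$1
    image_prefix=$2
    ext=${3:-jpg}   # only list images with this extension (assumed default: jpg)

    # Feed the image file names to awk on stdin, then the text file.
    ls "$image_prefix"-[0-9]*-[0-9]*."$ext" 2>/dev/null |
    awk -v prefix="$image_prefix" '
        # Phase 1 (stdin): group image names by the page number in the name.
        !intext {
            rest = substr($0, length(prefix) + 2)   # e.g. "001-000.jpg"
            split(rest, parts, "-")
            page = parts[1] + 0
            refs[page] = refs[page] "[image: " $0 "]\n"
            next
        }
        # Phase 2 (text file): one record per page, split on form feeds.
        { printf "%s\n%s\f", $0, refs[FNR] }
    ' - RS='\f' intext=1 "$text_file"
}
```

If the installed pdfimages has no `-p` option, the same idea should still work by running pdftotext and pdfimages once per page with their `-f` and `-l` (first page, last page) options.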

asked Jul 4 '12 at 1:07

Maybe you can use pdftohtml from poppler. Then your images are linked automatically.
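
If pdftohtml is available, its output already contains `<img>` tags where it placed the images, so one option is to flatten that HTML back to plain text and keep a marker for each image. The filter below is a rough sketch under several assumptions: each tag sits on a single line and uses a double-quoted `src`, HTML entities are left undecoded, and `html_to_text_with_refs` and the `[image: ...]` marker are invented names. The exact markup pdftohtml emits may differ between versions, so check the generated file first.

```shell
# Sketch only. Hypothetical usage (-noframes should write a single output.html):
#   pdftohtml -noframes sample.pdf output
#   html_to_text_with_refs < output.html > output.txt

html_to_text_with_refs() {
    # 1) turn each <img ... src="file" ...> into a text marker
    # 2) strip every remaining tag
    sed -e 's/<img[^>]*src="\([^"]*\)"[^>]*>/[image: \1]/g' \
        -e 's/<[^>]*>//g'
}
```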

0 Answers