PDFtoText

PDFtoText is a tool for converting Portable Document Format (PDF) files to plain text.

PDFtoText is available as a module on Apocrita.

Usage

To run the default installed version of PDFtoText, load the pdftotext module:

$ module load pdftotext
$ pdftotext --help
pdftotext version X.Y.Z
Usage: pdftotext [options] <PDF-file> [<text-file>]
  -f <int>          : first page to convert
  -l <int>          : last page to convert
  -r <fp>           : resolution, in DPI (default is 72)
...(output has been truncated)

For full usage documentation, run pdftotext --help or see the user guide.
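
The -f and -l options listed in the help output can be combined to convert only part of a document. The following is a minimal sketch, assuming the pdftotext module is already loaded; the file name, the page numbers, and the output naming scheme are illustrative assumptions rather than site conventions:

```shell
# Hypothetical input file and page range, chosen for illustration only
input="input-file.pdf"
first=2
last=5

# Build an output name that records the page range (an assumed naming scheme)
output="${input%.pdf}-p${first}-${last}.txt"

if [ -f "$input" ]; then
    # -f and -l select the first and last page to convert
    pdftotext -f "$first" -l "$last" "$input" "$output"
else
    echo "Input file not found: $input" >&2
fi
```

Recording the page range in the output name avoids overwriting a full conversion of the same document.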

Example job

Here is an example job requesting 1 core and 1 GB of memory:

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=1G

module load pdftotext

# Convert PDF file to text
pdftotext input-file.pdf output-file.txt
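
When a directory holds several PDF files, the single conversion line in the job script above could be replaced with a loop. This is a sketch only: it assumes the pdftotext module has already been loaded earlier in the script, and txt_name is a hypothetical helper introduced here, not part of PDFtoText:

```shell
# Hypothetical helper: derive the output name by swapping .pdf for .txt
txt_name() {
    printf '%s\n' "${1%.pdf}.txt"
}

# Convert every PDF file in the current working directory
for pdf in *.pdf; do
    # Skip the unexpanded pattern when no PDF files are present
    [ -f "$pdf" ] || continue

    if ! pdftotext "$pdf" "$(txt_name "$pdf")"; then
        # Report the failure and carry on with the remaining files
        echo "Failed to convert $pdf" >&2
    fi
done
```

The loop runs the conversions one after another, so it fits within the single-core request used in the example job; the runtime limit may need increasing for large batches.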

References