PDF documents are commonly used to hold long passages of text, especially for formal material such as contracts or terms and conditions. They can prove unwieldy in certain scenarios, since a PDF reader application is required to open them, and a PDF editor must be used to change their contents.
In many cases, a plain text file is just easier to work with. Luckily, we can easily convert the text of a PDF into a normal plain text file on the Linux command line. In this tutorial, you will learn how to extract the text from a PDF document on a Linux system.
In this tutorial you will learn:
- How to install the pdftotext command on all major Linux distros
- How to use the pdftotext command to extract text from PDF

Category | Requirements, Conventions or Software Version Used |
---|---|
System | Any Linux distro |
Software | pdftotext |
Other | Privileged access to your Linux system as root or via the sudo command. |
Conventions | # – requires given Linux commands to be executed with root privileges, either directly as the root user or by use of the sudo command; $ – requires given Linux commands to be executed as a regular non-privileged user |
Install pdftotext command on major Linux distros
We can use the pdftotext Linux command to extract the text from a PDF document. This command is often installed by default, but if it is missing on your system, it is provided by the Poppler software package. Use the appropriate command below to install pdftotext with your system's package manager.
To install pdftotext on Ubuntu, Debian, and Linux Mint:
$ sudo apt install poppler-utils
To install pdftotext on Fedora, CentOS, AlmaLinux, and Red Hat:
$ sudo dnf install poppler-utils
To install pdftotext on Arch Linux and Manjaro:
$ sudo pacman -S poppler
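To confirm that the installation worked, a quick check along these lines can help. This is only a minimal sketch: the have_cmd helper name is made up for illustration, and it assumes a POSIX-compatible shell.

```shell
# Report whether a command is available on the PATH.
# (have_cmd is a hypothetical helper name, not part of pdftotext.)
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd pdftotext; then
    # -v prints copyright and version information
    pdftotext -v
else
    echo "pdftotext not found; install it with your package manager" >&2
fi
```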
pdftotext Command Examples
Be aware that pdftotext will only extract text that has been stored as text. If your PDF document consists of scanned images (JPG files, for example), pdftotext will not be able to extract any text, since it does not support OCR.
- Use the pdftotext command followed by your PDF document file name as an argument.
$ pdftotext document.pdf
Your text file will be created with the same file name, but with a .txt extension. In other words, document.pdf would have its text extracted into the document.txt text file.
- If you only want to extract text from a certain range of pages, use the -f and -l options to specify the first page and the last page that you want to extract, respectively (along with all pages in between). For example, to extract all text from page 3 to page 9:
$ pdftotext -f 3 -l 9 document.pdf
Our document.txt plain text file will now contain only the text from pages 3 to 9.
- If you find that your plain text file is structured oddly, you can tell pdftotext to maintain the original layout as much as possible by supplying the -layout option. By default, pdftotext will try to undo certain structuring like columns, which do not translate nicely to plain text.
$ pdftotext -layout document.pdf
- There are some other options available, but these would only cover niche scenarios and do not require much elaboration. See the help page for a full list with this command:
$ pdftotext -h
Or, for a more detailed explanation of the available options, consult the man page:
$ man pdftotext
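The commands above handle one file at a time. As a rough sketch, a small loop can apply the same conversion to every PDF in a directory. The txt_name and convert_all function names are hypothetical helpers, not part of pdftotext, and the sketch assumes a POSIX-compatible shell and file names ending in a lowercase .pdf extension.

```shell
# Derive the output name pdftotext uses by default: swap .pdf for .txt.
# (txt_name and convert_all are illustrative names, not pdftotext features.)
txt_name() {
    printf '%s\n' "${1%.pdf}.txt"
}

# Convert every PDF in a directory (default: the current directory),
# keeping the original layout where possible.
convert_all() {
    dir="${1:-.}"
    for pdf in "$dir"/*.pdf; do
        # Skip the unexpanded glob pattern when the directory has no PDFs.
        [ -e "$pdf" ] || continue
        # The second argument names the output text file explicitly.
        pdftotext -layout "$pdf" "$(txt_name "$pdf")"
    done
}
```

For example, running convert_all ~/Documents would write a .txt file next to each PDF in that directory (the path here is only an illustration). Drop the -layout option if the default reflowed output suits your files better.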
Closing Thoughts
In this tutorial, we saw how to extract text from a PDF document on a Linux system. This involved installing the pdftotext command, a must-have utility on Linux for extracting text from PDF files. We also learned how to adjust the output structure of our text files with the -layout option, since retaining the exact layout from PDF to plain text is often impossible.