Optimize the PDF
By default, Asciidoctor PDF does not optimize the PDF it generates or compresses its streams. This page covers several approaches you can take to optimize your PDF.
If you’re creating a PDF for Amazon’s Kindle Direct Publishing (KDP), GitLab repository preview, or other online publishers, you’ll likely need to optimize the file before uploading.
In their words, you must tidy up the reference tree and flatten all transparencies (mostly likely referring to images).
If you don’t do this step, the platform may reject your upload or fail to display it properly.
(For KDP, -a optimize works best.
For GitLab repository preview, either -a optimize or hexapdf optimize will do the trick.)
|
Enable stream compression
A simple way to reduce the size of the PDF file is to enable stream compression (using the FlateDecode method).
You can enable this feature by setting the compress
attribute on the document:
$ asciidoctor-pdf -a compress filename.adoc
For a more thorough optimization, you can use the integrated optimizer or hexapdf.
rghost
Asciidoctor PDF also provides a flag (and bin script) that uses Ghostscript (via rghost) to optimize and compress the generated PDF with minimal impact on its quality.
You must have Ghostscript (command: gs
) and the rghost
gem installed to use it.
Install rghost
To install the rghost gem, open a terminal and type the following command.
$ gem install rghost
Convert and optimize
Here’s an example usage that converts your document and optimizes it:
$ asciidoctor-pdf -a optimize filename.adoc
The command will generate an optimized PDF file that is compliant with the PDF 1.4 specification.
If this command fails because the gs
command cannot be found, you’ll need to set it using the GS
environment variable.
On Windows, this step is almost always required since the Ghostscript installer does not install the gs
command into a standard location.
Here’s an example that shows how you can override the gs
command path:
$ GS=/path/to/gs asciidoctor-pdf -a optimize filename.adoc
You’ll need to use the technique for assigning an environment variable that’s relevant for your system.
In addition to optimizing the PDF file, you can also configure the optimizer to convert the document from standard PDF to PDF/A or PDF/X.
To do so, you can pass one of the following compliance keywords in the value of the optimize attribute: PDF/A
, PDF/A-1
, PDF/A-2
, PDF/A-3
, PDF/X
, PDF/X-1
, or PDF/X-3
.
$ asciidoctor-pdf -a optimize=PDF/A filename.adoc
The one limitation of generating an optimized file is that it does not allow non-ASCII characters in the document metadata fields (i.e., title, author, subject, etc.).
To work around this limitation, you can force Ghostscript to generate a PDF 1.3 file using the pdf-version
attribute (or you can generate a PDF/X document):
$ asciidoctor-pdf -a optimize -a pdf-version=1.3 filename.adoc
Downgrading the PDF version may break the PDF if it contains an image that uses color blending or transparency. Specifically, the text on the page can become rasterized, which causes links to stop working and prevents text from being selected. If you’re in this situation, it might be best to try hexapdf instead. |
If you’re looking for a smaller file size, you can try reducing the quality of the output file by passing a quality keyword to the optimize
attribute (e.g., --optimize=screen
).
The optimize
attribute accepts the following keywords: default
(default, same if value is empty), screen
, ebook
, printer
, and prepress
.
Refer to the Ghostscript documentation to learn what settings these presets affect.
$ asciidoctor-pdf -a optimize=prepress filename.adoc
To combine the quality and compliance, you separate the keywords using a comma, with the quality keyword first:
$ asciidoctor-pdf -a optimize=prepress,PDF/A filename.adoc
If you’ve already generated the PDF, and want to optimize it directly, you can use the bin script:
$ asciidoctor-pdf-optimize filename.pdf
The command will overwrite the PDF file with an optimized version.
You can also try reducing the quality of the output file using the --quality
flag (e.g., --quality screen
).
The --quality
flag accepts the following keywords: default
(default), screen
, ebook
, printer
, and prepress
.
In both cases, if a file is found with the extension .pdfmark
and the same rootname as the input file, it will be used to add metadata to the generated PDF document.
This file is necessary when using versions of Ghostscript < 8.54, which did not automatically preserve this metadata.
You can instruct the converter to automatically generate a pdfmark file by setting the pdfmark
attribute (i.e., -a pdfmark
)
When using a more recent version of Ghostscript, you do not need to generate a .pdfmark
file for this purpose.
The asciidoctor-pdf-optimize is not guaranteed to reduce the size of the PDF file.
It may actually make the PDF larger.
You should probably only consider using it if the file size of the original PDF is several megabytes.
|
If you have difficulty getting the rghost
gem installed, or you aren’t getting the results you expect, you can try the optimizer provided by hexapdf instead.
hexapdf
Another option to optimize the PDF is hexapdf (gem: hexapdf, command: hexapdf). Before introducing it, though, it’s important to point out that its license is AGPL. If that’s okay with you, read on to learn how to use it.
Compress a PDF
You can then use it to optimize your PDF as follows:
$ hexapdf optimize --compress-pages --force filename.pdf filename.pdf
This command does not manipulate the images in any way. It merely compresses the objects in the PDF and prunes any unreachable references. But given how much waste Prawn leaves behind, this turns out to reduce the file size substantially.
You can hook this command directly into the converter by providing your own implementation of the Optimizer
class.
Start by creating a Ruby file named optimizer-hexapdf.rb, then populate it with the following code:
require 'hexapdf/cli'
class Asciidoctor::PDF::Optimizer
def initialize(*)
app = HexaPDF::CLI::Application.new
app.instance_variable_set :@force, true
@optimize = app.main_command.commands['optimize']
end
def optimize_file path
options = @optimize.instance_variable_get :@out_options
options.compress_pages = true
#options.object_streams = :preserve
#options.xref_streams = :preserve
#options.streams = :preserve # or :uncompress
@optimize.execute path, path
nil
rescue
# retry without page compression, which can sometimes fail
options.compress_pages = false
@optimize.execute path, path
nil
end
end
To activate your custom optimizer, load this file when invoking the asciidoctor-pdf
using the -r
flag and set the optimize
attribute as well using the -a
flag.
$ asciidoctor-pdf -r ./optimizer-hexapdf.rb -a optimize filename.adoc
Now you can convert and optimize all in one go.
To see more options that hexapdf optimize
offers, run:
$ hexapdf help optimize
For example, to make the source of the PDF a bit more readable (though less optimized), set the stream-related options to preserve
(e.g., --streams preserve
from the CLI or options.streams = :preserve
from the API).
You can also disable page compression (e.g., --no-compress-pages
from the CLI or options.compress_pages = false
from the API).
hexapdf also allows you to add password protection to your PDF, if that’s something you’re interested in doing.
Rasterizing the PDF
Instead of optimizing the objects in the vector PDF, you may want to rasterize the PDF instead. Rasterizing the PDF prevents any of the text or other objects from being selected, similar to a scanned document.
Asciidoctor PDF doesn’t provide built-in support for rasterizing the generated PDF. However, you can use Ghostscript to flatten all the text in the PDF, thus preventing it from being selected.
$ gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -dNoOutputFonts -r300 -o output.pdf input.pdf
You can adjust the value of the -r
option (the density) to get a higher or lower quality result.
Alternately, you can use the convert
command from ImageMagick to convert each page in the PDF to an image.
$ convert -density 300 -quality 100 input.pdf output.pdf
Yet another option is to combine Ghostscript and ImageMagick to produce a PDF with pages converted to images.
$ gs -dBATCH -dNOPAUSE -sDEVICE=png16m -o /tmp/tmp-%02d.png -r300 input.pdf convert /tmp/tmp-*.png output.pdf rm -f /tmp/tmp-*.png
Using Ghostscript to handle the rasterization produces a much smaller output file. The drawback of using Ghostscript in this way is that it has to use intermediate files.