XYplorer: Quick PDF Processing

06 Apr 2018

I work with PDFs on a daily basis. Typically I’m generating them as documentation, or reading documentation, but I also occasionally work with PDFs in other ways: pulling out pages, replacing them with others, or building entirely new documents. When I’m doing that on my work laptop running Windows, I fall back to the PDF Toolkit utility. Whilst there are visual interfaces for this tool, I most often identify a need to do something with a PDF when I’m working with the file itself. For that reason I’ve created some XYplorer scripts that make accessing common PDF manipulations even easier.

To utilise these scripts, you’ll need a copy of the free PDF Toolkit utility, PDFtk, and of course XYplorer.

I keep the scripts in a PDF script library and access them from a toolbar button, which automatically builds a menu of the scripts so that I can choose which one to run.

First of all, I created a function that returns the path to the PDFtk executable. This means that if I move the EXE, I only have one place to modify the location rather than several. Also, by making it a simple function, I can do something more sophisticated with the path if I want to. For example, I could have several PDF utilities in the future that all share the same base path, and so I might build the paths from multiple functions or variables.

function get_pdftkpath() {
	return "C:\Command-Line-Tools\pdftk\pdftk.exe";
}

Combining PDFs

The script starts with a small bit of validation. It checks that there are at least two files currently selected. It doesn’t currently check that they are all PDFs, but that might be an enhancement to implement at some point in the future; though honestly, I’m pretty good at just selecting the PDF files I want to use when using this script.
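The extension check mentioned as a possible enhancement could look roughly like the following. This is a minimal sketch of the logic in Python rather than XYplorer script, and the function names `all_pdfs` and `can_combine` are hypothetical, not part of the original scripts.

```python
from pathlib import PureWindowsPath


def all_pdfs(paths):
    """Return True when every path has a .pdf extension (case-insensitive)."""
    return all(PureWindowsPath(p).suffix.lower() == ".pdf" for p in paths)


def can_combine(paths):
    """Mirror the script's rule (two or more items) and add the PDF-only check."""
    return len(paths) >= 2 and all_pdfs(paths)
```

A selection that includes a stray text file, or only a single PDF, would be rejected before PDFtk is ever started.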

If two or more files are selected, the combination processing takes place. The process for combining PDFs is relatively straightforward. PDFtk is passed a list of the PDFs to be combined as file paths and an output path is specified. I’ve configured the output path to always be in the same directory as the source PDFs, and the combined PDF is named in the format yyyy-mm-dd_hh_nn_ss.combined.pdf (in XYplorer’s date format, nn is the minutes).

When run, a command window will appear while the PDFtk command line tool is running. It is usually quite brief, but the duration will vary based on the amount of PDF data being combined, and the speed of the PC being used.

//Combine multiple PDFs into a single PDF
"Concatenate PDFs"
	if (get("CountSelected") < 2) {
		msg "You must select at least two PDF files to combine";
		status "PDF Combination Could Not Be Completed", "000000", "alert";
	}
	else {
		$pdfSuffix = ".combined";
		run  '"' . get_pdftkpath() . '" ' . <selitems> . ' output "' . <curpath> . '\' . <date yyyy-mm-dd_hh_nn_ss> . $pdfSuffix . '.pdf"';
	}
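
For anyone wanting to see the command line the script assembles, here is a rough Python equivalent. It is a sketch under assumptions: the `PDFTK` constant simply repeats the path from `get_pdftkpath()`, the helper names are hypothetical, and it spells out PDFtk’s `cat` operation explicitly, whereas the script above passes the inputs without naming an operation.

```python
from datetime import datetime
from pathlib import PureWindowsPath

# Same location as returned by get_pdftkpath() in the XYplorer script.
PDFTK = r"C:\Command-Line-Tools\pdftk\pdftk.exe"


def combined_name(now, suffix=".combined"):
    """Build a timestamped file name such as 2018-04-06_14_30_05.combined.pdf."""
    return now.strftime("%Y-%m-%d_%H_%M_%S") + suffix + ".pdf"


def combine_command(pdfs, folder, now):
    """Return the PDFtk argument list that concatenates pdfs into folder."""
    output = PureWindowsPath(folder) / combined_name(now)
    return [PDFTK, *pdfs, "cat", "output", str(output)]
```

The returned list can be handed to `subprocess.run(..., check=True)`, which avoids having to quote paths that contain spaces by hand.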

Splitting PDFs

The inverse operation of splitting a PDF is a little more involved. The script is configured to only allow one PDF to be split at once. While the script could be enhanced to handle multiple files in one batch, I’ve never had a need to do this and so I have it set to operate on just one.

In PDFtk, splitting a PDF is known as “bursting”, and so we use the burst argument on the command line to trigger that. The script will output one-page PDFs in the same directory as the original PDF, and produce a data file about the burst. Since I don’t use that file, I then remove it, and the result is just the split PDFs.

//Split PDF into a PDF per page
"Burst PDF"
	if (get("CountSelected") < 1) {
		msg "You must select one PDF file to split";
		status "PDF Burst Could Not Be Completed", "000000", "alert";
	}
	elseif (get("CountSelected") > 1) {
		msg "You must select only one PDF file at a time to split";
		status "PDF Burst Could Not Be Completed", "000000", "alert";
	}
	else {
		status "Splitting PDF file...", "000000", "progress";
		run  '"' . get_pdftkpath() . '" ' . <selitems> . ' burst', , 1;
		//Remove the data file it generates
		delete 1, 0, <curpath> . "\doc_data.txt";
		status "PDF Splitting Complete", "000000", "ready";
	}
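
The same burst-then-tidy sequence can be sketched outside XYplorer in Python. The helper names are hypothetical and `PDFTK` again just repeats the path from `get_pdftkpath()`. One difference worth noting: the XYplorer `delete` call above is given the recycle-bin flag, while this sketch removes `doc_data.txt` permanently.

```python
import subprocess
from pathlib import Path

# Same location as returned by get_pdftkpath() in the XYplorer script.
PDFTK = r"C:\Command-Line-Tools\pdftk\pdftk.exe"


def burst_command(pdf):
    """Return the PDFtk argument list that splits pdf into one file per page."""
    return [PDFTK, str(pdf), "burst"]


def remove_burst_report(folder):
    """Delete doc_data.txt if present; return True when a file was removed."""
    report = Path(folder) / "doc_data.txt"
    if report.exists():
        report.unlink()
        return True
    return False


def burst(pdf):
    """Run the burst in the PDF's own folder, wait for it, then tidy up."""
    folder = Path(pdf).parent
    subprocess.run(burst_command(pdf), cwd=folder, check=True)
    remove_burst_report(folder)
```

Running PDFtk with the PDF’s folder as the working directory is what keeps the page files next to the original, and `subprocess.run` waits for completion in the same way as the wait flag on the script’s `run` command.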

Combining Burst PDFs

If I need to remove or reorder pages, I can use the split and combine scripts to do just that by appropriate removal and/or renaming of the single-page PDF files. I vary between using this technique and an equivalent approach using a PDFtk visual user interface, but for the simpler changes, just having these available in XYplorer always wins out in terms of ease of access and speed of applying the changes.
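
As an illustration of the remove-and-reorder technique, the sketch below rebuilds a PDF from whatever page files are left after a burst. It assumes the pages are combined in file-name order (so renaming a file changes its position) and that they carry PDFtk’s default burst names such as pg_0001.pdf; `recombine_command` is a hypothetical helper, not part of the original scripts.

```python
from pathlib import PureWindowsPath

# Same location as returned by get_pdftkpath() in the XYplorer script.
PDFTK = r"C:\Command-Line-Tools\pdftk\pdftk.exe"


def recombine_command(page_files, output):
    """Return the PDFtk argument list joining the remaining pages in name order."""
    ordered = sorted(page_files, key=lambda p: PureWindowsPath(p).name.lower())
    return [PDFTK, *ordered, "cat", "output", output]
```

For example, after deleting pg_0002.pdf from a three-page burst, passing the two remaining files produces a command that joins pages 1 and 3 only.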

Author: Stephen Millard

Tags: | xyplorer |


ThoughtAsylum
