Xavid (kihou) wrote,

Of Cabbages and PDFs

I just talked about HGing my first Chuubo’s game on Sunday. This post is not really about that. It’s about my ridiculous manic PDF hackery to prepare materials for it.

The deal is that the Chuubo’s books, which I have in PDF, have all sorts of helpful boxes explaining things that would be useful to reference during play, but there aren’t official cheatsheets bringing them all together. I could, obviously, have copied stuff into a new document and formatted it myself. But I instead decided I wanted to combine the actual boxes and passages with the styling from the book. Because, pretty typesetting and icons.

I used the real Adobe Acrobat for my O-umajirushi translation. It had two key features that I needed: the ability to batch-downsample high-resolution images to an appropriate resolution, and the ability to mess with CMYK ink levels to avoid exceeding IngramSpark’s limits. (The actual typesetting was, of course, LaTeX; this was for post-processing.) However, its interface really sucks, particularly for small-scale ad-hoc editing, and my free trial had expired. I mean, I could’ve gone to the NMC at MIT or something. And maybe if I’d thought it wouldn’t be obnoxious, I would’ve.

Historically, when I’ve wanted to mess with PDFs, I’ve mostly just used Preview, the PDF viewer that comes with OS X. Beyond just viewing, it lets you add text or shapes to PDFs and crop them, both of which are pretty handy. But it doesn’t let you combine multiple PDFs onto a single page, except by the hack of printing them N-to-a-page, which doesn’t work well when your boxes are different sizes and proportions. So it wasn’t quite enough here.

FormulatePro is an open-source Mac program, unfortunately abandoned, that I’ve used in the past for putting together SCA heraldic device submissions. It does some of the same stuff as Preview with a worse interface, but it has the key feature of letting you paste one PDF on top of another. So I could crop stuff with Preview and then paste the pieces back together with FormulatePro. And that actually would’ve worked perfectly, as long as I cropped very precisely, if Chuubo’s boxes were all rectangles. But with the non-rectangular boxes, I had dorky bits of background sticking out in places, and obviously this was a very important thing to address.

I knew that PDFs are composed of discrete objects and that one can theoretically delete them individually, say with Acrobat. But, still bearish on Acrobat, I decided to see what other, cheaper or better-interfaced PDF tools there were for deleting objects, or at least backgrounds. A variety of tools claim to be able to do this, but most of the ones I tried didn’t actually work, or at least not on the particular PDFs I had. Honorable mention goes to PDFpen, which let me delete some stuff but had a confusing interface and seemed to let me select only some objects, for reasons I didn’t really understand.

(The other honorable mention goes to Inkscape, which is great at editing graphical PDFs, and which I did maybe use a little for pointless graphical purposes. It doesn’t losslessly edit PDFs with text, though; it mangled the text too badly for me to use it on these boxes. It is free and open source and generally awesome for vector graphics.)

So, as you do, I thought, “PDF is sorta vaguely a text format. How hard can it be to write up a Python script to remove objects?” Turns out, actually, not that hard. I coded up something that would list the objects in a PDF and let me delete them by ID. Combined with Preview as an auto-updating viewer to show me the effects of my changes, this actually worked rather well.
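For the curious, here’s a minimal sketch of that kind of object lister and deleter. It isn’t the script from this post: it assumes the PDF has already been rewritten uncompressed (for example with `qpdf --qdf --object-streams=disable`), and `list_objects` and `delete_object` are hypothetical names. Rather than cutting an object out, it overwrites the object’s body with `null` padded to the same length, so the byte offsets in the xref table stay valid.

```python
import re

# Matches "12 0 obj ... endobj" in an uncompressed PDF. Objects packed
# inside compressed object streams are invisible to this simple regex.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)


def list_objects(pdf: bytes) -> list[tuple[int, str]]:
    """Return (object number, short preview) for each object found."""
    result = []
    for m in OBJ_RE.finditer(pdf):
        preview = m.group(3).strip()[:40].decode("latin-1")
        result.append((int(m.group(1)), preview))
    return result


def delete_object(pdf: bytes, obj_num: int) -> bytes:
    """Blank an object's body to `null`, keeping the file the same length.

    Because no bytes are added or removed, the xref offsets still point
    at the right places and the file doesn't need re-indexing.
    """
    for m in OBJ_RE.finditer(pdf):
        if int(m.group(1)) == obj_num:
            start, end = m.span(3)
            body_len = end - start
            if body_len < len(b" null "):
                raise ValueError(f"object {obj_num} is too small to blank in place")
            replacement = b" null ".ljust(body_len)
            return pdf[:start] + replacement + pdf[end:]
    raise KeyError(f"object {obj_num} not found")
```

Padding instead of deleting is the lazy trick here: you never have to touch the xref table. Whether a given viewer is happy with whatever still refers to the blanked object varies, so reloading in a viewer after each change is the real test.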

Except then, when I went to paste the boxes in, instead of the image background from before, there was a white rectangular background. It turns out this is just a white rectangle, added as the default full-page background by whatever layout program produced the book. Now, you might think that this is just another object to remove. However, in a PDF, shapes and text aren’t one object each: they’re drawn by a content stream, a single object containing a sequence of PostScript-style commands. And these streams are compressed and hard to read. But based on a helpful post I found, I learned how to use a program called qpdf to make the PDF more easily editable, and used a basic sed command to strip out the rectangles. Success!
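To give a feel for what’s going on, here’s a rough Python stand-in for that qpdf-plus-sed step. It isn’t the command I used; the function names are hypothetical, and it assumes the usual FlateDecode (zlib) compression and rectangles written as `x y w h re` followed directly by a fill. Editing a stream changes its length, so in practice you’d run qpdf’s `fix-qdf` afterwards to repair the `/Length` entries and xref table.

```python
import re
import zlib

# A rectangle path immediately followed by a fill, e.g. "0 0 612 792 re f".
# Content streams are postfix: x, y, width, height, then "re", then a
# painting operator. Only "f" and "f*" fills are matched, so clipping
# rectangles ("re W n") and stroked ones ("re S") are left alone.
FILLED_RECT_RE = re.compile(rb"(?<!\S)(?:[-+]?[\d.]+\s+){4}re\s+f\*?(?=\s|$)")


def inflate(raw: bytes) -> bytes:
    """Decompress a FlateDecode stream body (the part qpdf undoes for you)."""
    return zlib.decompress(raw)


def strip_filled_rects(content: bytes) -> bytes:
    """Delete every filled rectangle from an uncompressed content stream."""
    return FILLED_RECT_RE.sub(b"", content)
```

This is as blunt as the sed approach: it removes every filled rectangle, not just the white background one, and it won’t notice a rectangle that shares a path with other shapes before the fill.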

Then Jenna released a book with nice official character sheets, and I wanted to tweak their text labels. Text, like rectangles, lives in these streams, but I didn’t want to remove all the text, so I combined my two techniques into an interactive tool that removes individual text-drawing commands. And due to font encodings or something, the characters in the file didn’t match the characters that actually show up on the page, so the tool has a mode where you delete something, then type in what it actually said, and it learns the substitution cipher.
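A rough sketch of the two pieces, finding text-showing commands and learning the cipher from a run you’ve labeled by hand, might look like this. `text_runs`, `remove_run`, and `CipherLearner` are hypothetical names, not my actual tool. The regex only handles the simplest `(string) Tj` form (no escaped parentheses, hex strings, or `TJ` arrays), and the learner assumes a simple font with one byte per glyph.

```python
import re

# "(...) Tj" shows a literal string. Simplification: this ignores escaped
# parentheses, hex strings like <0102>, and the TJ array form.
TJ_RE = re.compile(rb"\(([^()\\]*)\)\s*Tj")


def text_runs(content: bytes) -> list[bytes]:
    """Return the raw bytes of every simple Tj string, in order."""
    return [m.group(1) for m in TJ_RE.finditer(content)]


def remove_run(content: bytes, index: int) -> bytes:
    """Delete the index-th "(...) Tj" command from the stream."""
    m = list(TJ_RE.finditer(content))[index]
    return content[: m.start()] + content[m.end():]


class CipherLearner:
    """Learns a byte-to-character substitution from hand-labeled runs.

    Assumes a simple font with one byte per glyph.
    """

    def __init__(self) -> None:
        self.mapping: dict[int, str] = {}

    def learn(self, encoded: bytes, actual: str) -> None:
        if len(encoded) != len(actual):
            raise ValueError("lengths differ, so the characters can't be aligned")
        for code, ch in zip(encoded, actual):
            self.mapping[code] = ch

    def decode(self, encoded: bytes) -> str:
        # Codes not seen yet come out as "?" so the gaps are easy to spot.
        return "".join(self.mapping.get(code, "?") for code in encoded)
```

In use, you’d label one run, see how the rest decode, and fill in the `?`s as they come up; each correction teaches it a few more letters.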

Maybe some day I’ll incorporate these techniques into my mythical drawing program, Wind Worker. Or I’ll make some overly fancy system for tracking character sheets online that lets you print them as overly nice PDFs. But for now, I should probably stop obsessing over PDFs.