crystal-combine-pdf

PDF page numbering in pure Crystal. A4-aware booklet numbering with partition support. Inspired by Ruby combine_pdf.

= crystal-combine-pdf
:toc: macro
:toclevels: 2

ISO-compatible Crystal port of the Ruby gem https://github.com/boazsegev/combine_pdf[`combine_pdf`] v1.0.31. Drop-in syntax for Ruby users — CombinePDF.new, CombinePDF.load, CombinePDF.parse plus the << / >> / insert / remove / save / to_pdf / number_pages instance API are wired up and behave the same way.

On top of that, we ship two Crystal-specific bonuses:

  • A4-aware page numbering — fixes the upstream US Letter hardcoding bug of number_pages; coordinates are derived from each page's actual MediaBox.
  • Partition-aware intra-numbering — the one that lets a music-score booklet show "1/4", "2/4" on a 4-page partition while the global "3/12" stays in the bottom-right corner.

🇫🇷 Read this document in French: link:README.fr.adoc[README.fr.adoc]

toc::[]

== Quickstart (declarative mode, recommended for booklets)

Three commands to assemble a folder of PDFs into a booklet:

[source,shell]
----
cd /path/to/mes-partitions
crystal-combine-pdf init            # generates .crystal-combine-pdf.yml
$EDITOR .crystal-combine-pdf.yml    # reorder, title, exclude
crystal-combine-pdf                 # builds mes-partitions.pdf
----

See the <<Mode déclaratif>> section for the details of the .crystal-combine-pdf.yml file.

== Quickstart (ISO API, mirrors the Ruby gem)

[source,crystal]
----
require "crystal-combine-pdf"

# Build a fresh PDF
pdf = CombinePDF.new

# Load + append two existing PDFs
pdf << CombinePDF.load("partition1.pdf")
pdf << CombinePDF.load("partition2.pdf")

# Optional: prepend a cover page
pdf >> CombinePDF.load("cover.pdf")

# Optional: metadata
pdf.title = "Recueil de partitions"
pdf.author = "Philippe Nénert"

# Optional: number the pages (Crystal-specific bonus)
pdf.number_pages

# Save
pdf.save("livret.pdf")
----

== Why

The Ruby gem combine_pdf is the de facto tool for assembling PDF booklets. Its number_pages helper draws the page number at hardcoded coordinates derived from US Letter (612 × 792 pt). On A4 (595 × 842 pt), Asia A4 or A3 the number lands several centimetres off.

Beyond the format mismatch, an assembled music-score booklet — where each "partition" (sheet) may span 1, 2 or N pages — needs a second layer of numbering: a small "1/4", "2/4", … in the corner of each page of the partition, so the performer knows when to turn the page and how many are left in the current piece.

crystal-combine-pdf does both, in pure Crystal, with the page geometry read from the actual MediaBox of every page.
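
To make the MediaBox idea concrete, here is a minimal sketch, not the shard's actual code. It assumes a MediaBox given as `[llx, lly, urx, ury]` in points; the `MediaBox` record, the `bottom_right_anchor` helper and its 24 pt default margin are hypothetical names chosen for illustration.

```crystal
# Hypothetical sketch: names are illustrative, not the shard's API.
record MediaBox, llx : Float64, lly : Float64, urx : Float64, ury : Float64

# Returns the {x, y} point (in PDF points) inset by `margin`
# from the bottom-right corner of the given box.
def bottom_right_anchor(box : MediaBox, margin : Float64 = 24.0) : Tuple(Float64, Float64)
  {box.urx - margin, box.lly + margin}
end

letter = MediaBox.new(0.0, 0.0, 612.0, 792.0)
a4 = MediaBox.new(0.0, 0.0, 595.0, 842.0)

puts bottom_right_anchor(letter) # => {588.0, 24.0}
puts bottom_right_anchor(a4)     # => {571.0, 24.0}
```

Because the anchor is derived from the box, the same call works for A3 or any custom page size without a per-format flag.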

== Features (v0.1)

  • Global page numbering in the bottom-right corner (format customisable, default "N/T").
  • Intra-partition numbering in the top-left corner (format customisable, default "n/t"), shown only when the partition has more than one page (configurable).
  • A4-aware, Letter-aware, A3-aware — coordinates computed from each page's MediaBox, no per-format flag needed.
  • Skip pages (e.g. covers): --skip 1,2.
  • Pure Crystal, single binary, no pdftk/qpdf/pdftotext on $PATH.

== Roadmap

=== Shipped

  • v1.0.31.1 — ✅ ISO-compatible module + PDF instance API (<<, >>, insert, remove, pages, page_count, new_page, title=, author=, number_pages, save, to_pdf).
  • v1.0.31.2 — ✅ Declarative .crystal-combine-pdf.yml mode (--init, --refresh, --recursive, default-build). Numbering rewrite: 15 cardinal positions + 4 duplex-aware (outer-* / inner-*), 5 styles (plain/badge/circle/square/oval), cover-aware numbering, watermarks via watermark.
  • v1.0.31.3 — ✅ Extended PDF compatibility (xref streams 1.5+, object streams, CCITT scans, JPEGs, LilyPond, …) — every PDF in the wild now readable without qpdf preprocessing.
  • v1.0.31.4 — ✅ Title + clickable TOC page inserted at the front of the booklet. Each TOC line is a /Subtype /Link annotation that jumps to the target partition.

=== Upcoming

  • v1.0.31.5 — PDF bookmarks (/Outlines) + per-file title header.
  • v1.0.31.6 — encryption (40/128/256 bits), secured?.
  • v1.0.31.7 — PAdES-style signatures.
  • (Ongoing) — resource deduplication when merging.

=== Improvement ideas

  • Web app — online interface to assemble booklets without installing Crystal: drag-and-drop PDFs, reorder by mouse, preview, one "Build" button that returns the finished PDF. Likely Kemal or Lucky on the server side with a simple HTML+HTMX front. ALOLI hosting.
  • Standalone desktop app — cross-platform GUI for the same workflow as the CLI, no terminal needed. To investigate: native webview (crystal-webview lib or GTK WebKit), or a ncurses/termbox TUI to keep the single-binary spirit. Starting point worth studying: https://github.com/serge-hulne/Crystal-App-template-for-Windows[serge-hulne/Crystal-App-template-for-Windows] — a Crystal application template targeting Windows, useful for packaging the binary into a distributable desktop app.

== Installation

Add to your shard.yml:

[source,yaml]
----
dependencies:
  crystal-combine-pdf:
    github: aloli-crystal/crystal-combine-pdf
    version: "~> 1.0.31"
----

then run shards install.

For the CLI:

[source,shell]
----
git clone https://github.com/aloli-crystal/crystal-combine-pdf
cd crystal-combine-pdf
shards build --release
cp bin/crystal-combine-pdf ~/bin/   # or use bin-installer
----

== Mode déclaratif

The recommended way to assemble a booklet of partitions, scores or any list of PDFs.

=== Commands

[source,shell]
----
crystal-combine-pdf init [-r] [--profile NAME] [--user-config PATH]
    Initialise .crystal-combine-pdf.yml in the current folder.
    -r scans subfolders too (parents first, then alpha).
    --profile     : booklet (default) | book | report | slides | minimal
    --user-config : custom path (defaults to ~/.crystal-combine-pdf.yml)

crystal-combine-pdf refresh [-r]
    Refresh the files: list of an existing YAML: add new PDFs at the end,
    comment out entries whose file has disappeared.
    Preserves comments, current order and inline titles.

crystal-combine-pdf build
crystal-combine-pdf
    No subcommand: reads .crystal-combine-pdf.yml from the current folder,
    assembles the booklet, applies numbering and watermark, writes the output.

crystal-combine-pdf compress FILE.pdf [-o OUTPUT.pdf | -i] [--deep]
    Reduce the size of a single PDF (Flate recompression + GC).
    --deep delegates to ghostscript for image downsampling.
    -i : in-place, --backup : keep original as .bak.

crystal-combine-pdf -d, --dir DIR
    Common option: target a different folder than the current one.
----

=== .crystal-combine-pdf.yml example

[source,yaml]
----
output: mes-partitions.pdf
title: "Mes Partitions"
author: "Philippe Nénert"

duplex: true                # recto-verso

cover:
  mode: recto-verso         # none | recto | recto-verso
  include_in_numbering: false

numbering:
  enabled: true
  global:
    enabled: true
    format: "%page%/%total%"
    style: badge            # plain | badge | circle | square | oval
    position: outer-bottom
    font_size: 10
    color: "#333333"
    margin: 24
  partition:
    enabled: true
    format: "%page%/%total%"
    style: plain
    position: outer-top
    font_size: 9
    color: "#666666"
    hide_when_single: true
  skip_pages: []

# watermark:
#   text: "Confidentiel"
#   style: diagonal         # diagonal | tiled | header | footer | center
#   font_size: 48
#   color: "#cccccc"
#   opacity: 0.15
#   rotation: 45

# Encrypt the OUTPUT booklet (password-protected).
# encrypt:
#   enabled: true
#   level: aes_256          # rc4_128 | aes_128 | aes_256
#   user_password: ""       # blank = no password to open
#   owner_password: ""      # blank or nil = identical to user
#   permissions: [print, copy, modify, annotate]

# Password tried on every encrypted source PDF (global default).
# Override per-entry via the inline mapping syntax in files: below.
# input_password: ""

files:
  - couverture.pdf
  - partition1.pdf: "Allegro — C. Debussy"
  - partition2.pdf: "Adagio — F. Chopin"
  # - brouillon-2025.pdf
  # Encrypted source PDFs with their own password (inline mapping):
  - {path: confidentiel.pdf, password: secret}
  - {name: backup.pdf, pass: other-pwd, title: "Secured annex"}
----

[NOTE]
====
Security: leave passwords empty in the YAML when it's versioned in Git. Pass them on the CLI:

[source,shell]
----
crystal-combine-pdf -u 'output-pwd' -w 'owner-pwd'   # output encryption
crystal-combine-pdf -I 'global-source-pwd'           # source decryption
----

CLI flags always override the YAML. For booklets where each source has its own password, use the inline mapping syntax above.
====

=== Position vocabulary

[cols="1,4"]
|===
| Position type | Values

| Static (fixed regardless of page parity)
| top-left, top-center, top-right, bottom-left, bottom-center, bottom-right

| Duplex-aware (alternates per page parity)
| outer-top, inner-top, outer-bottom, inner-bottom
|===

When duplex: true, outer-* resolves to the page side opposite the spine (right for odd/recto pages, left for even/verso pages), and inner-* resolves to the spine side. When duplex: false, outer-* always resolves to the right side and inner-* to the left side.
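
The parity rule can be sketched as a small pure function. This is an illustration only, not the shard's implementation: `resolve_side` is a hypothetical name, pages are assumed to be numbered from 1, and the non-duplex case assumes that outer-* maps to the right and inner-* to the left.

```crystal
# Hypothetical sketch of the outer-*/inner-* rule.
# `page_number` is 1-based; odd pages are recto, even pages are verso.
def resolve_side(position : String, page_number : Int32, duplex : Bool) : String
  # "outer-bottom" splits into "outer", "-", "bottom"
  kind, _, edge = position.partition('-')
  return position unless kind == "outer" || kind == "inner"

  # Without duplex printing every page is treated like a recto page.
  recto = !duplex || page_number.odd?
  outer_side = recto ? "right" : "left"
  inner_side = recto ? "left" : "right"
  side = kind == "outer" ? outer_side : inner_side
  "#{edge}-#{side}"
end

puts resolve_side("outer-bottom", 1, true)  # => bottom-right
puts resolve_side("outer-bottom", 2, true)  # => bottom-left
puts resolve_side("outer-bottom", 2, false) # => bottom-right
```

Static positions such as top-center pass through unchanged, so one resolver can serve both rows of the table.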

=== Cover semantics

  • cover.mode: none — no cover, all pages numbered (default).
  • cover.mode: recto — 1 cover page at the front and 1 at the back.
  • cover.mode: recto-verso — 2 cover pages at the front and 2 at the back.
  • For asymmetric covers, use cover.front: and cover.back: instead of cover.mode:.

When include_in_numbering: false (default), neither the front nor the back cover pages display a number, and the booklet's first content page is numbered "1". %total% reflects the content total (not including covers).
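
As a rough illustration of that rule, the sketch below computes the label shown on each physical page when covers are excluded from numbering. `content_labels` and its parameters are hypothetical names, and the default "%page%/%total%" format is assumed.

```crystal
# Hypothetical sketch: one entry per physical page, nil for cover pages.
# `front` and `back` are the number of cover pages at each end.
def content_labels(total_pages : Int32, front : Int32, back : Int32) : Array(String?)
  content_total = total_pages - front - back
  (1..total_pages).map do |physical|
    index = physical - front
    if index < 1 || index > content_total
      nil # cover page: no number displayed
    else
      "#{index}/#{content_total}"
    end
  end
end

# 12 physical pages with cover.mode: recto-verso (2 front + 2 back covers)
labels = content_labels(12, 2, 2)
p labels.first(3) # => [nil, nil, "1/8"]
p labels.last(3)  # => ["8/8", nil, nil]
```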

=== Encryption

The aloli-crystal/pdf shard (≥ 0.5.2) implements the PDF spec's "Standard Security Handler" (ISO 32000-1 § 7.6.3 and ISO 32000-2 § 7.6.4) for both reading and writing, with no external dependency. This shard exposes those capabilities at three levels: standalone sub-commands, a YAML section, and CLI flags.

==== Standalone sub-commands

[source,shell]
----
crystal-combine-pdf encrypt rapport.pdf -u "secret" -l aes_256
crystal-combine-pdf decrypt protege.pdf -u "secret" -i --backup
----

  • encrypt: encrypt an arbitrary PDF (RC4-128, AES-128 or AES-256).
  • decrypt: symmetric — produces a copy without /Encrypt.

Common options (inherited from compress / gs):

  • -o FILE or -i (in-place) or auto-named output (*-encrypted.pdf, *-decrypted.pdf) if neither is set
  • --backup (with -i): keeps the original as .bak
  • -l rc4_128 | aes_128 | aes_256 (encrypt only, default aes_256)
  • -u USER_PWD user password, -w OWNER_PWD owner password
  • -p LIST permissions (csv: print,copy,modify,annotate; shortcuts none / all. Default: all)

==== Encrypting the OUTPUT of a build

encrypt: section in .crystal-combine-pdf.yml:

[source,yaml]
----
encrypt:
  enabled: true                  # false disables the section
  level: aes_256                 # rc4_128 | aes_128 | aes_256
  user_password: "open-me"       # empty = no password to open
  owner_password: "permissions"  # empty or nil = identical to user
  permissions: [print]           # absent = everything granted
  encrypt_metadata: true
----

Available levels:

[cols="1,1,3"]
|===
| Level | V/R | Compatibility

| rc4_128 | 2/3 | Acrobat ≥ 5 (2001), legacy only (RC4 is broken)
| aes_128 | 4/4 | Acrobat ≥ 7 (2005), CryptFilter AESV2
| aes_256 | 5/6 | Acrobat ≥ X (2010) / PDF 2.0, default
|===

CLI overrides (take precedence over YAML):

[source,shell]
----
crystal-combine-pdf -u "secret"       # output user_password
crystal-combine-pdf -w "owner-pwd"    # owner_password
crystal-combine-pdf -l aes_128        # level
crystal-combine-pdf --encrypt         # turn on without YAML section
crystal-combine-pdf --no-encrypt      # disable even if YAML enables it
----

==== Reading encrypted source PDFs

Global case (one password tried on every encrypted source):

[source,yaml]
----
input_password: "source-secret"
files:
  - confidential-1.pdf
  - confidential-2.pdf
----

CLI equivalent (avoid storing the password in versioned YAML):

[source,shell]
----
crystal-combine-pdf -I "source-secret"
# or: --input-password=source-secret
----


Per-file case (each source with its own password) — inline mapping:

[source,yaml]
----
files:
  - public.pdf                                        # simple entry
  - report.pdf: "Annual report"                       # with title
  - {path: confidential.pdf, password: "secret-A"}    # with password
  - {name: backup.pdf, pass: "secret-B", title: "Annex"}
----

Aliases accepted:

  • path = name = file
  • title = label
  • password = pass = pwd
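
One way to picture the aliases is as a key-normalisation step applied before an entry is interpreted. The sketch below is an assumption about how such a step could look: `ALIASES` and `normalize_entry` are hypothetical names, and treating path, title and password as the canonical keys is a guess.

```crystal
# Hypothetical alias table: alias key => assumed canonical key.
ALIASES = {
  "name"  => "path",
  "file"  => "path",
  "label" => "title",
  "pass"  => "password",
  "pwd"   => "password",
}

# Rewrites alias keys to their canonical form; unknown keys pass through.
def normalize_entry(entry : Hash(String, String)) : Hash(String, String)
  entry.transform_keys { |key| ALIASES.fetch(key, key) }
end

entry = {"name" => "backup.pdf", "pass" => "secret-B", "title" => "Annex"}
p normalize_entry(entry)
# => {"path" => "backup.pdf", "password" => "secret-B", "title" => "Annex"}
```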

Precedence: entry.password (per-file YAML) > input_password (global YAML) > --input-password CLI > empty.

A source PDF's owner password works just as well as the user password (the pdf shard tries both automatically via Algorithm 7 for V<5 and Algorithm 2.A for V=5).

[NOTE]

Security recommendation: if the YAML is versioned (Git), leave the user_password, owner_password, input_password fields and per-file password: entries empty in the file, and pass the secrets via CLI only. For CI/CD workflows, use environment variables and a shell wrapper.

==== Preventing modification (without preventing reading)

That's exactly what permissions do on the encryption side. The PDF is readable by everyone (empty user password), but Acrobat / Foxit / Preview / pdftk refuse any modification until the user enters the owner password.

Recipe:

[source,yaml]
----
encrypt:
  enabled: true
  level: aes_256
  user_password: ""          # ← empty: free read access
  owner_password: "secret"   # ← required to modify
  permissions: [print]       # ← only printing allowed
----

Or via CLI:

[source,shell]
----
crystal-combine-pdf encrypt rapport.pdf -w "secret" -p print
----

Available permissions:

[cols="1,3"]
|===
| Permission | Effect when listed

| print | Print the document (low + high res)
| copy | Select / copy text or images
| modify | Edit page content
| annotate | Add or modify annotations (comments, highlights)
|===

CLI shortcuts: none (no permissions), all (every permission — equivalent to omitting the flag).

[IMPORTANT]

Trust model. PDF permissions are honor-system: the spec asks viewers and editors to respect them, and Acrobat/Foxit/Preview/pdftk do. But a determined attacker can strip /Encrypt from a PDF; permissions are not cryptographically enforced.

For cryptographic integrity protection, you need a digital signature (see below).

==== Digital signatures (PAdES) — roadmap

Signatures detect any modification of the PDF using asymmetric cryptography:

. The signer computes a SHA-256 / SHA-384 hash over the PDF bytes (excluding the signature itself).
. They encrypt that hash with their private key (PKCS#7 / CMS / CAdES) and embed it in a /Sig dictionary inside the PDF.
. Any verifier can recompute the hash and compare it with the hash decrypted using the certificate's public key.
. Any single-byte change invalidates the signature — Acrobat shows it in red.
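
The hashing step can be illustrated with Crystal's standard library alone. This is a toy sketch, not PAdES and not part of the shard: it only shows that a digest computed over everything except a reserved span changes when any covered byte changes. `digest_excluding` and the placeholder "document" strings are hypothetical.

```crystal
require "digest/sha256"

# Hypothetical sketch: hash a document while skipping the byte span
# reserved for the signature value itself.
def digest_excluding(data : Bytes, sig_start : Int32, sig_len : Int32) : String
  digest = Digest::SHA256.new
  digest.update(data[0, sig_start])
  tail_start = sig_start + sig_len
  digest.update(data[tail_start, data.size - tail_start])
  digest.hexfinal
end

original = "%PDF-1.7 body <SIGNATURE> trailer".to_slice
tampered = "%PDF-1.7 b0dy <SIGNATURE> trailer".to_slice
sig_start = "%PDF-1.7 body ".bytesize
sig_len = "<SIGNATURE>".bytesize

puts digest_excluding(original, sig_start, sig_len) ==
     digest_excluding(tampered, sig_start, sig_len) # => false
```

A real signature additionally wraps the digest in a CMS structure signed with the private key, which this sketch deliberately leaves out.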

Difference with encryption:

[cols="1,1,1"]
|===
| | Encryption (/Encrypt) | Signature (/Sig PAdES)

| Prevents reading | ✅ | ❌
| Detects modification | ⚠️ via permissions | ✅ (cryptographic)
| Proves authenticity | ❌ | ✅
| Long-term archival | ❌ | ✅ (PAdES B-LTA)
|===

PAdES levels (ETSI EN 319 142):

  • B-B — basic signature (detached PKCS#7)
  • B-T — B-B + TSA timestamp
  • B-LT — B-T + long-term validation material (CRL/OCSP)
  • B-LTA — B-LT + archive timestamp (indefinitely renewable)

Current state of aloli-crystal/pdf: reading detects /Sig but doesn't validate the signature; writing doesn't produce one. Support is on the roadmap (see prod-crystal/crystal-combine-pdf/ROADMAP.adoc § "PAdES signatures"). Estimated effort: M for B-B (PKCS#7 via OpenSSL), L for B-T (TSA HTTP), XL for B-LT/LTA (CRL/OCSP collection, archival).

In the meantime: for a PDF that needs to prove its integrity in a legal context, sign it after the build with a third-party tool (gpg --detach-sign, Adobe Acrobat, eSign Suite, etc.).

== Legacy sub-commands

The original number, merge, assemble sub-commands remain available for scripted workflows.

=== assemble — the booklet workflow (most users want this)

[source,shell]
----
# Take three partition PDFs, merge them, number the result with
# auto-detected partition sizes (so each partition gets its
# intra-partition "1/4", "2/4" marks where relevant).
crystal-combine-pdf assemble \
  partition1.pdf partition2.pdf partition3.pdf \
  -o livret.pdf
----

=== merge — pure concatenation, no numbering

[source,shell]
----
crystal-combine-pdf merge p1.pdf p2.pdf p3.pdf -o output.pdf
----

=== number — number an already-assembled PDF

[source,shell]
----
# Default: format "N/T" in the bottom-right corner.
crystal-combine-pdf number booklet.pdf

# Same, with intra-partition marks. Partition sizes are given in
# 1-based page order — the sum must equal the total page count.
# Here: pages 1-4 = partition 1 (so each gets "1/4" through "4/4"),
# pages 5-6 = partition 2, page 7 = partition 3 (single-page → no
# intra-partition mark by default).
crystal-combine-pdf number booklet.pdf --partitions 4,2,1

# Skip the cover (page 1) so it stays untouched.
crystal-combine-pdf number booklet.pdf --skip 1

# Custom format and red colour.
crystal-combine-pdf number booklet.pdf \
  --global-format "Page %page% of %total%" \
  --partition-format "(%page% / %total%)" \
  --color "0.7,0.0,0.0" \
  --font-size 12
----

=== Options

[cols="1,3"]
|===
| Option | Description

| -o FILE, --output FILE
| Output path. Defaults to <input>-numbered.pdf next to the input.

| --partitions N,N,…
| Comma-separated list of partition sizes, in 1-based page order. Sum must equal the PDF page count.

| --skip PAGES
| Comma-separated list of 1-based page indices to leave un-numbered (typical: --skip 1 to spare the cover).

| --font-size SIZE
| Point size of the rendered numbers. Default 10.

| --margin PT
| Inset from the page edge, in points. Default 24.

| --color R,G,B
| RGB triplet, components 0.0–1.0. Default 0.2,0.2,0.2.

| --global-format FMT
| Format string for the global page number. Placeholders: %page%, %total%. Default "%page%/%total%".

| --partition-format FMT
| Format string for the intra-partition number. Same placeholders. Default "%page%/%total%".

| --show-single-partitions
| Render the intra-partition number even when the partition has a single page. Off by default.

| -h, --help, -v, --version
| Standard.
|===
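
To see how --partitions, --partition-format and --show-single-partitions interact, the following sketch expands a list of partition sizes into one mark per page. It is an illustration under assumptions, not the CLI's code: `partition_labels` and its parameters are hypothetical names.

```crystal
# Hypothetical sketch: one entry per page, nil where no mark is drawn.
def partition_labels(sizes : Array(Int32), format : String = "%page%/%total%",
                     show_single : Bool = false) : Array(String?)
  labels = [] of String?
  sizes.each do |size|
    (1..size).each do |page|
      if size == 1 && !show_single
        labels << nil # single-page partition: no mark by default
      else
        labels << format.gsub("%page%", page.to_s).gsub("%total%", size.to_s)
      end
    end
  end
  labels
end

p partition_labels([4, 2, 1])
# => ["1/4", "2/4", "3/4", "4/4", "1/2", "2/2", nil]
p partition_labels([2, 1], "(%page% / %total%)", show_single: true)
# => ["(1 / 2)", "(2 / 2)", "(1 / 1)"]
```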

== API

=== ISO-compatible API (mirrors the Ruby gem)

[source,crystal]
----
require "crystal-combine-pdf"

# Module entry points
CombinePDF.new                          # => CombinePDF::PDF (empty)
CombinePDF.load(path : String)          # => CombinePDF::PDF (from file)
CombinePDF.parse(data : Bytes)          # => CombinePDF::PDF (from bytes)

# Instance API
pdf << other                            # append (path or PDF) — chainable
pdf >> other                            # prepend (path or PDF) — chainable
pdf.insert(location, other)             # location: Int32 (-1 = append, 0 = prepend)
pdf.remove(page_index)                  # negative indices supported
pdf.pages                               # => Array(::PDF::Objects::Reference)
pdf.page_count                          # => Int32
pdf.new_page(mediabox = [0, 0, 612, 792], location = -1)
pdf.title = "…"
pdf.author = "…"
pdf.number_pages(partitions, options)   # in-place
pdf.save("out.pdf")
pdf.to_pdf                              # => Bytes
----

=== Crystal-specific bonus helpers

[source,crystal]
----
# One-shot multi-PDF concatenation
CombinePDF.merge(["a.pdf", "b.pdf"], "out.pdf")

# Number an already-assembled PDF on disk
CombinePDF.number(
  input: "booklet.pdf",
  output: "booklet-numbered.pdf",
  partitions: [4, 2, 1],
  options: CombinePDF::Options.new(
    font_size: 11.0,
    color: {0.2, 0.2, 0.2},
    margin: 24.0,
    skip_pages: [1],
  ),
)

# End-to-end booklet assembly (merge + number with auto-detected
# partition sizes — each input file = one partition)
CombinePDF.assemble(
  inputs: ["partition1.pdf", "partition2.pdf", "partition3.pdf"],
  output: "livret.pdf",
)
----

== Limitations (v1.0.31.1)

  • No deduplication of common resources when merging — a font or image embedded in N source PDFs is copied N times to the output. Result is correct but slightly larger than an optimising merger would produce. Targeted for v0.3.
  • Numbering is rendered with the standard Type1 Helvetica (no embedded font, no Unicode beyond ASCII digits + /).
  • The output of number uses an incremental update (PDF spec § 7.5.6); merge and assemble write a fresh PDF from scratch. Both are handled correctly by pdf::Reader v0.3.4+ and every real-world PDF reader.

== Development

[source,shell]
----
shards install
crystal spec
bin/ameba
crystal tool format src/ spec/
----

== License

MIT — see link:LICENSE[LICENSE].
