
= pdf-validate
:toc: macro
:icons: font

Declarative PDF conformance validator for the ALOLI ISO PDF suite, milestone J5 (see `pdf/doc/RATIONALE.adoc`). A generic engine evaluates rule sets (YAML, one per profile) against a parsed PDF. Every rule traces to its ISO clause, so a report doubles as an audit trail. It is a focused, veraPDF-style validator, restricted to the profiles that ALOLI targets.

toc::[]

== Status

The engine parses the PDF with PDF::Reader plus three lower-level analysers (a raw-byte scanner, a content-stream scanner and font-program parsers), then evaluates a YAML rule set against the result. Each rule's clause reference traces to veraPDF's open encoding of the ISO standard, and coverage is measured against the matching veraPDF 1.30.1 profile.
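The following sketch shows why a raw-byte scanner is needed next to PDF::Reader. It implements one byte-level test and assumes the PDF/A-1 file-header requirement (ISO 19005-1 § 6.1.2): `%PDF-` must appear at byte offset 0, and the header line must be immediately followed by a comment made of `%` and at least four bytes above 127. An object-level parser generally skips comments, so this requirement can only be checked on the raw bytes. The method `binary_comment_ok?` and its constants are hypothetical and not part of pdf-validate.

[source,crystal]
----
# Hypothetical raw-byte check (not pdf-validate's API). Assumed rule, after
# ISO 19005-1 § 6.1.2: "%PDF-" at byte offset 0, and the header line is
# immediately followed by a comment: "%" plus at least four bytes > 127.
CR      = 0x0D_u8
LF      = 0x0A_u8
PERCENT = 0x25_u8

def binary_comment_ok?(bytes : Bytes) : Bool
  magic = "%PDF-".to_slice
  return false if bytes.size < magic.size || bytes[0, magic.size] != magic

  # Skip to the end of the header line.
  i = magic.size
  while i < bytes.size && bytes[i] != CR && bytes[i] != LF
    i += 1
  end
  return false if i >= bytes.size

  # Consume the end-of-line marker: CR, LF or CR LF.
  i += 1 if bytes[i] == CR && i + 1 < bytes.size && bytes[i + 1] == LF
  i += 1

  # The second line must open with '%' ...
  return false if i >= bytes.size || bytes[i] != PERCENT

  # ... followed by at least four bytes whose decimal value exceeds 127.
  high = 0
  j = i + 1
  while j < bytes.size && bytes[j] > 127
    high += 1
    j += 1
  end
  high >= 4
end
----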

.Supported profiles
[cols="1,3,1",options="header"]
|===
| Profile | Standard | Clause coverage vs veraPDF

| pdf-a-1b | PDF/A-1b (ISO 19005-1:2005) | 118/129 = 91 %
| pdf-a-2b | PDF/A-2b (ISO 19005-2:2011) | 143/144 = 99 %
| pdf-a-3b | PDF/A-3b (ISO 19005-3:2012) | 145/146 = 99 %
| pdf-ua-1 | PDF/UA-1 (ISO 14289-1) | per-element checks
|===
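The coverage column can be read as the number of clauses covered divided by the number of clauses in the matching veraPDF 1.30.1 profile, expressed as a whole percentage. The sketch below reproduces the table's figures. `coverage_percent` is a hypothetical helper, and round-to-nearest is an assumption; truncation gives the same values here.

[source,crystal]
----
# Hypothetical helper: clause coverage as a whole-number percentage.
# Assumes round-to-nearest; truncation yields the same values for this table.
def coverage_percent(covered : Int32, total : Int32) : Int32
  raise ArgumentError.new("total must be positive") unless total > 0
  (covered * 100.0 / total).round.to_i
end

# {covered, total} per profile, copied from the "Supported profiles" table.
COVERAGE = {
  "pdf-a-1b" => {118, 129},
  "pdf-a-2b" => {143, 144},
  "pdf-a-3b" => {145, 146},
}

COVERAGE.each do |profile, counts|
  covered, total = counts
  puts "#{profile}: #{covered}/#{total} = #{coverage_percent(covered, total)} %"
end
----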

Discipline: every rule is validated against the real veraPDF oracle, with zero false positives on conformant corpora. A rule whose exact semantics cannot be confirmed against veraPDF is not shipped. Instead, it is documented as a principled exclusion (see `doc/iso-19005-2-gap-analysis.adoc`).
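The zero-false-positive requirement amounts to a differential comparison: on a corpus that veraPDF accepts, any file that pdf-validate rejects is a false positive. The minimal sketch below works over per-file verdicts and assumes both tools have already been run. The `Verdicts` alias and the `false_positives` helper are hypothetical.

[source,crystal]
----
# Hypothetical: per-file verdicts, where true means "conformant".
alias Verdicts = Hash(String, Bool)

# Files the oracle (veraPDF) accepts but the rule set under test rejects.
def false_positives(ours : Verdicts, oracle : Verdicts) : Array(String)
  accepted = oracle.select { |_, conformant| conformant }.keys
  # A file missing from `ours` was not evaluated, so it is not counted here.
  accepted.select { |file| ours[file]? == false }.sort
end

oracle = {"a.pdf" => true, "b.pdf" => true, "c.pdf" => false}
ours = {"a.pdf" => true, "b.pdf" => false, "c.pdf" => false}

# c.pdf is rejected by both tools, so only b.pdf is a false positive.
fps = false_positives(ours, oracle)
puts fps.empty? ? "no false positives" : "false positives: #{fps.join(", ")}"
----

In a CI gate, a non-empty result would block the rule from shipping.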

== Installation

Add the dependency to your `shard.yml`, then run `shards install`:

[source,yaml]
----
dependencies:
  pdf-validate:
    github: aloli-crystal/pdf-validate
    version: "~> 0.1.0"
----

== Library usage

[source,crystal]
----
require "pdf-validate"

report = PDF::Validate.file("doc.pdf", profile: "pdf-a-2b")
puts report          # human-readable
puts report.to_json  # machine-readable
report.conformant?   # => true / false
report.failures      # => Array(Result) of failed rules
----

== CLI

[source,console]
----
$ pdf-validate doc.pdf -p pdf-a-2b
$ pdf-validate doc.pdf --json
$ pdf-validate help
----

Exit codes: `0` conformant, `1` violations found, `2` usage or I/O error.
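A wrapper script or CI job can branch on these exit codes. The minimal sketch below maps a status code to the documented outcomes. The `Outcome` enum and `outcome_for` are hypothetical names, and any code other than 0, 1 or 2 is treated as unexpected.

[source,crystal]
----
# Hypothetical names; only the numeric codes come from the pdf-validate docs.
enum Outcome
  Conformant # exit code 0
  Violations # exit code 1: at least one rule failed
  UsageError # exit code 2: usage or I/O error
  Unexpected # any other code
end

def outcome_for(exit_code : Int32) : Outcome
  case exit_code
  when 0 then Outcome::Conformant
  when 1 then Outcome::Violations
  when 2 then Outcome::UsageError
  else        Outcome::Unexpected
  end
end

# Usage (needs pdf-validate on PATH, so it is shown as a comment):
#   status = Process.run("pdf-validate", ["doc.pdf", "-p", "pdf-a-2b"],
#     output: Process::Redirect::Inherit, error: Process::Redirect::Inherit)
#   exit 1 unless outcome_for(status.exit_code).conformant?
----

Keeping "violations" distinct from "usage or I/O error" lets a pipeline report a non-conformant file differently from a misconfigured job.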

== Adding rules

A rule set is a YAML list. Each rule names a primitive from the engine's check vocabulary and traces to its ISO clause:

[source,yaml]
----
- id: pdfa2-6.2.2-output-intent
  clause: "ISO 19005-2 § 6.2.2"
  title: "Document declares an OutputIntent"
  severity: error
  check: catalog_key_present
  args: ["OutputIntents"]
----
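Each rule maps onto a small typed record. The sketch below loads a rule list with Crystal's standard `YAML::Serializable` and rejects duplicate ids. The `Rule` struct and the assumption that `args` is always a list of strings are inferred from the example rule; they are illustrations, not pdf-validate's actual loader.

[source,crystal]
----
require "yaml"

# Hypothetical mirror of one rule entry; field names follow the example rule.
struct Rule
  include YAML::Serializable

  getter id : String
  getter clause : String   # ISO clause the rule traces to
  getter title : String
  getter severity : String # e.g. "error"
  getter check : String    # name of a check primitive
  getter args : Array(String) = [] of String # assumed: always strings
end

SAMPLE_RULES = <<-RULESET
  - id: pdfa2-6.2.2-output-intent
    clause: "ISO 19005-2 § 6.2.2"
    title: "Document declares an OutputIntent"
    severity: error
    check: catalog_key_present
    args: ["OutputIntents"]
  RULESET

rules = Array(Rule).from_yaml(SAMPLE_RULES)

# Reject duplicate ids so each reported result maps back to exactly one rule.
ids = rules.map(&.id)
raise "duplicate rule id" unless ids.uniq.size == ids.size

rules.each do |rule|
  puts "#{rule.id}: #{rule.check}(#{rule.args.join(", ")}) [#{rule.clause}]"
end
----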

Current check vocabulary: `catalog_key_present`, `catalog_key_absent`, `trailer_key_present`, `trailer_key_absent`, `xmp_contains`, `xmp_matches`. New ISO requirements are expressed by adding primitives in `src/pdf-validate/checks.cr`.
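The sketch below makes the vocabulary concrete by dispatching named primitives against a simplified document model. `Doc`, `PRIMITIVES` and `evaluate` are hypothetical; the real primitives in `src/pdf-validate/checks.cr` operate on a parsed PDF and may take different arguments. The sketch also makes three assumptions: `xmp_contains` is a substring test, `xmp_matches` is a regular-expression test, and a rule passes only when its test holds for every argument.

[source,crystal]
----
# Hypothetical, simplified view of a parsed PDF (not pdf-validate's model).
record Doc, catalog_keys : Set(String), trailer_keys : Set(String), xmp : String

# A primitive receives the document and the rule's args and returns pass/fail.
alias Primitive = Proc(Doc, Array(String), Bool)

# Assumed semantics: a rule passes only if the test holds for every argument.
PRIMITIVES = {
  "catalog_key_present" => Primitive.new { |d, a| a.all? { |k| d.catalog_keys.includes?(k) } },
  "catalog_key_absent"  => Primitive.new { |d, a| a.none? { |k| d.catalog_keys.includes?(k) } },
  "trailer_key_present" => Primitive.new { |d, a| a.all? { |k| d.trailer_keys.includes?(k) } },
  "trailer_key_absent"  => Primitive.new { |d, a| a.none? { |k| d.trailer_keys.includes?(k) } },
  "xmp_contains"        => Primitive.new { |d, a| a.all? { |s| d.xmp.includes?(s) } },
  "xmp_matches"         => Primitive.new { |d, a| a.all? { |re| !Regex.new(re).match(d.xmp).nil? } },
}

# Evaluate one rule; the clause travels with the outcome as an audit-trail entry.
def evaluate(doc : Doc, clause : String, check : String, args : Array(String)) : {String, Bool}
  primitive = PRIMITIVES[check]? || raise ArgumentError.new("unknown check primitive: #{check}")
  {clause, primitive.call(doc, args)}
end

doc = Doc.new(
  catalog_keys: Set{"Type", "Pages", "Metadata"},
  trailer_keys: Set{"Root", "Info", "ID"},
  xmp: %(<rdf:Description pdfaid:part="2" pdfaid:conformance="B"/>)
)
puts evaluate(doc, "ISO 19005-2 § 6.2.2", "catalog_key_present", ["OutputIntents"])
----

Keeping the clause next to the boolean outcome is what lets a report double as an audit trail: every failure names the ISO requirement it violates.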

== License

MIT. See LICENSE.
