dms-cr

Crystal parser for DMS, a data syntax with strong typing, ordered maps, multi-line heredocs, and front-matter metadata.

This shard is a line-for-line port of the Python reference (dms-py), which itself follows the Rust reference (dms-rs). All seven ports check against the same fixture corpus.

What DMS looks like

A medium-size tier-0 document, exercising every feature you'd touch in a real config — front matter, comments (line + trailing), nested tables, list-of-tables with the + marker, flow forms, distinct types, and a heredoc with a trim modifier:

+++
title:    "DMS feature tour"
version:  "1.0.0"
updated:  2026-04-24T09:30:00-04:00
+++

# Hash and // line comments both work.
// Bare keys allow full Unicode; quoted keys take any string.

database:
  host:    "db.internal"
  port:    5432            # bumped after the LB change
  pool:    { size: 10, idle_timeout_s: 30 }   # flow table

servers:
  + name: "web1"
    disks:
      + mount: "/"
        size_gb: 100
      + mount: "/var"
        size_gb: 500
  + name: "web2"

regions: ["us-east-1", "eu-west-1", "ap-south-1"]

sql: """SQL _trim("\n", ">")
    SELECT id, email
      FROM users
     WHERE active = true
    SQL

Tier 1 layers structured decorators on top of the value tree. Sigils bind to families published by a dialect; here is dms+html carrying an HTML fragment as a DMS document:

+++
_dms_tier: 1
_dms_imports:
  + dialect: "html"
    version: "1.0.0"
+++

+ |html(lang: "en")
  + |head
    + |title "DMS feature tour"
    + |meta(charset: "UTF-8")
  + |body(class: "main")
    + |h1 "Welcome to DMS"
    + |p(class: "lede")
      + "Click "
      + |a(href: "/spec.html") "here"
      + " to read the spec."

The full feature tour, format comparison, and dialect index are on the DMS website.

Requirements

  • Crystal 1.10 or newer

Install

In your shard.yml:

dependencies:
  dms:
    gitlab: flo-labs/pub/dms-cr
    version: ~> 0.5.2

Then:

shards install

Usage

require "dms"

src = File.read("config.dms")

# Body-only (drops front matter and comments after decode).
body = Dms.decode(src)

# Full document (preserves comments + literal forms for `encode` round-trip).
doc  = Dms.decode_document(src)
doc.meta            # Dms::Table | Nil  — nil when there is no `+++` block
doc.body            # the decoded root value (Dms::Value)
doc.comments        # Array(Dms::AttachedComment)
doc.original_forms  # round-trip side-channel records

# Re-emit DMS source. Raises Dms::EncodeError if doc carries an
# UnorderedTable in full mode (use encode_lite for that case).
out = Dms.encode(doc)

Front-matter-only decode

Some callers need only the document's metadata: config loaders checking _dms_tier, indexers harvesting user keys, dispatchers choosing a downstream decoder. For them, Dms.decode_front_matter parses the +++ ... +++ block and stops, leaving the body bytes untokenized. SPEC tier 0 requires this entry point. Validation inside the front-matter (FM) block is identical to a full decode: the opening and closing +++ must sit on their own lines, the _-prefix namespace is enforced, and an unterminated block is an error. Because the body is never tokenized, errors in it go unreported.

case meta = Dms.decode_front_matter(src)
when Nil
  # document has no `+++` front-matter block
when Dms::Table
  # empty Hash => present-but-empty FM (`+++\n+++`),
  # distinguishable from `nil` above.
  if (title = meta["title"]?).is_a?(String)
    puts "title: #{title}"
  end
end
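As an illustration of the dispatcher use case, the sketch below reads only the front matter and then picks a decoder from _dms_tier. The helper names declared_tier and summarize are hypothetical, and treating a missing block or a missing or non-integer _dms_tier as tier 0 is an assumption made for this example, not something the text above specifies.

```crystal
require "dms"

# Hypothetical helper: report the tier declared in the front matter.
# Assumption: no FM block, or no integer _dms_tier key, means tier 0.
def declared_tier(src : String) : Int64
  meta = Dms.decode_front_matter(src)
  return 0_i64 if meta.nil?

  tier = meta["_dms_tier"]?
  tier.is_a?(Int64) ? tier : 0_i64
end

# Hypothetical helper: choose the decoder only after peeking at the tier.
def summarize(src : String) : String
  if declared_tier(src) >= 1
    doc = Dms::Tier1.decode_t1(src)   # tier-1 path: decorators + imports
    "tier 1, #{doc.imports.size} import(s)"
  else
    doc = Dms.decode_document(src)    # tier-0 path
    "tier 0, #{doc.comments.size} comment(s)"
  end
end

puts summarize("+++\ntitle: \"demo\"\n+++\nport: 5432\n")
```

The body is tokenized once, by whichever decoder is chosen; the front-matter peek itself never touches it.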

The pre-v0.14 names (Dms.parse, Dms.parse_document, Dms.to_dms, Dms.to_dms_lite, …) remain as @[Deprecated] wrappers and continue to work; new code should use decode / encode.

Public API

Top-level entry points on Dms:

Method                                 Purpose
Dms.decode(src)                        Body-only decode → Dms::Value
Dms.decode_document(src)               Full decode (body + meta + comments + forms)
Dms.decode_lite(src)                   Body-only, no comment/form sidecar
Dms.decode_lite_document(src)          Lite full decode
Dms.decode_document_unordered(src)     Full decode with UnorderedTable (HashMap-style)
Dms.decode_lite_document_unordered     Lite + unordered
Dms.decode_front_matter(src)           FM-only, body untokenized → Dms::Table?
Dms.encode(doc)                        Re-emit DMS source (full round-trip)
Dms.encode_lite(doc)                   Re-emit canonical form (lossy: drops comments/forms)
Dms::Tier1.decode_t1(src)              Tier-1 decode → Dms::DocumentT1
Dms::ConformanceEncoder.encode(doc)    DMS → tagged-JSON for the conformance runner

Capability flags: Dms::SUPPORTS_LITE_MODE, Dms::SUPPORTS_IGNORE_ORDER.

Value shape

DMS type          Crystal type
bool              Bool
integer           Int64
float             Float64
string            String
local-date        Dms::LocalDate
local-time        Dms::LocalTime
local-datetime    Dms::LocalDateTime
offset-datetime   Dms::OffsetDateTime
table             Dms::Table (= Hash(String, Dms::Value))
list              Dms::List (= Array(Dms::Value))
unordered table   Dms::UnorderedTable (subclass of Table)

The Dms::Value alias unions every variant above. Datetime structs (Dms::LocalDate and friends) wrap the source lexeme as a String — already SPEC-validated by the parser, so you never re-parse to inspect them. Tables use Crystal's insertion-ordered Hash. UnorderedTable is a marker subclass: pattern-match it before Hash in case … when ordering, since the more-specific subtype must win.
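The ordering rule can be sketched as follows. The helper kind_of is hypothetical; it uses only the type names from the table above, and its else branch assumes that anything not matched earlier is one of the four datetime structs.

```crystal
require "dms"

# Hypothetical helper: name the variant held by a decoded value.
def kind_of(v : Dms::Value) : String
  case v
  when Dms::UnorderedTable then "unordered table" # subtype first
  when Dms::Table          then "table"           # would also match UnorderedTable
  when Dms::List           then "list"
  when String              then "string"
  when Int64               then "integer"
  when Float64             then "float"
  when Bool                then "bool"
  else                          "date/time"       # assumption: the Dms:: datetime structs
  end
end

puts kind_of(Dms.decode("port: 5432\n"))
```

Swapping the first two branches would still compile, but an UnorderedTable would then be reported as a plain table, because the Dms::Table branch matches subclasses as well.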

Working with comments and original forms

DMS preserves comments through decode → mutate → re-emit (SPEC §Comments). The Document carries them on a side-channel keyed by breadcrumb path; the same shape lets you attach a comment to a value after decoding and have it round-trip through encode:

require "dms"

doc = Dms.decode_document("db:\n  port: 8080\n")

# Mutate a value in place.
if (t = doc.body).is_a?(Dms::Table)
  if (db = t["db"]?).is_a?(Dms::Table)
    db["port"] = 5432_i64
  end
end

# Attach a leading line comment to db.port.
doc.comments << Dms::AttachedComment.new(
  Dms::Comment.new("# bumped after LB change", Dms::CommentKind::Line),
  Dms::CommentPosition::Leading,
  ["db".as(Dms::PathSeg), "port".as(Dms::PathSeg)],
)

puts Dms.encode(doc)

Forcing a heredoc on emit

Strings parse and re-emit in their source form. To switch a basic-quoted string to a heredoc (or to construct one from scratch), push an OriginalLiteral.string(...) record onto doc.original_forms keyed by the value's path:

form = Dms::StringForm.heredoc(
  Dms::HeredocFlavor::BasicTriple,    # or LiteralTriple for '''
  nil,                                 # label, e.g. "EOF"
  [] of Dms::HeredocModifierCall,      # _trim(...), _fold_paragraphs(), …
)
doc.original_forms << {
  ["db".as(Dms::PathSeg), "greeting".as(Dms::PathSeg)],
  Dms::OriginalLiteral.string(form),
}

Round-trip rules (SPEC §Round-trip semantics): comments stick to still-present nodes; deleting a node drops its comments; newly inserted nodes start with no comments. The first original_forms entry per path wins, so override the parser-recorded form by replacing rather than appending if the key is already present.
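A minimal sketch of replacing rather than appending, under two assumptions: that original_forms entries are {path, literal} tuples (inferred from the append example above), and that the parser has already recorded a form for the decoded string. The sample document is made up for illustration.

```crystal
require "dms"

doc = Dms.decode_document("db:\n  greeting: \"hello\"\n")

path = ["db".as(Dms::PathSeg), "greeting".as(Dms::PathSeg)]
form = Dms::StringForm.heredoc(
  Dms::HeredocFlavor::BasicTriple,
  nil,
  [] of Dms::HeredocModifierCall,
)

# Drop whatever is already recorded for this path, then add the override.
# Assumption: each entry is a {path, literal} tuple, so entry[0] is the path.
doc.original_forms.reject! { |entry| entry[0] == path }
doc.original_forms << {path, Dms::OriginalLiteral.string(form)}

puts Dms.encode(doc)
```

Removing the old entry first means the new record is the only one for that path, so the first-entry-wins rule cannot pick the parser's original form.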

Tier 1: decorators and dialects

Tier-1 source carries dialect imports + decorator calls (|tag(...), @expr(...), etc.). See TIER1.md for the full spec. dms-cr currently ships a tier-1 batch decoder; the encoder side is tracked for a future release.

src = File.read("page.dms.html")

doc = Dms::Tier1.decode_t1(src)
doc.t0          # Dms::Document — the underlying tier-0 tree
doc.imports     # Array(Dms::ImportSpec)
doc.decorators  # Array(Dms::DecoratorEntry) — sidecar keyed by path

Errors

Decode-side failures raise Dms::DecodeError, which carries one-based line and column getters and formats its message as line:col: message:

begin
  doc = Dms.decode_document(src)
rescue e : Dms::DecodeError
  STDERR.puts "parse failed at #{e.line}:#{e.column}: #{e.message}"
end

Encode-side failures raise Dms::EncodeError — currently raised only by full-mode encode when the input Document carries an UnorderedTable (those have arbitrary iteration order, so a stable round-trip cannot be promised). Use Dms.encode_lite for canonical emit on unordered Documents.
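The fallback described above could look like this. The helper name emit is hypothetical; it relies only on Dms.encode, Dms.encode_lite, and Dms::EncodeError as documented here.

```crystal
require "dms"

# Hypothetical helper: prefer the full round-trip emit, and fall back to the
# lossy canonical form when full-mode encode refuses the document.
def emit(doc)
  Dms.encode(doc)
rescue Dms::EncodeError
  Dms.encode_lite(doc)
end

# An unordered decode carries an UnorderedTable, so the fallback is taken.
doc = Dms.decode_document_unordered("b: 1\na: 2\n")
puts emit(doc)
```

The fallback drops comments and original forms, so it suits cases where canonical output is acceptable.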

The pre-v0.3.0 name Dms::ParseError survives as a deprecated alias of Dms::DecodeError.

When to use which decoder

Goal                                                      Entry point
Read config, no re-emit                                   Dms.decode
Read + re-emit, preserving comments / heredoc form        Dms.decode_document + Dms.encode
Read only the FM block (dispatch, schema check, index)    Dms.decode_front_matter
Tier-1 source (decorators, dialect imports)               Dms::Tier1.decode_t1
Speed over round-trip fidelity                            Dms.decode_lite / decode_lite_document
Don't care about table order (HashMap-ish)                Dms.decode_document_unordered

Build & test

shards install         # pulls the toml dep used by the bench harness
shards build           # produces bin/dms-encoder + bench targets
crystal spec           # runs the spec suite

Build targets declared in shard.yml:

Target            Source                   Purpose
dms-encoder       src/dms-encoder.cr       DMS → tagged-JSON conformance encoder
bench-parse-dms   bench/parse_dms.cr       Parse-only DMS micro-benchmark
bench-parse-json  bench/parse_json.cr      Parse-only JSON benchmark (baseline)
bench-parse-yaml  bench/parse_yaml.cr      Parse-only YAML benchmark (baseline)
bench-parse-toml  bench/parse_toml.cr      Parse-only TOML benchmark (baseline)
bench-formats-cr  bench/bench_formats.cr   Cross-format wall-clock comparison

Conformance

The fixture corpus lives in dms-tests (4500+ pairs). Clone it once as a sibling:

cd ..
git clone https://gitlab.com/flo-labs/pub/dms-tests.git

Then build the encoder and run the sweep:

shards build --release dms-encoder
python3 ../dms-tests/run_conformance.py bin/dms-encoder

The dms-encoder binary reads DMS from stdin and writes tagged JSON to stdout, matching the format the conformance runner consumes. dms-tests can also drive every implementation in one shot — see its README for the cross-language workflow.
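For a quick manual check outside the runner, the stdin/stdout contract can be exercised from Crystal with the standard library's Process.run. This is only a sketch: the helper pipe_through is hypothetical, the sample input is made up, and the assumption that the encoder exits non-zero on invalid input is not confirmed by the text above.

```crystal
# Hypothetical helper: feed `source` to a command on stdin and return what it
# wrote to stdout, or nil if the command exited with a non-zero status.
def pipe_through(command : String, source : String) : String?
  captured = IO::Memory.new
  status = Process.run(
    command,
    input: IO::Memory.new(source),
    output: captured,
    error: Process::Redirect::Inherit,
  )
  status.success? ? captured.to_s : nil
end

encoder = "bin/dms-encoder"
if File.exists?(encoder)
  if json = pipe_through(encoder, "answer: 42\n")
    puts json
  else
    STDERR.puts "dms-encoder reported a failure"
  end
end
```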

Companion projects

Repo / shard  Purpose
dms           Spec, fixtures index, and the dialect registry
dms-tests     Cross-language conformance corpus + runner
dms-rs        Rust reference implementation
dms-py        Python reference (this port follows it line-for-line)

SPEC compliance

Every tier-0 feature in SPEC.md is implemented and exercised by the dms-tests corpus. Behavioural drift between ports is caught at the conformance gate, not at runtime. Tier-1 (decorators / dialects) is partially implemented — batch decode is shipped; encode is in progress.

License

Dual-licensed at your option:

Apache License 2.0

Links
Synced at

Sun, 10 May 2026 04:08:29 GMT

Languages