toon-crystal v0.2.0

Crystal implementation of the Token-Oriented Object Notation (TOON) format

TOON for Crystal


Token-Oriented Object Notation is a compact, human-readable format designed for passing structured data to Large Language Models with significantly reduced token usage.

This is a Crystal port of the TOON library, which was originally written in TypeScript; the Crystal code was ported from the Ruby library.

TOON excels at uniform complex objects: multiple fields per row, same structure across items. It borrows YAML's indentation-based structure for nested objects and CSV's tabular format for uniform data rows, then optimizes both for token efficiency in LLM contexts.

Why TOON?

AI is becoming cheaper and more accessible, but larger context windows allow for larger data inputs as well. LLM tokens still cost money, and standard JSON is verbose and token-expensive:

{
  "users": [
    { "id": 1, "name": "Alice", "role": "admin" },
    { "id": 2, "name": "Bob", "role": "user" }
  ]
}

TOON conveys the same information with fewer tokens:

users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user
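
A rough way to see the difference is to compare the size of the two encodings. The sketch below uses only Crystal's standard library and copies the TOON text from the example above as a string literal. Byte counts are only a proxy here: actual token counts depend on the tokenizer in use.

```crystal
require "json"

# The two-user payload from the example above, serialized as compact JSON.
json = {
  "users" => [
    {"id" => 1, "name" => "Alice", "role" => "admin"},
    {"id" => 2, "name" => "Bob", "role" => "user"},
  ],
}.to_json

# The equivalent TOON text, copied verbatim from the example above.
toon = "users[2]{id,name,role}:\n  1,Alice,admin\n  2,Bob,user"

puts "JSON: #{json.bytesize} bytes"
puts "TOON: #{toon.bytesize} bytes"
```

The saving comes from declaring the field names once in the header instead of repeating them for every row, so it grows with the number of rows.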

Format Comparison

Format familiarity matters as much as token count.

  • CSV: best for uniform tables.
  • JSON: best for non-uniform data.
  • TOON: best for uniform complex (but not deeply nested) objects.

TOON switches to list format for non-uniform arrays. In those cases, JSON can be cheaper at scale.

Key Features

  • 💸 Token-efficient: typically 30–60% fewer tokens than JSON
  • 🤿 LLM-friendly guardrails: explicit lengths and field lists help models validate output
  • 🍱 Minimal syntax: removes redundant punctuation (braces, brackets, most quotes)
  • 📐 Indentation-based structure: replaces braces with whitespace for better readability
  • 🧺 Tabular arrays: declare keys once, then stream rows without repetition

Installation

Add this to your shard.yml:

dependencies:
  toon:
    github: mamantoha/toon-crystal

Then run:

shards install

Quick Start

require "toon"

data = {
  "user" => {
    "id" => 123,
    "name" => "Ada",
    "tags" => ["reading", "gaming"],
    "active" => true,
    "preferences" => [] of String
  }
}

puts Toon.encode(data)

Output:

user:
  id: 123
  name: Ada
  tags[2]: reading,gaming
  active: true
  preferences[0]:

You can also decode TOON back to Crystal values:

toon = <<-TOON
  user:
    id: 123
    name: Ada
    tags[2]: reading,gaming
    active: true
    preferences[0]:
  TOON

value = Toon.decode(toon)
# => {"user" => {"id" => 123, "name" => "Ada", "tags" => ["reading", "gaming"], "active" => true, "preferences" => []}}

Canonical Formatting Rules

TOON formatting is deterministic and minimal:

  • Indentation: 2 spaces per nesting level.
  • Lines:
    • key: value for primitives (single space after colon).
    • key: for nested/empty objects (no trailing space on that line).
  • Arrays:
    • Delimiter encoding: Comma delimiters are implicit in array headers (e.g., tags[3]:, items[2]{id,name}:). Tab and pipe delimiters are written explicitly in array headers (e.g., tags[3|]:, items[2|]{id|name}: for pipe; with a tab delimiter, a literal tab character takes the place of |).
    • Primitive arrays inline: key[N]: v1,v2 (comma) or key[N<delim>]: v1<delim>v2 (tab/pipe).
    • Tabular arrays: key[N]{f1,f2}: … (comma) or key[N<delim>]{f1<delim>f2}: … (tab/pipe).
    • List items: two spaces, hyphen, space ("  - …").
  • Whitespace invariants:
    • No trailing spaces at end of any line.
    • No trailing newline at end of output.
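
The header rules above can be sketched in a few lines of Crystal. The `array_header` method below is a hypothetical helper written for illustration; it is not part of the shard's API and only mirrors the delimiter rules as listed.

```crystal
# Hypothetical helper, not the shard's API: builds an array header that
# follows the delimiter rules listed above.
def array_header(key : String, length : Int32, delimiter : Char = ',',
                 fields : Array(String)? = nil) : String
  # The comma is the default and stays implicit; tab and pipe are written out.
  marker = delimiter == ',' ? "" : delimiter.to_s
  header = "#{key}[#{length}#{marker}]"
  # Tabular headers list the field names, joined by the active delimiter.
  header += "{#{fields.join(delimiter.to_s)}}" if fields
  header + ":"
end

puts array_header("tags", 3)                       # => tags[3]:
puts array_header("tags", 3, '|')                  # => tags[3|]:
puts array_header("items", 2, '|', ["id", "name"]) # => items[2|]{id|name}:
```

Because the delimiter is visible in the header, a reader of the text can tell how to split the rows without inspecting them first.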

Format Overview

Objects

Simple objects with primitive values:

Toon.encode({
  "id" => 123,
  "name" => "Ada",
  "active" => true
})

Output:

id: 123
name: Ada
active: true

Nested objects:

Toon.encode({
  "user" => {
    "id" => 123,
    "name" => "Ada"
  }
})

Output:

user:
  id: 123
  name: Ada

Arrays

Tip: TOON includes the array length in brackets (e.g., items[3]). With the default comma delimiter, the delimiter is implicit. With a tab or pipe delimiter, the delimiter appears in the header right after the length (e.g., tags[2|] for pipe; for tab, a literal tab character in the same position). This encoding helps LLMs identify the delimiter and track the number of elements, reducing errors when generating or validating structured output.
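
To illustrate why the declared length is useful, here is a minimal sketch of a length check for a single comma-delimited inline array line. `parse_inline_array` is a hypothetical helper, not the shard's parser: it ignores quoting, escaping, and non-comma delimiters, and only shows how a mismatch between the header and the data can be detected.

```crystal
# Hypothetical helper, not part of the toon shard: parses one comma-delimited
# inline array line and checks the declared length against the actual count.
def parse_inline_array(line : String) : Array(String)
  match = line.match(/\A(\w+)\[(\d+)\]: ?(.*)\z/)
  raise ArgumentError.new("not an inline array: #{line}") unless match

  declared = match[2].to_i
  body = match[3]
  # "".split(',') yields [""], so an empty body is handled separately.
  values = body.empty? ? Array(String).new : body.split(',')

  unless values.size == declared
    raise ArgumentError.new("declared #{declared} items, found #{values.size}")
  end
  values
end

p parse_inline_array("tags[3]: a,b,c") # => ["a", "b", "c"]

begin
  parse_inline_array("tags[3]: a,b")
rescue ex : ArgumentError
  puts ex.message # => declared 3 items, found 2
end
```

The same idea applies to model output: if a generated array drops or adds an element, the count in the header no longer matches and the problem can be caught before the data is used.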

Primitive Arrays (Inline)

Toon.encode({ "tags" => ["admin", "ops", "dev"] })

Output:

tags[3]: admin,ops,dev

Arrays of Objects (Tabular)

When all objects share the same primitive fields, TOON uses an efficient tabular format:

Toon.encode({
  "items" => [
    { "sku" => "A1", "qty" => 2, "price" => 9.99 },
    { "sku" => "B2", "qty" => 1, "price" => 14.5 }
  ]
})

Output:

items[2]{sku,qty,price}:
  A1,2,9.99
  B2,1,14.5
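
The condition "all objects share the same primitive fields" can be sketched as a small check. The `tabular?` method below is a hypothetical illustration built on `JSON::Any` from the standard library; it is not how the shard decides internally, and treating key order as irrelevant is an assumption made here.

```crystal
require "json"

# Hypothetical check, not the shard's implementation: an array qualifies for
# the tabular form when every element is an object with the same set of keys
# and every value is a primitive (no nested arrays or objects).
def tabular?(rows : JSON::Any) : Bool
  items = rows.as_a?
  return false unless items
  return false if items.empty?

  first = items.first.as_h?
  return false unless first
  keys = first.keys.sort # assumption: key order does not matter

  items.all? do |item|
    hash = item.as_h?
    next false unless hash
    hash.keys.sort == keys &&
      hash.values.all? { |v| v.as_a?.nil? && v.as_h?.nil? }
  end
end

uniform = JSON.parse(%([{"sku": "A1", "qty": 2}, {"sku": "B2", "qty": 1}]))
mixed = JSON.parse(%([{"sku": "A1", "qty": 2}, {"sku": "B2"}]))

puts tabular?(uniform) # => true
puts tabular?(mixed)   # => false
```

Arrays that fail a check like this are the ones TOON writes in list format, where JSON may be the cheaper choice.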

API

Toon.encode(value, *, indent = 2, delimiter = ',', length_marker = false)

Converts any value to TOON format.

Parameters:

  • value – Any value to encode (Hash, Array, primitives, or nested structures)
  • indent – Number of spaces per indentation level (default: 2)
  • delimiter – Delimiter for array values and tabular rows: ',', '\t', or '|' (default: ',')
  • length_marker – Optional marker to prefix array lengths: '#' or false (default: false)

Returns:

A TOON-formatted string with no trailing newline or spaces.

Examples:

# Basic usage
Toon.encode({ "id" => 1, "name" => "Ada" })
# => "id: 1\nname: Ada"

# Tabular arrays
items = [
  { "sku" => "A1", "qty" => 2, "price" => 9.99 },
  { "sku" => "B2", "qty" => 1, "price" => 14.5 }
]
Toon.encode({ "items" => items })
# => "items[2]{sku,qty,price}:\n  A1,2,9.99\n  B2,1,14.5"

# Custom delimiter (tab)
Toon.encode({ "items" => items }, delimiter: '\t')
# => "items[2\t]{sku\tqty\tprice}:\n  A1\t2\t9.99\n  B2\t1\t14.5"

# Length marker
Toon.encode({ "tags" => ["a", "b", "c"] }, length_marker: '#')
# => "tags[#3]: a,b,c"

Toon.decode(input, *, indent = 2, strict = true)

Parses a TOON-formatted string into native Crystal values.

Parameters:

  • input – TOON-formatted string
  • indent – Number of spaces per indentation level (default: 2)
  • strict – Enable validations for indentation, tabs, blank lines, and extra rows/items (default: true)

Returns:

A Crystal value (Nil | Bool | Int64 | Float64 | String | Array | Hash(String, _)).

Examples:

Toon.decode("tags[3]: a,b,c")
# => {"tags" => ["a", "b", "c"]}

Toon.decode("[2]{id}:\n  1\n  2")
# => [{"id" => 1}, {"id" => 2}]

Toon.decode("items[2]:\n  - id: 1\n    name: First\n  - id: 2\n    name: Second")
# => {"items" => [{"id" => 1, "name" => "First"}, {"id" => 2, "name" => "Second"}]}

Development

After checking out the repo, run:

shards install

Run the test suite:

crystal spec

Contributing

  1. Fork it (https://github.com/mamantoha/toon-crystal/fork)
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request

Contributors

License

The project is available as open source under the terms of the MIT License.

Credits

This is a Crystal port of the original TOON library by Johann Schopplich.
