Markout

Convert HTML to clean, readable Markdown in Crystal. Designed for content migration, static site generators, documentation tools, and LLM data preparation pipelines.

Features

Fast: 25-40x faster than Python alternatives (markdownify, html2text)
Clean output: Strips navigation, scripts, styles, and UI chrome automatically
LLM-ready: Produces clean markdown ideal for RAG pipelines and context windows
Configurable: ATX/Setext headings, bullet styles, reference links, and more
Comprehensive: Tables, nested lists, blockquotes, code blocks, images, links
CLI included: Command-line tool for Unix-style pipelines

Installation

As a Library

Add the dependency to your shard.yml:

dependencies:
  markout:
    github: amscotti/markout

Then run:

shards install

CLI Binary

Download pre-built binaries from the releases page or build from source:

shards build --production
# Binary will be at bin/markout

Or use Docker:

docker pull ghcr.io/amscotti/markout:latest

Quick Start

CLI Usage

The markout CLI follows Unix philosophy - read from stdin or file, write to stdout or file:

# Convert from stdin (pipe)
curl -s https://example.com | markout > article.md

# Convert file to stdout
markout input.html

# Convert file to file
markout input.html -o output.md

# Chain with other tools
cat page.html | markout | grep "^#" > headings.txt

CLI Options

Flag	Description
`-o, --output FILE`	Output file (default: stdout)
`--heading-style=STYLE`	`atx` (#) or `setext` (underlined)
`--bullet-char=CHAR`	`-`, `*`, or `+` (default: `-`)
`--link-style=STYLE`	`inline` or `referenced`
`--strip-document`	Strip HTML document wrapper (default: true)
`--no-strip-document`	Keep document wrapper
`-h, --help`	Show help
`-v, --version`	Show version

CLI Examples

# Fetch HN article and convert
curl -sL https://amplifying.ai/research/claude-code-picks | markout

# Use Setext headings and asterisk bullets
markout --heading-style=setext --bullet-char="*" article.html

Library Usage

require "markout"

html = "<h1>Hello</h1><p>This is <strong>bold</strong> text.</p>"
markdown = Markout.convert(html)
# => "# Hello\n\nThis is **bold** text."

Usage Examples

Basic Conversion

require "markout"

# Simple HTML to Markdown
html = <<-HTML
  <h1>Welcome</h1>
  <p>This is a <strong>test</strong> with a <a href="https://example.com">link</a>.</p>
  <ul>
    <li>Item one</li>
    <li>Item two</li>
  </ul>
HTML

puts Markout.convert(html)
# Output:
# # Welcome
#
# This is a **test** with a [link](https://example.com).
#
# - Item one
# - Item two

With Options

require "markout"

html = "<h1>Title</h1><ul><li>Item</li></ul>"

# Use Setext-style headings and asterisk bullets
options = Markout::Options.new
options.heading_style = Markout::Options::HeadingStyle::Setext
options.bullet_char = '*'

puts Markout.convert(html, options)
# Output:
# Title
# =====
#
# * Item

Reference-Style Links

require "markout"

html = <<-HTML
  <p>Visit <a href="https://example.com">Example</a> and
  <a href="https://test.com">Test</a> for more info.</p>
HTML

options = Markout::Options.new
options.link_style = Markout::Options::LinkStyle::Referenced

puts Markout.convert(html, options)
# Output:
# Visit [Example][1] and [Test][2] for more info.
#
# [1]: https://example.com
# [2]: https://test.com

Processing Web Pages

require "markout"
require "http/client"

# Fetch and convert a web page
response = HTTP::Client.get("https://example.com/article")
markdown = Markout.convert(response.body)

# Navigation, scripts, and styles are automatically stripped
# Only the article content remains

Reusable Converter

require "markout"

# Create a converter instance for multiple documents
options = Markout::Options.new
options.code_fence = "~~~"

converter = Markout::Converter.new(options)

docs = ["<p>Doc 1</p>", "<p>Doc 2</p>", "<p>Doc 3</p>"]
results = docs.map { |html| converter.convert(html) }

Configuration Options

Option	Type	Default	Description
`heading_style`	`HeadingStyle`	`ATX`	`ATX` (#) or `Setext` (underline)
`bullet_char`	`Char`	`'-'`	Bullet character for unordered lists
`emphasis_char`	`Char`	`'*'`	Character for italic text
`strong_char`	`String`	`"**"`	Characters for bold text
`code_fence`	`String`	"```"	Code fence delimiter
`hr_style`	`String`	`"---"`	Horizontal rule style
`link_style`	`LinkStyle`	`Inline`	`Inline` or `Referenced` links
`autolinks`	`Bool`	`true`	Use `<url>` when link text matches URL
`strip_document`	`Bool`	`true`	Strip leading/trailing whitespace

Performance

Benchmarks against Python's markdownify and html2text (599KB Wikipedia page):

Library	Time	Output Size
Markout	7.5ms	166KB
html2text	83ms	204KB
markdownify	203ms	210KB

Markout is typically 10-30x faster than Python alternatives and produces 20-30% smaller output by stripping non-content elements. Reproducible scripts can be found in the benchmarks/ directory.

Use Cases

RAG Pipelines: Extract clean content from web pages for vector databases
Content Migration: Convert HTML documentation to Markdown
LLM Context: Maximize useful content in context windows
Static Sites: Process HTML for Jekyll, Hugo, or other generators
Web Scraping: Clean article extraction from web pages

Development

# Install dependencies
shards install

# Run tests
crystal spec

# Run linter
bin/ameba

Contributing

Fork it (https://github.com/amscotti/markout/fork)
Create your feature branch (git checkout -b my-new-feature)
Commit your changes (git commit -am 'Add some feature')
Push to the branch (git push origin my-new-feature)
Create a new Pull Request

Building from Source

Prerequisites

Crystal >= 1.10.0
LLVM (for compilation)

Build Commands

# Install dependencies
shards install

# Build CLI binary
shards build --production

# Run tests
crystal spec

# Run linter
bin/ameba

# Format code
crystal tool format

# Generate documentation
crystal docs

Cross-Platform Builds

The project includes GitHub Actions workflows for building release binaries:

Linux (amd64, arm64) - statically linked
macOS (amd64, arm64)
Windows (amd64)

See .github/workflows/release.yml for details.

Docker

Using the Image

# Pull from GitHub Container Registry
docker pull ghcr.io/amscotti/markout:latest

# Convert a file
docker run --rm -v $(pwd):/data ghcr.io/amscotti/markout /data/input.html -o /data/output.md

# Pipe through docker
curl -s https://example.com | docker run --rm -i ghcr.io/amscotti/markout

Building the Image

docker build -t markout .

License

MIT License - see LICENSE for details.

Author

Anthony Scotti - creator and maintainer

Repository

markout

Owner

amscotti

Statistic

0
0
0
1
2
2 days ago
January 17, 2026

License

MIT License

Links

Synced at

Mon, 09 Mar 2026 00:13:38 GMT

Languages

Crystal 99.18% Dockerfile 0.82%

markout v0.3.0

Markout

Features

Installation

As a Library

CLI Binary

Quick Start

CLI Usage

CLI Options

CLI Examples

Library Usage

Usage Examples

Basic Conversion

With Options

Reference-Style Links

Processing Web Pages

Reusable Converter

Configuration Options

Performance

Use Cases

Development

Contributing

Building from Source

Prerequisites

Build Commands

Cross-Platform Builds

Docker

Using the Image

Building the Image

License

Author