markout v0.3.0
Markout
Convert HTML to clean, readable Markdown in Crystal. Designed for content migration, static site generators, documentation tools, and LLM data preparation pipelines.
Features
- Fast: roughly 10-30x faster than Python alternatives (markdownify, html2text)
- Clean output: Strips navigation, scripts, styles, and UI chrome automatically
- LLM-ready: Produces clean markdown ideal for RAG pipelines and context windows
- Configurable: ATX/Setext headings, bullet styles, reference links, and more
- Comprehensive: Tables, nested lists, blockquotes, code blocks, images, links
- CLI included: Command-line tool for Unix-style pipelines
Installation
As a Library
Add the dependency to your shard.yml:
dependencies:
  markout:
    github: amscotti/markout
Then run:
shards install
CLI Binary
Download pre-built binaries from the releases page or build from source:
shards build --production
# Binary will be at bin/markout
Or use Docker:
docker pull ghcr.io/amscotti/markout:latest
Quick Start
CLI Usage
The markout CLI follows the Unix philosophy. It reads from stdin or a file and writes to stdout or a file:
# Convert from stdin (pipe)
curl -s https://example.com | markout > article.md
# Convert file to stdout
markout input.html
# Convert file to file
markout input.html -o output.md
# Chain with other tools
cat page.html | markout | grep "^#" > headings.txt
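To convert many files at once, the file-to-file form can be wrapped in a small loop. The following is a minimal sketch, assuming a POSIX shell. `convert_dir` and the `MARKOUT` override variable are hypothetical names introduced here for illustration and are not part of markout. The sketch relies only on the documented `markout FILE -o FILE` invocation.

```shell
# convert_dir SRC_DIR DST_DIR
# Hypothetical helper: converts every SRC_DIR/*.html into DST_DIR/<name>.md.
convert_dir() {
  src_dir=$1
  dst_dir=$2
  mkdir -p "$dst_dir" || return 1
  for html_file in "$src_dir"/*.html; do
    # An unmatched glob stays literal, so skip it when no .html files exist.
    [ -e "$html_file" ] || continue
    base_name=$(basename "$html_file" .html)
    # MARKOUT (an assumption of this sketch) can point at a specific binary,
    # for example bin/markout after building from source.
    "${MARKOUT:-markout}" "$html_file" -o "$dst_dir/$base_name.md" || return 1
  done
}
```

Calling `convert_dir ./site ./markdown` would write one `.md` file per `.html` file and stop at the first conversion that fails.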
CLI Options
| Flag | Description |
|---|---|
| -o, --output FILE | Output file (default: stdout) |
| --heading-style=STYLE | atx (#) or setext (underlined) |
| --bullet-char=CHAR | -, *, or + (default: -) |
| --link-style=STYLE | inline or referenced |
| --strip-document | Strip HTML document wrapper (default: true) |
| --no-strip-document | Keep document wrapper |
| -h, --help | Show help |
| -v, --version | Show version |
CLI Examples
# Fetch HN article and convert
curl -sL https://amplifying.ai/research/claude-code-picks | markout
# Use Setext headings and asterisk bullets
markout --heading-style=setext --bullet-char="*" article.html
Library Usage
require "markout"
html = "<h1>Hello</h1><p>This is <strong>bold</strong> text.</p>"
markdown = Markout.convert(html)
# => "# Hello\n\nThis is **bold** text."
Usage Examples
Basic Conversion
require "markout"
# Simple HTML to Markdown
html = <<-HTML
<h1>Welcome</h1>
<p>This is a <strong>test</strong> with a <a href="https://example.com">link</a>.</p>
<ul>
<li>Item one</li>
<li>Item two</li>
</ul>
HTML
puts Markout.convert(html)
# Output:
# # Welcome
#
# This is a **test** with a [link](https://example.com).
#
# - Item one
# - Item two
With Options
require "markout"
html = "<h1>Title</h1><ul><li>Item</li></ul>"
# Use Setext-style headings and asterisk bullets
options = Markout::Options.new
options.heading_style = Markout::Options::HeadingStyle::Setext
options.bullet_char = '*'
puts Markout.convert(html, options)
# Output:
# Title
# =====
#
# * Item
Reference-Style Links
require "markout"
html = <<-HTML
<p>Visit <a href="https://example.com">Example</a> and
<a href="https://test.com">Test</a> for more info.</p>
HTML
options = Markout::Options.new
options.link_style = Markout::Options::LinkStyle::Referenced
puts Markout.convert(html, options)
# Output:
# Visit [Example][1] and [Test][2] for more info.
#
# [1]: https://example.com
# [2]: https://test.com
Processing Web Pages
require "markout"
require "http/client"
# Fetch and convert a web page
response = HTTP::Client.get("https://example.com/article")
markdown = Markout.convert(response.body)
# Navigation, scripts, and styles are automatically stripped
# Only the article content remains
Reusable Converter
require "markout"
# Create a converter instance for multiple documents
options = Markout::Options.new
options.code_fence = "~~~"
converter = Markout::Converter.new(options)
docs = ["<p>Doc 1</p>", "<p>Doc 2</p>", "<p>Doc 3</p>"]
results = docs.map { |html| converter.convert(html) }
Configuration Options
| Option | Type | Default | Description |
|---|---|---|---|
| heading_style | HeadingStyle | ATX | ATX (#) or Setext (underline) |
| bullet_char | Char | '-' | Bullet character for unordered lists |
| emphasis_char | Char | '*' | Character for italic text |
| strong_char | String | "**" | Characters for bold text |
| code_fence | String | "```" | Code fence delimiter |
| hr_style | String | "---" | Horizontal rule style |
| link_style | LinkStyle | Inline | Inline or Referenced links |
| autolinks | Bool | true | Use <url> when link text matches URL |
| strip_document | Bool | true | Strip leading/trailing whitespace |
Performance
Benchmarks against Python's markdownify and html2text (599KB Wikipedia page):
| Library | Time | Output Size |
|---|---|---|
| Markout | 7.5ms | 166KB |
| html2text | 83ms | 204KB |
| markdownify | 203ms | 210KB |
In this benchmark, Markout is roughly 11x faster than html2text and 27x faster than markdownify (10-30x is typical), and its output is about 20% smaller because non-content elements are stripped. Reproducible scripts are in the benchmarks/ directory.
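The ratios implied by the benchmark table can be cross-checked with a little arithmetic. The following is a minimal sketch, assuming awk is available. `bench_ratios` is a hypothetical helper name, and the numbers are copied from the table rather than measured.

```shell
# bench_ratios: recompute speedup and output-size reduction from the table.
bench_ratios() {
  awk 'BEGIN {
    # Markout figures from the table: time in ms, output size in KB.
    markout_ms = 7.5; markout_kb = 166
    report("html2text", 83, 204)
    report("markdownify", 203, 210)
  }
  function report(name, ms, kb) {
    # Speedup is a ratio of times; reduction is relative to the other output.
    printf "%s: %.1fx faster, %.1f%% smaller\n", name, ms / markout_ms, (1 - markout_kb / kb) * 100
  }'
}

bench_ratios
# html2text: 11.1x faster, 18.6% smaller
# markdownify: 27.1x faster, 21.0% smaller
```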
Use Cases
- RAG Pipelines: Extract clean content from web pages for vector databases
- Content Migration: Convert HTML documentation to Markdown
- LLM Context: Maximize useful content in context windows
- Static Sites: Process HTML for Jekyll, Hugo, or other generators
- Web Scraping: Clean article extraction from web pages
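For the RAG and scraping use cases, the stdin form of the CLI can be looped over a list of URLs. The following is a minimal sketch, assuming curl and markout are on PATH. `fetch_to_markdown`, the URL list file, and the `page-NNN.md` naming scheme are hypothetical choices made here for illustration.

```shell
# fetch_to_markdown URL_LIST OUT_DIR
# Hypothetical helper: reads one URL per line and writes OUT_DIR/page-NNN.md.
fetch_to_markdown() {
  url_list=$1
  out_dir=$2
  mkdir -p "$out_dir" || return 1
  count=0
  # The extra test keeps a final line that lacks a trailing newline.
  while IFS= read -r url || [ -n "$url" ]; do
    # Skip blank lines and lines starting with '#'.
    case $url in
      ''|'#'*) continue ;;
    esac
    count=$((count + 1))
    out_file=$(printf '%s/page-%03d.md' "$out_dir" "$count")
    # A pipeline returns the status of its last command, so this reports
    # markout failures only; curl errors go unnoticed unless the shell
    # has pipefail enabled.
    curl -sL "$url" | markout > "$out_file" || echo "failed: $url" >&2
  done < "$url_list"
}
```

With a file `urls.txt` containing one URL per line, `fetch_to_markdown urls.txt corpus` would produce numbered Markdown files ready for chunking or indexing.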
Development
# Install dependencies
shards install
# Run tests
crystal spec
# Run linter
bin/ameba
Contributing
- Fork it (https://github.com/amscotti/markout/fork)
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create a new Pull Request
Building from Source
Prerequisites
- Crystal >= 1.10.0
- LLVM (for compilation)
Build Commands
# Install dependencies
shards install
# Build CLI binary
shards build --production
# Run tests
crystal spec
# Run linter
bin/ameba
# Format code
crystal tool format
# Generate documentation
crystal docs
Cross-Platform Builds
The project includes GitHub Actions workflows for building release binaries:
- Linux (amd64, arm64) - statically linked
- macOS (amd64, arm64)
- Windows (amd64)
See .github/workflows/release.yml for details.
Docker
Using the Image
# Pull from GitHub Container Registry
docker pull ghcr.io/amscotti/markout:latest
# Convert a file
docker run --rm -v "$(pwd)":/data ghcr.io/amscotti/markout /data/input.html -o /data/output.md
# Pipe through docker
curl -s https://example.com | docker run --rm -i ghcr.io/amscotti/markout
Building the Image
docker build -t markout .
License
MIT License - see LICENSE for details.
Author
- Anthony Scotti - creator and maintainer