charconv

A libiconv rewrite in Crystal

A pure Crystal implementation of GNU libiconv. It converts text between 150+ character encodings using Unicode (UCS-4) as a pivot, with a performance-first design.

Features

  • 150+ encodings: ASCII, UTF-8, UTF-16/32, ISO-8859-*, Windows codepages, Mac encodings, CJK (Shift_JIS, EUC-JP, GBK, Big5, EUC-KR, GB18030, ...), EBCDIC, and more
  • Fast: 8-byte ASCII scanner with memcpy for ASCII-superset pairs, enum-based dispatch compiling to jump tables, table-driven single-byte codecs, zero allocations in the hot path
  • Correct: Exhaustive byte-level tests against system iconv for every encoding
  • Streaming: Buffer-based API for zero-copy conversion, plus IO wrapper for convenience
  • GNU iconv compatible: Supports //IGNORE, //TRANSLIT, and combined flags
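
To illustrate what "table-driven single-byte codecs" means in the feature list above, here is a minimal sketch of a 256-entry decode table for ISO-8859-15. The names `LATIN9_OVERRIDES`, `LATIN9_TO_UNICODE`, and `decode_latin9` are hypothetical and are not taken from charconv's source; the sketch only relies on the fact that ISO-8859-15 differs from ISO-8859-1 in eight positions.

```crystal
# Hypothetical sketch, not charconv's actual tables.
# ISO-8859-15 is ISO-8859-1 with eight code points replaced.
LATIN9_OVERRIDES = {
  0xA4 => 0x20AC, # EURO SIGN
  0xA6 => 0x0160, # LATIN CAPITAL LETTER S WITH CARON
  0xA8 => 0x0161, # LATIN SMALL LETTER S WITH CARON
  0xB4 => 0x017D, # LATIN CAPITAL LETTER Z WITH CARON
  0xB8 => 0x017E, # LATIN SMALL LETTER Z WITH CARON
  0xBC => 0x0152, # LATIN CAPITAL LIGATURE OE
  0xBD => 0x0153, # LATIN SMALL LIGATURE OE
  0xBE => 0x0178, # LATIN CAPITAL LETTER Y WITH DIAERESIS
}

# One entry per possible byte value, so decoding is a single indexed load.
LATIN9_TO_UNICODE = StaticArray(UInt16, 256).new do |i|
  LATIN9_OVERRIDES.fetch(i, i).to_u16
end

# Decodes one ISO-8859-15 byte to a Unicode code point.
def decode_latin9(byte : UInt8) : UInt32
  LATIN9_TO_UNICODE[byte].to_u32
end
```

Because every byte value has an entry, the decoder needs no branches and no allocation per character, which is the property the feature list refers to.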

Installation

Add to your shard.yml:

dependencies:
  charconv:
    github: jackthorne/charconv

Usage

One-shot conversion

require "charconv"

# String/Bytes → Bytes
result = CharConv.convert("Hello, World!", "UTF-8", "ISO-8859-1")
result = CharConv.convert(input_bytes, "Shift_JIS", "UTF-8")

# With flags
result = CharConv.convert(input, "UTF-8", "ASCII//TRANSLIT")   # transliterate
result = CharConv.convert(input, "UTF-8", "ASCII//IGNORE")     # skip failures
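
# The flag syntax follows GNU iconv: suffixes are appended to the target
# encoding name and can be combined, e.g. "ASCII//TRANSLIT//IGNORE".
# The helper below is a hypothetical sketch of how such a string could be
# split into a name and flags; it is not charconv's actual parser.

```crystal
# Hypothetical value type holding a parsed target specification.
record EncodingSpec, name : String, translit : Bool, ignore : Bool

# Splits "NAME//FLAG//FLAG" into the encoding name and its flags.
# Flag names are matched case-insensitively in this sketch.
def parse_encoding_spec(spec : String) : EncodingSpec
  parts = spec.split("//")
  flags = parts[1..].map(&.upcase)
  EncodingSpec.new(parts.first, flags.includes?("TRANSLIT"), flags.includes?("IGNORE"))
end

spec = parse_encoding_spec("ASCII//TRANSLIT//IGNORE")
puts spec.name     # the bare encoding name
puts spec.translit # whether //TRANSLIT was present
puts spec.ignore   # whether //IGNORE was present
```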

Streaming (buffer-based)

converter = CharConv::Converter.new("EUC-JP", "UTF-8")

# You provide the buffers — zero allocations
src_consumed, dst_written = converter.convert(input_bytes, output_bytes)
# Call repeatedly until input is exhausted
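
# The "call repeatedly" loop can be sketched as below. It assumes only what
# the snippet above shows: convert(src, dst) returns {consumed, written}.
# CopyConverter is a hypothetical stand-in so the sketch runs without the
# shard, and convert_all is an illustrative helper, not part of charconv.
# How charconv reports a too-small buffer or a truncated trailing sequence
# is not described here, so the loop simply stops when no progress is made.

```crystal
# Hypothetical stand-in with the same call shape as the converter above.
class CopyConverter
  def convert(src : Bytes, dst : Bytes) : {Int32, Int32}
    n = Math.min(src.size, dst.size)
    src[0, n].copy_to(dst)
    {n, n}
  end
end

# Drives a buffer-based converter until the whole input has been consumed.
def convert_all(converter, input : Bytes, chunk_size : Int32 = 4096) : Bytes
  result = IO::Memory.new
  buffer = Bytes.new(chunk_size) # reused for every call
  src = input
  until src.empty?
    consumed, written = converter.convert(src, buffer)
    result.write(buffer[0, written])
    break if consumed == 0 && written == 0 # no progress: stop instead of spinning
    src += consumed                        # advance past the consumed bytes
  end
  result.to_slice
end

output = convert_all(CopyConverter.new, "hello world".to_slice, chunk_size: 4)
puts String.new(output)
```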

IO streaming

File.open("input.txt", "r") do |input|
  File.open("output.txt", "w") do |output|
    CharConv.convert(input, output, "Shift_JIS", "UTF-8")
  end
end

# Or with a Converter instance for more control
converter = CharConv::Converter.new("GB18030", "UTF-8")
converter.convert(input_io, output_io, buffer_size: 16384)

Querying encodings

CharConv.encoding_supported?("UTF-8")       # => true
CharConv.encoding_supported?("NONEXISTENT") # => false
CharConv.list_encodings                      # => ["ASCII", "UTF-8", ...]

Supported Encodings

Unicode: ASCII, UTF-8, UTF-16BE/LE/BOM, UTF-32BE/LE/BOM, UCS-2, UCS-4, UTF-7, C99, Java

Western European: ISO-8859-1/15, CP1252, MacRoman, HP-ROMAN8, NEXTSTEP

Central/Eastern European: ISO-8859-2/3/4/10/13/14/16, CP1250, MacCentralEurope

Cyrillic: ISO-8859-5, CP1251, KOI8-R, KOI8-U, KOI8-RU, MacCyrillic, MacUkraine

Greek: ISO-8859-7, CP1253, MacGreek

Turkish: ISO-8859-9, CP1254, MacTurkish

Hebrew: ISO-8859-8, CP1255, MacHebrew

Arabic: ISO-8859-6, CP1256, MacArabic, CP864

Thai: ISO-8859-11, TIS-620, CP874, MacThai

Vietnamese: VISCII, TCVN, CP1258

Japanese: EUC-JP, Shift_JIS, CP932, ISO-2022-JP, ISO-2022-JP-1, ISO-2022-JP-2

Chinese (Simplified): GB2312, GBK, GB18030, EUC-CN, HZ, ISO-2022-CN

Chinese (Traditional): Big5, CP950, Big5-HKSCS, EUC-TW

Korean: EUC-KR, CP949, ISO-2022-KR, JOHAB

DOS/IBM: CP437, CP737, CP775, CP850, CP852, CP855, CP857, CP858, CP860-CP866, CP869

EBCDIC: CP037, CP273, CP277, CP278, CP280, CP284, CP285, CP297, CP423, CP424, CP500, CP905, CP1026

Other: ARMSCII-8, Georgian-Academy, Georgian-PS, PT154, KOI8-T, KZ-1048, MULELAO-1, ATARIST, RISCOS-LATIN1

Replacing libiconv in Crystal's stdlib

charconv can transparently replace Crystal's libiconv dependency for all stdlib encoding operations (String#encode, String.new(bytes, encoding), IO#set_encoding).

require "charconv/stdlib"

# All stdlib encoding now uses charconv — no libiconv calls at runtime
"café".encode("ISO-8859-1")
String.new(bytes, "Shift_JIS")

io = File.open("data.txt")
io.set_encoding("EUC-JP")
io.gets_to_end  # decoded through charconv

By default, libiconv is still linked but never called. To fully remove the libiconv dependency, compile with -Dwithout_iconv:

crystal build app.cr -Dwithout_iconv

Performance

charconv vs system libiconv, 1 MB input, --release mode.

Conversion                          charconv    system iconv   Speedup
ASCII → ASCII                       73.39 µs    11.89 ms       162.0×
UTF-8 → ISO-8859-1 (mixed Latin)    3.43 ms     14.62 ms       4.3×
ISO-8859-1 → UTF-8                  2.08 ms     14.24 ms       6.9×
UTF-8 → UTF-8 (mixed widths)        4.92 ms     11.98 ms       2.4×
CP1252 → UTF-8                      2.50 ms     17.24 ms       6.9×
UTF-8 → CP1252 (mixed Latin)        3.50 ms     14.50 ms       4.1×
UTF-16BE → UTF-8 (mixed widths)     3.73 ms     10.83 ms       2.9×
UTF-8 → UTF-16LE                    4.57 ms     10.11 ms       2.2×

Measured on Apple M3 Pro, Crystal 1.19.1, macOS. Run crystal spec spec/bench_spec.cr --release to reproduce.

Architecture

Every conversion goes through a Unicode pivot:

Source bytes → UCS-4 codepoint → Target bytes
  decode()        (pivot)         encode()
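
As a minimal sketch of the pivot, the following converts ISO-8859-1 to UTF-8 by decoding each byte to a code point and then encoding that code point. The function names are hypothetical and this is not charconv's implementation; the encoder skips validation (surrogates, values above U+10FFFF) to stay short.

```crystal
# decode(): every ISO-8859-1 byte maps to the code point of the same value.
def decode_latin1(byte : UInt8) : UInt32
  byte.to_u32
end

# encode(): writes one code point as 1 to 4 UTF-8 bytes (no validation).
def encode_utf8(cp : UInt32, io : IO) : Nil
  if cp < 0x80
    io.write_byte(cp.to_u8)
  elsif cp < 0x800
    io.write_byte((0xC0_u32 | (cp >> 6)).to_u8)
    io.write_byte((0x80_u32 | (cp & 0x3F)).to_u8)
  elsif cp < 0x10000
    io.write_byte((0xE0_u32 | (cp >> 12)).to_u8)
    io.write_byte((0x80_u32 | ((cp >> 6) & 0x3F)).to_u8)
    io.write_byte((0x80_u32 | (cp & 0x3F)).to_u8)
  else
    io.write_byte((0xF0_u32 | (cp >> 18)).to_u8)
    io.write_byte((0x80_u32 | ((cp >> 12) & 0x3F)).to_u8)
    io.write_byte((0x80_u32 | ((cp >> 6) & 0x3F)).to_u8)
    io.write_byte((0x80_u32 | (cp & 0x3F)).to_u8)
  end
end

# Source bytes -> code point -> target bytes, one character at a time.
def latin1_to_utf8(input : Bytes) : Bytes
  output = IO::Memory.new
  input.each do |byte|
    encode_utf8(decode_latin1(byte), output)
  end
  output.to_slice
end

puts String.new(latin1_to_utf8(Bytes[0x63, 0x61, 0x66, 0xE9])) # prints "café"
```

With N encodings, this design needs N decoders and N encoders rather than a dedicated routine for each of the N×N source/target pairs.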

For ASCII-superset encoding pairs (the vast majority), an 8-byte word scanner identifies ASCII runs and copies them directly with memcpy, falling back to the decode-pivot-encode loop only for non-ASCII characters. As a result, ASCII-heavy text converts at close to memory bandwidth.
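
The word-scanning idea can be sketched as follows: eight bytes are loaded as one 64-bit word, and if no byte has its high bit set, all eight are ASCII. The helper names are hypothetical, and the sketch uses the stdlib's `IO::ByteFormat` for the load; the real scanner may well use raw pointer reads instead.

```crystal
# One high bit per byte lane of a 64-bit word.
HIGH_BITS = 0x8080808080808080_u64

# Returns the length in bytes of the leading ASCII run.
def ascii_prefix_length(bytes : Bytes) : Int32
  i = 0
  # Test eight bytes per iteration with a single mask.
  while i + 8 <= bytes.size
    word = IO::ByteFormat::LittleEndian.decode(UInt64, bytes[i, 8])
    break unless (word & HIGH_BITS) == 0
    i += 8
  end
  # Finish the remaining bytes (or locate the exact non-ASCII byte) one at a time.
  while i < bytes.size && bytes[i] < 0x80
    i += 1
  end
  i
end

# Copies the leading ASCII run of src into dst and returns its length.
def copy_ascii_run(src : Bytes, dst : Bytes) : Int32
  n = Math.min(ascii_prefix_length(src), dst.size)
  src[0, n].copy_to(dst) # bulk copy, no per-character work
  n
end

dst = Bytes.new(32)
n = copy_ascii_run("Hello, World! café".to_slice, dst)
puts String.new(dst[0, n]) # prints "Hello, World! caf"
```

The copy is only valid when both encodings are ASCII supersets, since an ASCII byte then means the same character on both sides.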

See ARCHITECTURE.md for the full design rationale.

Development

crystal spec                        # run all tests
crystal spec spec/bench_spec.cr --release  # run benchmarks

License

MIT
