charconv
A pure Crystal implementation of GNU libiconv. Converts text between 150+ character encodings using Unicode (UCS-4) as a pivot, with a performance-first design.
Features
- 150+ encodings: ASCII, UTF-8, UTF-16/32, ISO-8859-*, Windows codepages, Mac encodings, CJK (Shift_JIS, EUC-JP, GBK, Big5, EUC-KR, GB18030, ...), EBCDIC, and more
- Fast: 8-byte ASCII scanner with memcpy for ASCII-superset pairs, enum-based dispatch compiling to jump tables, table-driven single-byte codecs, zero allocations in the hot path
- Correct: Exhaustive byte-level tests against system iconv for every encoding
- Streaming: Buffer-based API for zero-copy conversion, plus IO wrapper for convenience
- GNU iconv compatible: supports //IGNORE, //TRANSLIT, and combined flags
Installation
Add the dependency to your shard.yml:

```yaml
dependencies:
  charconv:
    github: jackthorne/charconv
```

Then run shards install.
Usage
One-shot conversion
```crystal
require "charconv"

# String or Bytes in, Bytes out
result = CharConv.convert("Hello, World!", "UTF-8", "ISO-8859-1")
result = CharConv.convert(input_bytes, "Shift_JIS", "UTF-8") # input_bytes : Bytes

# With flags
result = CharConv.convert(input, "UTF-8", "ASCII//TRANSLIT") # transliterate
result = CharConv.convert(input, "UTF-8", "ASCII//IGNORE")   # skip failures
```
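The two flags only matter when a character has no representation in the target encoding. In GNU iconv, such a character is an error by default. The self-contained sketch below shows the general idea. It is not charconv's implementation, and parse_target, encode_ascii, and the tiny TRANSLIT table are hypothetical.

```crystal
# Hypothetical sketch of iconv-style //TRANSLIT and //IGNORE handling.
# parse_target, encode_ascii and TRANSLIT are illustrative names, not charconv's API.

# A tiny sample transliteration table (assumption: real tables are far larger).
TRANSLIT = {'é' => "e", 'ï' => "i", '€' => "EUR"}

# Split "ASCII//TRANSLIT//IGNORE" into the encoding name and its two flags.
def parse_target(spec : String) : {String, Bool, Bool}
  parts = spec.split("//")
  flags = parts[1..]
  {parts[0], flags.includes?("TRANSLIT"), flags.includes?("IGNORE")}
end

# Encode text as ASCII, honoring the flags for characters above U+007F.
def encode_ascii(text : String, translit : Bool, ignore : Bool) : String
  String.build do |io|
    text.each_char do |ch|
      if ch.ascii?
        io << ch
      elsif translit && (repl = TRANSLIT[ch]?)
        io << repl # //TRANSLIT: substitute an approximation
      elsif !ignore
        raise ArgumentError.new("cannot encode #{ch.inspect} as ASCII")
      end
      # //IGNORE: any character not handled above is silently dropped
    end
  end
end

name, translit, ignore = parse_target("ASCII//TRANSLIT")
puts encode_ascii("naïve café", translit, ignore) # prints "naive cafe"
```

When both flags are combined, transliteration is tried first, and only characters without a table entry are dropped.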
Streaming (buffer-based)
```crystal
converter = CharConv::Converter.new("EUC-JP", "UTF-8")
# You provide the buffers, so the call makes no allocations
src_consumed, dst_written = converter.convert(input_bytes, output_bytes)
# Call repeatedly, advancing past the consumed input, until the input is exhausted
```
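A return pair of (consumed, written) usually means one call can stop early, for example when the output buffer fills. The caller then drains the output and advances the input before calling again. The sketch below shows that caller loop with a hypothetical latin1_to_utf8 function standing in for CharConv::Converter#convert. Treat the stopping rule as an assumption about the contract, not as charconv's documented behavior.

```crystal
# Hypothetical stand-in for a buffer-based convert call: ISO-8859-1 -> UTF-8.
# Converts as much of src as fits in dst and returns {src_consumed, dst_written}.
def latin1_to_utf8(src : Bytes, dst : Bytes) : {Int32, Int32}
  i = o = 0
  while i < src.size
    b = src[i]
    if b < 0x80
      break if o + 1 > dst.size # stop: no room for this character
      dst[o] = b
      o += 1
    else
      break if o + 2 > dst.size
      dst[o] = 0xC0_u8 | (b >> 6)          # lead byte: high 2 bits of the code point
      dst[o + 1] = 0x80_u8 | (b & 0x3F_u8) # continuation byte: low 6 bits
      o += 2
    end
    i += 1
  end
  {i, o}
end

# Caller-side loop: drain dst after every call and advance src by what was consumed.
def convert_all(src : Bytes, buffer_size = 3) : Bytes
  result = IO::Memory.new
  dst = Bytes.new(buffer_size) # allocated once, reused for every call
  until src.empty?
    consumed, written = latin1_to_utf8(src, dst)
    raise "output buffer too small for one character" if consumed == 0
    result.write(dst[0, written])
    src += consumed
  end
  result.to_slice
end

latin1 = Bytes[0x63, 0x61, 0x66, 0xE9] # "café" in ISO-8859-1
puts String.new(convert_all(latin1))   # prints "café"
```

With the 3-byte buffer, the first call converts "caf" and stops, because é needs two output bytes. The second call converts é.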
IO streaming
```crystal
File.open("input.txt", "r") do |input|
  File.open("output.txt", "w") do |output|
    CharConv.convert(input, output, "Shift_JIS", "UTF-8")
  end
end

# Or with a Converter instance for more control
converter = CharConv::Converter.new("GB18030", "UTF-8")
converter.convert(input_io, output_io, buffer_size: 16384)
```
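An IO wrapper built on a buffer API reads fixed-size chunks. It also has to carry over any bytes the converter could not consume yet, such as a multibyte character split across two reads. The sketch below assumes that design and is not charconv's code. The names utf8_to_utf32le and convert_io are hypothetical, and the decoder assumes well-formed UTF-8.

```crystal
# Hypothetical chunk converter: UTF-8 -> UTF-32LE, assuming well-formed UTF-8.
# Stops before a multibyte sequence that is cut off at the end of src.
def utf8_to_utf32le(src : Bytes, dst : Bytes) : {Int32, Int32}
  i = o = 0
  while i < src.size
    lead = src[i].to_u32
    len = lead < 0x80 ? 1 : (lead < 0xE0 ? 2 : (lead < 0xF0 ? 3 : 4))
    break if i + len > src.size || o + 4 > dst.size # incomplete input or full output
    cp = len == 1 ? lead : lead & (0xFF_u32 >> (len + 1)) # payload bits of the lead byte
    (1...len).each { |k| cp = (cp << 6) | (src[i + k].to_u32 & 0x3F) }
    4.times { |k| dst[o + k] = ((cp >> (8 * k)) & 0xFF).to_u8 } # little-endian
    i += len
    o += 4
  end
  {i, o}
end

# Hypothetical IO wrapper: read, convert, write, and keep unconsumed bytes for the next read.
def convert_io(input : IO, output : IO, buffer_size = 8) : Nil
  raise ArgumentError.new("buffer_size must be >= 4") if buffer_size < 4
  src = Bytes.new(buffer_size)
  dst = Bytes.new(buffer_size * 4) # at most one 4-byte code point per input byte
  pending = 0                      # bytes of a split sequence kept at the front of src
  loop do
    n = input.read(src[pending, buffer_size - pending])
    avail = pending + n
    consumed, written = utf8_to_utf32le(src[0, avail], dst)
    output.write(dst[0, written])
    pending = avail - consumed
    if n == 0 # end of input
      raise ArgumentError.new("truncated UTF-8 sequence at end of input") if pending > 0
      break
    end
    src.move_from(src[consumed, pending]) if pending > 0 # slide leftovers to the front
  end
end

input = IO::Memory.new("añ€😀") # 1-, 2-, 3- and 4-byte sequences (10 bytes)
output = IO::Memory.new
convert_io(input, output, buffer_size: 4) # a small buffer forces sequences to split
p output.to_slice.size # prints 16 (4 code points x 4 bytes)
```

Slice#move_from is used instead of copy_from because the leftover bytes and the front of the buffer can overlap.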
Querying encodings
```crystal
CharConv.encoding_supported?("UTF-8")       # => true
CharConv.encoding_supported?("NONEXISTENT") # => false
CharConv.list_encodings                     # => ["ASCII", "UTF-8", ...]
```
Supported Encodings
- Unicode: ASCII, UTF-8, UTF-16BE/LE/BOM, UTF-32BE/LE/BOM, UCS-2, UCS-4, UTF-7, C99, Java
- Western European: ISO-8859-1/15, CP1252, MacRoman, HP-ROMAN8, NEXTSTEP
- Central/Eastern European: ISO-8859-2/3/4/10/13/14/16, CP1250, MacCentralEurope
- Cyrillic: ISO-8859-5, CP1251, KOI8-R, KOI8-U, KOI8-RU, MacCyrillic, MacUkraine
- Greek: ISO-8859-7, CP1253, MacGreek
- Turkish: ISO-8859-9, CP1254, MacTurkish
- Hebrew: ISO-8859-8, CP1255, MacHebrew
- Arabic: ISO-8859-6, CP1256, MacArabic, CP864
- Thai: ISO-8859-11, TIS-620, CP874, MacThai
- Vietnamese: VISCII, TCVN, CP1258
- Japanese: EUC-JP, Shift_JIS, CP932, ISO-2022-JP, ISO-2022-JP-1, ISO-2022-JP-2
- Chinese (Simplified): GB2312, GBK, GB18030, EUC-CN, HZ, ISO-2022-CN
- Chinese (Traditional): Big5, CP950, Big5-HKSCS, EUC-TW
- Korean: EUC-KR, CP949, ISO-2022-KR, JOHAB
- DOS/IBM: CP437, CP737, CP775, CP850, CP852, CP855, CP857, CP858, CP860-CP866, CP869
- EBCDIC: CP037, CP273, CP277, CP278, CP280, CP284, CP285, CP297, CP423, CP424, CP500, CP905, CP1026
- Other: ARMSCII-8, Georgian-Academy, Georgian-PS, PT154, KOI8-T, KZ-1048, MULELAO-1, ATARIST, RISCOS-LATIN1
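Most of the codepages above are single-byte encodings, which the Features list describes as table-driven. The sketch below shows that idea for CP1252. A 256-entry array decodes a byte in one lookup, and a reverse hash encodes. The 0x80-0x9F values transcribe the standard Windows-1252 mapping, in which 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned. The constant and method names are hypothetical, and charconv's own tables and layout may differ.

```crystal
# Sketch of a table-driven single-byte codec, using CP1252 (Windows-1252).
# Names are hypothetical; this is not charconv's generated table.
UNDEFINED = -1

# Bytes 0x80-0x9F; every other byte has the same value as its Unicode code point.
CP1252_80_9F = [
  0x20AC, UNDEFINED, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
  0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, UNDEFINED, 0x017D, UNDEFINED,
  UNDEFINED, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
  0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, UNDEFINED, 0x017E, 0x0178,
]

# 256-entry decode table: decoding a byte is a single array lookup.
DECODE_TABLE = Array(Int32).new(256) { |b| (b >= 0x80 && b <= 0x9F) ? CP1252_80_9F[b - 0x80] : b }

# Reverse map for encoding: code point -> byte.
ENCODE_TABLE = begin
  table = {} of Int32 => UInt8
  DECODE_TABLE.each_with_index { |cp, b| table[cp] = b.to_u8 unless cp == UNDEFINED }
  table
end

def cp1252_decode(bytes : Bytes) : String
  String.build do |io|
    bytes.each do |b|
      cp = DECODE_TABLE[b]
      raise ArgumentError.new("unassigned CP1252 byte 0x#{b.to_s(16)}") if cp == UNDEFINED
      io << cp.chr
    end
  end
end

def cp1252_encode(text : String) : Bytes
  io = IO::Memory.new
  text.each_char do |ch|
    b = ENCODE_TABLE[ch.ord]? || raise ArgumentError.new("#{ch.inspect} has no CP1252 encoding")
    io.write_byte(b)
  end
  io.to_slice
end

text = cp1252_decode(Bytes[0x80, 0x41, 0xE9]) # "€Aé"
p cp1252_encode(text) == Bytes[0x80, 0x41, 0xE9] # prints true
```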
Replacing libiconv in Crystal's stdlib
charconv can transparently replace Crystal's libiconv dependency for all stdlib encoding operations (String#encode, String.new(bytes, encoding), IO#set_encoding).
```crystal
require "charconv/stdlib"

# All stdlib encoding now goes through charconv; libiconv is not called at runtime
"café".encode("ISO-8859-1")
String.new(bytes, "Shift_JIS")

io = File.open("data.txt")
io.set_encoding("EUC-JP")
io.gets_to_end # decoded through charconv
```
By default, libiconv is still linked but never called. To fully remove the libiconv dependency, compile with -Dwithout_iconv:
```sh
crystal build app.cr -Dwithout_iconv
```
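The -D option defines an ordinary Crystal compile-time flag, which code can test with the flag? macro. The branch is resolved at compile time, so the unused side never reaches the binary. The snippet below shows only this mechanism. It is not taken from charconv or the stdlib, and BACKEND is a made-up constant.

```crystal
# Compile-time branch on a -D flag: only one definition ends up in the binary.
# BACKEND is a hypothetical name used for illustration.
{% if flag?(:without_iconv) %}
  BACKEND = "compiled with -Dwithout_iconv"
{% else %}
  BACKEND = "compiled without -Dwithout_iconv"
{% end %}

puts BACKEND
```

Saved as flag_demo.cr (a hypothetical file name), crystal run flag_demo.cr prints the second message. crystal run -Dwithout_iconv flag_demo.cr prints the first.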
Performance
charconv vs. system libiconv on 1 MB of input, compiled in --release mode:
| Conversion | charconv | system iconv | Speedup |
|---|---|---|---|
| ASCII → ASCII | 73.39 µs | 11.89 ms | 162.0× |
| UTF-8 → ISO-8859-1 (mixed Latin) | 3.43 ms | 14.62 ms | 4.3× |
| ISO-8859-1 → UTF-8 | 2.08 ms | 14.24 ms | 6.9× |
| UTF-8 → UTF-8 (mixed widths) | 4.92 ms | 11.98 ms | 2.4× |
| CP1252 → UTF-8 | 2.50 ms | 17.24 ms | 6.9× |
| UTF-8 → CP1252 (mixed Latin) | 3.50 ms | 14.50 ms | 4.1× |
| UTF-16BE → UTF-8 (mixed widths) | 3.73 ms | 10.83 ms | 2.9× |
| UTF-8 → UTF-16LE | 4.57 ms | 10.11 ms | 2.2× |
Measured on Apple M3 Pro, Crystal 1.19.1, macOS. Run crystal spec spec/bench_spec.cr --release to reproduce.
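The Speedup column is the ratio of the two timings. Taking the ASCII row as a worked example:

$$
\text{speedup} = \frac{t_{\text{iconv}}}{t_{\text{charconv}}} = \frac{11.89\ \text{ms}}{73.39\ \mu\text{s}} = \frac{11890\ \mu\text{s}}{73.39\ \mu\text{s}} \approx 162.0
$$

The same row implies a charconv throughput of roughly 13.6 GB/s if "1 MB" means $10^6$ bytes, or about 14.3 GB/s if it means $2^{20}$ bytes. The table does not say which:

$$
\frac{10^{6}\ \text{B}}{73.39\ \mu\text{s}} \approx 1.36 \times 10^{10}\ \text{B/s} \approx 13.6\ \text{GB/s}, \qquad \frac{2^{20}\ \text{B}}{73.39\ \mu\text{s}} \approx 14.3\ \text{GB/s}
$$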
Architecture
Every conversion goes through a Unicode pivot:
```text
Source bytes  →  UCS-4 codepoint  →  Target bytes
          decode()   (pivot)  encode()
```
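To make the pivot concrete, here is a toy converter for three encodings. Each step decodes one character to a code point and then encodes that code point, and both steps dispatch with a case over an enum, the style the Features list mentions. The enum Enc and the methods decode, encode and convert are hypothetical, and the UTF-8 decoder assumes well-formed input. It illustrates the pipeline only and is not charconv's implementation.

```crystal
# Illustrative pivot converter for three encodings; all names are hypothetical.
enum Enc
  ASCII
  Latin1
  UTF8
end

# Decode one character starting at src[i]: returns {code point, bytes consumed}.
# The UTF-8 branch assumes well-formed input.
def decode(enc : Enc, src : Bytes, i : Int32) : {Int32, Int32}
  b = src[i].to_i
  case enc
  in .ascii?
    raise ArgumentError.new("byte 0x#{b.to_s(16)} is not ASCII") if b >= 0x80
    {b, 1}
  in .latin1?
    {b, 1} # ISO-8859-1 bytes equal their code points
  in .utf8?
    len = b < 0x80 ? 1 : (b < 0xE0 ? 2 : (b < 0xF0 ? 3 : 4))
    cp = len == 1 ? b : b & (0xFF >> (len + 1))
    (1...len).each { |k| cp = (cp << 6) | (src[i + k].to_i & 0x3F) }
    {cp, len}
  end
end

# Encode one code point onto io.
def encode(enc : Enc, cp : Int32, io : IO) : Nil
  case enc
  in .ascii?
    raise ArgumentError.new("U+#{cp.to_s(16).upcase} is not in ASCII") if cp > 0x7F
    io.write_byte(cp.to_u8)
  in .latin1?
    raise ArgumentError.new("U+#{cp.to_s(16).upcase} is not in ISO-8859-1") if cp > 0xFF
    io.write_byte(cp.to_u8)
  in .utf8?
    io << cp.chr # writes the character's UTF-8 bytes
  end
end

# Source bytes -> code point (pivot) -> target bytes, one character at a time.
def convert(src : Bytes, from : Enc, to : Enc) : Bytes
  io = IO::Memory.new
  i = 0
  while i < src.size
    cp, len = decode(from, src, i)
    encode(to, cp, io)
    i += len
  end
  io.to_slice
end

latin1 = convert("café".to_slice, Enc::UTF8, Enc::Latin1)
p latin1.size # prints 4
```

Adding an encoding in this design means adding one decode branch and one encode branch, instead of one converter per encoding pair.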
For ASCII-superset encoding pairs (the vast majority), an 8-byte word scanner finds ASCII runs and copies them directly with memcpy. It falls back to the decode-pivot-encode loop only for non-ASCII characters, so ASCII-heavy text converts at close to memory-copy speed.
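The word-at-a-time test works because a byte is ASCII exactly when its high bit is clear. One AND of a 64-bit word against 0x8080808080808080 therefore checks eight bytes at once. The sketch below uses a hypothetical ascii_prefix_length helper and reads each word through IO::ByteFormat so unaligned offsets are safe. charconv's actual scanner may use direct unaligned loads or SIMD instead.

```crystal
HIGH_BITS = 0x8080808080808080_u64 # the high bit of each of the 8 bytes

# Hypothetical helper: length of the leading ASCII run of src.
def ascii_prefix_length(src : Bytes) : Int32
  i = 0
  # 8 bytes at a time: the run continues while no byte in the word has its high bit set.
  while i + 8 <= src.size
    word = IO::ByteFormat::LittleEndian.decode(UInt64, src[i, 8]) # safe unaligned read
    break if (word & HIGH_BITS) != 0
    i += 8
  end
  # Byte at a time for the tail, or to find the exact non-ASCII byte in the last word.
  while i < src.size && src[i] < 0x80
    i += 1
  end
  i
end

src = "Hello, World! café".to_slice
n = ascii_prefix_length(src)
dst = Bytes.new(src.size)
dst.copy_from(src[0, n]) # bytes 0x00-0x7F are identical in both encodings: copy verbatim
p n # prints 17
```

After the copy, only the bytes from offset n onward need the slower decode-pivot-encode path.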
See ARCHITECTURE.md for the full design rationale.
Development
```sh
crystal spec                              # run all tests
crystal spec spec/bench_spec.cr --release # run benchmarks
```
License
- March 10, 2026
MIT License