crystal-parquet
A Crystal shard for reading Apache Parquet files and converting them to JSON and other formats.
Features
- ✅ Read Apache Parquet files
- ✅ Convert Parquet to JSON
- ✅ Extract file metadata
- ✅ Display schema information
- ✅ Pretty-print JSON output
- ✅ Command-line interface (CLI)
- ✅ Library API for programmatic access
Installation
- Add this to your application's shard.yml:
dependencies:
  parquet:
    github: ober/crystal-parquet
- Run shards install
- Ensure Python 3 and pyarrow are installed:
pip3 install pyarrow
Usage
Command Line Interface
The shard provides a parquet command-line tool for converting Parquet files to JSON and inspecting their structure.
Convert Parquet to JSON
# Basic conversion to stdout
parquet input.parquet
# Pretty-printed JSON
parquet --pretty input.parquet
# Save to file
parquet -o output.json input.parquet
# Pretty-printed to file
parquet --pretty -o output.json input.parquet
View File Metadata
parquet --metadata input.parquet
Output:
Parquet File Metadata:
Number of rows: 5
Number of row groups: 1
Number of columns: 6
Created by: parquet-cpp-arrow version 22.0.0
View Schema
parquet --schema input.parquet
Output:
Schema:
id: int64
name: string
age: int64
score: double
active: bool
created_at: timestamp[us]
Help
parquet --help
Library API
Use the Parquet shard programmatically in your Crystal applications:
require "parquet"
# Read a Parquet file
reader = Parquet.read("data.parquet")
# Convert to JSON
json = reader.to_json(pretty: true)
puts json
# Get metadata
metadata = reader.read_metadata
puts "Number of rows: #{metadata.num_rows}"
puts "Number of row groups: #{metadata.num_row_groups}"
puts "Number of columns: #{metadata.num_columns}"
# Get schema
schema = reader.read_schema
puts "Schema:\n#{schema}"
Simple Conversion
require "parquet"
# One-liner conversion
json = Parquet.to_json("input.parquet", pretty: true)
File.write("output.json", json)
Building from Source
# Build the CLI tool
shards build
# The executable will be in bin/parquet
./bin/parquet --version
Development
Running Tests
crystal spec
Creating Sample Data
A helper script is provided to create sample Parquet files:
python3 examples/create_sample.py
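The contents of examples/create_sample.py are not reproduced in this README. As a rough idea of how such a file could be generated, the sketch below builds five made-up rows whose columns mirror the schema shown under View Schema and writes them with pyarrow. The function names, the row values, and the sample.parquet path are all assumptions for illustration, not the script's real contents.

```python
import datetime


def sample_rows():
    """Five made-up rows whose columns mirror the schema shown in this README."""
    start = datetime.datetime(2026, 1, 1, 12, 0, 0)
    names = ["Alice", "Bob", "Carol", "Dave", "Eve"]
    return [
        {
            "id": i + 1,
            "name": name,
            "age": 25 + i,
            "score": 80.0 + i * 2.5,
            "active": i % 2 == 0,
            "created_at": start + datetime.timedelta(days=i),
        }
        for i, name in enumerate(names)
    ]


def write_sample(path="sample.parquet"):
    """Write the rows to a Parquet file and return the inferred schema (requires pyarrow)."""
    # Imported lazily so sample_rows() can be used without pyarrow installed.
    import pyarrow as pa
    import pyarrow.parquet as pq

    table = pa.Table.from_pylist(sample_rows())
    pq.write_table(table, path)
    return table.schema
```

With pyarrow's default type inference, Python int, str, float, bool, and datetime values should come out as int64, string, double, bool, and timestamp[us], which matches the schema listing above.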
Architecture
This shard uses a hybrid approach:
- Crystal provides the API interface and CLI
- Python's pyarrow handles the actual Parquet file reading (via subprocess)
This approach provides:
- Broad compatibility with the Parquet file format, inherited from pyarrow
- Support for the compression codecs the installed pyarrow build includes (Snappy, GZIP, LZ4, etc.)
- Support for the data types and encodings pyarrow can read
- Reliable parsing of complex schemas
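To make the hybrid approach concrete, here is a minimal sketch of what the Python side of such a subprocess bridge could look like. It is not the shard's actual helper: the script name helper.py, the MODE argument, and the function names are hypothetical. Only the pyarrow calls (read_table, ParquetFile(...).metadata, read_schema) are real library API. The idea is that the parent process passes a mode and a path as arguments, reads JSON from stdout, and treats a non-zero exit code plus stderr as an error.

```python
import json
import sys


def load_payload(mode, path):
    """Return a JSON-serializable payload for the requested mode."""
    # Imported lazily so the rest of the module works without pyarrow installed.
    import pyarrow.parquet as pq

    if mode == "rows":
        return pq.read_table(path).to_pylist()
    if mode == "metadata":
        meta = pq.ParquetFile(path).metadata
        return {
            "num_rows": meta.num_rows,
            "num_row_groups": meta.num_row_groups,
            "num_columns": meta.num_columns,
            "created_by": meta.created_by,
        }
    if mode == "schema":
        return str(pq.read_schema(path))
    raise ValueError(f"unknown mode: {mode}")


def encode(payload, pretty=False):
    """Serialize the payload; default=str stringifies values JSON has no type for."""
    return json.dumps(payload, indent=2 if pretty else None, default=str)


def main(argv, loader=load_payload):
    """Hypothetical entry point: helper.py MODE PATH [--pretty]. Returns an exit code."""
    if len(argv) < 3:
        print("usage: helper.py MODE PATH [--pretty]", file=sys.stderr)
        return 2
    mode, path = argv[1], argv[2]
    try:
        payload = loader(mode, path)
    except Exception as exc:  # report any failure to the parent process via stderr
        print(f"error: {exc}", file=sys.stderr)
        return 1
    print(encode(payload, pretty="--pretty" in argv[3:]))
    return 0
```

A real script would finish with sys.exit(main(sys.argv)). Keeping stdout for data and stderr for diagnostics lets the calling process parse the output without filtering, at the cost of starting a Python interpreter on every call.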
Examples
Example 1: Convert Parquet to JSON
require "parquet"
# Read and convert
reader = Parquet.read("users.parquet")
json_data = reader.to_json(pretty: true)
# Write to file
File.write("users.json", json_data)
puts "Conversion complete!"
Example 2: Inspect File Structure
require "parquet"
reader = Parquet.read("data.parquet")
# Show metadata
metadata = reader.read_metadata
puts "File contains #{metadata.num_rows} rows"
puts "Organized in #{metadata.num_row_groups} row groups"
puts "With #{metadata.num_columns} columns"
# Show schema
puts "\nSchema:"
puts reader.read_schema
Example 3: Batch Processing
require "parquet"
Dir.glob("data/*.parquet").each do |file|
  puts "Processing: #{file}"
  json_file = file.sub(".parquet", ".json")
  json = Parquet.to_json(file, pretty: true)
  File.write(json_file, json)
  puts " -> #{json_file}"
end
Supported Data Types
The library supports all standard Parquet data types:
- Boolean
- Integer (8, 16, 32, 64-bit, signed and unsigned)
- Float and Double
- String (UTF-8)
- Binary
- Date
- Timestamp (milliseconds and microseconds)
- Decimal
- Lists and Arrays
- Maps
- Nested structures
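Several of these types (dates, timestamps, decimals, binary) have no native JSON representation, so any Parquet-to-JSON conversion has to pick an encoding for them. This README does not say which encoding the shard uses. The sketch below shows one possible policy on the Python side, with ISO 8601 strings for temporal values, strings for decimals, and Base64 for binary. The json_default function and the sample row are illustrative assumptions.

```python
import base64
import datetime
import decimal
import json


def json_default(value):
    """Convert values that json.dumps cannot serialize on its own."""
    if isinstance(value, (datetime.datetime, datetime.date, datetime.time)):
        return value.isoformat()
    if isinstance(value, decimal.Decimal):
        return str(value)  # keep exact digits instead of rounding through float
    if isinstance(value, (bytes, bytearray)):
        return base64.b64encode(value).decode("ascii")
    raise TypeError(f"cannot serialize {type(value).__name__}")


# A made-up row of the kind pyarrow's Table.to_pylist() can return.
row = {
    "created_at": datetime.datetime(2026, 1, 13, 9, 30, 0),
    "price": decimal.Decimal("19.90"),
    "blob": b"\x00\x01",
    "tags": ["a", "b"],
}
encoded = json.dumps(row, default=json_default)
```

Lists and nested structures need no special handling here because they arrive as plain Python lists and dicts, which json.dumps serializes directly.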
Requirements
- Crystal 1.0+
- Python 3.7+
- pyarrow Python library
Contributing
- Fork it (https://github.com/ober/crystal-parquet/fork)
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create a new Pull Request
License
MIT License. See the LICENSE file for details.
Credits
Built with Crystal and powered by Apache Arrow's PyArrow library.