crystal-parquet

A Crystal shard for reading and converting Apache Parquet files to JSON and other formats.

Features

  • ✅ Read Apache Parquet files
  • ✅ Convert Parquet to JSON
  • ✅ Extract file metadata
  • ✅ Display schema information
  • ✅ Pretty-print JSON output
  • ✅ Command-line interface (CLI)
  • ✅ Library API for programmatic access

Installation

  1. Add this to your application's shard.yml:

dependencies:
  parquet:
    github: ober/crystal-parquet

  2. Run shards install

  3. Ensure Python 3 and pyarrow are installed:

pip3 install pyarrow
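
Because the shard relies on an external Python installation, it can help to check for it at startup. The following is a minimal sketch using only the Crystal standard library; the method name pyarrow_available? and its python parameter are hypothetical and not part of the shard's API.

```crystal
# Hypothetical helper, not part of the shard: returns true when the given
# Python executable is on PATH and can import pyarrow.
def pyarrow_available?(python : String = "python3") : Bool
  # Process.find_executable returns nil when the command is not on PATH.
  return false unless Process.find_executable(python)

  # Output and error streams are discarded by default; only the exit status matters.
  Process.run(python, ["-c", "import pyarrow"]).success?
end

unless pyarrow_available?
  STDERR.puts "pyarrow not found; install it with: pip3 install pyarrow"
end
```

Checking up front gives a clearer message than a failure in the middle of a conversion.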

Usage

Command Line Interface

The shard provides a parquet command-line tool for converting Parquet files to JSON and inspecting their structure.

Convert Parquet to JSON

# Basic conversion to stdout
parquet input.parquet

# Pretty-printed JSON
parquet --pretty input.parquet

# Save to file
parquet -o output.json input.parquet

# Pretty-printed to file
parquet --pretty -o output.json input.parquet

View File Metadata

parquet --metadata input.parquet

Output:

Parquet File Metadata:
  Number of rows: 5
  Number of row groups: 1
  Number of columns: 6
  Created by: parquet-cpp-arrow version 22.0.0
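
If a script needs these values from the CLI output rather than from the library API, the "key: value" lines can be parsed with the standard library. This is a rough sketch that assumes the output keeps the layout shown above; parse_metadata_output is a hypothetical name, not something the shard provides.

```crystal
# Hypothetical helper: turns the "key: value" lines printed by
# `parquet --metadata` into a Hash. Assumes the layout shown above.
def parse_metadata_output(text : String) : Hash(String, String)
  fields = {} of String => String
  text.each_line do |line|
    key, separator, value = line.partition(": ")
    # Lines without ": " (the heading, blank lines) are skipped.
    next if separator.empty?
    fields[key.strip] = value.strip
  end
  fields
end

# Sample text copied from the output above; in practice it would come
# from running the CLI.
sample = <<-TEXT
  Parquet File Metadata:
    Number of rows: 5
    Number of row groups: 1
    Number of columns: 6
    Created by: parquet-cpp-arrow version 22.0.0
  TEXT

fields = parse_metadata_output(sample)
puts fields["Number of rows"].to_i
```

Inside a Crystal program, reader.read_metadata is the more direct route; text parsing is mainly useful when only the binary is available.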

View Schema

parquet --schema input.parquet

Output:

Schema:
id: int64
name: string
age: int64
score: double
active: bool
created_at: timestamp[us]

Help

parquet --help

Library API

Use the Parquet shard programmatically in your Crystal applications:

require "parquet"

# Read a Parquet file
reader = Parquet.read("data.parquet")

# Convert to JSON
json = reader.to_json(pretty: true)
puts json

# Get metadata
metadata = reader.read_metadata
puts "Number of rows: #{metadata.num_rows}"
puts "Number of row groups: #{metadata.num_row_groups}"
puts "Number of columns: #{metadata.num_columns}"

# Get schema
schema = reader.read_schema
puts "Schema:\n#{schema}"
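
The JSON string returned by to_json can be parsed back into Crystal values with the standard JSON module. The sketch below assumes the output is an array of row objects keyed by column name, which this README does not confirm; column_values is a hypothetical helper and the sample string stands in for real output.

```crystal
require "json"

# Hypothetical helper: collects one column from row-oriented JSON.
# Assumes the JSON is an array of objects, one per row.
def column_values(json : String, column : String) : Array(JSON::Any)
  JSON.parse(json).as_a.map { |row| row[column] }
end

# Stand-in for reader.to_json; the row-array shape is an assumption.
sample = %([{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}])

names = column_values(sample, "name").map(&.as_s)
puts names.join(", ")
```

If the actual output has a different shape, only the body of column_values needs to change.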

Simple Conversion

require "parquet"

# One-liner conversion
json = Parquet.to_json("input.parquet", pretty: true)
File.write("output.json", json)

Building from Source

# Build the CLI tool
shards build

# The executable will be in bin/parquet
./bin/parquet --version

Development

Running Tests

crystal spec

Creating Sample Data

A helper script is provided to create sample Parquet files:

python3 examples/create_sample.py

Architecture

This shard uses a hybrid approach:

  • Crystal provides the API interface and CLI
  • Python's pyarrow handles the actual Parquet file reading (via subprocess)

This approach provides:

  • Compatibility with the Parquet file format as implemented by pyarrow
  • Support for the compression codecs available in the installed pyarrow (Snappy, GZIP, LZ4, etc.)
  • Support for the data types and encodings pyarrow can read
  • Reliable parsing of complex schemas
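
The subprocess approach described above can be sketched as follows. This is an illustration of the general technique, not the shard's actual implementation: the run_python helper, the PYTHON_SNIPPET constant, and the embedded Python code are all assumptions made for the example.

```crystal
# Hypothetical Python snippet: reads the file named in argv[1] with pyarrow
# and prints its rows as JSON. Not taken from the shard's source.
PYTHON_SNIPPET = <<-PY
  import json, sys
  import pyarrow.parquet as pq

  rows = pq.read_table(sys.argv[1]).to_pylist()
  json.dump(rows, sys.stdout, default=str)
  PY

# Hypothetical helper: runs a Python script in a subprocess and returns
# whatever it wrote to standard output.
def run_python(script : String, args : Array(String) = [] of String, python : String = "python3") : String
  output = IO::Memory.new
  error = IO::Memory.new
  status = Process.run(python, ["-c", script] + args, output: output, error: error)
  raise "#{python} exited with an error: #{error.to_s.strip}" unless status.success?
  output.to_s
end

# Convert the file given as the first command-line argument, if any.
if path = ARGV[0]?
  puts run_python(PYTHON_SNIPPET, [path])
end
```

The trade-off is a process launch and a JSON round trip per call, in exchange for reusing pyarrow's Parquet reader instead of reimplementing the format in Crystal.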

Examples

Example 1: Convert Parquet to JSON

require "parquet"

# Read and convert
reader = Parquet.read("users.parquet")
json_data = reader.to_json(pretty: true)

# Write to file
File.write("users.json", json_data)
puts "Conversion complete!"

Example 2: Inspect File Structure

require "parquet"

reader = Parquet.read("data.parquet")

# Show metadata
metadata = reader.read_metadata
puts "File contains #{metadata.num_rows} rows"
puts "Organized in #{metadata.num_row_groups} row groups"
puts "With #{metadata.num_columns} columns"

# Show schema
puts "\nSchema:"
puts reader.read_schema

Example 3: Batch Processing

require "parquet"

Dir.glob("data/*.parquet").each do |file|
  puts "Processing: #{file}"

  # Replace only the trailing extension, not an earlier ".parquet" in the path
  json_file = file.sub(/\.parquet\z/, ".json")
  json = Parquet.to_json(file, pretty: true)
  File.write(json_file, json)

  puts "  -> #{json_file}"
end

Supported Data Types

Because reading is delegated to pyarrow, the library supports the standard Parquet data types:

  • Boolean
  • Integer (8, 16, 32, 64-bit, signed and unsigned)
  • Float and Double
  • String (UTF-8)
  • Binary
  • Date
  • Timestamp (milliseconds and microseconds)
  • Decimal
  • Lists and Arrays
  • Maps
  • Nested structures

Requirements

  • Crystal 1.0+
  • Python 3.7+
  • pyarrow Python library

Contributing

  1. Fork it (https://github.com/ober/crystal-parquet/fork)
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request

License

MIT License - see LICENSE file for details

Credits

Built with Crystal and powered by Apache Arrow's PyArrow library.
