fqix v0.0.2
fqix
fqix is a small command-line tool for fetching FASTQ records by read name from ordinary fastq.gz files. It builds a .fqix index so lookup can resume gzip inflation near the requested read instead of decompressing from the beginning.
⚗️ Early Prototype
Installation
Prebuilt binaries are available from GitHub Releases.
To build from source:
git clone https://github.com/kojix2/fqix.git
cd fqix
make release=1
The binary is written to:
bin/fqix
Usage
Build the default index next to a FASTQ file:
fqix index reads.fastq.gz
Fetch one or more reads by name. Matching FASTQ records are written to stdout:
fqix get reads.fastq.gz read_001 read_002 > hits.fastq
Useful variants:
fqix index -o reads.fqix reads.fastq.gz
fqix get -i reads.fqix reads.fastq.gz read_001
fqix get --first reads.fastq.gz duplicate_name
fqix get --count --list names.txt reads.fastq.gz
fqix show reads.fastq.gz.fqix
fqix show --entries reads.fastq.gz.fqix
fqix check reads.fastq.gz
Checkpoint density can be tuned when needed:
fqix index --checkpoint-span 4194304 reads.fastq.gz
Run fqix --help or fqix <command> --help for the full option list. If any requested read is missing, fqix get writes a message to stderr and exits with code 2.
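For scripted use, the exit-code behaviour described above can be wrapped in a few lines. The following is a minimal sketch, not part of fqix: the helper name `fetch_reads` and the injectable `run` parameter are hypothetical, and it assumes only what is stated here (records on stdout, exit code 2 when a requested read is missing).

```python
import subprocess

MISSING_READ_EXIT = 2  # documented exit code when a requested read is missing


def fetch_reads(fastq_path, names, run=subprocess.run):
    """Run `fqix get` and return (fastq_text, all_found).

    `run` is injectable so the wrapper can be exercised without the binary.
    """
    cmd = ["fqix", "get", fastq_path, *names]
    result = run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        return result.stdout, True
    if result.returncode == MISSING_READ_EXIT:
        # Assumption: records that were found may still be present on stdout;
        # the explanation for the missing ones is on stderr.
        return result.stdout, False
    raise RuntimeError(
        f"fqix get failed ({result.returncode}): {result.stderr.strip()}"
    )
```

A caller can then treat `all_found == False` as "some names were absent" rather than as a hard failure.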
FASTQ Assumptions
fqix expects ordinary four-line FASTQ records in a .fastq.gz file. Read names do not need to be sorted.
@read_001 optional comment
ACGTACGT
+
IIIIIIII
Multiline sequence or quality fields are not supported. The read name is the text after the header's leading @ up to the first space or tab, so the record above is named read_001. Query names are bare read names; a leading @ in the query is treated as part of the name, so querying @read_001 looks for a read literally named @read_001.
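The naming rule can be illustrated with a short Python sketch. This is not fqix's own code (fqix is not written in Python); the function names `read_name` and `parse_record` are hypothetical and simply restate the rule above.

```python
import re


def read_name(header_line):
    """Return the read name from a FASTQ header such as '@read_001 comment'."""
    if not header_line.startswith("@"):
        raise ValueError("FASTQ header must start with '@'")
    # Drop the leading '@', then cut at the first space or tab.
    body = header_line[1:].rstrip("\r\n")
    return re.split(r"[ \t]", body, maxsplit=1)[0]


def parse_record(lines):
    """Split one four-line FASTQ record into (name, sequence, quality)."""
    header, sequence, plus, quality = (line.rstrip("\r\n") for line in lines)
    if not plus.startswith("+"):
        raise ValueError("third line of a FASTQ record must start with '+'")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    return read_name(header), sequence, quality
```

Applied to the example record, `read_name("@read_001 optional comment")` yields `read_001`, which is the string to pass to fqix get.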
How It Works
A .fqix index stores:
- zran-style checkpoints for resuming gzip inflation
- one hash-sorted entry for every FASTQ record, plus a read-name string table
fqix get hashes the query name, checks hash-matching entries with an exact name comparison, resumes inflation from the nearest gzip checkpoint, and extracts the record using its indexed size.
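The lookup steps can be sketched in Python as follows. This illustrates the general technique only: the entry layout, the use of CRC-32 as the hash, and every name below are assumptions made for the example, not the actual .fqix format. Offsets are taken to be positions in the uncompressed stream, and the checkpoint list is assumed to start at offset 0.

```python
import bisect
import zlib


def name_hash(name):
    # Placeholder hash for this sketch; the hash fqix uses is not specified here.
    return zlib.crc32(name.encode())


def build_entries(records):
    """records: iterable of (name, offset, size). Returns hash-sorted entries."""
    return sorted((name_hash(n), n, off, size) for n, off, size in records)


def lookup(entries, checkpoints, query):
    """Return (checkpoint_offset, record_offset, size) for each exact match.

    checkpoints: sorted offsets where inflation can resume; first one is 0.
    """
    h = name_hash(query)
    # (h,) sorts before every (h, name, ...) tuple, so this finds the first
    # entry whose hash is >= h.
    i = bisect.bisect_left(entries, (h,))
    hits = []
    while i < len(entries) and entries[i][0] == h:
        _, name, off, size = entries[i]
        if name == query:  # exact comparison guards against hash collisions
            # Nearest checkpoint at or before the start of the record.
            j = bisect.bisect_right(checkpoints, off) - 1
            hits.append((checkpoints[j], off, size))
        i += 1
    return hits
```

Because entries are sorted by hash, finding candidates is a binary search, and the decompressor only has to inflate from the returned checkpoint to the record rather than from the start of the file. Duplicate read names simply produce more than one hit.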
Limitations
- Multiline FASTQ is not supported.
- fqix check compares source file size and second-resolution mtime.
- Parallel lookup is not implemented.
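To make the scope of the staleness check concrete, here is a minimal Python sketch of a size-and-mtime comparison. It is an illustration under assumptions: the helper names are hypothetical, and how fqix stores these two values in the index is not shown here.

```python
import os


def source_signature(path):
    """Return (size_in_bytes, mtime_in_whole_seconds) for a file."""
    st = os.stat(path)
    # Truncate the modification time to whole seconds.
    return st.st_size, int(st.st_mtime)


def source_unchanged(path, recorded_signature):
    """True when the file still matches a previously recorded signature."""
    return source_signature(path) == recorded_signature
```

A check of this kind cannot detect a rewrite that keeps the same size and lands within the same second, which is why it is listed as a limitation.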
Development
Run tests:
make test
The tests link against Mark Adler's zran example as a reference implementation, so a C compiler is required.
License
fqix is licensed under the MIT License.
The files under spec/support/ and the implementation in src/fqix/zran.cr are based on Mark Adler's zran from zlib, and are distributed under the zlib License.
- June 17, 2026