wgsim.cr v0.0.4.alpha

wgsim.cr

A reimplementation of wgsim in Crystal with extra features.

:yarn: :black_cat: Please note that this project is for personal study and experimentation and is not currently intended for practical use.

  • mut : Adding mutations to the reference genome
    • SNPs
    • Insertions (any length)
    • Deletions (any length)
    • Fasta Output
  • seq : Simulation of short-read sequencing
    • Uniform substitution sequencing errors
    • Fastq Output
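
As a rough illustration of what "uniform substitution sequencing errors" could mean, the Python sketch below replaces each base, with probability `error_rate`, by one of the other three bases chosen with equal probability. The function name `add_uniform_errors` and its parameters are hypothetical and are not taken from the wgsim.cr source; the sketch only shows the general idea.

```python
import random

BASES = "ACGT"

def add_uniform_errors(read, error_rate, rng):
    """Return a copy of `read` with uniform substitution errors (illustrative only)."""
    out = []
    for base in read:
        if base in BASES and rng.random() < error_rate:
            # Pick uniformly among the three bases that differ from the original.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            # Leave the base (including N or other symbols) unchanged.
            out.append(base)
    return "".join(out)

rng = random.Random(42)  # fixed seed so the run is repeatable
noisy = add_uniform_errors("ACGTACGTACGTACGTACGT", 0.1, rng)
print(noisy)
```

Passing the random generator in explicitly keeps the function deterministic for a given seed, which is the same role the `--seed` option plays on the command line.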

Installation

GitHub Releases

Compiling from source code

git clone https://github.com/kojix2/wgsim.cr
cd wgsim.cr
shards build --release -Dpreview_mt src/wgsim.cr

Homebrew

brew install kojix2/brew/wgsim

Usage

Program: wgsim (Crystal implementation of wgsim)
Version: 0.0.2.alpha
    mut          Add mutations to reference sequences
    seq          Simulate pair-end sequencing

About: Add mutations to reference sequences
Usage: wgsim mut [options] <in.ref.fa>

    -s, --sub-rate FLOAT             Rate of base substitutions [0.001]
    -i, --ins-rate FLOAT             Rate of insertions [0.0001]
    -d, --del-rate FLOAT             Rate of deletions [0.0001]
    -I, --ins-ext-prob FLOAT         Probability an insertion is extended [0.3]
    -D, --del-ext-prob FLOAT         Probability a deletion is extended [0.3]
    -p, --ploidy UINT8               Number of chromosome copies in output fasta [2]
    -S, --seed UINT64                Seed for random generator

About: Simulate pair-end sequencing
Usage: wgsim seq [options] <in.ref.fa> <out.read1.fq> <out.read2.fq>

    -e, --error-rate FLOAT           Base error rate [0.02]
    -d, --distance INT               Outer distance between the two ends [500]
    -s, --std-dev FLOAT              Standard deviation of the insert size [50]
    -D, --depth FLOAT                Average sequencing depth [10.0]
    -1, --size-left INT              Length of the first read [100]
    -2, --size-right INT             Length of the second read [100]
    -A, --ambiguous-ratio FLOAT      Discard if the fraction of N(ambiguous) bases higher than FLOAT [0.05]
    -S, --seed UINT64                Seed for random generator
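
To get a feel for the `--depth` option, the usual coverage relation can be used: depth is roughly the number of sequenced bases divided by the genome length. The Python sketch below estimates how many read pairs a given depth corresponds to. Whether wgsim.cr computes the pair count exactly this way is an assumption; `estimate_read_pairs` is a hypothetical helper, and its defaults simply mirror the option defaults listed above.

```python
def estimate_read_pairs(genome_length, depth=10.0, size_left=100, size_right=100):
    """Approximate number of read pairs needed to reach `depth` coverage."""
    # Each pair contributes the bases of both reads.
    bases_per_pair = size_left + size_right
    return int(genome_length * depth / bases_per_pair)

# A 1 Mbp genome at the default depth and read lengths.
print(estimate_read_pairs(1_000_000))  # 50000
```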

NOTE

  • The key point is that the FASTA file contains the complete DNA sequence of the cell's genome. For diploid cells, each pair of homologous chromosomes is written as two FASTA records. When chromosome copy number increases, for example because of extrachromosomal DNA, additional records must be added to reflect the amplification. If a chromosome undergoes inversion or fusion, the FASTA file should contain a record that accurately represents the change. In other words, the genome is never stored in a compressed form, so there are as many UInt8 or RefBase structures as there are nucleotides. This approach may reduce processing speed and increase disk and memory usage, but it avoids many complications.
  • A look through the wgsim code (wgsimのコードを眺める) [JA]
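
The first note above amounts to writing every chromosome copy as its own FASTA record. A minimal Python sketch of that idea follows; `expand_ploidy`, `to_fasta`, and the `_copyN` record-name suffix are hypothetical and do not reflect the naming wgsim.cr actually uses.

```python
def expand_ploidy(records, ploidy=2):
    """Yield one (name, sequence) pair per chromosome copy."""
    for name, seq in records:
        for copy in range(1, ploidy + 1):
            # Each copy is a full, independent record; nothing is shared or compressed.
            yield f"{name}_copy{copy}", seq

def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text with fixed-width sequence lines."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"

haploid = [("chr1", "ACGTACGT"), ("chr2", "GGCCTTAA")]
diploid = list(expand_ploidy(haploid, ploidy=2))
print(to_fasta(diploid), end="")
```

Because every copy is a separate record, mutations can later be applied to one copy without touching the other, at the cost of storing each sequence in full.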

Development

Dependencies:

Contributing

  1. Fork it (https://github.com/kojix2/wgsim/fork)
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request
License

MIT License
