wgsim.cr v0.0.4.alpha
wgsim.cr
A reimplementation of wgsim in Crystal, with extra features.
Please note that this project is being developed for personal study and experimentation and is not currently intended for practical use.
mut
: Add mutations to the reference genome
- SNPs
- Insertions (any length)
- Deletions (any length)
- FASTA output
seq
: Simulate short-read sequencing
- Uniform substitution sequencing errors
- FASTQ output
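The uniform substitution error model listed under seq can be illustrated with a short sketch. It is written in Python rather than Crystal and is not wgsim.cr's code; the function names (add_uniform_errors, phred_char, fastq_record) are hypothetical. It assumes that each non-N base is replaced, with probability equal to the error rate, by one of the other three bases chosen uniformly, and that every quality character is the constant Phred+33 value implied by that rate. How wgsim.cr actually encodes qualities is an assumption here.

```python
import math
import random

BASES = "ACGT"


def add_uniform_errors(read, error_rate, rng):
    """Replace each A/C/G/T base, with probability error_rate, by one of the
    other three bases chosen uniformly. Other characters (e.g. N) are kept."""
    out = []
    for base in read:
        if base in BASES and rng.random() < error_rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def phred_char(error_rate):
    """Phred+33 quality character for a constant per-base error rate
    (assumption: one quality value for every base)."""
    q = int(round(-10 * math.log10(error_rate)))
    return chr(q + 33)


def fastq_record(name, seq, error_rate):
    """Format one FASTQ record with a constant quality string."""
    qual = phred_char(error_rate) * len(seq)
    return f"@{name}\n{seq}\n+\n{qual}\n"


if __name__ == "__main__":
    rng = random.Random(42)
    read = add_uniform_errors("ACGTACGTAC", 0.02, rng)
    print(fastq_record("read1/1", read, 0.02), end="")
```

With the default error rate of 0.02, -10 × log10(0.02) ≈ 16.99 rounds to Q17, which is written as the character '2'.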
Installation
Compiling from source code
git clone https://github.com/kojix2/wgsim.cr
cd wgsim.cr
shards build --release -Dpreview_mt src/wgsim.cr
Homebrew
brew install kojix2/brew/wgsim
Usage
Program: wgsim (Crystal implementation of wgsim)
Version: 0.0.2.alpha
mut Add mutations to reference sequences
seq Simulate pair-end sequencing
About: Add mutations to reference sequences
Usage: wgsim mut [options] <in.ref.fa>
-s, --sub-rate FLOAT Rate of base substitutions [0.001]
-i, --ins-rate FLOAT Rate of insertions [0.0001]
-d, --del-rate FLOAT Rate of deletions [0.0001]
-I, --ins-ext-prob FLOAT Probability an insertion is extended [0.3]
-D, --del-ext-prob FLOAT Probability a deletion is extended [0.3]
-p, --ploidy UINT8 Number of chromosome copies in output fasta [2]
-S, --seed UINT64 Seed for random generator
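The mut options above suggest a per-base mutation model. The sketch below assumes one plausible reading of them: at each reference position a substitution, deletion, or insertion happens with probability --sub-rate, --del-rate, or --ins-rate, and an indel keeps growing one base at a time with probability --ins-ext-prob or --del-ext-prob, which gives geometrically distributed lengths. It also assumes that each of the --ploidy copies is mutated independently and named <chrom>_<k>. All names are hypothetical; this is Python for illustration, not the Crystal implementation.

```python
import random

BASES = "ACGT"


def run_length(ext_prob, rng):
    """Length >= 1 that grows by one with probability ext_prob each step
    (geometric distribution with mean 1 / (1 - ext_prob))."""
    n = 1
    while rng.random() < ext_prob:
        n += 1
    return n


def mutate(ref, sub_rate, ins_rate, del_rate, ins_ext, del_ext, rng):
    """Walk the reference once, applying SNPs, deletions and insertions."""
    out = []
    i = 0
    while i < len(ref):
        base = ref[i]
        r = rng.random()
        if r < sub_rate:
            # SNP: replace with a different base
            out.append(rng.choice([b for b in BASES if b != base]))
            i += 1
        elif r < sub_rate + del_rate:
            # deletion: skip a run of reference bases
            i += run_length(del_ext, rng)
        elif r < sub_rate + del_rate + ins_rate:
            # insertion: keep the base, then add random bases after it
            out.append(base)
            out.extend(rng.choice(BASES) for _ in range(run_length(ins_ext, rng)))
            i += 1
        else:
            out.append(base)
            i += 1
    return "".join(out)


def mutate_copies(name, ref, ploidy, rng, **rates):
    """One (header, sequence) pair per chromosome copy, mutated independently."""
    return [(f"{name}_{k}", mutate(ref, rng=rng, **rates)) for k in range(1, ploidy + 1)]


if __name__ == "__main__":
    rng = random.Random(1)
    # defaults taken from the help text above
    rates = dict(sub_rate=0.001, ins_rate=0.0001, del_rate=0.0001, ins_ext=0.3, del_ext=0.3)
    for header, seq in mutate_copies("chr1", "ACGT" * 2500, 2, rng, **rates):
        print(f">{header}\n{seq[:60]}")
```

With an extension probability of 0.3, the mean indel length in this model is 1 / (1 - 0.3) ≈ 1.43 bases.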
About: Simulate pair-end sequencing
Usage: wgsim seq [options] <in.ref.fa> <out.read1.fq> <out.read2.fq>
-e, --error-rate FLOAT Base error rate [0.02]
-d, --distance INT Outer distance between the two ends [500]
-s, --std-dev FLOAT Standard deviation of the insert size [50]
-D, --depth FLOAT Average sequencing depth [10.0]
-1, --size-left INT Length of the first read [100]
-2, --size-right INT Length of the second read [100]
-A, --ambiguous-ratio FLOAT Discard if the fraction of N(ambiguous) bases higher than FLOAT [0.05]
-S, --seed UINT64 Seed for random generator
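The seq options map naturally onto a paired-end sampling loop. This Python sketch assumes that the number of pairs is depth × genome length / (read 1 length + read 2 length), that fragment lengths are drawn from a normal distribution with mean --distance and standard deviation --std-dev, that read 1 is taken from the start of the fragment and read 2 is the reverse complement of its end, and that a pair is discarded when either read's fraction of N bases exceeds --ambiguous-ratio. These are assumptions for illustration, and the helper names are hypothetical.

```python
import random

COMP = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMP)[::-1]


def n_pairs(genome_len, depth, size_left, size_right):
    """Number of read pairs needed to reach the requested average depth."""
    return int(depth * genome_len / (size_left + size_right))


def too_ambiguous(seq, ratio):
    """True if the fraction of N bases exceeds ratio."""
    return seq.count("N") / len(seq) > ratio


def sample_pair(ref, distance, std_dev, size_left, size_right, rng):
    """Return (read1, read2), or None if the drawn fragment does not fit."""
    frag_len = int(round(rng.gauss(distance, std_dev)))
    if frag_len < max(size_left, size_right) or frag_len > len(ref):
        return None
    start = rng.randrange(len(ref) - frag_len + 1)
    frag = ref[start:start + frag_len]
    return frag[:size_left], revcomp(frag[-size_right:])


if __name__ == "__main__":
    rng = random.Random(7)
    ref = "".join(rng.choice("ACGT") for _ in range(5000))
    wanted = n_pairs(len(ref), 10.0, 100, 100)  # 10 * 5000 / 200 = 250 pairs
    pairs = []
    while len(pairs) < wanted:
        pair = sample_pair(ref, 500, 50, 100, 100, rng)
        if pair and not any(too_ambiguous(r, 0.05) for r in pair):
            pairs.append(pair)
    print(len(pairs))
```

Rejecting fragments that do not fit, instead of truncating them, keeps every read at its requested length.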
NOTE
- The key point is that the FASTA file must contain the complete DNA sequence of the cell's genome. For a diploid cell, add two FASTA records for each pair of homologous chromosomes. If extrachromosomal DNA increases a chromosome's copy number, add extra records to reflect this amplification. If a chromosome has undergone an inversion or fusion, the FASTA file should contain a record that accurately represents the rearranged sequence. In other words, the genome is never stored in a compressed form: there will be as many UInt8 or RefBase structures as there are nucleotides. This approach may reduce processing speed and increase disk and memory usage, but it avoids many complications.
- Reading through the wgsim source code [JA]
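Because every copy of every chromosome is stored base by base, memory use grows linearly with the total length of all records. The sketch below, in Python with hypothetical names, expands a set of chromosomes into one record per physical copy (diploid by default, with an assumed override for a single X and for an amplified extrachromosomal segment) and estimates the footprint assuming one byte per nucleotide, as with one UInt8 per base. If RefBase is larger than one byte, bytes_per_base would need to be adjusted.

```python
def genome_records(chromosomes, copies):
    """Expand chromosomes into one record per physical copy.
    `copies` maps chromosome name -> copy number (default 2, i.e. diploid)."""
    records = []
    for name, seq in chromosomes.items():
        for k in range(1, copies.get(name, 2) + 1):
            records.append((f"{name}_{k}", seq))
    return records


def footprint_bytes(records, bytes_per_base=1):
    """Memory for the sequences alone if each nucleotide takes bytes_per_base."""
    return sum(len(seq) for _, seq in records) * bytes_per_base


if __name__ == "__main__":
    chromosomes = {
        "chr1": "ACGT" * 250,    # 1000 bp
        "chrX": "ACGT" * 125,    # 500 bp
        "ecDNA_1": "ACGT" * 25,  # 100 bp amplified segment (hypothetical)
    }
    copies = {"chrX": 1, "ecDNA_1": 5}
    records = genome_records(chromosomes, copies)
    print(len(records), footprint_bytes(records))  # 8 records, 3000 bytes
```

At one byte per base, a diploid genome with a haploid length of about 3.1 Gbp would need roughly 2 × 3.1 × 10^9 ≈ 6.2 × 10^9 bytes (about 6.2 GB) for the sequence alone.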
Development
Dependencies:
- kojix2/nworkers.cr - Set the number of worker threads at runtime.
- kojix2/randn.cr - Normal random number generator.
- kojix2/fastx.cr - Fasta file reader.
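randn.cr supplies normally distributed random numbers, which seq needs for the insert-size distribution (-s, --std-dev). The algorithm randn.cr uses is not described here; as an illustration only, the classic Box-Muller transform turns two uniform variates into one normal variate. The function name is hypothetical.

```python
import math
import random


def box_muller(rng, mean=0.0, std=1.0):
    """One normal variate from two uniforms via the Box-Muller transform."""
    u1 = 1.0 - rng.random()  # in (0, 1], avoids log(0)
    u2 = rng.random()
    z = math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)
    return mean + std * z


if __name__ == "__main__":
    rng = random.Random(123)
    # insert sizes with the seq defaults: mean 500, standard deviation 50
    sizes = [box_muller(rng, 500, 50) for _ in range(5)]
    print([round(s) for s in sizes])
```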
Contributing
- Fork it (https://github.com/kojix2/wgsim.cr/fork)
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create a new Pull Request
License
MIT License