RepeatAnalysisTools
Description
This repository contains instructions for processing and repeat analysis of sequence data generated with the PacBio No-Amp Targeted Sequencing Protocol with simplified double Cas9 cut.
UPDATE: The RepeatAnalysisTools scripts in this repository now use Python 3.
Outputs from the analysis scripts include high-accuracy (>=QV20) CCS sequences for the target regions, so users can analyze the results with third-party tools as necessary.
Environment Modules
Run module spider RepeatAnalysisTools to find out which environment modules are available for this application.
System Variables
- HPC_RATOOLS_DIR - installation directory
- HPC_RATOOLS_BIN - executable directory
Additional Information
To use any of the Bash or Python scripts provided with the tools (i.e. the files ending in ".sh" or ".py"), prefix the script name with ${HPC_RATOOLS_DIR}/
For example, a preprocess.sh command might look something like:
${HPC_RATOOLS_DIR}/preprocess.sh \
    m64012_191221_044659.subreads.bam \
    m64012_191221_044659.adapters.fasta \
    /data/reference/genomes/human/hs37d5/hs37d5.fa \
    ./output \
    16 \
    16 \
    local
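The prefixing rule above can be wrapped in a small shell helper so that a missing module load is caught before a job starts. This is only a sketch: ratools_script is a hypothetical function name, not something shipped with RepeatAnalysisTools, and it assumes that HPC_RATOOLS_DIR is set once the environment module is loaded.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of RepeatAnalysisTools): turn a script name
# such as preprocess.sh into its full path under the installation directory.
ratools_script() {
    local name="${1:-}"

    # HPC_RATOOLS_DIR is expected to come from the environment module;
    # fail early with a clear message if it is missing.
    if [ -z "${HPC_RATOOLS_DIR:-}" ]; then
        echo "HPC_RATOOLS_DIR is not set; load the RepeatAnalysisTools module first" >&2
        return 1
    fi

    # Only the provided Bash and Python scripts need the prefix.
    case "$name" in
        *.sh|*.py)
            printf '%s/%s\n' "$HPC_RATOOLS_DIR" "$name"
            ;;
        *)
            echo "expected a .sh or .py script name, got: $name" >&2
            return 1
            ;;
    esac
}

# Example use (arguments as in the preprocess.sh command above):
#   "$(ratools_script preprocess.sh)" m64012_191221_044659.subreads.bam ...
```

The helper only builds the path; it does not check that the script exists or validate its arguments.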
Citation
If you publish research that uses RepeatAnalysisTools you have to cite it as follows:
@software{tange_2021_5013933,
  author    = {Tange, Ole},
  title     = {GNU Parallel 20210622 ('Protasevich')},
  month     = Jun,
  year      = 2021,
  note      = {{GNU Parallel is a general parallelizer to run multiple serial command line programs in parallel without changing them.}},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.5013933},
  url       = {https://doi.org/10.5281/zenodo.5013933}
}