Bioinformatic sequence recovery for universal target-capture bait kits can be substantially improved by tailoring target files to the group under study. To enable the best possible locus recovery from Angiosperms353 capture data, we have developed an expanded target file (mega353.fasta) incorporating sequences from over 550 transcriptomes from the 1KP project. To maximise computational efficiency, we provide the script filter_mega353.py, which can be used to subsample the mega353.fasta file based on user-selected taxa or taxon groups. These groups can be defined using unique 1KP transcriptome codes, species, families, orders, or broader groups (e.g. Basal Eudicots, Monocots). In addition, we provide the script BYO_transcriptome.py, which can be used to add sequences from any transcriptome to any protein-coding nucleotide target file. These tailored and customised target files can be used directly in target-capture pipelines such as HybPiper.
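To illustrate what subsampling a target file by taxon involves, here is a minimal Python sketch. It is not the filter_mega353.py implementation and does not reproduce its options. It assumes headers follow the HybPiper ">Taxon-GeneID" naming convention, and the functions read_fasta and filter_targets, as well as the taxon codes in the example, are hypothetical.

```python
"""Minimal sketch: subsample a HybPiper-style target file by taxon.

Assumption: FASTA headers look like ">Taxon-GeneID" (HybPiper convention).
This is an illustration only, NOT the filter_mega353.py implementation.
"""
from typing import Iterable, Iterator, Set, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs, joining multi-line sequences."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_targets(records: Iterable[Record], keep_taxa: Set[str]) -> Iterator[Record]:
    """Keep records whose taxon prefix (text before the last '-') is in keep_taxa."""
    for header, seq in records:
        taxon, _, _gene = header.rpartition("-")
        if taxon in keep_taxa:
            yield header, seq


if __name__ == "__main__":
    # Hypothetical taxon codes (ABCD, WXYZ) and gene IDs, for illustration only.
    example = """>ABCD-4471
ATGGCT
>WXYZ-4471
ATGGCA
>ABCD-4527
ATGAAA
""".splitlines()
    for header, seq in filter_targets(read_fasta(example), {"ABCD"}):
        print(f">{header}\n{seq}")
```

Splitting on the last hyphen keeps taxon names that themselves contain hyphens intact, provided the gene ID does not.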
Run 'module spider newtargets' to find out which environment modules are available for this application.
Loading the module defines the following environment variables:
- HPC_NEWTARGETS_DIR - installation directory
- HPC_NEWTARGETS_BIN - executable directory
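Scripts can then be located through these variables rather than through hard-coded paths. The Python sketch below assumes that filter_mega353.py and BYO_transcriptome.py sit directly in $HPC_NEWTARGETS_BIN. That layout is an assumption, so check it with 'module show newtargets'. The helper newtargets_script is hypothetical.

```python
"""Hypothetical helper: resolve NewTargets script paths from module variables.

Assumption: the scripts live directly in $HPC_NEWTARGETS_BIN.
"""
import os
from pathlib import Path
from typing import Mapping, Optional


def newtargets_script(name: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Return HPC_NEWTARGETS_BIN/<name>, failing clearly if the module is not loaded."""
    env = os.environ if env is None else env
    bin_dir = env.get("HPC_NEWTARGETS_BIN")
    if not bin_dir:
        raise RuntimeError("HPC_NEWTARGETS_BIN is not set; load the newtargets module first")
    return Path(bin_dir) / name


if __name__ == "__main__":
    try:
        print(newtargets_script("filter_mega353.py"))
    except RuntimeError as err:
        print(err)
```

Passing an explicit mapping as env makes the helper easy to test without loading the module.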