The Dextractor commands pull exactly and only the information needed for assembly and reconstruction from the source .bax.h5 HDF5 files produced by the PacBio RS II sequencer. Generally speaking, this information is the sequence of every read encoded in the .bax.h5 file, plus a number of quality value (QV) streams that Quiver needs to produce a highly accurate consensus sequence as the last step of the assembly process. The Dextractor therefore produces a .fasta file containing the sequences of all the reads, and a .quiva file containing the QV stream information in a readable, .fastq-like format.

For each of these two file types the library provides a command to compress the file and a command to decompress it. Compression is reversible: decompressing delivers the original uncompressed file. In this way, PacBio users can keep the data needed for assembly spooled up on disk in 1/14th the space occupied by the .bax.h5 files, which can then be archived to a cheap backup medium such as tape in case the raw data ever needs to be consulted again (we expect never, unless the spooled-up data is compromised or lost in some way). The compressor/decompressor pairs are endian-aware, so compressed files can be moved between machines.
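Because compression is described as reversible, one way to gain confidence before archiving the .bax.h5 files is to compare a checksum of an original .fasta or .quiva file with a checksum of the same file after a compress/decompress round trip. The following is a minimal sketch of such a check, not part of Dextractor itself. The helper names `sha256_of` and `round_trip_ok` are hypothetical, and the sketch assumes you have kept a copy of the original file and have already run the compressor and decompressor yourself.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large sequence files are never
        # loaded into memory all at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def round_trip_ok(original_path, restored_path):
    """Hypothetical helper: True if both files have identical contents."""
    return sha256_of(original_path) == sha256_of(restored_path)
```

For example, given a saved copy of an extracted file and the file produced by decompressing its compressed form (both file names are up to you), `round_trip_ok(saved_copy, restored_file)` should return `True` if the round trip delivered the original contents.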
Run "module spider dextractor" to find out what environment modules are available for this application.
- HPC_DEXTRACTOR_DIR - installation directory