Difference between revisions of "SRA"

Revision as of 19:03, 30 November 2016

Description

This is the NCBI Short Read Archive Toolkit.

Note: the SRA Toolkit creates a $HOME/ncbi/public directory and caches prefetched data files there. However, the home directory has a 20 GB limit, and using it for job data storage violates the UFRC storage policy. You must change that location to a directory in your /ufrc space before running the SRA Toolkit. The official approach is to use the vdb-config tool:

vdb-config -i

and change the directory to, for example, /ufrc/$GROUP/$USER/ncbi/public.
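The target directory can be created from the shell before you run vdb-config -i. This is a minimal sketch. It assumes $GROUP holds your UFRC group name, as in the example path. The helpers sra_cache_path, outside_home and make_sra_cache are hypothetical and are not part of the SRA Toolkit. They only build the path, refuse any location under $HOME, and create the directory. You still set the location in vdb-config yourself.

```shell
#!/bin/sh
# Minimal sketch: create an SRA cache directory outside $HOME.
# sra_cache_path, outside_home and make_sra_cache are hypothetical helpers,
# not SRA Toolkit commands.

# Print the suggested cache path: <base>/<group>/<user>/ncbi/public
sra_cache_path() {
    printf '%s/%s/%s/ncbi/public\n' "$1" "$2" "$3"
}

# Return 0 if path $1 is NOT the home directory $2 or a path inside it.
outside_home() {
    case "$1" in
        "$2" | "$2"/*) return 1 ;;
        *) return 0 ;;
    esac
}

# Create cache directory $1, refusing any location under $HOME.
make_sra_cache() {
    if ! outside_home "$1" "$HOME"; then
        echo "error: $1 is inside \$HOME (20 GB limit)" >&2
        return 1
    fi
    mkdir -p "$1"
}
```

For example, after sourcing the functions:

 make_sra_cache "$(sra_cache_path /ufrc "$GROUP" "$USER")"

Then run vdb-config -i and change the directory to that path.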

Required Modules

modules documentation

Serial

sra

System Variables

HPC_SRA_DIR - installation directory
HPC_SRA_BIN - location of the executables directory
HPC_SRA_DOC - location of the documentation directory
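Scripts can use these variables instead of hard-coded paths once the module is loaded. This is a minimal sketch that assumes the module is loaded with the usual module load sra command. check_sra_env is a hypothetical helper, not part of the module. It only reports which of the three variables listed above are unset.

```shell
#!/bin/sh
# Minimal sketch: verify that the sra module has set its HPC_SRA_* variables.
# check_sra_env is a hypothetical helper, not part of the module.
check_sra_env() {
    missing=0
    for var in HPC_SRA_DIR HPC_SRA_BIN HPC_SRA_DOC; do
        # Indirect lookup of the variable named in $var (POSIX sh has no ${!var}).
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "missing: $var (did you run 'module load sra'?)" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example:

 module load sra
 check_sra_env && ls "$HPC_SRA_BIN"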

Additional Information

Aspera Connect

To download SRA data, you can use the "ascp" utility from the Aspera Connect browser plugin package. A copy is installed and provided by the sra module, along with a wrapper script, ascp.sh, that automatically uses the SSH key. For instance:

ascp.sh -QT anonftp@ftp-private.ncbi.nlm.nih.gov:/genomes/Bacteria/all.faa.tar.gz faa

will download the all.faa.tar.gz archive to the faa directory.

Note: if the download fails to start on the first try with a "Session Stop (Error: Client unable to connect to server (check UDP port and firewall))" error, just re-run the command. The error comes from a transient DNS (host name resolution) problem that resolves itself.
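Because the connection error is transient, you can automate the re-run. This is a minimal sketch of a generic retry loop in POSIX shell. The retry function and its argument order (attempts, delay in seconds, command) are assumptions for illustration and are not part of ascp.sh or the sra module. It retries on any non-zero exit status, not only on the UDP error quoted above.

```shell
#!/bin/sh
# Minimal sketch: run a command until it succeeds or the attempts run out.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# retry is a hypothetical helper; it retries on ANY non-zero exit status.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while :; do
        if "$@"; then
            return 0
        fi
        if [ "$n" -ge "$attempts" ]; then
            echo "giving up after $n attempt(s)" >&2
            return 1
        fi
        echo "attempt $n failed; retrying in ${delay}s" >&2
        sleep "$delay"
        n=$((n + 1))
    done
}
```

For example, to try the download above up to three times, waiting 10 seconds between attempts:

 retry 3 10 ascp.sh -QT anonftp@ftp-private.ncbi.nlm.nih.gov:/genomes/Bacteria/all.faa.tar.gz faa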

@@ Line 20: / Line 20: @@
 This is the NCBI Short Read Archive Toolkit.
-''Release notes:''
+;Note: sra will create a $HOME/ncib/public directory and cache the prefetched data files there. However, home directory has a 20gb limit and its use for job data storage is a violation of the [https://www.rc.ufl.edu/about/policies/storage/ UFRC storage policy]. You must change that location to a directory in your ufrc space before running the sra toolkit. The official approach is to use the vdb-config tool
+ vdb-config -i
+and change the directory to, for example, /ufrc/$GROUP/$USER/ncbi/public.
-SRA Toolkit 2.1.7a includes new features in sam-dump tool and vdb-dump tools.
-Sam-dump now supports slicing across multiple sequences, and dumping
-cSRA files to fasta and fastq formats. In addition, sam-dump has three
-new parameters:
- -=|--hide-identical              Output '=' if base is identical to reference
- --gzip                           Compress output using gzip
- --bzip2                          Compress output using bzip2
-vdb-dump has two new parameters
- -o|--column_enum_short           enumerates columns in short form
- -b|--boolean                     defines how boolean's are printed (1,T)
-We have combined the functionality of two scripts, config-assistant.perl and reference-assistant.perl
-into a single script, configuration-assistant.perl that helps users download the correct references
-for a given cSRA file and configure the user environment for the SRA Toolkit.
 <!--Modules-->
 ==Required Modules==