Difference between revisions of "SRA"

Revision as of 19:05, 6 January 2022

Description

This is the NCBI Sequence Read Archive (SRA) Toolkit.

Note: sra will create a $HOME/ncbi/public directory and cache prefetched data files there. However, the home directory has a 20 GB limit, and using it for job data storage violates the UFRC storage policy. You must change that location to a directory in your UFRC space before running the SRA toolkit. The official approach is to use the vdb-config tool:

vdb-config -i

and change the directory to, for example, /blue/$GROUP/$USER/ncbi/public. See the SRA Toolkit Configuration Documentation for more details.

Alternatively, create an 'ncbi' directory in your /blue space and make ~/ncbi a symlink to it. For example:

$ mkdir /blue/mygroup/$USER/ncbi
$ ln -s /blue/mygroup/$USER/ncbi ~/ncbi
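If ~/ncbi already exists as a real directory (for example, because the toolkit has already run once), a plain ln -s places the new link inside that directory instead of replacing it. The following is a minimal sketch of a more defensive version of the two commands above. The function name link_ncbi_cache is hypothetical and not part of the sra module.

```shell
# Sketch only: link_ncbi_cache is a hypothetical helper, not provided by the sra module.
# Usage: link_ncbi_cache TARGET_DIR LINK_PATH
link_ncbi_cache() {
    target=$1
    link=$2
    # Refuse to touch an existing real file or directory so cached data is never hidden.
    if [ -e "$link" ] && [ ! -L "$link" ]; then
        echo "refusing: $link exists and is not a symlink" >&2
        return 1
    fi
    # Create the target directory (and any missing parents).
    mkdir -p "$target" || return 1
    # -f replaces an existing symlink; -n avoids descending into a linked directory.
    ln -sfn "$target" "$link"
}
```

With the paths from the example above, it would be called as link_ncbi_cache "/blue/mygroup/$USER/ncbi" "$HOME/ncbi". If the function refuses, move or remove the existing ~/ncbi directory first.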

Uploads

Data uploads to NCBI appear to work only from the login servers. If there is any concern about being disconnected, start a screen session before beginning an upload.

Required Modules

modules documentation

Serial

sra

System Variables

HPC_SRA_DIR - installation directory
HPC_SRA_BIN - location of the executables directory
HPC_SRA_DOC - location of the documentation directory
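A quick way to confirm that these variables are defined in the current shell is sketched below. The function name show_sra_env is hypothetical. The sketch assumes the variables are exported by loading the sra module listed under Required Modules.

```shell
# Sketch only: show_sra_env is a hypothetical helper, not provided by the sra module.
# Prints each variable listed above, or a hint when it is not set.
show_sra_env() {
    for name in HPC_SRA_DIR HPC_SRA_BIN HPC_SRA_DOC; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -n "$value" ]; then
            printf '%s=%s\n' "$name" "$value"
        else
            printf '%s is not set (is the sra module loaded?)\n' "$name"
        fi
    done
}
```

Calling show_sra_env after loading the module should print three NAME=path lines. Any "is not set" line suggests the module is not loaded in that shell.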

Additional Information

Aspera Connect

To download SRA data you can use the "ascp" utility from the Aspera Connect browser plugin package. A copy is installed and provided by the sra module, along with a wrapper script, ascp.sh, that automatically supplies the SSH key. For instance:

ascp.sh -QT anonftp@ftp-private.ncbi.nlm.nih.gov:/genomes/Bacteria/all.faa.tar.gz faa

will download the all.faa.tar.gz archive to the faa directory.

Note: if the download fails with a "Session Stop (Error: Client unable to connect to server (check UDP port and firewall))" error, please submit a support request. This error means that the remote site has not been allowed through the firewall. Be sure to include in the request the path to the script you used to run the data transfer command. Do not put any sensitive information, such as passwords or keys, in the request.
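Since a support request should point to the script that ran the transfer, it can help to keep the transfer command in a small script that also records what happened. The following is a minimal sketch. The function name run_logged and the log file name are hypothetical, and nothing here is part of the sra module.

```shell
# Sketch only: run_logged is a hypothetical helper, not provided by the sra module.
# Usage: run_logged LOGFILE COMMAND [ARGS...]
# Writes the command line, its output, and its exit status to LOGFILE.
run_logged() {
    log=$1
    shift
    status=0
    {
        printf 'command:'
        printf ' %s' "$@"
        printf '\n'
        # Run the command; remember a non-zero exit status instead of aborting.
        "$@" || status=$?
        printf 'exit status: %s\n' "$status"
    } >"$log" 2>&1
    return "$status"
}
```

In a transfer script this might be used as run_logged transfer.log ascp.sh -QT anonftp@ftp-private.ncbi.nlm.nih.gov:/genomes/Bacteria/all.faa.tar.gz faa. The log records the full command line, so review it for passwords or keys before sharing it.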

@@ Line 2: / Line 2: @@
 __NOEDITSECTION__
 [[Category:Software]]
-<!-- ########  Template Configuration ######## -->
+{|<!--Main settings - REQUIRED-->
-<!--Edit definitions of the variables used in template calls
-Required variables:
-app - lowercase name of the application e.g. "amber"
-url - url of the software page (project, company product, etc) - e.g. "http://ambermd.org/"
-Optional variables:
-INTEL - Version of the Intel Compiler e.g. "11.1"
-MPI - MPI Implementation and version e.g. "openmpi/1.3.4"
--->
-{|
-<!--Main settings - REQUIRED-->
 |{{#vardefine:app|sra}}
 |{{#vardefine:url|http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?cmd=show&f=software&m=software&s=software}}
-<!--Compiler and MPI settings - OPTIONAL -->
+|{{#vardefine:exe|1}} <!--Present manual instructions for running the software -->
-|{{#vardefine:intel|}} <!-- E.g. "11.1" -->
-|{{#vardefine:mpi|}} <!-- E.g. "openmpi/1.3.4" -->
-<!--Choose sections to enable - OPTIONAL-->
-|{{#vardefine:mod|1}} <!--Present instructions for running the software with modules -->
-|{{#vardefine:exe|}} <!--Present manual instructions for running the software -->
 |{{#vardefine:conf|}} <!--Enable config wiki page link - {{#vardefine:conf|1}} = ON/conf|}} = OFF-->
 |{{#vardefine:pbs|}} <!--Enable PBS script wiki page link-->
@@ Line 35: / Line 20: @@
 This is the NCBI Short Read Archive Toolkit.
-''Release notes:''
+;Note: sra will create a $HOME/ncib/public directory and cache the prefetched data files there. However, home directory has a 20gb limit and its use for job data storage is a violation of the [https://www.rc.ufl.edu/about/policies/storage/ UFRC storage policy]. You must change that location to a directory in your ufrc space before running the sra toolkit. The official approach is to use the vdb-config tool
+ vdb-config -i
+and change the directory to, for example, /blue/$GROUP/$USER/ncbi/public. See the [https://github.com/ncbi/sra-tools/wiki/Toolkit-Configuration SRA Toolkit Configuration Documentation] for more details.
-SRA Toolkit 2.1.7a includes new features in sam-dump tool and vdb-dump tools.
+Alternatively, create an 'ncbi' directory in your /blue space and symlink it to ~/ncbi. E.g.
-Sam-dump now supports slicing across multiple sequences, and dumping
+ $ mkdir /blue/mygroup/$USER/ncbi
-cSRA files to fasta and fastq formats. In addition, sam-dump has three
+ $ ln -s /blue/mygroup/$USER/ncbi ~/ncbi
-new parameters:
- -=|--hide-identical              Output '=' if base is identical to reference
+==Uploads==
- --gzip                           Compress output using gzip
+It appears that data uploads to NCBI only work from login servers. Start a screen session before beginning an upload if there are any concerns about being disconnected.
- --bzip2                          Compress output using bzip2
+<!--Modules-->
+==Required Modules==
-vdb-dump has two new parameters
+[[Modules|modules documentation]]
- -o|--column_enum_short           enumerates columns in short form
+===Serial===
- -b|--boolean                     defines how boolean's are printed (1,T)
+*{{#var:app}}
+==System Variables==
-We have combined the functionality of two scripts, config-assistant.perl and reference-assistant.perl
+* HPC_{{uc:{{#var:app}}}}_DIR - installation directory
-into a single script, configuration-assistant.perl that helps users download the correct references
-for a given cSRA file and configure the user environment for the SRA Toolkit.
-<!--Location-->
-==Available versions==
-* 2.1.7
-<!-- -->
-{{#if: {{#var: mod}}|==Running the application using modules==
-{{App_Module|app={{#var:app}}|intel={{#var:intel}}|mpi={{#var:mpi}}}}|}}
 * HPC_SRA_BIN - location of the executables directory
 * HPC_SRA_DOC - location of the documentation directory
+<!--Additional-->
-==Aspera Connect==
+{{#if: {{#var: exe}}|==Additional Information==
+===Aspera Connect===
 To download SRA data you can use the "ascp" utility from the [http://asperasoft.com/downloads/ Aspera Connect] browser plugin package. We have a copy installed and provided by the sra module. A wrapper script ''ascp.sh'' that automatically uses the ssh key is available. For instance:
   ascp.sh -QT anonftp@ftp-private.ncbi.nlm.nih.gov:/genomes/Bacteria/all.faa.tar.gz faa
 will download the all.faa.tar.gz archive to the faa directory.
-'''Note:''' if the download fails to start on the first try with a "Session Stop (Error: Client unable to connect to server (check UDP port and firewall))"
+'''Note:''' if the download fails to run with a "Session Stop (Error: Client unable to connect to server (check UDP port and firewall))"
-error just re-run the command. It's a DNS (host name resolution) problem, which will resolve itself.
+error please submit a support request. This means that the remote site has not been allowed through the firewall. Please be sure to include the path to a script you used to run the data transfer command into the request. Do not put any sensitive information like passwords, keys, and such into the request.
-{{#if: {{#var: exe}}|==How To Run==
+|}}
-WRITE INSTRUCTIONS ON RUNNING THE ACTUAL BINARY|}}
 {{#if: {{#var: conf}}|==Configuration==
 See the [[{{PAGENAME}}_Configuration]] page for {{#var: app}} configuration details.|}}
 {{#if: {{#var: pbs}}|==PBS Script Examples==
 See the [[{{PAGENAME}}_PBS]] page for {{#var: app}} PBS script examples.|}}
-{{#if: {{#var: policy}}|==Usage policy==
+{{#if: {{#var: policy}}|==Usage Policy==
 WRITE USAGE POLICY HERE (perhaps templates for a couple of main licensing schemes can be used)|}}
 {{#if: {{#var: testing}}|==Performance==