Difference between revisions of "SRA"

From UFRC
m (Fix formatting)
(8 intermediate revisions by 2 users not shown)
Revision as of 19:05, 6 January 2022

Description

sra website  

This is the NCBI Sequence Read Archive (SRA, formerly the Short Read Archive) Toolkit.

Note
sra will create a $HOME/ncbi/public directory and cache prefetched data files there. However, the home directory has a 20 GB limit, and using it to store job data violates the UFRC storage policy. You must change the cache location to a directory in your UFRC storage space before running the SRA Toolkit. The official approach is to run the vdb-config tool interactively
vdb-config -i

and change the directory to, for example, /blue/$GROUP/$USER/ncbi/public. See the SRA Toolkit Configuration Documentation for more details.
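
For scripted setups, the same change can be sketched without the interactive screen. This is a minimal sketch, not the documented UFRC procedure: it assumes the installed vdb-config accepts the --set option and that the cache root is stored in the /repository/user/main/public/root configuration node (check vdb-config --help for your version). set_sra_cache is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: create a cache directory outside $HOME and register it
# with the SRA Toolkit configuration.
set_sra_cache() {
    local cache_dir=$1
    if [ -z "$cache_dir" ]; then
        echo "usage: set_sra_cache DIRECTORY" >&2
        return 2
    fi
    mkdir -p -- "$cache_dir" || return 1
    if ! command -v vdb-config >/dev/null 2>&1; then
        echo "vdb-config not found; load the sra module first" >&2
        return 1
    fi
    # Assumption: --set and this node name are supported by the installed version.
    vdb-config --set "/repository/user/main/public/root=$cache_dir"
}

# Example call (GROUP is assumed to hold your group name):
#   set_sra_cache "/blue/$GROUP/$USER/ncbi/public"
```

If the option is not available in your version, fall back to vdb-config -i as described above.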

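Alternatively, create an 'ncbi' directory in your /blue space and symlink it to ~/ncbi, replacing mygroup with your group name. If ~/ncbi already exists as a regular directory, move or remove it first; otherwise ln creates the link inside that directory. For example:

$ mkdir -p /blue/mygroup/$USER/ncbi
$ ln -s /blue/mygroup/$USER/ncbi ~/ncbi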
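
After creating the link, it may be worth confirming that ~/ncbi really resolves to a location outside the home directory. The following is a small sketch of such a check; cache_in_home and resolve_dir are hypothetical helper names, not part of the SRA Toolkit.

```shell
#!/bin/bash
# Print the physical path of an existing directory, with symlinks resolved.
resolve_dir() {
    (CDPATH= cd -- "$1" >/dev/null 2>&1 && pwd -P)
}

# Succeed (exit 0) only if directory $1 physically lives inside directory $2.
cache_in_home() {
    local dir home
    dir=$(resolve_dir "$1") || return 1    # missing directory: nothing cached there
    home=$(resolve_dir "$2") || return 1
    case "$dir/" in
        "$home"/*) return 0 ;;
        *)         return 1 ;;
    esac
}

if cache_in_home "$HOME/ncbi" "$HOME"; then
    echo "Warning: $HOME/ncbi is still inside the home directory" >&2
fi
```

The check prints nothing when ~/ncbi is a symlink into /blue or does not exist yet.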

Uploads

Data uploads to NCBI appear to work only from the login servers. If you are concerned about being disconnected, start a screen session before beginning an upload.
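
A detached screen session keeps a command running if the connection drops. The sketch below assumes GNU screen is available on the login server; run_in_screen is a hypothetical helper, and the upload command itself is left as a placeholder because this page does not specify one.

```shell
#!/bin/bash
# Hypothetical helper: start a command in a detached, named screen session.
run_in_screen() {
    if [ "$#" -lt 2 ]; then
        echo "usage: run_in_screen SESSION COMMAND [ARGS...]" >&2
        return 2
    fi
    local session=$1
    shift
    # -d -m starts the session detached; -S names it for later reattachment.
    screen -dmS "$session" "$@"
}

# Example (the script path is a placeholder):
#   run_in_screen sra_upload bash /path/to/your_upload_script.sh
# Reattach later with: screen -r sra_upload
# List sessions with:  screen -ls
```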

Required Modules

modules documentation

Serial

  • sra

System Variables

  • HPC_SRA_DIR - installation directory
  • HPC_SRA_BIN - location of the executables directory
  • HPC_SRA_DOC - location of the documentation directory
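
Once the sra module is loaded, these variables can be used in job scripts instead of hard-coded paths. A brief sketch, assuming the module exports the variables as listed above; list_sra_tools is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: list the executables found in the SRA bin directory.
list_sra_tools() {
    local bin_dir=${HPC_SRA_BIN:-}
    if [ -z "$bin_dir" ] || [ ! -d "$bin_dir" ]; then
        echo "HPC_SRA_BIN is not set or not a directory; load the sra module first" >&2
        return 1
    fi
    local f
    for f in "$bin_dir"/*; do
        # Print only regular files that are executable.
        if [ -f "$f" ] && [ -x "$f" ]; then
            basename -- "$f"
        fi
    done
}
```

Calling list_sra_tools after loading the module prints one tool name per line.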

Additional Information

Aspera Connect

To download SRA data, you can use the "ascp" utility from the Aspera Connect browser plugin package. A copy is installed and provided by the sra module, along with a wrapper script, ascp.sh, that automatically uses the ssh key. For instance:

ascp.sh -QT anonftp@ftp-private.ncbi.nlm.nih.gov:/genomes/Bacteria/all.faa.tar.gz faa

will download the all.faa.tar.gz archive to the faa directory.

Note: if the download fails with a "Session Stop (Error: Client unable to connect to server (check UDP port and firewall))" error, please submit a support request. This error means that the remote site has not been allowed through the firewall. Be sure to include in the request the path to the script you used to run the data transfer command. Do not include any sensitive information, such as passwords or keys, in the request.
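
Because a support request should reference the script that ran the transfer, it can help to keep the command in a small script file. The sketch below wraps the ascp.sh call from the example above and reports a failure; fetch_with_ascp is a hypothetical helper name, and the source and destination are simply the values from that example.

```shell
#!/bin/bash
# Hypothetical helper: run one ascp.sh transfer and report whether it succeeded.
fetch_with_ascp() {
    local source=$1 dest=$2
    mkdir -p -- "$dest" || return 1
    # -QT are the options used in the example on this page.
    if ascp.sh -QT "$source" "$dest"; then
        echo "Transfer finished: $dest"
    else
        local rc=$?
        echo "Transfer failed with exit status $rc; see the note above" >&2
        return "$rc"
    fi
}

# Example, using the values from this page:
#   fetch_with_ascp anonftp@ftp-private.ncbi.nlm.nih.gov:/genomes/Bacteria/all.faa.tar.gz faa
```

Saving this as a file in your /blue space gives you a path to quote in the request without copying any keys or passwords into it.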