Genomics software repository

About the software repository

The Genomics Software Repository is a collection of software located in a shared folder on Iceberg, the High Performance Computing cluster at the University of Sheffield. Any Iceberg user can access and use it, but it was established primarily to develop and support the analytical infrastructure of the Molecular Ecology Group at the Department of Animal and Plant Sciences. The repository comprises ready-to-use programs for NGS and population genetic/genomic analysis (e.g. BEAST, Colony, STACKS, GATK). Its main advantage is that users do not need to install commonly used programs in their home folders.

How to use the repository

The repository is only accessible from the worker nodes, so you will not be able to access any program from the head nodes. To try out the programs before submitting any jobs, you will need to work from an interactive session, which you can launch with the command: qrsh

The easiest way to use the repository is to execute the following line from the command line (remember: you must be logged in to one of the Iceberg worker nodes):

source /usr/local/extras/Genomics/.bashrc

That will set up your environment variables to use the repository.

If you want to make it permanent, you will need to add the following lines at the end of your $HOME/.bash_profile:
 

if [[ -e '/usr/local/extras/Genomics' ]]; 
then
    source /usr/local/extras/Genomics/.bashrc
fi

This can also be done by executing the following command:

echo -e "if [[ -e '/usr/local/extras/Genomics' ]];\nthen\n\tsource /usr/local/extras/Genomics/.bashrc\nfi" >> $HOME/.bash_profile
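Running the one-line command more than once appends the same block again each time. A minimal sketch of a guard against that is shown here; `add_genomics_hook` is a hypothetical helper name (not something the repository provides), and it assumes the profile should contain exactly the four lines listed above.

```shell
# add_genomics_hook is a hypothetical helper, not part of the repository.
# It appends the repository hook to a profile file only if it is missing.
add_genomics_hook() {
    profile="${1:-$HOME/.bash_profile}"
    repo_rc='/usr/local/extras/Genomics/.bashrc'

    # Do nothing if the profile already sources the repository's .bashrc
    if [ -f "$profile" ] && grep -qF "source $repo_rc" "$profile"; then
        return 0
    fi

    # Append the same four lines shown in the documentation
    cat >> "$profile" <<'EOF'
if [[ -e '/usr/local/extras/Genomics' ]];
then
    source /usr/local/extras/Genomics/.bashrc
fi
EOF
}
```

Calling `add_genomics_hook` with no argument targets $HOME/.bash_profile; passing a different file name as the first argument is useful for trying it out safely first.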

If the repository is set up correctly, you should see the following message every time you log in to a worker node:
 

  Your account is set up to use the Genomics Software Repository
    More info: http://soria-carrasco.staff.shef.ac.uk/softrepo

If you want to use a specific version of a program (which is safer, as the default version could be updated while you are still working on a project), call it by its full path (see the ‘Structure of the repository’ section below).

To ensure the repository environment is available to jobs submitted on Iceberg or ShARC, either use the option -V to pass all environment variables or, better, load the repository by placing source /usr/local/extras/Genomics/.bashrc at the beginning of the submission script.
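As an illustration of the second approach, the start of a submission script might look like the sketch below. The `#$` directives are standard Grid Engine options chosen as examples, the run-time value is an arbitrary assumption, and `load_genomics_repo` is a hypothetical helper name used only to report a missing repository clearly.

```shell
#!/bin/bash
#$ -cwd                # run the job from the directory it was submitted from
#$ -l h_rt=01:00:00    # assumed run-time limit; adjust for your own job

# load_genomics_repo is a hypothetical helper, not part of the repository.
# It sources the repository's .bashrc, or returns non-zero if it is unreadable.
load_genomics_repo() {
    rc="${1:-/usr/local/extras/Genomics/.bashrc}"
    if [ -r "$rc" ]; then
        # "." is the portable spelling of "source"
        . "$rc"
    else
        echo "Genomics repository not available: $rc" >&2
        return 1
    fi
}

if load_genomics_repo; then
    # Analysis commands that rely on the repository go here
    :
fi
```

In a real job it may be preferable to stop immediately when the repository cannot be loaded, for example with `load_genomics_repo || exit 1`.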

Software installed

The list of installed programs, libraries, R packages, and Perl modules (including versions) can be obtained with the command softrepo. For more information on the paths to the programs, see the next section.

Structure of the repository

In general, the common structure for an installed program is:
 

/usr/local/extras/Genomics/apps/program/version/program

or
 

/usr/local/extras/Genomics/apps/program/version/bin/program

However, be aware that this might be different for some particular programs. If you are not able to find an installed program, you can look for it with find. For example, if you are looking for stacks, you could run something like this:
 

find /usr/local/extras/Genomics/apps -type f -executable -name "*stacks*"
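To pin a specific version in a script, the two common layouts described above can be checked in order. The following is only a sketch: `resolve_program` is a hypothetical helper name, and it covers just those two layouts, so programs installed differently still need to be located with find.

```shell
# resolve_program is a hypothetical helper, not part of the repository.
# Usage: resolve_program PROGRAM VERSION [APPS_ROOT]
# Prints the full path of the executable if one of the two common layouts matches.
resolve_program() {
    prog="$1"
    ver="$2"
    root="${3:-/usr/local/extras/Genomics/apps}"

    for candidate in "$root/$prog/$ver/$prog" "$root/$prog/$ver/bin/$prog"; do
        # Accept only regular files that are executable
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done

    echo "No executable found for $prog version $ver under $root" >&2
    return 1
}
```

The printed path can be stored in a variable at the top of a submission script, so that every later call uses the same version even if a newer one is installed in the meantime.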

Submission scripts

Example job submission scripts are available for various analyses. These scripts cannot be modified in place, so you will need to copy them to your home folder and modify the copies as needed. All submission scripts are located in:
 

/usr/local/extras/Genomics/submit_scripts
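Copying the scripts could be done along the lines of the sketch below. `copy_submit_scripts` is a hypothetical helper name and $HOME/submit_scripts is an assumed destination. The chmod step is there because copies of read-only files can stay read-only.

```shell
# copy_submit_scripts is a hypothetical helper, not part of the repository.
# Usage: copy_submit_scripts [SOURCE_DIR] [DEST_DIR]
copy_submit_scripts() {
    src="${1:-/usr/local/extras/Genomics/submit_scripts}"
    dest="${2:-$HOME/submit_scripts}"   # assumed destination; any folder you own works

    if [ ! -d "$src" ]; then
        echo "No such directory: $src" >&2
        return 1
    fi

    mkdir -p "$dest" || return 1
    # Copy the contents of the source directory into the destination
    cp -R "$src"/. "$dest"/ || return 1
    # Make the copies writable by their owner so they can be edited
    chmod -R u+w "$dest"
}
```

After running `copy_submit_scripts`, the copies in the destination folder can be edited and submitted as usual.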

Tutorials and guidelines

Tutorials and guidelines on generic aspects, as well as on how to run some specific programs, are available in the resources section.

Repository administration

At present, the Genomics Software Repository is administered with minimal support, and response times may be slow. You may request that software be installed, but, in general, no support will be provided on how to run any specific program. It is strongly recommended that you subscribe to the mailing list to stay up to date.

For general enquiries regarding the Genomics Software Repository, please contact the repository administrator, Victor Soria-Carrasco.

For general enquiries regarding Iceberg, please contact the Iceberg and HPC Systems administrator, Anthony Brookfield.

Stay updated

Please join the mailing list to stay informed about programs added to the repository and updates to installed ones.