StarBLAST-HPC: HPC Deployment for Large Classes (>100)¶
The StarBLAST-HPC setup distributes BLAST searches across multiple nodes on a High-Performance Computer. It uses a Master-Worker arrangement similar to StarBLAST-Docker, with an Atmosphere instance as the Master and the HPC as the Worker. Setting up the Worker ahead of time is recommended.
Some command line knowledge is required for setup.
HPC Requirements and Setup¶
The following software and resources must be available on the HPC:
- iRODS version 4.0 or newer
- ncbi-BLAST+ version 2.9.0 or newer
- CCTools version 7.0.21 or newer
- glibc version 2.14 or newer
- Support for CentOS7
- CyVerse user account
iRODS, ncbi-BLAST+ and CCTools should be available in your home directory, which can be found using
cd
pwd
It should output something similar to
/home/<U_NUMBER>/<USER>/
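Before or after installing, it can help to see which of the required commands are already reachable from your shell. The following is a minimal sketch, assuming bash; the function name check_tools is hypothetical, and blastn is used here only as a representative BLAST+ program.

```shell
# Report which of the given commands are missing from PATH.
# Prints one "missing: <name>" line per absent command and returns
# non-zero if at least one command is missing.
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, running check_tools iinit blastn work_queue_factory lists whichever of iRODS iCommands, BLAST+ and CCTools still need to be installed or added to the path.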
iRODS Installation Guide¶
- From your home directory, obtain and install iRODS with the command
wget https://files.renci.org/pub/irods/releases/4.1.10/ubuntu14/irods-icommands-4.1.10-ubuntu14-x86_64.deb
apt-get install ./irods-icommands-4.1.10-ubuntu14-x86_64.deb
- Upon installation, set up the iCommands (requires a CyVerse account):
iinit
- You will be prompted to connect to CyVerse with:
host name (DNS): data.cyverse.org
port #: 1247
username: <CyVerse_ID>
zone: iplant
password: <CyVerse_password>
iRODS should now be installed and configured. If problems persist, a more in-depth tutorial on iRODS and iCommands installation can be found here.
ncbi-BLAST+ Installation Guide¶
- From your home directory, obtain and decompress ncbi-BLAST+ with
wget https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.9.0/ncbi-blast-2.9.0+-x64-linux.tar.gz
tar -xvf ncbi-blast-2.9.0+-x64-linux.tar.gz
- Add ncbi-BLAST+ to the path (change the path to reflect the correct location of the ncbi-BLAST+ bin files):
export PATH=$HOME</PATH/TO/BLAST/BIN/>:$PATH
At this point, ncbi-BLAST+ should be installed and accessible.
- BLAST databases need to be downloaded into a <DATABASE>/ directory in the home folder:
/home/<U_NUMBER>/<USER>/<DATABASE>/
Note
An example set of BLAST databases can be downloaded with iRODS from /iplant/home/cosimichele/200503_Genomes_n_p. Read more on installing iRODS and iCommands above.
CCTools Installation Guide¶
- From your home directory, obtain and decompress CCTools with
wget https://ccl.cse.nd.edu/software/files/cctools-7.1.6-source.tar.gz
tar -xvf cctools-7.1.6-source.tar.gz
- Add CCTools to the path (change the path to reflect the correct location of the CCTools bin files):
export PATH=$HOME</PATH/TO/CCTOOLS/BIN/>:$PATH
At this point, CCTools should be installed and accessible.
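The export commands above only affect the current shell session, and running them repeatedly adds duplicate entries. The sketch below shows one way to prepend a directory to PATH only if it is not already there; the function name path_prepend and the example directories are hypothetical.

```shell
# Prepend a directory to PATH unless it is already listed.
# Wrapping PATH in colons lets the pattern match entries at either end.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;               # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}
```

Calls such as path_prepend "$HOME/ncbi-blast-2.9.0+/bin" could be placed in ~/.bashrc so the tools remain available in later sessions; adjust the directories to match where the bin folders actually are.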
Note
CCTools only works if your HPC has glibc version 2.14 or newer. In the following examples, glibc and BLAST+ are loaded through module load. Using module load is not necessary if the HPC system already supports glibc 2.14 and ncbi-BLAST+ has been added to the path as described above.
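To decide whether the glibc module is needed, the system version can be compared against 2.14. This is a sketch under two assumptions: that sort supports the -V (version sort) option, as GNU sort does, and that getconf reports GNU_LIBC_VERSION on your system. Both function names are hypothetical.

```shell
# Return success if version $1 is greater than or equal to version $2.
# Version sort places the smaller version first, so $2 must come out on top.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the glibc version number reported by getconf,
# or nothing if the system does not report one.
glibc_version() {
    getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}'
}
```

A check such as version_ge "$(glibc_version)" 2.14 succeeds when the system glibc is new enough for CCTools; if it fails, load glibc through module load as in the examples that follow.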
Launching Workers on the HPC¶
Jobs are submitted to the HPC as PBS scripts (.pbs files) using qsub.
- Create a .pbs file that contains the following code and change the <VARIABLES> to your preferred options:
#!/bin/bash
#PBS -W group_list=<GROUP_LIST>
#PBS -q windfall
#PBS -l select=<N_OF_NODES>:ncpus=<N_OF_CPUS>:mem=<N_MEMORY>gb
#PBS -l place=pack:shared
#PBS -l walltime=<MAX_TIME>
#PBS -l cput=<MAX_TIME>
module load blast
module load unsupported
module load ferng/glibc
module load singularity
export CCTOOLS_HOME=/home/<U_NUMBER>/<USER>/<CCTOOLS_DIRECTORY>
export PATH=${CCTOOLS_HOME}/bin:$PATH
cd /home/<U_NUMBER>/<USER>/<WORKERS_DIRECTORY>
MASTER_IP=<MASTER_IP>
MASTER_PORT=<PORT_NUMBER>
TIME_OUT_TIME=<TIME_OUT_TIME>
PROJECT_NAME=<PROJECT_NAME>
/home/<U_NUMBER>/<USER>/<CCTOOLS_DIRECTORY>/bin/work_queue_factory -T local -M $PROJECT_NAME --cores <N_CORES> -w <MIN_N_WORKERS> -W <MAX_N_WORKERS> -t $TIME_OUT_TIME
An example of a .pbs file running on the University of Arizona HPC:
#!/bin/bash
#PBS -W group_list=lyons-lab
#PBS -q windfall
#PBS -l select=2:ncpus=12:mem=24gb
#PBS -l place=pack:shared
#PBS -l walltime=02:00:00
#PBS -l cput=02:00:00
module load blast
module load unsupported
module load ferng/glibc
module load singularity
export CCTOOLS_HOME=/home/u12/cosi/cctools-7.0.19-x86_64-centos7
export PATH=${CCTOOLS_HOME}/bin:$PATH
cd /home/u12/cosi/cosi-workers
MASTER_IP=128.196.142.13
MASTER_PORT=9123
TIME_OUT_TIME=1800
PROJECT_NAME="starBLAST"
/home/u12/cosi/cctools-7.0.19-x86_64-centos7/bin/work_queue_factory -T local -M $PROJECT_NAME --cores 12 -w 1 -W 8 -t $TIME_OUT_TIME
In the example above, BLAST is already installed on the HPC and is loaded with module load blast. The script submits a minimum of 1 and a maximum of 8 workers per node to the HPC.
- Submit the .pbs script with
qsub <NAME_OF_PBS>.pbs
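When the same script is reused for different class sizes, filling in the template by hand becomes error-prone. The sketch below prints a trimmed version of the template from a few arguments. The function name render_pbs and its argument order are hypothetical; the worker limits (1 to 8) and the 1800 second timeout are copied from the example above; and the site-specific module load and cd lines are left out, so add them back if your HPC needs them.

```shell
# Print a worker .pbs script to stdout.
# Arguments: group, nodes, CPUs per node, memory in GB, walltime,
#            CCTools directory, project name.
render_pbs() {
    local group="$1" nodes="$2" cpus="$3" mem_gb="$4" walltime="$5"
    local cctools_home="$6" project="$7"
    # \$PATH is escaped so it is expanded when the job runs, not here.
    cat <<EOF
#!/bin/bash
#PBS -W group_list=${group}
#PBS -q windfall
#PBS -l select=${nodes}:ncpus=${cpus}:mem=${mem_gb}gb
#PBS -l place=pack:shared
#PBS -l walltime=${walltime}
#PBS -l cput=${walltime}
export PATH=${cctools_home}/bin:\$PATH
${cctools_home}/bin/work_queue_factory -T local -M ${project} --cores ${cpus} -w 1 -W 8 -t 1800
EOF
}
```

The output can be redirected to a file, for example render_pbs lyons-lab 2 12 24 02:00:00 /home/u12/cosi/cctools-7.0.19-x86_64-centos7 starBLAST > workers.pbs, and then submitted with qsub workers.pbs.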
Setting Up the Master VM on the Cloud Service¶
Set up the Master instance for StarBLAST-HPC by following the same steps as for StarBLAST-Docker, but without adding the Master deployment script. Additionally, BLAST databases need to be loaded manually into the <DATABASE>/ folder.
Once the VM is running, access it through ssh or by using the Web Shell (“Open Web Shell” button on your VM’s page). Once inside, follow the steps below.
Note
IMPORTANT: THE PATH TO THE DATABASE ON THE MASTER NEEDS TO BE THE SAME AS THE ONE ON THE WORKER
- Ensure the databases on both the Master VM and Worker HPC are in the same directory. On the Worker HPC, go to the <DATABASE>/ directory and run
pwd
Then, on your Master VM, create a directory with the same path as the output above:
mkdir -p SAME/PATH/TO/HPC/DATABASE/DIRECTORY/
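Because the Master and Worker paths must match exactly, a relative path typed by mistake would silently create the directory in the wrong place. The sketch below wraps mkdir -p with a check that the path is absolute; the function name make_matching_db_dir is hypothetical.

```shell
# Create the database directory on the Master at the path printed by
# `pwd` on the HPC. Relative paths are rejected because they would
# resolve against the current directory instead of matching the HPC.
make_matching_db_dir() {
    case "$1" in
        /*) ;;
        *) echo "error: expected an absolute path, got '$1'" >&2; return 1 ;;
    esac
    mkdir -p "$1" && echo "ready: $1"
}
```

For the University of Arizona example used on this page, the call would be make_matching_db_dir /home/u12/cosi/DATABASE. Creating directories under /home on the Master may require elevated permissions, depending on how the VM is configured.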
- Now the <DATABASE>/ directories are ready to hold the desired databases. You can use the same preset databases as StarBLAST-Docker or make your own from a .fasta (or .fa, .faa, .fna) file using BLAST+’s makeblastdb, referenced in StarBLAST-VICE. Both require iRODS (JetStream comes with iRODS pre-installed) and a CyVerse account.
Access iRODS using:
iinit
You will be prompted to connect to CyVerse with:
host name (DNS): data.cyverse.org
port #: 1247
username: <CyVerse_ID>
zone: iplant
password: <CyVerse_password>
- Once connected, retrieve and move the databases to your <DATABASE>/ folder (shown for preset):
iget -rKVP /iplant/home/cosimichele/200503_Genomes_n_p
mv GCF_* /DATABASE/DIRECTORY/
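For the custom-database route mentioned above, makeblastdb needs to know whether the input is nucleotide or protein. The sketch below guesses this from the file extension, relying on the common convention that .faa holds amino acid sequences and .fna holds nucleotide sequences; .fasta and .fa are ambiguous, so the type must be given explicitly for those. The function names are hypothetical, and build_db assumes makeblastdb is on the path.

```shell
# Guess the makeblastdb -dbtype from a FASTA file extension.
# Prints "prot" or "nucl", or fails for extensions that do not say.
guess_dbtype() {
    case "$1" in
        *.faa) echo prot ;;
        *.fna) echo nucl ;;
        *) return 1 ;;
    esac
}

# Build a BLAST database next to the input file.
# Usage: build_db <fasta> [nucl|prot]
build_db() {
    local fasta="$1" dbtype
    dbtype=$(guess_dbtype "$fasta") || dbtype="${2:-}"   # fall back to explicit type
    if [ -z "$dbtype" ]; then
        echo "usage: build_db <fasta> [nucl|prot]" >&2
        return 1
    fi
    makeblastdb -in "$fasta" -dbtype "$dbtype" -parse_seqids -out "${fasta%.*}"
}
```

For example, build_db genome.fna creates a nucleotide database named genome, while build_db sequences.fasta prot states the type explicitly. Run it inside the <DATABASE>/ directory so the resulting files land where SequenceServer expects them.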
- Move the databases to the HPC using either sftp or the same steps as above if your HPC system has iRODS.
- Use this code within the Master instance to launch SequenceServer:
docker run --rm --name sequenceserver-scale -p 80:3000 -p 9123:9123 -e PROJECT_NAME=<PROJECT_NAME> -e WORKQUEUE_PASSWORD=<PASSWORD> -e BLAST_NUM_THREADS=<N THREADS> -e SEQSERVER_DB_PATH="/home/<U_NUMBER>/<USER>/<DATABASE_DIRECTORY>" -v /DATABASE/ON/MASTER:/DATABASE/ON/WORKER zhxu73/sequenceserver-scale:no-irods
An example is:
docker run --rm --name sequenceserver-scale -p 80:3000 -p 9123:9123 -e PROJECT_NAME=starBLAST -e WORKQUEUE_PASSWORD= -e BLAST_NUM_THREADS=2 -e SEQSERVER_DB_PATH="/home/u12/cosi/DATABASE" -v /home/u12/cosi/DATABASE:/home/u12/cosi/DATABASE zhxu73/sequenceserver-scale:no-irods
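In the example, the database path appears three times (SEQSERVER_DB_PATH and both sides of -v) and all three must agree. The sketch below builds the same command from a single path variable and prints it instead of running it, so it can be inspected first. The function name sequenceserver_cmd and its argument order are hypothetical; the image name, ports and environment variables are copied from the command above.

```shell
# Print (not run) the docker command for the Master.
# Arguments: project name, BLAST threads, database path, optional password.
# One variable feeds all three path positions so they cannot drift apart.
sequenceserver_cmd() {
    local project="$1" threads="$2" db_path="$3" password="${4:-}"
    printf 'docker run --rm --name sequenceserver-scale -p 80:3000 -p 9123:9123 '
    printf -- '-e PROJECT_NAME=%s -e WORKQUEUE_PASSWORD=%s -e BLAST_NUM_THREADS=%s ' \
        "$project" "$password" "$threads"
    printf -- '-e SEQSERVER_DB_PATH="%s" -v %s:%s ' "$db_path" "$db_path" "$db_path"
    printf 'zhxu73/sequenceserver-scale:no-irods\n'
}
```

Calling sequenceserver_cmd starBLAST 2 /home/u12/cosi/DATABASE prints the example command shown above; once it looks right, it can be pasted into the Master's shell.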
Note
The custom database folder on the Master needs to have read and write permissions.
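A quick way to confirm the permissions before launching the container is sketched below. The function name db_dir_is_usable is hypothetical, and the check only covers the user running it; the process inside the container may run as a different user, so treat a passing result as necessary rather than sufficient.

```shell
# Succeed only if the path is an existing directory that the
# current user can both read and write.
db_dir_is_usable() {
    [ -d "$1" ] && [ -r "$1" ] && [ -w "$1" ]
}
```

For example, db_dir_is_usable /home/u12/cosi/DATABASE || echo "fix permissions first" flags a folder that still needs its permissions adjusted, for instance with chmod.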
Start BLASTING! Now anyone can enter the <MASTER_IP_ADDRESS> in their browser to access SequenceServer.