StarBLAST-HPC: HPC Deployment for Large Classes (>100)¶
The StarBLAST-HPC setup distributes BLAST searches across multiple nodes on a High-Performance Computer (HPC). It uses a Master-Worker set-up similar to StarBLAST-Docker, with an Atmosphere instance as the Master and the HPC as the Worker. It is suggested that the Worker be set up ahead of time.
Some command line knowledge is required for setup.
HPC Requirements and Setup¶
The following must be available on the HPC:
- iRODS
- ncbi-blast+ version 2.9.0 or newer
- CCTools version 7.0.21 or newer
- glibc version 2.14 or newer
- Support for CentOS7
- CyVerse user account
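The minimum versions listed above can be compared with a short shell sketch. The helper names `version_ge` and `report` are hypothetical, the comparison assumes `sort -V` from GNU coreutils is available, and the version numbers are typed in by hand for illustration. On a real system they would typically be read from `blastn -version` and `ldd --version`.

```shell
#!/bin/bash
# version_ge A B: succeed when version A is greater than or equal to version B.
# Assumes `sort -V` (version sort) from GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# report NAME FOUND REQUIRED: print whether FOUND satisfies REQUIRED.
report() {
  if version_ge "$2" "$3"; then
    echo "$1 $2: OK (need >= $3)"
  else
    echo "$1 $2: too old (need >= $3)"
  fi
}

# Illustrative version numbers only; replace them with the values found on your HPC.
report ncbi-blast+ 2.10.1 2.9.0
report glibc 2.12 2.14
```

Version sort matters here because a plain string comparison would rank 2.9.0 above 2.10.1.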
1. Make both ncbi-blast+ and CCTools available in your home directory, which can be found using
cd
pwd
It should output something similar to
/home/<U_NUMBER>/<USER>/
2. Download the software (BLAST+ and CCTools; links above), un-tar it, and add it to your PATH using:
wget <BLAST_URL or CCTOOLS_URL>
tar -xvf <BLAST_repo.tar.gz or CCTOOLS_repo.tar.gz>
export PATH=$HOME</PATH/TO/BLAST/BIN/>:$PATH
export PATH=$HOME</PATH/TO/CCTOOLS/BIN/>:$PATH
Note
CCTools only works if your HPC has glibc version 2.14 or newer. In the following examples, glibc and BLAST+ are loaded through module load.
3. Download the BLAST databases into a <DATABASE>/ directory in your home folder:
/home/<U_NUMBER>/<USER>/<DATABASE>/
Note
An example set of BLAST databases can be downloaded with iRODS from /iplant/home/cosimichele/200503_Genomes_n_p.
Read how to initiate iRODS below.
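The export lines in step 2 affect only the current shell session, so they are often repeated in a login script. The sketch below shows one way to do that without adding the same directory twice. The function name `prepend_path` and the two directory names are hypothetical placeholders, not part of StarBLAST.

```shell
#!/bin/bash
# Hypothetical helper: prepend a directory to PATH unless it is already listed.
prepend_path() {
  local dir=${1%/}          # drop one trailing slash for a stable comparison
  case ":$PATH:" in
    *":$dir:"*) ;;          # already present: leave PATH unchanged
    *) PATH="$dir:$PATH" ;;
  esac
  export PATH
}

# Placeholder directories; substitute the bin/ folders created by tar -xvf.
prepend_path "$HOME/blast/bin"
prepend_path "$HOME/cctools/bin"
```

Calling the function again with the same directory leaves PATH untouched, so it is safe to source the file more than once.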
Launching Workers on the HPC¶
The HPC uses the PBS scheduler: jobs are described in a .pbs file and submitted with qsub.
1. Create a .pbs file that contains the following code, changing each <VARIABLE> to your preferred option:
#!/bin/bash
#PBS -W group_list=<GROUP_LIST>
#PBS -q windfall
#PBS -l select=<N_OF_NODES>:ncpus=<N_OF_CPUS>:mem=<N_MEMORY>gb
#PBS -l place=pack:shared
#PBS -l walltime=<MAX_TIME>
#PBS -l cput=<MAX_TIME>
module load blast
module load unsupported
module load ferng/glibc
module load singularity
export CCTOOLS_HOME=/home/<U_NUMBER>/<USER>/<CCTOOLS_DIRECTORY>
export PATH=${CCTOOLS_HOME}/bin:$PATH
cd /home/<U_NUMBER>/<USER>/<WORKERS_DIRECTORY>
MASTER_IP=<MASTER_IP>
MASTER_PORT=<PORT_NUMBER>
TIME_OUT_TIME=<TIME_OUT_TIME>
PROJECT_NAME=<PROJECT_NAME>
/home/<U_NUMBER>/<USER>/<CCTOOLS_DIRECTORY>/bin/work_queue_factory -T local -M $PROJECT_NAME --cores <N_CORES> -w <MIN_N_WORKERS> -W <MAX_N_WORKERS> -t $TIME_OUT_TIME
An example of a .pbs file running on the University of Arizona HPC:
#!/bin/bash
#PBS -W group_list=lyons-lab
#PBS -q windfall
#PBS -l select=2:ncpus=12:mem=24gb
#PBS -l place=pack:shared
#PBS -l walltime=02:00:00
#PBS -l cput=02:00:00
module load blast
module load unsupported
module load ferng/glibc
module load singularity
export CCTOOLS_HOME=/home/u12/cosi/cctools-7.0.19-x86_64-centos7
export PATH=${CCTOOLS_HOME}/bin:$PATH
cd /home/u12/cosi/cosi-workers
MASTER_IP=128.196.142.13
MASTER_PORT=9123
TIME_OUT_TIME=1800
PROJECT_NAME="starBLAST"
/home/u12/cosi/cctools-7.0.19-x86_64-centos7/bin/work_queue_factory -T local -M $PROJECT_NAME --cores 12 -w 1 -W 8 -t $TIME_OUT_TIME
In the example above, BLAST is already installed on the HPC and is loaded with module load blast. The script will submit a minimum of 1 and a maximum of 8 workers per node to the HPC nodes.
2. Submit the .pbs script with:
qsub <NAME_OF_PBS>.pbs
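As an alternative to editing the template from step 1 by hand, it can be filled in by a small shell function. This is a minimal sketch under stated assumptions: `render_pbs` and its argument order are hypothetical, the --cores value is assumed to equal ncpus (as in the University of Arizona example), and the MASTER_IP and MASTER_PORT variables are left out because the work_queue_factory line in the template does not reference them.

```shell
#!/bin/bash
# Hypothetical helper (not part of StarBLAST): print a worker .pbs script
# with the template placeholders filled in from positional arguments.
render_pbs() {
  local group=$1 nodes=$2 cpus=$3 mem_gb=$4 max_time=$5
  local cctools_home=$6 workers_dir=$7 project=$8
  local min_workers=$9 max_workers=${10} timeout=${11}
  # Unquoted heredoc: ${...} is expanded now, \$ stays literal in the output.
  cat <<EOF
#!/bin/bash
#PBS -W group_list=${group}
#PBS -q windfall
#PBS -l select=${nodes}:ncpus=${cpus}:mem=${mem_gb}gb
#PBS -l place=pack:shared
#PBS -l walltime=${max_time}
#PBS -l cput=${max_time}
module load blast
module load unsupported
module load ferng/glibc
module load singularity
export CCTOOLS_HOME=${cctools_home}
export PATH=\${CCTOOLS_HOME}/bin:\$PATH
cd ${workers_dir}
TIME_OUT_TIME=${timeout}
PROJECT_NAME="${project}"
\${CCTOOLS_HOME}/bin/work_queue_factory -T local -M \$PROJECT_NAME --cores ${cpus} -w ${min_workers} -W ${max_workers} -t \$TIME_OUT_TIME
EOF
}

# Print a script using the values from the University of Arizona example.
render_pbs lyons-lab 2 12 24 02:00:00 \
  /home/u12/cosi/cctools-7.0.19-x86_64-centos7 /home/u12/cosi/cosi-workers \
  starBLAST 1 8 1800
```

Redirecting the output to a file (for example `render_pbs ... > workers.pbs`, where the file name is arbitrary) gives a script that can be passed to qsub.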
Setting Up the Master VM on the Cloud Service¶
Set up the Master instance for StarBLAST-HPC by following the same steps as for StarBLAST-Docker, but without adding the Master deployment script. Additionally, BLAST databases need to be loaded manually into the <DATABASE>/ folder.
Once the VM is running, access it through ssh or by using the Web Shell (“Open Web Shell” button on your VM’s page). Once inside, follow the steps below.
Note
IMPORTANT: THE PATH TO THE DATABASE ON THE MASTER NEEDS TO BE THE SAME AS THE ONE ON THE WORKER.
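The requirement in the note above can be checked before launching anything. The following is a minimal sketch: the function names and the two example paths are hypothetical, and the comparison only ignores trailing slashes, it does not resolve symbolic links.

```shell
#!/bin/bash
# Hypothetical helper: remove trailing slashes so that /a/b/ and /a/b compare equal.
strip_trailing_slashes() {
  local p=$1
  while [ "$p" != "/" ] && [ "${p%/}" != "$p" ]; do
    p=${p%/}
  done
  printf '%s\n' "$p"
}

# same_db_path WORKER MASTER: succeed when both paths name the same directory string.
same_db_path() {
  [ "$(strip_trailing_slashes "$1")" = "$(strip_trailing_slashes "$2")" ]
}

# Illustrative values: the database directory on the HPC and on the Master VM.
WORKER_DB="/home/u12/cosi/DATABASE/"
MASTER_DB="/home/u12/cosi/DATABASE"
if same_db_path "$WORKER_DB" "$MASTER_DB"; then
  echo "paths match"
else
  echo "paths differ: $WORKER_DB vs $MASTER_DB"
fi
```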
1. Ensure the databases on both the Master VM and Worker HPC are in the same directory. On the Worker HPC, go to the <DATABASE>/ directory and run:
pwd
Then, on your Master VM, create a directory with the same absolute path as the output above:
mkdir -p /SAME/PATH/TO/HPC/DATABASE/DIRECTORY/
2. The <DATABASE>/ directories are now ready to hold the desired databases. You can use the same databases preset for StarBLAST-Docker or make your own from a .fasta (or .fa, .faa, .fna) file using BLAST+’s makeblastdb, as referenced in StarBLAST-VICE. Both options require iRODS (JetStream comes with iRODS pre-installed) and a CyVerse account.
Access iRODS using:
iinit
You will be prompted to connect to CyVerse with:
host name (DNS): data.cyverse.org
port #: 1247
username: <CyVerse_ID>
zone: iplant
password: <CyVerse_password>
3. Once connected, retrieve the databases and move them to your <DATABASE>/ folder (shown for the preset databases):
iget -rKVP /iplant/home/cosimichele/200503_Genomes_n_p
mv GCF_* /DATABASE/DIRECTORY/
4. Move the databases to the HPC using either sftp or, if your HPC system has iRODS, the same steps as above.
5. Use this command within the Master instance to launch SequenceServer:
docker run --rm --name sequenceserver-scale -p 80:3000 -p 9123:9123 -e PROJECT_NAME=<PROJECT_NAME> -e WORKQUEUE_PASSWORD=<PASSWORD> -e BLAST_NUM_THREADS=<N THREADS> -e SEQSERVER_DB_PATH="/home/<U_NUMBER>/<USER>/<DATABASE_DIRECTORY>" -v /DATABASE/ON/MASTER:/DATABASE/ON/WORKER zhxu73/sequenceserver-scale:no-irods
An example is:
docker run --rm --name sequenceserver-scale -p 80:3000 -p 9123:9123 -e PROJECT_NAME=starBLAST -e WORKQUEUE_PASSWORD= -e BLAST_NUM_THREADS=2 -e SEQSERVER_DB_PATH="/home/u12/cosi/DATABASE" -v /home/u12/cosi/DATABASE:/home/u12/cosi/DATABASE zhxu73/sequenceserver-scale:no-irods
Note
The custom database folder on the Master needs to have read and write permissions.
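Because the database path appears three times in the docker command (once in SEQSERVER_DB_PATH and twice in the -v mount), a mistyped copy is easy to miss. The sketch below builds the command from a single path argument. The function name `sequenceserver_cmd` and its argument order are hypothetical, and it assumes the Master and Worker paths are identical, as required above. It only prints the command and does not run docker.

```shell
#!/bin/bash
# Hypothetical helper: print the docker command for one project and one database path.
# Arguments: PROJECT_NAME DB_PATH [BLAST_NUM_THREADS] [WORKQUEUE_PASSWORD]
sequenceserver_cmd() {
  local project=$1 db_path=$2 threads=${3:-2} password=${4:-}
  printf 'docker run --rm --name sequenceserver-scale -p 80:3000 -p 9123:9123 -e PROJECT_NAME=%s -e WORKQUEUE_PASSWORD=%s -e BLAST_NUM_THREADS=%s -e SEQSERVER_DB_PATH="%s" -v %s:%s zhxu73/sequenceserver-scale:no-irods\n' \
    "$project" "$password" "$threads" "$db_path" "$db_path" "$db_path"
}

# Reproduces the example command above, with an empty password and 2 threads.
sequenceserver_cmd starBLAST /home/u12/cosi/DATABASE 2
```

After checking the printed line, it can be pasted into the Master shell.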
Start BLASTING! Now anyone can enter the <MASTER_IP_ADDRESS> in their browser to access SequenceServer.