Setup of Blythe Lab Resources on Quest p31603

To run our standard analyses, we need to build a few resources. We use TrimGalore/cutadapt to trim adapters, and we need bowtie2 (and HiSat) indices to map our reads against. This document records the steps that were taken to customize the lab's p31603 environment for these analyses.

Note

Lab users should not need to repeat any of these operations. This information is provided mainly as a record of what was done. Any future customization will be logged in this document.

Installing TrimGalore and cutadapt

TrimGalore and cutadapt run within a virtual environment that we need to set up.

I checked that virtualenv was installed by running which virtualenv. I have it in my Quest environment, but it may be from a prior custom installation, and may not be included in new Quest accounts.

If virtualenv needs to be installed:

pip install virtualenv 
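The check-then-install step above can be sketched as a small shell function. This is only an illustration, not a record of what was run; the function names have_cmd and ensure_virtualenv are hypothetical.

```shell
# Hypothetical helper: succeed if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: install virtualenv with pip only when it is missing.
ensure_virtualenv() {
    if have_cmd virtualenv; then
        echo "virtualenv found: $(command -v virtualenv)"
    else
        pip install virtualenv
    fi
}

# Usage: ensure_virtualenv
```

Wrapping the install in a check avoids reinstalling over a copy that is already present in the account.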

To install TrimGalore, I first built a python3 virtual environment within p31603 using

virtualenv -p python3 TrimGaloreEnv

Next, I changed to the new directory TrimGaloreEnv and activated the environment:

source bin/activate
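Before installing anything, it can be worth confirming that the environment really is active. The activate script exports the VIRTUAL_ENV variable, so a minimal check might look like the sketch below; the function name in_venv is hypothetical.

```shell
# Hypothetical helper: succeed only when a virtual environment is active.
# The activate script exports VIRTUAL_ENV; it is normally empty or unset otherwise.
in_venv() {
    [ -n "${VIRTUAL_ENV:-}" ]
}

if in_venv; then
    echo "active environment: $VIRTUAL_ENV"
else
    echo "no virtual environment is active"
fi
```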

I then installed TrimGalore using git:

git clone https://github.com/FelixKrueger/TrimGalore.git

I then switched to the TrimGalore directory and copied the executable trim_galore to the bin directory of the virtual environment by entering

cp trim_galore ../bin

We next need to install cutadapt and its dependencies. First, we need cython. To install it, we run

pip install cython

Next, we need setuptools_scm, which is installed with pip in the same way (pip install setuptools_scm).

We then install cutadapt:

python3 -m pip install --upgrade cutadapt

Finally, to allow for parallel processing of TrimGalore jobs, we need to install pigz (parallel gzip).

git clone https://github.com/madler/pigz.git

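We then switch to the pigz directory, build it with make, and copy the executable into the virtual environment's bin directory. The relative path below assumes that pigz was cloned from within TrimGaloreEnv (not from within the TrimGalore directory), so that ../bin is the environment's bin directory.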

cd pigz
make
cp pigz ../bin
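Once everything is copied into place, a quick way to confirm that the environment exposes all three tools is to look each one up on PATH. The sketch below is an illustration only; the function name report_tools is hypothetical.

```shell
# Hypothetical helper: print one line per tool saying where it was found.
# Returns non-zero if any tool is missing from PATH.
report_tools() {
    missing=0
    for tool in "$@"; do
        if where=$(command -v "$tool" 2>/dev/null); then
            echo "$tool: $where"
        else
            echo "$tool: MISSING"
            missing=1
        fi
    done
    return "$missing"
}

# Usage (with TrimGaloreEnv activated):
#   report_tools trim_galore cutadapt pigz
```

With the environment active, each tool should resolve to a path inside TrimGaloreEnv/bin.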

Note

TrimGalore can also use FastQC, but this is available as a Quest module (module add fastqc/0.11.5), so we do not need to install it.

Downloading the Drosophila genome

I have made a folder called "Genomes" in the p31603 directory. From within Genomes, I downloaded the fly genome using

wget --timestamping 'ftp://hgdownload.cse.ucsc.edu/goldenPath/dm6/bigZips/dm6.fa.gz'
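A truncated download is easy to miss, so it can help to test the archive before indexing it. The sketch below uses gzip -t, which checks integrity without writing any output; the function name check_gz is hypothetical, and this check was not part of the original setup.

```shell
# Hypothetical helper: succeed if the gzip file decompresses cleanly.
# gzip -t tests integrity without writing any output.
check_gz() {
    gzip -t "$1" 2>/dev/null
}

# Usage, from within Genomes:
#   check_gz dm6.fa.gz && echo "dm6.fa.gz is intact"
```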

Building a bowtie2 index

Bowtie2 indices will be put in subdirectories within p31603/Bowtie_Indices/.

To build the indices, it is a good idea to start an interactive session on b1042.

srun -A b1042 --partition=genomicsguestA -N 1 -n 24 --mem=64G --time=12:00:00 --pty bash -i

Then,

module add bowtie2
cd Bowtie_Indices
mkdir dm6
cd dm6
bowtie2-build --threads 16 ../../Genomes/dm6.fa.gz dm6
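bowtie2-build writes six index files per prefix (.1, .2, .3, .4, .rev.1 and .rev.2, each ending in .bt2, or .bt2l for large indices). A minimal sketch for confirming that a build finished is shown below; the function name index_complete is hypothetical.

```shell
# Hypothetical helper: succeed if all six bowtie2 index files exist for a
# given prefix. Small indices end in .bt2, large ones in .bt2l.
index_complete() {
    prefix=$1
    for part in 1 2 3 4 rev.1 rev.2; do
        if [ ! -e "$prefix.$part.bt2" ] && [ ! -e "$prefix.$part.bt2l" ]; then
            echo "missing: $prefix.$part.bt2" >&2
            return 1
        fi
    done
    return 0
}

# Usage, from within Bowtie_Indices/dm6:
#   index_complete dm6 && echo "dm6 index looks complete"
```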

Downloading the Khost/Larracuente Repbase and building a bowtie2 index

This database was published in Khost... Larracuente, Genome Research, 2017.

The data is in supplemental file 9.

(https://genome.cshlp.org/content/suppl/2017/04/03/gr.213512.116.DC1/Supplemental_file_S9.txt)

This was downloaded and placed in the Genomes directory on Quest as Khost_Larracuente_2017_Repbase.txt. The index was built as above, starting with an interactive session:

srun -A b1042 --partition=genomicsguestA -N 1 -n 24 --mem=64G --time=12:00:00 --pty bash -i

Then,

module add bowtie2
cd Bowtie_Indices
mkdir Dmel_Repbase
cd Dmel_Repbase
bowtie2-build --threads 16 ../../Genomes/Khost_Larracuente_2017_Repbase.txt dmRep

This goes super quick. All done.

Downloading the Clogmia albipunctata genome

I navigated to the DNAZoo website. The link under the Clogmia albipunctata genome for the current assembly (v6) seemed to be broken. I managed to find a download at:

(https://dnazoo.s3.wasabisys.com/index.html?prefix=Clogmia_albipunctata__clogmia.6/)

Date: 2/25/2023

This was manually downloaded and copied to the Genomes directory in p31603 as clogmia.6_HiC.fasta.gz.
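Because this file was downloaded by hand and then copied over, a quick sanity check is to count its FASTA records. The sketch below is an illustration only; the function name count_fasta_records is hypothetical, and the expected number of scaffolds is not stated here.

```shell
# Hypothetical helper: count FASTA records (lines starting with '>')
# in a gzip-compressed file.
count_fasta_records() {
    gzip -dc "$1" | grep -c '^>'
}

# Usage, from within Genomes:
#   count_fasta_records clogmia.6_HiC.fasta.gz
```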

Bowtie2 index for C. albipunctata

I set up an interactive session on b1042 as described above. Then:

module add bowtie2
cd Bowtie_Indices
mkdir Calb6
cd Calb6
bowtie2-build --threads 16 ../../Genomes/clogmia.6_HiC.fasta.gz calb6
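All three index builds in this document follow the same pattern: make a subdirectory, change into it, and run bowtie2-build against a file in Genomes. A minimal sketch of a helper that captures the pattern is shown below. The function name build_index and the DRY_RUN variable are hypothetical; this is not how the indices were actually built.

```shell
# Hypothetical helper: build a bowtie2 index in its own subdirectory.
#   $1 = FASTA path, relative to the new subdirectory
#   $2 = subdirectory name
#   $3 = index prefix
# With DRY_RUN=1 the bowtie2-build command is printed instead of run.
# The body runs in a subshell, so the caller's working directory is unchanged.
build_index() (
    fasta=$1; dir=$2; prefix=$3
    mkdir -p "$dir" && cd "$dir" || exit 1
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "bowtie2-build --threads 16 $fasta $prefix"
    else
        bowtie2-build --threads 16 "$fasta" "$prefix"
    fi
)

# Usage, from within Bowtie_Indices after module add bowtie2:
#   build_index ../../Genomes/clogmia.6_HiC.fasta.gz Calb6 calb6
```

Running the build inside the subdirectory keeps each set of index files separate and makes the ../../Genomes relative path resolve correctly.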