Data Analysis Sessions
The bioinformatics core hosts a data analysis session (Data-ton) every other Friday from 10:00 am to 12:00 pm. Each session begins with a brief introduction to the available pipelines, followed by analysis of your dataset.
If you have an RNA-Seq, ChIP-Seq, ATAC-Seq or smallRNA-Seq dataset whose analysis you would like to jump-start, and you would like to participate in a session, send an email to Biocore. Each session has a limited number of openings.
The idea is for you to get a jolt with hands-on sequence analysis. Datasets can be generated by you or downloaded from the SRA (http://www.ncbi.nlm.nih.gov/Traces/sra/). For the data-ton, we will assume that you know the biology; the purpose is to help you with any questions you have about the analysis.
For datasets that are not standard ChIP-Seq, RNA-Seq or smallRNA-Seq, we may not be as helpful, but we will use the experience to design new pipelines.
Prerequisites for participating in a data-ton:
1.) Register for access to the HPCC (High Performance Computing Cluster). The registration form can be found at MGHPC. Once the HPCC Admins group receives your registration form, they will send an email to your PI requesting the PI’s permission to give you access. Once the PI approves, you will receive an email from the HPCC Admins group with your HPCC account user name.
3.) After joining the Galaxy group, make sure you can access the services and run the programs and pipelines. Log in to our in-house Galaxy mirror using your UMMS login credentials.
4.) Galaxy Keys: To use the pipelines under UMass Tools in Galaxy, you need to run a script on the cluster. This time, use your HPCC user ID.
From your terminal, connect to the cluster:
Run the script below:
This is a one-time script that allows Galaxy to submit future jobs to the cluster on your behalf. Send the output of this script to Biocore. We will verify that the keys were successfully added to your cluster account.
5.) Project Space Requirements: Consult HPCC-Admins about your project space requirements. For example, 6 RNA-Seq libraries (5G to 10G each) typically require at least 500G of space to store the data and run the pipelines. Confirm that you have the necessary space for your project.
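As a rough aid for the space check in step 5, the sketch below totals the raw size of a set of libraries and compares the free space on a filesystem against a required amount. It is only an illustration: the function names and the use of your home directory as the path are assumptions, and it reports free space on the filesystem, not your quota, so HPCC-Admins remain the authority on what your project actually has.

```python
import shutil
from pathlib import Path

GIB = 1024 ** 3  # bytes per gibibyte


def raw_data_range_gb(n_libraries, min_gb=5, max_gb=10):
    """Return (low, high) total raw size in GB for n_libraries libraries."""
    return n_libraries * min_gb, n_libraries * max_gb


def has_enough_space(path, required_gb):
    """Return True if the filesystem holding `path` has at least required_gb free."""
    free_gb = shutil.disk_usage(path).free / GIB
    return free_gb >= required_gb


if __name__ == "__main__":
    low, high = raw_data_range_gb(6)
    print(f"Raw data for 6 libraries: {low}G to {high}G")
    # 500 is the figure quoted in step 5 for 6 RNA-Seq libraries.
    project_dir = Path.home()  # hypothetical: replace with your project space
    print("Enough space:", has_enough_space(project_dir, 500))
```

The 500G figure is many times the raw data size because the pipelines also write intermediate and result files, so budget for the total, not just the input.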
6.) RStudio Service: Log in to the RStudio service using your UMMS login credentials.
We recommend you begin to learn R. There is a tutorial available at tryr.codeschool.com. Learning R, or at least familiarizing yourself with it, will help you work with the analysis results more quickly.
7.) Next, you will need to know what is in your directories on the cluster. There are three directories:
- (~/galaxy directory) this will be your home directory for RStudio
- (~/galaxy/pub directory) this will be visible from http://galaxyweb.umassmed.edu/galaxy/your_cluster_user
- (~/galaxy/pub/uploaddir directory) this can be used for larger files in Galaxy
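To confirm that the three directories are in place, a short check like the one below can be run on the cluster. This is a minimal sketch: the function names are hypothetical, and it assumes the directories sit under your cluster home directory exactly as listed above.

```python
from pathlib import Path

# Directory layout described above, relative to the cluster home directory.
EXPECTED_DIRS = ("galaxy", "galaxy/pub", "galaxy/pub/uploaddir")
PUBLIC_BASE_URL = "http://galaxyweb.umassmed.edu/galaxy"


def check_galaxy_dirs(home):
    """Map each expected directory path to True if it exists under `home`."""
    home = Path(home)
    return {str(home / rel): (home / rel).is_dir() for rel in EXPECTED_DIRS}


def public_url(cluster_user):
    """Return the address where ~/galaxy/pub is visible for this user."""
    return f"{PUBLIC_BASE_URL}/{cluster_user}"


if __name__ == "__main__":
    for path, present in check_galaxy_dirs(Path.home()).items():
        print(("ok     " if present else "MISSING"), path)
    # Replace the placeholder with your HPCC user name.
    print(public_url("your_cluster_user"))
```

If any directory is reported as missing, contact Biocore before the session.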