These are in no particular order ... they are posted as they come in, with the most recent at the top

Adapters? Primers? what works with what ????
PE (Paired-End) Genomic DNA primers work for GAII and HiSeq and can be read as a single read or paired read. The side 1 paired read site is the same as the single read site. We like these; we stock them at the core lab and sell them to you at our cost.  TruSeq primers will work for the assays they indicate on the Illumina site and in the product protocol. For example, TruSeq DNA primers can be used for single or paired reads. However, TruSeq RNA primers can only be used for a single forward read.  IF YOU USE THE ILLUMINA ADAPTERS WITH THE INTERNAL BARCODES, AND YOU WANT THOSE READ, YOU MUST ORDER A MULTIPLEX READ when you submit your sample. This is an extra priming and sequencing read and will cost extra.  See the section just below this if you are interested in doing the barcoding within the main fragment read (we like that since it's cheaper and quicker and works well).

Barcodes? Indexes? How do I use them?  -- SEE BARCODE VS INDEX PAGE
Both Illumina and ABI-SOLiD instruments can read barcodes and indexes. There are 2 ways this works (usually). You can add an index or barcode to your sequence when you build. This is generally added to the 3' end of the upstream adapter so that these are the first 4-6 bases read during the sequencing run, then you sort the data by these first bases into your groups.  If you do this, keep them to 6 or fewer bases and please design them to be as clear as possible. We recommend that you design in such a way that if one base were mis-read or the first base was missed, you could still tell which barcode it is (you can also look at commercially available or previously published lists and use those). Additionally, please design (and check) a sequence that is NOT part of the sequencing primer or part of your PCR primers for obvious reasons.  The other way to do barcoding/indexing is to insert the index sequence into the attachment oligo/linker and perform an additional sequencing step to prime and read the sequence in the linker. This will cost more of course since there is an additional sequencing read. It will also take longer to collect data and do the basecalling. If you can spare the first 4-6 bases of sequence read, we recommend the first option. It's faster and cheaper. If you use the linker-based barcodes with the additional sequence read, please use the sequences provided in the support documents from ABI and Illumina (depending on which platform you are building for).  Both Illumina and ABI provide the sequences necessary to add these to your library construction.
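
For the first option (barcode in the first bases of the read), here is a minimal Python sketch of two things: checking that a barcode set can survive one mis-read base, and sorting reads into groups by their leading bases. The function names and the example barcodes are made up for illustration; this is not the core lab's pipeline. It assumes that barcodes differing at 3 or more positions can still be told apart after a single mis-read. It does not cover the "first base was missed" case, which would need an extra comparison against each barcode shifted by one base.

```python
from collections import defaultdict
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def check_barcodes(barcodes, min_distance=3):
    """Return the barcode pairs that are too similar to tolerate one mis-read."""
    return [(a, b) for a, b in combinations(barcodes, 2)
            if hamming(a, b) < min_distance]


def assign_barcode(read, barcodes, max_mismatches=1):
    """Return the one barcode matching the start of the read, else None."""
    hits = [bc for bc in barcodes
            if len(read) >= len(bc)
            and hamming(read[:len(bc)], bc) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


def demultiplex(reads, barcodes):
    """Group reads by leading barcode and strip the barcode from each read."""
    groups = defaultdict(list)
    for read in reads:
        bc = assign_barcode(read, barcodes)
        if bc is None:
            groups["unassigned"].append(read)
        else:
            groups[bc].append(read[len(bc):])
    return dict(groups)


# Hypothetical 6-base barcodes, chosen only for this example.
BARCODES = ["ACGTAC", "CATGCA", "GTACGT", "TGCATG"]
```

Run check_barcodes(BARCODES) before ordering oligos; an empty list means every pair differs at 3 or more positions. Then demultiplex(reads, BARCODES) splits a list of read strings into one group per barcode plus an "unassigned" group.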

I heard there is a special discounted smallRNA option for DeepSeq?
Yes, we are running 30 base reads on the GAII for $500 per lane (1 sample per lane).  We need 6 (or 7) samples to do this, so if you have a bunch or there are others in the queue to join in, we can do this type of analysis for you. The library must be built using the small RNA library protocol on our webpage (in the services section, Mello Lab protocol). Additionally, the linker sequences used for this are only compatible with the GAII instruments.

Why are you sending me the BioAnalyzer data for my sample? Am I billed for that?
The BioAnalyzer trace on a High-Sensitivity DNA Chip is part of the initial QC done by the core lab before your sample is put on one of the next-gen sequencers. We send it to you for your information and as part of your data set. If there is a problem with your sample which might affect its performance, we note it in the email and you have the opportunity to adjust the sample (concentrate it, further size-select, etc). A quick response (within 48 hours please) to this is important. Unless your sample was a total bust or we hear from you within 48 hours, we will keep moving forward with the analysis.  You are not billed for BioAnalyzer analysis that is part of sample QC. However, if you want to have samples run on the BioAnalyzer for any reason, the MBCL offers a very economical BioAnalyzer service for RNA and DNA samples (http://www.umassmed.edu/MBCL).  If a user submits samples "to see if they are good" in order to get BioAnalyzer info before moving forward or to assist in the selection of which samples to have sequenced and which to withdraw, we will invoice for the BioAnalyzer run(s) as these use reagents and resources. If you need to see how your library looks when finished, or better yet, during stages of construction, we strongly recommend using the BioAnalyzer service at the MBCL (see link above).

Why do I see lots of "BBBB" in my fastq quality info when the sequences map great and appear to be good reads?
This is a very good question, one which we at the core lab have been asking as well. It appears to be an "undocumented feature"!! There is some chat about it online (try this for a start, thanks David, http://news.open-bio.org/news/2010/04/illumina-q2-trim-fastq/ ). We see it most often when there are non-random bases at the ends of sequences. Non-random bases (e.g. linkers) raise flags for the analysis software. The base calls may be (and usually are) good, but they don't look like the rest of the run to the instrument. We welcome feedback about your experiences with this so we can share details with Illumina. Please continue to keep us updated about your data analysis experiences.
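
As we understand it from the post linked above, the "B" character is quality score 2 in the Illumina 1.5+ FASTQ encoding (ASCII offset 64), and the pipeline uses a run of them at the 3' end of a read as a flag on that stretch, not as a true per-base score. If you want to count or trim those stretches yourself, here is a minimal Python sketch. The function names are ours for illustration, and since the flagged bases are often fine, whether to trim is your call.

```python
def trim_trailing_b(seq, qual, flag="B"):
    """Drop the run of `flag` quality characters at the 3' end of a read."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    keep = len(qual.rstrip(flag))  # length left after removing trailing flags
    return seq[:keep], qual[:keep]


def count_flagged(records, flag="B"):
    """Count (seq, qual) records whose quality string ends in `flag`."""
    return sum(1 for _seq, qual in records if qual.endswith(flag))
```

For example, trim_trailing_b("ACGTACGT", "hhhhgBBB") keeps the first five bases and their qualities. A "B" in the middle of a read is left alone.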

Why do HiSeq run types (SR100, SR200, PE50, PE100, PE200) cost more than the runs on the GAIIs?
The reagents cost more, the flow cells cost more and overall they cost more to operate. On the other hand if you build a good library, you'll get a lot more data! As with the other instruments the number of good reads is determined by the LENGTH of the library (length is figured into how many clusters can be seeded), how tightly the library is sized (see below), and the content and quality of the library. On average, during the validation runs we are seeing 4-10X more reads per sample, some even higher.

What are the prices for the various types of runs/analyses available?
Pricing for UMass investigators can be obtained by sending an email to DeepSequencingCoreLabs@umassmed.edu and asking for a price list. Investigators at other institutions should contact us for pricing as there are different schedules set by the administration for outside users. WHY AREN'T THE PRICES ON THE WEBSITE? These web pages are viewable outside UMassMed and our agreements with vendors for discounted reagents etc (which we pass on to our users) prevent us from publishing those prices outside our group.

How long is the queue? (and all variations on this question)....
The queue (or wait) has several components. If you are asking "How long from the time I give you my sample until I get my data?" then we must include the time to perform QC on the new sample and any additional clean up required, the time spent building clusters/seeding flow cells/preparing emulsions and doing the WFA (work flow analysis for quality control), the duration of the imaging process on the instrument (varies with the length of the reads, multiplex read and any paired end read), the time to transfer the raw information to the High Performance Computing Cluster and the computational time required to resolve image information into basecalls and preliminary alignment to the reference genome. The "queue" for a short-insert, well made library is usually a couple of weeks. The queue for a library with long inserts which is poorly made or which has a wide range of insert sizes can be over a month, especially if there needs to be effort spent on getting it into a useable condition. We send you the results of the BioAnalyzer evaluation for your records and if there are concerns at this point we contact you immediately. The sooner you can respond and help with any adjustments, the quicker the sample can move to the next step. Our new LIMS (part of the summer expansion of the DSCL) will have a status bar which indicates where your sample is in this process.
How can I get my sample through the queue faster?
Build a good, tightly-sized, clean library and share information with us about any residual linkers or primer sequences, barcodes, adapters, and polyA stretches. Even the results of your test sequencing of the Topo clones can be helpful.
What about the number of samples in the queue before mine?
We operate on a first-in first-on basis. With our new instrument additions and the retasking of certain instruments for specific run types, we are not experiencing significant delays due to backlogs of samples. If a user needs to analyze an inordinate number of samples which would set back everyone else, we will work with them to create a schedule that does not negatively impact the other users. (So if you have 120 samples, email us well in advance please).

What's up with the new data pick up system and WHERE the %#@& is Isilon?
The isilon "farline" storage is attached to HPCC03 (one of the head nodes of the High Performance Computing Cluster). HPCC03 is dedicated to moving data and is the place where all your rsync, rcp, scp, copy etc should be done. You can get there from the other locations on HPCC by
$ ssh hpcc03
Your passwords etc. are still the same as in other locations on the cluster. Once you are on hpcc03 you can cd into /isilon/deepseq/{LABNAME} and copy your data to your own space in /nearline or the R drive or directly to /scratch. Remember to ssh back to hpcc01 before you need to run anything or submit jobs to the cluster.

What's up with the SOLiD? I heard ABI is replacing them with 5500's and that there is some cool new RNA analysis stuff they can do?
Our SOLiD has been upgraded to a SOLiD4, the last upgrade before being totally converted to something else! The newly renovated instrument has been renamed "The HMS Beagle" in honor of Charles Darwin's science discovery vessel (SOLiD info can be reached from the nemo home page on the SOLiD link). A user workshop on May 4th will cover the changes and present the new tools available. Briefly, the Gene Expression tools and Transcriptome Analysis tools are being set up and we are working with investigators who have projects that would benefit from initial deep sequencing of a few samples to determine which Taqman Assays or microarray tools would be best suited for a high throughput analysis of many samples. The new SOLiD tools include new internal controls for more accurate copy number calculations as well as rRNA depletions and library building kits (including paired end!). Other analysis types are still supported.  

How can I get some assistance or consultation on using SOLiD for Gene Expression or Transcriptome analysis?
Our Applications Specialist is on campus the 3rd Friday of each month and is available to meet with you and/or your lab. Please take advantage of this special service. If you'd like to meet with him, please email Ellie and get on the schedule (Ellen.Kittler@umassmed.edu).

When will my sample be run? How long will it take? Why don't I have my data yet? Can I move up in the queue? etc etc
We operate on a first in, first on basis for our investigators. However we cannot always control the SPEED at which the queue moves forward. If instruments are not performing up to specs, they are taken off-line and serviced (otherwise you get bad data). When there are issues related to infrastructure, computing issues, instrument failure, power failure, etc. and we need to reset a run, this also delays the queue. The lab staff cannot change the queue, so please don't ask them.

What can I do about getting my data as fast as possible?
Make the best library you can! The guidelines for library construction and validation are there to ensure good performance. The Topo cloning and pre-sequencing is important to determine whether the linkers, adapters, sequence priming site, and attachment sequences are in place. The sizing guidelines are extremely important (see FAQ below about sizing). If your sample covers a wide size range, it's better to make several size cuts and turn in several sub-libraries. When libraries don't sequence well, they end up getting reworked and rerun, which translates to more expense for the investigator and more samples in the queue.

In the ChIP protocol from the core lab, what do you mean by "reversing the orientation of the column" when you elute?
This refers to rotating the plate or column 180 degrees so that the area near the spindle is now at the furthest point from the center of the rotor. This does not mean to turn it upside down!

Is it important to gel purify the sample after ligation of the Illumina adapters and before amplification with the PCR primers?
Yes.  It is important to run the ligation products on a gel and collect the insert plus adapters, EVEN IF YOU HAVE TO GUESS where it is based on the size markers. If this step is skipped, there are many more primer-dimers in the library! Also, run the markers and other samples with at least one empty well in between.

Is it necessary to order the adapters and primers for the Illumina libraries from the vendor or the core lab or can I make my own?
Adapters can be home-made and in fact this is the best way to add an internal barcode or index; just be sure the phosphate is on the 5' end of the downstream adapter. The PCR Primers bring the attachment sites to the library. They also contain modifications (e.g. LNAs) to strengthen attachment during bridge amplification, and to provide the appropriate cleavage during the reverse bridging step (paired end analysis). The PCR primers should be obtained from Illumina. The core lab does buy them in bulk and will sell them to you at our cost.

Which version of the pipeline was used on my data? How do I tell? Is there documentation for the different versions of the pipeline?
The numbers after Bustard or Firecrest indicate which version was used. The documentation for these is available through the BioTools website. There is also information on the UMass Wiki. Links follow.
http://wiki.umassmed.edu/rc/wiki/index.php/IlluminaPipeline
http://biotools.umassmed.edu/BioCore/nextgen/
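
If you want to pull those version numbers out of a path in a script, here is a minimal Python sketch. It assumes the version digits follow the module name directly, as in "Bustard1.4.0"; the folder name in the usage note is hypothetical, so check it against your own run folders.

```python
import re

# Module name followed immediately by a dotted version number.
VERSION_RE = re.compile(r"(Firecrest|Bustard)(\d+(?:\.\d+)*)")


def pipeline_versions(path):
    """Return {module: version} for each Firecrest/Bustard tag in `path`."""
    return {name: version for name, version in VERSION_RE.findall(path)}
```

For a (hypothetical) path such as "Data/C1-36_Firecrest1.4.0_01-01-2010_user/Bustard1.4.0_01-01-2010_user", this reports version 1.4.0 for both modules. A path with neither name gives an empty result.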

If my run fails, do I have to pay for it? Why am I billed for a run if it wasn't as much data as I need?
You pay for a "run" not a fixed amount of data. Billing is not determined by the output, in other words you don't pay by the megabase. Billing is also not dependent upon performance. We are an at-cost not-for-profit facility. Running your sample uses reagents and supplies so you are billed for the sample-run. If there is a failure due to reagents or instrument performance we will re-run THAT sample again at no additional charge. If the failure was due to library construction issues, e.g. you used non-modified home-made primers, or there was some other problem related to the sample, you pay for any re-runs (after the problem is fixed, hopefully).

What is the difference in the Single-Read and Paired-End adapters and which is best?
Single-Read Primers and Paired-End Primers are the same price. The Paired-End Primers can be used for Single-Read analysis, but you must tell us when you submit your sample that you only want the forward read!!!! The Single-Read Primers can only be used for single reads. If you are not sure, get the PE Primers, they go both ways! If you're building small RNA libraries, check the services page for the protocol and use the primers described there.

Is there any way I can evaluate the run quality? Do you have anything I can run on my own computer to look at the data without using the High Performance Cluster (HPCC)?
We are starting a collection of things and looking for suggestions; click Resources in the left nav. There is also information at BioTools and on the Wiki.

What types of Illumina and SOLiD analyses are available right now?
At this time we have 2 Illumina GAII's, 1 ABI SOLiD 4, and 2 Illumina HiSeq 2000's (single read 100 and 200 bases and paired-end reads 100 bases). There is another instrument coming into service later in 2011 (stay tuned).  The SOLiD 4 is available for either an entire run or a mixed barcoded run. All ABI kits are supported.

Do I really need to size my library and keep it +/-25bp from the median size?
YES, it's a very good thing to do if you want the most and best sequence possible.
WHY?
Cluster formation is more even if all fragments are the same size. We load based on the largest fragment in the group, so if the size spread is wide you'll get fewer clusters/sequences. Cluster detection is also more accurate when all clusters are the same size (a wide range of fragment sizes results in a wide range of cluster sizes). Additionally, very large fragments don't stay denatured as long and will often just roll through or even interfere with the annealing of the smaller fragments.
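
If you have a list of fragment sizes for your library (for example, insert-plus-adapter lengths from your Topo clone test sequencing), a quick way to check the +/-25bp guideline is sketched below in Python. The function name and the example sizes are made up for illustration; the 25 bp tolerance is the one from the question above.

```python
from statistics import median


def size_spread(fragment_sizes, tolerance=25):
    """Return (median size, fraction of fragments outside median +/- tolerance)."""
    med = median(fragment_sizes)
    outside = [s for s in fragment_sizes if abs(s - med) > tolerance]
    return med, len(outside) / len(fragment_sizes)


# Hypothetical fragment sizes in bp, for illustration only.
sizes = [180, 190, 200, 210, 220, 300]
med, fraction_outside = size_spread(sizes)
```

Here the 300 bp fragment is the only one outside the window. A library with many fragments outside it is a candidate for another size cut.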

Why are you so picky about all these gel purifications and being so "clean" during library prep?
Because a little bit of garbage going into the library means MILLIONS of junk sequences coming out of your analysis!