DeepSeq FAQs

These FAQs are in no particular order ... they are posted as they come in, with the most recent at the top.

Adapters? Primers? What works with what?

TruSeq DNA/RNA, the old PE (Paired-End) Genomic DNA, Nextera, and the old Paired-End Multiplexing adapters work on all Illumina platforms and can be read as a single read or paired read. (The side 1 paired-read site is the same as the single-read site.) TruSeq small RNA adapters will work on all platforms, but can only be used for a single forward read. However, the OLD small RNA adapters work only for a forward read on Single Read flowcells on the HiSeq2000 or GA, and we strongly recommend you do not use them to build new libraries.
IF YOU USE THE ILLUMINA ADAPTERS WITH THE INTERNAL INDEXES, AND YOU WANT THOSE READ, YOU MUST ORDER A MULTIPLEX READ when you submit your sample. This is an extra priming and sequencing read and will cost extra. See the FAQ about indexes vs barcodes if you are interested in doing the barcoding within the main fragment read (we like that since it's cheaper and quicker and works well).

Barcodes? Indexes? How do I use them?

(See the MULTIPLEXING and BARCODING document found under Helpful Links on the main page.)
Illumina instruments can read barcodes and indexes. There are (usually) 2 ways this works. You can add an index or barcode to your sequence when you build the library. This is generally added to the 3' end of the upstream adapter (next to the insert) so that it is the first 4-6 bases read during the sequencing run; you then sort the data into your groups by these first bases. If you do this, keep barcodes to 6 or fewer bases and please design them to be as distinct as possible. We recommend that you design them in such a way that if one base were mis-read or the first base was missed, you could still tell which barcode it is (you can also look at commercially available or previously published lists and use those). Additionally, please design (and check) a sequence that is NOT part of the sequencing primer or part of your PCR primers, since a barcode that matches a primer cannot be told apart from it.
The other way to do barcoding/indexing is to insert the index sequence into the attachment oligo/linker/adapter and perform an additional sequencing step to prime and read the sequence in the linker. This will cost more since there is an additional sequencing read. It will also take longer to collect data and do the basecalling. If you can spare the first 4-6 bases of sequence read, we recommend the first option. It's faster and cheaper. If you use the linker-based barcodes with the additional sequence read, please use the sequences provided in the support documents from Illumina.
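As a rough illustration of the first (in-read) option, the sketch below checks how distinct a barcode set is and then sorts reads by their leading bases. It is only a sketch under stated assumptions: the barcode and primer sequences are made-up placeholders, the function names are hypothetical, and a one-mismatch tolerance is assumed. It does not handle the "first base was missed" case, which would need a shifted comparison as well.

```python
from itertools import combinations


def hamming(a, b):
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes):
    """Smallest Hamming distance between any two barcodes in the set.

    A minimum distance of 3 or more means a single mis-read base
    still leaves each barcode closest to itself.
    """
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def found_in_primers(barcode, primers):
    """True if the barcode occurs inside any of the given primer sequences."""
    return any(barcode in primer for primer in primers)


def assign_barcode(read, barcodes, max_mismatches=1):
    """Return (barcode, insert) for a unique match, else (None, read).

    The barcode is assumed to be the first bases of the read.
    """
    length = len(barcodes[0])
    prefix = read[:length]
    if len(prefix) < length:
        return None, read
    hits = [bc for bc in barcodes if hamming(prefix, bc) <= max_mismatches]
    if len(hits) == 1:
        return hits[0], read[length:]
    return None, read


if __name__ == "__main__":
    # Placeholder barcodes and primer, for illustration only.
    barcodes = ["ACGTAC", "CATGCA", "GTCATG", "TGACGT"]
    primers = ["GGGGCCCCAAAATTTTGGGGCCCC"]

    print("minimum pairwise distance:", min_pairwise_distance(barcodes))
    for bc in barcodes:
        print(bc, "found in a primer:", found_in_primers(bc, primers))
    print(assign_barcode("ACGTAGTTTTGGGG", barcodes))
```

Reads that match no barcode, or more than one, are returned unassigned rather than guessed at, which keeps ambiguous reads out of every group.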

Where is the Fragment Analyzer data for my sample? Am I billed for that?

The Fragment Analyzer trace is part of the initial QC done by the core lab before your sample is put on one of the next-gen sequencers. If you request it, we will send it to you for your information and as part of your data set. If there is a problem with your sample which might affect its performance, we will contact you before running and you will have the opportunity to adjust the sample (concentrate it, further size-select, etc). A quick response (within 48 hours please) to this is important to keep your sample moving through the queue. If we do not hear back from you, the Core will make a judgement call on moving forward with the analysis. You are not billed for Fragment Analyzer analysis that is part of sample QC. However, if you want to have samples run on the Fragment Analyzer for any other reason, the MBCL offers a very economical service for RNA and DNA/genomic samples (http://www.umassmed.edu/nemo/mbcl/fragment-analyzer-service/). If a user submits samples "to see if they are good" in order to get Fragment Analyzer info before moving forward or to assist in the selection of which samples to have sequenced and which to withdraw, we will invoice for the Fragment Analyzer run(s), as these use reagents and resources. If you need to see how your library looks when finished, or better yet during stages of construction, we strongly recommend using the MBCL service (see link above).

Why do I see lots of "BBBB" in my fastq quality info when the sequences map great and appear to be good reads?

This is a very good question, one which we at the core lab have been asking as well. It appears to be an "undocumented feature"!! There is some chat about it online (try this for a start, thanks David, http://news.open-bio.org/news/2010/04/illumina-q2-trim-fastq/). We see it most often when there are non-random bases at the ends of sequences. Non-random bases (e.g. linkers) raise flags for the analysis software. The base calls may be (and usually are) good, but they don't look like the rest of the run to the instrument. We welcome feedback about your experiences with this so we can share details with Illumina. Please continue to keep us updated about your data analysis experiences.
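If you want to see how much of a read is affected, or compare mapping with and without the flagged tail, something like the following may help. This is a minimal sketch, assuming the "B" characters sit in a run at the 3' end of the quality string and that "B" is the low-quality flag in your file's encoding (as in the discussion linked above); the function names are hypothetical. Since the flagged calls are often good, treat trimming as something to test rather than a required step.

```python
def fraction_flagged(qual, flag="B"):
    """Fraction of positions in a quality string that carry the flag character."""
    if not qual:
        return 0.0
    return qual.count(flag) / len(qual)


def trim_trailing_flag(seq, qual, flag="B"):
    """Remove the run of flagged positions at the 3' end of a read.

    Returns the trimmed (sequence, quality) pair. Flag characters that
    appear before the trailing run are left untouched.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    keep = len(qual.rstrip(flag))
    return seq[:keep], qual[:keep]


if __name__ == "__main__":
    # Made-up read, for illustration only.
    seq, qual = "ACGTACGT", "hhhhgBBB"
    print(fraction_flagged(qual))
    print(trim_trailing_flag(seq, qual))
```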

Why do HiSeq4000 run types cost more than the runs on the HS2000 and MiSeq?

The reagents cost more, the flow cells cost more and overall they cost more to operate. On the other hand if you build a good library, you'll get a lot more data! As with the other instruments the number of good reads is determined by the LENGTH of the library (length is figured into how many clusters can be seeded), how tightly the library is sized (see below), and the content and quality of the library.

What are the prices for the various types of runs/analyses available?

Pricing for UMass investigators can be obtained by sending an email to DeepSequencingCoreLabs@umassmed.edu and asking for a price list. Investigators at other institutions should contact us for pricing as there are different schedules set by the administration for outside users.
WHY AREN'T THE PRICES ON THE WEBSITE? These web pages are viewable outside UMassMed, and our agreements with vendors for discounted reagents etc (which we pass on to our users) prevent us from publishing those prices outside our group.

When will my sample be run? How long will it take? Why don't I have my data yet? Can I move up in the queue? etc etc

We operate on a first in, first on basis for our investigators. However we cannot always control the SPEED at which the queue moves forward. If instruments are not performing up to specs, they are taken off-line and serviced (otherwise you get bad data). When there are issues related to infrastructure, computing issues, instrument failure, power failure, etc. and we need to reset a run, this also delays the queue. The lab staff cannot change the queue, so please don't ask them.
See the Queue-Specific FAQ page for more details.

What can I do about getting my data as fast as possible?

Make the best library you can! The guidelines for library construction and validation are there to ensure good performance. The Topo cloning and pre-sequencing is important to determine whether the linkers, adapters, sequence priming site, and attachment sequences are in place. The sizing guidelines are extremely important (see FAQ below about sizing). If your sample covers a wide size range, it's better to make several size cuts and turn in several sub-libraries. When libraries don't sequence well, they end up getting reworked and rerun, which translates to more expense for the investigator and longer queue times.

In the ChIP protocol from the core lab, what do you mean by "reversing the orientation of the column" when you elute?

This refers to rotating the plate or column 180 degrees so that the area near the spindle is now at the furthest point from the center of the rotor. This does not mean to turn it upside down!

Is it necessary to order the adapters and primers for the Illumina libraries from the vendor or the core lab or can I make my own?

Adapters can be home-made and in fact this is the best way to add an internal barcode. Just be sure the phosphate is on the 5' end of the downstream adapter. Email the Core or use your Illumina account to obtain a copy of the latest adapter sequence list. If using custom adapters, remember that acceptable Tm differs by instrument, so you must consider that during the design stage. They also should not interfere with any of the primers in the standard Illumina mixes. It is recommended that you consult with the Core on custom adapter sequences.

If my run fails, do I have to pay for it? Why am I billed for a run if it wasn't as much data as I need?

You pay for a "run", not a fixed amount of data. Billing is not determined by the output; in other words, you don't pay by the megabase. Billing is also not dependent upon performance. We are an at-cost not-for-profit facility. Running your sample uses reagents and supplies, so you are billed for the sample/run. If there is a failure due to reagents or instrument performance, we will re-run THAT sample at no additional charge. If the failure was due to library construction issues, e.g. you used non-modified home-made primers, or there was some other problem related to the sample, you pay for any re-runs (after the problem is fixed, hopefully).

Is there any way I can evaluate the run quality? Do you have anything I can run on my own computer to look at the data without using the High Performance Cluster (HPCC)?

We are starting a collection of things, and looking for suggestions; click Resources in the left nav. There is also information at BioTools and on the Wiki.

Do I really need to size my library and keep it +/-25bp from the median size? WHY?

YES, it's a very good thing to do if you want the most and best sequence possible.
There are several reasons. Cluster formation is more even if all fragments are the same size. We load based on the largest fragment in the group, so if the size spread is wide you'll get fewer clusters/sequences. Cluster detection is also more accurate when all clusters are the same size (a wide range of fragment sizes results in a wide range of cluster sizes). Additionally, very large fragments don't stay denatured as long and will often just roll through or even interfere with the annealing of the smaller fragments.
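If you have a list of fragment sizes for a library (for example, sizes read off a trace or estimated from a pilot run), a quick check against the +/-25bp guideline could look like the sketch below. The function name and the example sizes are hypothetical, and the window is simply the 25 bp figure from the question above.

```python
from statistics import median


def size_spread_report(fragment_sizes, window=25):
    """Summarize how tightly fragment sizes cluster around their median.

    Returns the median, the smallest and largest size, and the fraction
    of fragments within +/- `window` bp of the median.
    """
    if not fragment_sizes:
        raise ValueError("no fragment sizes given")
    mid = median(fragment_sizes)
    inside = [s for s in fragment_sizes if abs(s - mid) <= window]
    return {
        "median": mid,
        "min": min(fragment_sizes),
        "max": max(fragment_sizes),
        "fraction_within": len(inside) / len(fragment_sizes),
    }


if __name__ == "__main__":
    # Made-up sizes in bp, for illustration only.
    sizes = [200, 210, 220, 230, 300]
    print(size_spread_report(sizes))
```

A low fraction within the window, or a maximum far above the median, suggests making several size cuts and submitting them as separate sub-libraries.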

Why are you so picky about all these gel purifications and being so "clean" during library prep?

Because a little bit of garbage going into the library means MILLIONS of junk sequences coming out of your analysis!
