Laboratories can generate dozens of samples per day and hundreds of libraries every month, so the bottleneck has shifted to processing and analyzing this ever-increasing stream of data. Sequence data processing usually involves multiple specialized programs, e.g. for read alignment, peak calling, genome or transcript assembly, and quantification.
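To illustrate why a single run touches several independent tools, the sketch below chains one placeholder function per stage. The names `align_reads`, `call_peaks`, and `quantify` are hypothetical stand-ins for external programs (an aligner, a peak caller, a quantifier), not any real tool's API:

```python
# Minimal sketch: each stage of a sequencing workflow is a separate,
# specialized step whose output feeds the next. The functions below are
# hypothetical stand-ins for external tools, not real bioinformatics APIs.

def align_reads(raw_reads):
    # Stand-in for a read aligner: pair each read with a position.
    return [(i, read) for i, read in enumerate(raw_reads)]

def call_peaks(alignments):
    # Stand-in for a peak caller: keep positions with "enough" signal.
    return [pos for pos, read in alignments if len(read) >= 4]

def quantify(peaks):
    # Stand-in for quantification: count the retained peaks.
    return len(peaks)

def run_pipeline(raw_reads):
    # An "end to end" run is just the composition of the specialized steps.
    return quantify(call_peaks(align_reads(raw_reads)))

print(run_pipeline(["ACGT", "AC", "ACGTA"]))  # 2
```

Each step is optimized in isolation, which is precisely why none of the individual tools takes raw input all the way to usable results.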
- These programs are not designed to process data “end to end,” taking raw input to usable results; instead, each is designed and optimized for a specific step in the process.
- Approaches such as Galaxy, GenePattern and GeneProf attempt to solve this problem by allowing users to build “pipelines” that string specialized programs into end-to-end processes, taking raw data to a form suitable for analysis. However, they typically handle a single sample at a time.
- These platforms also make no effort to keep experimental details (i.e. metadata). Consequently, they are not well suited to the large experiments that are now commonplace.
- Here we present Dolphin, a parallel platform designed to process raw sequence data with the specific goal of handling large datasets.
- Dolphin keeps metadata about the experimental conditions and provides an integrated processing and analysis platform.
- It allows users with limited bioinformatics experience to analyze large numbers of samples on High Performance Computing (HPC) systems through a user-friendly web interface.
- The UI supports searching, viewing metadata, and controlling pipeline (re)execution. Visualization modules display quality-control results and allow sample comparisons through various plots.
- Dolphin can back up files to cloud-based storage such as Amazon S3 for easy data sharing, and all files can be uploaded to NCBI GEO and the ENCODE project upon publication.
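Because each sample's pipeline is independent of the others, large experiments parallelize naturally. The sketch below shows that fan-out with a hypothetical `process_sample` function standing in for a full per-sample pipeline; it is not Dolphin's actual scheduler, which dispatches jobs to HPC batch systems rather than local threads:

```python
# Minimal sketch of parallel per-sample processing. process_sample is a
# hypothetical stand-in for an entire per-sample pipeline (align -> call
# peaks -> quantify); a real platform would submit each sample as an HPC
# job instead of running local threads.
from concurrent.futures import ThreadPoolExecutor

def process_sample(sample):
    # Toy "result" per sample: total bases across its reads.
    name, reads = sample
    return name, sum(len(r) for r in reads)

samples = [
    ("sample_01", ["ACGT", "ACG"]),
    ("sample_02", ["AC"]),
    ("sample_03", ["ACGTACGT"]),
]

# Samples are independent, so they can be processed concurrently.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = dict(pool.map(process_sample, samples))

print(results)  # {'sample_01': 7, 'sample_02': 2, 'sample_03': 8}
```

The same structure scales from three samples to hundreds: the platform's job is to track metadata for each sample, schedule the independent pipelines, and collect the results.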