Primary Data Pipeline

The IGB has developed an automated pipeline to manage and analyze the high-throughput sequencing data produced by the GHTF for its clients at UCI and elsewhere. The pipeline manages the data generated during runs of the facility's high-throughput sequencers and performs the primary analysis, extracting the sequences and corresponding quality scores for each sample in a given run. It uses a combination of commercial and open-source software, as well as software developed in-house in the Bioinformatics Laboratory.

The scripts in the first part of the pipeline continuously monitor the sequencing runs at the facility and the data sent to the IGB servers over the campus network during the runs. The spot intensities on the images and the base calls are computed during the run by Illumina's Real Time Analysis (RTA) software, installed on a computer located in the GHTF. The results are sent to a dedicated partition on the IGB servers. Once the sequencer completes a run, the pipeline automatically generates the FASTQ files containing the sequencing reads and their corresponding quality scores. The FASTQ files are then compressed and made available to the client for download via the web server. When the data are ready, the client receives an email with a web address and instructions for retrieving them.
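For clients who want to inspect the delivered files, the following is a minimal sketch of reading a compressed FASTQ file and converting the quality string into numeric scores. It is not the IGB pipeline's own code. The function names (`read_fastq`, `demo`), the four-line record layout, and the default quality offset of 33 are assumptions for illustration; files produced by some Illumina software versions use an offset of 64, so the offset should be checked against the actual data.

```python
import gzip
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]


def read_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with four lines per record.

    `offset` is the ASCII offset of the quality encoding (assumed 33 here;
    some Illumina versions use 64).
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        scores = [ord(char) - offset for char in quality]
        yield FastqRecord(header[1:], sequence, scores)


def demo() -> List[FastqRecord]:
    """Compress two made-up reads in memory, then read them back."""
    text = "@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!5I\n"
    compressed = gzip.compress(text.encode("ascii"))
    with gzip.open(io.BytesIO(compressed), "rt", encoding="ascii") as handle:
        return list(read_fastq(handle))


if __name__ == "__main__":
    for record in demo():
        print(record.name, record.sequence, record.qualities)
```

To read a downloaded file instead of the in-memory example, pass its path to `gzip.open(path, "rt")` and hand the resulting handle to `read_fastq`.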

 

 

Basic Pipeline Specifications

  • Secure 30 TB high-quality storage
  • Data transferred from the Illumina instrument to the IGB via the campus network (network speed is high relative to the rate at which the instrument produces data)
  • Data generated:
      • Images [~2-4 TB]
      • Base calls with intensities [~100-200 GB compressed]
      • FASTQ files containing the read sequences with quality scores [~15 GB compressed], generated by the IGB using Illumina's GERALD software

 

Client Services

  • User-friendly web interface for entering basic information (genome, experiment type, etc.)
  • Automatic email notifications when data are ready
  • Results downloadable from a secure web FTP portal
  • Storage duration (free of charge):
      • Images [2 weeks]
      • Base calls [2 months]
      • FASTQ files [1 year, downloadable]
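
The storage durations above can be expressed as a small lookup, sketched here to show how a client might work out when each data type from a run is due to expire. This is illustrative only and not the IGB's retention code. The names `RETENTION_DAYS`, `expiry_date`, and `is_expired` are hypothetical, "2 months" is approximated as 60 days and "1 year" as 365 days, and the retention period is assumed to start on the day the run completes.

```python
from datetime import date, timedelta

# Free-of-charge storage durations from the list above, in days.
# Assumption: 2 months is approximated as 60 days and 1 year as 365 days.
RETENTION_DAYS = {
    "images": 14,
    "base_calls": 60,
    "fastq": 365,
}


def expiry_date(data_type: str, run_completed: date) -> date:
    """Return the last day the given data type is kept.

    Assumption: the retention period starts when the run completes.
    """
    return run_completed + timedelta(days=RETENTION_DAYS[data_type])


def is_expired(data_type: str, run_completed: date, today: date) -> bool:
    """True once `today` is past the expiry date for this data type."""
    return today > expiry_date(data_type, run_completed)


if __name__ == "__main__":
    completed = date(2023, 1, 1)
    for data_type in RETENTION_DAYS:
        print(data_type, expiry_date(data_type, completed))
```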

 

IGB web interface for high-throughput sequencing