IQSeq: integrated isoform quantification analysis based on next-generation sequencing.

Publication Type: Journal Article
Year of Publication: 2012
Authors: Du J, Leng J, Habegger L, Sboner A, McDermott D, Gerstein M
Journal: PLoS One
Volume: 7
Issue: 1
Pagination: e29175
Date Published: 2012
ISSN: 1932-6203
Keywords: Algorithms; Animals; Annelida; Computational Biology; Computer Simulation; Embryo, Nonmammalian; Gene Expression Regulation, Developmental; High-Throughput Nucleotide Sequencing; Humans; Likelihood Functions; Models, Biological; Models, Theoretical; RNA Isoforms; Software; Systems Integration; T Cell Transcription Factor 1; Transcriptome
Abstract

With recent advances in high-throughput RNA sequencing (RNA-Seq), biologists can measure transcription with unprecedented precision. One problem that can now be tackled is isoform quantification: reconstructing the abundances of the isoforms of a gene. We have developed a statistical solution to this problem, based on analyzing a set of RNA-Seq reads, together with a practical implementation, available from archive.gersteinlab.org/proj/rnaseq/IQSeq, in a tool we call IQSeq (Isoform Quantification in next-generation Sequencing). Here, we present the theoretical results on which IQSeq is based and then use both simulated and real datasets to illustrate various applications of the tool. To measure the accuracy of an isoform-quantification result, one would estimate the average variance of the estimated isoform abundances for each gene (by resampling the RNA-Seq reads); IQSeq has a particularly fast algorithm for this calculation, based on the Fisher Information Matrix, that achieves a speedup of ~500 times over brute-force resampling. IQSeq also calculates an information-theoretic measure of overall transcriptome complexity to describe isoform abundance for a whole experiment. IQSeq has many features that are particularly useful in RNA-Seq experimental design, allowing one to optimally model the integration of different sequencing technologies in a cost-effective way. In particular, the IQSeq formalism integrates the analysis of different sample (i.e., read) sets generated by different technologies within the same statistical framework. It also supports a generalized statistical partial-sample-generation function to model the sequencing process. This allows one to have a modular, "plugin-able" read-generation function that supports the particularities of the many evolving sequencing technologies.
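The three computational ideas in the abstract (likelihood-based abundance estimation, a Fisher-information shortcut for the variance that resampling would otherwise estimate, and an information-theoretic complexity score) can be sketched on a single toy gene. This is a minimal illustration under stated assumptions, not IQSeq's implementation: the read model (a read falls uniformly along any isoform it is compatible with), the EM fitting routine, and the use of Shannon entropy as the complexity measure are our assumptions, and every function name below is hypothetical.

```python
import math

# Hypothetical read model (an assumption, not IQSeq's exact one): a read lists the
# isoforms it is compatible with and falls uniformly along a compatible isoform,
# so P(read | k) = 1 / lengths[k] if k is compatible, else 0.
def read_likelihoods(compat_sets, lengths):
    return [[1.0 / lengths[k] if k in s else 0.0 for k in range(len(lengths))]
            for s in compat_sets]


def em_abundances(p, n_iter=1000, tol=1e-12):
    """Maximum-likelihood fraction of reads from each isoform, fitted by EM on
    the mixture P(read) = sum_k theta_k * P(read | k). Assumes every read is
    compatible with at least one isoform."""
    n, K = len(p), len(p[0])
    theta = [1.0 / K] * K
    for _ in range(n_iter):
        new = [0.0] * K
        for row in p:
            total = sum(t * q for t, q in zip(theta, row))
            for k in range(K):
                new[k] += theta[k] * row[k] / total  # E-step: share of this read
        new = [x / n for x in new]                   # M-step: average the shares
        done = max(abs(a - b) for a, b in zip(new, theta)) < tol
        theta = new
        if done:
            break
    return theta


def invert(m):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    size = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(size)]
         for i, row in enumerate(m)]
    for col in range(size):
        piv = max(range(col, size), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        pv = a[col][col]
        a[col] = [x / pv for x in a[col]]
        for r in range(size):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[size:] for row in a]


def fisher_variances(p, theta):
    """Approximate Var(theta_k) as the diagonal of the inverse observed Fisher
    information, with theta_1..theta_{K-1} free and theta_K = 1 - their sum.
    Assumes theta lies in the interior of the simplex (all entries > 0)."""
    K = len(theta)
    info = [[0.0] * (K - 1) for _ in range(K - 1)]
    for row in p:
        total = sum(t * q for t, q in zip(theta, row))
        d = [(row[a] - row[K - 1]) / total for a in range(K - 1)]
        for a in range(K - 1):
            for b in range(K - 1):
                info[a][b] += d[a] * d[b]  # accumulates minus the Hessian of log L
    cov = invert(info)
    variances = [cov[a][a] for a in range(K - 1)]
    # theta_K = 1 - sum(free params), so its variance is the sum of all of cov.
    variances.append(sum(sum(r) for r in cov))
    return variances


def isoform_entropy(theta):
    """Shannon entropy (bits) of the isoform fractions: a hypothetical stand-in
    for IQSeq's information-theoretic complexity measure."""
    return -sum(t * math.log2(t) for t in theta if t > 0)


if __name__ == "__main__":
    # Toy gene with three isoforms; each tuple is one read's compatibility set.
    reads = [(0,)] * 30 + [(1,)] * 20 + [(2,)] * 10 + [(0, 1)] * 25 + [(0, 1, 2)] * 15
    p = read_likelihoods(reads, [1000, 800, 500])
    theta = em_abundances(p)
    var = fisher_variances(p, theta)
    print("fractions:", [round(t, 3) for t in theta])
    print("average variance:", sum(var) / len(var))
    print("entropy (bits):", round(isoform_entropy(theta), 3))
```

Because the variances come from one pass over the reads plus one small matrix inversion, rather than from refitting the model on many resampled read sets, the sketch shows why an information-matrix approach can be far cheaper than brute-force resampling. The ~500-fold speedup is the paper's figure and is not reproduced by this toy.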

DOI: 10.1371/journal.pone.0029175
Alternate Journal: PLoS One
PubMed ID: 22238592
PubMed Central ID: PMC3253133
Related Faculty: Andrea Sboner, Ph.D.