Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences


Soneson, Charlotte; Love, Michael I; Robinson, Mark D (2015). Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Research, 4:1521.

Abstract

High-throughput sequencing of cDNA (RNA-seq) is used extensively to characterize the transcriptome of cells. Many transcriptomic studies aim at comparing either abundance levels or the transcriptome composition between given conditions, and as a first step, the sequencing reads must be used as the basis for abundance quantification of transcriptomic features of interest, such as genes or transcripts. Various quantification approaches have been proposed, ranging from simple counting of reads that overlap given genomic regions to more complex estimation of underlying transcript abundances. In this paper, we show that gene-level abundance estimates and statistical inference offer advantages over transcript-level analyses, in terms of performance and interpretability. We also illustrate that the presence of differential isoform usage can lead to inflated false discovery rates in differential gene expression analyses on simple count matrices but that this can be addressed by incorporating offsets derived from transcript-level abundance estimates. We also show that the problem is relatively minor in several real data sets. Finally, we provide an R package (tximport) to help users integrate transcript-level abundance estimates from common quantification pipelines into count-based statistical inference engines.
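The idea of deriving gene-level offsets from transcript-level estimates can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the tximport implementation (tximport is an R package). The function name `summarize_to_gene`, its arguments, and the fallback for genes with zero abundance are all hypothetical. The sketch assumes that, for one sample, estimated counts are summed per gene and the offset is based on the abundance-weighted average length of the gene's transcripts.

```python
from collections import defaultdict


def summarize_to_gene(tx2gene, est_counts, tpm, tx_length):
    """Aggregate one sample's transcript-level estimates to gene level.

    All arguments are dicts keyed by transcript ID (hypothetical layout).
    Returns (gene_counts, gene_avg_length).
    """
    gene_counts = defaultdict(float)   # summed estimated counts per gene
    tpm_sum = defaultdict(float)       # summed abundance per gene
    weighted_len = defaultdict(float)  # sum of abundance * transcript length
    plain_len = defaultdict(float)     # sum of transcript lengths
    n_tx = defaultdict(int)            # number of transcripts per gene

    for tx, gene in tx2gene.items():
        gene_counts[gene] += est_counts[tx]
        tpm_sum[gene] += tpm[tx]
        weighted_len[gene] += tpm[tx] * tx_length[tx]
        plain_len[gene] += tx_length[tx]
        n_tx[gene] += 1

    avg_len = {}
    for gene in gene_counts:
        if tpm_sum[gene] > 0:
            # Abundance-weighted average transcript length.
            avg_len[gene] = weighted_len[gene] / tpm_sum[gene]
        else:
            # Assumption for this sketch: unweighted mean when nothing is expressed.
            avg_len[gene] = plain_len[gene] / n_tx[gene]
    return dict(gene_counts), avg_len


# One gene with a short and a long isoform; isoform usage differs between samples.
tx2gene = {"t1": "G1", "t2": "G1"}
tx_length = {"t1": 1000.0, "t2": 3000.0}

counts_a, len_a = summarize_to_gene(
    tx2gene, {"t1": 30.0, "t2": 30.0}, {"t1": 30.0, "t2": 10.0}, tx_length)
counts_b, len_b = summarize_to_gene(
    tx2gene, {"t1": 10.0, "t2": 90.0}, {"t1": 10.0, "t2": 30.0}, tx_length)
```

In this toy example the gene's total abundance is the same in both samples (40 TPM), yet its summed count differs because the second sample favors the longer isoform. The per-sample average length (1500 versus 2500) captures that shift, and its logarithm could be supplied as an offset to a count-based model so that the length change is not mistaken for a change in gene expression.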

Downloads

24 downloads since deposited on 29 Jun 2016
14 downloads since 12 months

Additional indexing

Item Type: Journal Article, refereed, original work
Communities & Collections: 07 Faculty of Science > Institute of Molecular Life Sciences
Dewey Decimal Classification: 570 Life sciences; biology
Language: English
Date: 29 February 2015
Deposited On: 29 Jun 2016 14:52
Last Modified: 20 Aug 2017 02:25
Publisher: Faculty of 1000 Ltd.
ISSN: 2046-1402
Free access at: PubMed ID. An embargo period may apply.
Publisher DOI: https://doi.org/10.12688/f1000research.7563.2
PubMed ID: 26925227

Download

Download PDF 'Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences'.
Content: Published Version
Language: English
Filetype: PDF
Size: 1MB