GenomicAlignments: counting transcripts versus counting genes
2
0
Entering edit mode
brt381 • 0
@brt381-9339
Last seen 6.6 years ago
Canada

I used Tophat to map some RNA-seq reads from four different samples against a reference genome (Arabidopsis), and then used SummarizeOverlaps to make a table of raw counts.  I tried to generate these counts for both genes and transcripts; to extract the gene annotations from my GFF file, I used the following line of code:

exbygene = exonsBy(TxDb_obj, "gene")

and for transcripts:

exbytranscript = exonsBy(TxDb_obj, "tx", use.names=TRUE)

Then I run:

se_gene = summarizeOverlaps(exbygene, files, mode="IntersectionNotEmpty", singleEnd=FALSE, ignore.strand=TRUE)

se_transcript = summarizeOverlaps(exbytranscript, files, mode="IntersectionNotEmpty", singleEnd=FALSE, ignore.strand=TRUE)

write.table(assays(se_gene), "bygene.txt")

write.table(assays(se_transcript), "bytranscript.txt")

As an example of my problem/question, in bygene.txt, I get the following counts for gene ID AT1G01040 (each number represents the count from one of my four samples)

954 723 895 614

In bytranscript.txt, I get the following counts for AT1G01040.1 and AT1G01040.2

AT1G01040.1: 38 19 25 14

AT1G01040.2: 3 2 3 6

My question is, why is there such a discrepancy?  Shouldn't the counts for the two transcripts sum to the counts for the gene?

Thanks in advance for any help!

genomicalignments • 2.2k views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 1 day ago
United States

Almost certainly not. See page 3 in the read counting vignette. Any read that overlaps two or more exons in its entirety will be ignored. If you want to do things at the transcript level, you will probably be better off using kallisto or salmon. There is a readKallisto() function in the devel version of SummarizedExperiments if you want to be all cutting edge and stuff. Or you can use sleuth or Rob Patro's fork (for salmon) to analyze your data, which I guess is cutting edge as well.
 

ADD COMMENT
1
Entering edit mode

Mike Love is also working on this package, which is relevant: https://github.com/mikelove/tximport

 

ADD REPLY
0
Entering edit mode

The findCompatibleOverlaps function in GenomicAlignments might be helpful in this context.

ADD REPLY
0
Entering edit mode
Thomas Girke ★ 1.7k
@thomas-girke-993
Last seen 8 months ago
United States

If you want to obtain valid exon-level counts for transcripts with summarizeOverlaps() then most likely you want to set inter.feature=FALSE. The default is TRUE, meaning the reads mapping to exons ranges shared among more than one transcript will be ignored in the read counts. The latter will be much more frequently the case for your exbytranscript range set than exbygene.

ADD COMMENT

Login before adding your answer.

Traffic: 568 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6