RNA-seq: edgeR's topTags table not correlating with htseq-count's CPM values
romsdahl • 0

I am analyzing RNA-seq data using a hisat2 --> htseq-count --> edgeR pipeline. To do this, I am feeding CPM values generated by htseq-count into edgeR, which produces a "topTags" table: a list of differentially expressed genes with their associated logFC, logCPM, and p-value. I have noticed that genes reported as differentially expressed in the topTags table often do not seem to agree with htseq-count's CPM values. For example, the topTags table shows a logFC of 0.73 for one gene, yet the CPM values do not look meaningfully different between conditions (control: 766, 874, 881; test: 706, 766, 1095). I am not sure what to make of this, and I run into the same problem nearly every time I find an interesting differentially expressed gene.
 

Thank you in advance for your help.
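
As a quick sanity check on the numbers quoted in the question, the sketch below computes a naive log2 fold change from the six CPM values and compares it with the reported logFC of 0.73. The helper name `naive_log2_fold_change` is made up for illustration, and the calculation is a plain ratio of group means, not edgeR's model-based estimate, so it only shows the rough size of the mismatch.

```python
import math

# CPM values quoted in the question (three replicates per condition).
control = [766, 874, 881]
test = [706, 766, 1095]

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def naive_log2_fold_change(treated, reference):
    """log2 ratio of group means (hypothetical helper, NOT edgeR's estimator)."""
    return math.log2(mean(treated) / mean(reference))

naive_logfc = naive_log2_fold_change(test, control)

# Fold change implied by the logFC that topTags reported for this gene.
reported_logfc = 0.73
reported_fold = 2 ** reported_logfc

print(f"control mean CPM       = {mean(control):.1f}")
print(f"test mean CPM          = {mean(test):.1f}")
print(f"naive log2 fold change = {naive_logfc:.3f}")
print(f"fold change for logFC {reported_logfc} = {reported_fold:.2f}")
```

The group means differ by under 2%, which corresponds to a log2 fold change close to zero, whereas a logFC of 0.73 would mean a change of roughly 1.66-fold. So the suspicion in the question is justified: the table and the CPM values really do disagree.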

edger htseqcounts rnaseq • 1.1k views
Aaron Lun ★ 28k

For starters, edgeR requires raw read counts as input, not the CPM values - have a look at Section 2.7.6 of the edgeR user's guide. Supplying CPM values will break edgeR's attempts to model the mean-variance relationship, especially at low "counts". Applying TMM normalization to CPM values won't make much sense, either, which might explain the inconsistency with the log-fold changes.
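
To illustrate why CPM values are a poor substitute for raw counts, here is a minimal sketch of how CPM is derived from a count matrix. The gene names and counts are invented for the example, and `counts_per_million` is a hypothetical helper, not part of htseq-count or edgeR. It only demonstrates the arithmetic: each count is divided by its sample's library size and scaled to one million.

```python
# Hypothetical raw counts: each gene maps to one count per sample.
# Sample 2 was sequenced four times as deeply as sample 1.
raw_counts = {
    "geneA": [10, 40],
    "geneB": [990, 3960],
}

def library_sizes(counts_by_gene):
    """Total number of counted reads in each sample (column sums)."""
    columns = zip(*counts_by_gene.values())
    return [sum(column) for column in columns]

def counts_per_million(counts_by_gene):
    """Scale each count by its sample's library size (illustrative helper)."""
    sizes = library_sizes(counts_by_gene)
    return {
        gene: [count * 1e6 / size for count, size in zip(counts, sizes)]
        for gene, counts in counts_by_gene.items()
    }

sizes = library_sizes(raw_counts)
cpm = counts_per_million(raw_counts)

print("library sizes:", sizes)
for gene, values in cpm.items():
    print(gene, values)
```

In this toy data, geneA has the same CPM in both samples even though one value rests on 10 reads and the other on 40. That difference in evidence is exactly what a count-based model needs, and it is lost once only CPM values are passed on. In edgeR itself, the raw htseq-count output would go into a `DGEList`, and `cpm()` would be used only to inspect or plot the data.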
