How does DESeq2 calculate FPKM values when gene lengths are not supplied?
@hassebossenbroek-23193

Hi,

I posted this question to Biostars previously, but realised it probably belongs here. Sorry about that.

I have a table of FPKM values generated by DESeq2, and I'm trying to find out which gene lengths DESeq2 uses when none are supplied. I want to assess how much my results are likely to change if I supply them.

According to the manual, "feature length is calculated from the rowRanges of the dds object, if a column basepairs is not present in mcols(dds). The calculated length is the number of basepairs in the union of all GRanges assigned to a given row of object, e.g., the union of all basepairs of exons of a given gene."
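
To make the manual's description concrete, here is a minimal Python sketch of "the number of basepairs in the union of all GRanges assigned to a given row". It is not DESeq2's code (DESeq2 does this in R on the rowRanges). The exon coordinates are hypothetical, and the helper names `union_bp`, `span_bp` and `fpkm` are made up for illustration. For comparison, `span_bp` gives the genomic span from the first exon start to the last exon end, which also counts intronic basepairs. Finally, the `fpkm` helper uses a plain total count as the library size, whereas DESeq2's `fpkm()` uses size-factor-based ("robust") normalization by default, so the numbers will not match DESeq2 exactly.

```python
from typing import List, Optional, Tuple

# An exon as a closed, 1-based interval (start, end), as in a GRanges object.
Interval = Tuple[int, int]


def union_bp(exons: List[Interval]) -> int:
    """Basepairs covered by the union of the exons (overlaps counted once)."""
    total = 0
    cur_start: Optional[int] = None
    cur_end: Optional[int] = None
    for start, end in sorted(exons):
        if cur_end is not None and start <= cur_end:
            # Overlaps the current merged block: extend it.
            cur_end = max(cur_end, end)
        else:
            # Disjoint from the current block: close it out and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def span_bp(exons: List[Interval]) -> int:
    """Genomic span from first exon start to last exon end (includes introns)."""
    return max(e for _, e in exons) - min(s for s, _ in exons) + 1


def fpkm(count: float, library_size: float, length_bp: int) -> float:
    """Fragments per kilobase of feature per million fragments (plain library size)."""
    return count * 1e9 / (library_size * length_bp)


# Hypothetical gene: two overlapping exons (e.g. from different isoforms) plus one more.
exons = [(1, 100), (51, 150), (301, 400)]
print(union_bp(exons))  # 250 (the plain sum of widths would be 300)
print(span_bp(exons))   # 400
print(fpkm(500, 10_000_000, union_bp(exons)))  # 200.0
```

Overlapping exons from different isoforms are counted only once in the union. The span, by contrast, can be much larger than the union for genes with long introns.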

Does that mean DESeq2 directly uses the ranges obtained with rowRanges(dds)? I'm comparing these values to those obtained with the function getGeneLengthAndGCContent from the EDASeq package. The lengths from the DESeq2 rowRanges are sometimes very close to those from EDASeq, but sometimes they differ by a factor of 10. Can someone explain what causes this discrepancy? Or am I simply looking at the wrong values?

Thank you, Best wishes, Hasse

@mikelove

I think it's fairly clear from the text in the DESeq2 manual.

By the way, if you want better estimates of gene length, I'd recommend using tximport. The sum of the exonic basepairs doesn't necessarily capture the length of the expressed transcripts very well.
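
As a rough illustration of the difference: tximport's gene-level length is, per sample, an average of the transcript lengths weighted by their estimated abundances. A union of exons, by contrast, is a single fixed number that is at least as long as the longest isoform. The Python sketch below is a simplified, hypothetical version of that weighting for one gene in one sample, not tximport's actual code. The zero-abundance fallback to an unweighted mean is an assumption of this sketch only.

```python
from typing import Sequence


def abundance_weighted_length(tpm: Sequence[float], tx_length: Sequence[float]) -> float:
    """Gene length as the abundance-weighted mean of its transcripts' lengths (one sample)."""
    if len(tpm) != len(tx_length):
        raise ValueError("tpm and tx_length must have the same length")
    total = sum(tpm)
    if total == 0:
        # Assumption for this sketch only: fall back to the unweighted mean length.
        return sum(tx_length) / len(tx_length)
    return sum(a * length for a, length in zip(tpm, tx_length)) / total


# Hypothetical gene: a short, highly expressed isoform and a long, rarely expressed one.
print(abundance_weighted_length([9.0, 1.0], [1000.0, 3000.0]))  # 1200.0
```

In this example, the union of the two isoforms' exons would be at least 3000 bp. The weighted length stays close to the isoform that is actually expressed.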
