Question

How does DESeq2 calculate FPKM values when gene lengths are not supplied?

hasse.bossenbroek • 0

@hassebossenbroek-23193

Last seen 4.8 years ago

Hi,

I posted this question to Biostars previously, but realised it probably belongs here. Sorry about that.

I have a table of FPKM values generated by DESeq2, and I'm trying to find out what DESeq2 uses as gene lengths when these are not supplied. (I want to assess how much my results are likely to change if I supply them.)

According to the manual, "feature length is calculated from the rowRanges of the dds object, if a column basepairs is not present in mcols(dds). The calculated length is the number of basepairs in the union of all GRanges assigned to a given row of object, e.g., the union of all basepairs of exons of a given gene."

Does that mean DESeq2 directly uses the ranges returned by rowRanges(dds)? I'm comparing these values to those obtained with the function getGeneLengthAndGCContent from the EDASeq package. The lengths derived from rowRanges are sometimes very close to those obtained with EDASeq, but sometimes they differ by a factor of 10. Can someone explain what causes this discrepancy? Or am I simply looking at the wrong values?

Thank you, Best wishes, Hasse

deseq2 • 2.4k views

ADD COMMENT • link written 5.1 years ago by hasse.bossenbroek • 0

Biostars post: https://www.biostars.org/p/429296/

ADD REPLY • link 5.1 years ago Kevin Blighe ★ 4.0k

score 1 · Answer 1 · 2020-03-27

Michael Love 43k

@mikelove

Last seen 1 day ago

United States

I think it's fairly clear from the text in the DESeq2 manual.

By the way, if you want better estimates of the gene length, I'd recommend using tximport. The sum of the exonic basepairs doesn't necessarily capture the length of the expressed transcripts very well.
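To make the manual text quoted in the question concrete, here is a minimal sketch of the arithmetic it describes: the length of a gene as the number of basepairs in the union of its exon ranges, and an FPKM computed from that length. This is not DESeq2's code. It is plain Python, the exon coordinates and counts are made up, the function names `union_length` and `fpkm` are hypothetical, and the library size is assumed to be a simple total of counts for the sample.

```python
def union_length(ranges):
    """Basepairs covered by the union of 1-based, inclusive (start, end) ranges."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(ranges):
        if cur_end is None or start > cur_end + 1:
            # This range does not touch the current merged block: close the block.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent range: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def fpkm(count, length_bp, library_size):
    """Fragments per kilobase of feature per million fragments in the library.

    Equivalent to count / (length_bp / 1e3) / (library_size / 1e6).
    """
    return count * 1e9 / (length_bp * library_size)


# Hypothetical gene: exons from two overlapping transcripts (made-up coordinates).
exons = [(100, 199), (150, 299), (1000, 1099)]

gene_length = union_length(exons)                     # overlap is counted once
summed_widths = sum(end - start + 1 for start, end in exons)

print(gene_length, summed_widths)                     # 300 350

# Hypothetical count and library size (assumption: plain total of counts).
print(fpkm(600, gene_length, 2_000_000))              # 1000.0
print(fpkm(600, 10 * gene_length, 2_000_000))         # 100.0
```

Because the length sits in the denominator, a length that differs by a factor of 10 between two annotation sources changes the FPKM for that gene by the same factor, while the ranking of samples within the gene is unaffected.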

ADD COMMENT • link 5.1 years ago Michael Love 43k