Question

Dispersion estimates with DESeq2


Julia ▴ 10


Essentially, due to low input RNA (an issue with the sequencer), there is a high number of 0s in the gene count matrix from featureCounts, and this gives a poor dispersion estimate (shown below, with ~60,000 genes):

[Image: dispersion estimate plot]

To deal with this, I filtered out the lowly expressed genes with `idx <- rowSums(counts(dds, normalized=TRUE) >= 5) >= 3`, which gives a slightly better fit (now using ~7,000 genes):

[Image: dispersion estimate plot]
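For reference, the filter above keeps a gene only when at least 3 samples have a normalized count of 5 or more. Here is a minimal sketch of that rule on a made-up matrix. The values and the name `norm_counts` are hypothetical; with a real `dds` the matrix would come from `counts(dds, normalized=TRUE)` after `estimateSizeFactors(dds)`, and the subset would be `dds[idx, ]`.

```r
# Toy normalized-count matrix: 4 genes (rows) x 4 samples (columns).
# All values are made up for illustration only.
norm_counts <- matrix(
  c( 0,  0,  1,  0,   # gene1: almost all zeros
     6,  7,  5,  0,   # gene2: three samples at or above 5
    10,  0,  0, 12,   # gene3: only two samples at or above 5
     5,  5,  5,  5),  # gene4: every sample exactly at the threshold
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("s", 1:4))
)

# Keep genes with a normalized count >= 5 in at least 3 samples.
idx <- rowSums(norm_counts >= 5) >= 3

# Subset the rows; with a DESeqDataSet this would be dds[idx, ].
filtered <- norm_counts[idx, , drop = FALSE]
print(rownames(filtered))
```

Only `gene2` and `gene4` pass here, which shows how strict the rule is for genes that are expressed in just a couple of samples.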

Alternatively, using HTSeq for assigning features and then removing any genes where all counts were 0 (leaving ~14,000 genes), the dispersion estimate looks like this:

[Image: dispersion estimate plot]

This looks odd to me, as the trend goes up and then down, although the fit looks very good. Essentially, I don't know whether filtering out the lowly expressed genes or using HTSeq rather than featureCounts is the better way of handling the data. The two approaches give different PCAs (outliers etc.) and different differential expression results (with some overlap).

Many thanks for any help or advice!

dispersionestimates dispersion DESeq2 estimates deseq2 • 1.1k views

Updated 2.5 years ago by Michael Love 43k • written 2.5 years ago by Julia ▴ 10

score 0 · Answer 1 · 2022-08-02


Michael Love 43k


The second plot seems fine; just set the lower limit of the y-axis to 1e-3 so you can see the data better.
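One way to do this is sketched below, assuming the advice refers to the lower bound of the dispersion plot. It uses the `ymin` argument of `plotDispEsts()`. The simulated dataset from `makeExampleDESeqDataSet()` is only a stand-in, since the real `dds` from the question is not available here.

```r
library(DESeq2)

# Simulated counts stand in for the real data (assumption: 1000 genes,
# 6 samples in two conditions; not the dataset from the question).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)

# Size factors and dispersions are all the plot needs; no testing step.
dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)

# Set the lower bound of the y-axis to 1e-3. Gene-wise estimates below
# this value are drawn as triangles at the bottom edge of the plot.
plotDispEsts(dds, ymin = 1e-3)
```

With the filtered object from the question, the last line would be the only one needed once `DESeq()` or `estimateDispersions()` has been run.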
