Question

Dispersion estimates with DESeq2

Julia ▴ 10

@856081ce

United Kingdom

Because of low input RNA (an issue with the sequencer), the gene count matrix from featureCounts contains a large number of zeros, and this gives a poor dispersion estimate (~60,000 genes). [figure: dispersion plot, featureCounts counts, ~60,000 genes]

To deal with this, I filtered out the lowly expressed genes (with idx <- rowSums( counts(dds, normalized=TRUE) >= 5 ) >= 3), which gives a slightly better fit (now using ~7,000 genes). [figure: dispersion plot after filtering, ~7,000 genes]
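For readers who want to try the filter above, here is a minimal sketch. It assumes a simulated dataset from makeExampleDESeqDataSet (1000 genes, 6 samples) as a stand-in for the real counts; the object name dds_filtered is just illustrative and not from the original post.

```r
library(DESeq2)

# Simulated stand-in for the real data (assumption: 1000 genes, 6 samples)
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)

# Normalized counts require size factors to be estimated first
dds <- estimateSizeFactors(dds)

# Keep genes with a normalized count of at least 5 in at least 3 samples
idx <- rowSums(counts(dds, normalized = TRUE) >= 5) >= 3
dds_filtered <- dds[idx, ]

# Report how many genes pass and fail the filter
table(idx)
nrow(dds_filtered)
```

The thresholds (5 counts, 3 samples) are the ones quoted in the question; a common choice for the sample threshold is the size of the smallest group, but that is a judgment call for each experiment.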

Alternatively, using HTSeq to assign reads to features and then removing any genes where all counts were 0 (leaving ~14,000 genes), the dispersion estimate looks like this: [figure: dispersion plot, HTSeq counts, ~14,000 genes]

This looks odd to me because the trend goes up and then down, although the fit looks very good. Essentially, I don't know whether filtering out the lowly expressed genes or using HTSeq rather than featureCounts is the better way of handling the data. The two approaches give different PCAs (outliers etc.) and different differential expression results (with some overlap).

Many thanks for any help or advice!

dispersionestimates dispersion DESeq2 estimates deseq2 • 885 views

updated 20 months ago by Michael Love • written 21 months ago by Julia

score 0 · Answer 1 · 2022-08-02

Michael Love 41k

@mikelove

United States

The second plot seems fine; just change the y-axis to 1e-3 so you can see the data better.
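One way to do this, sketched here under the assumption that the plot was made with DESeq2's plotDispEsts and that "1e-3" refers to the lower limit of the y-axis, is to pass the ymin argument. The simulated dataset is again only a stand-in for the real data.

```r
library(DESeq2)

# Simulated stand-in for the real data (assumption: 1000 genes, 6 samples)
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)

# Dispersion estimates must exist before they can be plotted
dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)

# ymin is the lower bound for plotted points; estimates below it
# are drawn as triangles along the bottom of the plot
plotDispEsts(dds, ymin = 1e-3)
```

With the default lower bound, a few genes with very small dispersion estimates can stretch the axis and compress the bulk of the points, which makes the fit harder to judge by eye.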

written 20 months ago by Michael Love