So I'm doing an RNA-seq experiment on a large dataset (300+ samples). I used DESeq2 like I usually do, and the dispersion plots look VERY different from what I've seen in the past and from what is shown in the DESeq2 manual.
I'm not sure if this is due to the size of the dataset (hundreds rather than a dozen samples), or if something else is going on. But the dispersion plots look really odd and I'm not sure the modeling of the dispersion is functioning appropriately.
The plots also look odd in different ways depending on which terms I include in the model.
Has anyone else ever seen anything like this before?
My first recommendation would be to pre-filter, removing genes that don't have a count of at least 10 in x or more samples (where for 300 samples, you might consider x=10 or some other reasonable number that still allows detection of DE for one group compared to the others). Usually this isn't necessary, but it seems like there are many such low-count genes here and they are actually affecting the local fit, at least on the left side of the plot, so we want to remove those first.
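To make the filtering rule concrete, here is a minimal sketch of the logic in plain Python on a toy count table. The function name `prefilter`, the toy gene names, and the thresholds are all made up for illustration; in an actual DESeq2 workflow the filter would be applied in R to the rows of the count matrix before running the analysis.

```python
def prefilter(counts, min_count=10, min_samples=10):
    """Keep genes with a count >= min_count in at least min_samples samples.

    counts: dict mapping gene name -> list of raw counts, one per sample.
    (Hypothetical helper for illustration, not part of DESeq2.)
    """
    kept = {}
    for gene, row in counts.items():
        # Number of samples in which this gene reaches the count threshold
        n_pass = sum(1 for c in row if c >= min_count)
        if n_pass >= min_samples:
            kept[gene] = row
    return kept


# Toy data: 12 samples, two groups of 6 (all values are invented)
toy = {
    "geneA": [0, 1, 0, 2, 0, 0, 3, 0, 1, 0, 0, 0],               # low everywhere
    "geneB": [50, 62, 48, 55, 70, 41, 66, 58, 49, 53, 61, 44],   # expressed everywhere
    "geneC": [0, 0, 0, 0, 0, 0, 15, 22, 30, 18, 25, 40],         # expressed in one group only
}

# With x set to the smallest group size, geneC survives the filter
filtered = prefilter(toy, min_count=10, min_samples=6)
print(sorted(filtered))
```

Setting x no larger than the smallest group size is what keeps a gene like `geneC`, which is only expressed in one group, available for DE testing; raising x to 10 in this toy example would drop it.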
Then, I would also want to look into the genes where you have very high dispersion estimates despite high mean value. You can do: