Bizarre Dispersion Plots
@stephenhartley-9155

So I'm doing an RNA-Seq experiment on a large dataset (300+ samples). I used DESeq2 like I usually do, and the dispersion plots look VERY different from what I've seen in the past and from what is shown in the DESeq2 manual.

I'm not sure whether this is due to the size of the dataset (hundreds of samples rather than a dozen) or whether something else is going on. But the dispersion plots look really odd, and I'm not sure the dispersion modeling is working properly.

The plots also look weird in different ways depending on which terms I include in the model.

Has anyone else ever seen anything like this before?

[Images: plot 1, plot 2, plot 3]


I believe I have found and eliminated the problem.

Due to a miscommunication/misunderstanding, the person running the script put both the tumors AND the normals in one model. I assume the tumor expression is so wildly different in every way from the normal tissue expression that it massively violated the key assumptions of the model. Hence the weirdness.

@mikelove

My first recommendation would be to pre-filter, removing genes that don't have a count of at least 10 in x or more samples (where for 300 samples, you might consider x=10 or some reasonable number that still allows detection of DE for one group compared to others). Usually this isn't necessary, but it seems like there are many such genes here and they are actually affecting the local fit, at least on the left side, so we want to remove those first.
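As a rough sketch of the pre-filter described above, here it is applied to a plain count matrix of simulated toy data. The names `cts`, `min_count`, and `x` are made up for illustration and are not from the thread. With a DESeqDataSet the same logic would presumably use `counts(dds)` in place of `cts` and subset with `dds[keep, ]`.

```r
# Toy count matrix: 1000 genes x 300 samples (simulated, illustration only)
set.seed(1)
n_genes <- 1000
n_samples <- 300
# First half of the genes are near-zero, second half are expressed
mu <- rep(c(0.05, 50), each = n_genes / 2)
cts <- matrix(rnbinom(n_genes * n_samples, mu = mu, size = 1),
              nrow = n_genes, ncol = n_samples)

min_count <- 10  # a sample "counts" for a gene if it has at least this many reads
x <- 10          # minimum number of such samples required to keep the gene

# Keep genes with a count of at least min_count in x or more samples
keep <- rowSums(cts >= min_count) >= x
cts_filtered <- cts[keep, , drop = FALSE]

dim(cts_filtered)
```

Choosing x near the size of the smallest group of interest keeps genes that are expressed in only that group.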

Then, I would also want to look into the genes where you have very high dispersion estimates despite high mean value. You can do:

with(mcols(dds), 
  head(which(baseMean > 1000 & dispersion > 10))
)

I would look at these genes with plotCounts to see why they have such a high dispersion value.
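A minimal sketch of that inspection step, assuming DESeq2 is installed. It uses simulated data from `makeExampleDESeqDataSet` as a stand-in for the real dataset, and the grouping variable `"condition"` comes from that simulated object, so substitute your own `dds` and design variable.

```r
# Assumes the Bioconductor package DESeq2 is installed.
library(DESeq2)

# Simulated stand-in for the real dataset (illustration only)
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)
dds <- DESeq(dds, quiet = TRUE)

# Thresholds taken from the post above; adjust to your own data
gene_info <- mcols(dds)
hits <- which(gene_info$baseMean > 1000 & gene_info$dispersion > 10)

# On well-behaved simulated data `hits` is likely empty,
# in which case this loop plots nothing.
for (i in head(hits, 3)) {
  plotCounts(dds, gene = rownames(dds)[i], intgroup = "condition")
}
```

On real data, a few samples with extreme counts or two clearly separated clusters of samples in these plots would point to outliers or an unmodeled grouping.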


I actually already did the >10 counts filter just as a matter of course.

But yeah. I'm going to take a closer look at these genes and what is going on.

But just a sanity check: This looks weird, right? I'm not just jumping at shadows?


Hmm, you still have a lot of genes plotted where the mean count is around 1/100. You may want to just see what's going on there, as 10 samples with a count of 10 each would give a mean count of at least ~1/3 across 300 samples. Maybe consider raising x.

Yes, these dispersion plots don't look usable, you'd want to find out why the dispersion is so high for this bulk of genes.
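The ~1/3 figure can be checked with a quick calculation. The variable names are illustrative, and since the dispersion plot's x-axis is the mean of normalized counts, the bound is only approximate there.

```r
n_samples <- 300
x <- 10          # samples required to pass the filter
min_count <- 10  # count required in each of those samples

# Smallest possible mean count for a gene that passes the filter:
# x samples at exactly min_count, all other samples at zero
min_mean <- x * min_count / n_samples
min_mean
```

Genes sitting near a mean of 1/100 are therefore well below what the filter should let through, which suggests the filter was not applied.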


You're right, I forgot that I had dropped that filter while I was looking at all this strangeness.
