Question

Bizarre Dispersion Plots

stephen.hartley • 0

@stephenhartley-9155

Last seen 4.0 years ago

United States

So I'm doing an RNA-Seq experiment on a large (300+ sample) dataset. I used DESeq2 like I usually do, and the dispersion plots look VERY different from what I've seen in the past and from what is shown in the DESeq2 manual.

I'm not sure if this is due to the size of the dataset (hundreds rather than a dozen samples), or if something else is going on. But the dispersion plots look really odd and I'm not sure the modeling of the dispersion is functioning appropriately.

It also looks a different kind of weird depending on what terms I include in the model.

Has anyone else ever seen anything like this before?

[Figures: plot 1, plot 2, plot 3 (dispersion plots)]

deseq2 RNA RNASeq • 2.1k views

ADD COMMENT • link updated 5.7 years ago by Michael Love 43k • written 5.7 years ago by stephen.hartley • 0

I believe I have found and eliminated the problem.

Due to a miscommunication/misunderstanding, the person running the script put both the tumors AND the normals in one model. I assume that, because the tumor expression is so wildly different in every way from the normal tissue expression, it massively violated the key assumptions of the model. Hence the weirdness.

ADD REPLY • link 5.7 years ago stephen.hartley • 0

score 0 · Answer 1 · 2020-03-02

Michael Love 43k

@mikelove

Last seen 4 days ago

United States

My first recommendation would be to pre-filter genes that don't have a count of at least 10 in x or more samples (where, for 300 samples, you might consider x=10 or some other reasonable number that still allows detection of DE for one group compared to others). Usually this isn't necessary, but it seems like there are many such genes here and they are actually affecting the local fit, at least on the left side, so we want to remove those first.
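As a rough illustration of that pre-filter, here is a minimal base R sketch on a toy count matrix. The matrix, the gene names, and the variable names (`counts`, `min_count`, `keep`, `filtered`) are made up for the example and are not from the original analysis; x=10 and the count threshold of 10 are the values suggested above.

```r
# Toy count matrix standing in for the real data: 4 genes x 300 samples.
# All names and values here are illustrative assumptions.
n_samples <- 300
counts <- rbind(
  geneA = rep(0L, n_samples),                        # never expressed
  geneB = c(rep(10L, 5), rep(0L, n_samples - 5)),    # count of 10 in only 5 samples
  geneC = c(rep(10L, 10), rep(0L, n_samples - 10)),  # count of 10 in exactly 10 samples
  geneD = rep(50L, n_samples)                        # expressed in every sample
)

min_count <- 10  # per-sample count threshold
x <- 10          # minimum number of samples that must reach the threshold

# Keep genes that have a count of at least min_count in at least x samples
keep <- rowSums(counts >= min_count) >= x
filtered <- counts[keep, , drop = FALSE]

print(rownames(filtered))
```

On a DESeqDataSet the same idea would usually be written as `keep <- rowSums(counts(dds) >= 10) >= x` followed by `dds <- dds[keep, ]`, applied before running `DESeq()`.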

Then, I would also want to look into the genes where you have very high dispersion estimates despite high mean value. You can do:

with(mcols(dds), 
  head(which(baseMean > 1000 & dispersion > 10))
)

I would look at these genes with plotCounts to see why they have such a high dispersion value.

ADD COMMENT • link 5.7 years ago Michael Love 43k

I actually already did the >10 counts filter just as a matter of course.

But yeah. I'm going to take a closer look at these genes and what is going on.

But just a sanity check: This looks weird, right? I'm not just jumping at shadows?

ADD REPLY • link 5.7 years ago stephen.hartley • 0

Hmm, you still have a lot of genes plotted where the mean count is 1/100. You may want to just see what's going on there, as 10 samples with a count of 10 would give a mean count of at least ~1/3 across 300 samples. Maybe consider raising x.
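The arithmetic behind that lower bound can be checked in a couple of lines of R. This is only a sketch on raw counts (the mean shown in the dispersion plot is computed from normalized counts, so the real floor will differ somewhat), and the variable names are illustrative.

```r
n_samples <- 300  # samples in the dataset
x <- 10           # samples required to reach the threshold
min_count <- 10   # count threshold per sample

# Smallest possible mean raw count for a gene that passes the filter:
# x samples at exactly min_count, every other sample at zero.
min_mean <- (x * min_count) / n_samples
print(min_mean)

# A gene with a mean count near 1/100 is well below that floor,
# so it should not survive the filter.
print(min_mean > 1 / 100)
```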

Yes, these dispersion plots don't look usable; you'd want to find out why the dispersion is so high for this bulk of genes.

ADD REPLY • link 5.7 years ago Michael Love 43k

You're right, I forgot that I had dropped that filter while I was looking at all this strangeness.

ADD REPLY • link 5.7 years ago stephen.hartley • 0
