Question

Unusual skewed MA plot at lower expression

1

Entering edit mode

adam.cribbs ▴ 10

@adamcribbs-11651

Last seen 5.5 years ago

Hello,

I have RNA seq data (paired end) isolated from exosomes from healthy and disease and have performed diff express using DESeq2. However, I have unusual skweing in my MA plot. I have not come accorss this before in any of my other experiments (or in the microvesicles from the same donors). There are 2984 DE genes. I was wondering whether this could be a consequence of uneven amounts of mRNA per vesicle? If so, what would be the best way to handle data of this type?

I noticed that someone else had an issue similar to this: DESeq2 MA plot of encode data

deseq2 normalization • 1.6k views

ADD COMMENT • link 7.5 years ago adam.cribbs ▴ 10

0

Entering edit mode

I would recommend re-running DESeq2 without the beta prior and then re-generating the MA plot in order to get a better picture of the skew. It's likely that the upward skew goes all the way to the left of the plot in actuality, but the left edge has been squeezed out by the beta prior.

ADD REPLY • link 7.5 years ago Ryan C. Thompson ★ 7.9k

0

Entering edit mode

Those triangles indicate simply that the LFC goes off the plot as set by ylim, it's not the beta prior that squeezed them down.

That said, it would be interesting to at least plot the MLE fold changes in addition, which can be done simply with:

res <- results(dds, addMLE=TRUE)
plotMA(res, MLE=TRUE)

ADD REPLY • link 7.5 years ago Michael Love 41k

0

Entering edit mode

Oh, I wasn't referring to the triangles, I was referring to the left side of the plot, where it seems that the skew goes back down to zero. I think the apparent hump shape of the skew is probably a product of the beta prior rather than a property of the data.

ADD REPLY • link 7.5 years ago Ryan C. Thompson ★ 7.9k

1

Entering edit mode

Oh gotcha, yeah those are going to be positive count in the non-reference group and ~0 in the reference group, leading to very large or Inf LFC.

Which is why I'm suspecting there could be confounding of the library sizes with the condition, which is problematic for doing any kind of inference on the low count genes.

ADD REPLY • link 7.5 years ago Michael Love 41k

0

Entering edit mode

Thanks for the comments. I re-analysed the data without the beta prior and have the following MA plot:

The summary stats for read depth are as follows:
Disease:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  18.16   18.98   21.58   21.32   23.47   24.37
Healthy: 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  20.84   21.56   22.82   23.70   25.20   28.08

ADD REPLY • link 7.5 years ago adam.cribbs ▴ 10

0

Entering edit mode

So it looks like the sequencing depth is not confounded.

I don't know what it is about the many genes that have 4x, 16x, 64x counts in disease vs healthy. You can pick them out and do something like GO analysis, or simply look at some example genes using plotCounts() and then figuring out what these genes may have in common.

To test for genes with large fold change, you can do, e.g.:

res <- results(dds, lfcThreshold=4)

ADD REPLY • link 7.5 years ago Michael Love 41k

score 0 · Answer 1 · 2016-10-11

0

Entering edit mode

Michael Love 41k

@mikelove

Last seen 3 hours ago

United States

Can you summarize the sequencing depth across healthy and disease?

sapply(split(colSums(counts(dds))/1e6, condition), summary)

ADD COMMENT • link 7.5 years ago Michael Love 41k