Question

What problem does the MA plot diagnose? And how do you solve it?

ysdel ▴ 40

I have an RNA-seq experiment that I am analyzing with DESeq2. After getting the results, I draw the MA plot. This is the output of plotMA:

And this is my attempt at the MA plot:

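    library(ggplot2)

    # Work on a plain data.frame copy of the DESeq2 results
    df <- as.data.frame(res)
    # Drop genes with zero mean count, whose log2(baseMean) would be -Inf
    df <- df[df$baseMean > 0, ]
    # Genes with NA padj (e.g. filtered out by DESeq2) count as not significant
    df$significant <- factor(!is.na(df$padj) & df$padj < 0.05, levels = c(FALSE, TRUE))
    ggplot(df, aes(x = log2(baseMean), y = log2FoldChange, color = significant)) +
        geom_point() +
        geom_hline(color = "blue3", yintercept = 0) +
        stat_smooth(se = FALSE, method = "loess", color = "red3") +
        scale_color_manual(values = c("black", "red"))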

There is a slight bias at the high end: genes with a high A tend to have a high M, and we are detecting more up-regulation than down-regulation. Is this a problem? What might be causing it and, more importantly, is there something we can do to fix it?
Even if this effect is too small to be a problem, what causes problems like this? Imbalanced sampling depth between the two conditions? Why doesn't normalization (per-sample size factors) fix this?

Also, is there a reason why DESeq2::plotMA doesn't plot the best fit line?

Disclaimer: Cross posted to BioStars

deseq2 deseq rnaseq • 7.8k views

Updated 8.0 years ago by Ryan C. Thompson • written 8.0 years ago by ysdel

score 2 · Accepted Answer · 2017-02-28

Depending on the biological effect you're studying, it might make perfect sense for more genes to be upregulated than downregulated. And if expression level is an indicator of a gene's importance to the tissue, then it might also make sense for many of the regulated genes to be highly expressed. If both of these assumptions are at least somewhat true, then your MA plot is exactly what you'd expect to see. Without knowing more about the experiment, I wouldn't say that your MA plot looks out of the ordinary. In other words, the non-zero correlation between M and A could very well be a real biological effect, in which case you would not want to normalize it away.

Also consider what would happen if you "centered" the MA plot by subtracting from each gene's M value the value of the loess curve at that gene's A value. You would still have many more up-regulated than down-regulated genes, so the imbalance cannot be explained purely by imperfect normalization.
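To make the "centering" argument concrete, here is a minimal sketch on simulated data. Everything in it is made up for illustration (the values of `A` and `M`, the numbers of up- and down-regulated genes, and the cutoff of 1 on the log2 scale); nothing comes from the experiment in the question. It fits a loess curve to M as a function of A, subtracts it, and counts large fold changes before and after.

```r
set.seed(1)

# Hypothetical data: 2000 genes, most unchanged, with an excess of
# up-regulated genes among the highly expressed ones (A > 10).
n <- 2000
A <- runif(n, min = 0, max = 15)   # assumed log2 mean expression
M <- rnorm(n, sd = 0.25)           # assumed log2 fold changes (null genes)
high <- which(A > 10)
up <- sample(high, 150)
down <- sample(setdiff(high, up), 30)
M[up] <- M[up] + 2
M[down] <- M[down] - 2

# "Center" the MA plot: subtract the loess trend of M on A from each gene.
fit <- loess(M ~ A)
M_centered <- M - predict(fit, newdata = data.frame(A = A))

# Count genes beyond an arbitrary cutoff of |M| > 1, before and after.
counts_before <- c(up = sum(M > 1), down = sum(M < -1))
counts_after <- c(up = sum(M_centered > 1), down = sum(M_centered < -1))
print(rbind(before = counts_before, after = counts_after))
```

In this toy setup the loess curve bends upward at high A, much like the red line in the question, yet removing it leaves the up/down imbalance largely intact. With real data you would replace `A` and `M` with `log2(baseMean)` and `log2FoldChange` from your own results.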

As for why size factor normalization doesn't remove this effect: normalizing all genes by the same size factors is equivalent to shifting the entire MA plot up or down by a constant amount. Such a normalization cannot change the shape of the point cloud, so it cannot remove a curve like this from the plot. If you are really convinced that this is a technical effect that you want to eliminate, you could certainly do so using a more heavy-handed method, such as quantile normalization.

DESeq2 doesn't plot a "best fit" line because, just as there is no one normalization that works for every case, there is no method for generating a "best fit line" that is the best fit for every situation. I believe DESeq2 doesn't actually fit any kind of line to the data; it just plots the line y=0, probably as a visual reminder that the log fold changes were shrunk toward this value.
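The point about size factors can be checked numerically. The sketch below uses simulated counts for one control and one treated sample, and the helper `ma_values` and the size factor values are hypothetical; they are not computed by DESeq2. It compares M and A under two different choices of size factors.

```r
set.seed(2)

# Hypothetical raw counts: one control and one treated sample.
# A pseudocount of 1 avoids taking log2(0).
n_genes <- 1000
mu <- rexp(n_genes, rate = 1 / 200)
counts <- cbind(control = rpois(n_genes, mu) + 1,
                treated = rpois(n_genes, 2 * mu) + 1)

# Hypothetical helper: divide each sample (column) by its size factor,
# then compute A (mean log2 expression) and M (log2 fold change) per gene.
ma_values <- function(counts, size_factors) {
  norm <- sweep(counts, 2, size_factors, "/")
  data.frame(A = rowMeans(log2(norm)),
             M = log2(norm[, "treated"]) - log2(norm[, "control"]))
}

ma_raw <- ma_values(counts, c(control = 1, treated = 1))
ma_norm <- ma_values(counts, c(control = 1, treated = 2))

# Changing the size factors moves every gene by the same amount.
shift_M <- ma_norm$M - ma_raw$M
shift_A <- ma_norm$A - ma_raw$A
print(range(shift_M))
print(range(shift_A))
```

Every gene's M moves by the same constant (minus the log2 ratio of the two size factors), and with A defined as the mean of the log2 normalized counts, A also moves by a constant. The cloud of points is translated rigidly, so any curvature in the M versus A trend survives the normalization.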