I have an RNA-seq experiment and I am analyzing it with DESeq2. After getting the results, I made an MA plot. This is the output of plotMA:
[plotMA output image not included]
And this is my attempt at the MA plot:
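```r
library(DESeq2)
library(ggplot2)

# res: the DESeqResults object returned by results(dds)
df <- as.data.frame(res)
# Genes with NA padj (e.g. removed by independent filtering) count as not significant
df$significant <- factor(!is.na(df$padj) & df$padj < 0.05, levels = c(FALSE, TRUE))
# Genes with baseMean == 0 would give log2(baseMean) = -Inf
df <- df[df$baseMean > 0, ]

ggplot(df, aes(x = log2(baseMean), y = log2FoldChange)) +
  geom_point(aes(color = significant)) +
  geom_hline(color = "blue3", yintercept = 0) +
  stat_smooth(se = FALSE, method = "loess", color = "red3") +
  scale_color_manual(values = c("FALSE" = "black", "TRUE" = "red"))
```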
- There is a slight bias at the high end: genes with a high A tend to have a high M, so we are detecting more up-regulation than down-regulation. Is this a problem? What might be causing it, and, more importantly, is there something we can do to fix it?
- Even if this effect is too small to be a problem, what causes problems like this? Imbalanced sequencing depth between the two conditions? Why doesn't normalization (size factors) fix it?
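Both questions can be probed with a small base-R sketch. `count_direction()` is a hypothetical helper (not part of DESeq2) that tallies significant up- and down-regulated genes overall and among the top 10% of genes by baseMean, assuming a data.frame with DESeq2-style columns `baseMean`, `log2FoldChange` and `padj`. `median_of_ratios()` is a simplified re-implementation of the median-of-ratios idea behind DESeq2's default size factors. It illustrates why size factors alone cannot remove an A-dependent trend: each sample gets a single multiplier, which shifts every gene's log fold change by (roughly) the same constant rather than bending a curve.

```r
# Hypothetical helper (not part of DESeq2): count significant up/down genes
# overall and among the top `top_frac` of genes by baseMean.
count_direction <- function(df, alpha = 0.05, top_frac = 0.1) {
  sig <- !is.na(df$padj) & df$padj < alpha & !is.na(df$log2FoldChange)
  up <- sig & df$log2FoldChange > 0
  down <- sig & df$log2FoldChange < 0
  high <- df$baseMean >= quantile(df$baseMean, 1 - top_frac)
  data.frame(
    subset = c("all", "high_mean"),
    up = c(sum(up), sum(up & high)),
    down = c(sum(down), sum(down & high)),
    median_lfc = c(median(df$log2FoldChange, na.rm = TRUE),
                   median(df$log2FoldChange[high], na.rm = TRUE))
  )
}

# Simplified median-of-ratios size factors (the idea behind DESeq2's default
# estimateSizeFactors()), written in base R for illustration only.
median_of_ratios <- function(counts) {
  log_counts <- log(counts)
  log_geo_means <- rowMeans(log_counts)  # -Inf for genes with any zero count
  keep <- is.finite(log_geo_means)
  # One scalar per sample: median log ratio to the per-gene geometric mean
  apply(log_counts[keep, , drop = FALSE], 2,
        function(col) exp(median(col - log_geo_means[keep])))
}

# Toy data: samples 3 and 4 sequenced about twice as deeply as samples 1 and 2
set.seed(1)
base <- matrix(rpois(4000, lambda = 50) + 1, ncol = 4)
counts <- sweep(base, 2, c(1, 1, 2, 2), `*`)
sf <- median_of_ratios(counts)
round(sf / sf[1], 2)  # close to 1, 1, 2, 2
```

On real data, `count_direction(as.data.frame(res))` would give the tallies; a surplus of up-regulated genes at high A is not by itself evidence of an artifact.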
Also, is there a reason why DESeq2::plotMA doesn't draw a best-fit line?
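Whatever the reason for the default, a trend line can be overlaid by hand. This is a sketch assuming plotMA draws with base R graphics on a log-scaled x axis (mean of normalized counts), as it does in the DESeq2 versions I am aware of; the simulated `makeExampleDESeqDataSet()` data stand in for your own `dds`/`res`, and `lowess()` is my choice of smoother, not something plotMA uses.

```r
library(DESeq2)

# Simulated example data; replace with your own dds/res
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)
dds <- DESeq(dds)
res <- results(dds)

plotMA(res)

# Genes with a zero mean cannot be placed on the log-scaled x axis
ok <- res$baseMean > 0 & !is.na(res$log2FoldChange)
# lowess trend of LFC against log10(mean), drawn back on the original scale
fit <- lowess(log10(res$baseMean[ok]), res$log2FoldChange[ok])
lines(10^fit$x, fit$y, col = "red3", lwd = 2)
```

The trend is fitted on the log of the mean so that the smoother is not dominated by the few very highly expressed genes.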
Disclaimer: Cross-posted to BioStars
Thank you very much for explaining this! You're right: there are more up-regulated genes even after considering the loess line, and there is a plausible biological reason for this. I'm still curious, though: the MA plot is supposed to detect certain artifacts. When do those artifacts arise?
"the MA plot is supposed to detect some artifacts"
I agree with Ryan. It may be counterproductive to consider the MA plot *only* as a tool for diagnosing problems.
And I wouldn't go down the path of trying to force the center of the LFCs onto the y=0 line. In my opinion, that is definitely too heavy-handed for RNA-seq data.
It simply plots the log fold changes due to condition against the mean expression.
And the y=0 line is drawn only to show what no change due to condition looks like.
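To make the axes concrete, here is a naive base-R sketch on simulated data with no true condition effect. A is the mean of normalized counts across all samples (the quantity DESeq2 reports as baseMean), and M is a simple log2 ratio of group means with a pseudocount of 1. This is an assumption-laden approximation: DESeq2's own log fold changes come from a negative binomial GLM (and may be shrunk), not from this ratio.

```r
# Simulated normalized counts: 200 genes x 6 samples, 3 per condition, no true effect
set.seed(42)
norm_counts <- matrix(rnbinom(1200, size = 10, mu = 100), nrow = 200)
condition <- factor(rep(c("A", "B"), each = 3))

# A: mean expression across all samples (analogous to DESeq2's baseMean)
A <- rowMeans(norm_counts)
# M: naive log2 ratio of group means, pseudocount 1 to avoid log(0)
M <- log2((rowMeans(norm_counts[, condition == "B"]) + 1) /
          (rowMeans(norm_counts[, condition == "A"]) + 1))

plot(A, M, log = "x", pch = 20,
     xlab = "mean of normalized counts", ylab = "log2 fold change (B vs A)")
# y = 0 marks "no change due to condition"
abline(h = 0, col = "blue3")
```

With no simulated effect, the points scatter symmetrically around y = 0, and the scatter is widest at low A, where counts are noisiest.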