Differing results from plotMDS and PCAtools
Entering edit mode
Fabian • 0
Last seen 7 days ago

Hi, I am trying out some packages for PCA in combination with the reads-to-genes edgeR-workflow for my RNA-seq analysis. I used the directions given from this post: Trying to do a PCA using edgeR with expression data

Group is equivalent to the Biomarkerquartiles_Q variable.

When I use the plotMDS function with the following code:

keep <- filterByExpr(y, group = y$samples$group)

y <- y[keep, , 
       keep.lib.sizes = F]

y <- calcNormFactors(y)

pch <- c(0,1,2,15,16,17)
colors <- rep(c("darkgreen", "red", "blue"), 2)
plotMDS(y, col=colors[factor(targets$Biomarkerquartiles_Q)],

legend("topleft", legend=levels(factor(targets$Biomarkerquartiles)),
       pch=pch, col=colors, ncol=2)

I get this plot:


And although there is no clear clustering of the quartiles, it is pretty clear that there is some clustering going on.

When I use the PCAtools package with the following code:

p <- pca(cpm(y, log = T),
         metadata = y$samples,
         removeVar = 0.1)
PCA.plot <- biplot(p,
    lab = paste0(p$metadata$group),
    colby = "group"

I get the following plot: PCAtools

I know both packages use different approaches but is it expected that the result can differ so much? Intriguingly, the x-axis and y-axis descriptions of both approaches seem to report comparable amounts of variations explained by the PCs.

Additionally, where could I begin to learn how to interpret the results from the plotMDS output?

edgeR PCAtools • 312 views
Entering edit mode

I looked into some other variables and noticed that sex is probably the root cause of this batch effect:

pch <- c(0,1,2,15,16,17)colors <- rep(c(#"darkgreen",
                "red", "blue"), 2)
plotMDS(y, col=colors[factor(targets$Male)],

legend("topleft", legend=levels(factor(targets$Male, labels = c("Female", "Male"))),
       pch=pch, col=colors, ncol=2)

.enter image description here

Would it be enough to include it into my design matrix of the experiment and adjust for it or do I have to use limma::removeBatchEffect?

Entering edit mode

The batch effect is not perfectly aligned with sex, with 3 samples not agreeing. I suggest you investigate the two male samples in the top right and the one female in the lower left. Either these samples are annotated incorrectly, or the batch effect has a different cause that is confounded with gender.

Yes, batch effects are always included in the design matrix. removeBatchEffect is only for graphical exploration (as the help page for removeBatchEffect will tell you).

Entering edit mode
Last seen 14 minutes ago
WEHI, Melbourne, Australia

The reason why the two plots are different is that the first plot is a PCA plot while the second is a biplot. Quite different plots.

The PCA plot from edgeR plotMDS is showing very clearly that your data has a massive batch effect, which is something that you need to investigate. The biplot doesn't seem to be showing any useful information.

Any batch effect should be included in the design matrix.

The results from plotMDS seem pretty clear. You can see examples of MDS plots and their interpretation in the limma and edgeR case studies. As the documentation explains, distances on the MDS plot correspond to log2 leading fold-changes.

Entering edit mode

I really like the visualizations and functionality of the PCAtools package.

I looked through this vignette PCAtools: everything Principal Component Analysis and could only find the biplot function for my purpose, which as you said is not the same. Is there some kind of modification of the biplot function to get a PCA plot?

Entering edit mode

Sorry, I can't help you with the PCAtools package.


Login before adding your answer.

Traffic: 387 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6