Question

Differing results from plotMDS and PCAtools

0

Entering edit mode

Fabian • 0

@25565e27

Last seen 3 months ago

Austria

Hi, I am trying out some packages for PCA in combination with the reads-to-genes edgeR-workflow for my RNA-seq analysis. I used the directions given from this post: Trying to do a PCA using edgeR with expression data

Group is equivalent to the Biomarkerquartiles_Q variable.

When I use the plotMDS function with the following code:

keep <- filterByExpr(y, group = y$samples$group)

y <- y[keep, , 
       keep.lib.sizes = F]

y <- calcNormFactors(y)

pch <- c(0,1,2,15,16,17)
colors <- rep(c("darkgreen", "red", "blue"), 2)
plotMDS(y, col=colors[factor(targets$Biomarkerquartiles_Q)],
        pch=pch[factor(targets$Biomarkerquartiles_Q)],
        gene.selection="common")

legend("topleft", legend=levels(factor(targets$Biomarkerquartiles)),
       pch=pch, col=colors, ncol=2)

I get this plot:

plotMDS_Output

And although there is no clear clustering of the quartiles, it is pretty clear that there is some clustering going on.

When I use the PCAtools package with the following code:

p <- pca(cpm(y, log = T),
         metadata = y$samples,
         removeVar = 0.1)
PCA.plot <- biplot(p,
    lab = paste0(p$metadata$group),
    colby = "group"
)

I get the following plot: PCAtools

I know both packages use different approaches but is it expected that the result can differ so much? Intriguingly, the x-axis and y-axis descriptions of both approaches seem to report comparable amounts of variations explained by the PCs.

Additionally, where could I begin to learn how to interpret the results from the plotMDS output?

edgeR PCAtools • 630 views

ADD COMMENT • link updated 3 months ago by Gordon Smyth 51k • written 4 months ago by Fabian • 0

0

Entering edit mode

I looked into some other variables and noticed that sex is probably the root cause of this batch effect:

pch <- c(0,1,2,15,16,17)colors <- rep(c(#"darkgreen",
                "red", "blue"), 2)
plotMDS(y, col=colors[factor(targets$Male)],
        pch=pch[factor(targets$Male)],
        gene.selection="common")

legend("topleft", legend=levels(factor(targets$Male, labels = c("Female", "Male"))),
       pch=pch, col=colors, ncol=2)

. enter image description here

Would it be enough to include it into my design matrix of the experiment and adjust for it or do I have to use limma::removeBatchEffect?

ADD REPLY • link 3 months ago Fabian • 0

1

Entering edit mode

The batch effect is not perfectly aligned with sex, with 3 samples not agreeing. I suggest you investigate the two male samples in the top right and the one female in the lower left. Either these samples are annotated incorrectly, or the batch effect has a different cause that is confounded with gender.

Yes, batch effects are always included in the design matrix. removeBatchEffect is only for graphical exploration (as the help page for removeBatchEffect will tell you).

ADD REPLY • link 3 months ago Gordon Smyth 51k

score 2 · Accepted Answer · 2024-03-29

2

Entering edit mode

Gordon Smyth 51k

@gordon-smyth

Last seen 35 minutes ago

WEHI, Melbourne, Australia

The reason why the two plots are different is that the first plot is a PCA plot while the second is a biplot. Quite different plots.

The PCA plot from edgeR plotMDS is showing very clearly that your data has a massive batch effect, which is something that you need to investigate. The biplot doesn't seem to be showing any useful information.

Any batch effect should be included in the design matrix.

The results from plotMDS seem pretty clear. You can see examples of MDS plots and their interpretation in the limma and edgeR case studies. As the documentation explains, distances on the MDS plot correspond to log2 leading fold-changes.

ADD COMMENT • link 3 months ago Gordon Smyth 51k

0

Entering edit mode

I really like the visualizations and functionality of the PCAtools package.

I looked through this vignette PCAtools: everything Principal Component Analysis and could only find the biplot function for my purpose, which as you said is not the same. Is there some kind of modification of the biplot function to get a PCA plot?