Question

MDS Plot help in edgeR

ilovesuperheroes1993 • 0

@ilovesuperheroes1993-17038

I am doing a differential gene expression analysis on 24 pairs of matched samples, normal vs diseased. I want to generate an MDS plot to check whether my normal and diseased samples cluster well. I am running edgeR for the analysis.

Could anyone tell me the difference between the 2 methods given below?

I have read in my file of raw read counts and defined the sample groupings and the normal/diseased classification. I have created a DGEList 'y'.

Method 1

y_norm <- calcNormFactors(y)

plotMDS(y_norm)

Method 2

y_norm <- calcNormFactors(y)

cpm <- cpm (y_norm$counts)

plotMDS(cpm)

In both cases, the MDS plots show the normal and diseased samples as separate clusters, but the clusters themselves differ between the two cases.

Can anyone please let me know which of these is the proper way to do it?

edgeR plotmds differential gene expression MDS mdsplot • 4.1k views

ADD COMMENT • link updated 5.6 years ago by Gordon Smyth 51k • written 5.6 years ago by ilovesuperheroes1993 • 0

Gordon Smyth · Answer 1 · 2019-01-08

Gordon Smyth 51k

@gordon-smyth

WEHI, Melbourne, Australia

Method 1 is correct and Method 2 is wrong. All the documentation uses Method 1, so what has made you think that Method 2 might be appropriate?

Method 2 would be the same as Method 1 if you called cpm() with log=TRUE and prior.count=2. We have always given the same advice, for example in Section 2.15 of the edgeR User's Guide. As it is though, your cpm values are unlogged and so are on the wrong scale to compute linear distances.
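To illustrate the scale issue, here is a minimal base-R sketch with made-up CPM values (the matrix `cpm_vals` and the sample names A, B, C are hypothetical, not taken from the question). It uses plain Euclidean distance as a stand-in; plotMDS itself uses a related log-fold-change distance over the top genes, so this only shows the general principle.

```r
# Hypothetical CPM values for two genes in three samples.
# Relative to sample A: the "high" gene doubles in B, the "low" gene doubles in C.
cpm_vals <- rbind(high = c(10000, 20000, 10000),
                  low  = c(10, 10, 20))
colnames(cpm_vals) <- c("A", "B", "C")

# Sample-to-sample Euclidean distances on the unlogged scale:
# the highly expressed gene dominates, so A-B looks far larger than A-C.
d_raw <- as.matrix(dist(t(cpm_vals)))

# The same distances on the log2 scale:
# a two-fold change counts the same regardless of expression level.
d_log <- as.matrix(dist(t(log2(cpm_vals))))

d_raw
d_log
```

On the unlogged scale the two-fold change in the highly expressed gene swamps the two-fold change in the lowly expressed gene, whereas on the log2 scale both changes contribute equally.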

Note that prior.count=2 is now the default in the latest version of edgeR, but you still need

logCPM <- cpm(y_norm, log=TRUE)

if you want to compute summary values for plots and heatmaps.
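As a hedged sketch of how the two approaches line up, the following uses simulated counts (the matrix `counts`, the sample names and the `group` factor are invented stand-ins for the real data; only `DGEList()`, `calcNormFactors()`, `cpm()` and `plotMDS()` come from edgeR/limma). Note also that calling `cpm()` on the bare matrix `y_norm$counts` uses the column totals as library sizes, so the normalization factors stored in `y_norm` are not applied.

```r
library(edgeR)

# Hypothetical toy data: 1000 genes x 6 samples, standing in for the real count table.
set.seed(1)
counts <- matrix(rnbinom(1000 * 6, mu = 50, size = 5), ncol = 6,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:6)))
group <- factor(rep(c("normal", "diseased"), each = 3))

y <- DGEList(counts = counts, group = group)
y_norm <- calcNormFactors(y)

# Method 1: pass the DGEList directly; log2-CPM values are computed internally.
plotMDS(y_norm)

# Method 2, corrected: pass the DGEList (not y_norm$counts) to cpm() so the
# normalization factors are used, and request log2 values explicitly.
logCPM <- cpm(y_norm, log = TRUE, prior.count = 2)
plotMDS(logCPM)

# Method 2 as originally posted, for comparison only: unlogged CPM from the
# bare count matrix, which ignores the normalization factors.
rawCPM <- cpm(y_norm$counts)
```

The `logCPM` matrix can then be reused for heatmaps or other summary plots.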

ADD COMMENT • link 5.6 years ago • updated 5.5 years ago Gordon Smyth 51k

Thank you for your reply.

I originally used Method 1. But recently I saw a couple of pages where someone had suggested using Method 2, i.e. doing the MDS plot with the normalized counts.

Could you please explain why the second method is wrong and how the two methods differ?

ADD REPLY • link 5.6 years ago ilovesuperheroes1993 • 0

Which page has advised Method 2? Can you give a link, please?

cpm computes counts-per-million. It doesn't produce "normalized counts", because the results are not counts.
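The arithmetic can be sketched in base R as follows. The toy matrix `counts` and the name `cpm_manual` are hypothetical; edgeR's own `cpm()` applied to a DGEList additionally scales the library sizes by the normalization factors, which this simplified version leaves out.

```r
# Hypothetical 3-gene x 2-sample count matrix; s2 is sequenced twice as deeply as s1.
counts <- matrix(c(10, 90, 900,
                   20, 180, 1800), ncol = 2,
                 dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))

# Library size = total reads per sample.
lib_size <- colSums(counts)

# Counts-per-million: divide each column by its library size, then scale to one million.
cpm_manual <- t(t(counts) / lib_size) * 1e6

cpm_manual
colSums(cpm_manual)
```

The values are rates per million reads rather than counts: both samples end up with identical columns even though their raw counts differ by a factor of two.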

ADD REPLY • link 5.6 years ago Gordon Smyth 51k

I do not know if this is the same user on Biostars, but I and others have just been providing comments here: https://www.biostars.org/p/356810/#357188

I have added a final comment in response to your answer here, Gordon.

ADD REPLY • link 5.6 years ago Kevin Blighe ★ 4.0k

Kevin, thanks for answering questions about edgeR on Biostars, and I hope you will continue to do that. The use of cpm() in the Biostars thread is fine because it uses log=TRUE, which was my main concern. The prior.count setting is less important, and prior.count=2 is the default anyway in the latest version of edgeR.

ADD REPLY • link 5.6 years ago Gordon Smyth 51k

Thank you Gordon - no problem. We try our best to be as accurate as possible. If in doubt, we direct users here. Usually most questions are already answered here on Bioconductor by you, Aaron, or James, in fact.

ADD REPLY • link 5.5 years ago Kevin Blighe ★ 4.0k

Hi Gordon,

I am the one who used the incorrect method (cpm without the prior.count argument), but I didn't know that prior.count needed to be set. I only described a problem with edgeR and my data on the Biostars page https://www.biostars.org/p/356810, asking how to correct the batch effect in my samples and how to check whether my samples cluster together in a PCA plot. Kevin and I have been discussing the issue all day, and I would be very grateful if you would have a look at the comments and give me your opinion.

Thanks in advance

ADD REPLY • link updated 5.6 years ago by Gordon Smyth 51k • written 5.6 years ago by IRAIA.MAIALEN • 0

Iraia, your use of cpm seems fine because you used log=TRUE (unlike OP's Method 2). The prior.count setting isn't so important, and prior.count=2 is the default in the latest version of edgeR anyway.

ADD REPLY • link 5.6 years ago Gordon Smyth 51k