Question: MDS Plot help in edgeR
10 months ago by
ilovesuperheroes19930 wrote:

I am doing a differential gene expression analysis on 24 pairs of matched samples, normal vs. diseased. I want to generate an MDS plot to check whether my normal and diseased samples cluster well. I am using edgeR for the analysis.

Could anyone tell me the difference between the 2 methods given below?

I have read in my file of raw read counts, defined the sample groupings and the normal/diseased classification, and created a DGEList 'y'.

Method 1

y_norm <- calcNormFactors(y)

plotMDS(y_norm)

 

Method 2

y_norm <- calcNormFactors(y)

cpm <- cpm (y_norm$counts)

plotMDS(cpm)

 

In both cases, the MDS plots show the normal and diseased samples as separate clusters, but the clusters themselves differ between the two methods.

Can anyone please tell me which of these is the proper way to do it?

modified 10 months ago by Gordon Smyth • written 10 months ago by ilovesuperheroes1993
Answer: MDS Plot help in edgeR
10 months ago by
Gordon Smyth39k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth39k wrote:

Method 1 is correct and Method 2 is wrong. All the documentation uses Method 1, so what has made you think that Method 2 might be appropriate?

Method 2 would be the same as Method 1 if you called cpm() with log=TRUE and prior.count=2. We have always given the same advice, for example in Section 2.15 of the edgeR User's Guide. As it is, though, your cpm values are unlogged and so are on the wrong scale for computing linear distances.

Note that prior.count=2 is now the default in the latest version of edgeR, but you still need

logCPM <- cpm(y_norm, log=TRUE)

if you want to compute summary values for plots and heatmaps.
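As a minimal sketch of the point above, the following compares Method 1 with a log-scale version of Method 2. The counts are simulated purely for illustration (they are not the OP's data), and the object names other than `y` and `y_norm` are assumptions. Note that the sketch passes the DGEList itself to `cpm()` rather than `y_norm$counts`: when given a plain count matrix, `cpm()` uses the column sums as library sizes and has no access to the normalization factors stored in the DGEList.

```r
library(edgeR)

# Simulated counts for illustration only (hypothetical data):
# 1000 genes, 3 normal and 3 diseased samples.
set.seed(1)
counts <- matrix(rnbinom(6000, mu = 50, size = 5), nrow = 1000, ncol = 6)
rownames(counts) <- paste0("gene", 1:1000)
colnames(counts) <- c(paste0("N", 1:3), paste0("D", 1:3))
group <- factor(rep(c("normal", "diseased"), each = 3))

y <- DGEList(counts = counts, group = group)
y_norm <- calcNormFactors(y)

# Log2 counts-per-million computed from the DGEList, so that the
# normalization factors in y_norm$samples are taken into account.
logCPM <- cpm(y_norm, log = TRUE, prior.count = 2)

# Method 1: plotMDS on the DGEList (log-CPM is computed internally).
plotMDS(y_norm)

# Method 2 on the log scale: plotMDS on the log-CPM matrix.
plotMDS(logCPM)
```

With real data the two plots should look essentially the same, whereas `plotMDS(cpm(y_norm$counts))` works on unlogged values and so can give a different picture.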

modified 9 months ago • written 10 months ago by Gordon Smyth

Thank you for your reply.

I originally used Method 1. But recently I saw a couple of pages where someone suggested using Method 2, i.e. making the MDS plot from the normalized counts.

Could you please explain why the second method is wrong and how the two methods differ?

written 10 months ago by ilovesuperheroes1993

Which page advised Method 2? Can you give a link, please?

cpm computes counts-per-million. It doesn't produce "normalized counts", because the results are not counts.
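To illustrate, here is a rough base R sketch of what counts-per-million means, using a made-up toy matrix (the data and the names `counts`, `lib_size` and `cpm_manual` are hypothetical). It ignores normalization factors, which edgeR would fold into the library sizes when `cpm()` is given a DGEList, so treat it as an approximation of the idea rather than of edgeR's exact computation.

```r
# Hypothetical toy counts: 4 genes (rows) by 2 samples (columns).
counts <- matrix(c(10, 0, 2960, 30,
                   200, 5, 1500, 295),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), c("S1", "S2")))

# Total reads per sample.
lib_size <- colSums(counts)

# Divide each column by its library size and scale to one million.
cpm_manual <- t(t(counts) / lib_size) * 1e6

# Each column now sums to one million regardless of sequencing depth.
colSums(cpm_manual)

# The values are rates, generally not integers, so they are not counts.
cpm_manual
```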

modified 10 months ago • written 10 months ago by Gordon Smyth

I do not know if this is the same user on Biostars, but I and others have just been providing comments here: https://www.biostars.org/p/356810/#357188

I have added a final comment there in response to your answer here, Gordon.

written 10 months ago by Kevin Blighe

Kevin, thanks for answering questions about edgeR on Biostars, and I hope you will continue to do that. The use of cpm() in the Biostars thread is fine because it uses log=TRUE, which was my main concern. The prior.count setting is less important, and prior.count=2 is the default anyway in the latest version of edgeR.

modified 10 months ago • written 10 months ago by Gordon Smyth

Thank you Gordon - no problem. We try our best to be as accurate as possible. If in doubt, we direct users here. Usually most questions are already answered here on Bioconductor by you, Aaron, or James, in fact.

written 10 months ago by Kevin Blighe

Hi Gordon,

I am the one who used the incorrect method (cpm without the prior.count argument), but I didn't know that prior.count needed to be set. I had only described a problem with my data using edgeR on the Biostars page https://www.biostars.org/p/356810, asking how to correct the batch effect in my samples and how to see whether my samples cluster together in a PCA plot. Kevin and I have been discussing the issue all day, and I would be very grateful if you would have a look at the comments and give me your opinion.

Thanks in advance

written 10 months ago by IRAIA.MAIALEN

Iraia, your use of cpm seems fine because you used log=TRUE (unlike OP's Method 2). The prior.count setting isn't so important, and prior.count=2 is the default in the latest version of edgeR anyway.

modified 10 months ago • written 10 months ago by Gordon Smyth