MDS in edgeR
@susanne-franssen-4994
Dear all / authors of the edgeR package,

I have a question concerning the use of the multidimensional scaling (MDS) plot provided by the edgeR package. I have RNA-seq data for 8 libraries from a 2x2 factorial design, and I want to produce an MDS plot (plotMDS.DGEList) to get a better idea of the distances between the libraries. The function plotMDS.DGEList offers the option top, with which I can choose the x (top=x) genes that show the highest tagwise dispersion across all libraries.

My question is: what should I consider when choosing x? As I have a 2x2 factorial design, I was going to set x to the total number of genes, because I don't see a rationale for choosing a specific smaller number, which would seem somewhat arbitrary to me. Are there any opinions on that?

Thanks a lot,
Susanne
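
To make the role of top concrete, here is a minimal Python sketch of the general idea: keep only the top most variable genes, compute a distance between every pair of libraries, and embed those distances with classical MDS. This is not edgeR's implementation. Ranking genes by plain variance of log-expression is a stand-in assumption for edgeR's own gene-selection rule, and the helper names `mds_top_variable` and `classical_mds`, as well as the simulated matrix, are hypothetical.

```python
import numpy as np


def classical_mds(dist, ndim=2):
    """Classical (Torgerson) MDS: embed a square distance matrix in ndim dimensions."""
    n = dist.shape[0]
    centre = np.eye(n) - np.ones((n, n)) / n          # centring matrix J
    b = -0.5 * centre @ (dist ** 2) @ centre          # double-centred squared distances
    vals, vecs = np.linalg.eigh(b)                    # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:ndim]             # keep the ndim largest
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0.0, None))


def mds_top_variable(logexpr, top=500, ndim=2):
    """Hypothetical helper: MDS coordinates of the samples (columns) of a
    genes x samples log-expression matrix, using only the `top` genes with
    the largest variance across samples (a stand-in for edgeR's rule)."""
    top = min(top, logexpr.shape[0])
    keep = np.argsort(logexpr.var(axis=1))[::-1][:top]   # most variable genes first
    sub = logexpr[keep]
    # Root-mean-square difference between every pair of samples over the kept genes
    diff = sub[:, :, None] - sub[:, None, :]
    dist = np.sqrt((diff ** 2).mean(axis=0))
    return classical_mds(dist, ndim)


rng = np.random.default_rng(0)
logexpr = rng.normal(size=(2000, 8))   # simulated stand-in for 8 libraries
coords = mds_top_variable(logexpr, top=500)
print(coords.round(2))
```

Each row of `coords` is one library's position on the plot. Passing a top larger than the number of genes simply uses every gene, which corresponds to the "whole genome" choice discussed in the question.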
@gordon-smyth
WEHI, Melbourne, Australia

Dear Susanne,

Please leave top at the default value unless you have a good reason to change it.

Setting top to the whole genome would mean that you would be trying to distinguish your samples using a collection of genes that are mostly either not differentially expressed between the samples or not expressed at all. This would increase the noise in your comparison and risk masking real patterns.
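
The dilution effect can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: log-expression is simulated as normal noise, only 200 of 10,000 genes differ between two groups of four libraries (a simpler contrast than the 2x2 design in the question), and genes are ranked by variance as a stand-in for edgeR's selection rule. It compares the mean between-group distance to the mean within-group distance when using the 500 most variable genes versus all genes.

```python
import numpy as np


def rms_distances(logexpr):
    """Pairwise root-mean-square differences between the columns (samples)."""
    diff = logexpr[:, :, None] - logexpr[:, None, :]
    return np.sqrt((diff ** 2).mean(axis=0))


def separation_ratio(dist, groups):
    """Mean between-group distance divided by mean within-group distance."""
    same = groups[:, None] == groups[None, :]
    off_diag = ~np.eye(len(groups), dtype=bool)
    within = dist[same & off_diag].mean()
    between = dist[~same].mean()
    return between / within


rng = np.random.default_rng(1)
n_genes, n_de = 10_000, 200
groups = np.array([0, 0, 0, 0, 1, 1, 1, 1])
# Simulated log-expression: every gene is noisy, only the first n_de genes differ
logexpr = rng.normal(0.0, 0.5, size=(n_genes, groups.size))
logexpr[:n_de, groups == 1] += 3.0

# Rank genes by variance across all samples (a stand-in for edgeR's rule)
order = np.argsort(logexpr.var(axis=1))[::-1]
ratio_top = separation_ratio(rms_distances(logexpr[order[:500]]), groups)
ratio_all = separation_ratio(rms_distances(logexpr), groups)
print(f"top 500 genes: {ratio_top:.2f}, all genes: {ratio_all:.2f}")
```

With all genes, the thousands of null genes contribute the same noise to within- and between-group distances, so the ratio is pulled towards 1 and the groups look less distinct on the plot.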

There is a large literature on unsupervised clustering, of which MDS is a type, and filtering the genes to those that carry real information for distinguishing the samples is almost universally recommended. The exact number of genes used is not important; what matters is that the analysis is restricted to the more variable genes.
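
As a rough check of the claim that the exact number matters less than the filtering itself, the sketch below computes the first MDS axis for a few values of top on simulated data and tests whether the two groups fall on opposite sides. It relies on the fact that classical MDS on Euclidean distances equals PCA of the gene-centred matrix, so a thin SVD suffices. The data, the group layout, the variance-based ranking and the helper name `first_mds_axis` are all assumptions for illustration; a single simulation is not evidence for real data sets.

```python
import numpy as np


def first_mds_axis(logexpr, top):
    """Hypothetical helper: first MDS coordinate of each sample using the
    `top` most variable genes. Classical MDS on Euclidean distances equals
    PCA of the gene-centred matrix, so a thin SVD gives the same axis
    (up to sign and scale)."""
    keep = np.argsort(logexpr.var(axis=1))[::-1][:top]
    sub = logexpr[keep]
    centred = sub - sub.mean(axis=1, keepdims=True)   # centre each gene across samples
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    return vt[0] * s[0]                                # sample scores on the first axis


rng = np.random.default_rng(2)
groups = np.array([0, 0, 0, 0, 1, 1, 1, 1])
logexpr = rng.normal(0.0, 0.5, size=(10_000, groups.size))
logexpr[:200, groups == 1] += 3.0                      # 200 simulated DE genes

results = {}
for top in (250, 500, 1000):
    axis1 = first_mds_axis(logexpr, top)
    a, b = axis1[groups == 0], axis1[groups == 1]
    # The sign of an MDS axis is arbitrary, so accept separation in either direction
    results[top] = bool(a.max() < b.min() or b.max() < a.min())
print(results)
```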

Best wishes
Gordon
