Question

MDS in edgeR

Susanne Franssen ▴ 30

Dear all / authors of the edgeR package,

I have a question about the multidimensional scaling (MDS) plot provided by the edgeR package. I have RNA-seq data for 8 libraries from a 2x2 factorial design, and I want to produce an MDS plot (plotMDS.DGEList) to get a better idea of the distances between the libraries.

The function plotMDS.DGEList offers the option top, which lets me choose the x genes (top=x) that show the highest tagwise dispersion across all libraries. My question is: what should I consider when choosing x? As I have a 2x2 factorial design, I was going to set x to the total number of genes, since I don't see a rationale for choosing a specific smaller number, which would seem somewhat arbitrary to me.

Are there any opinions on that?

Thanks a lot,
Susanne

updated 14.1 years ago by Gordon Smyth 53k • written 14.1 years ago by Susanne Franssen ▴ 30

score 0 · Answer 1 · 2011-12-09

Dear Susanne,

Please leave top at the default value unless you have a good reason to change it.

Setting top to the whole genome would mean trying to distinguish your samples using a collection of genes that are mostly either not differentially expressed between the samples or not expressed at all. This would increase the noise in your comparison and risk masking real patterns.

There is a large literature on unsupervised clustering, of which MDS is a type, and filtering the genes to those that contain real information for distinguishing the samples is pretty much universally recommended. The exact number used is not important; what matters is that the selection is limited to the more variable genes.

Best wishes
Gordon
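
To make the point about noise concrete, here is a small toy sketch in plain Python. It is not edgeR's implementation: it ranks genes by ordinary sample variance rather than tagwise dispersion, uses a root-mean-square distance chosen only for illustration, and the matrix and the helper names (`top_variable_genes`, `sample_distance`) are made up for this example. It compares how well two groups separate when distances use every gene versus only the most variable ones.

```python
import math

def top_variable_genes(expr, top):
    """Return indices of the `top` genes (rows) with the largest variance across samples."""
    def variance(row):
        mean = sum(row) / len(row)
        return sum((v - mean) ** 2 for v in row) / (len(row) - 1)
    ranked = sorted(range(len(expr)), key=lambda g: variance(expr[g]), reverse=True)
    return ranked[:top]

def sample_distance(expr, genes, i, j):
    """Root-mean-square difference between samples i and j over the chosen genes."""
    return math.sqrt(sum((expr[g][i] - expr[g][j]) ** 2 for g in genes) / len(genes))

# Hypothetical log-expression matrix: rows are genes, columns are samples A1, A2, B1, B2.
expr = [
    [1.0, 1.2, 5.0, 5.1],  # differs between groups A and B
    [6.0, 5.8, 2.0, 2.1],  # differs between groups A and B
    [3.0, 3.1, 2.9, 3.0],  # expressed but flat
    [0.0, 0.0, 0.0, 0.0],  # not expressed
    [4.0, 4.1, 4.0, 3.9],  # expressed but flat
    [0.0, 0.1, 0.0, 0.0],  # barely expressed
]

all_genes = list(range(len(expr)))
top_genes = top_variable_genes(expr, top=2)

# Between-group distance (A1 vs B1) relative to within-group distance (A1 vs A2).
ratio_all = sample_distance(expr, all_genes, 0, 2) / sample_distance(expr, all_genes, 0, 1)
ratio_top = sample_distance(expr, top_genes, 0, 2) / sample_distance(expr, top_genes, 0, 1)

print("genes used with top=2:", top_genes)
print("between/within ratio, all genes:", round(ratio_all, 2))
print("between/within ratio, top genes:", round(ratio_top, 2))
```

In this toy data the flat and unexpressed genes add nothing to the between-group distance but still contribute small within-group differences, so the separation ratio is lower when all genes are used. In edgeR the equivalent choice is made through the top argument discussed above.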