LIMMA: plotMDS

0

Entering edit mode

Kripa R ▴ 180

@kripa-r-4482

Last seen 11.4 years ago

Hi I was wondering how I can figure out which genes correspond to each of the principle components ($cmdscale.out). Gene.selection is set to pairwise for the top 500 but I'd like to know exactly which genes are being considered for cmdscale.out[,3], for example. Thanks for the help! .kripa [[alternative HTML version deleted]]

• 3.3k views

ADD COMMENT • link updated 2.5 years ago by Gordon Smyth 53k • written 13.7 years ago by Kripa R ▴ 180

0

Entering edit mode

Steve Lianoglou ★ 13k

@steve-lianoglou-2771

Last seen 10 weeks ago

United States

Hi, On Sun, May 6, 2012 at 7:21 PM, Kripa R <kripa777 at="" hotmail.com=""> wrote: > > Hi I was wondering how I can figure out which genes correspond to each of the principle components ($cmdscale.out). Gene.selection is set to pairwise for the top 500 but I'd like to know exactly which genes are being considered for cmdscale.out[,3], for example. Perhaps you'd like to tell us what you did, exactly? without some reproducible code examples, we're left guessing as to what commands you've run and where `$cmdscale.out` comes from, etc. -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology ?| Memorial Sloan-Kettering Cancer Center ?| Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact

ADD COMMENT • link 13.7 years ago Steve Lianoglou ★ 13k

0

Entering edit mode

Aaron Mackey ▴ 170

@aaron-mackey-4358

Last seen 11.4 years ago

All genes contribute to all principle components; some just more than others. I've found it useful (or perhaps just entertaining) to see heatmaps of the first N (say 10) component loadings, constrained to the 500 genes whose variances contribute most to those components. -Aaron On Sun, May 6, 2012 at 7:21 PM, Kripa R <kripa777@hotmail.com> wrote: > > Hi I was wondering how I can figure out which genes correspond to each of > the principle components ($cmdscale.out). Gene.selection is set to pairwise > for the top 500 but I'd like to know exactly which genes are being > considered for cmdscale.out[,3], for example. > > Thanks for the help! > > .kripa > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]

ADD COMMENT • link 13.7 years ago Aaron Mackey ▴ 170

0

Entering edit mode

Maybe this is very basic, but how do you identify those 500 genes for the heatmap? .kripa From: amackey@virginia.edu Date: Mon, 7 May 2012 11:47:16 -0400 Subject: Re: [BioC] LIMMA: plotMDS To: kripa777@hotmail.com CC: bioconductor@r-project.org All genes contribute to all principle components; some just more than others. I've found it useful (or perhaps just entertaining) to see heatmaps of the first N (say 10) component loadings, constrained to the 500 genes whose variances contribute most to those components. -Aaron On Sun, May 6, 2012 at 7:21 PM, Kripa R <kripa777@hotmail.com> wrote: Hi I was wondering how I can figure out which genes correspond to each of the principle components ($cmdscale.out). Gene.selection is set to pairwise for the top 500 but I'd like to know exactly which genes are being considered for cmdscale.out[,3], for example. Thanks for the help! .kripa [[alternative HTML version deleted]] _______________________________________________ Bioconductor mailing list Bioconductor@r-project.org https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor [[alternative HTML version deleted]]

ADD REPLY • link 13.7 years ago Kripa R ▴ 180

0

Entering edit mode

try using the 'sparcl' package or something like this: http://www.biomedcentral.com/1471-2105/11/296 the point is to threshold vigorously enough that most of the loadings are forced to zero On Mon, May 7, 2012 at 10:16 AM, Kripa R <kripa777@hotmail.com> wrote: > > Maybe this is very basic, but how do you identify those 500 genes for the > heatmap? > > .kripa > > From: amackey@virginia.edu > Date: Mon, 7 May 2012 11:47:16 -0400 > Subject: Re: [BioC] LIMMA: plotMDS > To: kripa777@hotmail.com > CC: bioconductor@r-project.org > > All genes contribute to all principle components; some just more than > others. I've found it useful (or perhaps just entertaining) to see > heatmaps of the first N (say 10) component loadings, constrained to the 500 > genes whose variances contribute most to those components. > > > -Aaron > > On Sun, May 6, 2012 at 7:21 PM, Kripa R <kripa777@hotmail.com> wrote: > > > > > Hi I was wondering how I can figure out which genes correspond to each of > the principle components ($cmdscale.out). Gene.selection is set to pairwise > for the top 500 but I'd like to know exactly which genes are being > considered for cmdscale.out[,3], for example. > > > > > > Thanks for the help! > > > > .kripa > > [[alternative HTML version deleted]] > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@r-project.org > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > -- *A model is a lie that helps you see the truth.* * * Howard Skipper<http: cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf=""> [[alternative HTML version deleted]]

ADD REPLY • link 13.7 years ago Tim Triche ★ 4.2k

0

Entering edit mode

also http://www.bioconductor.org/packages/2.11/bioc/vignettes/pcaMetho ds/inst/doc/pcaMethods.pdf<http: www.bioconductor.org="" packages="" 2.11="" b="" ioc="" vignettes="" pcamethods="" inst="" doc="" pcamethods.pdf=""> if you are willing to simply truncate loadings at some arbitrary value (not always the worst idea of all time mind you) superPC and the preconditioned lasso turn this strategy into a method for obtaining sparse classifiers, FWIW. On Mon, May 7, 2012 at 10:46 AM, Tim Triche, Jr. <tim.triche@gmail.com>wrote: > try using the 'sparcl' package or something like this: > http://www.biomedcentral.com/1471-2105/11/296 > > the point is to threshold vigorously enough that most of the loadings are > forced to zero > > > > On Mon, May 7, 2012 at 10:16 AM, Kripa R <kripa777@hotmail.com> wrote: > >> >> Maybe this is very basic, but how do you identify those 500 genes for the >> heatmap? >> >> .kripa >> >> From: amackey@virginia.edu >> Date: Mon, 7 May 2012 11:47:16 -0400 >> Subject: Re: [BioC] LIMMA: plotMDS >> To: kripa777@hotmail.com >> CC: bioconductor@r-project.org >> >> All genes contribute to all principle components; some just more than >> others. I've found it useful (or perhaps just entertaining) to see >> heatmaps of the first N (say 10) component loadings, constrained to the 500 >> genes whose variances contribute most to those components. >> >> >> -Aaron >> >> On Sun, May 6, 2012 at 7:21 PM, Kripa R <kripa777@hotmail.com> wrote: >> >> >> >> >> Hi I was wondering how I can figure out which genes correspond to each of >> the principle components ($cmdscale.out). Gene.selection is set to pairwise >> for the top 500 but I'd like to know exactly which genes are being >> considered for cmdscale.out[,3], for example. >> >> >> >> >> >> Thanks for the help! >> >> >> >> .kripa >> >> [[alternative HTML version deleted]] >> >> >> >> _______________________________________________ >> >> Bioconductor mailing list >> >> Bioconductor@r-project.org >> >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > > > -- > *A model is a lie that helps you see the truth.* > * > * > Howard Skipper<http: cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf=""> > > -- *A model is a lie that helps you see the truth.* * * Howard Skipper<http: cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf=""> [[alternative HTML version deleted]]

ADD REPLY • link 13.7 years ago Tim Triche ★ 4.2k

0

Entering edit mode

Hi Kripa, It is a good question to ask which the top 500 genes are. We only can know that if "gene.selection" is "common", in which situation, the same top 500 genes were used for calculating the distance matrix among all samples. When "gene.selection" is "pairwise", different sets of 500 genes for different pairs of samples are used for distance matrix calculation. If you look at "plotMDS.default" in limma, you will see the above description is consistent with the code. "plotMDS.default" is used by plotMDS. The current code does not output the top-500-gene list. However, the code can be edited to output the list when "gene.selection" is "common". Please see the following. In the followed example, "mds4[[2]]" is the vector of indexes for the top 500 genes. Hope this help. Di plotMDS.default<- function (x, top = 500, labels = colnames(x), col = NULL, cex = 1, dim.plot = c(1, 2), ndim = max(dim.plot), gene.selection = "pairwise", xlab = paste("Dimension", dim.plot[1]), ylab = paste("Dimension", dim.plot[2]), ...) { x <- as.matrix(x) ok <- is.finite(x) if (!all(ok)) x <- x[apply(ok, 1, all), ] if (is.null(labels)) labels <- 1:dim(x)[2] nprobes <- nrow(x) nsamples <- ncol(x) if (ndim < 2) stop("Need at least two dim.plot") if (nsamples < ndim) stop("Two few samples") gene.selection <- match.arg(gene.selection, c("pairwise", "common")) cn <- colnames(x) dd <- matrix(0, nrow = nsamples, ncol = nsamples, dimnames = list(cn, cn)) topindex <- nprobes - top + 1 if (gene.selection == "pairwise") { for (i in 2:(nsamples)) for (j in 1:(i - 1)) dd[i, j] = sqrt(meansort.int((x[, i] - x[, j])^2, partial = topindex)[topindex:nprobes])) } else { # Same genes used for all comparisons ,"common" s <- rowMeans((x - rowMeans(x))^2) q <- quantile(s, p = (topindex - 1.5)/(nprobes - 1)) x <- x[s >= q, ] # an extra line ind.top.genes<-which(s >= q) for (i in 2:(nsamples)) dd[i, 1:(i - 1)] = sqrt(colMeans((x[, i] - x[, 1:(i - 1), drop = FALSE])^2)) } a1 <- cmdscale(as.dist(dd), k = ndim) mds <- new("MDS", list(dim.plot = dim.plot, distance.matrix = dd, cmdscale.out = a1, top = top, gene.selection = gene.selection)) mds$x <- a1[, dim.plot[1]] mds$y <- a1[, dim.plot[2]] mdsPlot<-plotMDS(mds, labels = labels, col = col, cex = cex, xlab = xlab, ylab = ylab, ...) list (mds=mds, ind.top.genes=ind.top.genes) } # The example from "plotMDS" documentation sd <- 0.3*sqrt(4/rchisq(1000,df=4)) x <- matrix(rnorm(1000*6,sd=sd),1000,6) rownames(x) <- paste("Gene",1:1000) x[1:50,4:6] <- x[1:50,4:6] + 2 # without labels, indexes of samples are plotted. mds <- plotMDS(x, col=c(rep("black",3), rep("red",3)) ) mds4<-plotMDS.default(x, col=c(rep("black",3), rep("red",3)) , gene.selection="common") > mds4[[1]] > length(mds4[[2]]) [1] 500 ---- Di Wu Postdoctoral fellow Harvard University, Statistics Department Harvard Medical School Science Center, 1 Oxford Street, Cambridge, MA 02138-2901 USA ________________________________________ From: bioconductor-bounces@r-project.org [bioconductor- bounces@r-project.org] On Behalf Of Tim Triche, Jr. [tim.triche@gmail.com] Sent: Monday, May 07, 2012 1:48 PM To: Kripa R Cc: amackey at virginia.edu; bioconductor at r-project.org Subject: Re: [BioC] LIMMA: plotMDS also http://www.bioconductor.org/packages/2.11/bioc/vignettes/pcaMetho ds/inst/doc/pcaMethods.pdf<http: www.bioconductor.org="" packages="" 2.11="" b="" ioc="" vignettes="" pcamethods="" inst="" doc="" pcamethods.pdf=""> if you are willing to simply truncate loadings at some arbitrary value (not always the worst idea of all time mind you) superPC and the preconditioned lasso turn this strategy into a method for obtaining sparse classifiers, FWIW. On Mon, May 7, 2012 at 10:46 AM, Tim Triche, Jr. <tim.triche at="" gmail.com="">wrote: > try using the 'sparcl' package or something like this: > http://www.biomedcentral.com/1471-2105/11/296 > > the point is to threshold vigorously enough that most of the loadings are > forced to zero > > > > On Mon, May 7, 2012 at 10:16 AM, Kripa R <kripa777 at="" hotmail.com=""> wrote: > >> >> Maybe this is very basic, but how do you identify those 500 genes for the >> heatmap? >> >> .kripa >> >> From: amackey at virginia.edu >> Date: Mon, 7 May 2012 11:47:16 -0400 >> Subject: Re: [BioC] LIMMA: plotMDS >> To: kripa777 at hotmail.com >> CC: bioconductor at r-project.org >> >> All genes contribute to all principle components; some just more than >> others. I've found it useful (or perhaps just entertaining) to see >> heatmaps of the first N (say 10) component loadings, constrained to the 500 >> genes whose variances contribute most to those components. >> >> >> -Aaron >> >> On Sun, May 6, 2012 at 7:21 PM, Kripa R <kripa777 at="" hotmail.com=""> wrote: >> >> >> >> >> Hi I was wondering how I can figure out which genes correspond to each of >> the principle components ($cmdscale.out). Gene.selection is set to pairwise >> for the top 500 but I'd like to know exactly which genes are being >> considered for cmdscale.out[,3], for example. >> >> >> >> >> >> Thanks for the help! >> >> >> >> .kripa >> >> [[alternative HTML version deleted]] >> >> >> >> _______________________________________________ >> >> Bioconductor mailing list >> >> Bioconductor at r-project.org >> >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > > > -- > *A model is a lie that helps you see the truth.* > * > * > Howard Skipper<http: cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf=""> > > -- *A model is a lie that helps you see the truth.* * * Howard Skipper<http: cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf=""> [[alternative HTML version deleted]] _______________________________________________ Bioconductor mailing list Bioconductor at r-project.org https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 13.7 years ago Wu, Di ▴ 120

0

Entering edit mode

Gordon Smyth 53k

@gordon-smyth

Last seen 7 hours ago

WEHI, Melbourne, Australia

Dear Kripa,

Just to be clear on terminology, MDS computes principal coordinates, which are not quite the same as principal components.

Regardless of the name however, there actually is no set of genes that corresponds to each dimension. The "pairwise" method in plotMDS() uses a potentially different set of genes for every pair of samples, and all these pairwise sets of genes feed to some extent into all the dimensions.

Best wishes
Gordon

ADD COMMENT • link 13.7 years ago • updated 2.5 years ago Gordon Smyth 53k

Login before adding your answer.