Entering edit mode
Kripa R
▴
180
@kripa-r-4482
Last seen 10.2 years ago
Hi again, sorry to be a bother but one more question. After knowing
that lets say 20 genes are being used for the PCA would it be possible
to figure out how much each gene contributes? So to develop an
equation or know specifically in PC1 the variance in these x genes
were the most important or something?
.kripa
> From: dwu@fas.harvard.edu
> To: ttriche@usc.edu; kripa777@hotmail.com
> CC: amackey@virginia.edu; bioconductor@r-project.org
> Date: Mon, 7 May 2012 14:11:33 -0400
> Subject: RE: [BioC] LIMMA: plotMDS
>
> Hi Kripa,
>
> It is a good question to ask which the top 500 genes are. We only
can know that if "gene.selection" is "common", in which situation, the
same top 500 genes were used for calculating the distance matrix among
all samples. When "gene.selection" is "pairwise", different sets of
500 genes for different pairs of samples are used for distance matrix
calculation.
>
> If you look at "plotMDS.default" in limma, you will see the above
description is consistent with the code. "plotMDS.default" is used by
plotMDS.
>
> The current code does not output the top-500-gene list. However, the
code can be edited to output the list when "gene.selection" is
"common". Please see the following.
> In the followed example, "mds4[[2]]" is the vector of indexes for
the top 500 genes.
>
> Hope this help.
>
> Di
>
>
>
> plotMDS.default<-
> function (x, top = 500, labels = colnames(x), col = NULL, cex = 1,
> dim.plot = c(1, 2), ndim = max(dim.plot), gene.selection =
"pairwise",
> xlab = paste("Dimension", dim.plot[1]), ylab = paste("Dimension",
> dim.plot[2]), ...)
> {
> x <- as.matrix(x)
> ok <- is.finite(x)
> if (!all(ok))
> x <- x[apply(ok, 1, all), ]
> if (is.null(labels))
> labels <- 1:dim(x)[2]
> nprobes <- nrow(x)
> nsamples <- ncol(x)
> if (ndim < 2)
> stop("Need at least two dim.plot")
> if (nsamples < ndim)
> stop("Two few samples")
> gene.selection <- match.arg(gene.selection, c("pairwise",
> "common"))
> cn <- colnames(x)
> dd <- matrix(0, nrow = nsamples, ncol = nsamples, dimnames =
list(cn,
> cn))
> topindex <- nprobes - top + 1
> if (gene.selection == "pairwise") {
> for (i in 2:(nsamples)) for (j in 1:(i - 1)) dd[i, j] =
sqrt(meansort.int((x[,
> i] - x[, j])^2, partial = topindex)[topindex:nprobes]))
> }
> else {
>
> # Same genes used for all comparisons ,"common"
>
> s <- rowMeans((x - rowMeans(x))^2)
> q <- quantile(s, p = (topindex - 1.5)/(nprobes - 1))
>
> x <- x[s >= q, ]
>
> # an extra line
> ind.top.genes<-which(s >= q)
>
> for (i in 2:(nsamples)) dd[i, 1:(i - 1)] = sqrt(colMeans((x[,
> i] - x[, 1:(i - 1), drop = FALSE])^2))
> }
> a1 <- cmdscale(as.dist(dd), k = ndim)
> mds <- new("MDS", list(dim.plot = dim.plot, distance.matrix = dd,
> cmdscale.out = a1, top = top, gene.selection = gene.selection))
> mds$x <- a1[, dim.plot[1]]
> mds$y <- a1[, dim.plot[2]]
> mdsPlot<-plotMDS(mds, labels = labels, col = col, cex = cex, xlab =
xlab,
> ylab = ylab, ...)
> list (mds=mds, ind.top.genes=ind.top.genes)
> }
>
>
> # The example from "plotMDS" documentation
>
> sd <- 0.3*sqrt(4/rchisq(1000,df=4))
> x <- matrix(rnorm(1000*6,sd=sd),1000,6)
> rownames(x) <- paste("Gene",1:1000)
> x[1:50,4:6] <- x[1:50,4:6] + 2
> # without labels, indexes of samples are plotted.
>
> mds <- plotMDS(x, col=c(rep("black",3), rep("red",3)) )
> mds4<-plotMDS.default(x, col=c(rep("black",3), rep("red",3)) ,
gene.selection="common")
>
> > mds4[[1]]
> > length(mds4[[2]])
> [1] 500
>
>
>
>
>
>
> ----
> Di Wu
> Postdoctoral fellow
> Harvard University, Statistics Department
> Harvard Medical School
> Science Center, 1 Oxford Street, Cambridge, MA 02138-2901 USA
>
> ________________________________________
> From: bioconductor-bounces@r-project.org [bioconductor-
bounces@r-project.org] On Behalf Of Tim Triche, Jr.
[tim.triche@gmail.com]
> Sent: Monday, May 07, 2012 1:48 PM
> To: Kripa R
> Cc: amackey@virginia.edu; bioconductor@r-project.org
> Subject: Re: [BioC] LIMMA: plotMDS
>
> also http://www.bioconductor.org/packages/2.11/bioc/vignettes/pcaMet
hods/inst/doc/pcaMethods.pdf<http: www.bioconductor.org="" packages="" 2.11="" bioc="" vignettes="" pcamethods="" inst="" doc="" pcamethods.pdf="">
>
> if you are willing to simply truncate loadings at some arbitrary
value (not
> always the worst idea of all time mind you)
>
> superPC and the preconditioned lasso turn this strategy into a
method for
> obtaining sparse classifiers, FWIW.
>
>
> On Mon, May 7, 2012 at 10:46 AM, Tim Triche, Jr.
<tim.triche@gmail.com>wrote:
>
> > try using the 'sparcl' package or something like this:
> > http://www.biomedcentral.com/1471-2105/11/296
> >
> > the point is to threshold vigorously enough that most of the
loadings are
> > forced to zero
> >
> >
> >
> > On Mon, May 7, 2012 at 10:16 AM, Kripa R <kripa777@hotmail.com>
wrote:
> >
> >>
> >> Maybe this is very basic, but how do you identify those 500 genes
for the
> >> heatmap?
> >>
> >> .kripa
> >>
> >> From: amackey@virginia.edu
> >> Date: Mon, 7 May 2012 11:47:16 -0400
> >> Subject: Re: [BioC] LIMMA: plotMDS
> >> To: kripa777@hotmail.com
> >> CC: bioconductor@r-project.org
> >>
> >> All genes contribute to all principle components; some just more
than
> >> others. I've found it useful (or perhaps just entertaining) to
see
> >> heatmaps of the first N (say 10) component loadings, constrained
to the 500
> >> genes whose variances contribute most to those components.
> >>
> >>
> >> -Aaron
> >>
> >> On Sun, May 6, 2012 at 7:21 PM, Kripa R <kripa777@hotmail.com>
wrote:
> >>
> >>
> >>
> >>
> >> Hi I was wondering how I can figure out which genes correspond to
each of
> >> the principle components ($cmdscale.out). Gene.selection is set
to pairwise
> >> for the top 500 but I'd like to know exactly which genes are
being
> >> considered for cmdscale.out[,3], for example.
> >>
> >>
> >>
> >>
> >>
> >> Thanks for the help!
> >>
> >>
> >>
> >> .kripa
> >>
> >> [[alternative HTML version deleted]]
> >>
> >>
> >>
> >> _______________________________________________
> >>
> >> Bioconductor mailing list
> >>
> >> Bioconductor@r-project.org
> >>
> >> https://stat.ethz.ch/mailman/listinfo/bioconductor
> >>
> >> Search the archives:
> >> http://news.gmane.org/gmane.science.biology.informatics.conductor
> >>
> >>
> >>
> >> [[alternative HTML version deleted]]
> >>
> >> _______________________________________________
> >> Bioconductor mailing list
> >> Bioconductor@r-project.org
> >> https://stat.ethz.ch/mailman/listinfo/bioconductor
> >> Search the archives:
> >> http://news.gmane.org/gmane.science.biology.informatics.conductor
> >>
> >
> >
> >
> > --
> > *A model is a lie that helps you see the truth.*
> > *
> > *
> > Howard
Skipper<http: cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf="">
> >
> >
>
>
> --
> *A model is a lie that helps you see the truth.*
> *
> *
> Howard
Skipper<http: cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf="">
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
[[alternative HTML version deleted]]