Search
Question: limma: strange subsetting fit[, i] of "MArrayLM" objects design matrix, also robust=TRUE and topTable
0
4.3 years ago by
Maciek Sykulski20 wrote:
Dear Gordon, dear maintainers of limma package, While writing some code based on the R package limma, Bioconductor version: Release 2.13, I?ve noticed some strange code in topTable, and in subsetting fit[,i] of "MArrayLM" object (I?m not sure if this results in any bug). Assume a following execution: > fit <- lmFit(data,design,...) > topTable(eBayes(fit,robust=TRUE), coef=2:3,sort.by="F") specifically, in the function topTable, the line fit <- eBayes(fit[, coef]) raises my concerns (see comment below in the code). topTable <-function(...) { [...] if (length(coef) > 1) { coef <- unique(coef) if (length(fit$coef[1, coef]) < ncol(fit)) fit <- eBayes(fit[, coef]) # ^ Above reevaluates eBayes without robust=TRUE # even if we want a robust estimation [...] } Second confusion is about subsetting of fit object fit[, coef]. fit[i,] subsets fit on genes (rows). Then, what kind of subsetting does fit[,coef] perform? Is it by samples, or by coefficients? After inspection I see that fit[,coef] subsets the fit$design matrix on samples, not on coefficients (while it does subset on columns from fit$coefficients). Inspection of the code responsible for subsetting "MArrayLM" reveals: subsetListOfArrays <- function(object,i,j,IJ,IX,I,JX) { [...] for(a in IJ) object[[a]] <- object[[a]][i,j,drop=FALSE] for(a in IX) object[[a]] <- object[[a]][i, ,drop=FALSE] for(a in I ) object[[a]] <- object[[a]][i] for(a in JX) object[[a]] <- object[[a]][j, ,drop=FALSE] #Shouldnt this line above look more like this?: #for(a in JX) object[[a]] <- object[[a]][, j,drop=FALSE] [...] } Here is an example of a design matrix (rows correspond to samples, columns to fitted coefficients): > class(fit) [1] "MArrayLM" attr(,"package") [1] "limma" > fit$design cluster.1 cluster.2 cluster.3 cluster.4 1 1 1 0 0 2 1 1 0 1 3 1 1 0 1 [...] 42 1 1 0 0 43 1 1 0 0 44 1 0 0 0 [...] Now, what I?d expect from an operation which subsets coefficients from a MArrayLM model (as in topTable line fit <- eBayes(fit[, coef])) is this cluster.2 cluster.3 1 1 0 2 1 0 3 1 0 [...] moreover fit[,2:3]$coefficients is modified exactly in this way. However, the resulting fit[,2:3]$design is a subset on samples: > fit[,2:3]$design cluster.1 cluster.2 cluster.3 cluster.4 2 1 1 0 1 3 1 1 0 1 Is this really what fit <- eBayes(fit[, coef]) inside topTable intends to achieve? This may be relevant, since subsetting fit this way propagates further inside eBayes function, where it causes classifyTestsF not to run (in case of my data), because of fit[,coef]$design being singular. eBayes <- function (fit, proportion = 0.01, stdev.coef.lim = c(0.1, 4), trend = FALSE, robust = FALSE, winsor.tail.p = c(0.05, 0.1)) { [...] #!!! Below condition always evaluates to false # because fit$design was malformed before and # is.fullrank always returns false if (!is.null(fit$design) && is.fullrank(fit$design)) { F.stat <- classifyTestsF(fit, fstat.only = TRUE) fit$F <- as.vector(F.stat) [...] } I?m not pointing to any specific misbehavior/bug, because it seems that topTable(eBayes(fit,robust=TRUE), coef=2:3,sort.by="F",number=N) still somehow produces the same results as my modified: topTable.fixed(eBayes.fixed(fit,robust=TRUE),coef=2:3,sort.by="F",robu st=TRUE) and different from non-robust: topTable(eBayes(fit,robust=FALSE), coef=2:3,sort.by="F") So the final question is: Does fit[,coef] modifies the fitdesign matrix properly, in alignment with later check is.fullrank(fitdesign)? Attached is a simple example R session with these issues. Thanks, Maciek Sykulski -------------- next part -------------- A non-text attachment was scrubbed... Name: limma_debug_session.pdf Type: application/pdf Size: 75559 bytes Desc: not available URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20140512="" 2762dc29="" attachment.pdf="">
modified 4.3 years ago by Panos Bolan20 • written 4.3 years ago by Maciek Sykulski20
0
4.3 years ago by
Gordon Smyth34k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth34k wrote:
Dear Maciek, The second index of an MArrayLM object subsets by coefficients. You will easily see by typing dim(fit) or colnames(fit) and so on that the columns refer to coefficients. You are right that the current subsetting of the design matrix is incorrect. I've fixed this now in bioc-devel. All the limma subsetting code has been rewritten in past 12 months and the treatment of the design matrix was overlooked. In any case, subsetting a design matrix is not always a sensible thing to do, and the subsetted matrix is not normally used downstream. Please upgrade to the latest version of Bioconductor. Best wishes Gordon On Mon, 12 May 2014, Maciek Sykulski wrote: > Dear Gordon, dear maintainers of limma package, > > While writing some code based on the R package limma, Bioconductor version: > Release 2.13, > I?ve noticed some strange code in topTable, and in subsetting fit[,i] of > "MArrayLM" object (I?m not sure if this results in any bug). > > Assume a following execution: > >> fit <- lmFit(data,design,...) >> topTable(eBayes(fit,robust=TRUE), coef=2:3,sort.by="F") > > > specifically, in the function topTable, the line fit <- eBayes(fit[, coef]) > raises my concerns (see comment below in the code). > > topTable <-function(...) { > > [...] > > if (length(coef) > 1) { > > coef <- unique(coef) > > if (length(fit$coef[1, coef]) < ncol(fit)) > > fit <- eBayes(fit[, coef]) > > # ^ Above reevaluates eBayes without robust=TRUE > > # even if we want a robust estimation > > [...] > > } > > > Second confusion is about subsetting of fit object fit[, coef]. > > fit[i,] subsets fit on genes (rows). Then, what kind of subsetting does > fit[,coef] perform? Is it by samples, or by coefficients? > > After inspection I see that fit[,coef] subsets the fit$design matrix on > samples, not on coefficients (while it does subset on columns from > fit$coefficients). > Inspection of the code responsible for subsetting "MArrayLM" reveals: > > subsetListOfArrays <- function(object,i,j,IJ,IX,I,JX) > { > [...] > for(a in IJ) object[[a]] <- object[[a]][i,j,drop=FALSE] > for(a in IX) object[[a]] <- object[[a]][i, ,drop=FALSE] > for(a in I ) object[[a]] <- object[[a]][i] > for(a in JX) object[[a]] <- object[[a]][j, ,drop=FALSE] > #Shouldnt this line above look more like this?: > #for(a in JX) object[[a]] <- object[[a]][, j,drop=FALSE] > [...] > } > > > Here is an example of a design matrix (rows correspond to samples, columns > to fitted coefficients): > >> class(fit) > [1] "MArrayLM" > attr(,"package") > [1] "limma" >> fit$design > cluster.1 cluster.2 cluster.3 cluster.4 > 1 1 1 0 0 > 2 1 1 0 1 > 3 1 1 0 1 > [...] > 42 1 1 0 0 > 43 1 1 0 0 > 44 1 0 0 0 > [...] > > > Now, what I?d expect from an operation which subsets coefficients from a > MArrayLM model > (as in topTable line fit <- eBayes(fit[, coef])) is this > > cluster.2 cluster.3 > 1 1 0 > 2 1 0 > 3 1 0 > [...] > > > moreover fit[,2:3]$coefficients is modified exactly in this way. > > However, the resulting fit[,2:3]$design is a subset on samples: > >> fit[,2:3]$design > cluster.1 cluster.2 cluster.3 cluster.4 > 2 1 1 0 1 > 3 1 1 0 1 > > > Is this really what fit <- eBayes(fit[, coef]) inside topTable intends to > achieve? This may be relevant, since subsetting fit this way propagates > further inside eBayes function, where it causes classifyTestsF not to run > (in case of my data), because of fit[,coef]$design being singular. > > eBayes <- function (fit, proportion = 0.01, stdev.coef.lim = c(0.1, 4), > trend = FALSE, robust = FALSE, winsor.tail.p = c(0.05, 0.1)) > { > [...] > #!!! Below condition always evaluates to false > # because fit$design was malformed before and > # is.fullrank always returns false > if (!is.null(fit$design) && is.fullrank(fit$design)) { > F.stat <- classifyTestsF(fit, fstat.only = TRUE) > fit$F <- as.vector(F.stat) > [...] > } > > > I?m not pointing to any specific misbehavior/bug, because it seems that > > topTable(eBayes(fit,robust=TRUE), coef=2:3,sort.by="F",number=N) > > still somehow produces the same results as my modified: > > topTable.fixed(eBayes.fixed(fit,robust=TRUE),coef=2:3,sort.by="F",ro bust=TRUE) > > > and different from non-robust: > > topTable(eBayes(fit,robust=FALSE), coef=2:3,sort.by="F") > > > So the final question is: > Does fit[,coef] modifies the fitdesign matrix properly, in alignment with > later check is.fullrank(fitdesign)? > > > Attached is a simple example R session with these issues. > > Thanks, > Maciek Sykulski > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:5}}
Let me clarify that further. fit[,j] should not subset the design matrix at all. The effect of fit[,j] is actually the same as: contrasts.fit(fit, contrast=u) where u[j]=1 and u is otherwise zero. The design matrix is kept intact, but the subsetted object keeps track of which contrasts are represented in the coefficients matrix by means of the contrasts matrix. For example, the following code will reproduce in limma 3.19.10 and earlier or limma 3.21.2 and later: > example(lmFit) > fit[,2]$design Grp1 Grp2vs1 [1,] 1 0 [2,] 1 0 [3,] 1 0 [4,] 1 1 [5,] 1 1 [6,] 1 1 > fit[,2]$contrasts Contrast Coefficient Grp2vs1 Grp1 0 Grp2vs1 1 Best wishes Gordon On Wed, 14 May 2014, Gordon K Smyth wrote: > Dear Maciek, > > The second index of an MArrayLM object subsets by coefficients. You > will easily see by typing dim(fit) or colnames(fit) and so on that the > columns refer to coefficients. > > You are right that the current subsetting of the design matrix is > incorrect. I've fixed this now in bioc-devel. > > All the limma subsetting code has been rewritten in past 12 months and > the treatment of the design matrix was overlooked. In any case, > subsetting a design matrix is not always a sensible thing to do, and the > subsetted matrix is not normally used downstream. > > Please upgrade to the latest version of Bioconductor. > > Best wishes > Gordon > > > > On Mon, 12 May 2014, Maciek Sykulski wrote: > >> Dear Gordon, dear maintainers of limma package, >> >> While writing some code based on the R package limma, Bioconductor version: >> Release 2.13, >> I?ve noticed some strange code in topTable, and in subsetting fit[,i] of >> "MArrayLM" object (I?m not sure if this results in any bug). >> >> Assume a following execution: >> >>> fit <- lmFit(data,design,...) >>> topTable(eBayes(fit,robust=TRUE), coef=2:3,sort.by="F") >> >> >> specifically, in the function topTable, the line fit <- eBayes(fit[, coef]) >> raises my concerns (see comment below in the code). >> >> topTable <-function(...) { >> >> [...] >> >> if (length(coef) > 1) { >> >> coef <- unique(coef) >> >> if (length(fit$coef[1, coef]) < ncol(fit)) >> >> fit <- eBayes(fit[, coef]) >> >> # ^ Above reevaluates eBayes without robust=TRUE >> >> # even if we want a robust estimation >> >> [...] >> >> } >> >> >> Second confusion is about subsetting of fit object fit[, coef]. >> >> fit[i,] subsets fit on genes (rows). Then, what kind of subsetting does >> fit[,coef] perform? Is it by samples, or by coefficients? >> >> After inspection I see that fit[,coef] subsets the fit$design matrix on >> samples, not on coefficients (while it does subset on columns from >> fit$coefficients). >> Inspection of the code responsible for subsetting "MArrayLM" reveals: >> >> subsetListOfArrays <- function(object,i,j,IJ,IX,I,JX) >> { >> [...] >> for(a in IJ) object[[a]] <- object[[a]][i,j,drop=FALSE] >> for(a in IX) object[[a]] <- object[[a]][i, ,drop=FALSE] >> for(a in I ) object[[a]] <- object[[a]][i] >> for(a in JX) object[[a]] <- object[[a]][j, ,drop=FALSE] >> #Shouldnt this line above look more like this?: >> #for(a in JX) object[[a]] <- object[[a]][, j,drop=FALSE] >> [...] >> } >> >> >> Here is an example of a design matrix (rows correspond to samples, columns >> to fitted coefficients): >> >>> class(fit) >> [1] "MArrayLM" >> attr(,"package") >> [1] "limma" >>> fit$design >> cluster.1 cluster.2 cluster.3 cluster.4 >> 1 1 1 0 0 >> 2 1 1 0 1 >> 3 1 1 0 1 >> [...] >> 42 1 1 0 0 >> 43 1 1 0 0 >> 44 1 0 0 0 >> [...] >> >> >> Now, what I?d expect from an operation which subsets coefficients from a >> MArrayLM model >> (as in topTable line fit <- eBayes(fit[, coef])) is this >> >> cluster.2 cluster.3 >> 1 1 0 >> 2 1 0 >> 3 1 0 >> [...] >> >> >> moreover fit[,2:3]$coefficients is modified exactly in this way. >> >> However, the resulting fit[,2:3]$design is a subset on samples: >> >>> fit[,2:3]$design >> cluster.1 cluster.2 cluster.3 cluster.4 >> 2 1 1 0 1 >> 3 1 1 0 1 >> >> >> Is this really what fit <- eBayes(fit[, coef]) inside topTable intends to >> achieve? This may be relevant, since subsetting fit this way propagates >> further inside eBayes function, where it causes classifyTestsF not to run >> (in case of my data), because of fit[,coef]$design being singular. >> >> eBayes <- function (fit, proportion = 0.01, stdev.coef.lim = c(0.1, 4), >> trend = FALSE, robust = FALSE, winsor.tail.p = c(0.05, 0.1)) >> { >> [...] >> #!!! Below condition always evaluates to false >> # because fit$design was malformed before and >> # is.fullrank always returns false >> if (!is.null(fit$design) && is.fullrank(fit$design)) { >> F.stat <- classifyTestsF(fit, fstat.only = TRUE) >> fit$F <- as.vector(F.stat) >> [...] >> } >> >> >> I?m not pointing to any specific misbehavior/bug, because it seems that >> >> topTable(eBayes(fit,robust=TRUE), coef=2:3,sort.by="F",number=N) >> >> still somehow produces the same results as my modified: >> >> topTable.fixed(eBayes.fixed(fit,robust=TRUE),coef=2:3,sort.by="F",r obust=TRUE) >> >> >> and different from non-robust: >> >> topTable(eBayes(fit,robust=FALSE), coef=2:3,sort.by="F") >> >> >> So the final question is: >> Does fit[,coef] modifies the fitdesign matrix properly, in alignment with >> later check is.fullrank(fitdesign)? >> >> >> Attached is a simple example R session with these issues. >> >> Thanks, >> Maciek Sykulski > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:5}}
Thank you Gordon, that makes sense. Also, I assume that > topTable(eBayes(fit,robust=TRUE), coef=2:3,sort.by="F",number=N) creates a topTable ordered by a F-statistics for coefficients 2,3 only. I just couldn't pinpoint the source code where F-statistics for all coefficients (computed by eBayes(fit,robust=TRUE) is "trimmed" to F-statistics for only selected ones. Best regards, Maciek Sykulski On Wed, May 14, 2014 at 9:36 AM, Gordon K Smyth <smyth@wehi.edu.au> wrote: > Let me clarify that further. > > fit[,j] should not subset the design matrix at all. The effect of fit[,j] > is actually the same as: > > contrasts.fit(fit, contrast=u) > > where u[j]=1 and u is otherwise zero. The design matrix is kept intact, > but the subsetted object keeps track of which contrasts are represented in > the coefficients matrix by means of the contrasts matrix. > > For example, the following code will reproduce in limma 3.19.10 and > earlier or limma 3.21.2 and later: > > > example(lmFit) > > fit[,2]$design > Grp1 Grp2vs1 > [1,] 1 0 > [2,] 1 0 > [3,] 1 0 > [4,] 1 1 > [5,] 1 1 > [6,] 1 1 > > fit[,2]$contrasts > Contrast > Coefficient Grp2vs1 > Grp1 0 > Grp2vs1 1 > > Best wishes > Gordon > > On Wed, 14 May 2014, Gordon K Smyth wrote: > > Dear Maciek, >> >> The second index of an MArrayLM object subsets by coefficients. You will >> easily see by typing dim(fit) or colnames(fit) and so on that the columns >> refer to coefficients. >> >> You are right that the current subsetting of the design matrix is >> incorrect. I've fixed this now in bioc-devel. >> >> All the limma subsetting code has been rewritten in past 12 months and >> the treatment of the design matrix was overlooked. In any case, subsetting >> a design matrix is not always a sensible thing to do, and the subsetted >> matrix is not normally used downstream. >> >> Please upgrade to the latest version of Bioconductor. >> >> Best wishes >> Gordon >> >> >> >> On Mon, 12 May 2014, Maciek Sykulski wrote: >> >> Dear Gordon, dear maintainers of limma package, >>> >>> While writing some code based on the R package limma, Bioconductor >>> version: >>> Release 2.13, >>> Iâve noticed some strange code in topTable, and in subsetting fit[,i] of >>> "MArrayLM" object (Iâm not sure if this results in any bug). >>> >>> Assume a following execution: >>> >>> fit <- lmFit(data,design,...) >>>> topTable(eBayes(fit,robust=TRUE), coef=2:3,sort.by="F") >>>> >>> >>> >>> specifically, in the function topTable, the line fit <- eBayes(fit[, >>> coef]) >>> raises my concerns (see comment below in the code). >>> >>> topTable <-function(...) { >>> >>> [...] >>> >>> if (length(coef) > 1) { >>> >>> coef <- unique(coef) >>> >>> if (length(fit$coef[1, coef]) < ncol(fit)) >>> >>> fit <- eBayes(fit[, coef]) >>> >>> # ^ Above reevaluates eBayes without robust=TRUE >>> >>> # even if we want a robust estimation >>> >>> [...] >>> >>> } >>> >>> >>> Second confusion is about subsetting of fit object fit[, coef]. >>> >>> fit[i,] subsets fit on genes (rows). Then, what kind of subsetting does >>> fit[,coef] perform? Is it by samples, or by coefficients? >>> >>> After inspection I see that fit[,coef] subsets the fit$design matrix on >>> samples, not on coefficients (while it does subset on columns from >>> fit$coefficients). >>> Inspection of the code responsible for subsetting "MArrayLM" reveals: >>> >>> subsetListOfArrays <- function(object,i,j,IJ,IX,I,JX) >>> { >>> [...] >>> for(a in IJ) object[[a]] <- object[[a]][i,j,drop=FALSE] >>> for(a in IX) object[[a]] <- object[[a]][i, ,drop=FALSE] >>> for(a in I ) object[[a]] <- object[[a]][i] >>> for(a in JX) object[[a]] <- object[[a]][j, ,drop=FALSE] >>> #Shouldnt this line above look more like this?: >>> #for(a in JX) object[[a]] <- object[[a]][, j,drop=FALSE] >>> [...] >>> } >>> >>> >>> Here is an example of a design matrix (rows correspond to samples, >>> columns >>> to fitted coefficients): >>> >>> class(fit) >>>> >>> [1] "MArrayLM" >>> attr(,"package") >>> [1] "limma" >>> >>>> fit$design >>>> >>> cluster.1 cluster.2 cluster.3 cluster.4 >>> 1 1 1 0 0 >>> 2 1 1 0 1 >>> 3 1 1 0 1 >>> [...] >>> 42 1 1 0 0 >>> 43 1 1 0 0 >>> 44 1 0 0 0 >>> [...] >>> >>> >>> Now, what Iâd expect from an operation which subsets coefficients from a >>> MArrayLM model >>> (as in topTable line fit <- eBayes(fit[, coef])) is this >>> >>> cluster.2 cluster.3 >>> 1 1 0 >>> 2 1 0 >>> 3 1 0 >>> [...] >>> >>> >>> moreover fit[,2:3]$coefficients is modified exactly in this way. >>> >>> However, the resulting fit[,2:3]$design is a subset on samples: >>> >>> fit[,2:3]$design >>>> >>> cluster.1 cluster.2 cluster.3 cluster.4 >>> 2 1 1 0 1 >>> 3 1 1 0 1 >>> >>> >>> Is this really what fit <- eBayes(fit[, coef]) inside topTable intends to >>> achieve? This may be relevant, since subsetting fit this way propagates >>> further inside eBayes function, where it causes classifyTestsF not to run >>> (in case of my data), because of fit[,coef]$design being singular. >>> >>> eBayes <- function (fit, proportion = 0.01, stdev.coef.lim = c(0.1, 4), >>> trend = FALSE, robust = FALSE, winsor.tail.p = c(0.05, 0.1)) >>> { >>> [...] >>> #!!! Below condition always evaluates to false >>> # because fit$design was malformed before and >>> # is.fullrank always returns false >>> if (!is.null(fit$design) && is.fullrank(fit$design)) { >>> F.stat <- classifyTestsF(fit, fstat.only = TRUE) >>> fit$F <- as.vector(F.stat) >>> [...] >>> } >>> >>> >>> Iâm not pointing to any specific misbehavior/bug, because it seems that >>> >>> topTable(eBayes(fit,robust=TRUE), coef=2:3,sort.by="F",number=N) >>> >>> still somehow produces the same results as my modified: >>> >>> topTable.fixed(eBayes.fixed(fit,robust=TRUE),coef=2:3,sort.by >>> ="F",robust=TRUE) >>> >>> >>> and different from non-robust: >>> >>> topTable(eBayes(fit,robust=FALSE), coef=2:3,sort.by="F") >>> >>> >>> So the final question is: >>> Does fit[,coef] modifies the fitdesign matrix properly, in alignment >>> with >>> later check is.fullrank(fitdesign)? >>> >>> >>> Attached is a simple example R session with these issues. >>> >>> Thanks, >>> Maciek Sykulski >>> >> >> > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:9}}
On Wed, 14 May 2014, Maciek Sykulski wrote: > Thank you Gordon, that makes sense. > > Also, I assume that > > topTable(eBayes(fit,robust=TRUE), coef=2:3,sort.by="F",number=N) > creates a topTable ordered by a F-statistics for coefficients 2,3 only. Yes, as documented. sort.by="F" is the default in this case, so could be omitted. > I just couldn't pinpoint the source code where F-statistics for all > coefficients (computed by eBayes(fit,robust=TRUE) is "trimmed" to > F-statistics for only selected ones. This is done by topTableF(). Not sure why you need to examine the source code in this way. Why not just read the documentation, which says: "topTableF ranks genes on the basis of moderated F-statistics for one or more coefficients. If topTable is called and coef has two or more elements, then the specified columns will be extracted from fit and topTableF called on the result." Best wishes Gordon > Best regards, > Maciek Sykulski > > > On Wed, May 14, 2014 at 9:36 AM, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: > >> Let me clarify that further. >> >> fit[,j] should not subset the design matrix at all. The effect of fit[,j] >> is actually the same as: >> >> contrasts.fit(fit, contrast=u) >> >> where u[j]=1 and u is otherwise zero. The design matrix is kept intact, >> but the subsetted object keeps track of which contrasts are represented in >> the coefficients matrix by means of the contrasts matrix. >> >> For example, the following code will reproduce in limma 3.19.10 and >> earlier or limma 3.21.2 and later: >> >> > example(lmFit) >> > fit[,2]$design >> Grp1 Grp2vs1 >> [1,] 1 0 >> [2,] 1 0 >> [3,] 1 0 >> [4,] 1 1 >> [5,] 1 1 >> [6,] 1 1 >> > fit[,2]$contrasts >> Contrast >> Coefficient Grp2vs1 >> Grp1 0 >> Grp2vs1 1 >> >> Best wishes >> Gordon >> >> On Wed, 14 May 2014, Gordon K Smyth wrote: >> >> Dear Maciek, >>> >>> The second index of an MArrayLM object subsets by coefficients. You will >>> easily see by typing dim(fit) or colnames(fit) and so on that the columns >>> refer to coefficients. >>> >>> You are right that the current subsetting of the design matrix is >>> incorrect. I've fixed this now in bioc-devel. >>> >>> All the limma subsetting code has been rewritten in past 12 months and >>> the treatment of the design matrix was overlooked. In any case, subsetting >>> a design matrix is not always a sensible thing to do, and the subsetted >>> matrix is not normally used downstream. >>> >>> Please upgrade to the latest version of Bioconductor. >>> >>> Best wishes >>> Gordon >>> >>> >>> >>> On Mon, 12 May 2014, Maciek Sykulski wrote: >>> >>> Dear Gordon, dear maintainers of limma package, >>>> >>>> While writing some code based on the R package limma, Bioconductor >>>> version: >>>> Release 2.13, >>>> I?ve noticed some strange code in topTable, and in subsetting fit[,i] of >>>> "MArrayLM" object (I?m not sure if this results in any bug). >>>> >>>> Assume a following execution: >>>> >>>> fit <- lmFit(data,design,...) >>>>> topTable(eBayes(fit,robust=TRUE), coef=2:3,sort.by="F") >>>>> >>>> >>>> >>>> specifically, in the function topTable, the line fit <- eBayes(fit[, >>>> coef]) >>>> raises my concerns (see comment below in the code). >>>> >>>> topTable <-function(...) { >>>> >>>> [...] >>>> >>>> if (length(coef) > 1) { >>>> >>>> coef <- unique(coef) >>>> >>>> if (length(fit$coef[1, coef]) < ncol(fit)) >>>> >>>> fit <- eBayes(fit[, coef]) >>>> >>>> # ^ Above reevaluates eBayes without robust=TRUE >>>> >>>> # even if we want a robust estimation >>>> >>>> [...] >>>> >>>> } >>>> >>>> >>>> Second confusion is about subsetting of fit object fit[, coef]. >>>> >>>> fit[i,] subsets fit on genes (rows). Then, what kind of subsetting does >>>> fit[,coef] perform? Is it by samples, or by coefficients? >>>> >>>> After inspection I see that fit[,coef] subsets the fit$design matrix on >>>> samples, not on coefficients (while it does subset on columns from >>>> fit$coefficients). >>>> Inspection of the code responsible for subsetting "MArrayLM" reveals: >>>> >>>> subsetListOfArrays <- function(object,i,j,IJ,IX,I,JX) >>>> { >>>> [...] >>>> for(a in IJ) object[[a]] <- object[[a]][i,j,drop=FALSE] >>>> for(a in IX) object[[a]] <- object[[a]][i, ,drop=FALSE] >>>> for(a in I ) object[[a]] <- object[[a]][i] >>>> for(a in JX) object[[a]] <- object[[a]][j, ,drop=FALSE] >>>> #Shouldnt this line above look more like this?: >>>> #for(a in JX) object[[a]] <- object[[a]][, j,drop=FALSE] >>>> [...] >>>> } >>>> >>>> >>>> Here is an example of a design matrix (rows correspond to samples, >>>> columns >>>> to fitted coefficients): >>>> >>>> class(fit) >>>>> >>>> [1] "MArrayLM" >>>> attr(,"package") >>>> [1] "limma" >>>> >>>>> fit$design >>>>> >>>> cluster.1 cluster.2 cluster.3 cluster.4 >>>> 1 1 1 0 0 >>>> 2 1 1 0 1 >>>> 3 1 1 0 1 >>>> [...] >>>> 42 1 1 0 0 >>>> 43 1 1 0 0 >>>> 44 1 0 0 0 >>>> [...] >>>> >>>> >>>> Now, what I?d expect from an operation which subsets coefficients from a >>>> MArrayLM model >>>> (as in topTable line fit <- eBayes(fit[, coef])) is this >>>> >>>> cluster.2 cluster.3 >>>> 1 1 0 >>>> 2 1 0 >>>> 3 1 0 >>>> [...] >>>> >>>> >>>> moreover fit[,2:3]$coefficients is modified exactly in this way. >>>> >>>> However, the resulting fit[,2:3]$design is a subset on samples: >>>> >>>> fit[,2:3]$design >>>>> >>>> cluster.1 cluster.2 cluster.3 cluster.4 >>>> 2 1 1 0 1 >>>> 3 1 1 0 1 >>>> >>>> >>>> Is this really what fit <- eBayes(fit[, coef]) inside topTable intends to >>>> achieve? This may be relevant, since subsetting fit this way propagates >>>> further inside eBayes function, where it causes classifyTestsF not to run >>>> (in case of my data), because of fit[,coef]$design being singular. >>>> >>>> eBayes <- function (fit, proportion = 0.01, stdev.coef.lim = c(0.1, 4), >>>> trend = FALSE, robust = FALSE, winsor.tail.p = c(0.05, 0.1)) >>>> { >>>> [...] >>>> #!!! Below condition always evaluates to false >>>> # because fit$design was malformed before and >>>> # is.fullrank always returns false >>>> if (!is.null(fit$design) && is.fullrank(fit$design)) { >>>> F.stat <- classifyTestsF(fit, fstat.only = TRUE) >>>> fit$F <- as.vector(F.stat) >>>> [...] >>>> } >>>> >>>> >>>> I?m not pointing to any specific misbehavior/bug, because it seems that >>>> >>>> topTable(eBayes(fit,robust=TRUE), coef=2:3,sort.by="F",number=N) >>>> >>>> still somehow produces the same results as my modified: >>>> >>>> topTable.fixed(eBayes.fixed(fit,robust=TRUE),coef=2:3,sort.by >>>> ="F",robust=TRUE) >>>> >>>> >>>> and different from non-robust: >>>> >>>> topTable(eBayes(fit,robust=FALSE), coef=2:3,sort.by="F") >>>> >>>> >>>> So the final question is: >>>> Does fit[,coef] modifies the fitdesign matrix properly, in alignment >>>> with >>>> later check is.fullrank(fitdesign)? >>>> >>>> >>>> Attached is a simple example R session with these issues. >>>> >>>> Thanks, >>>> Maciek Sykulski >>>> >>> >>> >> ______________________________________________________________________ >> The information in this email is confidential and intended solely for the >> addressee. >> You must not disclose, forward, print or use it without the permission of >> the sender. >> ______________________________________________________________________ > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:5}}
0
4.3 years ago by
Panos Bolan20
Panos Bolan20 wrote:
Dear list, I recently read a Bioconductor post where the developer of the WGCNA suggested the use of the package for RNA-seq data analysis after implementing a variance stabilization normalization to the raw counts. I have read the tutorials and run the example dataset at http://labs.g enetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/Tutorials /index.html. I would like to apply WGCNA to my RNA-seq data consisting of 1000 transcripts whose expression is measured for 50 triplicated cell types (approximately 150 samples) and derive gene networks. I would like to ask if WGCNA can be used successfully in this kind of heterogeneous dataset where, for most of the transcripts, the various cell types expression patterns might differ substantially (so that a variance stabilizing transformation will not give me approximately normal distribution for each transcript; it would rather be a mixture of normal distributions). Thank you, Pan [[alternative HTML version deleted]]