Question

Problem with topTable function (after Bioconductor update)

0

Entering edit mode

Gordon Smyth 50k

@gordon-smyth

Last seen 47 minutes ago

WEHI, Melbourne, Australia

Dear James and Christian, There was a bug in topTableF(). Now fixed. Thanks for pointing it out. Best wishes Gordon > Date: Wed, 30 Oct 2013 14:08:29 -0400 > From: "James W. MacDonald" <jmacdon at="" uw.edu=""> > To: Christian De Santis <christian.desantis at="" stir.ac.uk=""> > Cc: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> > Subject: Re: [BioC] Problem with topTable function (after Bioconductor > update) > > Hi Christian, > > On Wednesday, October 30, 2013 12:23:46 PM, Christian De Santis wrote: >> Hi to everybody, >> >> I hope someone can help me on this as I am losing my health. > > Not sure if you were intending humor, but this cracked me up! > >> >> I was re running my script today which worked fine till last time I did >> it. When I run the topTable function on more than one coefficient it >> gives me the following error. >> >>> x.contrast.REG <- topTable(fit.contrast, adjust.method="none", coef=c(1:4), number=20000, p.value = 0.05) >> Error in `row.names<-.data.frame`(`*tmp*`, value = value) : >> duplicate 'row.names' are not allowed >> In addition: Warning message: >> non-unique values when setting 'row.names': >> >> I have tried to run the same script for each coefficient separately and >> it works fine. I can't explain the error since if there were duplicated >> row.names then I should get the same error when doing the analysis on >> individual contrasts since I run topTable on the fit object. >> >> The only thing I have changed was the fact that this morning I updated >> Bioconductor to the latest release. >> >> I really hope someone can help me understand. > > There appears to be a bug in the function topTableF. > > When you run topTable() on multiple coefficients, it actually uses > topTableF(). In that function there is a bit of code where the data are > subsetted according to the p.value and lfc values you submitted: > > if (lfc > 0 || p.value < 1) { > if (lfc > 0) > big <- rowSums(abs(M) > lfc, na.rm = TRUE) > 0 > else big <- TRUE > if (p.value < 1) { > sig <- adj.P.Value <= p.value > sig[is.na(sig)] <- FALSE > } > else sig <- TRUE > keep <- big & sig > if (!all(keep)) { > M <- M[keep, , drop = FALSE] > rn <- rn[keep] > Amean <- Amean[keep] > Fstat <- Fstat[keep] > Fp <- p.value[keep] > genelist <- genelist[keep, , drop = FALSE] > adj.P.Value <- adj.P.Value[keep] > } > > So now all the data have been subsetted (in your case) to just those > genes with an unadjusted p-value < 0.05. And presumably that is fewer > than the 20000 genes that you started with. But then there is some > sorting of the resulting data: > > if sort.by == "F") > o <- order(Fp, decreasing = FALSE)[1:number] > else o <- 1:number > if (is.null(genelist)) > tab <- data.frame(M[o, , drop = FALSE]) > else tab <- data.frame(genelist[o, , drop = FALSE], M[o, > , drop = FALSE]) > tab$AveExpr = fit$Amean[o] > tab <- data.frame(tab, F = Fstat[o], P.Value = Fp[o], adj.P.Val = > adj.P.Value[o]) > rownames(tab) <- rn[o] > > The problem here is that 1:number is _longer_ than Fp, which has been > subsetted already, based on the p-value criterion. So the vector 'o' > will have a bunch of NAs on the end. As an example: > > (1:5)[1:8] > [1] 1 2 3 4 5 NA NA NA > > And when you get to the last line there, where the rownames are set, > since there are multiple NAs, you will have repeated rownames, which R > won't allow. > > In the interest of your health, you can get around this for now by doing > > num <- sum(topTable(fit.contrast, adjust.method="none", coef=c(1:4), > number=20000)$P.value < 0.05) > > and then > > x.contrast.REG <- topTable(fit.contrast, adjust.method="none", > coef=c(1:4), number=num, p.value = 0.05) > > Best, > > Jim > > >> >> Regards, >> Christian De Santis >> >>> sessionInfo() >> R version 3.0.1 (2013-05-16) >> Platform: i386-w64-mingw32/i386 (32-bit) >> >> locale: >> [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252 >> [4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252 >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] reshape2_1.2.2 ellipse_0.3-8 genefilter_1.44.0 limma_3.18.1 BiocInstaller_1.12.0 >> >> loaded via a namespace (and not attached): >> [1] annotate_1.40.0 AnnotationDbi_1.24.0 Biobase_2.22.0 BiocGenerics_0.8.0 DBI_0.2-7 >> [6] IRanges_1.20.3 parallel_3.0.1 plyr_1.8 RSQLite_0.11.4 splines_3.0.1 >> [11] stats4_3.0.1 stringr_0.6.2 survival_2.37-4 tools_3.0.1 XML_3.98-1.1 >> [16] xtable_1.7-1 >> >> >> >> > > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}

• 1.0k views

ADD COMMENT • link 10.5 years ago Gordon Smyth 50k