Problem with topTable function (after Bioconductor update)
0
0
Entering edit mode
@gordon-smyth
Last seen 47 minutes ago
WEHI, Melbourne, Australia
Dear James and Christian, There was a bug in topTableF(). Now fixed. Thanks for pointing it out. Best wishes Gordon > Date: Wed, 30 Oct 2013 14:08:29 -0400 > From: "James W. MacDonald" <jmacdon at="" uw.edu=""> > To: Christian De Santis <christian.desantis at="" stir.ac.uk=""> > Cc: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> > Subject: Re: [BioC] Problem with topTable function (after Bioconductor > update) > > Hi Christian, > > On Wednesday, October 30, 2013 12:23:46 PM, Christian De Santis wrote: >> Hi to everybody, >> >> I hope someone can help me on this as I am losing my health. > > Not sure if you were intending humor, but this cracked me up! > >> >> I was re running my script today which worked fine till last time I did >> it. When I run the topTable function on more than one coefficient it >> gives me the following error. >> >>> x.contrast.REG <- topTable(fit.contrast, adjust.method="none", coef=c(1:4), number=20000, p.value = 0.05) >> Error in `row.names<-.data.frame`(`*tmp*`, value = value) : >> duplicate 'row.names' are not allowed >> In addition: Warning message: >> non-unique values when setting 'row.names': >> >> I have tried to run the same script for each coefficient separately and >> it works fine. I can't explain the error since if there were duplicated >> row.names then I should get the same error when doing the analysis on >> individual contrasts since I run topTable on the fit object. >> >> The only thing I have changed was the fact that this morning I updated >> Bioconductor to the latest release. >> >> I really hope someone can help me understand. > > There appears to be a bug in the function topTableF. > > When you run topTable() on multiple coefficients, it actually uses > topTableF(). In that function there is a bit of code where the data are > subsetted according to the p.value and lfc values you submitted: > > if (lfc > 0 || p.value < 1) { > if (lfc > 0) > big <- rowSums(abs(M) > lfc, na.rm = TRUE) > 0 > else big <- TRUE > if (p.value < 1) { > sig <- adj.P.Value <= p.value > sig[is.na(sig)] <- FALSE > } > else sig <- TRUE > keep <- big & sig > if (!all(keep)) { > M <- M[keep, , drop = FALSE] > rn <- rn[keep] > Amean <- Amean[keep] > Fstat <- Fstat[keep] > Fp <- p.value[keep] > genelist <- genelist[keep, , drop = FALSE] > adj.P.Value <- adj.P.Value[keep] > } > > So now all the data have been subsetted (in your case) to just those > genes with an unadjusted p-value < 0.05. And presumably that is fewer > than the 20000 genes that you started with. But then there is some > sorting of the resulting data: > > if sort.by == "F") > o <- order(Fp, decreasing = FALSE)[1:number] > else o <- 1:number > if (is.null(genelist)) > tab <- data.frame(M[o, , drop = FALSE]) > else tab <- data.frame(genelist[o, , drop = FALSE], M[o, > , drop = FALSE]) > tab$AveExpr = fit$Amean[o] > tab <- data.frame(tab, F = Fstat[o], P.Value = Fp[o], adj.P.Val = > adj.P.Value[o]) > rownames(tab) <- rn[o] > > The problem here is that 1:number is _longer_ than Fp, which has been > subsetted already, based on the p-value criterion. So the vector 'o' > will have a bunch of NAs on the end. As an example: > > (1:5)[1:8] > [1] 1 2 3 4 5 NA NA NA > > And when you get to the last line there, where the rownames are set, > since there are multiple NAs, you will have repeated rownames, which R > won't allow. > > In the interest of your health, you can get around this for now by doing > > num <- sum(topTable(fit.contrast, adjust.method="none", coef=c(1:4), > number=20000)$P.value < 0.05) > > and then > > x.contrast.REG <- topTable(fit.contrast, adjust.method="none", > coef=c(1:4), number=num, p.value = 0.05) > > Best, > > Jim > > >> >> Regards, >> Christian De Santis >> >>> sessionInfo() >> R version 3.0.1 (2013-05-16) >> Platform: i386-w64-mingw32/i386 (32-bit) >> >> locale: >> [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252 >> [4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252 >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] reshape2_1.2.2 ellipse_0.3-8 genefilter_1.44.0 limma_3.18.1 BiocInstaller_1.12.0 >> >> loaded via a namespace (and not attached): >> [1] annotate_1.40.0 AnnotationDbi_1.24.0 Biobase_2.22.0 BiocGenerics_0.8.0 DBI_0.2-7 >> [6] IRanges_1.20.3 parallel_3.0.1 plyr_1.8 RSQLite_0.11.4 splines_3.0.1 >> [11] stats4_3.0.1 stringr_0.6.2 survival_2.37-4 tools_3.0.1 XML_3.98-1.1 >> [16] xtable_1.7-1 >> >> >> >> > > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
• 1.0k views
ADD COMMENT

Login before adding your answer.

Traffic: 745 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6