Question

Problem with topTable function (after Bioconductor update)

0

Entering edit mode

Christian De Santis ▴ 150

@christian-de-santis-6143

Last seen 9.7 years ago

Hi to everybody, I hope someone can help me on this as I am losing my health. I was re running my script today which worked fine till last time I did it. When I run the topTable function on more than one coefficient it gives me the following error. > x.contrast.REG <- topTable(fit.contrast, adjust.method="none", coef=c(1:4), number=20000, p.value = 0.05) Error in `row.names<-.data.frame`(`*tmp*`, value = value) : duplicate 'row.names' are not allowed In addition: Warning message: non-unique values when setting 'row.names': I have tried to run the same script for each coefficient separately and it works fine. I can't explain the error since if there were duplicated row.names then I should get the same error when doing the analysis on individual contrasts since I run topTable on the fit object. The only thing I have changed was the fact that this morning I updated Bioconductor to the latest release. I really hope someone can help me understand. Regards, Christian De Santis > sessionInfo() R version 3.0.1 (2013-05-16) Platform: i386-w64-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252 [4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] reshape2_1.2.2 ellipse_0.3-8 genefilter_1.44.0 limma_3.18.1 BiocInstaller_1.12.0 loaded via a namespace (and not attached): [1] annotate_1.40.0 AnnotationDbi_1.24.0 Biobase_2.22.0 BiocGenerics_0.8.0 DBI_0.2-7 [6] IRanges_1.20.3 parallel_3.0.1 plyr_1.8 RSQLite_0.11.4 splines_3.0.1 [11] stats4_3.0.1 stringr_0.6.2 survival_2.37-4 tools_3.0.1 XML_3.98-1.1 [16] xtable_1.7-1 -- The University of Stirling has been ranked in the top 12 of UK universities for graduate employment*. 94% of our 2012 graduates were in work and/or further study within six months of graduation. *The Telegraph The University of Stirling is a charity registered in Scotland, number SC 011159. [[alternative HTML version deleted]]

• 891 views

ADD COMMENT • link updated 10.6 years ago by James W. MacDonald 65k • written 10.6 years ago by Christian De Santis ▴ 150

score 0 · Answer 1 · 2013-10-30

Hi Christian, On Wednesday, October 30, 2013 12:23:46 PM, Christian De Santis wrote: > Hi to everybody, > > I hope someone can help me on this as I am losing my health. Not sure if you were intending humor, but this cracked me up! > > I was re running my script today which worked fine till last time I did it. When I run the topTable function on more than one coefficient it gives me the following error. > >> x.contrast.REG <- topTable(fit.contrast, adjust.method="none", coef=c(1:4), number=20000, p.value = 0.05) > Error in `row.names<-.data.frame`(`*tmp*`, value = value) : > duplicate 'row.names' are not allowed > In addition: Warning message: > non-unique values when setting 'row.names': > > I have tried to run the same script for each coefficient separately and it works fine. I can't explain the error since if there were duplicated row.names then I should get the same error when doing the analysis on individual contrasts since I run topTable on the fit object. > > The only thing I have changed was the fact that this morning I updated Bioconductor to the latest release. > > I really hope someone can help me understand. There appears to be a bug in the function topTableF. When you run topTable() on multiple coefficients, it actually uses topTableF(). In that function there is a bit of code where the data are subsetted according to the p.value and lfc values you submitted: if (lfc > 0 || p.value < 1) { if (lfc > 0) big <- rowSums(abs(M) > lfc, na.rm = TRUE) > 0 else big <- TRUE if (p.value < 1) { sig <- adj.P.Value <= p.value sig[is.na(sig)] <- FALSE } else sig <- TRUE keep <- big & sig if (!all(keep)) { M <- M[keep, , drop = FALSE] rn <- rn[keep] Amean <- Amean[keep] Fstat <- Fstat[keep] Fp <- p.value[keep] genelist <- genelist[keep, , drop = FALSE] adj.P.Value <- adj.P.Value[keep] } So now all the data have been subsetted (in your case) to just those genes with an unadjusted p-value < 0.05. And presumably that is fewer than the 20000 genes that you started with. But then there is some sorting of the resulting data: if sort.by == "F") o <- order(Fp, decreasing = FALSE)[1:number] else o <- 1:number if (is.null(genelist)) tab <- data.frame(M[o, , drop = FALSE]) else tab <- data.frame(genelist[o, , drop = FALSE], M[o, , drop = FALSE]) tab$AveExpr = fit$Amean[o] tab <- data.frame(tab, F = Fstat[o], P.Value = Fp[o], adj.P.Val = adj.P.Value[o]) rownames(tab) <- rn[o] The problem here is that 1:number is _longer_ than Fp, which has been subsetted already, based on the p-value criterion. So the vector 'o' will have a bunch of NAs on the end. As an example: (1:5)[1:8] [1] 1 2 3 4 5 NA NA NA And when you get to the last line there, where the rownames are set, since there are multiple NAs, you will have repeated rownames, which R won't allow. In the interest of your health, you can get around this for now by doing num <- sum(topTable(fit.contrast, adjust.method="none", coef=c(1:4), number=20000)$P.value < 0.05) and then x.contrast.REG <- topTable(fit.contrast, adjust.method="none", coef=c(1:4), number=num, p.value = 0.05) Best, Jim > > Regards, > Christian De Santis > >> sessionInfo() > R version 3.0.1 (2013-05-16) > Platform: i386-w64-mingw32/i386 (32-bit) > > locale: > [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252 > [4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] reshape2_1.2.2 ellipse_0.3-8 genefilter_1.44.0 limma_3.18.1 BiocInstaller_1.12.0 > > loaded via a namespace (and not attached): > [1] annotate_1.40.0 AnnotationDbi_1.24.0 Biobase_2.22.0 BiocGenerics_0.8.0 DBI_0.2-7 > [6] IRanges_1.20.3 parallel_3.0.1 plyr_1.8 RSQLite_0.11.4 splines_3.0.1 > [11] stats4_3.0.1 stringr_0.6.2 survival_2.37-4 tools_3.0.1 XML_3.98-1.1 > [16] xtable_1.7-1 > > > > -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099