Hello-
I am working with a non-model organism (Bombus impatiens) for which NCBI has some information (taxon id = "132113") but no GO information. I learned this when I tried the makeOrgPackageFromNCBI() function:
makeOrgPackageFromNCBI(version = "2.0",
author = "Me <me@email.edu>",
maintainer = "Me <me@email.edu>",
outputDir = ".",
tax_id = "132113",
genus = "Bombus",
species = "impatiens")
I therefore exported the Blast2GO annotation results for my transcriptome to build my own GO-annotated database. Not all of my transcripts have GO annotations in the full dataset, but every transcript in the attached sample dataset does. The makeOrgPackage() call fails with the following error whether I use the full dataset or the sample dataset:
Error in FUN(X[[i]], ...) :
data.frames in '...' cannot contain duplicated rows
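The message suggests that at least one input table has rows that are identical in every column. A minimal sketch of how that could be checked and removed, using a toy table rather than the real files: the object names goToy, nDup and goClean are hypothetical, and the GID / GO / EVIDENCE column names are an assumption about what makeOrgPackage() expects for the table named in goTable.

```r
# Toy stand-in for a GO table (hypothetical data, not the real annotation).
# Column names GID, GO, EVIDENCE are assumed to match what makeOrgPackage()
# wants for the data.frame named in the goTable argument.
goToy <- data.frame(
  GID      = c("g1", "g1", "g2", "g2"),
  GO       = c("GO:0005524", "GO:0005524", "GO:0016020", "GO:0003677"),
  EVIDENCE = c("IEA", "IEA", "IEA", "IEA"),
  stringsAsFactors = FALSE
)

# duplicated() flags every row that repeats an earlier row in all columns.
rownames(goToy) <- NULL
nDup <- sum(duplicated(goToy))

# unique() on a data.frame keeps the first copy of each repeated row.
goClean <- unique(goToy)
```

If nDup is greater than zero for a real table, passing the unique() version to makeOrgPackage() may be enough to get past this particular check, although that is untested against the attached files.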
I would like either to add my GO annotations to the existing database built from NCBI or to build the package with makeOrgPackage().
Please find my code, error messages, and assorted diagnostic output below.
Thank you!
Meaghan
> setwd("~/path/to/AnnotationForgeDB")
> library(AnnotationForge)
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:parallel’:
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport, clusterMap, parApply, parCapply, parLapply, parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from ‘package:stats’:
IQR, mad, sd, var, xtabs
The following objects are masked from ‘package:base’:
anyDuplicated, append, as.data.frame, cbind, colMeans, colnames, colSums, do.call, duplicated, eval, evalq, Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply, lengths, Map, mapply, match, mget, order, paste,
pmax, pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce, rowMeans, rownames, rowSums, sapply, setdiff, sort, table, tapply, union, unique, unsplit, which, which.max, which.min
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see 'citation("Biobase")', and for packages 'citation("pkgname")'.
Loading required package: AnnotationDbi
Loading required package: stats4
Loading required package: IRanges
Loading required package: S4Vectors
Attaching package: ‘S4Vectors’
The following object is masked from ‘package:base’:
expand.grid
> library(AnnotationDbi)
> symTable <- read.table(file="./sampleBimp_symbolTable.txt", sep="\t", header=TRUE)
> goTable <- read.table(file="./sampleBimp_GOTable.txt", sep="\t", header=TRUE)
> chrTable <- read.table(file="./sampleBimp_chrTable.txt", sep="\t", header=TRUE)
>
> makeOrgPackage(gene_info= symTable, chromosome= chrTable, go= goTable, verbose=TRUE,
+ version = "2.15",
+ author = "Me <me@email.edu>",
+ maintainer = "Me <me@email.edu>",
+ outputDir = ".",
+ tax_id = "132113",
+ genus = "Bombus",
+ species = "impatiens",
+ goTable= "go")
Error in FUN(X[[i]], ...) :
data.frames in '...' cannot contain duplicated rows
> BiocInstaller::biocValid()
* sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X El Capitan 10.11.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] AnnotationForge_1.20.0 AnnotationDbi_1.40.0 IRanges_2.12.0 S4Vectors_0.16.0 Biobase_2.38.0 BiocGenerics_0.24.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.14 XML_3.98-1.9 digest_0.6.12 bitops_1.0-6 DBI_0.7 RSQLite_2.0 BiocInstaller_1.28.0 rlang_0.1.4 blob_1.1.0 tools_3.4.3 bit64_0.9-7
[12] RCurl_1.95-4.8 bit_1.1-12 compiler_3.4.3 memoise_1.1.0 tibble_1.3.4
* Out-of-date packages
Package LibPath Installed Built ReposVer Repository
limma "limma" "/Library/Frameworks/R.framework/Versions/3.4/Resources/library" "3.34.3" "3.4.2" "3.34.4" "https://bioconductor.org/packages/3.6/bioc/src/contrib"
update with biocLite()
Error: 1 package(s) out of date
> symTable <- read.table(file="./sampleBimp_symbolTable.txt", sep="\t", header=TRUE)
> goTable <- read.table(file="./sampleBimp_GOTable.txt", sep="\t", header=TRUE)
> chrTable <- read.table(file="./sampleBimp_chrTable.txt", sep="\t", header=TRUE)
>
> makeOrgPackage(gene_info= symTable, chromosome= chrTable, go= goTable, verbose=TRUE,
+ version = "2.15",
+ author = "Me <me@email.edu>",
+ maintainer = "Me <me@email.edu>",
+ outputDir = ".",
+ tax_id = "132113",
+ genus = "Bombus",
+ species = "impatiens",
+ goTable= "go")
Error in FUN(X[[i]], ...) :
data.frames in '...' cannot contain duplicated rows
> traceback()
6: stop("data.frames in '...' cannot contain duplicated rows")
5: FUN(X[[i]], ...)
4: lapply(data, function(x) {
rownames(x) <- NULL
if (any(duplicated(x)))
stop("data.frames in '...' cannot contain duplicated rows")
x
})
3: lapply(data, function(x) {
rownames(x) <- NULL
if (any(duplicated(x)))
stop("data.frames in '...' cannot contain duplicated rows")
x
})
2: .makeOrgPackage(data, version = version, maintainer = maintainer,
author = author, outputDir = outputDir, tax_id = tax_id,
genus = genus, species = species, goTable = goTable, verbose = verbose)
1: makeOrgPackage(gene_info = symTable, chromosome = chrTable, go = goTable,
verbose = TRUE, version = "2.15", author = "Me <me@email.edu>",
maintainer = "Me <me@email.edu>", outputDir = ".",
tax_id = "132113", genus = "Bombus", species = "impatiens",
goTable = "go")
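The traceback shows the check running over every data.frame passed in, so the error alone does not say which table is at fault. A rough sketch of the same test applied per table, on hypothetical stand-ins for symTable, chrTable and goTable (the toy column names and values are assumptions, not the real data):

```r
# Hypothetical stand-ins for the three input tables.
tables <- list(
  gene_info  = data.frame(GID = c("g1", "g2"), SYMBOL = c("a", "b"),
                          stringsAsFactors = FALSE),
  chromosome = data.frame(GID = c("g1", "g1"), CHROMOSOME = c("1", "1"),
                          stringsAsFactors = FALSE),
  go         = data.frame(GID = c("g1", "g2"),
                          GO = c("GO:0005524", "GO:0016020"),
                          EVIDENCE = c("IEA", "IEA"),
                          stringsAsFactors = FALSE)
)

# Count fully repeated rows in each table, mirroring the check in the
# traceback (row names dropped, then duplicated() on the whole data.frame).
dupCounts <- vapply(tables, function(x) {
  rownames(x) <- NULL
  sum(duplicated(x))
}, integer(1))

# Names of the tables that would trigger the error.
offending <- names(dupCounts)[dupCounts > 0]
```

With the real objects, the list would be built as list(gene_info = symTable, chromosome = chrTable, go = goTable).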
symTable: https://alabama.box.com/s/4bxr73guw1o8vyjvj9zghhixe01pag50
chrTable: https://alabama.box.com/s/1qa82jr11tk3wqz0s6rp1atdu8fq9zxi
goTable: https://alabama.box.com/s/qarfshu11nbt4qxh9ey7hrrrhh67jeov