AnnotationForge, RNAseq, and non-model organism
mlpimsler • 0
@mlpimsler-14395

Hello-

I am working with a non-model organism (Bombus impatiens) for which NCBI has some information (taxon id = "132113"), but no GO information. I learned this while trying to use the makeOrgPackageFromNCBI() function:

makeOrgPackageFromNCBI(version = "2.0",
     author = "Me <me@email.edu>",
     maintainer = "Me <me@email.edu>",
     outputDir = ".",
     tax_id = "132113",
     genus = "Bombus",
     species = "impatiens")

I therefore exported the Blast2GO transcriptome annotation results to try to create my own GO-annotated database. Not all of my transcripts have GO annotations in the full dataset, but all of them do in the sample dataset linked at the end of this post. The makeOrgPackage() command fails with the following error whether I use the full dataset or the sample dataset:

Error in FUN(X[[i]], ...) : 
  data.frames in '...' cannot contain duplicated rows

I would like to either be able to add my GO annotations to the existing database built from NCBI or use the makeOrgPackage() command.

Please find my code, error messages, and assorted diagnostic output below. 

Thank you!

Meaghan


> setwd("~/path/to/AnnotationForgeDB")
> library(AnnotationForge)
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport, clusterMap, parApply, parCapply, parLapply, parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, cbind, colMeans, colnames, colSums, do.call, duplicated, eval, evalq, Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply, lengths, Map, mapply, match, mget, order, paste,
    pmax, pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce, rowMeans, rownames, rowSums, sapply, setdiff, sort, table, tapply, union, unique, unsplit, which, which.max, which.min

Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see 'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: AnnotationDbi
Loading required package: stats4
Loading required package: IRanges
Loading required package: S4Vectors

Attaching package: ‘S4Vectors’

The following object is masked from ‘package:base’:

    expand.grid

> library(AnnotationDbi)

> symTable <- read.table(file="./sampleBimp_symbolTable.txt", sep="\t", header=TRUE)
> goTable <- read.table(file="./sampleBimp_GOTable.txt", sep="\t", header=TRUE)
> chrTable <- read.table(file="./sampleBimp_chrTable.txt", sep="\t", header=TRUE)
>
> makeOrgPackage(gene_info = symTable, chromosome = chrTable, go = goTable, verbose = TRUE,
+ version = "2.15",
+ author = "Me <me@email.edu>",
+ maintainer = "Me <me@email.edu>",
+ outputDir = ".",
+ tax_id = "132113",
+ genus = "Bombus",
+ species = "impatiens",
+ goTable = "go")
Error in FUN(X[[i]], ...) :
  data.frames in '...' cannot contain duplicated rows

> BiocInstaller::biocValid()

* sessionInfo()

R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X El Capitan 10.11.6

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] AnnotationForge_1.20.0 AnnotationDbi_1.40.0   IRanges_2.12.0         S4Vectors_0.16.0       Biobase_2.38.0         BiocGenerics_0.24.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.14         XML_3.98-1.9         digest_0.6.12        bitops_1.0-6         DBI_0.7              RSQLite_2.0          BiocInstaller_1.28.0 rlang_0.1.4          blob_1.1.0           tools_3.4.3          bit64_0.9-7
[12] RCurl_1.95-4.8       bit_1.1-12           compiler_3.4.3       memoise_1.1.0        tibble_1.3.4

* Out-of-date packages

      Package LibPath                                                          Installed Built   ReposVer Repository
limma "limma" "/Library/Frameworks/R.framework/Versions/3.4/Resources/library" "3.34.3"  "3.4.2" "3.34.4" "https://bioconductor.org/packages/3.6/bioc/src/contrib"

update with biocLite()

Error: 1 package(s) out of date

> traceback()
6: stop("data.frames in '...' cannot contain duplicated rows")
5: FUN(X[[i]], ...)
4: lapply(data, function(x) {
       rownames(x) <- NULL
       if (any(duplicated(x)))
           stop("data.frames in '...' cannot contain duplicated rows")
       x
   })
3: lapply(data, function(x) {
       rownames(x) <- NULL
       if (any(duplicated(x)))
           stop("data.frames in '...' cannot contain duplicated rows")
       x
   })
2: .makeOrgPackage(data, version = version, maintainer = maintainer,
       author = author, outputDir = outputDir, tax_id = tax_id,
       genus = genus, species = species, goTable = goTable, verbose = verbose)
1: makeOrgPackage(gene_info = symTable, chromosome = chrTable, go = goTable,
       verbose = TRUE, version = "2.15", author = "Me <me@email.edu>",
       maintainer = "Me <me@email.edu>", outputDir = ".",
       tax_id = "132113", genus = "Bombus", species = "impatiens",
       goTable = "go")
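
The traceback shows that each data.frame passed through '...' is tested with any(duplicated(x)) after its row names are dropped. A minimal sketch of running that same test by hand, to see which table triggers the error, might look like the following. The three small data.frames are made-up stand-ins (hypothetical IDs and values, not the sample files linked in this post), and the SYMBOL and CHROMOSOME column names are assumptions for illustration.

```r
# Toy stand-ins for the three tables (hypothetical values, not the real data)
symTable <- data.frame(GID = c("g1", "g2"),
                       SYMBOL = c("symA", "symB"),
                       stringsAsFactors = FALSE)
chrTable <- data.frame(GID = c("g1", "g2"),
                       CHROMOSOME = c("1", "2"),
                       stringsAsFactors = FALSE)
goTable  <- data.frame(GID = c("g1", "g1", "g2"),
                       GO = c("GO:0008150", "GO:0008150", "GO:0003674"),
                       EVIDENCE = c("IEA", "IEA", "IEA"),
                       stringsAsFactors = FALSE)

tables <- list(gene_info = symTable, chromosome = chrTable, go = goTable)

# Same test as in the traceback: drop row names, then count repeated rows
dup_counts <- vapply(tables, function(x) {
    rownames(x) <- NULL
    sum(duplicated(x))
}, integer(1))

print(dup_counts)
```

Any table with a non-zero count would be rejected by the check shown in the traceback; in this toy example only the go table has a repeated row.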

symTable: https://alabama.box.com/s/4bxr73guw1o8vyjvj9zghhixe01pag50

chrTable: https://alabama.box.com/s/1qa82jr11tk3wqz0s6rp1atdu8fq9zxi

goTable: https://alabama.box.com/s/qarfshu11nbt4qxh9ey7hrrrhh67jeov

@james-w-macdonald-5106

Your goTable has a bunch of duplicated rows, and in fact contains no GO IDs anyway, so it's not clear to me what you expect to accomplish. If you actually have GO data that isn't all NA values, you could simply do

goTable <- goTable[!duplicated(goTable),]

And things should then work.
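
As a rough sketch of that clean-up, assuming the GO table uses the three columns GID, GO and EVIDENCE and that missing GO IDs were read in as NA (if they are empty strings instead, the filter would need adjusting), something like the following could be run before calling makeOrgPackage(). The data here are made up for illustration, and goClean is just a name chosen for this example.

```r
# Made-up GO table with one duplicated row and one row lacking a GO ID
goTable <- data.frame(
    GID = c("g1", "g1", "g2", "g3"),
    GO = c("GO:0008150", "GO:0008150", NA, "GO:0003674"),
    EVIDENCE = c("IEA", "IEA", "IEA", "IEA"),
    stringsAsFactors = FALSE
)

# Drop rows without a GO ID, then drop exact duplicate rows
goClean <- goTable[!is.na(goTable$GO), ]
goClean <- goClean[!duplicated(goClean), ]
rownames(goClean) <- NULL

# Sanity checks before handing the table to makeOrgPackage()
stopifnot(!any(duplicated(goClean)),
          all(grepl("^GO:[0-9]{7}$", goClean$GO)))

print(goClean)
```

If the first filter leaves zero rows, the table had no usable GO IDs to begin with, and de-duplicating alone will not give you a GO-annotated package.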

 
