Goseq supported organism (felcat5) has an error when testing
xyliu00 ▴ 20
@xyliu00-8370

I recently started using goseq for pathway analysis. It works well for human mRNA-seq DGE data, but I ran into a problem with samples from a non-model organism, and I hope you can help.

The specific problem is with the cat genome. I looked up the supported genomes:

> supportedGenomes()[,1:4]

12   felCat5     Cat  Sep. 2011                    ICGSC Felis_catus-6.2
13   felCat4     Cat  Dec. 2008                         NHGRI catChrV17e
14   felCat3     Cat  Mar. 2006                Broad Institute Release 3

I followed the example in the tutorial and constructed the DE genes vector:

> genes
ENSFCAG00000015719 
                 1
ENSFCAG00000005390 
                 0
…

Then I calculated the pwf:

> pwf=nullp(genes,"felCat5","ensGene")
> head(pwf)
                   DEgenes bias.data        pwf
ENSFCAG00000015719       1      1637 0.01063125
ENSFCAG00000031227       1       546 0.01063223
ENSFCAG00000014746       1       738 0.01063220
ENSFCAG00000005042       1      3636 0.01015309
ENSFCAG00000001898       1      2379 0.01058249
ENSFCAG00000023471       1       540 0.01063223

> tail(pwf)
                   DEgenes bias.data        pwf
ENSFCAG00000005390       0       492 0.01063224
ENSFCAG00000009612       0      1527 0.01063165
ENSFCAG00000027330       0       996 0.01063211
ENSFCAG00000012934       0      2844 0.01046660
ENSFCAG00000007036       0      1520 0.01063167
ENSFCAG00000004574       0      1420 0.01063179

However, when I tried to run the test, I got an error:

> GO.wall = goseq(pwf,"felcat5","ensGene")
Error in library(paste(orgstring, "db", sep = "."), character.only = TRUE) : 
  there is no package called ‘NA.db’

 

As a novice R user, I figured that orgstring must be undefined (NA) in this case. I guess goseq or Bioconductor does not know which annotation package to load for felcat5. This puzzles me because felcat5 is supposed to be supported in goseq.

Some online search results suggest that I should install the annotation package for the organism. But I have trouble finding a db package for cat on the Bioconductor web pages, and the error message does not say which package is missing.

So my question is how to make goseq work for cat DGE in this case. Is there a package I should install to solve the problem? And how should I deal with this kind of situation (a supported organism whose packages are hard to find) in the future?

Thanks in advance!

 

> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: CentOS release 6.5 (Final)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] stats4    parallel  tools     stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] org.Hs.eg.db_3.1.2    AnnotationDbi_1.30.1  Biobase_2.28.0        rtracklayer_1.28.7    GenomicRanges_1.20.5 
 [6] GenomeInfoDb_1.4.1    IRanges_2.2.7         S4Vectors_0.6.3       BiocGenerics_0.14.0   goseq_1.20.0         
[11] RSQLite_1.0.0         DBI_0.3.1             geneLenDataBase_1.4.0 BiasedUrn_1.06.1     

loaded via a namespace (and not attached):
 [1] XVector_0.8.0           zlibbioc_1.14.0         GenomicAlignments_1.4.1 BiocParallel_1.2.20     lattice_0.20-33        
 [6] grid_3.2.1              nlme_3.1-121            mgcv_1.8-7              lambda.r_1.1.7          futile.logger_1.4.1    
[11] Matrix_1.2-2            futile.options_1.0.0    bitops_1.0-6            RCurl_1.95-4.7          biomaRt_2.24.0         
[16] GO.db_3.1.2             GenomicFeatures_1.20.1  Biostrings_2.36.2       Rsamtools_1.20.4        XML_3.98-1.3  

@xyliu00 Are you, by chance, taking the Statistical-genomics course taught by Jeff Leek through Coursera? I am having the same problem in the 4th week of the course. I would like to know whether you found a workaround, or whether this turned out not to be a problem after all. Matt C., mockrun (at) gmail.com

@nadia-davidson-5739

Hi Xiao-yu,

The supportedGenomes function is a bit misleading for a couple of reasons. One is that it only checks whether the genome is supported in the geneLenDataBase package, not whether gene ontology information is available for it. The second is that if the "AvailableGeneIDs" column of the returned table is empty, then the genome is actually not supported by geneLenDataBase, even though the genome name appears in the table. These are things I would like to fix when I get a chance.
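The second check can be sketched roughly as follows. The helper looks up one genome in a table shaped like the supportedGenomes() output and reports whether its "AvailableGeneIDs" entry is non-empty. The name has_gene_lengths is made up for illustration, and the column names db and AvailableGeneIDs are assumed from the goseq version used in this thread; other releases may differ.

```r
# Hypothetical helper: TRUE if `genome` has a non-empty AvailableGeneIDs entry.
# Assumes `tbl` has columns `db` and `AvailableGeneIDs`, as in the
# supportedGenomes() table of the goseq version discussed here.
has_gene_lengths <- function(tbl, genome) {
  ids <- as.character(tbl$AvailableGeneIDs[tbl$db == genome])
  length(ids) > 0 && any(!is.na(ids) & nzchar(trimws(ids)))
}

# With goseq installed, this would be called as (not run here):
#   library(goseq)
#   has_gene_lengths(supportedGenomes(), "felCat5")
```

The lookup is case-sensitive, so "felcat5" would not match "felCat5". A TRUE result only says that gene lengths are available; it says nothing about GO annotation.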

To run goseq for your analysis, you would need to provide the mapping between Ensembl gene IDs and GO terms manually (via the "gene2cat" parameter of the goseq function). Often this information can be found in the "org.." Bioconductor annotation packages, but there doesn't appear to be one for cat. The best way may be to get the GO terms for each gene using the biomaRt package.

Best of luck with it, and let us know if you need any help supplying the GO terms manually.
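A minimal sketch of that route, under some assumptions: make_gene2cat and run_cat_goseq are hypothetical helper names, and the Ensembl dataset name "fcatus_gene_ensembl" is a guess that should be checked with biomaRt's listDatasets(). The current Ensembl release may also be built on a different cat assembly than felCat5, so some gene IDs may not match.

```r
# Convert a two-column table (gene ID, GO ID) into the named-list form of
# gene2cat: names are gene IDs, entries are character vectors of GO terms.
make_gene2cat <- function(go_table) {
  gene <- as.character(go_table[[1]])
  term <- as.character(go_table[[2]])
  keep <- !is.na(term) & nzchar(term)  # drop rows with no GO annotation
  lapply(split(term[keep], gene[keep]), unique)
}

# Sketch of the full workflow. It needs biomaRt, goseq and network access,
# so it is wrapped in a function rather than run here.
# ASSUMPTION: the dataset name below; verify it with biomaRt::listDatasets().
run_cat_goseq <- function(pwf, dataset = "fcatus_gene_ensembl") {
  mart <- biomaRt::useMart("ensembl", dataset = dataset)
  go_table <- biomaRt::getBM(
    attributes = c("ensembl_gene_id", "go_id"),
    filters = "ensembl_gene_id",
    values = rownames(pwf),
    mart = mart
  )
  # With gene2cat supplied, goseq does not need to look up an org package.
  goseq::goseq(pwf, gene2cat = make_gene2cat(go_table))
}
```

With the pwf object from the question, the call would look like GO.wall = run_cat_goseq(pwf). Genes that have no GO term are simply absent from the list.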

Cheers,

Nadia.
