Error in GOFrame due to NA for unsupported organism (Gene Ontology)
cagenet34 ▴ 20
Toulouse, France, INRA

Hi all,

I'm using ReportingTools following an RNA-seq differential analysis and am having trouble adding GO annotation for an unsupported model organism (i.e. sheep).

Ovis aries (sheep) is not supported by the "AnnotationForge" package, so I'm following the instructions from "How To Use GOstats and Category to do Hypergeometric testing with unsupported model organisms" by M. Carlson.

I get the following error when I try to build the GOFrame (see the GOFrame() call in the code below). It seems to be caused by "<NA>" values.

Can someone help me deal with this error?

Thanks in advance


> rm(list=ls())
> library("GOstats", lib.loc="~/R/win-library/3.3")
> hub<-AnnotationHub()
snapshotDate(): 2016-06-06
> query(hub,c("Ovis aries","OrgDb"))
AnnotationHub with 1 record
# snapshotDate(): 2016-06-06 
# names(): AH48021
# $dataprovider:
# $species: Ovis aries
# $rdataclass: OrgDb
# $title:
# $description: NCBI gene ID based annotations about Ovis_aries
# $taxonomyid: 9940
# $genome: NCBI genomes
# $sourcetype: NCBI/UniProt
# $sourceurl:,
# $sourcelastmodifieddate: NA
# $sourcesize: NA
# $tags: NCBI, Gene, Annotation 
# retrieve record with 'object[["AH48021"]]' 
> sheep<-hub[["AH48021"]]
loading from cache 'C:/Users/cagenet/Documents/AppData/.AnnotationHub/54327'
> keytypes(sheep)
 [7] "GENENAME"    "GID"         "GO"          "GOALL"       "ONTOLOGY"    "ONTOLOGYALL"
[13] "PMID"        "REFSEQ"      "SYMBOL"      "UNIGENE"    
> columns(sheep)
 [1] "ACCNUM"      "ALIAS"       "CHR"         "ENSEMBL"     "ENTREZID"    "EVIDENCE"   
 [7] "EVIDENCEALL" "GENENAME"    "GID"         "GO"          "GOALL"       "ONTOLOGY"   
[13] "ONTOLOGYALL" "PMID"        "REFSEQ"      "SYMBOL"      "UNIGENE"    
> sheepEID<-(keys(sheep,"ENTREZID"))# all ENTREZID
><-select(sheep, sheepEID, c("GO","EVIDENCE"),"ENTREZID")
'select()' returned 1:many mapping between keys and columns
> goframeData=data.frame([,c(2,3,1)])# reverse the column order
> head(goframeData)
1       <NA>     <NA> 100034665
2 GO:0030669      IEA 100034666
3 GO:0010008      IEA 100034666
4 GO:0033162      IEA 100034666
5 GO:0005507      IEA 100034666
6 GO:0016716      IEA 100034666
> goFrame=GOFrame(goframeData,organism="Ovis aries")
Error in .testGOFrame(x, organism) : invalid GO Evidence codes: 'NA'
> goframeData<- goframeData[-which(row(goframeData)=="\<NA>"),]
Error: '\<' is an unrecognized escape in character string starting ""\<"
> goframeData<- goframeData[-which(row(goframeData)=="/<NA>"),]
> head(goframeData)
<0 rows> (or 0-length row.names)
> sessionInfo()
R version 3.3.1 RC (2016-06-17 r70798)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252    LC_MONETARY=French_France.1252
[4] LC_NUMERIC=C                   LC_TIME=French_France.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] GO.db_3.3.0                RSQLite_1.0.0              DBI_0.4-1                 
 [4] AnnotationForge_1.14.2         AnnotationHub_2.4.2       
 [7] GOstats_2.38.1             graph_1.50.0               Category_2.38.0           
[10] Matrix_1.2-6               AnnotationDbi_1.34.3       edgeR_3.14.0              
[13] limma_3.28.14              HTSFilter_1.12.0           ReportingTools_2.12.2     
[16] knitr_1.13                 biomaRt_2.28.0             DESeq2_1.12.3             
[19] SummarizedExperiment_1.2.3 Biobase_2.32.0             GenomicRanges_1.24.2      
[22] GenomeInfoDb_1.8.2         IRanges_2.6.1              S4Vectors_0.10.1          
[25] BiocGenerics_0.18.0       

loaded via a namespace (and not attached):
 [1] httr_1.2.1                    splines_3.3.1                 R.utils_2.3.0                
 [4] Formula_1.2-1                 shiny_0.13.2                  interactiveDisplayBase_1.10.3
 [7] latticeExtra_0.6-28           RBGL_1.48.1                   BSgenome_1.40.1              
[10] Rsamtools_1.24.0              lattice_0.20-33               biovizBase_1.20.0            
[13] chron_2.3-47                  digest_0.6.9                  RColorBrewer_1.1-2           
[16] XVector_0.12.0                colorspace_1.2-6              ggbio_1.20.1                 
[19] R.oo_1.20.0                   htmltools_0.3.5               httpuv_1.3.3                 
[22] plyr_1.8.4                    OrganismDbi_1.14.1            GSEABase_1.34.0              
[25] XML_3.98-1.4                  genefilter_1.54.2             zlibbioc_1.18.0              
[28] xtable_1.8-2                  scales_0.4.0                  BiocParallel_1.6.2           
[31] annotate_1.50.0               ggplot2_2.1.0                 PFAM.db_3.3.0                
[34] GenomicFeatures_1.24.3        nnet_7.3-12                   mime_0.4                     
[37] survival_2.39-5               magrittr_1.5                  evaluate_0.9                 
[40] R.methodsS3_1.7.1             GGally_1.2.0                  hwriter_1.3.2                
[43] foreign_0.8-66                BiocInstaller_1.22.3          rsconnect_0.4.3              
[46] tools_3.3.1                   data.table_1.9.6              stringr_1.0.0                
[49] munsell_0.4.3                 locfit_1.5-9.1                cluster_2.0.4                
[52] ensembldb_1.4.7               Biostrings_2.40.2             DESeq_1.24.0                 
[55] grid_3.3.1                    RCurl_1.95-4.8                dichromat_2.0-0              
[58] VariantAnnotation_1.18.1      bitops_1.0-6                  gtable_0.2.0                 
[61] curl_0.9.7                    reshape_0.8.5                 reshape2_1.4.1               
[64] R6_2.1.2                      GenomicAlignments_1.8.3       gridExtra_2.2.1              
[67] rtracklayer_1.32.1            Hmisc_3.17-4                  stringi_1.1.1                
[70] Rcpp_0.12.5                   geneplotter_1.50.0            rpart_4.1-10                 
[73] acepack_1.3-3.3              
reportingtools annotationhub gostats • 918 views
United States

The error is intended to be self-explanatory. It says

Error in .testGOFrame(x, organism) : invalid GO Evidence codes: 'NA'

And as you already showed, there is an NA in (at the very least) the first row!

> head(goframeData)
1       <NA>     <NA> 100034665
2 GO:0030669      IEA 100034666
3 GO:0010008      IEA 100034666
4 GO:0033162      IEA 100034666
5 GO:0005507      IEA 100034666
6 GO:0016716      IEA 100034666

So just looking at what you pasted into the question box should have clued you in to what the problem was. I would think getting rid of any rows with an NA for EVIDENCE should do the trick.
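A minimal sketch of that filtering step, using a toy data frame that mimics the first rows of the head() output above. The column names GO, EVIDENCE and ENTREZID are assumed from the select() call, and `toy` and `clean` are illustrative names, not objects from the original session.

```r
# Toy stand-in for goframeData; column names are assumed,
# values are copied from the head() output quoted above
toy <- data.frame(
  GO       = c(NA, "GO:0030669", "GO:0010008"),
  EVIDENCE = c(NA, "IEA", "IEA"),
  ENTREZID = c("100034665", "100034666", "100034666"),
  stringsAsFactors = FALSE
)

# Keep only the rows whose evidence code is present
clean <- toy[!is.na(toy$EVIDENCE), ]
print(clean)
```

The same one-line subset, applied to the real goframeData, should leave a data frame that can then be passed to GOFrame().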


Yes, I know, but my problem is that I don't know how (I'm a newbie). So I tried

goframeData<- goframeData[-which(row(goframeData)=="<NA>"),]


goframeData<- goframeData[-which(goframeData$EVIDENCE=="<NA>"),]

But my goframeData ends up empty.

In the end I proceeded in a different way (download the file, edit it in Excel and import it again).





There are two lessons here. First, when you ask a question, be sure that you are as precise as possible. You asked for help with the error, rather than saying that you understand that there are NA values, but don't know how to get rid of them. So I answered the question you posed, rather than the question you had.

Second, in R, nothing is equal to NA (you cannot be equal to something that is not available), so there is a function, is.na, that you can use to test for NA values.

> vec <- c(1,2,3,NA,4,5,6)
> which(vec == "NA")
integer(0)
> which(vec == NA)
integer(0)
> which(is.na(vec))
[1] 4
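
The empty data frame reported earlier in the thread follows from the same rule. A small sketch with toy data and illustrative names (`df`, `idx`, `emptied`, `kept`) of what happens: comparing against the string "<NA>" never matches, which() returns an empty index, and negative indexing with an empty index selects zero rows. Logical indexing avoids the pitfall.

```r
df <- data.frame(EVIDENCE = c(NA, "IEA", "IEA"), stringsAsFactors = FALSE)

# "<NA>" is only how print() displays a missing string; it is not the stored value
idx <- which(df$EVIDENCE == "<NA>")
length(idx)                          # 0: nothing matched

# -integer(0) is still integer(0), so this selects no rows at all
emptied <- df[-idx, , drop = FALSE]
nrow(emptied)                        # 0

# Logical indexing keeps the rows that are not missing
kept <- df[!is.na(df$EVIDENCE), , drop = FALSE]
nrow(kept)                           # 2
```

This is why `x[-which(cond), ]` is generally riskier than `x[!cond, ]`: the first form silently drops everything when no element satisfies the condition.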

But this is a bit of a chicken/egg problem, no? How do you figure out what you need if you don't know what you need? There are two methods in my experience that are useful in this context. The first is apropos, which will give you all the functions that match what you query on. Do note that this is one of the rare instances where R is case-insensitive (so you can search on "na", "NA","Na" or even "nA" if you want to be crazy), and also that 'na' is going to match many things, so it may be a bit of an eyeballometric exercise. For legibility, I am going to cut out all the extra returned values.

> apropos("na")
 [79] "is.na"
 [80] "is.na<-"
 [81] "is.na.data.frame"
 [82] "is.na<-.default"
 [83] "is.na<-.factor"

So hypothetically you could have used apropos to find is.na, and gone with that.
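
As a sketch of how to narrow that search, apropos() also accepts a regular expression, so anchoring the pattern cuts down on the eyeballing. The pattern below is just one illustrative choice, and `hits` is an illustrative name.

```r
# All visible objects whose name starts with "is.na"
hits <- apropos("^is\\.na")
print(hits)

# Once a candidate has been spotted, its help page gives the details:
# ?is.na
```

The exact list returned depends on which packages are attached, so it will differ between sessions.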

An even more powerful method is to use Google. Almost any conceivable R question has already been asked and answered somewhere online, and a query of the form 'R remove NA rows' will almost surely come up with multiple reasonable links.



Hi James,

You were right, my problem was that I was not as precise as possible. Thanks for your advice. I didn't know about the useful "apropos". I did use Google and found some answers, BUT as I'm a newbie, I thought the value was written <NA> and not NA alone. Of course, now everything works fine, except that I wasted my time and yours... Sorry.



