Error in GOFrame due to NA for unsupported organism (Gene Ontology)
cagenet34 ▴ 20
@cagenet34-10910
Toulouse, France, INRA

Hi all,

I'm using ReportingTools after an RNA-seq differential analysis and am having trouble adding GO annotation for an unsupported model organism (i.e. sheep, org.Oa.eg.db).

Ovis aries (sheep) is not supported by the "AnnotationForge" package, so I'm following the instructions from "How To Use GOstats and Category to do Hypergeometric testing with unsupported model organisms" by M. Carlson.

I get the following error when I try to build the GOFrame (see the GOFrame() call in the code below). It seems to be caused by the "<NA>" entries.

Can someone help me deal with this error?

Thanks in advance

Carine

> rm(list=ls())
> library("GOstats", lib.loc="~/R/win-library/3.3")
> hub<-AnnotationHub()
snapshotDate(): 2016-06-06
> query(hub,c("Ovis aries","OrgDb"))
AnnotationHub with 1 record
# snapshotDate(): 2016-06-06 
# names(): AH48021
# $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
# $species: Ovis aries
# $rdataclass: OrgDb
# $title: org.Ovis_aries.eg.sqlite
# $description: NCBI gene ID based annotations about Ovis_aries
# $taxonomyid: 9940
# $genome: NCBI genomes
# $sourcetype: NCBI/UniProt
# $sourceurl: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/, ftp://ftp.uniprot.org/pub/databases/unip...
# $sourcelastmodifieddate: NA
# $sourcesize: NA
# $tags: NCBI, Gene, Annotation 
# retrieve record with 'object[["AH48021"]]' 
> sheep<-hub[["AH48021"]]
loading from cache 'C:/Users/cagenet/Documents/AppData/.AnnotationHub/54327'
> keytypes(sheep)
 [1] "ACCNUM"      "ALIAS"       "ENSEMBL"     "ENTREZID"    "EVIDENCE"    "EVIDENCEALL"
 [7] "GENENAME"    "GID"         "GO"          "GOALL"       "ONTOLOGY"    "ONTOLOGYALL"
[13] "PMID"        "REFSEQ"      "SYMBOL"      "UNIGENE"    
> columns(sheep)
 [1] "ACCNUM"      "ALIAS"       "CHR"         "ENSEMBL"     "ENTREZID"    "EVIDENCE"   
 [7] "EVIDENCEALL" "GENENAME"    "GID"         "GO"          "GOALL"       "ONTOLOGY"   
[13] "ONTOLOGYALL" "PMID"        "REFSEQ"      "SYMBOL"      "UNIGENE"    
> sheepEID<-(keys(sheep,"ENTREZID"))# all ENTREZID
> sheep.eg.GO<-select(sheep, sheepEID, c("GO","EVIDENCE"),"ENTREZID")
'select()' returned 1:many mapping between keys and columns
> goframeData=data.frame(sheep.eg.GO[,c(2,3,1)])# reverse the column order
> head(goframeData)
          GO EVIDENCE  ENTREZID
1       <NA>     <NA> 100034665
2 GO:0030669      IEA 100034666
3 GO:0010008      IEA 100034666
4 GO:0033162      IEA 100034666
5 GO:0005507      IEA 100034666
6 GO:0016716      IEA 100034666
> goFrame=GOFrame(goframeData,organism="Ovis aries")
Error in .testGOFrame(x, organism) : invalid GO Evidence codes: 'NA'
> goframeData<- goframeData[-which(row(goframeData)=="\<NA>"),]
Error: '\<' is an unrecognized escape in character string starting ""\<"
> goframeData<- goframeData[-which(row(goframeData)=="/<NA>"),]
> 
> head(goframeData)
[1] GO       EVIDENCE ENTREZID
<0 rows> (or 0-length row.names)
> sessionInfo()
R version 3.3.1 RC (2016-06-17 r70798)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252    LC_MONETARY=French_France.1252
[4] LC_NUMERIC=C                   LC_TIME=French_France.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] GO.db_3.3.0                RSQLite_1.0.0              DBI_0.4-1                 
 [4] AnnotationForge_1.14.2     org.Mm.eg.db_3.3.0         AnnotationHub_2.4.2       
 [7] GOstats_2.38.1             graph_1.50.0               Category_2.38.0           
[10] Matrix_1.2-6               AnnotationDbi_1.34.3       edgeR_3.14.0              
[13] limma_3.28.14              HTSFilter_1.12.0           ReportingTools_2.12.2     
[16] knitr_1.13                 biomaRt_2.28.0             DESeq2_1.12.3             
[19] SummarizedExperiment_1.2.3 Biobase_2.32.0             GenomicRanges_1.24.2      
[22] GenomeInfoDb_1.8.2         IRanges_2.6.1              S4Vectors_0.10.1          
[25] BiocGenerics_0.18.0       

loaded via a namespace (and not attached):
 [1] httr_1.2.1                    splines_3.3.1                 R.utils_2.3.0                
 [4] Formula_1.2-1                 shiny_0.13.2                  interactiveDisplayBase_1.10.3
 [7] latticeExtra_0.6-28           RBGL_1.48.1                   BSgenome_1.40.1              
[10] Rsamtools_1.24.0              lattice_0.20-33               biovizBase_1.20.0            
[13] chron_2.3-47                  digest_0.6.9                  RColorBrewer_1.1-2           
[16] XVector_0.12.0                colorspace_1.2-6              ggbio_1.20.1                 
[19] R.oo_1.20.0                   htmltools_0.3.5               httpuv_1.3.3                 
[22] plyr_1.8.4                    OrganismDbi_1.14.1            GSEABase_1.34.0              
[25] XML_3.98-1.4                  genefilter_1.54.2             zlibbioc_1.18.0              
[28] xtable_1.8-2                  scales_0.4.0                  BiocParallel_1.6.2           
[31] annotate_1.50.0               ggplot2_2.1.0                 PFAM.db_3.3.0                
[34] GenomicFeatures_1.24.3        nnet_7.3-12                   mime_0.4                     
[37] survival_2.39-5               magrittr_1.5                  evaluate_0.9                 
[40] R.methodsS3_1.7.1             GGally_1.2.0                  hwriter_1.3.2                
[43] foreign_0.8-66                BiocInstaller_1.22.3          rsconnect_0.4.3              
[46] tools_3.3.1                   data.table_1.9.6              stringr_1.0.0                
[49] munsell_0.4.3                 locfit_1.5-9.1                cluster_2.0.4                
[52] ensembldb_1.4.7               Biostrings_2.40.2             DESeq_1.24.0                 
[55] grid_3.3.1                    RCurl_1.95-4.8                dichromat_2.0-0              
[58] VariantAnnotation_1.18.1      bitops_1.0-6                  gtable_0.2.0                 
[61] curl_0.9.7                    reshape_0.8.5                 reshape2_1.4.1               
[64] R6_2.1.2                      GenomicAlignments_1.8.3       gridExtra_2.2.1              
[67] rtracklayer_1.32.1            Hmisc_3.17-4                  stringi_1.1.1                
[70] Rcpp_0.12.5                   geneplotter_1.50.0            rpart_4.1-10                 
[73] acepack_1.3-3.3              
reportingtools annotationhub gostats • 762 views
@james-w-macdonald-5106
United States

The error is intended to be self-explanatory. It says

Error in .testGOFrame(x, organism) : invalid GO Evidence codes: 'NA'

And as you already showed, there is an NA in (at the very least) the first row!

> head(goframeData)
          GO EVIDENCE  ENTREZID
1       <NA>     <NA> 100034665
2 GO:0030669      IEA 100034666
3 GO:0010008      IEA 100034666
4 GO:0033162      IEA 100034666
5 GO:0005507      IEA 100034666
6 GO:0016716      IEA 100034666

So just looking at what you pasted into the question box should have clued you in to what the problem was. I would think getting rid of any rows with an NA for EVIDENCE should do the trick.
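
As a minimal sketch of that suggestion (not necessarily the only way to do it), rows can be dropped by testing the EVIDENCE column with is.na. The small data frame below is a made-up stand-in that mirrors the first rows printed above, and the name goframeClean is hypothetical; the real goframeData would come from the select() call in the question.

```r
# Toy stand-in for goframeData, mirroring the first rows shown above
goframeData <- data.frame(
  GO       = c(NA, "GO:0030669", "GO:0010008"),
  EVIDENCE = c(NA, "IEA", "IEA"),
  ENTREZID = c("100034665", "100034666", "100034666"),
  stringsAsFactors = FALSE
)

# Keep only the rows whose EVIDENCE value is not NA
goframeClean <- goframeData[!is.na(goframeData$EVIDENCE), ]

print(goframeClean)

# With GOstats loaded, the cleaned data frame is what would then be passed on,
# as in the question:
# goFrame <- GOFrame(goframeClean, organism = "Ovis aries")
```

Using a logical index (!is.na(...)) rather than a negative which() index also behaves sensibly when no row contains an NA.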


Yes, I know, but my problem is that I don't know how (I'm a newbie). So I tried

goframeData<- goframeData[-which(row(goframeData)=="<NA>"),]

or

goframeData<- goframeData[-which(goframeData$EVIDENCE=="<NA>"),]

But my goframeData is empty.

In the end I proceeded a different way (downloaded the file, edited it in Excel, and imported it again).

Carine

 

 


There are two lessons here. First, when you ask a question, be sure that you are as precise as possible. You asked for help with the error, rather than saying that you understand that there are NA values but don't know how to get rid of them. So I answered the question you posed, rather than the question you had.

Second, in R, nothing is equal to NA (you cannot be equal to something that is not available): a comparison with NA returns NA rather than TRUE. Instead there is a function, is.na, that you can use to test for NA values.

> vec <- c(1,2,3,NA,4,5,6)
> which(vec == "NA")
integer(0)
> which(vec == NA)
integer(0)
> which(is.na(vec))
[1] 4
> is.na(vec)
[1] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
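
The same behaviour probably explains why the data frame ended up empty. The sketch below uses a made-up one-column data frame (df, emptied, and kept are hypothetical names). The printed "<NA>" is only how R displays a missing value in a character or factor column, so comparing against the string "<NA>" matches nothing, which() returns integer(0), and a negative index built from integer(0) selects zero rows.

```r
# Toy data frame with one missing value in a character column
df <- data.frame(x = c(NA, "a", "b"), stringsAsFactors = FALSE)

# No element equals the literal string "<NA>", so which() finds nothing
idx <- which(df$x == "<NA>")
print(idx)

# -integer(0) is still integer(0), so this keeps zero rows instead of all rows
emptied <- df[-idx, , drop = FALSE]
print(nrow(emptied))

# A logical index built with is.na() drops only the row with the missing value
kept <- df[!is.na(df$x), , drop = FALSE]
print(nrow(kept))
```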

But this is a bit of a chicken/egg problem, no? How do you figure out what you need if you don't know what you need? There are two methods in my experience that are useful in this context. The first is apropos, which will give you all the functions that match what you query on. Do note that this is one of the rare instances where R is case-insensitive (so you can search on "na", "NA","Na" or even "nA" if you want to be crazy), and also that 'na' is going to match many things, so it may be a bit of an eyeballometric exercise. For legibility, I am going to cut out all the extra returned values.

> apropos("na")
<snip>
 [79] "is.na"
 [80] "is.na<-"
 [81] "is.na.data.frame"
 [82] "is.na<-.default"
 [83] "is.na<-.factor"
<snip>

So hypothetically you could have used apropos to find is.na, and gone with that.

An even more powerful method is to use Google. Almost any conceivable R question has already been asked and answered somewhere online, and a query of the form 'R remove NA rows' will almost surely come up with multiple reasonable links.
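
Because apropos treats its argument as a regular expression, the search can be narrowed to cut down on the eyeballing. This is only a small sketch; the variable name hits is hypothetical, and the exact set of matches depends on which packages are attached.

```r
# Anchor the pattern so only names starting with "is.na" are returned;
# the dot is escaped so it matches a literal "."
hits <- apropos("^is\\.na")
print(hits)
```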

 


Hi James,

You were right, my problem was that I was not as precise as possible. Thanks for your advice. I didn't know about the useful "apropos". I did use Google and found some answers, BUT as I'm a newbie, I thought the value was written <NA> and not NA alone. Of course, everything works fine now, except that I wasted my time and yours... Sorry.

 

 
