Question: How to choose an annotation file from the AnnotationHub web server
25 days ago, 15958021290 wrote:

Hey, guys. I ran into a problem choosing an annotation file from the AnnotationHub web server. I want to do GO enrichment analysis of Oryza sativa. I found the 3 latest (2019-10-29) annotation files on the AnnotationHub web server at https://annotationhub.bioconductor.org/species/Oryza%20sativa .

AH75915: 35257 EntrezID genes, 346 unique GO terms

AH75916: 35574 EntrezID genes, 346 unique GO terms

AH75917: 35257 EntrezID genes, 345 unique GO terms

They are the same files you get when using AnnotationHub in R (3.6.1). I downloaded all 3 annotation files and checked the total number of EntrezID genes and the total number of unique GO terms. The 3 files actually differ in these two parameters. But when I use the same gene list for GO enrichment, the results are mostly the same, and I don't know which criterion to base my choice on. Are more genes better? Or are more unique GO terms better? I hope someone can help. Thanks in advance!

Tags: annotation, annotationhub

May I ask what code you used to determine the differences?
For instance when I grab the ENTREZID column I get 35257 for all three:

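library(AnnotationHub)
ah = AnnotationHub()   # connect to the hub (uses the local cache when available)
one = ah[["AH75915"]]
two = ah[["AH75916"]]
three = ah[["AH75917"]]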

> length(keys(one, keytype="ENTREZID"))
[1] 35257
> length(keys(two, keytype="ENTREZID"))
[1] 35257
> length(keys(three, keytype="ENTREZID"))
[1] 35257

I'll look into the code that generated the files to see whether they were built differently, but first please provide the code you used to find the differences.
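For reference, both counts from the question (EntrezID genes and unique GO terms) could be reproduced along the following lines. This is only a sketch, not the code the original poster ran: `count_keys` is a hypothetical helper name, and it assumes each OrgDb object exposes a "GO" keytype, which can be checked with keytypes(one).

```r
# Hypothetical helper: count keys of several keytypes for a set of hub records.
# Assumes the AnnotationHub and AnnotationDbi packages are installed and that
# every requested keytype exists in each OrgDb (check with keytypes()).
count_keys <- function(ids,
                       keytypes = c("ENTREZID", "GO"),
                       hub = AnnotationHub::AnnotationHub()) {
  sapply(ids, function(id) {
    db <- hub[[id]]                      # download or load the record from cache
    sapply(keytypes, function(kt) {
      length(AnnotationDbi::keys(db, keytype = kt))
    })
  })
}

# Usage (needs network access or a populated cache):
# count_keys(c("AH75915", "AH75916", "AH75917"))
```

The result is a matrix with one column per record and one row per keytype, so differing counts would show up side by side.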

written 24 days ago by shepherl

Looks pretty identical to me:

> d.f <- do.call(rbind, lapply(dbListTables(dbconn(one)), 
        function(x) sapply(c(one, two, three),
           function(y) dbGetQuery(dbconn(y), paste0("select count(*) from ", x, ";")))))
> rownames(d.f) <- dbListTables(dbconn(one))
> colnames(d.f) <- c("one","two","three")
> d.f
             one    two    three 
accessions   208558 208558 208558
alias        50651  50651  50651 
chromosomes  35091  35091  35091 
entrez_genes 35257  35257  35257 
gene_info    35257  35257  35257 
genes        35257  35257  35257 
go           7995   7995   7995  
go_all       83477  83477  83477 
go_bp        622    622    622   
go_bp_all    11248  11248  11248 
go_cc        6607   6607   6607  
go_cc_all    66906  66906  66906 
go_mf        766    766    766   
go_mf_all    5323   5323   5323  
map_counts   0      0      0     
map_metadata 0      0      0     
metadata     8      8      8     
pubmed       20462  20462  20462 
refseq       96029  96029  96029 

There could still be some differences, but I can't imagine the row counts would be identical for every table if the contents of those tables were different. However:

> sapply(dbListTables(dbconn(one)), function(x) all.equal(dbGetQuery(dbconn(one), paste("select * from", x))[,2], dbGetQuery(dbconn(two), paste("select * from", x))[,2]))
           accessions                 alias           chromosomes 
               "TRUE"                "TRUE"                "TRUE" 
         entrez_genes             gene_info                 genes 
               "TRUE"                "TRUE"                "TRUE" 
                   go                go_all                 go_bp 
               "TRUE"                "TRUE"                "TRUE" 
            go_bp_all                 go_cc             go_cc_all 
               "TRUE"                "TRUE"                "TRUE" 
                go_mf             go_mf_all            map_counts 
               "TRUE"                "TRUE"                "TRUE" 
         map_metadata              metadata                pubmed 
               "TRUE" "2 string mismatches"                "TRUE" 
               refseq 
               "TRUE" 
> sapply(dbListTables(dbconn(one)), function(x) all.equal(dbGetQuery(dbconn(one), paste("select * from", x))[,2], dbGetQuery(dbconn(three), paste("select * from", x))[,2]))
           accessions                 alias           chromosomes 
               "TRUE"                "TRUE"                "TRUE" 
         entrez_genes             gene_info                 genes 
               "TRUE"                "TRUE"                "TRUE" 
                   go                go_all                 go_bp 
               "TRUE"                "TRUE"                "TRUE" 
            go_bp_all                 go_cc             go_cc_all 
               "TRUE"                "TRUE"                "TRUE" 
                go_mf             go_mf_all            map_counts 
               "TRUE"                "TRUE"                "TRUE" 
         map_metadata              metadata                pubmed 
               "TRUE" "2 string mismatches"                "TRUE" 
               refseq 
               "TRUE" 
> 
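The only mismatch reported is in the metadata table. To see which rows actually differ, something like the sketch below could be used. Note that the all.equal() calls above compare only the second column of each table, while this helper compares every column. `diff_rows` is a hypothetical name, and the data frames in the example are toy values, not the real metadata contents.

```r
# Hypothetical helper: return the rows where two data frames with the same
# layout disagree in any column. NA comparisons are ignored (na.rm = TRUE),
# so a cell that is NA in either input is never reported as a difference.
diff_rows <- function(a, b) {
  stopifnot(identical(dim(a), dim(b)))
  differs <- rowSums(as.matrix(a) != as.matrix(b), na.rm = TRUE) > 0
  cbind(
    row = which(differs),
    setNames(a[differs, , drop = FALSE], paste0(names(a), ".a")),
    setNames(b[differs, , drop = FALSE], paste0(names(b), ".b"))
  )
}

# Toy example (made-up values, for illustration only)
m1 <- data.frame(name = c("k1", "k2", "k3"),
                 value = c("a", "b", "c"),
                 stringsAsFactors = FALSE)
m2 <- m1
m2$value[2] <- "B"
diff_rows(m1, m2)

# With the objects from this thread, assuming DBI is attached:
# diff_rows(dbGetQuery(dbconn(one), "select * from metadata"),
#           dbGetQuery(dbconn(two), "select * from metadata"))
```

If the differing rows turn out to be bookkeeping entries rather than annotation data, that would support treating the three records as interchangeable for GO enrichment.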
written 24 days ago by James W. MacDonald