ENSEMBL causing issues replicating previous results
snguyen268 • 0
@snguyen268-13912
Last seen 6.6 years ago

Hi all,

I have a rather novice question:

I am aware that ENSEMBL recently released a new version (ENSEMBL 91). Since then I have been having trouble reproducing my results from the GO analysis (using the topGO package). This could also be due to my upgrading Bioconductor, but I think that is less likely.

I am attaching the 2 sets of results here (original and new). For the new results, I even tried using the archived version of ENSEMBL. However, the results are still not the same.

For example, in the "Annotated" column, the term "mRNA catabolic process" has 218 genes in the original but 313 in the new result. This must be due to the ENSEMBL database, because both runs use the same input data files.

Does anyone have any advice or suggestions?

    library(biomaRt)
    library(topGO)

    # Query the August 2017 Ensembl archive for gene-to-GO annotations
    bm <- useMart("ensembl", host = "http://aug2017.archive.ensembl.org",
                  dataset = "hsapiens_gene_ensembl")
    EG2GO <- getBM(mart = bm,
                   attributes = c("ensembl_gene_id", "external_gene_name", "go_id"))
    # Drop rows without a GO annotation
    EG2GO <- EG2GO[EG2GO$go_id != "", ]
    # Map each gene name to its GO IDs
    name2GO <- by(EG2GO$go_id, EG2GO$external_gene_name,
                  function(x) as.character(x))

    # Gene universe and genes of interest, coded as a 0/1 factor
    geneuniverse <- as.character(rownames(LN_cells_no_zero))
    geneofinterest <- as.character(ductruong_l0_norm$Significant_Genes)
    genelist <- factor(as.integer(geneuniverse %in% geneofinterest))
    names(genelist) <- geneuniverse
    godata <- new("topGOdata", description = "Significant_Genes_L0norm",
                  allGenes = genelist, ontology = "BP", nodeSize = 10,
                  annot = annFUN.gene2GO, gene2GO = name2GO)

New data

Original

R ENSEMBL topGO • 1.4k views

I am having the same issue: I can't run my old code, and no dataset can be found or loaded. I hope someone can help.

@martin-morgan-1513
Last seen 4 days ago
United States

See A: biomaRt is not working

snguyen268 • 0
@snguyen268-13912
Last seen 6.6 years ago

Hi all,

Thank you for your answer.

I have upgraded biomaRt to the newest version. It seems that the GO annotation is somehow different, which affects my results. Does ENSEMBL modify its archive servers as well?


I don't think Ensembl changes the content of the archive sites; otherwise, what would be the point of maintaining the archives?

I think this may be a more insidious problem related to how the BioMart server responds to 'really big' queries. I've encountered problems in the past where the query essentially times out and returns whatever it has retrieved up to that point, with no obvious indication that this occurred. This only happens when you're retrieving a large amount of data, but GO terms for all genes (since you're not using a filter) might well be large enough to trigger it.

I'm currently trying to think of the best way to check whether it's affected you, and to prevent it happening via changes to biomaRt.
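
One possible way to reduce the risk of a silently truncated result is to split the request into smaller batches and check that every requested ID comes back. The following is only a minimal sketch, not a biomaRt feature: `fetch_in_batches`, `mock_fetch`, and the batch size of 500 are hypothetical names and values chosen for illustration, and the mock stands in for a real BioMart query so the example runs offline.

```r
# Run `fetch` on successive chunks of `values` and row-bind the results.
# `fetch` is any function that takes a vector of IDs and returns a data frame.
fetch_in_batches <- function(values, fetch, batch_size = 500) {
  values <- unique(values)
  batches <- split(values, ceiling(seq_along(values) / batch_size))
  parts <- lapply(batches, fetch)
  out <- do.call(rbind, parts)
  rownames(out) <- NULL
  out
}

# Toy stand-in for a BioMart query (assumption: the real query would return
# at least one row per requested gene ID).
mock_fetch <- function(ids) {
  data.frame(ensembl_gene_id = ids, go_id = "GO:0000000",
             stringsAsFactors = FALSE)
}

ids <- sprintf("ENSG%011d", 1:1200)
res <- fetch_in_batches(ids, mock_fetch, batch_size = 500)

# IDs that never came back are worth a second look: they may simply lack
# annotation, or a batch may have been cut short.
missing_ids <- setdiff(ids, res$ensembl_gene_id)

# With biomaRt, `fetch` could plausibly be written as:
# fetch <- function(ids) getBM(mart = bm, filters = "ensembl_gene_id",
#                              values = ids,
#                              attributes = c("ensembl_gene_id",
#                                             "external_gene_name", "go_id"))
```

The list of gene IDs itself is a much smaller query than the full gene-to-GO table, so it could be fetched once up front and then used to drive the batches.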


Just to confirm that we do not update the Ensembl archive sites as they are snapshots of a given release.

Kind Regards,

Thomas


Hi all,

Thanks for your reply. I don't think it is the Ensembl archive sites either because most of the results are pretty much similar to what I got before.

However, there are differences that I can't really reconcile.

For example, the term "mRNA catabolic process" had 218 annotated genes before; now it has 313. To me this looks like something about the query rather than anything to do with my data.
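
If the gene-to-GO tables from both runs are available (for example, saved to disk after each `getBM` call), a direct comparison might help narrow down where the difference comes from. The sketch below is an illustration under that assumption: `diff_annotations` and the toy tables are hypothetical, and only the column names `external_gene_name` and `go_id` come from the query above. Note that topGO's "Annotated" count also includes genes annotated to descendant terms, restricted to the gene universe, so this is a first check rather than an exact reproduction of that column.

```r
# Return the gene/GO pairs that appear in only one of two annotation tables.
diff_annotations <- function(old, new) {
  key <- function(df) {
    unique(paste(df$external_gene_name, df$go_id, sep = "|"))
  }
  list(only_old = setdiff(key(old), key(new)),
       only_new = setdiff(key(new), key(old)))
}

# Toy data standing in for two saved getBM results.
old <- data.frame(external_gene_name = c("GENE_A", "GENE_B"),
                  go_id = c("GO:0000001", "GO:0000001"),
                  stringsAsFactors = FALSE)
new <- data.frame(external_gene_name = c("GENE_A", "GENE_B", "GENE_C"),
                  go_id = c("GO:0000001", "GO:0000001", "GO:0000001"),
                  stringsAsFactors = FALSE)

d <- diff_annotations(old, new)
# d$only_new lists pairs gained in the new table; d$only_old lists pairs lost.
```

If one table turns out to be a strict subset of the other, that would be consistent with one of the queries having been truncated.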
