Question: ENSEMBL causing issues replicating previous results
snguyen268 wrote, 8 months ago:

Hi all,

I have quite a novice question:

I am aware that Ensembl has recently released a new version (Ensembl 91). Since then I have been having trouble reproducing my results from the GO analysis (using the topGO package). This could also be due to my upgrading Bioconductor, but I think that is less likely.

I am attaching the two sets of results here (original and new). For the new results, I even tried using the archived version of Ensembl. However, the results are still not the same.

For example, in the Annotated column, the term "mRNA catabolic process" has 218 annotated genes in the original but 313 in the new result. I assume this must be due to the Ensembl database, because both runs use the same input data files.

Does anyone have any advice or suggestions?

    library(biomaRt)
    library(topGO)

    # Connect to the August 2017 Ensembl archive
    bm <- useMart("ensembl", host = "http://aug2017.archive.ensembl.org",
                  dataset = "hsapiens_gene_ensembl")
    # Retrieve gene-to-GO annotations and drop rows without a GO term
    EG2GO <- getBM(mart = bm,
                   attributes = c('ensembl_gene_id', 'external_gene_name', 'go_id'))
    EG2GO <- EG2GO[EG2GO$go_id != '', ]
    # Map each gene name to its GO terms
    name2GO <- by(EG2GO$go_id, EG2GO$external_gene_name,
                  function(x) as.character(x))

    # Flag each gene in the universe as significant (1) or not (0)
    geneuniverse <- as.character(rownames(LN_cells_no_zero))
    geneofinterest <- as.character(ductruong_l0_norm$Significant_Genes)
    genelist <- factor(as.integer(geneuniverse %in% geneofinterest))
    names(genelist) <- geneuniverse
    # Build the topGO object for the Biological Process ontology
    godata <- new("topGOdata", description = "Significant_Genes_L0norm",
                  allGenes = genelist, ontology = "BP", nodeSize = 10,
                  annot = annFUN.gene2GO, gene2GO = name2GO)

[Image: new results]

[Image: original results]

Modified 7 months ago • written 8 months ago by snguyen268

I am having the same issue: I can't run my old code, and no dataset can be found or loaded. I hope someone can help.

Reply written 8 months ago by shenwei13760
Martin Morgan ♦♦ 22k (United States) wrote, 8 months ago:

See A: biomaRt is not working

Written 8 months ago by Martin Morgan ♦♦ 22k
snguyen268 wrote, 7 months ago:

Hi all,

Thank you for your answer.

I have upgraded biomaRt to the newest version. It seems that the GO annotation is somehow different, and that this affects my results. Does Ensembl modify its archive servers as well?

Written 7 months ago by snguyen268

I don't think that Ensembl change the content of the archive sites; otherwise, what's the point in maintaining the archives?

I think this may be a more insidious problem related to how the BioMart server responds to 'really big' queries. I've encountered problems in the past where the query essentially times out and returns whatever it has retrieved up to that point, with no obvious indication that this occurred. This only happens when you're getting a large amount of data, but GO terms for all genes (since you're not using a filter) might well be large enough to trigger it.

I'm currently trying to think of the best way to check whether it's affected you, and to prevent it happening via changes to biomaRt.

Reply written 7 months ago by Mike Smith (2.8k)
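
One possible way to check for the silent truncation described above is to repeat the request in smaller pieces and see whether the pieces contain rows that the single large query lacks. The sketch below is only an illustration under assumptions: the helper names `fetch_go_by_chromosome` and `missing_rows` are hypothetical, splitting by the `chromosome_name` filter is just one convenient way to keep each request small, and the default chromosome list assumes the human dataset. The functions need the biomaRt package and network access only when `fetch_go_by_chromosome` is actually called.

```r
# Hypothetical helper: fetch gene-to-GO rows one chromosome at a time so that
# each BioMart request stays small. `mart` is a Mart object from useMart().
fetch_go_by_chromosome <- function(mart,
                                   chromosomes = c(1:22, "X", "Y", "MT")) {
  attrs <- c("ensembl_gene_id", "external_gene_name", "go_id")
  chunks <- lapply(chromosomes, function(chr) {
    biomaRt::getBM(mart = mart, attributes = attrs,
                   filters = "chromosome_name", values = chr)
  })
  # Stack the per-chromosome results and drop exact duplicate rows
  unique(do.call(rbind, chunks))
}

# Hypothetical helper: rows found by the chunked queries but absent from the
# single bulk query. A non-empty result suggests the bulk query was cut short.
missing_rows <- function(bulk, chunked) {
  key <- function(df) paste(df$ensembl_gene_id, df$go_id, sep = "|")
  chunked[!key(chunked) %in% key(bulk), , drop = FALSE]
}

# Example use (not run here; requires network access):
# bm      <- biomaRt::useMart("ensembl",
#                             host = "http://aug2017.archive.ensembl.org",
#                             dataset = "hsapiens_gene_ensembl")
# bulk    <- biomaRt::getBM(mart = bm,
#              attributes = c("ensembl_gene_id", "external_gene_name", "go_id"))
# chunked <- fetch_go_by_chromosome(bm)
# nrow(missing_rows(bulk, chunked))
```

The comparison is deliberately one-directional: genes on patches or unplaced scaffolds are not covered by the default chromosome list, so the bulk result may legitimately contain rows that the chunked result does not.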

Just to confirm that we do not update the Ensembl archive sites as they are snapshots of a given release.

Kind Regards,

Thomas

Reply written 7 months ago by Thomas Maurel (740)

Hi all,

Thanks for your replies. I don't think it is the Ensembl archive sites either, because most of the results are very similar to what I got before.

However, there are differences that I can't really reconcile.

For example, the term "mRNA catabolic process" had 218 annotated genes before; now it has 313. To me this looks like something to do with the query rather than anything to do with my data.

Reply written 7 months ago by snguyen268
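
To narrow down where a difference like 218 versus 313 comes from, one option is to compare the gene-to-GO table from the original run with the one retrieved now. This is a minimal sketch that assumes the original `EG2GO` table was saved somewhere; `diff_go_annotations` is a hypothetical helper, not part of biomaRt or topGO, and it expects the `external_gene_name` and `go_id` columns returned by the `getBM()` call in the question.

```r
# Hypothetical helper: list the gene/GO-term pairs that appear in only one of
# two annotation tables. Rows with an empty go_id are ignored, as in the
# original script.
diff_go_annotations <- function(old, new) {
  key <- function(df) {
    df <- df[df$go_id != "", , drop = FALSE]
    unique(paste(df$external_gene_name, df$go_id, sep = "|"))
  }
  old_keys <- key(old)
  new_keys <- key(new)
  list(
    only_in_old = setdiff(old_keys, new_keys),
    only_in_new = setdiff(new_keys, old_keys)
  )
}

# Example use, assuming both tables are available as data frames:
# d <- diff_go_annotations(EG2GO_original, EG2GO)
# length(d$only_in_old); length(d$only_in_new)
```

If both lists come back empty, the direct annotations are identical and the difference must arise elsewhere. One candidate is the GO graph itself: topGO counts a gene as annotated to a term when it is annotated to any of that term's descendants, and it takes the graph from the installed GO.db package, which changes with a Bioconductor upgrade.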