Question: ENSEMBL causing issues replicating previous results
snguyen2680 wrote (15 months ago):

Hi all,

I have quite a novice question:

I am aware that Ensembl has recently released a new version (Ensembl 91). Since then I have been unable to reproduce the results of my GO analysis (using the topGO package). This could also be because I upgraded Bioconductor, but I think that is less likely.

I have attached the two sets of results (original and new). For the new results I even tried using the archived version of Ensembl, but the results are still not the same.

For example, in the Annotated column, the term "mRNA catabolic process" has 218 annotated genes in the original result but 313 in the new one. Since both runs use the same input data files, I assume the difference must come from the Ensembl database.

library(biomaRt)
library(topGO)

# Gene-to-GO mapping from the August 2017 Ensembl archive
bm <- useMart("ensembl", host = "http://aug2017.archive.ensembl.org",
              dataset = "hsapiens_gene_ensembl")
EG2GO <- getBM(mart = bm,
               attributes = c('ensembl_gene_id', 'external_gene_name', 'go_id'))
# Keep only rows that actually carry a GO ID
EG2GO <- EG2GO[EG2GO$go_id != '', ]
name2GO <- by(EG2GO$go_id, EG2GO$external_gene_name, function(x) as.character(x))

# Universe = genes in the expression matrix; interesting = significant genes
geneuniverse <- as.character(rownames(LN_cells_no_zero))
geneofinterest <- as.character(ductruong_l0_norm$Significant_Genes)
genelist <- factor(as.integer(geneuniverse %in% geneofinterest))
names(genelist) <- geneuniverse

godata <- new("topGOdata", description = "Significant_Genes_L0norm",
              allGenes = genelist, ontology = "BP", nodeSize = 10,
              annot = annFUN.gene2GO, gene2GO = name2GO)
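For comparison, the gene-to-GO mapping that `annFUN.gene2GO` takes as `gene2GO` (a list named by gene, each element a character vector of GO IDs) can also be built with `split()`. The sketch below uses a small hypothetical data frame shaped like the `getBM()` output above, not real Ensembl data. Because the mapping is keyed on `external_gene_name`, rows from different Ensembl gene IDs that share a name are merged, and `unique()` drops the resulting repeated GO IDs.

```r
# Hypothetical rows shaped like the getBM() result above (toy data)
EG2GO <- data.frame(
  ensembl_gene_id    = c("ENSG01", "ENSG01", "ENSG02", "ENSG03", "ENSG04"),
  external_gene_name = c("GENEA",  "GENEA",  "GENEB",  "GENEA",  "GENEC"),
  go_id              = c("GO:0006402", "GO:0006413", "GO:0006402",
                         "GO:0006402", ""),
  stringsAsFactors   = FALSE
)

# Drop rows without a GO annotation (GENEC disappears here)
EG2GO <- EG2GO[EG2GO$go_id != "", ]

# Named list: gene name -> unique GO IDs (ENSG01 and ENSG03 both map to GENEA)
name2GO <- lapply(split(EG2GO$go_id, EG2GO$external_gene_name), unique)

str(name2GO)
```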


I am having the same issue: I can't run my old code, and no dataset can be found or loaded. I hope someone can help.

Answer: ENSEMBL causing issues replicating previous results
Martin Morgan wrote (15 months ago):
Answer: ENSEMBL causing issues replicating previous results
snguyen2680 wrote (15 months ago):

Hi all,

I have upgraded biomaRt to the newest version. It seems that the GO annotation has changed somehow, and this affects my results. Does Ensembl modify their archive servers as well?

I don't think Ensembl changes the content of the archive sites; otherwise, what would be the point of maintaining the archives?

I think this may be a more insidious problem related to how the BioMart server responds to very large queries. I've encountered problems in the past where the query essentially times out and returns whatever it has retrieved up to that point, with no obvious indication that this occurred. This only happens when you're retrieving a large amount of data, but GO terms for all genes (since you're not using a filter) might well be large enough to trigger it.
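One way to make this kind of silent truncation visible, sketched here as an assumption rather than anything biomaRt does itself, is to split the query into chunks of gene IDs and check that every requested ID comes back. `fetch_in_chunks` and `mock_fetch` are hypothetical names; the mock imitates an incomplete response so the check can be shown without network access. The commented lines at the end show how it might be wired to `getBM()` with the `ensembl_gene_id` filter.

```r
# Hypothetical helper (not part of biomaRt): run a query in chunks of gene IDs
# and flag requested IDs for which no rows came back at all.
# 'fetch' must take a character vector of IDs and return a data frame that
# has an 'ensembl_gene_id' column.
fetch_in_chunks <- function(ids, fetch, chunk_size = 500) {
  ids <- unique(ids)
  # Consecutive groups of at most 'chunk_size' IDs
  chunks <- split(ids, ceiling(seq_along(ids) / chunk_size))
  res <- do.call(rbind, lapply(chunks, fetch))
  rownames(res) <- NULL
  missing <- setdiff(ids, res$ensembl_gene_id)
  if (length(missing) > 0) {
    warning(length(missing), " requested ID(s) returned no rows")
  }
  attr(res, "missing_ids") <- missing
  res
}

# Toy run: a mock fetch function that silently drops one ID,
# imitating an incomplete server response
mock_fetch <- function(v) {
  v <- setdiff(v, "ENSG03")
  data.frame(ensembl_gene_id = v, go_id = rep("GO:0006402", length(v)),
             stringsAsFactors = FALSE)
}
out <- suppressWarnings(
  fetch_in_chunks(paste0("ENSG0", 1:5), mock_fetch, chunk_size = 2)
)
attr(out, "missing_ids")

# Possible use with biomaRt (not run here; needs network access):
# all_ids <- getBM(attributes = "ensembl_gene_id", mart = bm)$ensembl_gene_id
# EG2GO <- fetch_in_chunks(all_ids, function(v)
#   getBM(attributes = c("ensembl_gene_id", "external_gene_name", "go_id"),
#         filters = "ensembl_gene_id", values = v, mart = bm))
```

Smaller chunks make each request less likely to hit a server limit, at the cost of more round trips; the chunk size of 500 is an arbitrary choice.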

I'm currently trying to think of the best way to check whether it's affected you, and to prevent it happening via changes to biomaRt.
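Independently of any change to biomaRt, a user-side safeguard for reproducibility is to save the annotation table once (after checking that it is complete) and reuse the saved file in later runs instead of querying the server again. The sketch below is a hypothetical helper, `cached_query`, built only on base R `saveRDS()`/`readRDS()`; the file path and query function are placeholders.

```r
# Hypothetical helper: return the cached result if the file exists,
# otherwise run the query once and save its result for later runs.
cached_query <- function(path, query) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  res <- query()
  saveRDS(res, path)
  res
}

# Toy run: the query function counts how often it is actually called
calls <- 0
toy_query <- function() {
  calls <<- calls + 1
  data.frame(external_gene_name = "GENEA", go_id = "GO:0006402",
             stringsAsFactors = FALSE)
}
cache_file <- tempfile(fileext = ".rds")
first  <- cached_query(cache_file, toy_query)  # runs the query and saves it
second <- cached_query(cache_file, toy_query)  # read back from disk

# Possible use (not run): cached_query("EG2GO_aug2017.rds", function()
#   getBM(mart = bm, attributes = c("ensembl_gene_id",
#                                   "external_gene_name", "go_id")))
```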

Just to confirm: we do not update the Ensembl archive sites, as they are snapshots of a given release.

Kind Regards,

Thomas

Hi all,

Thanks for your replies. I don't think the Ensembl archive sites are the cause either, because most of the results are very similar to what I got before.

However, there are differences that I can't really reconcile.

For example, the term "mRNA catabolic process" had 218 annotated genes before and now has 313. To me this looks like something to do with the query rather than with my data.
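To see where such a difference comes from, one option is to save the gene-to-GO mapping from each run and compare which genes in the universe carry a given GO ID in each. The sketch below assumes both mappings are named lists like `name2GO`; `compare_term` and the toy inputs are hypothetical. It counts direct annotations only, whereas topGO's Annotated column also counts genes annotated to descendant terms, so its numbers will generally be smaller than the ones in the results table.

```r
# Hypothetical helper: genes in 'universe' directly annotated to 'go_id'
# in an old and a new gene-to-GO mapping (named lists of GO IDs).
compare_term <- function(go_id, old_map, new_map, universe) {
  direct <- function(m) {
    m <- m[intersect(names(m), universe)]
    names(m)[vapply(m, function(g) go_id %in% g, logical(1))]
  }
  old_genes <- direct(old_map)
  new_genes <- direct(new_map)
  list(only_old = setdiff(old_genes, new_genes),
       only_new = setdiff(new_genes, old_genes),
       n_old = length(old_genes),
       n_new = length(new_genes))
}

# Toy mappings: GENEB gains the term; GENED is outside the universe
old_map <- list(GENEA = "GO:0006402", GENEB = "GO:0006413",
                GENEC = "GO:0006402")
new_map <- list(GENEA = "GO:0006402", GENEB = c("GO:0006402", "GO:0006413"),
                GENEC = "GO:0006402", GENED = "GO:0006402")
cmp <- compare_term("GO:0006402", old_map, new_map,
                    universe = c("GENEA", "GENEB", "GENEC"))
cmp$only_new
```

If `only_old` and `only_new` list many genes, the mappings themselves differ; if they are empty, the difference lies elsewhere (for example, in the GO graph used by topGO).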