Question: getBM
10.0 years ago by Andreia Fonseca wrote:
Dear Forum,

I am trying to get the entrezgene information for a list of ensembl_gene_id values. The command that I am using is:

    test3 <- getBM(attributes = c("ensembl_gene_id", "entrezgene", "hgnc_symbol"),
                   filters = "ensembl_gene_id", values = dataHT[,1], mart = human)

The list dataHT[,1] has 10,987 unique ids, and the first ensembl_gene_id is ENSG00000000003, which corresponds to entrezgene 7105. The result has 18,533 rows and its first value is not 7105, so the rows do not follow the order of the filter vector. I am also getting too many hits, with doubled lines that carry almost the same information (see the result below). Can someone help me with this?

    > head(test3)
      ensembl_gene_id entrezgene hgnc_symbol
    1 ENSG00000198692       9086      EIF1AY
    2 ENSG00000198692         NA      EIF1AY
    3 ENSG00000101557       9097       USP14
    4 ENSG00000079134       9984       THOC1
    5 ENSG00000158270      81035     COLEC12
    6 ENSG00000079101      27098       CLUL1

Thanks,
Andreia
Answer: getBM
10.0 years ago by steffen@stat.Berkeley.EDU wrote:
Dear Andreia,

The result of a biomaRt query is indeed not sorted according to the input. There is not a one-to-one mapping between all three identifiers you are querying for, which partially explains the expansion in your result. In addition, Ensembl annotates everything at the transcript level, and some transcript ids are mapped to hgnc_symbols/entrezgene ids while others are not (that is why you get repetitive information and NAs). For example, in your result you have:

      ensembl_gene_id entrezgene hgnc_symbol
    1 ENSG00000198692       9086      EIF1AY
    2 ENSG00000198692         NA      EIF1AY

This looks like repetitive information with an NA, but if you add the ensembl_transcript_id to your query you get:

    getBM(attributes = c("ensembl_gene_id", "ensembl_transcript_id", "entrezgene", "hgnc_symbol"),
          filters = "ensembl_gene_id", values = "ENSG00000198692", mart = human)

      ensembl_gene_id ensembl_transcript_id entrezgene hgnc_symbol
    1 ENSG00000198692       ENST00000382772         NA      EIF1AY
    2 ENSG00000198692       ENST00000361365       9086      EIF1AY

As you can see, the transcript ENST00000382772 is not associated with the entrezgene id 9086, but the transcript ENST00000361365 of that same gene is.

To avoid getting NAs and duplication, I would do your query in two steps and combine the results in R.
1) get a map from ensembl_gene_id to entrezgene

    map1 = getBM(attributes = c("ensembl_gene_id", "entrezgene"),
                 filters = c("with_entrezgene", "ensembl_gene_id"),
                 values = list(TRUE, dataHT[,1]), mart = human)

2) get a map from ensembl_gene_id to hgnc_symbol

    map2 = getBM(attributes = c("ensembl_gene_id", "hgnc_symbol"),
                 filters = c("with_hgnc", "ensembl_gene_id"),
                 values = list(TRUE, dataHT[,1]), mart = human)

Cheers,
Steffen
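The "combine the results in R" step is not spelled out in the answer, so here is one possible sketch of it, not necessarily what Steffen had in mind. It uses small hand-built data frames as stand-ins for the two getBM() results so that it runs without a mart connection. The ids and values are the ones quoted in this thread; the `query` vector and the names `combined` and `result` are made up for illustration, and ENSG00000000003 is deliberately left out of the toy `map2` only to show how a missing mapping appears.

```r
# Stand-in for dataHT[,1]: the ids in the order we want the result to follow.
query <- c("ENSG00000000003", "ENSG00000198692",
           "ENSG00000101557", "ENSG00000079134")

# Toy stand-ins for the two getBM() results (rows arrive in arbitrary order).
map1 <- data.frame(
  ensembl_gene_id = c("ENSG00000198692", "ENSG00000101557",
                      "ENSG00000079134", "ENSG00000000003"),
  entrezgene      = c(9086, 9097, 9984, 7105),
  stringsAsFactors = FALSE
)
map2 <- data.frame(
  ensembl_gene_id = c("ENSG00000079134", "ENSG00000198692", "ENSG00000101557"),
  hgnc_symbol     = c("THOC1", "EIF1AY", "USP14"),
  stringsAsFactors = FALSE
)

# Drop exact duplicate rows, which can remain after transcript-level expansion.
map1 <- unique(map1)
map2 <- unique(map2)

# Full outer join on the gene id: ids present in only one map are kept with NA.
combined <- merge(map1, map2, by = "ensembl_gene_id", all = TRUE)

# Put the rows back in query order. match() returns the FIRST matching row
# per id, so a gene with several entrezgene ids keeps only one of them here.
result <- combined[match(query, combined$ensembl_gene_id), ]
rownames(result) <- NULL

# Ids with no hit in either map come back as all-NA rows; restore their id.
result$ensembl_gene_id <- query

print(result)
```

If a gene maps to more than one entrezgene id and you need all of them, skip the match() step and use something like `combined[order(match(combined$ensembl_gene_id, query)), ]` instead, which keeps every row while still following the query order.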