getBM
@andreia-fonseca-3796
Last seen 7.2 years ago
Dear Forum,

I am trying to get the entrezgene information for a list of ensembl_gene_id values. The command I am using is:

    test3 <- getBM(attributes = c("ensembl_gene_id", "entrezgene", "hgnc_symbol"),
                   filters = "ensembl_gene_id", values = dataHT[,1], mart = human)

The vector dataHT[,1] has 10,987 unique IDs, and the first ensembl_gene_id is ENSG00000000003, which corresponds to entrezgene 7105. The result has 18,533 rows and its first value is not 7105, so the result does not follow the order of the filter vector. I am also getting too many hits, with doubled lines carrying almost the same information (see the result below). Can someone help me with this?

    head(test3)
      ensembl_gene_id entrezgene hgnc_symbol
    1 ENSG00000198692       9086      EIF1AY
    2 ENSG00000198692         NA      EIF1AY
    3 ENSG00000101557       9097       USP14
    4 ENSG00000079134       9984       THOC1
    5 ENSG00000158270      81035     COLEC12
    6 ENSG00000079101      27098       CLUL1

Thanks,
Andreia
@steffenstatberkeleyedu-2907
Last seen 9.7 years ago
Dear Andreia,

The result of a biomaRt query is indeed not sorted according to the input. There is not a one-to-one mapping among the three identifiers you are querying, which partially explains the expansion in your result. In addition, Ensembl annotates everything at the transcript level, and some transcript IDs are mapped to hgnc_symbol/entrezgene IDs while others are not; that is why you get repetitive information and NAs. For example, your result contains:

      ensembl_gene_id entrezgene hgnc_symbol
    1 ENSG00000198692       9086      EIF1AY
    2 ENSG00000198692         NA      EIF1AY

This looks like repetitive information with an NA, but if you add ensembl_transcript_id to your query you get:

    getBM(attributes = c("ensembl_gene_id", "ensembl_transcript_id", "entrezgene", "hgnc_symbol"),
          filters = "ensembl_gene_id", values = "ENSG00000198692", mart = human)

      ensembl_gene_id ensembl_transcript_id entrezgene hgnc_symbol
    1 ENSG00000198692       ENST00000382772         NA      EIF1AY
    2 ENSG00000198692       ENST00000361365       9086      EIF1AY

As you can see, transcript ENST00000382772 is not associated with entrezgene ID 9086, but transcript ENST00000361365 of the same gene is.

To avoid getting NAs and duplication, I would do your query in two steps and combine the results in R.
1) Get a map from ensembl_gene_id to entrezgene:

    map1 = getBM(attributes = c("ensembl_gene_id", "entrezgene"),
                 filters = c("with_entrezgene", "ensembl_gene_id"),
                 values = list(TRUE, dataHT[,1]), mart = human)

2) Get a map from ensembl_gene_id to hgnc_symbol:

    map2 = getBM(attributes = c("ensembl_gene_id", "hgnc_symbol"),
                 filters = c("with_hgnc", "ensembl_gene_id"),
                 values = list(TRUE, dataHT[,1]), mart = human)

Cheers,
Steffen
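The answer leaves the "combine the results in R" step unspecified. One possible way to do it, using only base R, is sketched below. The small data frames are hypothetical stand-ins for map1 and map2 (no BioMart connection is made), and query_ids stands in for dataHT[,1]. The missing HGNC entry for the first gene is deliberate toy data, there only to show how gaps surface as NA. Keeping only the first mapping per gene is an assumption of this sketch, not part of the original advice, and it discards any additional mappings.

```r
# Hypothetical stand-in for the query vector dataHT[,1]
query_ids <- c("ENSG00000000003", "ENSG00000198692", "ENSG00000101557")

# Hypothetical stand-ins for the two getBM() results, in arbitrary order
map1 <- data.frame(
  ensembl_gene_id = c("ENSG00000198692", "ENSG00000101557", "ENSG00000000003"),
  entrezgene      = c(9086, 9097, 7105),
  stringsAsFactors = FALSE
)
map2 <- data.frame(
  ensembl_gene_id = c("ENSG00000101557", "ENSG00000198692"),  # first gene omitted on purpose
  hgnc_symbol     = c("USP14", "EIF1AY"),
  stringsAsFactors = FALSE
)

# Full outer join on the gene ID, so a gene present in only one map is kept
combined <- merge(map1, map2, by = "ensembl_gene_id", all = TRUE)

# Assumption: one row per gene is wanted, so keep the first mapping found
combined <- combined[!duplicated(combined$ensembl_gene_id), ]

# Restore the order of the query vector; IDs absent from both maps become NA rows
ordered <- combined[match(query_ids, combined$ensembl_gene_id), ]
ordered$ensembl_gene_id <- query_ids
rownames(ordered) <- NULL

print(ordered)
```

If every mapping should be kept instead, the `duplicated()` step can be dropped, but `match()` then returns only the first row per gene, so a different reordering approach would be needed.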