Biomart annotation
@jarod_v6liberoit-6654
Last seen 5.8 years ago
Italy
Hi there!

I need to convert all my Ensembl gene IDs to HGNC symbols and Entrez gene IDs. My Ensembl release is n°72.

I use this script:

dif.DEs$ensembl <- sapply(strsplit(rownames(dif.DEs), split="nn+"), "[", 1)
# use biomart
library("biomaRt")
ensembl <- useMart(host = "jun2013.archive.ensembl.org",
                   biomart = "ENSEMBL_MART_ENSEMBL",
                   dataset = "hsapiens_gene_ensembl")
genemap <- getBM(attributes = c("ensembl_gene_id", "entrezgene", "hgnc_symbol"),
                 filters = "ensembl_gene_id",
                 values = dif.DEs$ensembl,
                 mart = ensembl)
idx <- match(dif.DEs$ensembl, genemap$ensembl_gene_id)
dif.DEs$entrez <- genemap$entrezgene[idx]
dif.DEs$hgnc_symbol <- genemap$hgnc_symbol[idx]

dif.DEs$entrez
 [1]  25870  89869  54465   2840     NA  80230  57673 123264     NA     NA
[11]     NA 392364     NA     NA     NA     NA 221883     NA     NA     NA
[21]     NA     NA     NA     NA

Many of them are NA. How can I annotate all the genes?

Thanks in advance for any help!
annotate convert • 3.1k views
John Blischak ▴ 190
@john-blischak-6562
Last seen 7.0 years ago
Hi,

I don't think there is a problem. Ensembl includes annotations for some genes that Entrez does not. An example that I found using the code below: RN7SL163P is a pseudogene included in Ensembl (ENSG00000266195) but not in Entrez. If you are not interested in pseudogenes, this should not be an issue for your analysis.

library("biomaRt")
ensembl <- useMart(host = "jun2013.archive.ensembl.org",
                   biomart = "ENSEMBL_MART_ENSEMBL",
                   dataset = "hsapiens_gene_ensembl")
# Retrieve every Ensembl gene ID in the dataset
ens_id <- getBM(attributes = "ensembl_gene_id", mart = ensembl)
# Look up the Entrez ID and HGNC symbol for each of them
entrez_id <- getBM(attributes = c("ensembl_gene_id", "entrezgene", "hgnc_symbol"),
                   filters = "ensembl_gene_id",
                   values = ens_id$ensembl_gene_id,
                   mart = ensembl)
dim(entrez_id)
# [1] 66211     3
# Count the rows that have no Entrez ID
sum(is.na(entrez_id$entrezgene))
# [1] 36545
head(entrez_id)
#   ensembl_gene_id entrezgene hgnc_symbol
# 1 ENSG00000266195         NA   RN7SL163P
# 2 ENSG00000264715         NA
# 3 ENSG00000264800  100422895     MIR4294
# 4 ENSG00000207390         NA
# 5 ENSG00000206995         NA
# 6 ENSG00000266431  100847076     MIR5580

John
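To see where the NAs in the original script come from, here is a minimal offline sketch that needs no connection to the archive. It rebuilds the six rows of the head(entrez_id) output as a toy table and repeats the match() step from the question. The query vector and the ID "ENSG_NOT_IN_MART" are made up for illustration, and writing the blank HGNC symbols as "" is an assumption about how those cells were returned.

```r
# Toy mapping table copied from the head(entrez_id) output above.
# Blank HGNC symbols are written as "" (an assumption).
genemap <- data.frame(
  ensembl_gene_id = c("ENSG00000266195", "ENSG00000264715", "ENSG00000264800",
                      "ENSG00000207390", "ENSG00000206995", "ENSG00000266431"),
  entrezgene      = c(NA, NA, 100422895, NA, NA, 100847076),
  hgnc_symbol     = c("RN7SL163P", "", "MIR4294", "", "", "MIR5580"),
  stringsAsFactors = FALSE
)

# Hypothetical query: three IDs from the table plus one invented ID
# that the mart would not have returned at all.
query <- c("ENSG00000264800", "ENSG00000266195",
           "ENSG_NOT_IN_MART", "ENSG00000266431")

# Same lookup as in the question
idx    <- match(query, genemap$ensembl_gene_id)
entrez <- genemap$entrezgene[idx]

# Separate the two possible reasons for an NA
not_returned <- is.na(idx)                # ID absent from the mart result
no_entrez    <- !is.na(idx) & is.na(entrez)  # ID present, no Entrez mapping

print(data.frame(query, entrez, not_returned, no_entrez))
```

Splitting the NAs this way shows whether an ID was missing from the query result entirely or was returned without an Entrez ID, which is the case described in the answer above.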
Also, you can specify the biotype filter so you only retrieve protein-coding genes by changing your query to:

entrez_id <- getBM(attributes = c("ensembl_gene_id", "entrezgene", "hgnc_symbol"),
                   filters = c("ensembl_gene_id", "biotype"),
                   values = list(ens_id$ensembl_gene_id, "protein_coding"),
                   mart = ensembl)

This will remove most of the NAs for Entrez gene IDs.

sum(is.na(entrez_id$entrezgene))
[1] 1432

Best,
Steffen
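The effect of restricting to protein-coding genes can also be checked locally once a biotype column is available. The sketch below is only an illustration on invented data: the IDs, Entrez values, biotype labels and the column name gene_biotype are all hypothetical, not results from the archive.

```r
# Toy annotation table; every value here is invented for illustration.
annot <- data.frame(
  ensembl_gene_id = c("ENSG_TOY_1", "ENSG_TOY_2", "ENSG_TOY_3",
                      "ENSG_TOY_4", "ENSG_TOY_5"),
  entrezgene      = c(101, NA, NA, 102, NA),
  gene_biotype    = c("protein_coding", "pseudogene", "miRNA",
                      "protein_coding", "protein_coding"),
  stringsAsFactors = FALSE
)

# Number of missing Entrez IDs per biotype
na_by_biotype <- tapply(is.na(annot$entrezgene), annot$gene_biotype, sum)
print(na_by_biotype)

# Keep only protein-coding genes, then compare the NA counts
coding    <- annot[annot$gene_biotype == "protein_coding", ]
na_before <- sum(is.na(annot$entrezgene))
na_after  <- sum(is.na(coding$entrezgene))
print(c(before = na_before, after = na_after))
```

Tabulating the NAs per biotype before filtering shows which gene classes account for the unmapped IDs, and whether some protein-coding genes still lack an Entrez ID afterwards.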