Question: Biomart annotation
jarod_v6@libero.it (Italy) wrote (Fri, Jul 18, 2014 at 4:27 AM):
Hi there! I need to convert all my Ensembl gene IDs to HGNC symbols and Entrez Gene IDs. My Ensembl release is no. 72. I use this script:

    dif.DEs$ensembl <- sapply(strsplit(rownames(dif.DEs), split = "nn+"), "[", 1)
    # use biomart
    library("biomaRt")
    ensembl <- useMart(host = "jun2013.archive.ensembl.org",
                       biomart = "ENSEMBL_MART_ENSEMBL",
                       dataset = "hsapiens_gene_ensembl")
    genemap <- getBM(attributes = c("ensembl_gene_id", "entrezgene", "hgnc_symbol"),
                     filters = "ensembl_gene_id",
                     values = dif.DEs$ensembl,
                     mart = ensembl)
    idx <- match(dif.DEs$ensembl, genemap$ensembl_gene_id)
    dif.DEs$entrez <- genemap$entrezgene[idx]
    dif.DEs$hgnc_symbol <- genemap$hgnc_symbol[idx]

    dif.DEs$entrez
     [1]  25870  89869  54465   2840     NA  80230  57673 123264     NA     NA
    [11]     NA 392364     NA     NA     NA     NA 221883     NA     NA     NA
    [21]     NA     NA     NA     NA

Many of them are NA. How can I annotate all the genes? Thanks in advance for any help!
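A minimal offline sketch of what the `match()` step in the script does. The `genemap` data frame is a stand-in for a `getBM()` result, built from rows that appear in the `head()` output later in this thread; `ENSG_HYPOTHETICAL` is a made-up ID representing one the mart did not return. The sketch shows that an NA in the annotated column can come from two places: the ID is absent from `genemap`, or it is present but has no Entrez Gene ID. It also checks for duplicated Ensembl IDs, since `match()` only ever returns the first matching row.

```r
# Stand-in for a getBM() result: rows copied from the head() output later in
# this thread (Ensembl release 72 archive). No live BioMart query is made.
genemap <- data.frame(
  ensembl_gene_id = c("ENSG00000266195", "ENSG00000264800", "ENSG00000266431"),
  entrezgene      = c(NA, 100422895L, 100847076L),
  hgnc_symbol     = c("RN7SL163P", "MIR4294", "MIR5580"),
  stringsAsFactors = FALSE
)

# Input IDs in results-table order; "ENSG_HYPOTHETICAL" is a made-up ID
# standing in for one that the mart did not return at all.
ensembl_ids <- c("ENSG00000264800", "ENSG00000266195", "ENSG_HYPOTHETICAL")

# match() gives the row of the FIRST hit in genemap, or NA if there is none,
# so warn if any Ensembl ID occupies more than one row.
if (anyDuplicated(genemap$ensembl_gene_id) > 0) {
  warning("some Ensembl IDs map to several rows; match() keeps only the first")
}
idx    <- match(ensembl_ids, genemap$ensembl_gene_id)
entrez <- genemap$entrezgene[idx]
symbol <- genemap$hgnc_symbol[idx]

# Separate the two sources of NA in the annotated column.
not_returned <- is.na(idx)                    # ID absent from the mart result
no_entrez    <- !not_returned & is.na(entrez) # ID returned, but no Entrez ID
print(data.frame(ensembl_ids, entrez, symbol, not_returned, no_entrez))
```

With the real `genemap`, using `dif.DEs$ensembl` in place of `ensembl_ids` would give the same breakdown for the question's genes. IDs in the `not_returned` group are worth checking for formatting problems left over from the `strsplit()` step.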
John Blischak wrote (Fri, Jul 18, 2014 at 10:00 AM):
Hi,

I don't think there is a problem. Ensembl includes annotations for some genes that Entrez Gene does not, and those genes get NA in the entrezgene column. For example, using the code below I found that RN7SL163P is a pseudogene included in Ensembl (ENSG00000266195) but not in Entrez Gene. If you are not interested in pseudogenes, this should not be an issue for your analysis.

    library("biomaRt")
    ensembl <- useMart(host = "jun2013.archive.ensembl.org",
                       biomart = "ENSEMBL_MART_ENSEMBL",
                       dataset = "hsapiens_gene_ensembl")
    ens_id <- getBM(attributes = "ensembl_gene_id", mart = ensembl)
    entrez_id <- getBM(attributes = c("ensembl_gene_id", "entrezgene", "hgnc_symbol"),
                       filters = "ensembl_gene_id",
                       values = ens_id$ensembl_gene_id,
                       mart = ensembl)
    dim(entrez_id)
    # [1] 66211     3
    sum(is.na(entrez_id$entrezgene))
    # [1] 36545
    head(entrez_id)
    #   ensembl_gene_id entrezgene hgnc_symbol
    # 1 ENSG00000266195         NA   RN7SL163P
    # 2 ENSG00000264715         NA
    # 3 ENSG00000264800  100422895     MIR4294
    # 4 ENSG00000207390         NA
    # 5 ENSG00000206995         NA
    # 6 ENSG00000266431  100847076     MIR5580

John
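In John's `head()` output, a missing HGNC symbol shows up as a blank cell while a missing Entrez Gene ID shows up as `NA`. A small sketch, assuming the blanks are empty strings (as their printing suggests), that recodes them to `NA` so both identifier columns can be counted the same way. The six rows are copied from that output; no query is run.

```r
# The six rows from John's head(entrez_id) output; no BioMart query is run.
entrez_id <- data.frame(
  ensembl_gene_id = c("ENSG00000266195", "ENSG00000264715", "ENSG00000264800",
                      "ENSG00000207390", "ENSG00000206995", "ENSG00000266431"),
  entrezgene  = c(NA, NA, 100422895L, NA, NA, 100847076L),
  hgnc_symbol = c("RN7SL163P", "", "MIR4294", "", "", "MIR5580"),
  stringsAsFactors = FALSE
)

# Assumption: missing symbols arrive as "" rather than NA; recode them so
# that is.na() treats both identifier columns the same way.
blank <- which(entrez_id$hgnc_symbol == "")
entrez_id$hgnc_symbol[blank] <- NA

# Count missing values per identifier column.
na_counts <- colSums(is.na(entrez_id[, c("entrezgene", "hgnc_symbol")]))
print(na_counts)

# Ensembl genes that have neither an Entrez Gene ID nor an HGNC symbol.
unannotated <- entrez_id$ensembl_gene_id[is.na(entrez_id$entrezgene) &
                                         is.na(entrez_id$hgnc_symbol)]
print(unannotated)
```

On the full result, the `entrezgene` count is the same quantity as John's `sum(is.na(entrez_id$entrezgene))`; the `hgnc_symbol` count is only meaningful after the blank-to-NA recoding.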
Steffen Durinck replied:

Also, you can add the biotype filter so that you only retrieve protein-coding genes, by changing your query to:

    entrez_id <- getBM(attributes = c("ensembl_gene_id", "entrezgene", "hgnc_symbol"),
                       filters = c("ensembl_gene_id", "biotype"),
                       values = list(ens_id$ensembl_gene_id, "protein_coding"),
                       mart = ensembl)

This will remove most of the NAs for Entrez Gene IDs:

    sum(is.na(entrez_id$entrezgene))
    # [1] 1432

Best,
Steffen