How can I annotate some of my genes?
@bioinformatics-10931

I have a serious problem with gene ID annotation :-)

I am converting Ensembl gene IDs to gene names.

Here are some example IDs:

ENSG00000122718
ENSG00000130201
ENSG00000150076
ENSG00000150526
ENSG00000155640
ENSG00000166748
ENSG00000168260
ENSG00000168787
ENSG00000170590
ENSG00000170803
ENSG00000171484
ENSG00000172381
ENSG00000172774

 

Here is what I do:

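library(biomaRt)

# Tab-separated file without a header: Ensembl gene ID, count
data <- read.table("list_ensembl_gid.tsv", sep = "\t", header = FALSE,
                   stringsAsFactors = FALSE)
colnames(data)[1:2] <- c("ensembl_gene_id", "counts")

hsapiens <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")

# Query only the IDs in the file instead of downloading every gene
hsapiens_infos <- getBM(attributes = c("ensembl_gene_id", "external_gene_name"),
                        filters = "ensembl_gene_id",
                        values = data$ensembl_gene_id,
                        mart = hsapiens)

# Keep every row of `data`; IDs not found in the mart get NA as gene name
merge_infos <- merge(x = data, y = hsapiens_infos,
                     by = "ensembl_gene_id", all.x = TRUE)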
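Before looking at older releases, it can help to list exactly which IDs failed to map. This is a minimal sketch, assuming `merge_infos` has the `ensembl_gene_id` and `external_gene_name` columns produced by the code above; the helper name `find_unmapped` and the toy IDs and names are hypothetical. It treats both `NA` (ID absent from the mart) and an empty string (ID present but without a name) as unmapped.

```r
# Hypothetical helper: return the Ensembl IDs in a merged table that did not
# receive a gene name. After merge(..., all.x = TRUE), IDs missing from the
# mart get NA; IDs present in the mart but without a name may come back as "".
find_unmapped <- function(merged, name_col = "external_gene_name") {
  gene_names <- merged[[name_col]]
  missing <- is.na(gene_names) | gene_names == ""
  unique(merged$ensembl_gene_id[missing])
}

# Toy table with made-up IDs and names, standing in for `merge_infos`
toy <- data.frame(
  ensembl_gene_id    = c("ENSG_A", "ENSG_B", "ENSG_C"),
  counts             = c(10, 5, 0),
  external_gene_name = c("GENE_A", NA, ""),
  stringsAsFactors   = FALSE
)
unmapped <- find_unmapped(toy)
unmapped
# [1] "ENSG_B" "ENSG_C"
```

On the real data, `length(find_unmapped(merge_infos))` would give the number of IDs that need a different lookup.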

 

Tags: biomart, r

What exactly is the problem?  You're using a number of packages and functions here, but haven't said what is going wrong, or what output you're currently generating.


@Mike Smith I don't have a problem with the programming. The problem is that I cannot map these IDs to gene names; apparently they come from an old version of the BioMart database. Is there any way to annotate them? Could you please run the code and see for yourself?

Mike Smith (@mike-smith), EMBL Heidelberg

These IDs are no longer in the Ensembl database, e.g. ENSG00000155640:

Gene: ENSG00000155640

This identifier is not in the current EnsEMBL database

You can use an old version of Ensembl, but you're probably better off working out why you're getting these out-of-date gene IDs in your results. To use the old Ensembl archive, you can do:

hsapiens = useMart("ensembl",
                   dataset="hsapiens_gene_ensembl",
                   host = "http://sep2015.archive.ensembl.org")
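One way to keep the current annotation for IDs that still exist, and fall back to the archive only for the rest, is to query the archive mart for the unmapped IDs and patch in the missing names. This is a minimal sketch, assuming an `archive_infos` data frame with the same two columns as `hsapiens_infos`, for example returned by `getBM()` on the archive mart with `filters = "ensembl_gene_id"`; the helper `fill_from_archive`, the vector `unmapped_ids`, and the toy data are hypothetical, chosen so the logic can be checked offline.

```r
# Hypothetical helper: fill gene names that are missing from the current
# Ensembl lookup using a table retrieved from an archived release.
# On real data, archive_infos could come from something like:
#   archive <- useMart("ensembl", dataset = "hsapiens_gene_ensembl",
#                      host = "http://sep2015.archive.ensembl.org")
#   archive_infos <- getBM(attributes = c("ensembl_gene_id", "external_gene_name"),
#                          filters = "ensembl_gene_id",
#                          values = unmapped_ids,  # hypothetical vector of IDs
#                          mart = archive)
fill_from_archive <- function(merged, archive_infos) {
  gene_names <- merged$external_gene_name
  missing <- is.na(gene_names) | gene_names == ""
  # Position of each merged ID in the archive table (NA if absent there too)
  idx <- match(merged$ensembl_gene_id, archive_infos$ensembl_gene_id)
  gene_names[missing] <- archive_infos$external_gene_name[idx[missing]]
  merged$external_gene_name <- gene_names
  merged
}

# Toy tables with made-up IDs and names
toy <- data.frame(
  ensembl_gene_id    = c("ENSG_A", "ENSG_B", "ENSG_C"),
  external_gene_name = c("GENE_A", NA, NA),
  stringsAsFactors   = FALSE
)
toy_archive <- data.frame(
  ensembl_gene_id    = "ENSG_B",
  external_gene_name = "OLD_GENE_B",
  stringsAsFactors   = FALSE
)
filled <- fill_from_archive(toy, toy_archive)
filled$external_gene_name
```

Here `ENSG_B` picks up its archived name, `GENE_A` keeps its current name, and `ENSG_C`, absent from both tables, stays `NA`. Names taken from an old release may differ from current nomenclature, so it may be worth flagging which rows were filled this way.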

@Mike Smith The problem is that the data come from TCGA and were aligned in 2014-2015. They don't release the raw files, so I have to use the htseq-count output as it is. About 3000 genes have outdated IDs. I don't want to discard them, but I don't know how to assign them gene names. I will see if I can get them annotated.
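IDs that cannot be named in any release do not have to be discarded. A minimal sketch, assuming a merged table like `merge_infos` (the helper `label_genes` and the toy data are hypothetical): use the gene name where one exists and fall back to the original Ensembl ID otherwise, so every count row is kept and the unannotated ones are flagged.

```r
# Hypothetical helper: build a display label for every row, using the gene
# name when available and the original Ensembl ID otherwise, so that no
# count rows are dropped. `annotated` records which rows got a real name.
label_genes <- function(merged) {
  gene_names <- merged$external_gene_name
  has_name <- !is.na(gene_names) & gene_names != ""
  merged$label <- ifelse(has_name, gene_names, merged$ensembl_gene_id)
  merged$annotated <- has_name
  merged
}

# Toy table with made-up IDs and names
toy <- data.frame(
  ensembl_gene_id    = c("ENSG_A", "ENSG_B", "ENSG_C"),
  counts             = c(10, 5, 0),
  external_gene_name = c("GENE_A", NA, ""),
  stringsAsFactors   = FALSE
)
labelled <- label_genes(toy)
labelled$label
# [1] "GENE_A" "ENSG_B" "ENSG_C"
sum(!labelled$annotated)  # rows labelled by Ensembl ID only
# [1] 2
```

Keeping the Ensembl ID as the label means the outdated genes stay in downstream count matrices and can still be traced back to the original TCGA file.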

 
