Question: Ensembl gene_ID in edgeR analysis
mzillur wrote (12 weeks ago):

Hello there, I have a count file containing 50 human samples. The reads were mapped with STAR using the GENCODE GTF annotation, and featureCounts was used to generate the count data. So the row names in my files are Ensembl gene IDs, which look like "ENSG00000231251.1", "ENSG00000236335.1", "ENSG00000231949.1", "ENSG00000162510.5" (61,471 in total). How can I convert these IDs to RefSeq IDs for edgeR analysis? I have tried several ways but failed each time.

I tried `gids = mapIds(org.Hs.eg.db, keys = rownames(y1), keytype = 'ENSEMBL', column = "SYMBOL")`, but it says:

    Error in .testForValidKeys(x, keys, keytype, fks) :
      None of the keys entered are valid keys for 'ENSEMBL'. Please use the keys method to see a listing of valid arguments.

But the row names are the Ensembl IDs!

`idfound = y1$genes$genes %in% mappedkeys(org.Hs.egENSEMBL)` finds only 33 matches.
`gids = y1$genes$genes %in% mappedkeys(org.Hs.egREFSEQ)` also finds only 33 matches.

y1 is the DGEList object, and y1$genes$genes holds the Ensembl IDs mentioned above. Any help would be appreciated. Thanks in advance. Best regards, Zillur

Tags: edger, refseq, ensembl
Answer from Gordon Smyth (Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia), 12 weeks ago:

You will need to remove the version numbers (.1, .5, etc.) from the names before you will be able to do any matching with the Ensembl IDs. Once you've done that, you could use biomaRt or the Ensembl website to get gene symbols.
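A minimal base-R sketch of the version-stripping step, assuming the versioned IDs are held in a character vector (`ids` here is a stand-in containing the four example IDs from the question; in practice it would be `rownames(y1)`):

```r
# Versioned GENCODE/Ensembl gene IDs, as quoted in the question
ids <- c("ENSG00000231251.1", "ENSG00000236335.1",
         "ENSG00000231949.1", "ENSG00000162510.5")

# Remove a trailing ".<digits>" version suffix so the IDs can be matched
# against annotation resources keyed on unversioned Ensembl IDs
stripped <- sub("\\.[0-9]+$", "", ids)

# Stripping versions can in principle collapse distinct IDs together,
# so check that the result is still unique before using it as row names
stopifnot(anyDuplicated(stripped) == 0)
print(stripped)
```

The regular expression is anchored at the end of the string, so only the final version suffix is removed and the stable `ENSG...` part is left untouched.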

But even better and more direct would be to simply extract the gene symbols that are already embedded in the Gencode GTF file you used.
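A minimal base-R sketch of that extraction, assuming a GENCODE-style GTF in which each `gene` record carries `gene_id` and `gene_name` attributes in column 9. The toy records, their IDs and symbols (`GENE_A`, `GENE_B`), the helper `get_attr()` and the `count_ids` vector are all hypothetical; for a real analysis you would read the GTF file from disk instead (packages such as rtracklayer can also import GTF files).

```r
# Toy records in GENCODE GTF layout (tab-separated, 9 columns).
# IDs and names are hypothetical. For a real file, something like
#   read.delim("gencode.gtf", header = FALSE, comment.char = "#", quote = "")
gtf_text <- paste(
  'chr1\tHAVANA\tgene\t100\t900\t.\t+\t.\tgene_id "ENSG00000000001.1"; gene_type "protein_coding"; gene_name "GENE_A";',
  'chr1\tHAVANA\ttranscript\t100\t900\t.\t+\t.\tgene_id "ENSG00000000001.1"; transcript_id "ENST00000000001.1"; gene_name "GENE_A";',
  'chr2\tHAVANA\tgene\t500\t1500\t.\t-\t.\tgene_id "ENSG00000000002.3"; gene_type "lncRNA"; gene_name "GENE_B";',
  sep = "\n")

# quote = "" keeps the double quotes inside the attribute column intact
gtf <- read.delim(text = gtf_text, header = FALSE, quote = "",
                  stringsAsFactors = FALSE)

# Keep only gene-level records (column 3 is the feature type)
genes <- gtf[gtf$V3 == "gene", ]

# Hypothetical helper: pull the quoted value of one attribute out of column 9
get_attr <- function(attrs, key) {
  sub(paste0('.*', key, ' "([^"]+)".*'), "\\1", attrs)
}

anno <- data.frame(GeneID = get_attr(genes$V9, "gene_id"),
                   Symbol = get_attr(genes$V9, "gene_name"),
                   stringsAsFactors = FALSE)

# Hypothetical count-matrix row names (versioned, as featureCounts reports them)
count_ids <- c("ENSG00000000002.3", "ENSG00000000001.1")
symbols <- anno$Symbol[match(count_ids, anno$GeneID)]
print(symbols)
```

Because featureCounts was run with the same GTF, the versioned row names should match `gene_id` directly, so this route needs no version stripping. The resulting annotation could then be attached to the DGEList, for example as `y1$genes`.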

You don't need Entrez Gene IDs to do an edgeR differential expression analysis, unless you are using the gene ontology or pathway analysis tools. If you really do need Entrez Gene IDs, then I would map from the Gencode gene symbols rather than from the Ensembl IDs. There is essentially a one-to-one correspondence between official HUGO human gene symbols and Entrez Gene IDs, but not between Ensembl IDs and Entrez IDs. Beware, however, that about 40% of gene symbols in the latest Gencode human annotation are not recognized by HUGO and so cannot be mapped to Entrez.
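A minimal base-R sketch of what symbol-to-Entrez mapping looks like when some symbols are not HUGO-approved. The small `lookup` table (three well-known genes) stands in for a real annotation resource, and `AL627309.1` and `GENE_X` are illustrative non-HUGO names; in a real analysis the lookup could instead come from `AnnotationDbi::mapIds()` with `keytype = "SYMBOL"` and `column = "ENTREZID"`, which likewise returns `NA` for unrecognized symbols.

```r
# Stand-in lookup table: HUGO symbol -> Entrez Gene ID.
# A real analysis might use, e.g.,
#   AnnotationDbi::mapIds(org.Hs.eg.db, keys = symbols,
#                         keytype = "SYMBOL", column = "ENTREZID")
lookup <- c(TP53 = "7157", GAPDH = "2597", BRCA1 = "672")

# Symbols taken from the GTF; the last two are illustrative non-HUGO names
symbols <- c("TP53", "GAPDH", "BRCA1", "AL627309.1", "GENE_X")

# Indexing a named vector by absent names yields NA, like an unmapped key
entrez <- unname(lookup[symbols])

mapped <- !is.na(entrez)
cat(sprintf("%d of %d symbols mapped to Entrez\n",
            sum(mapped), length(symbols)))

# Only the mapped genes can be passed to Entrez-based GO/pathway tools;
# the DE analysis itself can still use all genes
entrez_for_go <- entrez[mapped]
```

Keeping the unmapped genes in the differential expression analysis and dropping them only for the Entrez-based downstream tools follows the point above that edgeR itself does not need Entrez IDs.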


Thank you very much for your quick response. I have managed to overcome this problem; your suggestions helped me a lot. I am facing another problem, though: I'm getting completely opposite results when I use glmQLFit and glmQLFTest in place of glmFit and glmLRT for the same contrasts. Which method should I use to test for differential expression between two groups? I assume the latter, according to the user guide. But why am I getting opposite results? Best regards, Zillur

(Reply by mzillur, 11 weeks ago)

If you want to ask something new, then start a new question rather than adding a comment to an old question. I can tell you, though, that glmQLFTest and glmLRT do not give opposite results, so if you do post a new question, you will need to give much more detail about what is bothering you than you have here.

(Reply by Gordon Smyth, 11 weeks ago)