Question

Ensembl gene_ID in edgeR analysis

mzillur • 0

@mzillur-20118

Last seen 4.8 years ago

Hello there, I have a count file containing 50 human samples. The reads were mapped with STAR using the Gencode GTF annotation, and featureCounts was used to generate the count data. So the rownames in my file are Ensembl gene IDs, which look like: "ENSG00000231251.1" "ENSG00000236335.1" "ENSG00000231949.1" "ENSG00000162510.5", 61471 in total. How can I convert these IDs to RefSeq IDs for edgeR analysis? I have tried several ways but failed each time.

`gids=mapIds(org.Hs.eg.db,keys = rownames(y1),keytype = 'ENSEMBL',column = "SYMBOL")` gives:
Error in .testForValidKeys(x, keys, keytype, fks) : 
  None of the keys entered are valid keys for 'ENSEMBL'. Please use the keys method to see a listing of valid arguments.
But the rownames are the Ensembl IDs!
`idfound=y1$genes$genes %in% mappedkeys(org.Hs.egENSEMBL)`: only 33 match.
`gids=y1$genes$genes %in% mappedkeys(org.Hs.egREFSEQ)`: only 33 match.

y1 is the DGEList object and y1$genes$genes holds the Ensembl IDs mentioned above. Any help in this matter? Thanks in advance. Best Regards Zillur

edgeR ENSEMBL RefSeq • 2.8k views

ADD COMMENT • link updated 5.1 years ago by Gordon Smyth 50k • written 5.1 years ago by mzillur • 0

score 1 · Answer 1 · 2019-03-20

Gordon Smyth 50k

@gordon-smyth

Last seen 6 minutes ago

WEHI, Melbourne, Australia

You will need to remove the version numbers .1, .5 etc from the names before you will be able to do any matching with the Ensembl Ids. When you've done that, you could then use biomaRt or the Ensembl website to get gene symbols.
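As a minimal sketch of the version-stripping step, using the four Ids quoted in the question, a base R regular expression is enough. The object names `ids` and `ids_stripped` are just illustrative; in the question the Ids are the rownames of `y1`.

```r
# Example Ids copied from the question
ids <- c("ENSG00000231251.1", "ENSG00000236335.1",
         "ENSG00000231949.1", "ENSG00000162510.5")

# Remove a trailing ".<digits>" version suffix, leaving the stable gene Id
ids_stripped <- sub("\\.[0-9]+$", "", ids)
print(ids_stripped)

# Stripping versions can in principle create duplicated names,
# so it is worth checking before using them as rownames
stopifnot(!anyDuplicated(ids_stripped))
```

The same `sub()` call could be applied to `rownames(y1)` before any matching is attempted.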

But even better and more direct would be to simply extract the gene symbols that are already embedded in the Gencode GTF file you have used.
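A rough sketch of how that extraction might look in base R is below. The two GTF records are made up for illustration (the Ids and symbols are not real), and the helper `get_attr()` is a hypothetical name; with a real file you would read the GTF itself rather than `gtf_lines`. It assumes the usual Gencode layout in which the ninth column holds `gene_id "..."` and `gene_name "..."` attributes.

```r
# Two made-up GTF gene records in Gencode's layout (illustrative values only)
gtf_lines <- c(
  'chr1\tHAVANA\tgene\t100\t900\t.\t+\t.\tgene_id "ENSG00000000001.1"; gene_type "protein_coding"; gene_name "GENEA";',
  'chr1\tHAVANA\tgene\t2000\t2900\t.\t-\t.\tgene_id "ENSG00000000002.5"; gene_type "lncRNA"; gene_name "GENEB";'
)

# Read the tab-delimited records; quote = "" keeps the quoted attributes intact
gtf <- read.delim(text = gtf_lines, header = FALSE, comment.char = "#",
                  quote = "", stringsAsFactors = FALSE)

# Keep one row per gene (column 3 is the feature type)
genes <- gtf[gtf$V3 == "gene", ]

# Hypothetical helper: pull the value of one key out of the attribute column
get_attr <- function(x, key) {
  has_key <- grepl(paste0(key, ' "'), x, fixed = TRUE)
  value <- sub(paste0('.*', key, ' "([^"]*)".*'), "\\1", x)
  ifelse(has_key, value, NA_character_)
}

annot <- data.frame(gene_id = get_attr(genes$V9, "gene_id"),
                    Symbol  = get_attr(genes$V9, "gene_name"),
                    stringsAsFactors = FALSE)
print(annot)

# With a DGEList y1 whose rownames are the versioned Gencode Ids, the
# symbols could then be attached with something like:
#   y1$genes$Symbol <- annot$Symbol[match(rownames(y1), annot$gene_id)]
```

Because the Ids come from the same GTF that was used for counting, they match the rownames exactly, version numbers included.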

You don't need Entrez Gene Ids to do an edgeR differential expression analysis, unless you are using the gene ontology or pathway analysis tools. If you really do need Entrez Gene Ids, then I would map from the Gencode gene symbols rather than from the Ensembl Ids. There is essentially a one-to-one correspondence between official HUGO human gene symbols and Entrez Gene Ids, but not between Ensembl Ids and Entrez Ids. Beware however that about 40% of gene symbols in the latest Gencode human annotation are not recognized by HUGO and so cannot be mapped to Entrez.
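If Entrez Gene Ids are needed, one possible way to map from symbols is the `mapIds()` function that the question already uses, with `keytype = "SYMBOL"` instead of `"ENSEMBL"`. The sketch below assumes the AnnotationDbi and org.Hs.eg.db packages are installed and falls back to all-NA output if they are not; the vector `symbols` is illustrative and includes one deliberately invalid symbol.

```r
# Illustrative symbols; the last one is deliberately not a real gene symbol
symbols <- c("TP53", "BRCA1", "NOT_A_REAL_SYMBOL")

if (requireNamespace("AnnotationDbi", quietly = TRUE) &&
    requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  # Named character vector of Entrez Gene Ids; unmapped symbols come back as NA
  entrez <- AnnotationDbi::mapIds(org.Hs.eg.db::org.Hs.eg.db,
                                  keys = symbols,
                                  keytype = "SYMBOL",
                                  column = "ENTREZID")
} else {
  # Fallback so the sketch still runs without the annotation packages
  entrez <- setNames(rep(NA_character_, length(symbols)), symbols)
}

print(entrez)
# Number of symbols that could not be mapped to Entrez
print(sum(is.na(entrez)))
```

Symbols that HUGO does not recognize would show up as `NA` here, which is how the unmappable fraction mentioned above would appear in practice.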

ADD COMMENT • link 5.1 years ago Gordon Smyth 50k

Thank you very much for your quick response. Your suggestions helped me a lot and I have managed to overcome this problem. I am now facing another problem: I am getting totally opposite results using glmQLFit and glmQLFTest in place of glmFit and glmLRT for the same contrasts. Which method do I need to use to see differential expression between two groups? I assume the latter, according to the user guide. But why am I getting opposite results? Best regards Zillur

ADD REPLY • link 5.1 years ago mzillur • 0

If you want to ask something new then start a new question rather than adding a comment to an old question. I can tell you though that glmQLFTest and glmLRT do not give opposite results so, if you do post a question, you would need to give much more detail of what is bothering you than you have here.

ADD REPLY • link 5.1 years ago Gordon Smyth 50k