Ensembl to HGNC symbols creates duplicates
Matan G. ▴ 40

Hi all,

My data is a data frame of estimated TPM counts, where rows are genes and columns are samples. I'm using "library(biomaRt)" to map Ensembl gene IDs to HGNC symbols. When I try to change the row names from Ensembl IDs to HGNC symbols, I get an error which, as I understand it, stems from duplicates among the HGNC symbols.

The error I get:

Error in `.rowNamesDF<-`(x, value = value) :
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘’, ‘ABCF2’, ‘LINC01238’, ‘POLR2J3’, ‘POLR2J4’, ‘TBCE’

How can I solve this issue?

EDITED: using .rowNamesDF(TPM_countdata, make.names=TRUE) I've managed to force the row names to be HGNC symbols, but I don't understand why the mapping produces duplicates in the first place rather than unique HGNC symbols.

Thanks and all the best

data screenshot https://ibb.co/8M853bm

r biomart genemap TPM • 209 views

Hey Matan,

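This is expected when mapping between annotation systems, in this case Ensembl to HGNC. The mapping is not one-to-one: some Ensembl gene IDs have no HGNC symbol (the empty ‘’ in your warning), and some HGNC symbols are assigned to more than one Ensembl gene ID. To understand why, please look at these answers on Biostars, one of which is from the Ensembl Outreach Project Leader: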

What I usually do is merge the Ensembl and HGNC IDs via an underscore '_', which can be removed when it comes to exporting your final result or generating plots.
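As a rough illustration of that approach, here is a minimal sketch on toy data. The `annot` table stands in for whatever biomaRt's getBM() returned, and its Ensembl IDs are made up; `TPM_countdata` is assumed to have Ensembl gene IDs as row names, as in the question. Genes without an HGNC symbol simply keep their Ensembl ID.

```r
# Toy stand-in for the annotation table from biomaRt; the IDs are hypothetical.
annot <- data.frame(
  ensembl_gene_id = c("ENSGTOY0001", "ENSGTOY0002", "ENSGTOY0003",
                      "ENSGTOY0004", "ENSGTOY0005"),
  hgnc_symbol     = c("ABCF2", "ABCF2", "", "", "TBCE"),
  stringsAsFactors = FALSE
)

# Toy TPM data frame with Ensembl gene IDs as row names.
TPM_countdata <- data.frame(
  sample1 = c(1.2, 0.0, 5.1, 3.3, 8.0),
  sample2 = c(0.9, 0.4, 4.8, 2.9, 7.5),
  row.names = annot$ensembl_gene_id
)

# Why symbols alone fail as row names: empty and repeated symbols.
dup_symbols <- unique(annot$hgnc_symbol[duplicated(annot$hgnc_symbol)])
print(dup_symbols)

# Align the annotation to the row order of the expression data.
idx <- match(rownames(TPM_countdata), annot$ensembl_gene_id)
symbols <- annot$hgnc_symbol[idx]
symbols[is.na(symbols)] <- ""  # IDs missing from the annotation table

# Ensembl IDs are unique, so "ID_symbol" labels are unique as well.
new_names <- ifelse(symbols == "",
                    rownames(TPM_countdata),
                    paste(rownames(TPM_countdata), symbols, sep = "_"))
stopifnot(anyDuplicated(new_names) == 0)
rownames(TPM_countdata) <- new_names

# For plots or export, strip the Ensembl prefix again (labels may then repeat).
display <- sub("^[^_]+_", "", rownames(TPM_countdata))
print(display)
```

Unlike make.names=TRUE, which only adds suffixes to the clashing symbols, the merged label keeps track of which Ensembl gene each row came from.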

Note that, while we define a gene as a static unit, the genome does not behave this way. Transcription is a pervasive process whereby, over millions of years of evolution, certain parts of the genome are transcribed more frequently under certain cellular / environmental conditions, and then translated into proteins. The vast majority of the genome is still transcribed to some level, but can be regarded as background 'transcriptional noise'.


