Ensembl to HGNC symbols creates duplicates
Matan G. ▴ 60
@matan-g-22483
Last seen 3.4 years ago

Hi all,

My data is a data frame of estimated TPM counts, where rows are genes and columns are samples. I'm using library(biomaRt) to get Ensembl gene IDs and HGNC symbols. When I try to change the row names from Ensembl IDs to HGNC symbols, I get an error which, as I understand it, stems from duplicates among the HGNC symbols.

The error I get:

"Error in .rowNamesDF<-(x, value = value) : duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘’, ‘ABCF2’, ‘LINC01238’, ‘POLR2J3’, ‘POLR2J4’, ‘TBCE"

How can I solve this issue? EDIT: using .rowNamesDF(TPM_countdata, make.names=TRUE) I've managed to force the row names to be HGNC symbols, but I don't understand why the HGNC symbols contain duplicates in the first place instead of being unique.

Thanks and all the best

data screenshot https://ibb.co/8M853bm

r biomart genemap TPM • 2.5k views
Kevin Blighe ★ 4.0k
@kevin
Last seen 7 weeks ago
Republic of Ireland

Hey Matan,

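This is expected when converting between annotation systems, in this case Ensembl to HGNC. The mapping is not one-to-one: several Ensembl gene IDs can share the same HGNC symbol, and some have no symbol at all (the empty ‘’ in your warning message). To understand why, please look at these answers on Biostars, one from the Ensembl Outreach Project Leader: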

What I usually do is merge the Ensembl and HGNC IDs via an underscore '_', which can be removed when it comes to exporting your final result or generating plots.
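A minimal sketch of the underscore approach, assuming a mapping table with the columns ensembl_gene_id and hgnc_symbol (the shape biomaRt::getBM() returns when those two attributes are requested). The Ensembl IDs, symbols, and TPM values below are made-up placeholders, and falling back to the bare Ensembl ID when no symbol exists is my own choice for illustration, not something stated above.

```r
# Toy TPM table with Ensembl gene IDs as row names (placeholder values).
tpm <- data.frame(
  sample1 = c(1.2, 0.0, 5.4, 3.3),
  sample2 = c(0.8, 0.1, 6.0, 2.9),
  row.names = c("ENSG99999999901", "ENSG99999999902",
                "ENSG99999999903", "ENSG99999999904")
)

# Hypothetical mapping table: two IDs share a symbol, one has no symbol.
map <- data.frame(
  ensembl_gene_id = c("ENSG99999999901", "ENSG99999999902",
                      "ENSG99999999903", "ENSG99999999904"),
  hgnc_symbol = c("GENEA", "GENEA", "", "GENEB"),
  stringsAsFactors = FALSE
)

# Look up the symbol for each row; match() uses the first hit per ID.
symbol <- map$hgnc_symbol[match(rownames(tpm), map$ensembl_gene_id)]
symbol[is.na(symbol)] <- ""

# Ensembl IDs are unique, so "ID_symbol" is unique as well.
# Rows without a symbol keep the bare Ensembl ID (assumption).
rownames(tpm) <- ifelse(symbol == "",
                        rownames(tpm),
                        paste(rownames(tpm), symbol, sep = "_"))

# For plots or export, strip the Ensembl prefix to recover the symbol.
labels <- sub("^ENSG[0-9]+_", "", rownames(tpm))
```

Because labels is a plain character vector rather than row names, it may contain duplicates without raising the error from the question.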

Note that, while we define a gene as a static unit, the genome does not behave this way. Transcription is a pervasive process whereby, over millions of years of evolution, certain parts of the genome are transcribed more frequently under certain cellular / environmental conditions, and then translated into proteins. The vast majority of the genome is still transcribed to some level, but can be regarded as background 'transcriptional noise'.

Kevin
