I am analyzing RNA-seq data with edgeR, and my read count matrix is annotated with both gene symbols and Ensembl IDs. I am finding that some gene symbols are assigned to more than one Ensembl ID; some of these are ncRNAs at different locations on the chromosome. How do you deal with such duplicates prior to CPM filtering, TMM normalization, and building the design matrix? Do you sum or average the read counts per duplicated gene symbol across all samples, or do you drop the duplicates and keep only the entry with the highest total read count? What is the best practice?
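
To make the options concrete, here is a minimal sketch in base R of two of the approaches I am weighing (summing, and keeping the highest-count entry). The toy matrix `counts`, the `symbol` vector, and the Ensembl-style row names are made up for illustration; this is only meant to show what I mean, not to suggest that either approach is the recommended one.

```r
# Toy count matrix: rows are (made-up) Ensembl IDs, columns are samples.
counts <- matrix(
  c(10, 12,
     3,  5,
    40, 38,
     0,  1),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("ENSG_A1", "ENSG_A2", "ENSG_B1", "ENSG_C1"),
                  c("sample1", "sample2"))
)

# Gene symbol for each row; "GENE_A" maps to two IDs (hypothetical names).
symbol <- c("GENE_A", "GENE_A", "GENE_B", "GENE_C")

# Option 1: sum the counts of all rows that share a gene symbol.
# The result has one row per symbol, named by symbol.
summed <- rowsum(counts, group = symbol)

# Option 2: per symbol, keep only the row with the highest total count.
# Sort rows by total count (descending), then drop later repeats of a symbol.
ord <- order(rowSums(counts), decreasing = TRUE)
keep <- ord[!duplicated(symbol[ord])]
highest <- counts[sort(keep), , drop = FALSE]  # restore original row order
```

Either matrix could then be passed to `DGEList()` in place of the original counts, before filtering and normalization.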