Question: DESeq2: removal of duplicated genes during statistical analysis
tkapell (University of Bonn, Germany) wrote, 5 months ago:

Hi all,

In my latest analysis with the DESeq2 package, I noticed a few paralogous genes (~30) that share the same statistics and Ensembl ID. I could simply keep the unique genes, but I was wondering whether it would be more appropriate to remove the duplicated genes before the statistical analysis, since they carry redundant information that worsens my statistics. Would you recommend doing this, or would you keep them in the analysis and perhaps discard them later for downstream visualization? And if you would remove them, would you do so before running results() or before DESeq()? Hope this makes sense.

 

Answer: DESeq2: removal of duplicated genes during statistical analysis
Michael Love (United States) wrote, 5 months ago:

What is your quantification setup? Why do you end up with multiple rows with the same ID?


I ran a differential gene expression analysis from transcriptome-level data, so my count table has versioned IDs (e.g. ENSG00000000003.14). When I convert those to Ensembl IDs (e.g. ENSG00000000003), about 30 genes end up sharing the same Ensembl ID because they are paralogs on the Y chromosome (their versioned IDs end in _PAR_Y, but the Ensembl ID is the same).

Reply written 5 months ago by tkapell
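
For readers who want to check whether their own table has the same problem, here is a minimal sketch of the ID conversion described in the reply above. It is plain Python rather than R, and everything in it is an assumption for illustration: the helper name `strip_version` is hypothetical, and apart from the ENSG00000000003.14 example quoted in the thread the IDs are made up. It only reports which IDs collide after the version suffix is removed.

```python
from collections import Counter


def strip_version(versioned_id):
    """Drop everything from the first '.' onward.

    'ENSG00000000003.14' becomes 'ENSG00000000003'; a trailing
    '_PAR_Y' tag is removed as well because it follows the dot.
    """
    return versioned_id.split(".", 1)[0]


# Made-up row names for illustration (only the first one appears in the thread).
row_ids = [
    "ENSG00000000003.14",
    "ENSG99999999991.2",
    "ENSG99999999991.2_PAR_Y",
    "ENSG99999999992.5",
]

# Count how often each unversioned ID occurs and keep those seen more than once.
stripped = [strip_version(row_id) for row_id in row_ids]
duplicated = sorted(gene for gene, n in Counter(stripped).items() if n > 1)

print(duplicated)
```

With a real count table, `row_ids` would be the row names of the table, and the resulting list would be the roughly 30 IDs mentioned above.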

I haven't thought about what to do with these, but generally, if the sequences are nearly identical, I would collapse the redundant transcripts by adding their counts together. Salmon does this by default for identical transcripts (otherwise the counts would be split equally among the identical sequences).

Reply written 5 months ago by Michael Love
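
The collapsing suggested above can be sketched as follows. This is not DESeq2 or Salmon code, just a plain-Python illustration of "adding the counts together" on a small made-up table. The function name `collapse_counts`, the sample values, and all IDs except ENSG00000000003.14 are assumptions. In practice the same summing would presumably be applied to the raw count table before it is passed to DESeq2, since the model expects one row of raw counts per gene.

```python
def collapse_counts(counts):
    """Sum count rows whose IDs are identical once the version suffix is removed.

    `counts` maps a versioned ID to a list of per-sample raw counts.
    Returns a new dict keyed by unversioned ID; the input is not modified.
    """
    collapsed = {}
    for versioned_id, row in counts.items():
        gene_id = versioned_id.split(".", 1)[0]
        if gene_id in collapsed:
            # Same gene seen before: add the counts sample by sample.
            collapsed[gene_id] = [a + b for a, b in zip(collapsed[gene_id], row)]
        else:
            collapsed[gene_id] = list(row)
    return collapsed


# Made-up counts for three samples; the _PAR_Y row mimics the case in the thread.
counts = {
    "ENSG00000000003.14": [120, 98, 143],
    "ENSG99999999991.2": [10, 0, 7],
    "ENSG99999999991.2_PAR_Y": [3, 1, 0],
}

collapsed = collapse_counts(counts)
for gene_id, row in collapsed.items():
    print(gene_id, row)
```

Summing keeps every read in the analysis, whereas dropping one of the duplicated rows would discard its counts.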