Gene ID Duplicates
anokhi1997 • 0

I've been trying to follow the DESeq2 manual, but I've been hitting a problem in this part:

> pasillaCountTable = read.table( datafile, header=TRUE, row.names=1 )

> head( pasillaCountTable )

For my data, I have GeneID duplicates and consequently row duplicates. Anyone know a way to get around this? Or a way to compress my GeneIDs into one GeneID row without compromising the data?

deseq2 genetics Tutorial • 532 views

Please provide more information about the upstream steps. Ideally you would not have duplicate rows at the start with the same ID. How did you produce the count table?


Hello! Thank you for your quick reply! I should have been clearer in my question.

I used Kallisto on my RNA-seq data, followed by KallistoGather, to get a table of unnormalized counts for all my samples, keyed by transcript ID. I then merged this in R with a table of the corresponding gene IDs, and that merged table is the one I'm currently using.


We have software for this: tximport followed by DESeqDataSetFromTximport. I think tximport uniquely protects you against changes to gene length from differential isoform usage, whereas simply summing estimated counts from isoforms does not (the message from Trapnell et al. 2013). tximport also does not produce duplicate gene IDs: you give it the table of transcript-to-gene correspondence, and it produces unique gene-level estimated counts and statistical offsets.
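As a rough illustration of that workflow, here is a minimal sketch in R. It writes a few toy kallisto-style abundance.tsv files to a temporary directory so that it runs on its own. The sample names, transcript and gene IDs, and all numbers are invented for the example, and the two-column tx2gene table stands in for whatever transcript-to-gene table you already have. With real data you would instead point `files` at the abundance files that kallisto wrote for each sample.

```r
library(tximport)
library(DESeq2)

# Toy kallisto-style output: four samples, three transcripts, two genes.
# All IDs and numbers below are invented for illustration only.
samples <- c("ctrl1", "ctrl2", "trt1", "trt2")
outdir <- tempfile("kallisto_toy_")
files <- vapply(samples, function(s) {
  dir.create(file.path(outdir, s), recursive = TRUE)
  f <- file.path(outdir, s, "abundance.tsv")
  abundance <- data.frame(
    target_id  = c("tx1", "tx2", "tx3"),
    length     = c(1000, 1500, 2000),
    eff_length = c(800, 1300, 1800),
    est_counts = c(10, 20, 30),
    tpm        = c(300000, 350000, 350000)
  )
  write.table(abundance, f, sep = "\t", quote = FALSE, row.names = FALSE)
  f
}, character(1))  # result is a character vector named by sample

# Transcript-to-gene table: first column transcript ID, second column gene ID.
# tx1 and tx2 belong to the same gene, which is what causes duplicate rows
# after a plain merge.
tx2gene <- data.frame(
  TXNAME = c("tx1", "tx2", "tx3"),
  GENEID = c("geneA", "geneA", "geneB")
)

# Summarize transcript-level estimates to one row per gene.
txi <- tximport(files, type = "kallisto", tx2gene = tx2gene)
head(txi$counts)

# Sample table (hypothetical design with two conditions).
coldata <- data.frame(
  condition = factor(c("control", "control", "treated", "treated")),
  row.names = samples
)

# Build the DESeq2 object; average transcript lengths are carried along
# so that they can be used as offsets.
dds <- DESeqDataSetFromTximport(txi, colData = coldata, design = ~ condition)
```

In this toy case `txi$counts` should have one row for geneA (tx1 and tx2 combined) and one for geneB, so there are no duplicate row names to work around before running `DESeq(dds)`.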

