Question

Gene ID Duplicates

anokhi1997

I've been trying to follow the DESeq manual (https://bioconductor.org/packages/3.7/bioc/vignettes/DESeq/inst/doc/DESeq.pdf), but I've been hitting a problem in this part:

> pasillaCountTable = read.table( datafile, header=TRUE, row.names=1 )

> head( pasillaCountTable )

In my data the GeneIDs are duplicated, so the rows cannot be uniquely named by GeneID. Does anyone know a way around this? Or a way to collapse the duplicated GeneIDs into one row per GeneID without compromising the data?

Written by anokhi1997; updated by James W. MacDonald.
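
A minimal sketch of the problem and the most direct workaround, using a made-up count table (the gene IDs, sample names and counts are hypothetical). With `row.names = 1`, `read.table()` refuses duplicated row names; reading the table without row names and summing the rows that share a GeneID with base R's `rowsum()` gives one row per gene. This only shows the mechanics: summing isoform-level estimated counts does not account for changes in gene length caused by differential isoform usage, which is the concern raised in the answer below.

```r
# Hypothetical count table with a duplicated GeneID (made-up values)
txt <- "GeneID\tsample1\tsample2
geneA\t10\t20
geneA\t5\t1
geneB\t7\t3"

# Using the first column as row names fails because geneA occurs twice
res <- tryCatch(
  read.table(text = txt, header = TRUE, row.names = 1),
  error = function(e) conditionMessage(e)
)
print(res)  # the error message about duplicate 'row.names'

# Read without row names, then sum the count columns within each GeneID
raw <- read.table(text = txt, header = TRUE)
counts <- rowsum(as.matrix(raw[, -1]), group = raw$GeneID)
print(counts)
```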

Answer 1 · 2018-04-05

Michael Love

Please provide more information about the upstream steps. Ideally you would not have duplicate rows at the start with the same ID. How did you produce the count table?

Hello! Thank you for your quick reply! I should have been clearer in my question.

I used Kallisto on my RNA-seq data, followed by KallistoGather (https://github.com/raynamharris/BehavEphyRNAseq/blob/master/markdownfiles/02b_KallistoGather.Rmd), to get a table of unnormalized counts for all my samples, with one row per transcript ID. I then merged this in R with a table of the corresponding Gene IDs, and that merged table is what I'm currently using. Since a gene can have several transcripts, each such Gene ID appears on several rows.

Reply by anokhi1997
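
To illustrate why the merged table has repeated gene IDs, here is a small sketch with hypothetical transcript counts and a hypothetical transcript-to-gene table (the column names `target_id` and `gene_id` are assumptions, not necessarily the KallistoGather output format). `merge()` keeps one row per transcript, so any gene with more than one transcript appears more than once.

```r
# Hypothetical transcript-level estimated counts (one row per transcript)
tx_counts <- data.frame(
  target_id = c("tx1a", "tx1b", "tx2a"),
  sample1 = c(10.5, 4.2, 7.0),
  sample2 = c(20.1, 0.9, 3.3)
)

# Hypothetical transcript-to-gene mapping: geneA has two transcripts
tx2gene <- data.frame(
  target_id = c("tx1a", "tx1b", "tx2a"),
  gene_id = c("geneA", "geneA", "geneB")
)

# The merge still has one row per transcript, so geneA is repeated
merged <- merge(tx2gene, tx_counts, by = "target_id")
print(table(merged$gene_id))
print(anyDuplicated(merged$gene_id))  # > 0 means at least one repeated gene ID
```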

We have software for this: tximport, followed by DESeqDataSetFromTximport. I think tximport uniquely protects you against changes in gene length caused by differential isoform usage, whereas simply summing estimated counts across isoforms does not (the point made by Trapnell et al. 2013). tximport also does not produce duplicate gene IDs: you give it the table of transcript-to-gene correspondence, and it produces estimated counts with one row per gene, plus statistical offsets.

Reply by Michael Love
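
A minimal sketch of the suggested route, assuming the Bioconductor packages tximport and DESeq2 are installed. To keep it self-contained, it writes tiny kallisto-style `abundance.tsv` files (columns `target_id`, `length`, `eff_length`, `est_counts`, `tpm`) to a temporary directory; the sample names, transcript and gene IDs, conditions and values are all hypothetical. With real data, `files` would point at each sample's kallisto output and `tx2gene` would be the actual transcript-to-gene table.

```r
suppressPackageStartupMessages({
  library(tximport)
  library(DESeq2)
})

# Hypothetical transcript-to-gene table: transcript ID first, gene ID second
tx2gene <- data.frame(
  TXNAME = c("tx1a", "tx1b", "tx2a"),
  GENEID = c("geneA", "geneA", "geneB")
)

# Hypothetical sample sheet, two samples per condition
samples <- data.frame(
  sample = paste0("s", 1:4),
  condition = factor(c("ctrl", "ctrl", "trt", "trt"))
)
rownames(samples) <- samples$sample

# Write one toy kallisto-style abundance.tsv per sample
dir <- file.path(tempdir(), "kallisto_toy")
files <- file.path(dir, samples$sample, "abundance.tsv")
names(files) <- samples$sample
for (i in seq_along(files)) {
  dir.create(dirname(files[i]), recursive = TRUE, showWarnings = FALSE)
  est <- c(100, 50, 200) * i          # toy estimated counts per transcript
  eff <- c(900, 400, 1900)            # toy effective lengths
  tpm <- est / eff / sum(est / eff) * 1e6
  write.table(
    data.frame(target_id = tx2gene$TXNAME, length = eff + 100,
               eff_length = eff, est_counts = est, tpm = tpm),
    files[i], sep = "\t", quote = FALSE, row.names = FALSE
  )
}

# Summarize transcript-level estimates to one row per gene
txi <- tximport(files, type = "kallisto", tx2gene = tx2gene)

# Counts are rounded; average transcript lengths are kept for the offsets
dds <- DESeqDataSetFromTximport(txi, colData = samples, design = ~ condition)
```

Here `txi$counts` has one row per gene, and `DESeqDataSetFromTximport()` rounds those counts and stores the average transcript lengths from `txi$length` so they can be used as offsets.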