Gene ID Duplicates
anokhi1997 • 0

I've been trying to follow the DESeq2 manual (https://bioconductor.org/packages/3.7/bioc/vignettes/DESeq/inst/doc/DESeq.pdf), but I've run into a problem at this step:

> pasillaCountTable = read.table( datafile, header=TRUE, row.names=1 )

> head( pasillaCountTable )

My data has duplicate GeneIDs and therefore duplicate row names. Does anyone know a way to get around this, or a way to collapse the duplicate GeneIDs into a single row per gene without compromising the data?

deseq2 genetics Tutorial • 2.3k views
@mikelove

Please provide more information about the upstream steps. Ideally you would not have duplicate rows at the start with the same ID. How did you produce the count table?


Hello! Thank you for your quick reply! I should have been clearer in my question.

I ran kallisto on my RNA-seq data, followed by KallistoGather (https://github.com/raynamharris/BehavEphyRNAseq/blob/master/markdownfiles/02b_KallistoGather.Rmd), to get a table of unnormalized counts for all my samples, keyed by Transcript ID. I then merged this in R with a table of the corresponding Gene IDs, and that merged table is the one I'm currently using.
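To illustrate why a merge like this leads to duplicate Gene IDs, here is a minimal sketch in base R. The table and column names (`tx_counts`, `tx2gene`, `transcript_id`, `gene_id`) and all values are made up for illustration and are not the actual data from this thread. Because the merged table still has one row per transcript, any gene with more than one isoform appears more than once, and `read.table(..., row.names=1)` refuses duplicated row names.

```r
# Hypothetical transcript-level counts (toy values, two samples).
tx_counts <- data.frame(
  transcript_id = c("tx1", "tx2", "tx3"),
  sampleA = c(10, 20, 30),
  sampleB = c(12, 18, 33),
  stringsAsFactors = FALSE
)

# Hypothetical transcript-to-gene table: tx1 and tx2 are isoforms of geneA.
tx2gene <- data.frame(
  transcript_id = c("tx1", "tx2", "tx3"),
  gene_id = c("geneA", "geneA", "geneB"),
  stringsAsFactors = FALSE
)

# Merging keeps one row per transcript, so geneA now occupies two rows.
merged <- merge(tx2gene, tx_counts, by = "transcript_id")

# List the gene IDs that occur more than once.
dup_genes <- unique(merged$gene_id[duplicated(merged$gene_id)])
print(merged)
print(dup_genes)
```

Checking `dup_genes` on the real merged table would show which genes are affected before any attempt to use Gene ID as the row name.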


We have software for this: tximport followed by DESeqDataSetFromTximport. I think tximport is the approach that protects you against changes in gene length caused by differential isoform usage, whereas simply summing estimated counts across isoforms does not (this is the message of Trapnell et al. 2013). tximport also does not produce duplicate gene IDs: you give it the transcript-to-gene correspondence table, and it returns estimated counts with one row per gene, along with statistical offsets.
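As a rough sketch of that workflow, assuming the tximport and DESeq2 packages are installed: the code below writes a few toy kallisto-style `abundance.tsv` files to a temporary directory so that it runs on its own. The sample names, transcript and gene IDs, counts, and the `condition` column are all hypothetical. With real data, `files` would point at the actual kallisto output and `tx2gene` would be the real transcript-to-gene table.

```r
library(tximport)
library(DESeq2)

# Toy stand-in for kallisto output: four hypothetical samples, each with an
# abundance.tsv in its own directory (three transcripts, two genes).
out_dir <- tempfile("kallisto_demo")
samples <- c("ctrl1", "ctrl2", "trt1", "trt2")
for (i in seq_along(samples)) {
  dir.create(file.path(out_dir, samples[i]), recursive = TRUE)
  est_counts <- c(10, 20, 30) * i          # made-up estimated counts
  eff_length <- c(800, 1300, 1800)         # made-up effective lengths
  rate <- est_counts / eff_length
  abundance <- data.frame(
    target_id  = c("tx1", "tx2", "tx3"),
    length     = c(1000, 1500, 2000),
    eff_length = eff_length,
    est_counts = est_counts,
    tpm        = rate / sum(rate) * 1e6
  )
  write.table(abundance, file.path(out_dir, samples[i], "abundance.tsv"),
              sep = "\t", quote = FALSE, row.names = FALSE)
}

# Named vector of per-sample quantification files.
files <- file.path(out_dir, samples, "abundance.tsv")
names(files) <- samples

# Transcript-to-gene table: first column transcript ID, second column gene ID.
tx2gene <- data.frame(
  TXNAME = c("tx1", "tx2", "tx3"),
  GENEID = c("geneA", "geneA", "geneB")
)

# Summarize transcript-level estimates to one row per gene.
txi <- tximport(files, type = "kallisto", tx2gene = tx2gene)

# Hypothetical sample table; row names match the names of 'files'.
coldata <- data.frame(
  condition = factor(c("ctrl", "ctrl", "trt", "trt")),
  row.names = samples
)

# Build the DESeq2 object from the gene-level counts and average lengths.
dds <- DESeqDataSetFromTximport(txi, colData = coldata, design = ~ condition)
print(rownames(txi$counts))
```

The rows of `txi$counts` are gene IDs, each appearing once, so no manual merging or collapsing of duplicate rows is needed before handing the data to DESeq2.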
