I've been trying to follow the DESeq2 manual (https://bioconductor.org/packages/3.7/bioc/vignettes/DESeq/inst/doc/DESeq.pdf) but I'm running into a problem at this step:
> pasillaCountTable = read.table( datafile, header=TRUE, row.names=1 )
> head( pasillaCountTable )
In my data, some GeneIDs appear more than once, so the table has duplicate row names, which row.names=1 does not allow. Does anyone know a way around this? Or a way to collapse the duplicated GeneIDs into one row per GeneID without compromising the data?
Hello! Thank you for your quick reply! I should have been clearer in my question.
I used Kallisto on my RNA-seq data followed by KallistoGather (https://github.com/raynamharris/BehavEphyRNAseq/blob/master/markdownfiles/02b_KallistoGather.Rmd) to get a table of unnormalized counts for all my samples, indexed by Transcript ID. I then merged this in R with a table of the corresponding Gene IDs, and the merged table is the one I'm currently using.
We have software for this: tximport followed by DESeqDataSetFromTximport. I think tximport uniquely protects you against changes to gene length from differential isoform usage, while summing estimated counts from isoforms does not (the message from Trapnell et al. 2013). tximport also does not produce duplicate gene IDs. You give it the table of transcript-to-gene correspondence and it produces unique gene-level estimated counts and statistical offsets.
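For reference, here is a minimal sketch of that workflow. It is only an illustration under stated assumptions: the three transcripts, the two genes, the sample names, the `condition` column and the toy `abundance.tsv` files written to a temporary directory are all made up so that the snippet runs on its own. With real data you would point `files` at your own Kallisto output directories and build `tx2gene` from your own transcript-to-gene table.

```r
library(tximport)
library(DESeq2)

# Toy stand-ins for Kallisto output (hypothetical values): one abundance.tsv
# per sample, with the columns Kallisto writes.
tx_ids <- c("tx1", "tx2", "tx3")
sample_names <- c("ctrl1", "ctrl2", "trt1", "trt2")
out_dir <- tempfile("kallisto_toy_")
files <- character(0)
for (i in seq_along(sample_names)) {
  s <- sample_names[i]
  dir.create(file.path(out_dir, s), recursive = TRUE)
  abundance <- data.frame(
    target_id  = tx_ids,
    length     = c(1000, 1500, 2000),
    eff_length = c(800, 1300, 1800),
    est_counts = c(100, 50, 200) * i,
    tpm        = c(40, 20, 40)
  )
  files[s] <- file.path(out_dir, s, "abundance.tsv")
  write.table(abundance, files[s], sep = "\t", quote = FALSE, row.names = FALSE)
}

# Transcript-to-gene table: transcript ID in the first column, gene ID in the
# second. Several transcripts may map to the same gene.
tx2gene <- data.frame(
  TXNAME = tx_ids,
  GENEID = c("geneA", "geneA", "geneB")
)

# Summarise transcript-level estimates to one row per gene.
txi <- tximport(files, type = "kallisto", tx2gene = tx2gene)

# Sample table; row names match the names given to `files`.
samples <- data.frame(
  condition = factor(c("ctrl", "ctrl", "trt", "trt")),
  row.names = names(files)
)

# Build the DESeq2 object from the tximport result.
dds <- DESeqDataSetFromTximport(txi, colData = samples, design = ~ condition)
```

The row names of `txi$counts` are the gene IDs from the second column of `tx2gene`, one row per gene, so the duplicate row name problem from `read.table` does not come up.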