The DEXSeq vignette describes several protocols for obtaining exon counts. One option is to use the dexseq_count.py
script (powered by htseq) that comes with the package. Another option is to use summarizeOverlaps
directly in R. I tried both methods with unstranded parameters and the results were identical (based on a few genes). However, stranded settings yield very different results. Some exons are off by a few reads, but some can be as low as half. I used the settings as specified in the vignette, which I assumed would be most appropriate. I also tried adjusting parameters, but it didn't help.
The command for dexseq_count.py:
python /path/DEXSeq/python_scripts/dexseq_count.py \ --format bam \ --order pos \ --paired yes \ --stranded reverse \ $ref_gff \ $bam \ $counts_txt
The command for GenomicAlignments:
txdb = makeTxDbFromGFF(file = genes_gtf, format = "gtf") exons = disjointExons(txdb, aggregateGenes = FALSE) SE = summarizeOverlaps(features = exons, reads = bam_list, singleEnd = FALSE, ignore.strand = FALSE, inter.feature = FALSE, fragments = TRUE)
Is there a mistake with one of the methods?