I am trying to count reads with the summarizeOverlaps function. I am working with the murine genome and a GTF file downloaded from the Ensembl website (ftp://ftp.ensembl.org/pub/release-83/gtf/mus_musculus/). My code is:
gtffile <- file.path(dir,"Mus_musculus.GRCm38.83.gtf")
(txdb <- makeTxDbFromGFF(gtffile, format="gtf", circ_seqs=character()))
(ebg <- exonsBy(txdb, by="gene"))
se <- summarizeOverlaps(features=ebg, reads=bamfiles,
Counting takes only 10 minutes for all BAM files, which makes me suspicious, since it takes much longer for the human genome: approx. 30 minutes per BAM file with 30 million reads. Here I have 12 BAM files with approx. 10 million reads per file. I get a count table, but when I continue with the differential expression analysis I get a low number of differentially expressed genes (between 20 and 200, depending on the condition). I suspect something is wrong with the counting step. Could anyone help?
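One check that might help narrow this down is whether the chromosome names in the GTF match those in the BAM headers, since reads can only be assigned to features on sequences the two have in common. Below is a minimal sketch of that comparison. `compare_seqlevels` is a hypothetical helper written for illustration (not part of any package), the toy names are made up, and the commented-out lines assume that `bamfiles` is a BamFileList.

```r
# Hypothetical helper: compare the sequence (chromosome) names used by the
# annotation with those used by the alignments.
compare_seqlevels <- function(feature_levels, read_levels) {
  list(
    shared           = intersect(feature_levels, read_levels),
    only_in_features = setdiff(feature_levels, read_levels),
    only_in_reads    = setdiff(read_levels, feature_levels)
  )
}

# Toy illustration with made-up names: Ensembl-style vs. UCSC-style naming.
res <- compare_seqlevels(c("1", "2", "X", "MT"),
                         c("chr1", "chr2", "chrX", "chrM"))
length(res$shared)  # number of sequence names present in both

# With the objects from my code (assumption: `bamfiles` is a BamFileList):
# compare_seqlevels(seqlevels(ebg), seqlevels(seqinfo(bamfiles[[1]])))
# colSums(assay(se))  # total reads assigned to genes, per BAM file
```

If `shared` is empty or much shorter than expected, or the column sums of the count matrix are far below the number of aligned reads, that would point to the counting step rather than to the downstream analysis.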
Thanks in advance