I am following along with an RNA-Seq Pipeline with ambiguous steps. I was told they quantification was performed like so: "We used subread to summarize counts to the gene level."
These are the parameters that I am using:
mock1 <- featureCounts(files="mock1_Aligned.sortedByCoord.out.bam",isPairedEnd=TRUE, GTF.featureType="exon", GTF.attrType="gene_id", annot.ext= "gencode.v41.primary_assembly.annotation.gtf", isGTFAnnotationFile=TRUE, countMultiMappingReads = FALSE, countReadPairs=T, minMQS = 30, largestOverlap = T)
So my reads are being summarized at the gene-level correct because the default is to count meta-features?
I am confused because the GTF.featureType is set to "exon" and not gene? I have performed Differential expression with GTF.featureType set to both exon and gene and get differences in my top-DE genes. I would expect the results to be similar if exon counts are being aggregated at the gene-level... I feel like keeping the largest Overlap could cause differences because in one situation it is the overlap for genes and it is the overlap for exons in another. How should I deal with overlapping fragments for the most accurate results?
Here is the status report for gene feature quantification:
Status mock Assigned 15877654 Unassigned_Unmapped 73946 Unassigned_Read_Type 0 Unassigned_Singleton 0 Unassigned_MappingQuality 3571086 Unassigned_Chimera 0 Unassigned_FragmentLength 0 Unassigned_Duplicate 0 Unassigned_MultiMapping 10826366 Unassigned_Secondary 0 Unassigned_NonSplit 0 Unassigned_NoFeatures 963219 Unassigned_Overlapping_Length 0 Unassigned_Ambiguity 3536273
And for exon-feature:
Status mock1 Assigned 15795944 Unassigned_Unmapped 73946 Unassigned_Read_Type 0 Unassigned_Singleton 0 Unassigned_MappingQuality 3571086 Unassigned_Chimera 0 Unassigned_FragmentLength 0 Unassigned_Duplicate 0 Unassigned_MultiMapping 10826366 Unassigned_Secondary 0 Unassigned_NonSplit 0 Unassigned_NoFeatures 3532032 Unassigned_Overlapping_Length 0 Unassigned_Ambiguity 1049170
Counts for counting exons:
gene mock1 ENSG00000223972.5 0 ENSG00000227232.5 28 ENSG00000278267.1 0 ENSG00000243485.5 22
Counts for counting genes:
mock1 ENSG00000223972.5 0 ENSG00000227232.5 86 ENSG00000278267.1 0 ENSG00000243485.5 19
Why are the counts significantly higher for ENSG00000227232.5 when the GTF.featureType is set to "gene". There are more exons than genes so I figured the count would be higher for exon?