Quantification of Genes with RSubread::featureCounts() at exon-level vs gene-level?
1
0
Entering edit mode
Sara • 0
@41d09ed8
Last seen 10 months ago
United States

Hello,

I am following along with an RNA-Seq Pipeline with ambiguous steps. I was told they quantification was performed like so: "We used subread to summarize counts to the gene level."

These are the parameters that I am using:

mock1 <- featureCounts(files="mock1_Aligned.sortedByCoord.out.bam",isPairedEnd=TRUE, GTF.featureType="exon",
 GTF.attrType="gene_id", annot.ext= "gencode.v41.primary_assembly.annotation.gtf", isGTFAnnotationFile=TRUE, 
countMultiMappingReads = FALSE, countReadPairs=T, minMQS = 30, largestOverlap = T)

So my reads are being summarized at the gene-level correct because the default is to count meta-features?

I am confused because the GTF.featureType is set to "exon" and not gene? I have performed Differential expression with GTF.featureType set to both exon and gene and get differences in my top-DE genes. I would expect the results to be similar if exon counts are being aggregated at the gene-level... I feel like keeping the largest Overlap could cause differences because in one situation it is the overlap for genes and it is the overlap for exons in another. How should I deal with overlapping fragments for the most accurate results?

Here is the status report for gene feature quantification:

Status  mock
Assigned                  15877654
Unassigned_Unmapped       73946
Unassigned_Read_Type            0
Unassigned_Singleton    0
Unassigned_MappingQuality     3571086
Unassigned_Chimera  0
Unassigned_FragmentLength   0
Unassigned_Duplicate    0
Unassigned_MultiMapping      10826366
Unassigned_Secondary    0
Unassigned_NonSplit 0
Unassigned_NoFeatures         963219
Unassigned_Overlapping_Length   0
Unassigned_Ambiguity          3536273

And for exon-feature:

Status  mock1
Assigned                15795944
Unassigned_Unmapped     73946
Unassigned_Read_Type    0
Unassigned_Singleton    0
Unassigned_MappingQuality    3571086
Unassigned_Chimera  0
Unassigned_FragmentLength   0
Unassigned_Duplicate    0
Unassigned_MultiMapping 10826366
Unassigned_Secondary    0
Unassigned_NonSplit 0
Unassigned_NoFeatures        3532032
Unassigned_Overlapping_Length   0
Unassigned_Ambiguity        1049170

Counts for counting exons:

    gene mock1
ENSG00000223972.5   0
ENSG00000227232.5   28
ENSG00000278267.1   0
ENSG00000243485.5   22

Counts for counting genes:

    mock1
ENSG00000223972.5   0
ENSG00000227232.5   86
ENSG00000278267.1   0
ENSG00000243485.5   19

Why are the counts significantly higher for ENSG00000227232.5 when the GTF.featureType is set to "gene". There are more exons than genes so I figured the count would be higher for exon?

Thanks, Sara

Subread RNASeq featureCounts quantification • 2.7k views
ADD COMMENT
1
Entering edit mode
@gordon-smyth
Last seen 4 minutes ago
WEHI, Melbourne, Australia

GTF.featureType should be set to `"exon" regardless of whether you want to count at the exon or gene level.

Apart from specifying your files by

files = "mock1_Aligned.sortedByCoord.out.bam",
isPairedEnd = TRUE, 
annot.ext = "gencode.v41.primary_assembly.annotation.gtf",
isGTFAnnotationFile = TRUE,

you should leave all arguments at their default values. All arguments are set to defaults that are recommended and used by the featureCounts authors themselves, so you should not change the settings unless your situation is unusual (which yours is not) or you are an expert.

Setting GTF.featureType="gene" tells featureCounts to count all reads that map anywhere in the gene body including introns as well as exons, so naturally it will give higher counts.

ADD COMMENT

Login before adding your answer.

Traffic: 683 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6