Question: Featurecounts issue for counting exons: Not matching total counts mapped for specific gene
0
gravatar for A
5 months ago by
A40
A40 wrote:

Hi all, 

I am having a weird issue (weird to me being a new R user) using featurecounts to generate counts for genes but also for individual exons. I have successfully mapped gene counts to their respective genes and the total reads mapped and annotated to genes match those generated by a bioinformatics department so its a good quality control. 

I also want to generate counts for individual exons so that I can test the knockdown of one of our genes at specific exons in a mouse model. The issue is that the exon counts do not add up or match I am seeing with total read mapped to genes. The code I am using is as follows:

For annotating genes:

fc <- featureCounts(bamDir, isGTFAnnotationFile = TRUE, annot.ext = "Mus_musculus.GRCm38.94.chr.gtf", isPairedEnd = FALSE)

Count result for gene of interest = 

ENSMUSG00000053080 = 1158

 

This matches results obtained from bioinformatcis department  for the gene above as well as all other genes (with +/- of 5 counts, I am guessing for parameters they have used in their script which I am not )...

However, when I attempt to annotate exons specifically, I am running in to trouble for two reasons. The total counts generated and also the amount of exons being incorrect (possible duplicates?)

The code I am using is as follows:

fc <- featureCounts(bamDir, isGTFAnnotationFile = TRUE, annot.ext = "Mus_musculus.GRCm38.94.chr.gtf", isPairedEnd = FALSE, useMetaFeatures = FALSE) OR

fc <- featureCounts(bamDir, isGTFAnnotationFile = TRUE, annot.ext = "Mus_musculus.GRCm38.94.chr.gtf", isPairedEnd = FALSE, useMetaFeatures = "exons")

These both produce the following results: 

 

ENSMUSG00000053080 3
ENSMUSG00000053080 0
ENSMUSG00000053080 0
ENSMUSG00000053080 0
ENSMUSG00000053080 71
ENSMUSG00000053080 55
ENSMUSG00000053080 0
ENSMUSG00000053080 0
ENSMUSG00000053080 0
ENSMUSG00000053080 0
ENSMUSG00000053080 0
ENSMUSG00000053080 0
ENSMUSG00000053080 0
ENSMUSG00000053080 0
ENSMUSG00000053080 0
ENSMUSG00000053080 8
ENSMUSG00000053080 0
ENSMUSG00000053080

0

 

This is icnredible confusing as there are only 5 exons for this gene, and clearly the counts are far too low to reflect the total counts annotated to the gene as a whole.. Can anybody advice what I might be doing wrong here and why this result is being produced?

 

And finally, is there an intuitive way of plotting counts per exons in an easy to follow way, maybe thorugh ggplo2 just plotting counts on y axis by exon number on x axis? or a way of passing this festurecount result to ggbio?

Thank you for your help!

 

 

rsubread ggbio featurecounts • 213 views
ADD COMMENTlink modified 5 months ago by Wei Shi3.1k • written 5 months ago by A40
Answer: Featurecounts issue for counting exons: Not matching total counts mapped for spe
0
gravatar for Wei Shi
5 months ago by
Wei Shi3.1k
Australia
Wei Shi3.1k wrote:

Yes as you suspect there should be duplicates in the annotated exons for this gene. You can check this by looking at the exon annotation data featureCounts extracted from the GTF file (the extracted annotation is saved in the component fc$annotation).

When you count reads at exon level, by default a read overlapping more than one exon is not counted. Therefore all duplicate exons will receive zero count. Also junction reads will not be counted either because they span more than one exon. However if you set allowMultiOverlap=TRUE you will get all these reads counted, but the total number of reads counted to exons will be greater than the total number of reads counted to the gene.

For gene level counting, reads overlapping duplicate exons or multiple different exons are always counted once only (the allowMultiOverlap parameter controls how to count those reads overlapping more than one gene instead of more than one exon).

You can use the flattenGTF function in Rsubread to flatten your annotation before you run featureCounts

ADD COMMENTlink written 5 months ago by Wei Shi3.1k

Hi Wei, 

 

Thank you so much for your quick response! I really appreciate the clear explanation. May I ask one quick follow up question please.. Given that reads span exon junctions, looking at the annotation slot, similar start sites have different site ends... So how does one tackle this to produce a total number of counts for the correct amount of exons in the gene.. Is it to merge the duplicates?

 

Or is this specifically what flattenGTF will do by merging non-overlapping exon bins? If so, do I need to pass the flattened GTF file which is a GFF? because passing the GTF path after flattening GTF produces the same result with duplicated exons...

 

Thank you!!

ADD REPLYlink written 5 months ago by A40

I would recommend you to merge the duplicated or overlapping exons. This is what flattenGTF function does. Output of this function is a data.frame object that contains a SAF format annotation. Take a look at the help page for flattenGTF for more information. You can then feed the data.frame object to featureCounts for counting.

Or you may directly use the inbuilt annotation for GRCm38/mm10 in featureCounts. The inbuilt annotation has all overlapping/duplicated exons merged. By default, featureCounts uses the inbuilt mm10 annotation to count reads (annot.inbuilt = "mm10"). 

ADD REPLYlink modified 5 months ago • written 5 months ago by Wei Shi3.1k

thank you so much for your assistance! Really appreciate this, it has been so helpful!

ADD REPLYlink written 5 months ago by A40
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 143 users visited in the last hour