Question

Featurecounts issue for counting exons: Not matching total counts mapped for specific gene

0

Entering edit mode

A ▴ 60

@a-14337

Last seen 2.2 years ago

United Kingdom

Hi all,

I am having a weird issue (weird to me being a new R user) using featurecounts to generate counts for genes but also for individual exons. I have successfully mapped gene counts to their respective genes and the total reads mapped and annotated to genes match those generated by a bioinformatics department so its a good quality control.

I also want to generate counts for individual exons so that I can test the knockdown of one of our genes at specific exons in a mouse model. The issue is that the exon counts do not add up or match I am seeing with total read mapped to genes. The code I am using is as follows:

For annotating genes:

fc <- featureCounts(bamDir, isGTFAnnotationFile = TRUE, annot.ext = "Mus_musculus.GRCm38.94.chr.gtf", isPairedEnd = FALSE)

Count result for gene of interest =

ENSMUSG00000053080 =

1158

This matches results obtained from bioinformatcis department for the gene above as well as all other genes (with +/- of 5 counts, I am guessing for parameters they have used in their script which I am not )...

However, when I attempt to annotate exons specifically, I am running in to trouble for two reasons. The total counts generated and also the amount of exons being incorrect (possible duplicates?)

The code I am using is as follows:

fc <- featureCounts(bamDir, isGTFAnnotationFile = TRUE, annot.ext = "Mus_musculus.GRCm38.94.chr.gtf", isPairedEnd = FALSE, useMetaFeatures = FALSE) OR

fc <- featureCounts(bamDir, isGTFAnnotationFile = TRUE, annot.ext = "Mus_musculus.GRCm38.94.chr.gtf", isPairedEnd = FALSE, useMetaFeatures = "exons")

These both produce the following results:

ENSMUSG00000053080	3
ENSMUSG00000053080	0
ENSMUSG00000053080	0
ENSMUSG00000053080	0
ENSMUSG00000053080	71
ENSMUSG00000053080	55
ENSMUSG00000053080	0
ENSMUSG00000053080	0
ENSMUSG00000053080	0
ENSMUSG00000053080	0
ENSMUSG00000053080	0
ENSMUSG00000053080	0
ENSMUSG00000053080	0
ENSMUSG00000053080	0
ENSMUSG00000053080	0
ENSMUSG00000053080	8
ENSMUSG00000053080	0
ENSMUSG00000053080	0

This is icnredible confusing as there are only 5 exons for this gene, and clearly the counts are far too low to reflect the total counts annotated to the gene as a whole.. Can anybody advice what I might be doing wrong here and why this result is being produced?

And finally, is there an intuitive way of plotting counts per exons in an easy to follow way, maybe thorugh ggplo2 just plotting counts on y axis by exon number on x axis? or a way of passing this festurecount result to ggbio?

Thank you for your help!

featurecounts Rsubread ggbio • 4.3k views

ADD COMMENT • link updated 7.2 years ago by Wei Shi ★ 3.6k • written 7.2 years ago by A ▴ 60

score 0 · Answer 1 · 2018-12-03

0

Entering edit mode

Wei Shi ★ 3.6k

@wei-shi-2183

Last seen 25 days ago

Australia/Melbourne

Yes as you suspect there should be duplicates in the annotated exons for this gene. You can check this by looking at the exon annotation data featureCounts extracted from the GTF file (the extracted annotation is saved in the component fc$annotation).

When you count reads at exon level, by default a read overlapping more than one exon is not counted. Therefore all duplicate exons will receive zero count. Also junction reads will not be counted either because they span more than one exon. However if you set allowMultiOverlap=TRUE you will get all these reads counted, but the total number of reads counted to exons will be greater than the total number of reads counted to the gene.

For gene level counting, reads overlapping duplicate exons or multiple different exons are always counted once only (the allowMultiOverlap parameter controls how to count those reads overlapping more than one gene instead of more than one exon).

You can use the flattenGTF function in Rsubread to flatten your annotation before you run featureCounts.

ADD COMMENT • link 7.2 years ago Wei Shi ★ 3.6k

0

Entering edit mode

Hi Wei,

Thank you so much for your quick response! I really appreciate the clear explanation. May I ask one quick follow up question please.. Given that reads span exon junctions, looking at the annotation slot, similar start sites have different site ends... So how does one tackle this to produce a total number of counts for the correct amount of exons in the gene.. Is it to merge the duplicates?

Or is this specifically what flattenGTF will do by merging non-overlapping exon bins? If so, do I need to pass the flattened GTF file which is a GFF? because passing the GTF path after flattening GTF produces the same result with duplicated exons...

Thank you!!

ADD REPLY • link 7.2 years ago A ▴ 60

0

Entering edit mode

I would recommend you to merge the duplicated or overlapping exons. This is what flattenGTF function does. Output of this function is a data.frame object that contains a SAF format annotation. Take a look at the help page for flattenGTF for more information. You can then feed the data.frame object to featureCounts for counting.

Or you may directly use the inbuilt annotation for GRCm38/mm10 in featureCounts. The inbuilt annotation has all overlapping/duplicated exons merged. By default, featureCounts uses the inbuilt mm10 annotation to count reads (annot.inbuilt = "mm10").

ADD REPLY • link 7.2 years ago Wei Shi ★ 3.6k

0

Entering edit mode

thank you so much for your assistance! Really appreciate this, it has been so helpful!

ADD REPLY • link 7.2 years ago A ▴ 60