Read count summed over exons is greater than the gene-level read count using featureCounts
inah

Hi,
I have been using featureCounts to obtain both exon-level and gene-level read counts (reads were aligned with STAR). For one particular gene (ARID5B, which has 12 exons: 5 unique to one isoform, 2 unique to another isoform and 5 shared), I find that the read count summed over the 12 exons is greater than the gene-level read count. I did not think this was possible, since featureCounts uses the exon-union method for gene-level counting. Below are the relevant parameter settings for featureCounts:

gene-based count:

annot.ext="/home/inah/RefGTF/GRCh38/annotation/Homo_sapiens.GRCh38.85.gtf",
isGTFAnnotationFile=TRUE,
GTF.featureType="exon", GTF.attrType="gene_id", useMetaFeatures=TRUE,
allowMultiOverlap=TRUE,
minOverlap=1,
largestOverlap=TRUE,
strandSpecific=2,    
isPairedEnd=TRUE

exon-level counts:

annot.ext="/home/inah/RefGTF/GRCh38/annotation/Homo_sapiens.GRCh38.85.gtf",
isGTFAnnotationFile=TRUE,
GTF.featureType="exon", GTF.attrType="exon_id", useMetaFeatures=TRUE,
allowMultiOverlap=TRUE,
minOverlap=1,
largestOverlap=TRUE,
strandSpecific=2,    
isPairedEnd=TRUE

For the exon-level counts, I set useMetaFeatures=TRUE because when it is FALSE, the count matrix appears to contain multiple (identical) rows for exons that are shared by several isoforms; when it is TRUE, only one of these rows is present.
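
The duplicated rows can be illustrated with a minimal R sketch, assuming a made-up annotation: the transcript IDs, exon IDs and coordinates below are hypothetical, and the code only mimics the grouping idea rather than Rsubread's implementation. A GTF lists a shared exon once per transcript that contains it, so feature-level counting (useMetaFeatures=FALSE) produces one row per record, while grouping by exon_id (useMetaFeatures=TRUE, GTF.attrType="exon_id") merges those records into a single meta-feature.

```r
# Hypothetical GTF exon records for one gene with two transcripts (T1, T2).
# Exon "E2" is shared, so the GTF lists it once per transcript that contains it.
gtf_exons <- data.frame(
  transcript_id = c("T1", "T1", "T2", "T2"),
  exon_id       = c("E1", "E2", "E2", "E3"),
  start         = c(100, 500, 500, 900),
  end           = c(200, 600, 600, 1000),
  stringsAsFactors = FALSE
)

# Feature level (useMetaFeatures=FALSE): one row per GTF exon record,
# so the shared exon E2 shows up twice with identical coordinates.
feature_rows <- gtf_exons$exon_id

# Meta-feature level (useMetaFeatures=TRUE, GTF.attrType="exon_id"):
# records sharing an exon_id are merged into a single meta-feature.
meta_features <- unique(gtf_exons$exon_id)

table(feature_rows)   # E1 = 1, E2 = 2, E3 = 1
meta_features         # "E1" "E2" "E3"
```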

Can someone give me a hint as to why my summed exon count is higher than the gene-level count? Is there something wrong with my parameter settings for the exon-level counting?

Thanks, Ina

R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Rsubread_1.24.1

Wei Shi

Hi Ina, this is expected: exon-spanning reads (reads overlapping more than one exon) were counted once for each overlapping exon in your exon-level counting, but only once in your gene-level counting. These reads should be counted more than once in your exon-level counting, because they originate from multiple exons and each overlapping exon should receive a count. Your commands seem fine.
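
A minimal R sketch of this arithmetic, assuming a toy model in which each read is reduced to the set of exons it overlaps. The exon names and reads are hypothetical (not the ARID5B data), and settings such as minOverlap, largestOverlap and strandedness are ignored. Gene-level (exon-union) counting credits each read once, while exon-level counting with multi-overlap allowed credits every exon the read touches.

```r
# Hypothetical toy reads, each described by the exon meta-features it overlaps.
reads <- list(
  c("E1"),          # read entirely within exon E1
  c("E1", "E2"),    # exon-spanning (junction) read
  c("E2"),
  c("E2", "E3"),    # exon-spanning (junction) read
  c("E3")
)

# Gene-level (exon-union) counting: a read overlapping any exon of the gene counts once.
gene_count <- sum(vapply(reads, function(r) length(r) > 0, logical(1)))

# Exon-level counting with multi-overlap allowed: every overlapped exon receives a count.
exon_counts <- table(factor(unlist(reads), levels = c("E1", "E2", "E3")))

gene_count        # 5
sum(exon_counts)  # 7

# The surplus equals the number of extra exons hit by exon-spanning reads.
surplus <- sum(exon_counts) - gene_count
surplus == sum(lengths(reads) - 1)   # TRUE
```

In this toy model the exon sum can only equal the gene count when no read spans an exon junction.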

To be more specific, is this because both "useMetaFeatures" and "allowMultiOverlap" are set to TRUE in the command? If one sets only "useMetaFeatures" to TRUE and "allowMultiOverlap" to FALSE, should the summed exonic counts then be smaller than the gene-level count? (The manual says that at the meta-feature level, an exon-spanning read is counted only once even if it overlaps multiple exons.)

Yes, this should make the total exonic count smaller than the total gene count. However, it will result in the loss of exon-spanning reads, and your exon-level counts won't be as accurate.
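
A minimal R sketch comparing the two settings, under the same kind of toy model: each read is reduced to the set of exon meta-features it overlaps, and with multi-overlap disallowed a read hitting more than one exon is simply left unassigned. The function count_exons and the data are hypothetical illustrations, not Rsubread code, and the sketch ignores minOverlap and largestOverlap (the original command sets largestOverlap=TRUE, which may change how multi-exon reads are assigned).

```r
# Hypothetical toy reads, each described by the exon meta-features it overlaps.
reads <- list(c("E1"), c("E1", "E2"), c("E2"), c("E2", "E3"), c("E3"))
exons <- c("E1", "E2", "E3")

# Simplified exon-level counting. With allow_multi_overlap = FALSE, a read that
# overlaps more than one exon is dropped (unassigned) instead of being counted.
count_exons <- function(reads, exons, allow_multi_overlap) {
  if (!allow_multi_overlap) {
    reads <- Filter(function(r) length(r) == 1L, reads)
  }
  table(factor(unlist(reads), levels = exons))
}

# Gene-level (exon-union) count: each read overlapping the gene counts once.
gene_count <- length(Filter(function(r) length(r) > 0L, reads))

with_multi    <- count_exons(reads, exons, allow_multi_overlap = TRUE)
without_multi <- count_exons(reads, exons, allow_multi_overlap = FALSE)

c(gene = gene_count, with_multi = sum(with_multi), without_multi = sum(without_multi))
# gene = 5, with_multi = 7, without_multi = 3
```

The without_multi total falls below the gene count only because the two junction reads are discarded, which is the loss of exon-spanning reads described above.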

What is the problem with getting more counts for exons? You have to count all the reads originating from an exon, regardless of whether they are exon-spanning reads or reads falling entirely within the exon.
