Question: read count summed over exons is greater than the gene-level read count using featureCounts
inah wrote (5 months ago):

Hi,
I have been using featureCounts to obtain both exon- and gene-level read counts (reads were aligned with STAR). For one particular gene (ARID5B, which has 12 exons: 5 unique to one isoform, 2 unique to another isoform, and 5 shared), I find that the read count summed over the 12 exons is greater than the gene-level read count. This should not be possible, as featureCounts uses the exon-union method for gene-level counting. Below are the relevant parameter settings for featureCounts:

gene-based count:

annot.ext="/home/inah/RefGTF/GRCh38/annotation/Homo_sapiens.GRCh38.85.gtf",
isGTFAnnotationFile=TRUE,
GTF.featureType="exon", GTF.attrType="gene_id", useMetaFeatures=TRUE,
allowMultiOverlap=TRUE,
minOverlap=1,
largestOverlap=TRUE,
strandSpecific=2,    
isPairedEnd=TRUE

exon-level counts:

annot.ext="/home/inah/RefGTF/GRCh38/annotation/Homo_sapiens.GRCh38.85.gtf",
isGTFAnnotationFile=TRUE,
GTF.featureType="exon", GTF.attrType="exon_id", useMetaFeatures=TRUE,
allowMultiOverlap=TRUE,
minOverlap=1,
largestOverlap=TRUE,
strandSpecific=2,    
isPairedEnd=TRUE

For the exon-level counts, I set useMetaFeatures=TRUE because with FALSE the count matrix appears to contain multiple identical rows for exons that are shared by several isoforms (with TRUE, only one of these rows is present).
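The duplicated rows can be pictured with a small toy model. This is only a sketch in Python, not Rsubread code: the GTF rows, transcript names, and the helper `output_row_ids` are hypothetical. It assumes that a shared exon appears in the GTF once per transcript with the same exon_id, and that useMetaFeatures=TRUE groups features by the GTF.attrType value.

```python
# Hypothetical GTF exon rows: (transcript_id, exon_id, start, end).
# exonA is shared by both transcripts, so it is listed twice.
GTF_ROWS = [
    ("tx1", "exonA", 100, 200),
    ("tx2", "exonA", 100, 200),
    ("tx1", "exonB", 300, 400),
    ("tx2", "exonC", 500, 600),
]


def output_row_ids(rows, use_meta_features):
    """Return the row identifiers a count matrix would have in this toy model."""
    ids = [exon_id for _, exon_id, _, _ in rows]
    if not use_meta_features:
        # Feature level: one output row per GTF line, duplicates included.
        return ids
    # Meta-feature level: lines sharing an exon_id collapse into one row.
    return list(dict.fromkeys(ids))


feature_rows = output_row_ids(GTF_ROWS, use_meta_features=False)
meta_rows = output_row_ids(GTF_ROWS, use_meta_features=True)
print(feature_rows)
print(meta_rows)
```

In this toy annotation the feature-level table lists exonA twice, while the grouped table lists it once, which matches the behaviour described above.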

Can someone give me a hint as to why my exon-sum count is higher than the gene-level count? Is there something wrong with my parameter settings for the exon-level counting?

Thanks, Ina

R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Rsubread_1.24.1

 

Wei Shi (Australia) wrote (5 months ago):

Hi Ina, this is not unexpected. Exon-spanning reads (reads overlapping more than one exon) were counted once for each exon they overlap in your exon-level counting, but only once in your gene-level counting. Counting them more than once at the exon level is the intended behaviour: these reads originate from multiple exons, so each overlapping exon should receive a count. Your commands seem fine.
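The arithmetic can be illustrated with a toy model. This is a minimal sketch in Python, not featureCounts itself: the exon coordinates, the reads, and the helper names are all hypothetical, and the model ignores strandedness, paired-end fragments, and the largestOverlap setting. It only assumes that, at the exon level, a read adds one count to every exon it overlaps, while at the gene level it adds one count to the gene.

```python
from collections import Counter

# Hypothetical gene with three exons (1-based, inclusive coordinates).
EXONS = {"exon1": (100, 200), "exon2": (300, 400), "exon3": (500, 600)}

# Hypothetical reads, each a list of aligned blocks.
# A spliced (exon-spanning) read has more than one block.
READS = [
    [(120, 170)],              # inside exon1
    [(180, 200), (300, 330)],  # spans exon1 and exon2
    [(350, 390)],              # inside exon2
    [(380, 400), (500, 530)],  # spans exon2 and exon3
    [(700, 750)],              # overlaps no exon
]


def overlaps(block, exon):
    """True if two closed intervals share at least one base."""
    return block[0] <= exon[1] and exon[0] <= block[1]


def exons_hit(read):
    """Names of all exons overlapped by any block of the read."""
    return {
        name
        for name, exon in EXONS.items()
        if any(overlaps(block, exon) for block in read)
    }


def count_reads(reads):
    """Return (per-exon counts, gene-level count) for the toy model."""
    exon_counts = Counter({name: 0 for name in EXONS})
    gene_count = 0
    for read in reads:
        hit = exons_hit(read)
        exon_counts.update(hit)  # one count per overlapped exon
        if hit:
            gene_count += 1      # one count for the gene as a whole
    return exon_counts, gene_count


exon_counts, gene_count = count_reads(READS)
exon_sum = sum(exon_counts.values())
print(dict(exon_counts), exon_sum, gene_count)
```

Here the exon counts sum to 6 while the gene count is 4. The difference of 2 comes from the two spliced reads, each of which contributes to two exons but only once to the gene.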

 
