Question: strange results with featureCounts
inah wrote, 12 days ago:

Hi,

I have human total RNA-seq data (paired-end, NextSeq) from a pilot study with two samples, and I am getting results from featureCounts that do not make sense to me. I process the data as follows: I perform adapter trimming using ea-utils mcf and mild quality trimming using btrim, then align to the genome with STAR. STAR reports 99,018,190 input reads (these are read pairs) for the first sample and 126,164,150 for the second. For the first sample, 86,571,963 reads were aligned (70,579,007 uniquely); for the second, 112,525,928 reads were aligned (89,730,382 uniquely). When I run featureCounts on these data, it prints the following:

First sample:    Total fragments: 123128857,   Successfully assigned fragments : 63971619 (52.0%)

2nd sample:    Total fragments: 168328570,    Successfully assigned fragments : 83223103 (49.4%)

and this warning: 

WARNING: reads from the same pair were found not adjacent to each other in the input (due to read sorting by location or reporting of multi-mapping read pairs).    

It seems that the total fragments from featureCounts do not match up at all with the read counts from STAR.

Thanks, Ina

Modified 11 days ago by Gordon Smyth; written 12 days ago by inah
Wei Shi (Australia) wrote, 12 days ago:

featureCounts reports the number of alignments, whereas STAR reports the number of reads (read pairs in this case, since the data are paired-end). A mapped read may be reported as one or more alignments: a uniquely mapped read yields a single alignment, but a multi-mapping read yields more than one. So the difference you observed is caused by the reporting and counting of multi-mapping reads.

If you instruct STAR to output uniquely mapped reads only, then featureCounts will report the same total count as STAR. When STAR is allowed to output multi-mapping reads, the total from featureCounts is always higher because it counts alignments rather than reads.

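
To make the reads-versus-alignments distinction concrete, here is a minimal Python sketch. The toy read names are hypothetical. The second half plugs in the figures quoted in the question and assumes, without confirmation from the thread, that every aligned pair was written to the BAM file and that each uniquely mapped pair contributes exactly one alignment to the featureCounts total.

```python
from collections import Counter

# Hypothetical toy input: one entry per reported alignment, labelled by read-pair name.
# pair_a and pair_b map uniquely; pair_c maps to three loci.
alignments = ["pair_a", "pair_b", "pair_c", "pair_c", "pair_c"]

hits_per_pair = Counter(alignments)
n_pairs = len(hits_per_pair)                 # counting read pairs gives 3
n_alignments = sum(hits_per_pair.values())   # counting alignments gives 5


def mean_hits_per_multimapper(aligned_pairs, unique_pairs, total_alignments):
    """Average number of alignments per multi-mapping pair.

    Assumes each uniquely mapped pair accounts for exactly one alignment,
    so everything left over belongs to multi-mapping pairs.
    """
    multi_pairs = aligned_pairs - unique_pairs
    multi_alignments = total_alignments - unique_pairs
    return multi_alignments / multi_pairs


# Figures quoted in the question (STAR aligned, STAR unique, featureCounts total).
sample1 = mean_hits_per_multimapper(86_571_963, 70_579_007, 123_128_857)
sample2 = mean_hits_per_multimapper(112_525_928, 89_730_382, 168_328_570)

print(f"toy example: {n_pairs} pairs, {n_alignments} alignments")
print(f"sample 1: about {sample1:.2f} alignments per multi-mapping pair")
print(f"sample 2: about {sample2:.2f} alignments per multi-mapping pair")
```

Under those assumptions, a few alignments per multi-mapping pair would be enough to account for the gap between the two totals.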

Thank you very much for the quick response, Wei. I have one other question: the percentage of successfully assigned fragments is 52.0% and 49.4% in these two total RNA samples. Is this unusually low?

Thanks again, Ina

Reply written 11 days ago by Ina Hoeschele

The assignment percentage is typically around 50 to 70 percent, so your percentages are a bit low but not unusual. The percentage tends to be lower when multi-mapping reads are included.

Reply modified 10 days ago, written 11 days ago by Wei Shi
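
The effect of the denominator on the assignment percentage can be checked with a short Python sketch using the figures quoted in the question. The first ratio reproduces the percentages printed by featureCounts. The second ratio, against STAR's uniquely mapped pairs, is only illustrative: it assumes all assigned fragments come from uniquely mapped pairs, which the thread does not establish.

```python
# Figures printed by featureCounts and STAR, as quoted in the question.
samples = {
    "sample1": {"fc_total": 123_128_857, "fc_assigned": 63_971_619, "star_unique": 70_579_007},
    "sample2": {"fc_total": 168_328_570, "fc_assigned": 83_223_103, "star_unique": 89_730_382},
}


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


for name, s in samples.items():
    of_alignments = pct(s["fc_assigned"], s["fc_total"])     # denominator: all alignments
    of_unique = pct(s["fc_assigned"], s["star_unique"])      # denominator: unique pairs only
    print(f"{name}: {of_alignments}% of all alignments, "
          f"{of_unique}% relative to uniquely mapped pairs")
```

The same assigned count looks much larger when the repeated alignments of multi-mapping reads are left out of the denominator.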

There must still be something going wrong with my analysis. I have compared the number of protein-coding genes present (here meaning present in two samples with counts of at least 5) between mRNA-seq and total RNA-seq data. The mRNA-seq data had library sizes around 24 million, while the total RNA-seq data have library sizes around 100 million. I get fewer protein-coding genes for the total RNA data than for the mRNA data (about 13K versus 15K). This is not possible.

There is one thing with featureCounts that I would still like to check. featureCounts tells me that 63,971,619 fragments were successfully assigned for sample 1 and 83,233,103 for sample 2. When I take the column sums of the count matrix, the library sizes of the two samples are 72,946,895 and 94,373,791. How do these two sets of numbers relate to each other?

Thanks, Ina

Reply written 3 days ago by inah

If multi-overlapping alignments (alignments that overlap more than one gene) are included in the counting, then the column sums of the count matrix can be greater than the total number of alignments assigned by featureCounts, because each multi-overlapping alignment contributes one count to every gene it overlaps. What is your featureCounts command?

Reply written 3 days ago by Wei Shi
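
A minimal Python sketch of how multi-overlapping alignments can push column sums above the assigned total. The gene names and the toy alignments are hypothetical, and the counting rule (one count per overlapped gene) is the one described in the reply above, not a reconstruction of the poster's actual command, which the thread does not show.

```python
from collections import Counter

# Hypothetical toy data: for each assigned alignment, the set of genes it overlaps.
assigned = [
    {"geneA"},
    {"geneB"},
    {"geneA", "geneB"},  # a multi-overlapping alignment
]

counts = Counter()
for genes in assigned:
    for gene in genes:
        counts[gene] += 1  # every overlapped gene receives one count

n_assigned = len(assigned)          # 3 alignments were assigned
column_sum = sum(counts.values())   # but the count column sums to 4

# Sample 1 figures as quoted in the question.
assigned_fragments = 63_971_619
matrix_column_sum = 72_946_895
extra_counts = matrix_column_sum - assigned_fragments

print(f"toy example: {n_assigned} assigned alignments, column sum {column_sum}")
print(f"sample 1: {extra_counts:,} counts beyond the assigned fragments")
```

If this explanation applies, the surplus in sample 1 corresponds to additional counts generated by alignments overlapping more than one gene.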