Question: Create count matrix with featureCounts for stranded as well as reversely stranded, singles as well as paired-end libraries
0
18 months ago by
juls0
juls0 wrote:

Hi all,

I am analysing some public stranded RNA-seq data. Most are paired-end. Some libraries are  fr-secondstrand (FR / F) and most are fr-firststrand (RF / R). I have now mapped all libraries with HISAT2 using the correct --rna-strandness parameter (RF, R, F, FR). Now I would like to use featureCount to create a count matrix.

This leads to the question - How should I set these two arguments?

isPairedEnd  [F|T]

logical indicating if paired-end reads are used. If TRUE, fragments (templates or read pairs) will be counted instead of individual reads. FALSE by default.

strandSpecific [0|1|2]

integer indicating if strand-specific read counting should be performed. It has three possible values: 0(unstranded), 1 (stranded) and 2 (reversely stranded). 0 by default.

My guess is I should create count matrices for the different data types separately and then merge them together. Is this correct?

However, can I simply compare paired end and single end data if I set the parameter isPairedEnd correctly?

And finally strandSpecific=2 means RF / R correct?

Thanks!

modified 18 months ago by Gordon Smyth38k • written 18 months ago by juls0
Answer: Create count matrix with featureCounts for stranded as well as reversely strande
4
18 months ago by
Wei Shi3.2k
Australia
Wei Shi3.2k wrote:

You don't have to count your PE and SE bam files separately. featureCounts automatically detects if your bam files contain SE or PE reads. You can provide all your bam files to featureCounts in one go and set isPairedEnd=TRUE. featureCounts will count number of reads pairs for your PE bam files and number of reads for your SE bam files, and it will return you a single matrix including all counts.

You can certainly compare your PE counts with your SE counts because they both represent the number of fragments from which reads were generated.

Regarding your last question, I think you are probably right. You may also check what sequencing protocol was used. If it is dUTP, then it is most likely to be reversely stranded and you should set strandSpecific=2. You can also tell from the counting result - if strandSpecific is incorrectly set you will get very little counts.

Thank you for this! If I have two different protocols (RF and FR), I would need to create two separate count matrices though? It would be nice if featureCount accepts a vector for strandSpecific so one could specify the protocol for each bam file.

1

Yes you would need to count the two types of reads separately and then combine them (it should be straightforward to combine them). But we will add support to allow strandSpecific to have a vector value.