Question: Creating a count matrix with featureCounts for stranded and reversely stranded, single-end and paired-end libraries
juls wrote (7 days ago):

Hi all,

I am analysing some public stranded RNA-seq data. Most libraries are paired-end. Some are fr-secondstrand (FR / F) and most are fr-firststrand (RF / R). I have mapped all libraries with HISAT2 using the appropriate --rna-strandness parameter (RF, R, F or FR). Now I would like to use featureCounts to create a count matrix.

This raises the question: how should I set these two arguments?

isPairedEnd [F|T]

logical indicating if paired-end reads are used. If TRUE, fragments (templates or read pairs) will be counted instead of individual reads. FALSE by default.

strandSpecific [0|1|2]

integer indicating if strand-specific read counting should be performed. It has three possible values: 0 (unstranded), 1 (stranded) and 2 (reversely stranded). 0 by default.
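To make the link between the HISAT2 settings and strandSpecific explicit, here is a minimal Python sketch. It assumes the usual convention that F / FR means read 1 comes from the transcript strand (strandSpecific=1) and R / RF means read 1 comes from the opposite strand, as in dUTP libraries (strandSpecific=2). The helper name and the mapping table are hypothetical, written only for illustration.

```python
# Hypothetical mapping from HISAT2 --rna-strandness to featureCounts
# strandSpecific, assuming the standard convention:
#   F / FR -> read (1) on the transcript strand      -> 1 (stranded)
#   R / RF -> read (1) on the opposite strand (dUTP) -> 2 (reversely stranded)
STRANDNESS_TO_STRAND_SPECIFIC = {
    "": 0,    # --rna-strandness not set: unstranded
    "F": 1,   # single-end, stranded
    "FR": 1,  # paired-end, stranded
    "R": 2,   # single-end, reversely stranded
    "RF": 2,  # paired-end, reversely stranded
}


def strand_specific_for(rna_strandness: str) -> int:
    """Return the strandSpecific value (0, 1 or 2) for a HISAT2 setting."""
    key = rna_strandness.strip().upper()
    if key not in STRANDNESS_TO_STRAND_SPECIFIC:
        raise ValueError(f"unknown --rna-strandness value: {rna_strandness!r}")
    return STRANDNESS_TO_STRAND_SPECIFIC[key]
```

For example, strand_specific_for("RF") gives 2 and strand_specific_for("FR") gives 1. Note that the single-end and paired-end codes collapse to the same value, because featureCounts judges strand from read 1.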

My guess is that I should create count matrices for the different library types separately and then merge them. Is this correct?
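If the libraries are counted separately, one way to organise the runs is to group BAM files by the strandSpecific value each one needs, so every group gets its own featureCounts call. A minimal Python sketch follows; the sample sheet and file names are hypothetical. Single-end and paired-end files with the same strandedness end up in the same group, since featureCounts handles both layouts in one run (as the answer in this thread explains).

```python
from collections import defaultdict

# Hypothetical mapping from HISAT2 --rna-strandness to strandSpecific.
STRAND_SPECIFIC = {"F": 1, "FR": 1, "R": 2, "RF": 2}


def group_by_strand_specific(sample_sheet):
    """Group BAM paths by the strandSpecific value their protocol needs.

    sample_sheet maps a BAM path to the --rna-strandness used for it.
    Returns {strandSpecific: [bam, ...]} with paths in input order.
    """
    groups = defaultdict(list)
    for bam, strandness in sample_sheet.items():
        groups[STRAND_SPECIFIC[strandness.upper()]].append(bam)
    return dict(groups)


# Hypothetical sample sheet: mixed SE/PE, mixed strandedness.
samples = {
    "s1.bam": "RF",
    "s2.bam": "RF",
    "s3.bam": "R",
    "s4.bam": "FR",
}
groups = group_by_strand_specific(samples)
```

Here groups is {2: ["s1.bam", "s2.bam", "s3.bam"], 1: ["s4.bam"]}, i.e. one run with strandSpecific=2 and one with strandSpecific=1.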

Also, can I directly compare paired-end and single-end data if I set isPairedEnd correctly?

Finally, strandSpecific=2 corresponds to RF / R, correct?

Thanks!

 

(Question modified 6 days ago by Gordon Smyth.)

Answer by Wei Shi (Australia), 7 days ago:

You don't have to count your PE and SE BAM files separately. featureCounts automatically detects whether a BAM file contains SE or PE reads, so you can provide all your BAM files to featureCounts in one go and set isPairedEnd=TRUE. featureCounts will then count read pairs for your PE BAM files and reads for your SE BAM files, and it will return a single matrix containing all counts.
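As a rough illustration of how a BAM file's layout can be told apart (not necessarily how featureCounts itself does it), the sketch below checks bit 0x1 of the SAM FLAG field, which marks a read as part of a multi-segment template, i.e. a pair. It takes an iterable of FLAG integers so it runs without any BAM library; the function name and the sample size of 1000 records are hypothetical choices.

```python
# SAM FLAG bit 0x1: "template having multiple segments in sequencing".
PAIRED_FLAG = 0x1


def looks_paired_end(flags, sample_size=1000):
    """Guess whether a file holds paired-end reads from its FLAG values.

    Inspects at most sample_size records and returns True only if every
    inspected record has the 0x1 bit set (and at least one was seen).
    """
    inspected = 0
    for flag in flags:
        if not flag & PAIRED_FLAG:
            return False
        inspected += 1
        if inspected >= sample_size:
            break
    return inspected > 0
```

Typical properly paired FLAGs such as 99, 147, 83 and 163 all have the bit set, while single-end FLAGs such as 0 and 16 do not. With pysam, for instance, the FLAG values could be taken from the `flag` attribute of reads in a `pysam.AlignmentFile`.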

You can certainly compare your PE counts with your SE counts because they both represent the number of fragments from which reads were generated.
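To see why the two kinds of counts are on the same scale, here is a toy Python sketch of fragment counting. It is a simplification for illustration only: both mates of a pair share a read name and are assumed to be assigned to the same gene, multimapping and ambiguity rules are ignored, and the function name is hypothetical.

```python
from collections import Counter


def count_fragments(assignments):
    """Count fragments per gene from (read_name, gene) records.

    Both mates of a pair share a read name, so a pair adds one count;
    a single-end read adds one count on its own.
    """
    seen = set()
    counts = Counter()
    for read_name, gene in assignments:
        if read_name in seen:
            continue  # second mate of a pair already counted
        seen.add(read_name)
        counts[gene] += 1
    return counts


# Same two fragments sequenced paired-end and single-end.
paired = [("r1", "A"), ("r1", "A"), ("r2", "B"), ("r2", "B")]
single = [("r1", "A"), ("r2", "B")]
```

Both count_fragments(paired) and count_fragments(single) give one count each for genes A and B: a pair and a single read both stand for one fragment.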

Regarding your last question, I think you are probably right. You can also check which library preparation protocol was used: if it is dUTP, the library is most likely reversely stranded and you should set strandSpecific=2. You can also tell from the counting result: if strandSpecific is set incorrectly, you will get very few counts.
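The "check the counting result" idea can be sketched as follows: count one library twice, once with strandSpecific=1 and once with strandSpecific=2, and compare the total assigned counts (e.g. the Assigned row of the featureCounts summary). The function name and the 5-fold threshold below are hypothetical choices, not values recommended by featureCounts.

```python
def infer_strand_specific(assigned_s1, assigned_s2, min_ratio=5.0):
    """Guess strandSpecific from assigned totals of two trial runs.

    assigned_s1 / assigned_s2: reads assigned with strandSpecific=1 / 2.
    min_ratio is an arbitrary, hypothetical threshold: one setting must
    assign at least min_ratio times more reads than the other to be
    called stranded; otherwise the library looks unstranded (0).
    """
    if assigned_s1 == 0 and assigned_s2 == 0:
        raise ValueError("no reads assigned in either run")
    if assigned_s2 >= min_ratio * assigned_s1:
        return 2
    if assigned_s1 >= min_ratio * assigned_s2:
        return 1
    return 0
```

With the wrong setting, most reads fall on the antisense strand and go unassigned, so for a dUTP library one would expect something like infer_strand_specific(1000, 30000) to return 2. Roughly equal totals suggest an unstranded library.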


Thank you for this! If I have two different protocols (RF and FR), would I still need to create two separate count matrices? It would be nice if featureCounts accepted a vector for strandSpecific so that the protocol could be specified for each BAM file.

 

(Reply by juls, 6 days ago.)

Yes, you would need to count the two types of libraries separately and then combine the results (combining them should be straightforward). But we will add support for allowing strandSpecific to take a vector value.
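Combining the two results amounts to binding the columns of two count matrices that share the same gene rows. A minimal Python sketch, assuming both runs used the same annotation so the gene IDs match; matrices are represented as hypothetical {gene_id: {sample: count}} dicts. In R, the analogous step would be a cbind() of the two counts matrices after checking that their rownames agree.

```python
def merge_count_matrices(*matrices):
    """Column-bind count matrices given as {gene_id: {sample: count}}.

    Raises ValueError if the matrices have different gene IDs (e.g. they
    were built from different annotations) or share a sample name.
    """
    if not matrices:
        return {}
    genes = set(matrices[0])
    merged = {gene: {} for gene in genes}
    for matrix in matrices:
        if set(matrix) != genes:
            raise ValueError("matrices were built with different gene sets")
        for gene, row in matrix.items():
            for sample, count in row.items():
                if sample in merged[gene]:
                    raise ValueError(f"duplicate sample column: {sample}")
                merged[gene][sample] = count
    return merged


# Hypothetical results of one strandSpecific=2 run and one strandSpecific=1 run.
rf_counts = {"g1": {"s1": 10, "s2": 5}, "g2": {"s1": 0, "s2": 3}}
fr_counts = {"g1": {"s3": 7}, "g2": {"s3": 1}}
combined = merge_count_matrices(rf_counts, fr_counts)
```

The gene-set check matters: silently merging matrices built from different annotations would misalign rows.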

(Reply by Wei Shi, 3 days ago.)

Thank you! This would be great.

(Reply by juls, 1 day ago.)