Create count matrix with featureCounts for stranded as well as reversely stranded, singles as well as paired-end libraries
1
0
Entering edit mode
juls • 0
@juls-11275
Last seen 3.3 years ago
Austria

Hi all,

I am analysing some public stranded RNA-seq data. Most are paired-end. Some libraries are  fr-secondstrand (FR / F) and most are fr-firststrand (RF / R). I have now mapped all libraries with HISAT2 using the correct --rna-strandness parameter (RF, R, F, FR). Now I would like to use featureCount to create a count matrix. 

This leads to the question - How should I set these two arguments?

isPairedEnd  [F|T]

logical indicating if paired-end reads are used. If TRUE, fragments (templates or read pairs) will be counted instead of individual reads. FALSE by default.

strandSpecific [0|1|2] 

integer indicating if strand-specific read counting should be performed. It has three possible values: 0(unstranded), 1 (stranded) and 2 (reversely stranded). 0 by default.

My guess is I should create count matrices for the different data types separately and then merge them together. Is this correct?

However, can I simply compare paired end and single end data if I set the parameter isPairedEnd correctly?

And finally strandSpecific=2 means RF / R correct?

Thanks!

 

rsubread • 5.1k views
ADD COMMENT
4
Entering edit mode
Wei Shi ★ 3.6k
@wei-shi-2183
Last seen 3 months ago
Australia/Melbourne/Olivia Newton-John …

You don't have to count your PE and SE bam files separately. featureCounts automatically detects if your bam files contain SE or PE reads. You can provide all your bam files to featureCounts in one go and set isPairedEnd=TRUE. featureCounts will count number of reads pairs for your PE bam files and number of reads for your SE bam files, and it will return you a single matrix including all counts.

You can certainly compare your PE counts with your SE counts because they both represent the number of fragments from which reads were generated.

Regarding your last question, I think you are probably right. You may also check what sequencing protocol was used. If it is dUTP, then it is most likely to be reversely stranded and you should set strandSpecific=2. You can also tell from the counting result - if strandSpecific is incorrectly set you will get very little counts.

ADD COMMENT
0
Entering edit mode

Thank you for this! If I have two different protocols (RF and FR), I would need to create two separate count matrices though? It would be nice if featureCount accepts a vector for strandSpecific so one could specify the protocol for each bam file. 

 

ADD REPLY
1
Entering edit mode

Yes you would need to count the two types of reads separately and then combine them (it should be straightforward to combine them). But we will add support to allow strandSpecific to have a vector value.

ADD REPLY
0
Entering edit mode

Thank you! this would be great.

ADD REPLY
0
Entering edit mode

I have a question that is similar but not really. I have alignments from a pair-end library (GSE51338 from the Galaxy tutorial) and when I use Rsubread with isPairedEnd = T but strandSpecific = 0 I barely get any alignments (hisat2) assigned, like average < 5% for each pair. However I can run with strandSpecific = 1 and get around 40-60% assigned or strandSpecific = 2 and again get like 40-60% assigned but to completely different set of transcripts than with strandSpecific=1. The tutorial uses a much older version of the command-line, Subread version 1.6.4 in Galaxy and this version allows submitting a paired-end read alignment without need to pass '-p' parameter which I am assuming was fixed in later versions. So how do we deal will being able to obtain the assignment for transcripts for both strandSpecific 1 and 2 together?

ADD REPLY
0
Entering edit mode

As I read another post about "messed" up annotation files, I tried rerunning with a standard RefSeq GTF file as opposed to previously running featurecounts of a GffCompare GTF file from downstream Stringtie Assembly and now I can reads in both directions using stranded=0. I am wondering if you have heard of this happening using Stringtie prior to featureCounts?

ADD REPLY

Login before adding your answer.

Traffic: 450 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6