Question

Create count matrix with featureCounts for stranded as well as reversely stranded, singles as well as paired-end libraries

0

Entering edit mode

juls • 0

@juls-11275

Last seen 3.7 years ago

Austria

Hi all,

I am analysing some public stranded RNA-seq data. Most are paired-end. Some libraries are fr-secondstrand (FR / F) and most are fr-firststrand (RF / R). I have now mapped all libraries with HISAT2 using the correct --rna-strandness parameter (RF, R, F, FR). Now I would like to use featureCount to create a count matrix.

This leads to the question - How should I set these two arguments?

isPairedEnd [F|T]

logical indicating if paired-end reads are used. If TRUE, fragments (templates or read pairs) will be counted instead of individual reads. FALSE by default.

strandSpecific [0|1|2]

integer indicating if strand-specific read counting should be performed. It has three possible values: 0(unstranded), 1 (stranded) and 2 (reversely stranded). 0 by default.

My guess is I should create count matrices for the different data types separately and then merge them together. Is this correct?

However, can I simply compare paired end and single end data if I set the parameter isPairedEnd correctly?

And finally strandSpecific=2 means RF / R correct?

Thanks!

rsubread • 5.8k views

ADD COMMENT • link updated 4.4 years ago by bjwiley23 • 0 • written 7.0 years ago by juls • 0

score 4 · Accepted Answer · 2018-04-12

4

Entering edit mode

Wei Shi ★ 3.6k

@wei-shi-2183

Last seen 1 day ago

Australia/Melbourne

You don't have to count your PE and SE bam files separately. featureCounts automatically detects if your bam files contain SE or PE reads. You can provide all your bam files to featureCounts in one go and set isPairedEnd=TRUE. featureCounts will count number of reads pairs for your PE bam files and number of reads for your SE bam files, and it will return you a single matrix including all counts.

You can certainly compare your PE counts with your SE counts because they both represent the number of fragments from which reads were generated.

Regarding your last question, I think you are probably right. You may also check what sequencing protocol was used. If it is dUTP, then it is most likely to be reversely stranded and you should set strandSpecific=2. You can also tell from the counting result - if strandSpecific is incorrectly set you will get very little counts.

ADD COMMENT • link 7.0 years ago Wei Shi ★ 3.6k

0

Entering edit mode

Thank you for this! If I have two different protocols (RF and FR), I would need to create two separate count matrices though? It would be nice if featureCount accepts a vector for strandSpecific so one could specify the protocol for each bam file.

ADD REPLY • link 7.0 years ago juls • 0

1

Entering edit mode

Yes you would need to count the two types of reads separately and then combine them (it should be straightforward to combine them). But we will add support to allow strandSpecific to have a vector value.

ADD REPLY • link 7.0 years ago Wei Shi ★ 3.6k

0

Entering edit mode

Thank you! this would be great.

ADD REPLY • link 7.0 years ago juls • 0

0

Entering edit mode

I have a question that is similar but not really. I have alignments from a pair-end library (GSE51338 from the Galaxy tutorial) and when I use Rsubread with isPairedEnd = T but strandSpecific = 0 I barely get any alignments (hisat2) assigned, like average < 5% for each pair. However I can run with strandSpecific = 1 and get around 40-60% assigned or strandSpecific = 2 and again get like 40-60% assigned but to completely different set of transcripts than with strandSpecific=1. The tutorial uses a much older version of the command-line, Subread version 1.6.4 in Galaxy and this version allows submitting a paired-end read alignment without need to pass '-p' parameter which I am assuming was fixed in later versions. So how do we deal will being able to obtain the assignment for transcripts for both strandSpecific 1 and 2 together?

ADD REPLY • link 4.4 years ago bjwiley23 • 0

0

Entering edit mode

As I read another post about "messed" up annotation files, I tried rerunning with a standard RefSeq GTF file as opposed to previously running featurecounts of a GffCompare GTF file from downstream Stringtie Assembly and now I can reads in both directions using stranded=0. I am wondering if you have heard of this happening using Stringtie prior to featureCounts?

ADD REPLY • link 4.4 years ago bjwiley23 • 0