Hi all,
I am analysing some public stranded RNA-seq data. Most libraries are paired-end. Some libraries are fr-secondstrand (FR / F) and most are fr-firststrand (RF / R). I have now mapped all libraries with HISAT2 using the correct --rna-strandness parameter (RF, R, F, FR). Now I would like to use featureCounts to create a count matrix.
This leads to the question: how should I set these two arguments?
isPairedEnd [F|T]
logical indicating if paired-end reads are used. If TRUE, fragments (templates or read pairs) will be counted instead of individual reads. FALSE by default.
strandSpecific [0|1|2]
integer indicating if strand-specific read counting should be performed. It has three possible values: 0 (unstranded), 1 (stranded) and 2 (reversely stranded). 0 by default.
My guess is I should create count matrices for the different data types separately and then merge them together. Is this correct?
However, can I simply compare paired-end and single-end data if I set the parameter isPairedEnd correctly?
And finally, strandSpecific=2 means RF / R, correct?
Thanks!
Thank you for this! If I have two different protocols (RF and FR), I would need to create two separate count matrices though? It would be nice if featureCounts accepted a vector for strandSpecific so one could specify the protocol for each BAM file.

Yes, you would need to count the two types of reads separately and then combine them (it should be straightforward to combine them). But we will add support to allow strandSpecific to have a vector value.

Thank you! This would be great.
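
The count-separately-then-combine approach discussed above could be sketched roughly as follows. This is only an illustration, written in Python rather than R: the `STRAND_SPECIFIC` lookup and both helper functions are hypothetical names, not part of Rsubread. It assumes the usual correspondence between HISAT2's --rna-strandness values and strandSpecific (FR / F to 1, RF / R to 2), and that every group was counted against the same annotation so the gene IDs line up.

```python
from collections import defaultdict

# Assumed mapping from HISAT2 --rna-strandness to featureCounts strandSpecific.
STRAND_SPECIFIC = {"FR": 1, "F": 1, "RF": 2, "R": 2}


def group_by_settings(libraries):
    """Group BAM files that can share one featureCounts call.

    `libraries` maps a BAM path to its HISAT2 --rna-strandness value.
    Two-letter values (FR, RF) are paired-end; one-letter values (F, R)
    are single-end.
    """
    groups = defaultdict(list)
    for bam, strandness in libraries.items():
        is_paired_end = len(strandness) == 2
        groups[(is_paired_end, STRAND_SPECIFIC[strandness])].append(bam)
    return dict(groups)


def merge_counts(*tables):
    """Column-bind per-group count tables shaped {sample: {gene: count}}."""
    genes = None
    merged = {}
    for table in tables:
        for sample, counts in table.items():
            if genes is None:
                genes = sorted(counts)
            elif sorted(counts) != genes:
                # Different gene sets mean the groups used different annotations.
                raise ValueError(f"{sample} has a different set of gene IDs")
            if sample in merged:
                raise ValueError(f"{sample} appears in more than one table")
            merged[sample] = dict(counts)
    return merged
```

Each key returned by `group_by_settings` is an (isPairedEnd, strandSpecific) pair, so one counting run per key is enough; the resulting tables can then be passed to `merge_counts`.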

I have a question that is similar, though not quite the same. I have alignments from a paired-end library (GSE51338 from the Galaxy tutorial), and when I use Rsubread with isPairedEnd = T but strandSpecific = 0, barely any of my alignments (HISAT2) are assigned: on average < 5% for each pair. However, I can run with strandSpecific = 1 and get around 40-60% assigned, or with strandSpecific = 2 and again get around 40-60% assigned, but to a completely different set of transcripts than with strandSpecific = 1. The tutorial uses a much older version of the command-line tool, Subread version 1.6.4 in Galaxy, and this version allows submitting a paired-end read alignment without the need to pass the '-p' parameter, which I am assuming was fixed in later versions. So how do we deal with obtaining the assignments for transcripts for both strandSpecific 1 and 2 together?

As I had read another post about "messed up" annotation files, I tried rerunning with a standard RefSeq GTF file, as opposed to my previous featureCounts run with a GffCompare GTF file from a StringTie assembly, and now I can get reads in both directions using strandSpecific = 0. I am wondering if you have heard of this happening when using StringTie prior to featureCounts?
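
For comparing assignment rates like the ones quoted above across runs with strandSpecific set to 0, 1 and 2, a small helper along these lines may be handy. It is a sketch only: `assigned_fraction` is a hypothetical name, and it assumes the tab-delimited layout of the featureCounts summary, with a Status column followed by one column per BAM file.

```python
def assigned_fraction(summary_text):
    """Return {sample: fraction of reads or fragments marked 'Assigned'}.

    `summary_text` is the content of a featureCounts summary table:
    a header line starting with 'Status', then one row per category.
    """
    lines = [line.split("\t") for line in summary_text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    fractions = {}
    for col, sample in enumerate(header[1:], start=1):
        total = sum(int(row[col]) for row in rows)
        assigned = sum(int(row[col]) for row in rows if row[0] == "Assigned")
        # Guard against an empty column so the division never fails.
        fractions[sample] = assigned / total if total else 0.0
    return fractions
```

Running it on the summary from each of the three settings for the same BAM file puts the assigned fractions side by side.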