Question: featureCounts Parameters for Exon Junctions
Dario Strbenac1.4k wrote:

I am interested in counting the number of reads that span splice junctions. Therefore, I set the options GTF.featureType = "transcript", useMetaFeatures = FALSE, and juncCounts = TRUE. However, the resulting counts matrix has row names like ENSG00000223972.5_1 and many of them are duplicated. What combination of parameters should be used to make the row names be transcript identifiers? Also, the counts_junction data frame has no feature identifiers, just NAs in the first two columns with these settings.

Also, the annotation element of the result is a data frame with a column named GeneID. This isn't a suitable column name if GTF.attrType is not "gene_id".

modified 15 months ago by Wei Shi • written 15 months ago by Dario Strbenac
Wei Shi2.7k wrote:

featureCounts uses the GTF.featureType and GTF.attrType parameters to identify meta-features (e.g. genes) and the features belonging to each meta-feature. It then uses the feature coordinates to work out the start and end locations of each meta-feature. A junction is assigned to a meta-feature if its donor or acceptor site falls within the genomic span of the meta-feature.
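The assignment rule described above can be illustrated with a minimal Python sketch. This is not Rsubread's implementation. The function names, the tuple layouts, and the example coordinates are all assumptions made for illustration, and strand is ignored for brevity.

```python
def meta_feature_spans(features):
    """Compute the genomic span of each meta-feature.

    `features` is an iterable of (meta_id, chrom, start, end) tuples,
    e.g. one tuple per exon with its gene identifier (hypothetical layout).
    """
    spans = {}
    for meta_id, chrom, start, end in features:
        key = (meta_id, chrom)
        if key in spans:
            s, e = spans[key]
            spans[key] = (min(s, start), max(e, end))
        else:
            spans[key] = (start, end)
    return spans


def assign_junction(junction, spans):
    """Return the meta-features whose span contains either splice site.

    `junction` is a (chrom, donor, acceptor) tuple (hypothetical layout).
    """
    chrom, donor, acceptor = junction
    hits = []
    for (meta_id, span_chrom), (start, end) in spans.items():
        if span_chrom != chrom:
            continue
        if start <= donor <= end or start <= acceptor <= end:
            hits.append(meta_id)
    return sorted(hits)


# Made-up example: two genes on chr1, described by their exons.
exons = [
    ("geneA", "chr1", 100, 200),
    ("geneA", "chr1", 400, 500),
    ("geneB", "chr1", 800, 900),
]
spans = meta_feature_spans(exons)

within_a = assign_junction(("chr1", 200, 400), spans)      # both sites in geneA
between = assign_junction(("chr1", 500, 800), spans)       # one site in each gene
outside = assign_junction(("chr1", 1000, 1200), spans)     # no overlapping span
```

In this sketch a junction with one splice site in each of two spans is reported for both meta-features, which is the kind of ambiguity that grows when the meta-features are overlapping transcripts rather than genes.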

Junction counting is always performed at the meta-feature level (the value of the useMetaFeatures parameter is ignored). If you want to count junction reads for transcripts, you will need to set GTF.attrType to "transcript_id".

However, be aware that assigning junctions to transcripts is ambiguous, since transcripts from the same gene often overlap substantially. I would suggest counting junctions for genes instead of transcripts. To do this, set GTF.featureType to "exon" and GTF.attrType to "gene_id". The default values of these two parameters should work well for most GTF annotations.

written 15 months ago by Wei Shi

I would manually filter out any junctions that are present in two or more transcripts after the processing is done.
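A minimal Python sketch of that kind of post-hoc filter follows. It assumes the junctions have already been collected per transcript into a dictionary. The function name, the data layout, and the transcript identifiers TX1 and TX2 are hypothetical and do not come from featureCounts output.

```python
from collections import defaultdict


def unique_junctions(transcript_junctions):
    """Keep only junctions that belong to exactly one transcript.

    `transcript_junctions` maps a transcript ID to an iterable of
    (chrom, donor, acceptor) tuples (hypothetical layout).
    Returns a dict mapping each retained junction to its single transcript.
    """
    owners = defaultdict(set)
    for transcript_id, junctions in transcript_junctions.items():
        for junction in junctions:
            owners[junction].add(transcript_id)
    return {
        junction: next(iter(ids))
        for junction, ids in owners.items()
        if len(ids) == 1
    }


# Made-up example: the first junction is shared by both transcripts.
example = {
    "TX1": [("chr1", 200, 400), ("chr1", 500, 800)],
    "TX2": [("chr1", 200, 400), ("chr1", 500, 700)],
}
kept = unique_junctions(example)
```

Here the shared junction is dropped and each remaining junction maps to the one transcript that contains it.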

written 15 months ago by Dario Strbenac