featureCounts Parameters for Exon Junctions
Dario Strbenac ★ 1.5k
@dario-strbenac-5916
Australia

I am interested in counting the number of reads that span splice junctions. Therefore, I set the options GTF.featureType = "transcript", useMetaFeatures = FALSE, and juncCounts = TRUE. However, the resulting counts matrix has row names like ENSG00000223972.5_1, and many of them are duplicated. Which combination of parameters makes the row names transcript identifiers? In addition, with these settings the counts_junction data frame has no feature identifiers, only NAs in the first two columns.

Also, the annotation element of the result is a data frame with a column named GeneID. This isn't a suitable column name if GTF.attrType is not "gene_id".

Wei Shi ★ 3.3k
@wei-shi-2183
Australia/Melbourne/Olivia Newton-John …

featureCounts uses the GTF.featureType and GTF.attrType parameters to identify meta features (e.g. genes) and the features belonging to each meta feature. It then uses the feature coordinates to work out the start and end locations of each meta feature. A junction is assigned to a meta feature if its donor or acceptor site falls within the genomic span of the meta feature.
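The assignment rule described above can be illustrated with a small sketch in base R. This is not the featureCounts implementation; the toy table `meta_features` and the helper `assign_junction` are hypothetical names made up for illustration, and strand is ignored for simplicity.

```r
# Toy meta-feature table (hypothetical): one row per gene, with the genomic
# span taken from the outermost coordinates of its features.
meta_features <- data.frame(
  id    = c("geneA", "geneB"),
  chr   = c("chr1", "chr1"),
  start = c(1000, 5000),
  end   = c(3000, 9000),
  stringsAsFactors = FALSE
)

# Return the ids of all meta features whose span contains either splice site.
assign_junction <- function(chr, donor, acceptor, meta) {
  same_chr    <- meta$chr == chr
  donor_in    <- donor >= meta$start & donor <= meta$end
  acceptor_in <- acceptor >= meta$start & acceptor <= meta$end
  meta$id[same_chr & (donor_in | acceptor_in)]
}

both_in_a   <- assign_junction("chr1", 1500, 2500, meta_features)  # both sites inside geneA
one_in_each <- assign_junction("chr1", 2800, 5200, meta_features)  # one site in each gene
neither     <- assign_junction("chr1", 3500, 4000, meta_features)  # no span contains a site
```

The second call shows why a junction can end up linked to more than one meta feature: each splice site is tested independently against every span.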

Junction counting is always performed at the meta feature level (the value of the useMetaFeatures parameter is ignored). If you want to count junction reads for transcripts, you will need to set GTF.attrType to "transcript_id".

However, you need to be aware of the ambiguity in assigning junctions to transcripts, since transcripts from the same gene often overlap substantially. I would suggest counting junctions to genes instead of transcripts. To do this, set GTF.featureType to "exon" and GTF.attrType to "gene_id". These are the default values of the two parameters, and they should work well for most GTF annotations.
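As a rough sketch, gene-level junction counting with the default values "exon" and "gene_id" might look like the call below. The wrapper name `count_gene_junctions` and the file names in the commented usage are placeholders, not anything from this thread; the Rsubread package must be installed for the call itself to run.

```r
# Hypothetical wrapper around Rsubread::featureCounts for gene-level
# junction counting. GTF.featureType and GTF.attrType are spelled out
# even though they match the defaults, to make the intent explicit.
count_gene_junctions <- function(bam_files, gtf_file) {
  Rsubread::featureCounts(
    files = bam_files,
    annot.ext = gtf_file,
    isGTFAnnotationFile = TRUE,
    GTF.featureType = "exon",
    GTF.attrType = "gene_id",
    juncCounts = TRUE
  )
}

# Usage (placeholder file names):
# fc <- count_gene_junctions("sample1.bam", "annotation.gtf")
# head(fc$counts)           # gene-level read counts
# head(fc$counts_junction)  # per-junction counts
```

For paired-end data, arguments such as isPairedEnd would also need to be set; they are left out here to keep the sketch minimal.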


I would manually filter out any junctions that are present in 2 or more transcripts after the processing is done.
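A minimal sketch of such a filter is shown below, using a toy stand-in for the counts_junction data frame. It assumes the first two columns are named PrimaryGene and SecondaryGenes, that with transcript-level settings they hold transcript identifiers, and that additional identifiers are comma-separated in SecondaryGenes; check these assumptions against your own output. The data and the `count` column are invented for illustration.

```r
# Toy stand-in for fc$counts_junction (invented values).
junctions <- data.frame(
  PrimaryGene    = c("tx1", "tx1", "tx3", NA),
  SecondaryGenes = c(NA, "tx2", NA, NA),
  count          = c(12, 7, 3, 5),
  stringsAsFactors = FALSE
)

# Number of identifiers each junction is assigned to (0 if unassigned).
n_assigned <- function(primary, secondary) {
  secondary   <- as.character(secondary)
  n_secondary <- ifelse(is.na(secondary), 0L,
                        lengths(strsplit(secondary, ",", fixed = TRUE)))
  ifelse(is.na(primary), 0L, 1L) + n_secondary
}

n_tx <- n_assigned(junctions$PrimaryGene, junctions$SecondaryGenes)

# Keep only junctions assigned to exactly one transcript.
unambiguous <- junctions[n_tx == 1L, ]
```

Whether unassigned junctions (all NA) should be dropped as well is a separate decision; here they are removed along with the ambiguous ones.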