I would like to calculate exon counts using the featureCounts function of the Rsubread package, version 1.32.0. Inputs:
- BAM file based on paired-end RNA sequencing.
- Annotation file in GTF format, downloaded from GENCODE v27.
More details about the BAM file:
- Created based on uBAM.
- Aligned with STAR.
- Unmapped reads are kept in/added to the BAM.
- File is validated multiple times with Picard.ValidateSamFile.
- Duplicates are flagged.
- Gatk.SplitNCigarReads
- Gatk.BaseRecalibrator
- Gatk.ApplyBQSR
From my point of view, I should be able to run the following command:
featureCounts(
    files = inputfile,
    annot.ext = annotationfile,
    isGTFAnnotationFile = TRUE,
    GTF.featureType = "exon",
    GTF.attrType = "exon_id",
    isPairedEnd = TRUE)
In theory, this would result in counts per exon (defined by GTF.featureType; column 3 of gtf file) and mapped to exon_id as metafeature (defined by GTF.attrType; column 9 of gtf file). However, I receive an ERROR:
ERROR: failed to find the gene identifier attribute in the 9th column of the provided GTF file.
The specified gene identifier attribute is 'exon_id'
An example of attributes included in your GTF annotation is 'gene_id "IGH.g@"; transcript_id "IGH.t@"; gene_name "IGH@";'
The program has to terminate.
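One way to rule out the annotation file itself is to check whether the requested attribute actually appears in column 9 of the records whose column 3 matches the feature type. The base R sketch below illustrates that check only. It does not use Rsubread, the two GTF records are made up for illustration (they are not taken from GENCODE v27), and `attribute_keys` and `has_attribute` are hypothetical helper names.

```r
# Made-up GTF records in the usual 9-column, tab-separated layout.
# The identifiers are placeholders, not real GENCODE entries.
gtf_lines <- c(
  paste("chr1", "EXAMPLE", "gene", "100", "900", ".", "+", ".",
        'gene_id "GENE_A"; gene_name "geneA";', sep = "\t"),
  paste("chr1", "EXAMPLE", "exon", "100", "300", ".", "+", ".",
        'gene_id "GENE_A"; transcript_id "TX_A1"; exon_id "EXON_A1";',
        sep = "\t")
)

# Return the attribute keys found in column 9 of a single GTF record.
attribute_keys <- function(line) {
  fields <- strsplit(line, "\t", fixed = TRUE)[[1]]
  attrs <- trimws(strsplit(fields[9], ";", fixed = TRUE)[[1]])
  attrs <- attrs[nzchar(attrs)]
  sub(" .*$", "", attrs)  # keep only the key, drop the quoted value
}

# For every record whose feature type (column 3) matches, report whether
# the requested attribute key is present in column 9.
has_attribute <- function(lines, feature_type, attr_type) {
  cols <- strsplit(lines, "\t", fixed = TRUE)
  keep <- vapply(cols, function(x) x[3] == feature_type, logical(1))
  vapply(lines[keep],
         function(l) attr_type %in% attribute_keys(l),
         logical(1), USE.NAMES = FALSE)
}

exon_has_exon_id <- has_attribute(gtf_lines, "exon", "exon_id")
gene_has_exon_id <- has_attribute(gtf_lines, "gene", "exon_id")
print(exon_has_exon_id)
print(gene_has_exon_id)
```

On a real file, the same helpers could be applied to the non-comment lines returned by `readLines(annotationfile)`. If every exon record carries `exon_id`, the annotation is probably not the cause of the error.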
There is a parameter called useMetaFeatures with which I can disable meta-features. However, the annotation returned as part of the results contains gene_id and not exon_id, so I have to provide a meta-feature attribute to specify exon_id.
I also tried the command-line version of the package, subread, and had the same problem with subread v1.6.4. It works with subread v1.5.2 using exactly the same inputs and parameters as specified above. I was wondering whether the functionality was changed on purpose or whether this is a bug.
Thanks in advance,
Ellen de Jong.
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /hpc/local/CentOS7/common/lang/R/3.5.1/lib64/R/lib/libRblas.so
LAPACK: /hpc/local/CentOS7/common/lang/R/3.5.1/lib64/R/lib/libRlapack.so

locale:
LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
LC_PAPER=en_US.UTF-8 LC_NAME=C
LC_ADDRESS=C LC_TELEPHONE=C
LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
stats graphics grDevices utils datasets methods base

other attached packages:
edgeR_3.24.3 limma_3.38.3 Rsubread_1.32.0

loaded via a namespace (and not attached):
compiler_3.5.1 tools_3.5.1 Rcpp_1.0.1 grid_3.5.1
locfit_1.5-9.1 lattice_0.20-38