Question: featurecounts Rsubread ERROR on exon_id
0
gravatar for edejong
6 months ago by
edejong0
edejong0 wrote:

Dear,

Currently, I would like to calculate exon counts using package Rsubread 1.32.0, function featurecounts. Inputs:

  1. Bam file based on paired end RNA sequencing.
  2. Annotation file in GTF format, downloaded from GenCode v27.

More details about Bam: - Created based on uBam. - Aligned with Star - Unmapped reads are kept in/ added to bam - File is validated multiple times with Picard.ValidateSamFile - Duplicates are flagged - Gatk.SplitNCigarReads - Gatk.BaseRecalibrator - Gatk.ApplyBQSR

From my point of view, I should be able to run following command:

featureCounts(
  files=inputfile, 
  annot.ext=annotationfile, 
  isGTFAnnotationFile = TRUE,
  GTF.featureType = "exon", 
  GTF.attrType = "exon_id",
  isPairedEnd = TRUE)

In theory, this would result in counts per exon (defined by GTF.featureType; column 3 of gtf file) and mapped to exon_id as metafeature (defined by GTF.attrType; column 9 of gtf file). However, I receive an ERROR:

ERROR: failed to find the gene identifier attribute in the 9th column of the provided GTF file. The specified gene identifier attribute is 'exonid' An example of attributes included in your GTF annotation is 'geneid "IGH.g@"; transcriptid "IGH.t@"; genename "IGH@";' The program has to terminate.

There is a parameter called useMetaFeatures where I can disable to metafeatures. Provided annotation as part of the results is geneid and not exonid, so I have to provide a metafeature to specify exon_id.

I also tried the commandline package version, namely subread and had the same problem for subread v1.6.4. It is working with subread v1.5.2 with the exact same inputs and parameters as specified above. I was wondering whether the functionality has changed on purpose or that this is a bug?

Thanks in advance,

Ellen de Jong.

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /hpc/local/CentOS7/common/lang/R/3.5.1/lib64/R/lib/libRblas.so
LAPACK: /hpc/local/CentOS7/common/lang/R/3.5.1/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] edgeR_3.24.3    limma_3.38.3    Rsubread_1.32.0

loaded via a namespace (and not attached):
[1] compiler_3.5.1  tools_3.5.1     Rcpp_1.0.1      grid_3.5.1     
[5] locfit_1.5-9.1  lattice_0.20-38
ADD COMMENTlink modified 6 months ago by Wei Shi3.2k • written 6 months ago by edejong0

It also works in an earlier version of Rsubread:

sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /hpc/local/CentOS7/common/lang/R/3.4.1/lib64/R/lib/libRblas.so
LAPACK: /hpc/local/CentOS7/common/lang/R/3.4.1/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] edgeR_3.20.9    limma_3.34.9    Rsubread_1.28.1

loaded via a namespace (and not attached):
[1] compiler_3.4.1  tools_3.4.1     Rcpp_0.12.14    grid_3.4.1      locfit_1.5-9.1 
[6] lattice_0.20-35
ADD REPLYlink written 6 months ago by edejong0
Answer: featurecounts Rsubread ERROR on exon_id
1
gravatar for Wei Shi
6 months ago by
Wei Shi3.2k
Australia
Wei Shi3.2k wrote:

The error message tells you that 'exonid' is not a valid attribute and this resulted in the termination of featureCounts. You should use geneid instead for your exon-level counting. Also you should specify useMetaFeatures=FALSE so that the counting is performed at the feature level (ie exon level). Below is the command that should work for your counting:

featureCounts( files=inputfile, annot.ext=annotationfile, isGTFAnnotationFile = TRUE, GTF.featureType = "exon", GTF.attrType = "geneid", isPairedEnd = TRUE, useMetaFeatures=FALSE)

ADD COMMENTlink written 6 months ago by Wei Shi3.2k

Thanks. Probably need to use GTF.attrType.extra to retrieve the exon_ids as part of the annotation.

However, this does not result in counts table with exon_ids. I find it quite dangerous to assume that the order of the count table is in the same order as the annotation table.

ADD REPLYlink modified 6 months ago • written 6 months ago by edejong0

As I mentioned in my reply exon_ids is not an attribute included in your annotation, so you should not use it in any parameters you provided to featureCounts.

ADD REPLYlink written 6 months ago by Wei Shi3.2k

Sorry to bother you again, but I am not sure if I understand it completely. When I open my annotation file, I do see 'exon_id' as one of the fields, when 'exon' is stated in column 3. So than I would assume I can use this. And as specified before, it is working that way in version 1.28.1

I would like to receive counts in following format (random example):

ENSE00000000001.1    154314
ENSE00000000002.1     235452
ENSE00000000003.1     35542

description: evidence-based annotation of the human genome (GRCh38), version 27 (Ensembl 90) provider: GENCODE contact: gencode-help@sanger.ac.uk format: gtf date: 2017-08-01

chr1 HAVANA gene 11869 14409 . + . geneid "ENSG00000223972.5"; genetype "transcribedunprocessedpseudogene"; genename "DDX11L1"; level 2; havanagene "OTTHUMG00000000961.2"; chr1 HAVANA transcript 11869 14409 . + . geneid "ENSG00000223972.5"; transcriptid "ENST00000456328.2"; genetype "transcribedunprocessedpseudogene"; genename "DDX11L1"; transcripttype "processedtranscript"; transcriptname "DDX11L1-202"; level 2; transcriptsupportlevel "1"; tag "basic"; havanagene "OTTHUMG00000000961.2"; havanatranscript "OTTHUMT00000362751.1"; chr1 HAVANA exon 11869 12227 . + . geneid "ENSG00000223972.5"; transcriptid "ENST00000456328.2"; genetype "transcribedunprocessedpseudogene"; genename "DDX11L1"; transcripttype "processedtranscript"; transcriptname "DDX11L1-202"; exonnumber 1; *exonid "ENSE00002234944.1"*; level 2; transcriptsupportlevel "1"; tag "basic"; havanagene "OTTHUMG00000000961.2"; havanatranscript "OTTHUMT00000362751.1";

ADD REPLYlink modified 6 months ago • written 6 months ago by edejong0
1

There is an attribute type exonid in the few annotation lines you provided but not 'exon_id'. featureCounts from any previous versions of Rsubread releases won't work on 'exon_id'.

featureCounts also reported in its error message that the 9th column of the first CDS feature in your annotation contains geneid "IGH.g@"; transcriptid "IGH.t@"; genename "IGH@";, which doesn't include either exonid or 'exon_id'.

ADD REPLYlink modified 6 months ago • written 6 months ago by Wei Shi3.2k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 116 users visited in the last hour