Hi,
I am trying to use the Bioconductor package in R 3.6.3 on Windows 64-bit platform to analyse my RNA-seq data with Apis mellifera. I am facing difficulty from an error described below. I have checked the available GTF file, using readGFF, the gene_id comes in the 10th column. But that is how I have downloaded the file from:
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/003/254/395/GCF003254395.2Amel_HAv3.1/.
No manipulation was done inside the GTF file. It would be nice to get some help Thanks in advance.
Command given: fc1=featureCounts("alignResultsPE.BAM", annot.ext = "GCF003254395.2AmelHAv3.1genomic.gtf", isPairedEnd = TRUE, isGTFAnnotationFile=TRUE, requireBothEndsMapped=TRUE, GTF.featureType = "exon", GTF.attrType = "gene_id")
Error coming:
========== _ _ _ _ _
===== / _| | | | _ \| _ \| _| /\ | _ \
===== | (_ | | | | |) | |) | |_ / \ | | | |
==== __ \| | | | _ <| _ /| | / /\ \ | | | |
==== _) | || | |) | | \ \| | / _ \| || |
========== |/ _/|_/|| __// ____/
Rsubread 2.0.1
//========================== featureCounts setting ===========================\ || || || Input files : 1 BAM file || || o alignResultsPE.BAM || || || || Annotation : GCF003254395.2AmelHAv3.1genomic.gtf (GTF) || || Dir for temp files : . || || Threads : 1 || || Level : meta-feature level || || Paired-end : yes || || Multimapping reads : counted || || Multi-overlapping reads : not counted || || Min overlapping bases : 1 || || || || Chimeric reads : counted || || Both ends mapped : required || || || \============================================================================//
//================================= Running ==================================\ || || || Load annotation file GCF003254395.2AmelHAv3.1genomic.gtf ... ||
ERROR: failed to find the gene identifier attribute in the 9th column of the provided GTF file. The specified gene identifier attribute is 'geneid' An example of attributes included in your GTF annotation is 'geneid ""; transcriptid "unknowntranscript1"; anticodon "(pos:31..33)"; gbkey "tRNA"; product "tRNA-Glu"; exonnumber "1"; ' The program has to terminate.
Error in featureCounts("alignResultsPE.BAM", annot.ext = "GCF003254395.2AmelHAv3.1genomic.gtf", : No counts were generated.
sessionInfo() R version 3.6.3 (2020-02-29) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
Random number generation: RNG: Mersenne-Twister Normal: Inversion Sample: Rounding
locale: [1] English_United States.1252
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods
[9] base
other attached packages:
[1] rtracklayer1.46.0 GenomicRanges1.38.0 GenomeInfoDb1.22.0 IRanges2.20.2
[5] S4Vectors0.24.3 BiocGenerics0.32.0 edgeR3.28.1 limma3.42.2
[9] Rsubread_2.0.1
loaded via a namespace (and not attached):
[1] Rcpp1.0.4 XVector0.26.0 zlibbioc1.32.0
[4] GenomicAlignments1.22.1 BiocParallel1.20.1 lattice0.20-40
[7] tools3.6.3 SummarizedExperiment1.16.1 grid3.6.3
[10] Biobase2.46.0 matrixStats0.56.0 yaml2.2.1
[13] Matrix1.2-18 GenomeInfoDbData1.2.2 BiocManager1.30.10
[16] bitops1.0-6 RCurl1.98-1.1 DelayedArray0.12.2
[19] compiler3.6.3 Biostrings2.54.0 Rsamtools2.2.3
[22] XML3.99-0.3 locfit_1.5-9.2
It appears that your annotation contains a line that has an empty gene_id value (""), which isn't allowed in featureCounts.
Sorry for the delay in communicating back. I have re-checked the available GTF file, using readGFF, the geneid comes in the 9th column, I misread it to be in the 10th column. But the problem persists. You said that it appeared that my annotation contained a line that has an empty geneid value(""), which isn't allowed in featureCounts. I checked the GTF file thoroughly, there are 335791 entries under each column. I tried the same in command line version of featureCounts too. The same problem is coming out. Is there a way of solving it? Thanks in advance