Potential segfault bug in featureCounts using long read data
1
0
Entering edit mode
Carlos • 0
@1a6d2168
Last seen 20 months ago
United Kingdom

Hi, I think I might have found a bug in featureCounts from Rsubread (v2.12.3). I am trying to find reads overlapping exon junctions from a personalised reference, using Nanopore long read BAMs.

I am afraid I cannot share fully reproducible code as I am using my own reference, but this is is what I run and what I get

#Get exon-specific read counts with featureCounts
library(Rsubread)
sample <- commandArgs(trailingOnly = TRUE)
gtf_file <- "personalised.gtf"

#Generate a simplified annotation format (SAF) object from the gtf to get only unique exonic regions
gtf_df <- read.table(gtf_file, sep = "\t", header = FALSE, stringsAsFactors = FALSE)

saf_df <- gtf_df[gtf_df$V3 == "exon", ]
#Keep columns of interest only
saf_df <- saf_df[, c(9, 1, 4, 5, 7)]
#Rename
colnames(saf_df) <- c("GeneID", "Chr",  "Start",    "End",  "Strand")
#Clean gene ID
saf_df$GeneID <- gsub(pattern = "(.+ gene_name )(.+)(;)",
                      replacement = "\\2",
                      x = saf_df$GeneID)

saf_df <- unique(saf_df)


featureCounts_out <- featureCounts(files = "full.bam",
                                   annot.ext = saf_df, useMetaFeatures = FALSE, isGTFAnnotationFile = FALSE,
                                   minOverlap = 3, isLongRead = TRUE,
                                   isPairedEnd = FALSE, juncCounts = TRUE,
                                   genome = "personalised_reference.fa",
                                   countMultiMappingReads = FALSE, allowMultiOverlap = TRUE)

        ==========     _____ _    _ ____  _____  ______          _____  
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
       Rsubread 2.12.3

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 1 BAM file                                       ||
||                                                                            ||
||                           full.bam             ||
||                                                                            ||
||              Paired-end : no                                               ||
||        Count read pairs : no                                               ||
||              Annotation : R data.frame                                     ||
||      Dir for temp files : .                                                ||
||       Junction Counting : <output_file>.jcounts                            ||
||                 Threads : 1                                                ||
||                   Level : feature level                                    ||
||      Multimapping reads : not counted                                      ||
|| Multi-overlapping reads : counted                                          ||
||   Min overlapping bases : 3                                                ||
||          Long read mode : yes                                              ||
||                                                                            ||
\\============================================================================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file .Rsubread_UserProvidedAnnotation_pid29999 ...         ||
||    Features : 46                                                           ||
||    Meta-features : 3                                                       ||
||    Chromosomes/contigs : 6                                                 ||
||                                                                            ||
|| Load FASTA contigs from personalised_reference.fa...                 ||
||    5 contigs were loaded                                                   ||
||                                                                            ||
|| Process BAM file full.bam...                   ||
[1]    29999 segmentation fault  R

Now, if I run it with the option juncCounts = FALSE, it does run, but I get the following error:

ERROR: sequence length in the BAM record is out of the expected region: 11207, 7084

From this post, I understand this means that a read is longer than the contig it is meant to align to. So I tried generating a bam without reads that long using

samtools view -h full.bam | awk 'length($10) < 11000 || $1 ~ /^@/' > test.bam

Using the bam without the long read then works with no issues, but I think it would be worth for future releases to either ignore such cases for junction counts or to output a more informative error.

Many thanks!

Carlos

> sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /flask/apps/eb/software/FlexiBLAS/3.0.4-GCC-11.2.0/lib64/libflexiblas.so.3.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Rsubread_2.12.3

loaded via a namespace (and not attached):
[1] compiler_4.2.0  Matrix_1.5-1    tools_4.2.0     grid_4.2.0     
[5] lattice_0.20-45
Bioconductor Rsubread featureCounts • 1.2k views
ADD COMMENT
0
Entering edit mode
Wei Shi ★ 3.6k
@wei-shi-2183
Last seen 20 hours ago
Australia/Melbourne

This was caused by a bug in featureCounts when detecting exon junctions in long reads. We have fixed it and updated Rsubread. The new version should be available in a day or two.

ADD COMMENT

Login before adding your answer.

Traffic: 536 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6