Entering edit mode
HI!
I've been trying to use RSubread to count alignments for a RNA-seq course. To get the .bam aligned and mapped file I used Trim Galore, STAR and Samtools in conda enviroment using Ubuntu for Windows. With this file I run the following in Rstudio (Windows 10) and get an error refering to an invalid EOF block. The thing is, i've checked this in samtools and every file I am using has a good EOF block. I apreciate any help to solve or understand this.
Thank you!
#### Directorio de trabajo #####
setwd("D:/Seq/Paired")
#### Achivos de entrada ####
# temp = list.files(pattern=".bam") preview
# temp
# View(temp)
bam.files <- list.files(path="D:/Seq/Paired/Sorted", pattern=".bam", full.names=TRUE, recursive=FALSE) # Creacion de lista
View(bam.files) # Ver lista de archivos del directorio (.BAM)
bam.files[12]
head(bam.files, n = 12L)
#### FeatureCount: Programa que cuenta las lecturas de los bam
mycount = list()
for (i in 1:12) {
mycount[[i]] <- featureCounts(files = bam.files[i],
annot.ext = "D:/Seq/Mus_musculus.GRCm39.104.chr.gff3", # Archivo de anotacion (GFF3, GFF,GTF)
isGTFAnnotationFile = TRUE,
GTF.attrType = "ID",
GTF.featureType = "mRNA",
isPairedEnd = TRUE,
tmpDir = "D:/Seq/temp/", # Crear una carpeta temporral previamente
nthreads = 6, # Numero de procesadores
nonSplitOnly = TRUE,
primaryOnly = TRUE,
splitOnly = FALSE)
}
> for (i in 1:12) {
+
+
+ mycount[[i]] <- featureCounts(files = bam.files[i],
+ annot.ext = "D:/Seq/Mus_musculus.GRCm39.104.chr.gff3", # Archivo de anotacion (GFF3, GFF,GTF)
+ isGTFAnnotationFile = TRUE,
+ GTF.attrType = "ID",
+ GTF.featureType = "mRNA",
+ isPairedEnd = TRUE,
+ tmpDir = "D:/Seq/temp/", # Crear una carpeta temporral previamente
+ nthreads = 6, # Numero de procesadores
+ nonSplitOnly = TRUE,
+ primaryOnly = TRUE,
+ splitOnly = FALSE)
+ }
========== _____ _ _ ____ _____ ______ _____
===== / ____| | | | _ \| __ \| ____| /\ | __ \
===== | (___ | | | | |_) | |__) | |__ / \ | | | |
==== \___ \| | | | _ <| _ /| __| / /\ \ | | | |
==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/
Rsubread 2.8.0
//========================== featureCounts setting ===========================\\
|| ||
|| Input files : 1 BAM file ||
|| ||
|| Mm_NMV_Test_1_Sorted.bam ||
|| ||
|| Paired-end : yes ||
|| Count read pairs : yes ||
|| Annotation : Mus_musculus.GRCm39.104.chr.gff3 (GTF) ||
|| Dir for temp files : D:/Seq/temp/ ||
|| Threads : 6 ||
|| Level : meta-feature level ||
|| Multimapping reads : counted ||
|| Multiple alignments : primary alignment only ||
|| Multi-overlapping reads : not counted ||
|| Split alignments : only exonic alignments ||
|| Min overlapping bases : 1 ||
|| ||
\\============================================================================//
//================================= Running ==================================\\
|| ||
|| Load annotation file Mus_musculus.GRCm39.104.chr.gff3 ... ||
|| Features : 66462 ||
|| Meta-features : 66462 ||
|| Chromosomes/contigs : 22 ||
|| ||
|| Process BAM file Mm_NMV_Test_1_Sorted.bam... ||
ERROR: the BAM input file, 'D:\Seq\Paired\Sorted\Mm_NMV_Test_1_Sorted.bam', doesn't have a valid EOF block.
|| ||
\\============================================================================//
No counts were generated.
> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
[1] LC_COLLATE=Spanish_Chile.1252 LC_CTYPE=Spanish_Chile.1252 LC_MONETARY=Spanish_Chile.1252
[4] LC_NUMERIC=C LC_TIME=Spanish_Chile.1252
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] BiocManager_1.30.16 BiocVersion_3.14.0 Rsubread_2.8.0 Rsamtools_2.10.0 Biostrings_2.62.0
[6] XVector_0.34.0 GenomicRanges_1.46.0 GenomeInfoDb_1.30.0 IRanges_2.28.0 S4Vectors_0.32.0
[11] BiocGenerics_0.40.0
loaded via a namespace (and not attached):
[1] lattice_0.20-45 crayon_1.4.1 bitops_1.0-7 grid_4.1.1 zlibbioc_1.40.0
[6] rstudioapi_0.13 Matrix_1.3-4 BiocParallel_1.28.0 tools_4.1.1 RCurl_1.98-1.5
[11] parallel_4.1.1 compiler_4.1.1 GenomeInfoDbData_1.2.7
Usually that error means that you don't have a valid .bam file. Is there any chance it's really a sam file? Can you confirm that the bam finished being made without errors?
I have used samtools quickcheck command to verify th EOF block in my .bam files. I'm working with 6 files and everyoner has a good EOF block.
Thanks
Thanks, Nicolas.
Can you try this Linux/macOS command:
This will give the last hundreds of bytes in the BAM file, which should include the EOF bytes.
Also, when you generated the BAM files using Trim Galore, STAR and Samtools in conda, what was the last command that was used to create the BAM file?
Thanks Yang
this is what I get with the command
I've done the same with the others 5 files, and search for the expected EOF block. This is what I get (expected EOF taked from here https://sourceforge.net/p/samtools/mailman/samtools-help/thread/4EC52844.3090808@broadinstitute.org/)
This is the commands used with every program
Trim Galore
STAR
Samtools
Thanks!
Thanks for the detailed results!
It seems that the BAM files all have correct EOF tags. However the program for checking the tag is rather simple. It reads the last 28 bytes from the file and compares them with the known EOF tag. I didn't find problems in this part of the program.
I noticed that it was in Windows. We test Rsubread in Windows before every release (including the featureCounts function), but we haven't tested Rsubread on the sorting results of samtools in Windows. Is it possible to share the BAM file with us? I hope this can help us to reproduce this issue, therefore find the cause of it.
Thanks Yang.
I have no problem to share the BAM with you, I took the RNAseq packages from https://www.encodeproject.org/, the files with "release" status. I can upload to MEGA one of the .bam archive, the link is below. If you have a better way to share the procesed files with you I can try it.
Again Thanks for your help
https://mega.nz/file/UotlwCZK#V6b8Y2hWvJTI7b4IXrnq5ywFLoZ5T1_Sk1Rww2P6y1U
Thanks, Nicolas!
I found that the BAM file is indeed intact, but it was larger than 4GB. I then found that the mingw32 library in Windows (including its 64-bit version) has file I/O functions behaving differently to glibc. The file seeking function works incorrectly on files larger than 4GB in Windows, even if the relative offset is only 28 bytes. This was introduced in the 2.8.0 version of Rsubread. A new version (v2.8.1) has been released, using the 64-bit version of file seeking function in Windows to solve this problem.
I've intalled the new version and works like a charm.
Thanks for the assitance Yang.