Help with this in RStudio - ERROR: the BAM input file, 'sample.bam', doesn't have a valid EOF block.
@5c7ba598

Hi!

I've been trying to use Rsubread to count alignments for an RNA-seq course. To get the aligned and mapped .bam file I used Trim Galore, STAR and Samtools in a conda environment using Ubuntu for Windows. With this file I ran the following in RStudio (Windows 10) and got an error referring to an invalid EOF block. The thing is, I've checked this in samtools and every file I am using has a good EOF block. I would appreciate any help to solve or understand this.

Thank you!

#### Working directory ####
library(Rsubread) # Provides featureCounts()
setwd("D:/Seq/Paired")

#### Input files ####

# temp = list.files(pattern=".bam") preview
# temp
# View(temp)

bam.files <- list.files(path="D:/Seq/Paired/Sorted", pattern=".bam", full.names=TRUE, recursive=FALSE) # Build the list of BAM files
View(bam.files) # View the list of files in the directory (.bam)
bam.files[12]
head(bam.files, n = 12L)

#### featureCounts: counts the reads in the BAM files ####

mycount = list()

for (i in 1:12) {


  mycount[[i]] <- featureCounts(files = bam.files[i],
                                annot.ext = "D:/Seq/Mus_musculus.GRCm39.104.chr.gff3", # Annotation file (GFF3, GFF, GTF)
                                isGTFAnnotationFile = TRUE, 
                                GTF.attrType = "ID", 
                                GTF.featureType = "mRNA", 
                                isPairedEnd = TRUE, 
                                tmpDir = "D:/Seq/temp/", # Create this temporary folder beforehand
                                nthreads = 6, # Number of processors
                                nonSplitOnly = TRUE,
                                primaryOnly = TRUE,
                                splitOnly = FALSE)
}


 > for (i in 1:12) {
+   
+   
+   mycount[[i]] <- featureCounts(files = bam.files[i],
+                                 annot.ext = "D:/Seq/Mus_musculus.GRCm39.104.chr.gff3", # Annotation file (GFF3, GFF, GTF)
+                                 isGTFAnnotationFile = TRUE, 
+                                 GTF.attrType = "ID", 
+                                 GTF.featureType = "mRNA", 
+                                 isPairedEnd = TRUE, 
+                                 tmpDir = "D:/Seq/temp/", # Create this temporary folder beforehand
+                                 nthreads = 6, # Number of processors
+                                 nonSplitOnly = TRUE,
+                                 primaryOnly = TRUE,
+                                 splitOnly = FALSE)
+ }

        ==========     _____ _    _ ____  _____  ______          _____  
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
       Rsubread 2.8.0

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 1 BAM file                                       ||
||                                                                            ||
||                           Mm_NMV_Test_1_Sorted.bam                         ||
||                                                                            ||
||              Paired-end : yes                                              ||
||        Count read pairs : yes                                              ||
||              Annotation : Mus_musculus.GRCm39.104.chr.gff3 (GTF)           ||
||      Dir for temp files : D:/Seq/temp/                                     ||
||                 Threads : 6                                                ||
||                   Level : meta-feature level                               ||
||      Multimapping reads : counted                                          ||
||     Multiple alignments : primary alignment only                           ||
|| Multi-overlapping reads : not counted                                      ||
||        Split alignments : only exonic alignments                           ||
||   Min overlapping bases : 1                                                ||
||                                                                            ||
\\============================================================================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file Mus_musculus.GRCm39.104.chr.gff3 ...                  ||
||    Features : 66462                                                        ||
||    Meta-features : 66462                                                   ||
||    Chromosomes/contigs : 22                              ||
||                                                                            ||
|| Process BAM file Mm_NMV_Test_1_Sorted.bam...                               ||
ERROR: the BAM input file, 'D:\Seq\Paired\Sorted\Mm_NMV_Test_1_Sorted.bam', doesn't have a valid EOF block.
||                                                                            ||
\\============================================================================//

No counts were generated.

> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=Spanish_Chile.1252  LC_CTYPE=Spanish_Chile.1252    LC_MONETARY=Spanish_Chile.1252
[4] LC_NUMERIC=C                   LC_TIME=Spanish_Chile.1252    

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] BiocManager_1.30.16  BiocVersion_3.14.0   Rsubread_2.8.0       Rsamtools_2.10.0     Biostrings_2.62.0   
 [6] XVector_0.34.0       GenomicRanges_1.46.0 GenomeInfoDb_1.30.0  IRanges_2.28.0       S4Vectors_0.32.0    
[11] BiocGenerics_0.40.0 

loaded via a namespace (and not attached):
 [1] lattice_0.20-45        crayon_1.4.1           bitops_1.0-7           grid_4.1.1             zlibbioc_1.40.0       
 [6] rstudioapi_0.13        Matrix_1.3-4           BiocParallel_1.28.0    tools_4.1.1            RCurl_1.98-1.5        
[11] parallel_4.1.1         compiler_4.1.1         GenomeInfoDbData_1.2.7

Usually that error means that you don't have a valid .bam file. Is there any chance it's really a sam file? Can you confirm that the bam finished being made without errors?
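One quick way to rule out a mislabeled SAM file is to look at the first bytes of the file. BAM files are BGZF-compressed, so they begin with the bytes 1f 8b 08 04, while a plain-text SAM file normally begins with an '@' header line. The sketch below is only an illustration: the function name `looks_like_bam` is hypothetical, and it checks only the leading magic bytes, not the integrity of the whole file.

```shell
# Hypothetical helper: print "bgzf" when the file starts with the
# BGZF/gzip magic bytes 1f 8b 08 04, otherwise print "not-bgzf"
# (for example, for a plain-text SAM file).
looks_like_bam() {
    # Read the first 4 bytes and render them as a bare hex string.
    magic=$(head -c 4 "$1" | od -An -v -tx1 | tr -d ' \n')
    if [ "$magic" = "1f8b0804" ]; then
        echo "bgzf"
    else
        echo "not-bgzf"
    fi
}
```

For example, `looks_like_bam Mm_NMV_Test_1_Sorted.bam` could be run in the same Ubuntu shell that produced the file.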

I used the samtools quickcheck command to verify the EOF block in my .bam files. I'm working with 6 files and every one has a good EOF block.

samtools quickcheck -vvv *.bam

verbosity set to 3
checking Mm_NMV_Test_1_Sorted.bam
opened Mm_NMV_Test_1_Sorted.bam
Mm_NMV_Test_1_Sorted.bam is sequence data
Mm_NMV_Test_1_Sorted.bam has 61 targets in header.
Mm_NMV_Test_1_Sorted.bam has good EOF block.
checking Mm_NMV_Test_2_Sorted.bam
opened Mm_NMV_Test_2_Sorted.bam
Mm_NMV_Test_2_Sorted.bam is sequence data
Mm_NMV_Test_2_Sorted.bam has 61 targets in header.
Mm_NMV_Test_2_Sorted.bam has good EOF block.
checking Mm_NMV_Test_3_Sorted.bam
opened Mm_NMV_Test_3_Sorted.bam
Mm_NMV_Test_3_Sorted.bam is sequence data
Mm_NMV_Test_3_Sorted.bam has 61 targets in header.
Mm_NMV_Test_3_Sorted.bam has good EOF block.
checking Mm_NMV_Test_4_Sorted.bam
opened Mm_NMV_Test_4_Sorted.bam
Mm_NMV_Test_4_Sorted.bam is sequence data
Mm_NMV_Test_4_Sorted.bam has 61 targets in header.
Mm_NMV_Test_4_Sorted.bam has good EOF block.
checking Mm_NMV_Test_5_Sorted.bam
opened Mm_NMV_Test_5_Sorted.bam
Mm_NMV_Test_5_Sorted.bam is sequence data
Mm_NMV_Test_5_Sorted.bam has 61 targets in header.
Mm_NMV_Test_5_Sorted.bam has good EOF block.
checking Mm_NMV_Test_6_Sorted.bam
opened Mm_NMV_Test_6_Sorted.bam
Mm_NMV_Test_6_Sorted.bam is sequence data
Mm_NMV_Test_6_Sorted.bam has 61 targets in header.
Mm_NMV_Test_6_Sorted.bam has good EOF block.

Thanks

Thanks, Nicolas.

Can you try this Linux/macOS command:

hexdump -C Mm_NMV_Test_1_Sorted.bam  | tail -n 50

This prints roughly the last 800 bytes of the BAM file (50 lines of 16 bytes each), which should include the EOF bytes.

Also, when you generated the BAM files using Trim Galore, STAR and Samtools in conda, what was the last command that was used to create the BAM file?

Thanks, Yang.

This is what I get with the command:

hexdump -C Mm_NMV_Test_1_Sorted.bam  | tail -n 50

5f6fa38f0  a3 bb f1 c2 66 ed cb 58  fb db 4e 97 17 bd 9f e8  |....f..X..N.....|
5f6fa3900  31 da 11 bf a7 23 43 11  13 04 56 49 96 63 b4 1a  |1....#C...VI.c..|
5f6fa3910  c1 34 15 a0 da c9 f4 eb  49 e2 8a 5a ec 01 af cf  |.4......I..Z....|
5f6fa3920  19 c6 c6 a3 04 97 2b 98  01 06 0c 41 14 ec 31 9a  |......+....A..1.|
5f6fa3930  06 92 9a dc 25 ca 1d 82  20 50 fb 43 f1 6b 05 e8  |....%... P.C.k..|
5f6fa3940  82 20 07 b0 3d 19 76 89  9d d3 c1 e3 85 79 e2 f8  |. ..=.v......y..|
5f6fa3950  4d 2c b2 09 50 01 99 63  7f 02 82 b6 e5 c8 fd 0d  |M,..P..c........|
5f6fa3960  1d f3 df 36 2c ee f6 15  65 5d 1b af 6b 27 e8 90  |...6,...e]..k'..|
5f6fa3970  3f 43 31 df 1b 16 94 ab  15 be 2a b8 9a ed 43 f0  |?C1.......*...C.|
5f6fa3980  3a 19 e3 56 61 e2 6a a2  59 89 e8 e2 f1 c3 c7 c7  |:..Va.j.Y.......|
5f6fa3990  85 f8 4b 51 81 53 0c 47  60 98 c8 ac 30 92 6e d6  |..KQ.S.G`...0.n.|
5f6fa39a0  ee c6 c4 54 55 4b e0 d7  e6 44 51 47 ba be 2c 5c  |...TUK...DQG..,\|
5f6fa39b0  db f5 f3 bc b5 27 76 fd  dc 73 a6 d7 fb bb 4e 37  |.....'v..s....N7|
5f6fa39c0  f6 10 9c 83 83 3a 1e 95  88 92 8f 49 48 e8 a5 fa  |.....:.....IH...|
5f6fa39d0  66 33 6b 04 33 f2 f1 99  35 34 0f 05 15 7a 98 1b  |f3k.3...54...z..|
5f6fa39e0  0c 00 04 34 e2 10 db 68  4a 6d 21 34 3a 12 d3 4f  |...4...hJm!4:..O|
5f6fa39f0  e8 97 a7 10 62 62 1a be  07 72 21 fe 3e a6 be 03  |....bb...r!.>...|
5f6fa3a00  62 f1 78 1a b8 92 b6 98  76 c0 45 7a 30 45 96 00  |b.x.....v.Ez0E..|
5f6fa3a10  a6 4b c4 23 26 18 40 b8  66 81 fe 84 fc ec 95 2d  |.K.#&.@.f......-|
5f6fa3a20  7f b6 55 b1 ed fa 89 17  1e 85 d3 c9 40 37 6a 53  |..U.........@7jS|
5f6fa3a30  7a 06 c2 6c 56 81 36 e2  1c 5f 7f a8 02 b7 04 13  |z..lV.6.._......|
5f6fa3a40  28 e6 ba 88 7a 88 cb 49  b7 c1 ae fd 07 bc 37 56  |(...z..I......7V|
5f6fa3a50  f1 eb 4e 28 de 99 d9 4b  94 17 21 dc 47 9a 00 ec  |..N(...K..!.G...|
5f6fa3a60  27 ae de 79 e6 fc f1 4b  e3 16 fe a1 b5 19 d6 9e  |'..y...K........|
5f6fa3a70  de e9 c0 1c 61 e8 16 b5  20 50 99 13 94 2f e0 0e  |....a... P.../..|
5f6fa3a80  32 a0 11 6c 43 95 29 e0  22 83 82 47 bd c8 70 1f  |2..lC.)."..G..p.|
5f6fa3a90  40 49 11 92 a2 d7 bb 1c  81 c0 42 d5 35 30 13 ce  |@I........B.50..|
5f6fa3aa0  f6 2d ea 37 c2 d7 f0 06  78 e9 a8 14 e0 17 26 20  |.-.7....x.....& |
5f6fa3ab0  f3 c5 93 0e c9 60 99 e6  b9 73 7d b4 d4 7a ea c0  |.....`...s}..z..|
5f6fa3ac0  77 6c ea eb 89 7c 8b 08  a1 51 cb b0 3a 59 ff 17  |wl...|...Q..:Y..|
5f6fa3ad0  04 ab 7b 17 d4 20 d5 56  17 e2 4d 52 f7 be 4e 97  |..{.. .V..MR..N.|
5f6fa3ae0  bf 51 8d cc ee 2b 04 fa  80 80 8e 55 78 f6 0f 3f  |.Q...+.....Ux..?|
5f6fa3af0  e4 27 29 11 3d 11 5a e3  62 8a 67 6c 12 06 e1 68  |.').=.Z.b.gl...h|
5f6fa3b00  4e f3 f0 25 83 d8 69 54  d5 21 06 86 fd 44 1a fb  |N..%..iT.!...D..|
5f6fa3b10  ca 23 cd ec ed 73 5f bb  1f bf 6f e3 36 d6 5e 09  |.#...s_...o.6.^.|
5f6fa3b20  f1 db 67 be 7a ff a9 ff  d9 e5 4a ec ec a0 1d 62  |..g.z.....J....b|
5f6fa3b30  bf 6e 84 cb 10 23 31 85  10 1f b4 4c 61 40 a3 9f  |.n...#1....La@..|
5f6fa3b40  62 e2 14 aa cd 18 3d 1e  27 a8 33 4b 10 1a 89 19  |b.....=.'.3K....|
5f6fa3b50  7e c3 16 86 33 96 8c a6  b4 b8 da 78 0c eb a9 e1  |~...3......x....|
5f6fa3b60  f1 2b 4a df b1 30 7a bc  6f 02 98 b6 9c 09 58 2b  |.+J..0z.o.....X+|
5f6fa3b70  b6 36 40 70 cf 1d 78 78  43 cc 63 a4 f6 29 09 43  |.6@p..xxC.c..).C|
5f6fa3b80  fc 90 2b c3 59 84 6e d8  4c 28 8c 88 01 1a d1 dc  |..+.Y.n.L(......|
5f6fa3b90  34 9f e6 b6 86 9e 31 fb  66 e3 c2 a8 b7 72 f5 4d  |4.....1.f....r.M|
5f6fa3ba0  0b 2a d6 6a d7 5b d3 3e  8a 1b bb 5c 89 56 6d 01  |.*.j.[.>...\.Vm.|
5f6fa3bb0  8a ce 31 e0 1a 34 61 0c  77 7b 58 41 e3 3b 02 97  |..1..4a.w{XA.;..|
5f6fa3bc0  82 8b 98 6a ea 08 17 61  68 88 fc db 98 ca 50 c5  |...j...ah.....P.|
5f6fa3bd0  5e d5 2e dd 3a 19 d3 0a  f9 fe 1f d4 cd 5c 8a e8  |^...:........\..|
5f6fa3be0  6c 00 00 1f 8b 08 04 00  00 00 00 00 ff 06 00 42  |l..............B|
5f6fa3bf0  43 02 00 1b 00 03 00 00  00 00 00 00 00 00 00     |C..............|
5f6fa3bff

I did the same with the other 5 files and searched for the expected EOF block. This is what I get (expected EOF taken from here: https://sourceforge.net/p/samtools/mailman/samtools-help/thread/4EC52844.3090808@broadinstitute.org/)

Expected EOF            1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 1b 00 03 00 00 00 00 00 00 00 00 00
my EOF                  1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 1b 00 03 00 00 00 00 00 00 00 00 00    Mm_NMV_Test_1_Sorted.bam
                        1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 1b 00 03 00 00 00 00 00 00 00 00 00    Mm_NMV_Test_2_Sorted.bam
                        1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 1b 00 03 00 00 00 00 00 00 00 00 00    Mm_NMV_Test_3_sorted.bam
                        1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 1b 00 03 00 00 00 00 00 00 00 00 00    Mm_NMV_Test_4_sorted.bam
                        1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 1b 00 03 00 00 00 00 00 00 00 00 00    Mm_NMV_Test_5_Sorted.bam
                        1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 1b 00 03 00 00 00 00 00 00 00 00 00    Mm_NMV_Test_6_Sorted.bam

These are the commands used with each program:

Trim Galore

trim_galore --phred33 --fastqc --illumina --trim-n --paired *.gz

STAR

STAR --runThreadN 6 --runMode genomeGenerate --genomeDir home/imaris/genome --genomeFastaFiles ../../mnt/d/Seq/Mus_musculus.GRCm39.dna_sm.primary_assembly.fa

STAR --runMode alignReads --runThreadN 6 --genomeDir /home/imaris/home/imaris/genome --readFilesIn /mnt/d/Seq/Paired/ENCFF002FA/ENCFF002FAA_val_1.fq /mnt/d/Seq/Paired/ENCFF002FA/ENCFF002FAB_val_2.fq --outFileNamePrefix Resultados/Mm_NMV_Test_1 --outSAMtype BAM Unsorted 

STAR --runMode alignReads --runThreadN 6 --genomeDir /home/imaris/home/imaris/genome --readFilesIn /mnt/d/Seq/Paired/ENCFF002FA/ENCFF002FAC_val_1.fq /mnt/d/Seq/Paired/ENCFF002FA/ENCFF002FAD_val_2.fq --outFileNamePrefix Resultados/Mm_NMV_Test_2 --outSAMtype BAM Unsorted

Samtools

samtools sort -@ 6 -n Mm_NMV_Test_1Aligned.out.bam -o Sort/Mm_NMV_Test_1_Sorted.bam

samtools sort -@ 6 -n Mm_NMV_Test_2Aligned.out.bam -o Sort/Mm_NMV_Test_2_Sorted.bam

Thanks!

Thanks for the detailed results!

It seems that the BAM files all have correct EOF tags. The code that checks the tag is rather simple: it reads the last 28 bytes of the file and compares them with the known EOF tag. I didn't find any problems in this part of the program.
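The same comparison can be reproduced by hand from the shell. The sketch below is not the Rsubread code; it only mimics the described logic (read the last 28 bytes, compare with the known marker) using standard tools. The function name `has_bgzf_eof` is hypothetical, and the expected bytes are the ones quoted earlier in this thread.

```shell
# The 28-byte BGZF EOF marker as quoted in this thread, with spaces removed.
EXPECTED_EOF=$(echo "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 1b 00 03 00 00 00 00 00 00 00 00 00" | tr -d ' ')

# Hypothetical helper: succeed (exit status 0) when the last 28 bytes
# of the file equal the EOF marker, fail otherwise.
has_bgzf_eof() {
    # -v stops od from collapsing repeated lines into "*".
    tail_hex=$(tail -c 28 "$1" | od -An -v -tx1 | tr -d ' \n')
    [ "$tail_hex" = "$EXPECTED_EOF" ]
}
```

A possible use is `has_bgzf_eof Mm_NMV_Test_1_Sorted.bam && echo "good EOF"`. Because `tail -c` counts from the end of the file, this check does not depend on the file size.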

I noticed that you are running this on Windows. We test Rsubread on Windows before every release (including the featureCounts function), but we haven't tested it on samtools sorting results on Windows. Would it be possible to share the BAM file with us? I hope this will help us reproduce the issue and find its cause.

Thanks Yang.

I have no problem sharing the BAM with you. I took the RNA-seq datasets from https://www.encodeproject.org/, using the files with "release" status. I have uploaded one of the .bam files to MEGA; the link is below. If you have a better way for me to share the processed files with you, I can try it.

Again, thanks for your help.

https://mega.nz/file/UotlwCZK#V6b8Y2hWvJTI7b4IXrnq5ywFLoZ5T1_Sk1Rww2P6y1U

Thanks, Nicolas!

I found that the BAM file is indeed intact, but it is larger than 4GB. I then found that the file I/O functions in the mingw32 library on Windows (including its 64-bit version) behave differently from those in glibc. The file seeking function works incorrectly on files larger than 4GB on Windows, even when the relative offset is only 28 bytes. This bug was introduced in Rsubread 2.8.0. A new version (v2.8.1) has been released that uses the 64-bit file seeking function on Windows to solve the problem.
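For anyone who cannot upgrade right away, a quick check of which BAM files are large enough to be affected may help. The sketch below is only an illustration under stated assumptions: the helper names `exceeds_4gib` and `flag_large_bams` are hypothetical, and the threshold of 2^32 bytes is an assumption based on the "larger than 4GB" description above.

```shell
# Assumed threshold: 4 GiB = 2^32 bytes (the thread only says "larger than 4GB").
LIMIT_BYTES=4294967296

# Hypothetical helper: succeed when the given size in bytes is above the limit.
exceeds_4gib() {
    [ "$1" -gt "$LIMIT_BYTES" ]
}

# Hypothetical helper: print every file whose size is above the limit.
flag_large_bams() {
    for f in "$@"; do
        size=$(wc -c < "$f" | tr -d ' ')
        if exceeds_4gib "$size"; then
            echo "$f: $size bytes (above 4 GiB)"
        fi
    done
}
```

For example, `flag_large_bams Sort/*.bam` lists the candidates. The hexdump earlier in the thread ends at offset 5f6fa3bff, so `exceeds_4gib $((0x5f6fa3bff))` succeeds for Mm_NMV_Test_1_Sorted.bam, which is consistent with the diagnosis.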

I've installed the new version and it works like a charm.

Thanks for the assistance, Yang.
