Search
Question: ChIPQC - can't read in bam files
1
12 days ago by
molly_the_minotaur20 wrote:

Having trouble to get the ChIPQC to work properly. My ultimate goal is to have the html report generated using ChIPQCreport().

I read in the bam files through dba function in the DiffBind package. All went well and I was able to run all DiffBind functions.

Using the same sample sheet and dba object does not work with ChIPQC.

I tried read in sample sheet directly or the dba object, both failed.

I have 8 ChIP-seq samples and two input files as controls. All bam files are sorted and indexed. Both .bam and .bai files are in the same folder. All chromosome number are the same in bam and bed files, only digits and no "chr" before the digits. T

here are some discrepancy between the chromosomes, in .bam files, there are more chromosomes but in the .bed file, there were only "chromosome 1, 3, 5, 6, 7, 9, 10, 11, 13, 16, 17, 18, X". Although I don't think this should be an issue since the peak file (summit file) (.bed) are called peaks from MACS, which may not have all chromosomes as the .bam files do.

Here's the error message I got when using dba object to run ChIPQC (using only chromosome 18 as example)

Using the dba object:

dataset <- dba(sampleSheet="/path_to_peakset/peaksets_20180910.csv")
dataset_QC = ChIPQC(dataset, annotation="mm9", chromosomes = "18")

These are the error message appeared:

Bam file has 211 contigs
Error in value[[3L]](cond) :
failed to open BamFile: failed to load BAM index
1: In if (is.na(peaks)) peaks = NULL :
the condition has length > 1 and only the first element will be used
2: In if (is.na(peaks)) peaks = NULL :
the condition has length > 1 and only the first element will be used
3: In if (is.na(peaks)) peaks = NULL :
the condition has length > 1 and only the first element will be used
4: In if (is.na(peaks)) peaks = NULL :
the condition has length > 1 and only the first element will be used
5: In if (is.na(peaks)) peaks = NULL :
the condition has length > 1 and only the first element will be used
6: In if (is.na(peaks)) peaks = NULL :
the condition has length > 1 and only the first element will be used
7: In if (is.na(peaks)) peaks = NULL :
the condition has length > 1 and only the first element will be used
8: In doTryCatch(return(expr), name, parentenv, handler) :
[bam_index_load] fail to load BAM index.

Using the ChIPQC function directly:

dataset2 = ChIPQC(experiment = "/path_to_samplesheet/peaksets_20180910.csv",
annotation="mm9", bCount = TRUE, chromosomes = "18")

Error message from ChIPQC function:

Bam file has 211 contigs
Error in value[[3L]](cond) :
failed to open BamFile: failed to load BAM index
1: In if (is.na(peaks)) peaks = NULL :
the condition has length > 1 and only the first element will be used
2: In if (is.na(peaks)) peaks = NULL :
the condition has length > 1 and only the first element will be used
3: In if (is.na(peaks)) peaks = NULL :
the condition has length > 1 and only the first element will be used
4: In if (is.na(peaks)) peaks = NULL :
the condition has length > 1 and only the first element will be used
5: In if (is.na(peaks)) peaks = NULL :
the condition has length > 1 and only the first element will be used
6: In if (is.na(peaks)) peaks = NULL :
the condition has length > 1 and only the first element will be used
7: In if (is.na(peaks)) peaks = NULL :
the condition has length > 1 and only the first element will be used
8: In if (is.na(peaks)) peaks = NULL :
the condition has length > 1 and only the first element will be used
9: In if (is.na(peaks)) peaks = NULL :
the condition has length > 1 and only the first element will be used
10: In doTryCatch(return(expr), name, parentenv, handler) :
[bam_index_load] fail to load BAM index.

sessionInfo()

R version 3.4.1 (2017-06-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] BiocParallel_1.10.1        ChIPQC_1.12.3              ggplot2_3.0.0              XML_3.98-1.16              bindrcpp_0.2.2             DiffBind_2.4.8             SummarizedExperiment_1.6.5
[8] DelayedArray_0.2.7         matrixStats_0.54.0         Biobase_2.36.2             GenomicRanges_1.28.6       GenomeInfoDb_1.12.3        IRanges_2.10.5             S4Vectors_0.14.7
[15] BiocGenerics_0.22.1        BiocInstaller_1.26.1   

(don't have enough space to paste all info - plz let me know if you need the complete session info, thanks)

modified 12 days ago • written 12 days ago by molly_the_minotaur20

Does that file have an index file? It appears that it doesn't. Also, you should first update to the current version of R/Bioc, as we don't support outdated versions (your installation is well over a year out of date).

1

I do have the index files (.bai) under the same folder. I have R 3.5 installed on a server that I’ll test the same function again this afternoon. Thanks for the suggestion  @James

I can't get the ChIPQC installed on the server R 3.5 because anaconda conflicts. There are many dependencies couldn't be installed. Do you think the errors in this post are due to the older R version 3.4.1?

I think you should simplify things and investigate the problem -- unable to

Bam file has 211 contigs
Error in value[[3L]](cond) :
failed to open BamFile: failed to load BAM index
file: /Users/yiko/Dropbox/0_Rader_Lab/Lab_Members/Xin_Bi/diffbind/BAM/Input_KOAligned.sortedByCoord.out.bam

I think you trigger this with

library(Rsamtools)
open(BamFile("/Users/yiko/Dropbox/0_Rader_Lab/Lab_Members/Xin_Bi/diffbind/BAM/Input_KOAligned.sortedByCoord.out.bam"))

and then it would be helpful to look at the files at that location, whether the index file is there, the specific name of the index file, and whether the name of the index file is consistent with the expectation in the documentation -- I *think* in your version if the base file name was foo.bam, then the index had to be foo.bam.bai

I don't know. Packages may or may not get updated at each release cycle, and that becomes even more complicated when a given package depends on functions from other packages which may or may not have been updated as well.

However, Rsamtools has a function called open.BamFile:

open.BamFile <- function (con, ...)
{
tryCatch({
.io_check_exists(path(con))
index <- sub("\\.bai$", "", index(con, asNA = FALSE)) con$.extptr <- .Call(.bamfile_open, path(con), index,
"rb")
}, error = function(err) {
stop("failed to open BamFile: ", conditionMessage(err))
})
invisible(con)
}

That tries to open a bam file that you have specified, which in turn calls some underlying C code that tries to open the bam index file, and which returns the error 'failed to open BAM index'.

So if you get an error that says

failed to open BamFile: failed to load BAM index
file: /Users/yiko/Dropbox/0_Rader_Lab/Lab_Members/Xin_Bi/diffbind/BAM/Input_KOAligned.sortedByCoord.out.bam

That's a really good indication that there isn't an index file for that particular bam file.

You might try doing

library(Rsamtools)

con <- open("/Users/yiko/Dropbox/0_Rader_Lab/Lab_Members/Xin_Bi/diffbind/BAM/Input_KOAligned.sortedByCoord.out.bam")

Directly. Dollars to donuts you get the same error, which again is a really good indication that you either don't have an index file for that bam file, or if you do, it's borked.