Hi,
I am trying to run ChIPQC with my 6 samples, with no success so far.
The commands I ran was:
exp <- ChIPQC(metadata, annotation = "mm10")
It returns:
samp1 brain TF 1 bed
samp2 brain TF 2 bed
samp3 brain TF 1 bed
samp4 brain TF 2 bed
samp5 brain TF 1 bed
samp6 brain TF 2 bed
Checking chromosomes:
[1] "chr1"
Compiling annotation...
Computing metrics for 6 samples...
Error: 6 errors; first error:
Error in sampleQC(bamFile = reads, bedFile = peaks, GeneAnnotation = annotation, : Contigs of interest are not all in Bam file!!
For more information, use bplasterror(). To resume calculation, re-call the function and set the argument 'BPRESUME' to TRUE or wrap the previous call in bpresume().
First traceback:
30: ChIPQC(metadata, annotation = "mm10")
29: bplapply(samplelist, doChIPQCsample, experiment, chromosomes,
annotation, mapQCth, blacklist, profileWin, fragmentLength,
shifts)
28: bplapply(samplelist, doChIPQCsample, experiment, chromosomes,
annotation, mapQCth, blacklist, profileWin, fragmentLength,
shifts)
27: bplapply(X, FUN, ..., BPRESUME = BPRESUME, BPPARAM = BPPARAM)
26: bplapply(X, FUN, ..., BPRESUME = BPRESUME, BPPARAM = BPPARAM)
25: mclapply(X = X, FUN = FUN, ..., mc.set.seed = BPPARAM$setSeed,
mc.silent = !BPPARAM$verbose, mc.cores = bpworkers(BPPARAM),
mc.cl
metadata is the summary description of the experimental design, as a data.frame:
SampleID Tissue Factor Replicate bamReads Peaks
samp1 brain TF 1 /path2/samp1.bam /path2/samp1.bed
samp2 brain TF 2 /path2/samp2.bam /path2/samp2.bed
samp3 brain TF 1 /path2/samp3.bam /path2/samp3.bed
samp4 brain TF 2 /path2/samp4.bam /path2/samp4.bed
samp5 brain TF 1 /path2/samp5.bam /path2/samp5.bed
samp6 brain TF 2 /path2/samp6.bam /path2/samp6.bed
The .bam files (mapping with bowtie2) are of course sorted with samtools sort.
They have been also indexed with samtools index.
The .bed files (peak calling with MACS) are correctly formatted (tab separated, 5-columns as chr, start, end, name, score) and look like this:
chr1 12000000 12000200 MACS_peak_1 115.92
My sessionInfo() is as following:
R version 3.1.3 (2015-03-09)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.2 LTS
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=de_CH.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=de_CH.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=de_CH.UTF-8 LC_NAME=de_CH.UTF-8 LC_ADDRESS=de_CH.UTF-8 LC_TELEPHONE=de_CH.UTF-8 LC_MEASUREMENT=de_CH.UTF-8 LC_IDENTIFICATION=de_CH.UTF-8
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] TxDb.Mmusculus.UCSC.mm10.knownGene_3.0.0 GenomicFeatures_1.18.3 AnnotationDbi_1.28.1 Biobase_2.26.0 XLConnect_0.2-11
[6] XLConnectJars_0.2-9 ChIPQC_1.2.2 DiffBind_1.12.3 GenomicAlignments_1.2.2 Rsamtools_1.18.3
[11] Biostrings_2.34.1 XVector_0.6.0 limma_3.22.7 GenomicRanges_1.18.4 GenomeInfoDb_1.2.4
[16] IRanges_2.0.1 S4Vectors_0.4.0 BiocGenerics_0.12.1 ggplot2_1.0.0
loaded via a namespace (and not attached):
[1] amap_0.8-14 base64enc_0.1-2 BatchJobs_1.5 BBmisc_1.9 BiocParallel_1.0.3 biomaRt_2.22.0 bitops_1.0-6 brew_1.0-6 BSgenome_1.34.1 caTools_1.17.1
[11] checkmate_1.5.1 chipseq_1.16.0 codetools_0.2-10 colorspace_1.2-6 DBI_0.3.1 digest_0.6.8 edgeR_3.8.6 fail_1.2 foreach_1.4.2 gdata_2.13.3
[21] gplots_2.16.0 grid_3.1.3 gtable_0.1.2 gtools_3.4.1 hwriter_1.3.2 iterators_1.0.7 KernSmooth_2.23-14 lattice_0.20-30 latticeExtra_0.6-26 MASS_7.3-39
[31] munsell_0.4.2 Nozzle.R1_1.1-1 plyr_1.8.1 proto_0.3-10 RColorBrewer_1.1-2 Rcpp_0.11.5 RCurl_1.95-4.5 reshape2_1.4.1 rJava_0.9-6 RSQLite_1.0.0
[41] rtracklayer_1.26.2 scales_0.2.4 sendmailR_1.2-1 ShortRead_1.24.0 stringr_0.6.2 tools_3.1.3 XML_3.98-1.1 zlibbioc_1.12.0
Also, when I try to run ChIPQCsample with a single sample instead, it "seems" to work, in the way that there is no error message, the method runs till the end. But the result is quite strange:
samp <- ChIPQCsample(reads = samp1.bam, peaks = samp1.bed, annotation = "mm10")
> samp
ChIPQCsample
Number of Mapped reads: 30767939
Number of Mapped reads passing MapQ filter: 24496740
Percentage Of Reads as Non-Duplicates (NRF): 100(0)
Percentage Of Reads in Blacklisted Regions: NA
SSD: 4.27506515515619
Fragment Length Cross-Coverage: 0.000228443139299127
Relative Cross-Coverage: 0.497333607597129
Percentage Of Reads in GenomicFeature:
ProportionOfCounts
Peaks 0
LongPromoter20000to2000 0
Promoters2000to500 0
Promoters500 0
All5utrs 0
Alltranscripts 0
Allcds 0
Allintrons 0
All3utrs 0
Percentage Of Reads in Peaks: 0
Number of Peaks: 0
GRanges object with 0 ranges and 2 metadata columns:
seqnames ranges strand | Counts bedRangesSummitsTemp
<Rle> <IRanges> <Rle> | <integer> <numeric>
-------
seqinfo: 22 sequences from an unspecified genome; no seqlengths
Although there are no many peaks (this is normal/expected for that TF in those conditions, ~100-1000 peaks per sample/condition), there are some specific significant peaks. So how come ChIPQCsample returns null values for the different proportions counts, especially for Percentage Of Reads in Peaks (instead of something like 4e-04 etc)? Why does it give null for the number of Peaks (I know there are at least ~100 peaks) just by looking at the bed file returned by MACS?
Thanks a lot for your help in advance,
Kind regards,
H.

For what it's worth a Bioconductor way to look at the names of the 'chromosomes' in BAM files is
seqlevels(Rsamtools::BamFile("my.bam"))seqinfo()provides the seqlengths as well; these functions work on lots of GRanges-related objects and (e.g., rtracklayer) file types that have sequence information.Ah, thanks Martin,
Always good to show the Bioconductor way
best,
tom