Hi, I'm a beginner researcher who's having trouble using ChIPQC bioconductor package..
This may not be a serious problem, but may be my English is not that good to understand what's going wrong with my sample.
Can somebody please help me what's wrong with my sample compared with ChIPQC's tutorial samples?
I want to get the RiBL% score(blacklist regions related) but I couldn't get it. Because of the occuring errors, I'm not sure if the other scores are accurate to use, either.
So I tried to find out what’s wrong with my bam, bed files and looked it up.
I used bed file(narrowpeak file) that was made with MACS2.
Differences between my files and the tutorial files were that
- my files’ chromosome names didn’t have “chr” at the front of the numbers, (I think this doesn’t matter because both of my bam , narrowpeak files don’t have “chr”, maybe reference I used for mapping made this situation)
- my files have unknown reads such as “GL000XXX.X” and example files(files used in tutorials) such as “A549_GATA3”..etc don’t have these unknown reads.
- My bed file is MACS2 made narrowpeak file.
I wonder which tool was used to make Bed files in the ChIPQC's tutorial.
Below, there's is traceback info and sessioninfo, and the yellow blocked texts are my commands and results in R.
Thank you very much.
Woong Jae
> traceback()
4: pv.peakset(model, peaks = as.character(samples$Peaks[i]), sampID = as.character(samples$SampleID[i]),
tissue = as.character(samples$Tissue[i]), factor = as.character(samples$Factor[i]),
condition = as.character(samples$Condition[i]), treatment = as.character(samples$Treatment[i]),
consensus = F, peak.caller = peakcaller, peak.format = peakformat,
scoreCol = peakscores, bLowerScoreBetter = peaksLowerBetter,
control = controlid, reads = NA, replicate = as.integer(samples$Replicate[i]),
readBam = as.character(samples$bamReads[i]), controlBam = as.character(samples$bamControl[i]),
filter = peakfilter, counts = counts, bRemoveM = bRemoveM,
bRemoveRandom = bRemoveRandom, skipLines = skipLines)
3: pv.model(DBA, mask = mask, minOverlap = minOverlap, samplesheet = sampleSheet,
config = config, caller = peakCaller, format = peakFormat,
scorecol = scoreCol, bLowerBetter = bLowerScoreBetter, skipLines = skipLines,
bAddCallerConsensus = bAddCallerConsensus, bRemoveM = bRemoveM,
bRemoveRandom = bRemoveRandom, filter = filter, attributes = attributes)
2: dba(sampleSheet = experiment, bCorPlot = FALSE, peakCaller = "bed")
1: ChIPQC(samples, annotation = "hg19")
> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS
Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] BiocInstaller_1.26.0 ChIPQC_1.12.2 DiffBind_2.4.6 SummarizedExperiment_1.6.3 DelayedArray_0.2.7
[6] matrixStats_0.52.2 Biobase_2.36.2 GenomicRanges_1.28.3 GenomeInfoDb_1.12.2 IRanges_2.10.2
[11] S4Vectors_0.14.3 BiocGenerics_0.22.0 ggplot2_2.2.1
loaded via a namespace (and not attached):
[82] xtable_1.8-2 survival_2.41-3 tibble_1.3.3
[85] pheatmap_1.0.8 GenomicAlignments_1.12.1 AnnotationDbi_1.38.1
[88] memoise_1.1.0 bindrcpp_0.2 brew_1.0-6
[91] TxDb.Dmelanogaster.UCSC.dm3.ensGene_3.2.2 GSEABase_1.38.0
> samples1 = read.csv("samples1.csv")
> samples1
SampleID Tissue Factor Replicate bamReads Peaks
1 1_hg19 NA NA 1 1.bwa_hg37_sorted.bam 1_bwa_macs_hg19_peaks.narrowPeak
> exampleExp1 = ChIPQC(samples1, annotation="hg19")
1_hg19 1 bed
Checking chromosomes:
[1] "1"
Compiling annotation...
Using default blacklist for hg19...
Computing metrics for 1 samples...
list
Bam file has 84 contigs
Calculating coverage histogram for 1
Calculating SSD for 1
Calculating unique positions per strand for 1
Calculating shift for 1
300 / 300
Counting reads in features for 1
Signal over peaks for 1
done
Calculating coverage
Calculating Summits on 1 ..[1] 1
Warning messages:
1: In if is.na(peaks)) peaks = NULL :
the condition has length > 1 and only the first element will be used
2: In if is.na(blacklist)) blacklist = NULL :
the condition has length > 1 and only the first element will be used
> exampleExp1
Samples: 1 : 1_hg19
Replicate Peaks
1_hg19 1 93100
Reads Map% Filt% Dup% ReadL FragL RelCC SSD RiP% RiBL%
1_hg19 2359580 100 10.2 0 99 199 2.03 8.11 49.5 0