Question

ChIPQC bam and bed files question

0

Entering edit mode

woongjaej • 0

@woongjaej-13388

Last seen 21 months ago

Korea/Seoul/Yonsei University, College …

Hi, I'm a beginner researcher who's having trouble using ChIPQC bioconductor package..

This may not be a serious problem, but may be my English is not that good to understand what's going wrong with my sample.

Can somebody please help me what's wrong with my sample compared with ChIPQC's tutorial samples?

I want to get the RiBL% score(blacklist regions related) but I couldn't get it. Because of the occuring errors, I'm not sure if the other scores are accurate to use, either.

So I tried to find out what’s wrong with my bam, bed files and looked it up.

I used bed file(narrowpeak file) that was made with MACS2.

Differences between my files and the tutorial files were that

my files’ chromosome names didn’t have “chr” at the front of the numbers, (I think this doesn’t matter because both of my bam , narrowpeak files don’t have “chr”, maybe reference I used for mapping made this situation)
my files have unknown reads such as “GL000XXX.X” and example files(files used in tutorials) such as “A549_GATA3”..etc don’t have these unknown reads.
My bed file is MACS2 made narrowpeak file.

I wonder which tool was used to make Bed files in the ChIPQC's tutorial.

Below, there's is traceback info and sessioninfo, and the yellow blocked texts are my commands and results in R.

Thank you very much.

Woong Jae

> traceback()

4: pv.peakset(model, peaks = as.character(samples$Peaks[i]), sampID = as.character(samples$SampleID[i]),
tissue = as.character(samples$Tissue[i]), factor = as.character(samples$Factor[i]),
condition = as.character(samples$Condition[i]), treatment = as.character(samples$Treatment[i]),
consensus = F, peak.caller = peakcaller, peak.format = peakformat,
scoreCol = peakscores, bLowerScoreBetter = peaksLowerBetter,
control = controlid, reads = NA, replicate = as.integer(samples$Replicate[i]),
readBam = as.character(samples$bamReads[i]), controlBam = as.character(samples$bamControl[i]),
filter = peakfilter, counts = counts, bRemoveM = bRemoveM,
bRemoveRandom = bRemoveRandom, skipLines = skipLines)
3: pv.model(DBA, mask = mask, minOverlap = minOverlap, samplesheet = sampleSheet,
config = config, caller = peakCaller, format = peakFormat,
scorecol = scoreCol, bLowerBetter = bLowerScoreBetter, skipLines = skipLines,
bAddCallerConsensus = bAddCallerConsensus, bRemoveM = bRemoveM,
bRemoveRandom = bRemoveRandom, filter = filter, attributes = attributes)
2: dba(sampleSheet = experiment, bCorPlot = FALSE, peakCaller = "bed")
1: ChIPQC(samples, annotation = "hg19")

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] BiocInstaller_1.26.0 ChIPQC_1.12.2 DiffBind_2.4.6 SummarizedExperiment_1.6.3 DelayedArray_0.2.7
[6] matrixStats_0.52.2 Biobase_2.36.2 GenomicRanges_1.28.3 GenomeInfoDb_1.12.2 IRanges_2.10.2
[11] S4Vectors_0.14.3 BiocGenerics_0.22.0 ggplot2_2.2.1

loaded via a namespace (and not attached):

[82] xtable_1.8-2 survival_2.41-3 tibble_1.3.3
[85] pheatmap_1.0.8 GenomicAlignments_1.12.1 AnnotationDbi_1.38.1
[88] memoise_1.1.0 bindrcpp_0.2 brew_1.0-6
[91] TxDb.Dmelanogaster.UCSC.dm3.ensGene_3.2.2 GSEABase_1.38.0

> samples1 = read.csv("samples1.csv")
> samples1
SampleID Tissue Factor Replicate bamReads Peaks
1 1_hg19 NA NA 1 1.bwa_hg37_sorted.bam 1_bwa_macs_hg19_peaks.narrowPeak
> exampleExp1 = ChIPQC(samples1, annotation="hg19")
1_hg19 1 bed
Checking chromosomes:
[1] "1"
Compiling annotation...
Using default blacklist for hg19...
Computing metrics for 1 samples...
list
Bam file has 84 contigs
Calculating coverage histogram for 1

Calculating SSD for 1

Calculating unique positions per strand for 1

Calculating shift for 1

300 / 300
Counting reads in features for 1

Signal over peaks for 1

done
Calculating coverage
Calculating Summits on 1 ..[1] 1
Warning messages: 1: In if is.na(peaks)) peaks = NULL : the condition has length > 1 and only the first element will be used 2: In if is.na(blacklist)) blacklist = NULL : the condition has length > 1 and only the first element will be used
> exampleExp1
Samples: 1 : 1_hg19
       Replicate Peaks
1_hg19         1 93100
         Reads Map% Filt% Dup% ReadL FragL RelCC SSD RiP% RiBL%
1_hg19 2359580 100 10.2    0    99   199 2.03 8.11 49.5     0

chipqc • 1.8k views

ADD COMMENT • link updated 6.8 years ago by Thomas Carroll ▴ 420 • written 6.8 years ago by woongjaej • 0

score 1 · Answer 1 · 2017-07-05

hi,

I believe your peak and Bam files are fine, the blacklist used by default in ChIPQC for hg19 however has a different naming scheme to your data (it includes the "chr").

You can extract the blacklist for hg19 from ChIPQC, convert the names using the GenomeInfoDB package and then supply the newly converted Blacklist to the ChIPQC function.

> library(ChIPQC)
> library(GenomeInfoDb)
> data("blacklist_hg19")
> seqlevels(blacklist.hg19)
[1] "chr1"  "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17" "chr18" "chr19" "chr2"  "chr20" "chr21" "chr22" "chr3"  "chr4"
[18] "chr5"  "chr6"  "chr7"  "chr8"  "chr9"  "chrM"  "chrX"  "chrY"
> seqlevelsStyle(blacklist.hg19) <- "Ensembl"
> seqlevels(blacklist.hg19)
[1] "1"  "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "2"  "20" "21" "22" "3"  "4"  "5"  "6"  "7"  "8"  "9"  "MT" "X"  "Y" 
>samples <- read.csv("samples.csv")
>examplesExp <- ChIPQC(samples, annotation = "hg19", blacklist = blacklist.hg19)

Hopefully that works. I will look to add a warning when no chromosomes match between BAM header and the Blacklist contigs.

best,

tom