ChIPQC bam and bed files question
1
0
Entering edit mode
woongjaej • 0
@woongjaej-13388
Last seen 2.5 years ago
Korea/Seoul/Yonsei University, College …

Hi, I'm a beginner researcher who's having trouble using ChIPQC bioconductor package..

This may not be a serious problem, but may be my English is not that good to understand what's going wrong with my sample.

Can somebody please help me what's wrong with my sample compared with ChIPQC's tutorial samples?

I want to get the RiBL% score(blacklist regions related) but I couldn't get it. Because of the occuring errors, I'm not sure if the other scores are accurate to use, either.

So I tried to find out what’s wrong with my bam, bed files and looked it up.

I used bed file(narrowpeak file) that was made with MACS2.

Differences between my files and the tutorial files were that

  1. my files’ chromosome names didn’t have “chr” at the front of the numbers, (I think this doesn’t matter because both of my bam , narrowpeak files don’t have “chr”, maybe reference I used for mapping made this situation)
  2. my files have unknown reads such as “GL000XXX.X” and example files(files used in tutorials) such as “A549_GATA3”..etc don’t have these unknown reads.
  3. My bed file is MACS2 made narrowpeak file.

I wonder which tool was used to make Bed files in the ChIPQC's tutorial.

Below, there's is traceback info and sessioninfo, and the yellow blocked texts are my commands and results in R.

Thank you very much.

Woong Jae

 

> traceback()

4: pv.peakset(model, peaks = as.character(samples$Peaks[i]), sampID = as.character(samples$SampleID[i]), 
       tissue = as.character(samples$Tissue[i]), factor = as.character(samples$Factor[i]), 
       condition = as.character(samples$Condition[i]), treatment = as.character(samples$Treatment[i]), 
       consensus = F, peak.caller = peakcaller, peak.format = peakformat, 
       scoreCol = peakscores, bLowerScoreBetter = peaksLowerBetter, 
       control = controlid, reads = NA, replicate = as.integer(samples$Replicate[i]), 
       readBam = as.character(samples$bamReads[i]), controlBam = as.character(samples$bamControl[i]), 
       filter = peakfilter, counts = counts, bRemoveM = bRemoveM, 
       bRemoveRandom = bRemoveRandom, skipLines = skipLines)
3: pv.model(DBA, mask = mask, minOverlap = minOverlap, samplesheet = sampleSheet, 
       config = config, caller = peakCaller, format = peakFormat, 
       scorecol = scoreCol, bLowerBetter = bLowerScoreBetter, skipLines = skipLines, 
       bAddCallerConsensus = bAddCallerConsensus, bRemoveM = bRemoveM, 
       bRemoveRandom = bRemoveRandom, filter = filter, attributes = attributes)
2: dba(sampleSheet = experiment, bCorPlot = FALSE, peakCaller = "bed")
1: ChIPQC(samples, annotation = "hg19")

 

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] BiocInstaller_1.26.0       ChIPQC_1.12.2              DiffBind_2.4.6             SummarizedExperiment_1.6.3 DelayedArray_0.2.7        
 [6] matrixStats_0.52.2         Biobase_2.36.2             GenomicRanges_1.28.3       GenomeInfoDb_1.12.2        IRanges_2.10.2            
[11] S4Vectors_0.14.3           BiocGenerics_0.22.0        ggplot2_2.2.1             

loaded via a namespace (and not attached):
 
[82] xtable_1.8-2                              survival_2.41-3                           tibble_1.3.3                             
[85] pheatmap_1.0.8                            GenomicAlignments_1.12.1                  AnnotationDbi_1.38.1                     
[88] memoise_1.1.0                             bindrcpp_0.2                              brew_1.0-6                               
[91] TxDb.Dmelanogaster.UCSC.dm3.ensGene_3.2.2 GSEABase_1.38.0 

> samples1 = read.csv("samples1.csv")
> samples1
  SampleID Tissue Factor Replicate              bamReads                            Peaks
1   1_hg19     NA     NA         1 1.bwa_hg37_sorted.bam 1_bwa_macs_hg19_peaks.narrowPeak
> exampleExp1 = ChIPQC(samples1, annotation="hg19")
1_hg19     1 bed
Checking chromosomes:
[1] "1"
Compiling annotation...
Using default blacklist for hg19...
Computing metrics for 1 samples...
list
Bam file has 84 contigs
Calculating coverage histogram for 1

Calculating SSD for 1

Calculating unique positions per strand for 1

Calculating shift for 1

 300 / 300
Counting reads in features for 1

Signal over peaks for 1

done
Calculating coverage
Calculating Summits on  1  ..[1] 1
Warning messages:
1: In if is.na(peaks)) peaks = NULL :
  the condition has length > 1 and only the first element will be used
2: In if is.na(blacklist)) blacklist = NULL :
  the condition has length > 1 and only the first element will be used

> exampleExp1
Samples: 1 : 1_hg19
       Replicate Peaks
1_hg19         1 93100
         Reads Map% Filt% Dup% ReadL FragL RelCC  SSD RiP% RiBL%
1_hg19 2359580  100  10.2    0    99   199  2.03 8.11 49.5     0

chipqc • 2.1k views
ADD COMMENT
1
Entering edit mode
@thomas-carroll-7019
Last seen 2.2 years ago
United States/New York/The Rockefeller …

 

hi,

I believe your peak and Bam files are fine, the blacklist used by default in ChIPQC for hg19 however has a different naming scheme to your data (it includes the "chr").

You can extract the blacklist for hg19 from ChIPQC, convert the names using the GenomeInfoDB package and then supply the newly converted Blacklist to the ChIPQC function.

> library(ChIPQC)
> library(GenomeInfoDb)
> data("blacklist_hg19")
> seqlevels(blacklist.hg19)
[1] "chr1"  "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17" "chr18" "chr19" "chr2"  "chr20" "chr21" "chr22" "chr3"  "chr4"
[18] "chr5"  "chr6"  "chr7"  "chr8"  "chr9"  "chrM"  "chrX"  "chrY"
> seqlevelsStyle(blacklist.hg19) <- "Ensembl"
> seqlevels(blacklist.hg19)
[1] "1"  "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "2"  "20" "21" "22" "3"  "4"  "5"  "6"  "7"  "8"  "9"  "MT" "X"  "Y" 
>samples <- read.csv("samples.csv")
>examplesExp <- ChIPQC(samples, annotation = "hg19", blacklist = blacklist.hg19)

Hopefully that works. I will look to add a warning when no chromosomes match between BAM header and the Blacklist contigs.

best,

tom

 

ADD COMMENT

Login before adding your answer.

Traffic: 886 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6