Question

DiffBind & Chip-exo


Rory Stark ★ 5.2k

@rory-stark-5741

Last seen 13 months ago

Cambridge, UK

Hello Giuseppe-

There shouldn't be any problem not having control reads. This looks like it could be a mismatch with the peak file format. Could you send me

* The mysamples.csv file
* The GM06986_peaks.bed.gz file (or just the first 100 lines or so)

I'll take a look and let you know what the problem is.

Cheers-
Rory

On 23/07/2013 17:11, "Giuseppe Gallone" (giuseppe.gallone@dpag.ox.ac.uk) wrote:

Dear Rory

I'm contacting you to ask your thoughts on the possibility of using DiffBind with a Chip-exo dataset. The dataset is composed of transcription factor binding data for a large number of HapMap LCLs.

I have made a first attempt at using the program, but I am experiencing some problems. I called peaks using MACS2 and also have raw read data in .bed format. I tried to build an initial .csv file in the following format:

SampleID,Tissue,Factor,Condition,Replicate,bamReads,bamControl,Peaks,PeakCaller,PeakFormat,ScoreCol,LowerBetter
GM06986,,TF,stimulated,1,d_BED/TF_GM06986_reads.bed,,GM06986_peaks.bed.gz,MACS,raw,4,F

The bamControl field is empty, as is the Tissue field. The data is not tissue specific, and as you might be aware Chip-exo data does not currently come with background/input control.

This is the command I use:

tfr = dba(sampleSheet="mysamples.csv")

and this is the error:

Error in if (res >= minval) { : missing value where TRUE/FALSE needed
In addition: There were 30 warnings (use warnings() to see them)

with the generic warning being:

1: In Ops.factor(peaks[, pCol], width) : / not meaningful for factors

Is the error due to the lack of control reads? Thanks for your help and suggestions.

Giuseppe
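The warning quoted above ("/ not meaningful for factors") suggests that the column named by ScoreCol was read as text rather than as numbers, which would fit the format-mismatch guess. A minimal base-R sketch for checking which columns of a peak file could serve as a score column follows. The three peak rows are made up for illustration and assume a five-column layout (chromosome, start, end, name, score); a real MACS2 file may be laid out differently.

```r
# Hypothetical first lines of a BED-style peak file (all values are made up).
bed_text <- "chr1\t1000\t1200\tpeak_1\t57.3
chr1\t5000\t5300\tpeak_2\t12.8
chr2\t700\t950\tpeak_3\t33.1"

# Read the tab-separated text; with stringsAsFactors = TRUE, text columns
# become factors, as they did by default in R at the time of this thread.
peaks <- read.table(text = bed_text, sep = "\t", header = FALSE,
                    stringsAsFactors = TRUE)

# Any numeric column after the three coordinate columns is a score candidate.
numeric_cols <- which(vapply(peaks, is.numeric, logical(1)))
score_candidates <- setdiff(unname(numeric_cols), 1:3)
print(score_candidates)

# Column 4 holds peak names here, so arithmetic on it is not meaningful.
print(class(peaks[[4]]))
```

For a real file, the inline text could be replaced with something like read.table(gzfile("GM06986_peaks.bed.gz"), nrows = 100, stringsAsFactors = TRUE), and the reported column index compared with the ScoreCol value in the sample sheet.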

HapMap • 1.8k views

12.5 years ago • Rory Stark ★ 5.2k

score 0 · Answer 1 · 2013-07-23

>Hi Giuseppe-
>
>I'm glad you sorted the column thing out; that was what I suspected.
>
>There shouldn't be much problem doing the analysis without a control
>track, particularly if the samples come from the same tissue. The main
>role of the control tracks is for peak calling. The reason the control
>track is less important for differential analysis is that you are looking
>at the relative differences in read density at the same genomic intervals
>across samples, and not comparing read densities across intervals. So if
>the control track is similar at that location for all samples, it will
>not affect the differential analysis. The main issue would be if there
>were something like big copy number differences between samples. Then you
>could get sites that show as differentially bound when the real difference
>was the copy number. But the difference would be real regardless.
>
>Regarding sequencing depth, this should be taken care of by the
>normalisation step. It takes the library size (either full library size,
>which is the total number of reads, or the default effective library size,
>the number of reads within peaks for each sample) and adjusts the read
>counts. You can get an idea of how this is working by using
>dba.plotBox (with bAll=TRUE), comparing bNormalized=TRUE and
>bNormalized=FALSE to see if things even out. Also, after counting, you can
>look at the clustering (dba.plotPCA and dba.plotHeatmap) to see if samples
>are grouping by sequencing depth. Try doing the same plots with
>different scores, e.g. score=DBA_SCORE_READS, score=DBA_SCORE_RPKM, and
>score=DBA_SCORE_TMM_READS_EFFECTIVE or score=DBA_SCORE_TMM_READS_FULL, to
>see which gives the best clustering.
>
>Hope this helps!
>
>Cheers-
>Rory
>
>On 23/07/2013 17:58, "Giuseppe Gallone" (giuseppe.gallone at dpag.ox.ac.uk)
>wrote:
>
>>Hi Rory
>>
>>I figured out what the problem was - the score column in my MACS2 beds
>>was the 5th, not the 4th (specified in the csv file)
>>
>>Having said this, do you have specific suggestions on running
>>control-less samples through DiffBind? Is there a mailing list where I
>>could learn to use the program properly / ask questions?
>>
>>A further question - I have sequence depth differences across the
>>samples. Should I manually sub-sample my biggest (in terms of read
>>depth) samples to a small common denominator before plotting
>>correlations in DiffBind - or will the software do it for me?
>>
>>Best regards
>>Giuseppe
>>
>>--
>>Dr Giuseppe Gallone
>>MRC career development fellow
>>MRC Functional Genomics Unit - DPAG
>>University of Oxford, UK
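The difference between full and effective library size described in the reply can be illustrated with a few lines of base R. This is only a sketch of plain library-size scaling on made-up counts, not DiffBind's own implementation (the TMM scores named in the reply use a different method). The matrix values, the sample names and the helper scale_to_mean are assumptions introduced for illustration.

```r
# Made-up read counts: rows are peaks, columns are samples.
counts <- matrix(c(10, 20, 30,
                   40, 80, 120),
                 nrow = 3,
                 dimnames = list(paste0("peak", 1:3), c("shallow", "deep")))

# Made-up total sequenced reads per sample (the "full" library size).
full_lib <- c(shallow = 1e6, deep = 5e6)

# "Effective" library size: reads that fall inside peaks, per sample.
effective_lib <- colSums(counts)

# Hypothetical helper: divide each sample's counts by its library size
# relative to the mean library size across samples.
scale_to_mean <- function(counts, lib_size) {
  sweep(counts, 2, lib_size / mean(lib_size), "/")
}

norm_effective <- scale_to_mean(counts, effective_lib)
norm_full <- scale_to_mean(counts, full_lib)

print(norm_effective)
print(norm_full)
```

In this toy case the deeper sample has exactly four times the in-peak reads of the shallow one, so scaling by effective library size makes the two columns identical, while scaling by full library size leaves a residual difference because the totals differ by a factor of five. Which choice is appropriate for real data depends on whether a global change in binding is expected.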