DiffBind & ChIP-exo
Rory Stark ★ 5.2k
@rory-stark-5741
Last seen 5 weeks ago
Cambridge, UK
Hello Giuseppe-

There shouldn't be any problem not having control reads. This looks like it could be a mismatch with the peak file format. Could you send me

* The mysamples.csv file
* The GM06986_peaks.bed.gz file (or just the first 100 lines or so)

I'll take a look and let you know what the problem is.

Cheers-
Rory

On 23/07/2013 17:11, "Giuseppe Gallone" <giuseppe.gallone@dpag.ox.ac.uk> wrote:

Dear Rory

I'm contacting you to ask your thoughts on the possibility of using DiffBind with a ChIP-exo dataset. The dataset is composed of transcription factor binding data for a large number of HapMap LCLs.

I have made a first attempt at using the program, but I am experiencing some problems. I called peaks using MACS2 and also have raw read data in .bed format. I tried to build an initial .csv file in the following format:

    SampleID,Tissue,Factor,Condition,Replicate,bamReads,bamControl,Peaks,PeakCaller,PeakFormat,ScoreCol,LowerBetter
    GM06986,,TF,stimulated,1,d_BED/TF_GM06986_reads.bed,,GM06986_peaks.bed.gz,MACS,raw,4,F

The bamControl field is empty, as is the Tissue field. The data is not tissue specific, and as you might be aware, ChIP-exo data does not currently come with a background/input control.

This is the command I use:

    tfr = dba(sampleSheet="mysamples.csv")

and this is the error:

    Error in if (res >= minval) { : missing value where TRUE/FALSE needed
    In addition: There were 30 warnings (use warnings() to see them)

with the generic warning being:

    1: In Ops.factor(peaks[, pCol], width) : / not meaningful for factors

Is the error due to the lack of control reads? Thanks for your help & suggestions.

Giuseppe
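The warning `/ not meaningful for factors` suggests that the column named by ScoreCol is being read as text rather than as numbers. In a standard BED file, column 4 holds the peak name and column 5 the score, and older versions of R read text columns as factors by default. A minimal base R sketch of how one might check which column of a peak file can serve as a score column is given here; the peak lines are made-up values for illustration, not data from this thread, and the check does not call DiffBind at all.

```r
# Hypothetical first lines of a BED-style peak file (made-up values).
peak_lines <- c(
  "chr1\t1000\t1200\tpeak_1\t57.3",
  "chr1\t5000\t5350\tpeak_2\t112.8",
  "chr2\t300\t480\tpeak_3\t23.1"
)

# Read without a header row; keep text columns as character, not factor.
peaks <- read.table(text = peak_lines, sep = "\t", header = FALSE,
                    stringsAsFactors = FALSE)

# Find the numeric columns; a usable score column must be one of them.
numeric_cols <- which(vapply(peaks, is.numeric, logical(1)))
print(numeric_cols)

# Exclude the start and end coordinates (columns 2 and 3).
score_candidates <- setdiff(unname(numeric_cols), c(2, 3))
print(score_candidates)
```

For a real peak file, the `text =` argument would be replaced by the file path, and the resulting column number would be the value to put in the ScoreCol field of the sample sheet.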
HapMap HapMap • 1.6k views
Rory Stark ★ 5.2k
@rory-stark-5741
Last seen 5 weeks ago
Cambridge, UK
Hi Giuseppe-

I'm glad you sorted the column thing out; that was what I suspected.

There shouldn't be much problem doing the analysis without a control track, particularly if the samples come from the same tissue. The main role of the control tracks is for peak calling. The reason the control track is less important for differential analysis is that you are looking at the relative differences in read density at the same genomic intervals across samples, and not comparing read densities across intervals. So if the control track were similar at that location for all samples, it will not affect the differential analysis. The main issue would be if there were something like big copy number differences between samples. Then you could get sites that show as differentially bound when the real difference was the copy number. But the difference would be real regardless.

Regarding sequencing depth, this should be taken care of by the normalisation step. It takes the library size (either the full library size, which is the total number of reads, or the default effective library size, which is the number of reads within peaks for each sample) and adjusts the read counts. You can get an idea of how this is working by using dba.plotBox (with bAll=TRUE), comparing bNormalized=TRUE and bNormalized=FALSE to see if things even out. Also, after counting, you can look at the clustering (dba.plotPCA and dba.plotHeatmap) to see if samples are grouping by sequencing depth. Try doing the same plots with different scores, e.g. score=DBA_SCORE_READS, score=DBA_SCORE_RPKM, and score=DBA_SCORE_TMM_READS_EFFECTIVE or score=DBA_SCORE_TMM_READS_FULL, to see which gives the best clustering.

Hope this helps!

Cheers-
Rory

On 23/07/2013 17:58, "Giuseppe Gallone" <giuseppe.gallone at dpag.ox.ac.uk> wrote:

> Hi Rory
>
> I figured out what the problem was - the score column in my MACS2 beds
> was the 5th, not the 4th (specified in the csv file)
>
> Having said this, do you have specific suggestions on running
> control-less samples through DiffBind? Is there a mailing list where I
> could learn to use the program properly / ask questions?
>
> A further question - I have sequence depth differences across the
> samples. Should I manually sub-sample my biggest (in terms of read
> depth) samples to a small common denominator before plotting
> correlations in DiffBind - or will the software do it for me?
>
> Best regards
> Giuseppe
>
> --
> Dr Giuseppe Gallone
> MRC career development fellow
> MRC Functional Genomics Unit - DPAG
> University of Oxford, UK
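To make the library-size point concrete, here is a minimal base R sketch of scaling read counts by either the full or the effective library size. It is a simple illustration of the idea described in the reply, not DiffBind's own normalisation code; the counts, the library sizes, and the helper `normalise_by` are all hypothetical.

```r
# Made-up read counts: rows are peaks (genomic intervals), columns are samples.
counts <- matrix(
  c(100, 200, 50,
    200, 400, 100),
  nrow = 3,
  dimnames = list(c("peak1", "peak2", "peak3"), c("sampleA", "sampleB"))
)

# Hypothetical library sizes; sampleB was sequenced twice as deeply as sampleA.
full_lib_size <- c(sampleA = 1e6, sampleB = 2e6)  # total reads per sample
effective_lib_size <- colSums(counts)             # reads within peaks

# Divide each sample's counts by its library size relative to the mean size.
normalise_by <- function(counts, lib_size) {
  sweep(counts, 2, lib_size / mean(lib_size), FUN = "/")
}

norm_full <- normalise_by(counts, full_lib_size)
norm_effective <- normalise_by(counts, effective_lib_size)

print(norm_full)
print(norm_effective)
```

In this toy case the two samples differ only in depth, so their columns become identical after scaling with either library size. With real data the two choices can give different results, which is why comparing the plots under different score settings is worthwhile.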