Question

diffbind, paired end reads

0

Entering edit mode

Rory Stark ★ 5.2k

@rory-stark-5741

Last seen 11 months ago

Cambridge, UK

Hi Kasper- >From 1.6 on, if the bLowMem parameter is set to TRUE in dba.count, DiffBind will use summarizeOverlaps. Changing the config value DBA$config$singleEnd to FALSE allows it to use paired-end BAM files. The documentation for summarizeOverlaps in the GenomicRanges package explains how it handles paired-end data. Gord, how does the default count code handle paired-end data (in pre-1.6 DiffBind, or when bLowMem=FALSE)? Cheers- Rory ________________________________ From: Kasper Daniel Hansen [kasperdanielhansen@gmail.com] Sent: 17 June 2013 21:51 To: Rory Stark Subject: diffbind, paired end reads Hi Rory Just skimming DiffBind. I was surprised there is not a longer discussion of dba.count() in the vignette regarding supported input formats. I also read (skimmed) the man pages. It is not clear to me if DiffBind supports paired end reads in the counting step (and also how it deals with for example a paired end read where only one mate aligns). Hope all is well in Cambridge. Best, Kasper NOTICE AND DISCLAIMER This e-mail (including any attachments) is intended for ...{{dropped:20}}

GenomicRanges DiffBind GenomicRanges DiffBind • 2.5k views

ADD COMMENT • link updated 12.5 years ago by Gord Brown ▴ 670 • written 12.5 years ago by Rory Stark ★ 5.2k

score 0 · Answer 1 · 2013-06-18

0

Entering edit mode

Gord Brown ▴ 670

@gord-brown-5664

Last seen 4.9 years ago

United Kingdom

Hi, Rory, Kasper, DiffBind's original counting code treats paired-end data the same as single-end, i.e. each read is considered separately, with no attention paid to whether it's part of a pair. As long as all the libraries are the same (all S.E. or all P.E.) it shouldn't affect the outcome; counts will just be double for P.E. cases (disregarding improperly paired reads). To fix it properly, since we accept bed as well as bam, we'd have to test for properly paired reads ourselves, possibly doing it just that bit differently from other tools. Personally, I'd rather not open that box... Cheers, - Gord On 2013-06-17 23:55, "Rory Stark" <rory.stark at="" cruk.cam.ac.uk=""> wrote: >Hi Kasper- > >From 1.6 on, if the bLowMem parameter is set to TRUE in dba.count, >DiffBind will use summarizeOverlaps. Changing the config value >DBA$config$singleEnd to FALSE allows it to use paired-end BAM files. The >documentation for summarizeOverlaps in the GenomicRanges > package explains how it handles paired-end data. > >Gord, how does the default count code handle paired-end data (in pre-1.6 >DiffBind, or when bLowMem=FALSE)? > >Cheers- >Rory > >________________________________________ >From: Kasper Daniel Hansen [kasperdanielhansen at gmail.com] >Sent: 17 June 2013 21:51 >To: Rory Stark >Subject: diffbind, paired end reads > > >Hi Rory > > >Just skimming DiffBind. I was surprised there is not a longer discussion >of dba.count() in the vignette regarding supported input formats. I also >read (skimmed) the man pages. It is not clear to me if DiffBind supports >paired end reads in the counting > step (and also how it deals with for example a paired end read where >only one mate aligns). > > >Hope all is well in Cambridge. > > >Best, >Kasper

ADD COMMENT • link 12.5 years ago Gord Brown ▴ 670

0

Entering edit mode

Both, thanks a lot for your help. I understand the problems inherent in dealing with this, I was just asking how it is currently handled. I suggest expanding the relevant section in the vignette, giving some more detail, but perhaps I am the only one wondering about this. I ended up doing a fair amount by hand myself, so I was in control. I was actually surprised to learn that the mapping settings had a dramatic effect on the results (going from 1500 DE peaks to 5000 DE peaks) and to some extent the overlapping calculation as well. Is this something you have investigated? Best, Kasper On Tue, Jun 18, 2013 at 10:51 AM, Gordon Brown <gordon.brown@cruk.cam.ac.uk>wrote: > Hi, Rory, Kasper, > > DiffBind's original counting code treats paired-end data the same as > single-end, i.e. each read is considered separately, with no attention > paid to whether it's part of a pair. As long as all the libraries are the > same (all S.E. or all P.E.) it shouldn't affect the outcome; counts will > just be double for P.E. cases (disregarding improperly paired reads). > > To fix it properly, since we accept bed as well as bam, we'd have to test > for properly paired reads ourselves, possibly doing it just that bit > differently from other tools. Personally, I'd rather not open that box... > > Cheers, > > - Gord > > > On 2013-06-17 23:55, "Rory Stark" <rory.stark@cruk.cam.ac.uk> wrote: > > >Hi Kasper- > > > >From 1.6 on, if the bLowMem parameter is set to TRUE in dba.count, > >DiffBind will use summarizeOverlaps. Changing the config value > >DBA$config$singleEnd to FALSE allows it to use paired-end BAM files. The > >documentation for summarizeOverlaps in the GenomicRanges > > package explains how it handles paired-end data. > > > >Gord, how does the default count code handle paired-end data (in pre-1.6 > >DiffBind, or when bLowMem=FALSE)? > > > >Cheers- > >Rory > > > >________________________________________ > >From: Kasper Daniel Hansen [kasperdanielhansen@gmail.com] > >Sent: 17 June 2013 21:51 > >To: Rory Stark > >Subject: diffbind, paired end reads > > > > > >Hi Rory > > > > > >Just skimming DiffBind. I was surprised there is not a longer discussion > >of dba.count() in the vignette regarding supported input formats. I also > >read (skimmed) the man pages. It is not clear to me if DiffBind supports > >paired end reads in the counting > > step (and also how it deals with for example a paired end read where > >only one mate aligns). > > > > > >Hope all is well in Cambridge. > > > > > >Best, > >Kasper > > [[alternative HTML version deleted]]

ADD REPLY • link 12.5 years ago Kasper Daniel Hansen ★ 6.5k

0

Entering edit mode

Hi, Not sure what you mean by "mapping settings". Do you mean filtering by mapping quality? If so, well, I've never investigated it with any care, but we certainly do see dramatic changes in peak calling, hence also in downstream analysis, based on inclusion or exclusion of multiply- mapped reads. These days I filter quite aggressively prior to peak calling: 1) Remove reads with mapping quality less than 5 2) Remove reads that overlap with ENCODE's "black listed" regions http://genome.ucsc.edu/cgi-bin/hgFileUi?db=hg19&g=wgEncodeMapability 3) Remove reads from regions that show abnormally high signal in the input/control Since MACS in particular depends heavily on an accurate estimate of the insert size, taking out all these noise regions first seems to give more reliable, consistent results, and for our work it's better to miss a few regions in exchange for overall consistency. But if you meant something else by "mapping settings", then ignore all that! Cheers, - Gord On 2013-06-21 09:18, "Kasper Daniel Hansen" <kasperdanielhansen at="" gmail.com=""> wrote: >Both, thanks a lot for your help. I understand the problems inherent in >dealing with this, I was just asking how it is currently handled. I >suggest expanding the relevant section in the vignette, giving some more >detail, but perhaps I am the > only one wondering about this. > > >I ended up doing a fair amount by hand myself, so I was in control. I >was actually surprised to learn that the mapping settings had a dramatic >effect on the results (going from 1500 DE peaks to 5000 DE peaks) and to >some extent the overlapping > calculation as well. Is this something you have investigated? > > >Best, >Kasper > > > >On Tue, Jun 18, 2013 at 10:51 AM, Gordon Brown ><gordon.brown at="" cruk.cam.ac.uk=""> wrote: > >Hi, Rory, Kasper, > >DiffBind's original counting code treats paired-end data the same as >single-end, i.e. each read is considered separately, with no attention >paid to whether it's part of a pair. As long as all the libraries are the >same (all S.E. or all P.E.) it shouldn't affect the outcome; counts will >just be double for P.E. cases (disregarding improperly paired reads). > >To fix it properly, since we accept bed as well as bam, we'd have to test >for properly paired reads ourselves, possibly doing it just that bit >differently from other tools. Personally, I'd rather not open that box... > >Cheers, > > - Gord > > >On 2013-06-17 23:55, "Rory Stark" <rory.stark at="" cruk.cam.ac.uk=""> wrote: > >>Hi Kasper- >> >>From 1.6 on, if the bLowMem parameter is set to TRUE in dba.count, >>DiffBind will use summarizeOverlaps. Changing the config value >>DBA$config$singleEnd to FALSE allows it to use paired-end BAM files. The >>documentation for summarizeOverlaps in the GenomicRanges >> package explains how it handles paired-end data. >> >>Gord, how does the default count code handle paired-end data (in pre-1.6 >>DiffBind, or when bLowMem=FALSE)? >> >>Cheers- >>Rory >> > >>________________________________________ >>From: Kasper Daniel Hansen [kasperdanielhansen at gmail.com] >>Sent: 17 June 2013 21:51 >>To: Rory Stark >>Subject: diffbind, paired end reads >> >> >>Hi Rory >> >> >>Just skimming DiffBind. I was surprised there is not a longer discussion >>of dba.count() in the vignette regarding supported input formats. I also >>read (skimmed) the man pages. It is not clear to me if DiffBind supports >>paired end reads in the counting >> step (and also how it deals with for example a paired end read where >>only one mate aligns). >> >> >>Hope all is well in Cambridge. >> >> >>Best, >>Kasper > > > > > > > >

ADD REPLY • link 12.5 years ago Gord Brown ▴ 670