Question: DiffBind (Please add me to the Dropbox containing the vignette data)
Rory Stark (CRUK, Cambridge, UK) wrote, 5.6 years ago:
Hi Roy-

I don't believe these questions ever got answered; we've been very busy!

In the first instance, this looks like a good example of a paired experiment, with matched samples for each patient before and after treatment. I'd suggest filling in the "Tissue" field with the patient ID, so there are two samples with "aaa", two with "bbb", etc. Then you can set the contrast based on the "Condition" but use the "Tissue" to match samples, by saying "block=DBA_TISSUE" in the call to dba.contrast:

    > exp = dba.contrast(exp, categories=DBA_CONDITION, block=DBA_TISSUE)
    > exp = dba.analyze(exp)
    > report = dba.report(exp, method=DBA_EDGER_BLOCK)

The sites identified as significantly differentially bound (using as the method either DBA_EDGER_BLOCK or DBA_DESEQ2_BLOCK) will take into account the sample relationship. The edgeR vignette has a good explanation of such matched designs.

If you want to analyse each patient separately, you can't really get a reliable result, as each is an n=1 experiment. You can look at peak call overlaps to get some ideas, but a) just because a peak is called in both conditions doesn't mean its rate of binding didn't change significantly, and b) just because a peak is called in only one condition doesn't mean the binding rate did change significantly. You can only get this by looking at patterns over multiple samples as suggested above.

Regarding minOverlap, setting it to 1 will include all peaks identified anywhere. This will include more spurious peaks, but only the ones with consistent changes will be identified by the differential analysis. There may be a small penalty to pay in the multiple testing correction, but the real differences should come through.
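As an illustration only, the sample sheet layout and the minOverlap rule described above might be sketched in base R as follows. This is a toy sketch with made-up data, not DiffBind's implementation: the column name SampleID and the omission of the file columns (read and peak files) are assumptions for brevity, the presence/absence matrix assumes peaks have already been merged into common intervals, and keep_peaks is a hypothetical helper rather than a DiffBind function.

```r
# Assumed sample sheet for the paired design: the Tissue column carries the
# patient ID so that block=DBA_TISSUE can match each patient's two samples.
patients <- c("aaa", "bbb", "ccc", "ddd", "eee")
samples <- data.frame(
  SampleID  = c(paste0(patients, "-p"), paste0(patients, "-t")),
  Tissue    = rep(patients, times = 2),
  Factor    = "H3K9ac",
  Condition = rep(c("Pre-treatment", "Treatment"), each = length(patients)),
  Replicate = rep(seq_along(patients), times = 2),
  stringsAsFactors = FALSE
)

# Sanity check: every patient should appear exactly once per condition.
pairing <- table(samples$Tissue, samples$Condition)
print(pairing)

# Toy presence/absence matrix (made-up data): rows are merged peak
# intervals, columns are peaksets; 1 means the peak was called there.
called <- matrix(
  c(1, 1, 1, 1,   # peak1: called in every peakset
    1, 0, 1, 0,   # peak2: called in two peaksets
    0, 0, 0, 1),  # peak3: called in a single peakset
  nrow = 3, byrow = TRUE,
  dimnames = list(c("peak1", "peak2", "peak3"),
                  c("aaa-p", "bbb-p", "aaa-t", "bbb-t"))
)

# Hypothetical helper: keep intervals called in at least minOverlap peaksets,
# regardless of which condition those peaksets belong to.
keep_peaks <- function(called, minOverlap) {
  rownames(called)[rowSums(called) >= minOverlap]
}

keep_peaks(called, 1)  # "peak1" "peak2" "peak3"
keep_peaks(called, 2)  # "peak1" "peak2"
```

With minOverlap=1 the singleton peak3 stays in the consensus set and is counted in every sample; whether it is then reported as differentially bound still depends on the pattern of counts across all samples.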
If you are very interested in specific peaks that are seen in only one patient in only one condition, DiffBind can help you locate these peaks that don't overlap with anything else (using dba.overlap and dba.plotVenn), but it won't help with the statistical significance of such a peak (nothing that I am aware of will). Once identified, you'd need to look at the region in a browser for all the samples, convince yourself the binding event is real and unique, and then devise some other way to validate it.

Finally, regarding counts, the minimum count of one applies to the differential binding analysis, which should involve more than two samples. Adding a single count to a specific peak in a single sample shouldn't change the results of the statistical analysis. I wouldn't trust an analysis that was that sensitive to a single read difference, as the nature of ChIP-seq includes random sampling variation that should be greater than this. The difference between a sample with zero reads at a binding site and a sample with one read at that binding site is generally not meaningful. By contrast, if all the samples in one condition have zero or one reads at that site, and all the samples in another condition have hundreds of reads, then there is a meaningful (significant) difference.

The bottom line is that DiffBind is mostly designed for comparing sample groups, and not so much for comparing individual samples.

Hope this helps!
-Rory

On 14/02/2014 17:28, "Blum, Roy" <roy.blum at nyumc.org> wrote:

>Dear Rory,
>
>Thanks a lot for the elaborate and clear explanation!
>It helps a lot for understanding your pipeline.
>
>Our experimental layout is as follows:
>Clinical epigenetic data from 5 human patients, before and after
>treatment was collected individually, per each patient.
>
>The summary table for DiffBind (following the example in your tutorial)
>should be like this:
>
>ID     Tissue  Factor  Condition      Replicate
>aaa-p  NA      H3K9ac  Pre-treatment  1
>bbb-p  NA      H3K9ac  Pre-treatment  2
>ccc-p  NA      H3K9ac  Pre-treatment  3
>ddd-p  NA      H3K9ac  Pre-treatment  4
>eee-p  NA      H3K9ac  Pre-treatment  5
>aaa-t  NA      H3K9ac  Treatment      1
>bbb-t  NA      H3K9ac  Treatment      2
>ccc-t  NA      H3K9ac  Treatment      3
>ddd-t  NA      H3K9ac  Treatment      4
>eee-t  NA      H3K9ac  Treatment      5
>
>
>So basically we have for each individual patient (for example patient
>"aaa") his analysis data from before and after treatment (aaa-p and
>aaa-t).
>But there are no repeats per each of the patients.
>However, since we determined the status of each factor (for example,
>H3K9ac) across multiple patients we should be able to refer to each of
>the five patients as a replicate.
>
>My question is this - and I would be very glad to get your current input
>on:
>From what I understand from your tutorial and emails - running DiffBind
>on this dataset would basically treat it like two batches (two groups) -
>Is there currently a way to perform a pair-wise analysis to firstly
>compare aaa-p with aaa-t, then bbb-p with bbb-t, and so on, and only then
>draw the statistical analysis? So something like a paired t-test as
>opposed to a two-group t-test?... In case there is currently a way to do
>it with DiffBind I would be glad to learn it from you. In addition, I
>would like to get your opinion on what value should be set for the
>minOverlap parameter? Would you recommend setting it to "=1" to allow
>exploration of peaks that are condition-specific? On the other hand,
>would these "singletons" ever be reported significant, given that
>they are differentially deposited only in one sample out of 10?...
>
>
>Finally,
>When I come to read your first two lines (here again):
>"When counting reads overlapping an interval, DiffBind sets the value to a
>minimum of one to eliminate any issues created by having zero values."
>
>I am still confused - in the scenario of two samples only, each from
>a different condition - when an enriched peak is detected only in one
>condition (where it has for example 456 tags), but it is NOT detected in
>the other condition and has simply zero tags there (so not even one tag!)
>- would this 'standing alone' peak be ignored by DiffBind (for inability
>to derive a statistical calculation)...? Is there currently a way to
>obtain from DiffBind a list of all these condition-specific peaks that do
>not meet even one tag in the corresponding condition? (in our hands we
>sometimes find that there are quite a few cases like this and we would not
>like to ignore them).
>
>I'll be really glad to get your professional input on these crucial
>issues.
>We are trying to decide whether to use DiffBind for our project and these
>aspects should be regarded.
>
>Thank you very much,
>Roy
>
>
>--
>Roy Blum, Ph.D.
>Senior Research Scientist
>Cancer Institute, Smilow Research Building,
>New York University School of Medicine,
>12th Floor, Room 1206
>552 First Ave.
>New York, NY, 10016
>Mob: +1 (646)-716-2875
>Lab: +1 (212)-263-2327
>http://blumroy.googlepages.com
>
>________________________________________
>From: Rory Stark [Rory.Stark at cruk.cam.ac.uk]
>Sent: Friday, February 14, 2014 11:32 AM
>To: Blum, Roy
>Cc: Gordon Brown; bioconductor at r-project.org
>Subject: Re: DiffBind (Please add me to the Dropbox containing the
>vignette data)
>
>Hi Roy-
>
>When counting reads overlapping an interval, DiffBind sets the value to a
>minimum of one to eliminate any issues created by having zero values.
>
>The minOverlap parameter in dba.count includes all peaks that occur in at
>least this many peaksets, regardless of whether they are in replicates or
>different conditions. So in the case where there is only one sample
>for each condition, minOverlap=2 would eliminate peaks that appear in only
>one condition.
>But if you had two replicates of each condition,
>minOverlap=2 would include peaks identified in only one condition so long
>as they were identified in both replicates.
>
>Currently DiffBind merges peaks that overlap by at least 1bp. The ability
>to control that (e.g. 50%) has been a requested feature in the past --
>actually internally, the overlapping code does handle different
>overlapping percentages (including negative values for peaks near to each
>other but not actually overlapping). We will consider adding this feature
>in a future release.
>
>Cheers-
>Rory
>
>On 13/02/2014 22:08, "Blum, Roy" <roy.blum at nyumc.org> wrote:
>
>>Dear Rory,
>>
>>Thanks a lot for your clarifying response!
>>It helps a lot for understanding your pipeline.
>>
>>If I understand correctly - since dba.report calculates fold changes by
>>computing log2 normalized counts in the first condition minus the log2
>>normalized counts in the second condition (across each of the peaks
>>presented by the two conditions - in case that minOverlap was set as
>>"=1") - then even in the case of 'condition-exclusive' peaks (with zero
>>tags in the peak location) we would still get a fold-change value, simply
>>since we'll have a log2-normalized value minus zero, which would be equal
>>to the log2 normalized value. Am I correct on this? This aspect wasn't
>>very clear..
>>
>>In addition, if I understand correctly - using minOverlap=2
>>(for analysis that employs one sample per each condition, across two
>>conditions) would tell DiffBind to ignore all the condition-exclusive
>>peaks and to perform calculations only on the overlapping peaks? Am I
>>correct on this?
>>
>>Finally, how does DiffBind define overlapping peaks? Is there a way to
>>redefine this criterion? (for example based on overlap of 1bp vs. overlap
>>of 50% of each peak span, etc.)
>>
>>Thanks a lot!!
>>Roy
>>
>>--
>>Roy Blum, Ph.D.
>>
>>________________________________________
>>From: Rory Stark [Rory.Stark at cruk.cam.ac.uk]
>>Sent: Thursday, February 13, 2014 3:26 PM
>>To: Blum, Roy
>>Cc: Gordon Brown; bioconductor at r-project.org
>>Subject: Re: Please add me to the Dropbox containing the vignette data
>>
>>Hi Roy-
>>
>>First, I am obliged to discourage you from doing this type of analysis
>>without replicates, for two reasons: 1) it is not good science, as
>>biological and experimental variability is high in these types of
>>experiments, and your samples may not be representative; and 2) because
>>the statistical techniques that DiffBind relies on (embodied in the
>>edgeR, DESeq, and DESeq2 packages) require replication to properly
>>calculate confidence statistics.
>>
>>Technically, DiffBind will handle this comparison. You may want to do
>>some simpler overlaps (dba.plotVenn, dba.overlap) to detect regions
>>identified as enriched in only one condition. If you want to compute fold
>>changes based on read counts, you can call dba.count with minOverlap=1,
>>which will include all the called peaks including those that do not
>>overlap. Then set up a contrast using dba.contrast with one condition as
>>group1 and the other as group2 (you will be warned again about the lack
>>of replication). You can call dba.analyze (again, the underlying method
>>is likely to issue a warning relating to the lack of replication) to do
>>the comparison, then call dba.report with th=1 to get all the fold
>>changes, computed as the log2 normalized counts in the first condition
>>minus the log2 normalized counts in the second condition for each
>>interval.
>>This report will also
>>include confidence statistics that you probably shouldn't take very
>>seriously for the reasons described above.
>>
>>Cheers-
>>Rory
>>
>>On 13/02/2014 19:16, "Blum, Roy" <roy.blum at nyumc.org> wrote:
>>
>>>Dear Gord and Rory,
>>>
>>>I am exploring your DiffBind software and would like to inquire
>>>regarding the following -
>>>
>>>I would refer to a very simple scenario in which DiffBind is loaded with
>>>data of one histone mark tested across two conditions - before and after
>>>treatment (no replicates for any of the conditions).
>>>
>>>Would it be still possible to draw the basic analysis presented in the
>>>tutorial?
>>>
>>>In general - would condition-specific peaks (that do not overlap with a
>>>corresponding peak in the other condition) be still considered as part
>>>of the statistical analysis performed by DiffBind? Or is the statistical
>>>analysis limited only to the 'shared peaks', reporting on affinity
>>>changes only within 'shared' peaks (which are shared between the two
>>>conditions)?
>>>Is there a way that DiffBind can report on all the condition-exclusive
>>>peaks (ones that are deposited only in one condition but have zero
>>>deposition in the other?) - how would the fold change difference be
>>>calculated in such events?
>>>
>>>
>>>Thanks a lot!
>>>Roy
>>>
>>>________________________________________
>>>From: Blum, Roy
>>>Sent: Thursday, February 13, 2014 10:01 AM
>>>To: Gordon Brown
>>>Subject: RE: Please add me to the Dropbox containing the vignette data
>>>
>>>Hi Gord,
>>>
>>>Thanks for your reply and for the wonderful DiffBind tool!
>>>
>>>I've got the link for the data files from Rory by now.
>>>Btw, this is the link:
>>>https://www.dropbox.com/s/bqxnqhvr7sol1za/DiffBindVignette.zip
>>>in case that someone inquires for it in the future.
>>>
>>>Best wishes!
>>>Roy
>>>
>>>________________________________________
>>>From: Gordon Brown [Gordon.Brown at cruk.cam.ac.uk]
>>>Sent: Thursday, February 13, 2014 9:24 AM
>>>To: Blum, Roy
>>>Subject: Re: Please add me to the Dropbox containing the vignette data
>>>
>>>Hi, Roy,
>>>
>>>Sorry for the slow response. As far as I know, the data should be
>>>publicly visible, so I suspect the error was just a transient error.
>>>Can you re-try? (Or maybe Rory has already responded, in which case
>>>ignore this...).
>>>
>>>Cheers,
>>>
>>> - Gord
>>>
>>>
>>>On 2014-02-10 18:11, "Blum, Roy" <roy.blum at nyumc.org> wrote:
>>>
>>>>Dear Gordon,
>>>>
>>>>I am currently interested in learning how to use your DiffBind
>>>>software.
>>>>
>>>>Would you kindly add me to the Dropbox containing the vignette data?
>>>>
>>>>My attempt to execute the command line:
>>>>source(file.path(system.file("extra", package="DiffBind"), "tamoxifen_GEO.R"))
>>>>failed ....
>>>>
>>>>Here's the output which was plotted on my R screen:
>>>>Thanks a lot in advance! (Rory Stark seems to be away..)
>>>>
>>>>Roy Blum
>>>>
>>>>The email address which I use for my Dropbox activity is:
>>>>blumroy at gmail.com (please add this email address as well!, Thanks!)
>>>>
>>>>> source(file.path(system.file("extra", package="DiffBind"), "tamoxifen_GEO.R"))
>>>>Loading required package: Biobase
>>>>Welcome to Bioconductor
>>>>
>>>> Vignettes contain introductory material; view with
>>>>'browseVignettes()'.
>>>>To cite Bioconductor, see 'citation("Biobase")', and for
>>>> packages 'citation("pkgname")'.
>>>>
>>>>Attaching package: 'Biobase'
>>>>
>>>>The following object is masked _by_ '.GlobalEnv':
>>>>
>>>> exprs
>>>>
>>>>Setting options('download.file.method.GEOquery'='auto')
>>>>[1] "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM798nnn/GSM798430/suppl/"
>>>>trying URL
>>>>'ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM798nnn/GSM798430/suppl//GSM798430_SLX-2645.443.s_5_SLX-2577.443.s_8_peaks.txt.gz'
>>>>ftp data connection made, file length 889489 bytes
>>>>opened URL
>>>>downloaded 868 Kb
>>>>
>>>>[1] "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM798nnn/GSM798431/suppl/"
>>>>trying URL
>>>>'ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM798nnn/GSM798431/suppl//GSM798431_SLX-2576.443.s_7_SLX-2577.443.s_8_peaks.txt.gz'
>>>>ftp data connection made, file length 863440 bytes
>>>>opened URL
>>>>downloaded 843 Kb
>>>>
>>>>[1] "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM798nnn/GSM798443/suppl/"
>>>>No supplemental files found
>>>>[1] "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM798nnn/GSM798440/suppl/"
>>>>No supplemental files found
>>>>[1] "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM798nnn/GSM798423/suppl/"
>>>>trying URL
>>>>'ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM798nnn/GSM798423/suppl//GSM798423_SLX-2640.438.s_1_SLX-2574.433.s_2_peaks.txt.gz'
>>>>ftp data connection made, file length 1566858 bytes
>>>>opened URL
>>>>downloaded 1.5 Mb
>>>>
>>>>[1] "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM798nnn/GSM798424/suppl/"
>>>>trying URL
>>>>'ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM798nnn/GSM798424/suppl//GSM798424_SLX-2773.448.s_1_SLX-2574.433.s_2_peaks.txt.gz'
>>>>ftp data connection made, file length 1047867 bytes
>>>>opened URL
>>>>downloaded 1023 Kb
>>>>
>>>>[1] "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM798nnn/GSM798425/suppl/"
>>>>trying URL
>>>>'ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM798nnn/GSM798425/suppl//GSM798425_SLX-2943.469.s_2_SLX-2574.433.s_2_peaks.txt.gz'
>>>>ftp data connection made, file length 1436673 bytes
>>>>opened URL
>>>>downloaded 1.4 Mb
>>>>
>>>>[1] "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM798nnn/GSM798428/suppl/"
>>>>trying URL
>>>>'ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM798nnn/GSM798428/suppl//GSM798428_SLX-2775.448.s_3_T47D_Input_peaks.txt.gz'
>>>>ftp data connection made, file length 621444 bytes
>>>>opened URL
>>>>downloaded 606 Kb
>>>>
>>>>[1] "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM798nnn/GSM798429/suppl/"
>>>>trying URL
>>>>'ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM798nnn/GSM798429/suppl//GSM798429_SLX-2867.466.s_6_T47D_Input_peaks.txt.gz'
>>>>ftp data connection made, file length 508000 bytes
>>>>opened URL
>>>>downloaded 496 Kb
>>>>
>>>>[1] "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM798nnn/GSM798442/suppl/"
>>>>No supplemental files found
>>>>[1] "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM798nnn/GSM798432/suppl/"
>>>>trying URL
>>>>'ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM798nnn/GSM798432/suppl//GSM798432_SLX-3229.521.s_5_SLX-1651.307.s_1_peaks.txt.gz'
>>>>ftp data connection made, file length 1099858 bytes
>>>>opened URL
>>>>downloaded 1.0 Mb
>>>>
>>>>[1] "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM798nnn/GSM798433/suppl/"
>>>>trying URL
>>>>'ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM798nnn/GSM798433/suppl//GSM798433_SLX-3230.526.s_4_SLX-3231.526.s_5_peaks.txt.gz'
>>>>Error in download.file(file.path(url, i), destfile = file.path(storedir, :
>>>> cannot open URL
>>>>'ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM798nnn/GSM798433/suppl//GSM798433_SLX-3230.526.s_4_SLX-3231.526.s_5_peaks.txt.gz'
>>>>
>>>>--
>>>>Roy Blum, Ph.D.
>>>>
>>>>________________________________________
>>>>From: Rory Stark [Rory.Stark at cruk.cam.ac.uk]
>>>>Sent: Monday, February 10, 2014 11:39 AM
>>>>To: Blum, Roy
>>>>Subject: Automatic reply: Please add me to the Dropbox containing the
>>>>vignette data
>>>>
>>>>I am out of the office until 3 January. If it is urgent, please contact
>>>>Matt Eldridge.
>>>>
>>>>------------------------------------------------------------
>>>>This email message, including any attachments, is for the sole use of
>>>>the intended recipient(s) and may contain information that is
>>>>proprietary, confidential, and exempt from disclosure under applicable
>>>>law. Any unauthorized review, use, disclosure, or distribution is
>>>>prohibited. If you have received this email in error please notify the
>>>>sender by return email and delete the original message. Please note,
>>>>the recipient should check this email and any attachments for the
>>>>presence of viruses. The organization accepts no liability for any
>>>>damage caused by any virus transmitted by this email.
>>>>=================================
Tags: glad, edger, diffbind, deseq2