DESeq on CCAT identified chipseq peaks

Guest User ★ 13k

I plan to use DESeq downstream of CCAT-identified peaks on 5 tumor and 5 normal samples, and I am unsure how best to create a unified list of peaks and corresponding read counts, since CCAT outputs different peak regions for each sample.

1. To create a unified list of peak regions and their read counts, would you suggest:

A. Taking the union of all the CCAT-called peaks and calculating the read count in each biological replicate, OR

B. Calculating the read count for each peak in each replicate, whether or not the peak was called in that replicate?

I have seen both suggested online and I am not sure which is appropriate.

2. Since this is ChIP-seq and not RNA-seq data, do you agree that using coverageBed (coverageBed -abam $bamfile -b $CCATpeaks > countdata) would work as well as htseq-count?

Thanks!

ChIPSeq chipseq DESeq • 1.7k views

updated 10.6 years ago by Rory Stark ★ 5.2k • written 10.6 years ago by Guest User ★ 13k

Rory Stark ★ 5.2k

Cambridge, UK

You do indeed want to form a consensus peakset from the replicates. How you do this depends on exactly what question you are trying to ask. You can take the union of all peaks and count the reads for each peak in each replicate, or you can use more stringent criteria in determining the consensus peakset, such as peaks that appear in at least 2 (or 3) replicates, or perhaps the union of peaks that appear in a majority of each condition (i.e. peaks identified in at least 3 of 5 tumors OR in at least 3 of 5 normals).

The DiffBind package provides tools to do exactly this, and the user guide/vignette walks through an example in some detail. Besides assembling consensus peaksets, DiffBind will handle the counting (with various options) and differential analysis using edgeR, DESeq, and/or DESeq2, and has convenient tools for reporting and plotting results.

Cheers-
Rory
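The three consensus rules described in this answer can be sketched in base R on a toy presence/absence matrix. This is only an illustration of the selection logic, not how DiffBind implements it: the matrix `called`, the region names, and the sample names are all made up, and the sketch assumes the peaks from every sample have already been merged into a common set of candidate regions.

```r
# Toy presence/absence matrix: rows are candidate regions (assumed to be
# already merged across samples), columns are samples. TRUE means a peak
# was called in that region for that sample. All values are hypothetical.
called <- matrix(
  c(TRUE,  TRUE,  TRUE,  FALSE, FALSE,  FALSE, FALSE, FALSE, FALSE, FALSE,  # region1: 3 of 5 tumors
    TRUE,  FALSE, FALSE, FALSE, FALSE,  TRUE,  FALSE, FALSE, FALSE, FALSE,  # region2: 1 tumor, 1 normal
    FALSE, FALSE, FALSE, FALSE, FALSE,  TRUE,  TRUE,  TRUE,  TRUE,  FALSE,  # region3: 4 of 5 normals
    TRUE,  FALSE, FALSE, FALSE, FALSE,  FALSE, FALSE, FALSE, FALSE, FALSE), # region4: 1 tumor only
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("region", 1:4),
                  c(paste0("T", 1:5), paste0("N", 1:5)))
)
condition <- rep(c("tumor", "normal"), each = 5)

# Rule 1: union of all peaks (called in at least one sample).
union_set <- rownames(called)[rowSums(called) >= 1]

# Rule 2: peaks called in at least 2 samples overall.
atleast2_set <- rownames(called)[rowSums(called) >= 2]

# Rule 3: peaks called in >= 3 of 5 tumors OR in >= 3 of 5 normals.
n_tumor  <- rowSums(called[, condition == "tumor",  drop = FALSE])
n_normal <- rowSums(called[, condition == "normal", drop = FALSE])
majority_set <- rownames(called)[n_tumor >= 3 | n_normal >= 3]
```

On this toy input the union keeps all four regions, the "at least 2" rule drops region4, and the per-condition majority rule keeps only region1 and region3. Whichever rule is chosen, reads are then counted over the same final region list in every sample.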

10.6 years ago by Rory Stark ★ 5.2k

Hi Aditi

on Wed May 14 18:16:36 CEST 2014 Aditi [guest] guest at bioconductor.org wrote:
[...]
>> Thus to create a unified list of peak regions and their read counts would you suggest -
>>
>> A. Taking a union of all the CCAT called peaks and calculating read count in each biological replicate OR
>>
>> B. Calculating the read count for each peak in each replicate whether or not it has been called in the replicate or not
>>
>> I saw both being suggested earlier online and I am not sure which is appropriate.

Both approaches will give statistically valid results. I don't have much experience with ChIP-Seq myself, so I suggest following Rory's advice (and the DiffBind vignette) to get the most inferential power.

>> 2. Since this is chipseq and not rna seq data, do you agree that using coverageBed ( coverageBed -abam $bamfile -b $CCATpeaks > countdata) would work as good as HTseqcount ?

Yes. As the features do not overlap, this should not make a difference. BEDtools might be easier to use here, as you probably have the peaks in BED format anyway. Or you can use Rory's DiffBind package.

Simon

10.6 years ago by Simon Anders ★ 3.8k

Hi Dr. Rory,

Thanks a lot for pointing this out. I wanted to confirm one thing while using DiffBind. If my sample sheet looks like this:

SampleID  Tissue  Factor   Condition  Treatment  Replicate  bamReads  bamControl  Peaks  PeakCaller  PeakFormat  ScoreCol  LowerBetter
1         T       h3k4me3  tumor      none       1          PATH      PATH        PATH   raw         raw         4         FALSE
2         N       h3k4me3  normal     none       1          PATH      PATH        PATH   raw         raw         4         FALSE
3         T       h3k4me3  tumor      none       2          PATH      PATH        PATH   raw         raw         4         FALSE
4         N       h3k4me3  normal     none       2          PATH      PATH        PATH   raw         raw         4         FALSE
5         T       h3k4me3  tumor      none       3          PATH      PATH        PATH   raw         raw         4         FALSE
6         N       h3k4me3  normal     none       3          PATH      PATH        PATH   raw         raw         4         FALSE
7         T       h3k4me4  tumor      none       4          PATH      PATH        PATH   raw         raw         5         FALSE
8         N       h3k4me5  normal     none       4          PATH      PATH        PATH   raw         raw         6         FALSE
9         T       h3k4me6  tumor      none       5          PATH      PATH        PATH   raw         raw         7         FALSE
10        N       h3k4me7  normal     none       5          PATH      PATH        PATH   raw         raw         8         FALSE

then, to create a consensus peakset from the union of peaks that appear in at least 3 of 5 samples of each condition, would the command be:

h3k4me3_peakset = dba.peakset(h3k4me3_readin, consensus = DBA_CONDITION, minOverlap=0.6)

I am not too clear on how to use this command and thus wanted to confirm.

Thanks!
Aditi

10.6 years ago by QAMRA Aditi GIS ▴ 120

Hello Aditi-

It is a bit more complicated to derive a consensus-of-consensus peakset, but it can be done in a few steps. Assuming you've read your data into h3k4me3_readin, you first have to create a new object with the two consensus peaksets (one for each condition):

> h3k4me3_consensus <- dba.peakset(h3k4me3_readin, consensus = DBA_CONDITION, minOverlap=0.6)

If you look at h3k4me3_consensus, it will have two new consensus peaksets added (as sets 11 and 12). Now you want to make the final consensus peakset as the union of these:

> h3k4me3_consensus <- dba.peakset(h3k4me3_consensus, consensus=11:12, minOverlap=1)

Now you can retrieve the final peakset as a GRanges object:

> h3k4me3_peakset <- dba.peakset(h3k4me3_consensus, 13, bRetrieve=TRUE)

And supply it to dba.count for counting:

> h3k4me3_counts <- dba.count(h3k4me3_readin, peaks=h3k4me3_peakset)

Hope this helps!

Cheers-
Rory

10.6 years ago by Rory Stark ★ 5.2k

Hi Dr. Rory,

I understand now. Thank you!

A last question (hopefully): can you explain a little more about how a blocking factor works in the case of matched tumor-normal pairs? Does using DBA_REPLICATE as a blocking factor in such a case adjust for (?) and remove any sort of batch effects between replicates?

Thanks!
Aditi

10.6 years ago by QAMRA Aditi GIS ▴ 120

Hello Aditi-

What you want is to use a "matched" design. There is a good explanation of this design (in a differential expression context) in the edgeR vignette. Basically, the matched tumour-normal pairs are going to have certain similarities to each other as they each come from the same patient. A matched design will model this to detect consistent differences in enrichment between tumor and normal that are independent of individual patients.

You can analyze a matched design by setting up the contrast with block=DBA_REPLICATE:

> h3k4me3_counts = dba.contrast(h3k4me3_counts, categories=DBA_CONDITION, block=DBA_REPLICATE)
> h3k4me3_counts = dba.analyze(h3k4me3_counts, method=DBA_DESEQ2)

You'll see that two analyses are run (unmatched and matched):

> h3k4me3_counts

It is useful to look at the MA plot:

> dba.plotMA(h3k4me3_counts, method=DBA_DESEQ2_BLOCK)

You can get the list of all the sites with statistics relating to how confidently they can be identified as being differentially enriched:

> matchedReport = dba.report(h3k4me3_counts, method=DBA_DESEQ2_BLOCK, th=1)

Cheers-
Rory
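To make the idea of a matched design more concrete, here is a minimal base R sketch of the design matrices behind an unmatched and a matched (blocked) comparison. It illustrates the general statistical idea only and is not a description of DiffBind's internals; the `samples` data frame is hypothetical and simply mirrors a layout of five patients with one tumor and one normal sample each.

```r
# Hypothetical sample table: five patients (Replicate 1..5), each
# contributing one normal and one tumor sample.
samples <- data.frame(
  Replicate = factor(rep(1:5, each = 2)),
  Condition = factor(rep(c("normal", "tumor"), times = 5),
                     levels = c("normal", "tumor"))
)

# Unmatched design: only the condition is modelled, so patient-to-patient
# differences end up in the residual variability.
unmatched <- model.matrix(~ Condition, data = samples)

# Matched (blocked) design: the Replicate terms absorb each patient's
# baseline level, and the Condition coefficient measures the
# tumor-vs-normal difference within patients.
matched <- model.matrix(~ Replicate + Condition, data = samples)

colnames(matched)
```

The matched matrix has one intercept, four patient columns (Replicate2 to Replicate5), and a final Conditiontumor column; that last coefficient is the one tested for differential enrichment. The blocking terms account for consistent per-patient offsets, which is why the design helps with patient-level effects but not with effects that are confounded with condition.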

10.6 years ago by Rory Stark ★ 5.2k

Thank you so much for explaining all of it so well.

I read an answer you just gave about using the summits option in dba.count. What version of DiffBind is that? I am using version 1.8.5 in R version 3.0.2 and I can't use the summits option.

Aditi

10.6 years ago by QAMRA Aditi GIS ▴ 120

The current released version of DiffBind is 1.10, Bioconductor is at 2.14, and it all requires the most recent release of R, 3.1.0 (Spring Dance).

-Rory

10.6 years ago by Rory Stark ★ 5.2k

Hi Rory,

I ran DiffBind using DESeq2 to get a list of differential peaks between tumors and normals (3 biological replicates each). At the same time, I extracted the raw read count matrix from DiffBind and ran DESeq2 independently. However, I get different results. To explain further:

DiffBind:

> h3k4me3_readin <- dba(sampleSheet="h3k4me3.csv") ## Read in datasheet
> h3k4me3_counts <- dba.count(h3k4me3_readin, peaks=h3k4me3_peakset, score=DBA_SCORE_READS)
> h3k4me3_contrast = dba.contrast(h3k4me3_counts, categories=DBA_CONDITION, block=DBA_REPLICATE)
> h3k4me3_deseq2_analysis <- dba.analyze(h3k4me3_contrast, method=DBA_DESEQ2, bReduceObjects=FALSE, bSubControl=FALSE)

On applying an FDR filter of 0.05, I got 1261 DE peaks.

DESeq2:

## CountData - data frame created from the CSV file extracted from h3k4me3_counts
## phenodata - data frame with the group/condition information attached
> dds <- DESeqDataSetFromMatrix(countData = CountData, colData = phenodata, design = ~ Replicate + Condition)
> dds2 <- DESeq(dds)
> res <- results(dds2)

On applying a filter of 0.05 on the adjusted p-value in the 'res' data frame, I got 743 DE peaks.

Could you please explain why I see this difference?

Thanks!
Aditi

10.6 years ago by QAMRA Aditi GIS ▴ 120

Hi Dr. Rory,

I had another question about the creation of consensus peaksets. After creating a consensus peakset each for tumor and normal, I also wanted to create a peakset from tumor sample 1 and the consensus peakset for normals, i.e. sets 1 and 12 in the example you gave earlier. I thought the command for this was:

> h3k4me3_consensus_sample1 <- dba.peakset(h3k4me3_consensus, consensus=c(1,12), minOverlap=1)

But it doesn't seem to be creating any peakset. Is this command correct?

Thanks!
Aditi

10.6 years ago by QAMRA Aditi GIS ▴ 120