I plan to use DESeq downstream of CCAT-identified peaks on 5 tumor and
5 normal samples, and I am unsure how best to create a unified list of
peaks and corresponding read counts.
1. CCAT outputs different peak regions for each sample. To create a
unified list of peak regions and their read counts, would you suggest:
A. Taking the union of all the CCAT-called peaks and calculating the
read count in each biological replicate, OR
B. Calculating the read count for each peak in each replicate, whether
or not the peak was called in that replicate?
I have seen both suggested online and I am not sure which is
appropriate.
2. Since this is ChIP-seq and not RNA-seq data, do you agree that
coverageBed (coverageBed -abam $bamfile -b $CCATpeaks > countdata)
would work as well as htseq-count?
Thanks!
Sent via the guest posting facility at bioconductor.org.
You do indeed want to form a consensus peakset from the replicates.
How you do this depends on exactly what question you are trying to
ask. You can take the union of all peaks and count the reads for each
peak in each replicate, or you can use more stringent criteria in
determining the consensus peakset, such as peaks that appear in at
least 2 (or 3) replicates, or perhaps the union of peaks that appear
in a majority of each condition (i.e. peaks identified in at least 3
of 5 tumors OR in at least 3 of 5 normals).
The DiffBind package provides tools to do exactly this, and the user
guide/vignette walks through an example in some detail. Besides
assembling consensus peaksets, DiffBind will handle the counting (with
various options) and differential analysis using edgeR, DESeq, and/or
DESeq2, and has convenient tools for reporting and plotting results.
Cheers-
Rory
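As a rough illustration of the criteria listed above, the base-R sketch below applies three thresholds to a toy occupancy matrix. The matrix, its values, and all variable names are invented for illustration. It assumes overlapping peaks have already been merged into candidate regions, and it does not use DiffBind.

```r
# Toy occupancy matrix: rows are candidate regions (merged overlapping peaks),
# columns are samples; TRUE means a peak was called in that sample.
# All values are made up for illustration.
occupancy <- matrix(
  c(TRUE,  TRUE,  TRUE,  FALSE,
    TRUE,  TRUE,  FALSE, FALSE,
    FALSE, TRUE,  TRUE,  TRUE,
    FALSE, FALSE, FALSE, TRUE),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("region", 1:4), paste0("sample", 1:4))
)

# Number of samples in which each region was called
n_called <- rowSums(occupancy)

# Three alternative consensus criteria, from most to least permissive
union_set  <- names(n_called)[n_called >= 1]  # called in any sample
at_least_2 <- names(n_called)[n_called >= 2]  # called in 2 or more samples
at_least_3 <- names(n_called)[n_called >= 3]  # called in 3 or more samples
```

The stricter the threshold, the fewer regions survive, so the choice trades sensitivity to sample-specific peaks against robustness to spurious calls.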
Hi Aditi
on Wed May 14 18:16:36 CEST 2014 Aditi [guest] guest at
bioconductor.org wrote:
[...]
>> Thus to create a unified list of peak regions and their read counts
>> would you suggest -
>>
>> A. Taking a union of all the CCAT called peaks and calculating read
>> count in each biological replicate OR
>>
>> B. Calculating the read count for each peak in each replicate
>> whether or not it has been called in the replicate or not
>>
>> I saw both being suggested earlier online and I am not sure which
>> is appropriate.
Both approaches will give statistically valid results. I don't have
much experience with ChIP-Seq myself, so I suggest following Rory's
advice (and the DiffBind vignette) to get the most inferential power.
>> 2. Since this is chipseq and not rna seq data, do you agree that
>> using coverageBed ( coverageBed -abam $bamfile -b $CCATpeaks >
>> countdata) would work as good as HTseqcount ?
Yes. As the features do not overlap, this should not make a
difference. BEDtools might be easier to use here, as you probably have
the peaks in BED format anyway. Or you can use Rory's DiffBind
package.
Simon
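To make the point about non-overlapping features concrete, here is a minimal base-R sketch of per-peak counting. It is a simplification under stated assumptions: each read is reduced to a single position, the peaks are sorted and do not overlap, and all coordinates are invented. It is not how coverageBed or htseq-count are implemented.

```r
# Hypothetical, sorted, non-overlapping peaks on one chromosome
peaks <- data.frame(start = c(100, 500, 900), end = c(200, 650, 1000))

# Hypothetical read positions (one position per read, for simplicity)
reads <- c(50, 120, 150, 199, 510, 640, 700, 950, 1001)

# For each read, index of the last peak whose start is <= the read position
# (0 if the read lies before the first peak)
idx <- findInterval(reads, peaks$start)

# A read is inside a peak only if it also lies at or before that peak's end
inside <- idx > 0 & reads <= peaks$end[pmax(idx, 1)]

# Count reads per peak; because peaks do not overlap, each read is
# assigned to at most one peak and no ambiguity rule is needed
counts <- tabulate(idx[inside], nbins = nrow(peaks))
```

Because no read can fall in two peaks, the overlap-resolution modes that distinguish counting tools on gene annotations never come into play here.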
Hi Dr. Rory,
Thanks a lot for pointing this out.
I wanted to confirm one thing while using DiffBind. If my sample
sheet looks like this:
SampleID  Tissue  Factor   Condition  Treatment  Replicate  bamReads  bamControl  Peaks  PeakCaller  PeakFormat  ScoreCol  LowerBetter
1         T       h3k4me3  tumor      none       1          PATH      PATH        PATH   raw         raw         4         FALSE
2         N       h3k4me3  normal     none       1          PATH      PATH        PATH   raw         raw         4         FALSE
3         T       h3k4me3  tumor      none       2          PATH      PATH        PATH   raw         raw         4         FALSE
4         N       h3k4me3  normal     none       2          PATH      PATH        PATH   raw         raw         4         FALSE
5         T       h3k4me3  tumor      none       3          PATH      PATH        PATH   raw         raw         4         FALSE
6         N       h3k4me3  normal     none       3          PATH      PATH        PATH   raw         raw         4         FALSE
7         T       h3k4me4  tumor      none       4          PATH      PATH        PATH   raw         raw         5         FALSE
8         N       h3k4me5  normal     none       4          PATH      PATH        PATH   raw         raw         6         FALSE
9         T       h3k4me6  tumor      none       5          PATH      PATH        PATH   raw         raw         7         FALSE
10        N       h3k4me7  normal     none       5          PATH      PATH        PATH   raw         raw         8         FALSE
Then, to create a consensus peakset from the union of peaks that
appear in at least 3 of 5 samples of each condition, would the command
be:
h3k4me3_peakset = dba.peakset(h3k4me3_readin, consensus=DBA_CONDITION, minOverlap=0.6)
I am not too clear on how to use this command and thus wanted to
confirm.
Thanks !
Aditi
-----Original Message-----
From: bioconductor-bounces@r-project.org [mailto:bioconductor-
bounces@r-project.org] On Behalf Of Rory Stark
Sent: Thursday, May 15, 2014 12:59 AM
To: bioconductor@r-project.org
Subject: Re: [BioC] DESeq on CCAT identified chipseq peaks
_______________________________________________
Bioconductor mailing list
Bioconductor@r-project.org<mailto:bioconductor@r-project.org>
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
-------------------------------
This e-mail and any attachments are only for the use of the intended
recipient and may be confidential and/or privileged. If you are not
the recipient, please delete it or notify the sender immediately.
Please do not copy or use it for any purpose or disclose the contents
to any other person as it may be an offence under the Official Secrets
Act.
-------------------------------
Hello Aditi-
It is a bit more complicated to derive a consensus-of-consensus
peakset, but it can be done in a few steps. Assuming you've read your
data into h3k4me3_readin, you first have to create a new object with
the two consensus peaksets (one for each condition):
> h3k4me3_consensus <- dba.peakset(h3k4me3_readin, consensus=DBA_CONDITION, minOverlap=0.6)
If you look at h3k4me3_consensus, it will have two new consensus
peaksets added (as sets 11 and 12). Now you want to make the final
consensus peakset as the union of these:
> h3k4me3_consensus <- dba.peakset(h3k4me3_consensus, consensus=11:12, minOverlap=1)
Now you can retrieve the final peakset as a GRanges object:
> h3k4me3_peakset <- dba.peakset(h3k4me3_consensus, 13, bRetrieve=TRUE)
And supply it to dba.count for counting:
> h3k4me3_counts <- dba.count(h3k4me3_readin, peaks=h3k4me3_peakset)
Hope this helps!
Cheers-
Rory
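The two-stage logic behind these steps can be sketched in base R on a toy occupancy matrix. Everything here is hypothetical: the regions, the calls, and the helper `calls()` are invented, and the sketch only mimics the idea (a per-condition consensus at 3 of 5, then a union) rather than reproducing DiffBind's behaviour. Which of sets 11 and 12 holds tumor and which holds normal is an assumption.

```r
# Ten samples in sample-sheet order: odd columns tumor, even columns normal
condition <- rep(c("tumor", "normal"), times = 5)

# Hypothetical helper: build one row of calls from the replicate numbers
# (1..5) in which a region was called for each condition
calls <- function(tumor, normal) {
  x <- logical(10)
  x[2 * tumor - 1] <- TRUE  # tumor samples sit in columns 1, 3, 5, 7, 9
  x[2 * normal] <- TRUE     # normal samples sit in columns 2, 4, 6, 8, 10
  x
}

occupancy <- rbind(
  A = calls(1:4, integer(0)),  # 4 tumors, 0 normals
  B = calls(1, 1:3),           # 1 tumor, 3 normals
  C = calls(1:2, 1:2),         # 2 tumors, 2 normals
  D = calls(1:5, 1:5),         # all samples
  E = calls(integer(0), 1)     # 1 normal only
)

# Stage 1: per-condition consensus at 3 of 5 (the 0.6 proportion);
# these play the role of the two added peaksets (assumed 11 and 12)
tumor_consensus  <- rowSums(occupancy[, condition == "tumor"])  >= 3
normal_consensus <- rowSums(occupancy[, condition == "normal"]) >= 3

# Stage 2: union of the two consensus sets (the role of peakset 13)
final <- tumor_consensus | normal_consensus
final_regions <- names(final)[final]

# For contrast: a flat "at least 3 of all 10 samples" rule
flat_regions <- names(which(rowSums(occupancy) >= 3))
```

Region C is called in four samples overall but never reaches a majority within either condition, so the two-stage rule drops it while the flat rule keeps it.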
Hi Dr. Rory,
I understand now. Thank you !
A last question (hopefully): can you explain a little more about how a
blocking factor works in the case of matched tumor-normal pairs? Does
using DBA_REPLICATE as a blocking factor in such a case adjust for,
and remove, any batch effects between replicates?
Thanks !
Aditi
Hello Aditi-
What you want is to use a "matched" design. There is a good
explanation of this design (in a differential expression context) in
the edgeR vignette.
Basically, the matched tumour-normal pairs are going to have certain
similarities to each other as they each come from the same patient. A
matched design will model this to detect consistent differences in
enrichment between tumour and normal that are independent of
individual patients.
You can analyze a matched design by setting up the contrast with
block=DBA_REPLICATE:
> h3k4me3_counts = dba.contrast(h3k4me3_counts, categories=DBA_CONDITION, block=DBA_REPLICATE)
> h3k4me3_counts = dba.analyze(h3k4me3_counts, method=DBA_DESEQ2)
You'll see that two analyses are run (unmatched and matched):
> h3k4me3_counts
It is useful to look at the MA plot:
> dba.plotMA(h3k4me3_counts, method=DBA_DESEQ2_BLOCK)
You can get the list of all the sites with statistics relating to how
confidently they can be identified as being differentially enriched:
> matchedReport = dba.report(h3k4me3_counts, method=DBA_DESEQ2_BLOCK, th=1)
Cheers-
Rory
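What "matched" means in model terms can be seen by comparing two design matrices in base R. This is only a sketch: the data frame is hypothetical, and it assumes the blocked analysis is equivalent in spirit to a design of the form ~ Replicate + Condition. DiffBind's internals are not shown in this thread.

```r
# Hypothetical sample table: 5 patients, each with a tumor and a normal sample
samples <- data.frame(
  Condition = factor(rep(c("tumor", "normal"), times = 5),
                     levels = c("normal", "tumor")),
  Replicate = factor(rep(1:5, each = 2))  # patient / pair identifier
)

# Unmatched design: one intercept plus the tumor-vs-normal effect
unmatched <- model.matrix(~ Condition, data = samples)

# Matched design: a separate baseline per patient plus the shared
# tumor-vs-normal effect
matched <- model.matrix(~ Replicate + Condition, data = samples)

colnames(unmatched)
colnames(matched)

# Residual degrees of freedom left for estimating variability
df_unmatched <- nrow(samples) - ncol(unmatched)
df_matched   <- nrow(samples) - ncol(matched)
```

The matched design spends four extra coefficients on patient-specific baselines, so patient-to-patient differences are absorbed by those terms instead of inflating the noise around the tumor-versus-normal effect.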
Thank you so much for explaining all of it so well.
I read an answer you just gave about using the summits option in
dba.count. Which version of DiffBind is that? I am using version
1.8.5 with R version 3.0.2 and I can't use the summits option.
Aditi
The current released version of DiffBind is 1.10, Bioconductor is at
2.14, and it all requires the most recent release of R, 3.1.0 (Spring
Dance).
-Rory
Hi Rory,
I ran DiffBind using DESeq2 to get a list of differential peaks
between tumors and normals (3 biological replicates each). At the same
time, I extracted the raw read count matrix from DiffBind and ran
DESeq2 independently. However, I get different results. To explain
further:
DiffBind:
> h3k4me3_readin <- dba(sampleSheet="h3k4me3.csv")  ## Read in datasheet
> h3k4me3_counts <- dba.count(h3k4me3_readin, peaks=h3k4me3_peakset, score=DBA_SCORE_READS)
> h3k4me3_contrast = dba.contrast(h3k4me3_counts, categories=DBA_CONDITION, block=DBA_REPLICATE)
> h3k4me3_deseq2_analysis <- dba.analyze(h3k4me3_contrast, method=DBA_DESEQ2, bReduceObjects=FALSE, bSubControl=FALSE)
On applying a filter of 0.05 FDR, I got 1261 DE peaks.
DESeq2:
## CountData - data frame created from the CSV file extracted from h3k4me3_counts
## PhenoData - data frame with the group/condition information attached
> dds <- DESeqDataSetFromMatrix(countData = CountData, colData = PhenoData, design = ~ Replicate + Condition)
> dds2 <- DESeq(dds)
> res <- results(dds2)
On applying a filter of 0.05 on the adjusted p-value in the 'res' data
frame, I got 743 DE peaks.
Could you please explain why I see this difference ?
Thanks !
Aditi
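The thread does not record an answer to this question. One thing that can be checked on the standalone side, independently of DiffBind, is how the Replicate column is encoded, since R treats a numeric column in a model formula as a continuous covariate and a factor as a set of per-level terms. The base-R sketch below uses a hypothetical PhenoData table to show the difference; it is offered as a sanity check, not as the explanation for the discrepancy.

```r
# Hypothetical phenotype table for 3 matched pairs; a Replicate column
# read from a CSV file would typically arrive as integer
PhenoData <- data.frame(
  Condition = rep(c("tumor", "normal"), times = 3),
  Replicate = rep(1:3, each = 2)
)

# Replicate as a number: fitted as a single slope, not as a pairing
numeric_design <- model.matrix(~ Replicate + Condition, data = PhenoData)

# Replicate as a factor: one term per pair, which is what a blocked
# (matched) design needs
PhenoData$Replicate <- factor(PhenoData$Replicate)
factor_design <- model.matrix(~ Replicate + Condition, data = PhenoData)

colnames(numeric_design)
colnames(factor_design)
```

If the two column sets differ in a real analysis, converting Replicate (and Condition) to factors before building the dataset makes the standalone model a blocked design in the sense discussed earlier in the thread.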
Hi Dr. Rory,
I had another question about the creation of consensus peaksets.
After creating a consensus peakset each for tumor and normal, I also
wanted to create a peakset from tumor sample 1 and the consensus
peakset for normals, i.e. sets 1 and 12 in your earlier example.
I thought the command for this was:
> h3k4me3_consensus_sample1 <- dba.peakset(h3k4me3_consensus, consensus=c(1,12), minOverlap=1)
But it doesn't seem to create any peakset. Is this command correct?
Thanks !
Aditi