Hello Ant?nio-
Regarding your sample sheet, you should be able to use the same sample
ID and the same read files for each sample, and specify the peak
caller as MACS and QuEST as in your initial sample sheet. This should
load two sets of peak for each sample, differing only peak caller name
(as DiffBind doesn't recognize the MACS or QuEST strings, it will
default to raw, which is the format you are using). Does this work for
you? I've attached an example sample sheet using the sample tamoxifen
data included with DiffBind.
The next issue is deriving "a set of common peaks for the peak callers
". You can do this using all the peaks from all the callers for all
the replicates but invoking dba.count directly (by default, any peak
that is identified at least twice is included; these could come form
two peak callers for a single replicate, or from single peak callers
for different replicates). You may want to perform an intermediate
step of deriving consensus peaksets for each replicate. In the
development version of DiffBind, this is easy done with the call:
data = dba.peakset(data, consensus = -DBA_CALLER)
Which will add a consensus peakset for each sample (consisting of peak
identified in both peak callers for that replicate). I have added a
feature to dba.count to make it easy to use only these peaksets in
deriving the master consensus peakset used for counting:
data = dba.count(data, peaks=data$masks$Consensus)
But this won't be in the build for a few more days. In the meantime
you have to retrieve the peakset and pass it to dba.count:
consensus = dba(data,data$masks$Consensus) # make new DBA object with
only the consensus peaksets
conspeaks = dba.peakset(consensus,bRetrieve=T) # retrieve master
consensus peakset
data = dba.count(data,peaks=conspeaks) # pass in master consensus
peakset for counting
As Gord said in another response, the reads must be in BED or BAM
format (possible zipped); you can easily convert SAM files to BAM
files.
Hope this helps!
Cheers-
Rory
On 14/09/2012 11:00, "bioconductor-request at r-project.org<mailto :bioconductor-request="" at="" r-project.org="">" <bioconductor-request at="" r-project.org<mailto:bioconductor-request="" at="" r-project.org="">> wrote:
------------------------------
Message: 10
Date: Thu, 13 Sep 2012 18:06:05 +0200
From: Ant?nio Miguel de Jesus Domingues
<amjdomingues@gmail.com<mailto:amjdomingues@gmail.com>>
To: bioconductor at r-project.org<mailto:bioconductor at="" r-project.org="">
Subject: [BioC] DiffBind - sample sheet for multiple replicates and
peak callers
Message-ID:
<capacvobu89nekmnu-2jojczybqwdhd=ut=b5o4hl10k+axxgnq at="" mail.gmail.com<mailto:capacvobu89nekmnu-="" 2jojczybqwdhd="ut=b5O4hL10k+axXgNQ" at="" mail.gmail.com="">>
Content-Type: text/plain
Hi all,
I am trying to use DiffBind to compare peaks called in control vs
condition. I have 2 replicates for each and I've also called peaks
using 2
different peak callers (to wi, MACS and QuEST). I've also prepared a
sample
data sheet that looks like this:
SampleID Tissue Factor Condition Replicate Peak.caller
bamReads
bamControl Peaks
control Hela TF wt 1
MACS path path path
control Hela TF wt 1
QuEST path path path
control2 Hela TF wt 2
MACS path path path
control 2 Hela TF wt 2
QuEST path path path
(and the same for the conditions)
My plan was to load all the data and then using diffbind selecte a set
of
common peaks for the peak callers before proceeding with the analysis.
However, when I load the data (data =
dba(sampleSheet="samplesheet.csv"))
the peaks for each caller are not recognized as a different variable.
How
I can do that and is this silly?
I could also derive a set of common peaks independently but it would
be
neat to do it all with the same package and that seems to be possible
but I
could not find how to do it in the documentation.
Thanks,
Ant?nio
--
--
Ant?nio Miguel de Jesus Domingues, PhD
Neugebauer group
Max Planck Institute of Molecular Cell Biology and Genetics, Dresden
Pfotenhauerstrasse 108
01307 Dresden
Germany
e-mail: domingue at mpi-cbg.de<mailto:domingue at="" mpi-cbg.de="">
tel. +49 351 210 2481
The Unbearable Lightness of Molecular Biology
[[alternative HTML version deleted]]
------------------------------
------------------------------
Message: 20
Date: Fri, 14 Sep 2012 11:23:26 +0200
From: Paolo Kunderfranco
<paolo.kunderfranco@gmail.com<mailto:paolo.kunderfranco@gmail.com>>
To: bioconductor at r-project.org<mailto:bioconductor at="" r-project.org="">
Subject: Re: [BioC] DiffBind - sample sheet for multiple replicates
and peak
Message-ID:
<cagxwfc-bywww9mjmr7nadcofnlj=zym3sssbc2mynydhe5+haq at="" mail.gmail.com<mailto:cagxwfc-="" bywww9mjmr7nadcofnlj="ZyM3sssbc2mYNyDhE5+haQ" at="" mail.gmail.com="">>
Content-Type: text/plain; charset=ISO-8859-1
Dear Antonio
I think that this problem was resolved in previous messagges, have a
look to:
https://stat.ethz.ch/pipermail/bioconductor/2012-August/047351.html?
For the complete code you could browse August mailing list.
SampleID should be unique for each sample, and moreover also bam file
file should be unique, you should make a copy of all your bamReads and
bamControl
In my example, I compared 2 different peak callers in 3 cell lines:
SampleID,Tissue,Factor,Condition,Replicate,bamReads,bamControl,Peaks
mES_H3K27me3_m,ES,H3K27,mES_H3K27me3,1,reads/H3K27me3/ES_H3K27me3_m.be
d,reads/H3K27me3/ES_input_m.bed,peaks/H3K27me3_ES_M.bed
CMp_H3K27me3_m,CMN,H3K27,CMp_H3K27me3,1,reads/H3K27me3/CMN_H3K27me3_m.
bed,reads/H3K27me3/CMN_input_m.bed,peaks/H3K27me3_CMN_M.bed
CMa_H3K27me3_m,CMA,H3K27,CMa_H3K27me3,1,reads/H3K27me3/CMA_H3K27me3_m.
bed,reads/H3K27me3/CMA_input_m.bed,peaks/H3K27me3_CMA_M.bed
mES_H3K27me3_s,ES,H3K27,mES_H3K27me3,2,reads/H3K27me3/ES_H3K27me3_s.be
d,reads/H3K27me3/ES_input_s.bed,peaks/H3K27me3_ES_S.bed
CMp_H3K27me3_s,CMN,H3K27,CMp_H3K27me3,2,reads/H3K27me3/CMN_H3K27me3_s.
bed,reads/H3K27me3/CMN_input_s.bed,peaks/H3K27me3_CMN_S.bed
CMa_H3K27me3_s,CMA,H3K27,CMa_H3K27me3,2,reads/H3K27me3/CMA_H3K27me3_s.
bed,reads/H3K27me3/CMA_input_s.bed,peaks/H3K27me3_CMA_S.bed
Like this it should work,
Cheers,
Paolo
------------------------------
Message: 21
Date: Fri, 14 Sep 2012 11:58:09 +0200
From: Ant?nio Miguel de Jesus Domingues
<amjdomingues@gmail.com<mailto:amjdomingues@gmail.com>>
To: bioconductor at r-project.org<mailto:bioconductor at="" r-project.org="">
Subject: [BioC] DiffBind - error in dba.count
Message-ID:
<capacvoc+9hgaxzatgj7j3guktemaummn6sjhoxm1b4ygwvr4rq at="" mail.g="" mail.com<mailto:capacvoc+9hgaxzatgj7j3guktemaummn6sjhoxm1b4ygwvr4rq="" at="" mail.gmail.com="">>
Content-Type: text/plain
Hi again,
I am trying DiffBind and loaded my data that looks like this:
H3K4m3
4 Samples, 13203 sites in matrix (13792 total):
ID Tissue Factor Condition Peak.caller Replicate Intervals
1 wt1 Hela H3K4me3 control1 raw 1 14111
2 wt2 Hela H3K4me3 control2 raw 2 13771
3 treat1 Hela H3K4me3 condition1 raw 1 14865
4 treat2 Hela H3K4me3 condition2 raw 2 13393
But I ran into problems trying to calculate the affinity scores with
dba.count:
H3K4m3 = dba.count(H3K4m3)
Error in cond$counts : $ operator is invalid for atomic vectors
In addition: Warning message:
In mclapply(arglist, fn, ..., mc.preschedule = FALSE) :
6 function calls resulted in an error
The peaks are in bed files (chr, start, end, score) and the reads are
in
SAM format.
Can anyone help me with this?
Cheers.
Ant?nio
sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] C
attached base packages:
[1] parallel stats graphics grDevices utils datasets
methods
[8] base
other attached packages:
[1] DiffBind_1.0.9 Biobase_2.14.0
loaded via a namespace (and not attached):
[1] IRanges_1.12.6 RColorBrewer_1.0-5 amap_0.8-7
edgeR_2.4.6
[5] gdata_2.11.0 gplots_2.11.0 gtools_2.7.0
limma_3.10.3
[9] zlibbioc_1.0.1
On 13 September 2012 18:06, Ant?nio Miguel de Jesus Domingues <
amjdomingues at gmail.com<mailto:amjdomingues at="" gmail.com="">> wrote:
Hi all,
I am trying to use DiffBind to compare peaks called in control vs
condition. I have 2 replicates for each and I've also called peaks
using 2
different peak callers (to wi, MACS and QuEST). I've also prepared a
sample
data sheet that looks like this:
SampleID Tissue Factor Condition Replicate Peak.caller
bamReads
bamControl Peaks
control Hela TF wt 1
MACS path path path
control Hela TF wt 1
QuEST path path path
control2 Hela TF wt 2
MACS path path path
control 2 Hela TF wt 2
QuEST path path path
(and the same for the conditions)
My plan was to load all the data and then using diffbind selecte a set
of
common peaks for the peak callers before proceeding with the analysis.
However, when I load the data (data =
dba(sampleSheet="samplesheet.csv"))
the peaks for each caller are not recognized as a different variable.
How
I can do that and is this silly?
I could also derive a set of common peaks independently but it would
be
neat to do it all with the same package and that seems to be possible
but I
could not find how to do it in the documentation.
Thanks,
Ant?nio
--
--
Ant?nio Miguel de Jesus Domingues, PhD
Neugebauer group
Max Planck Institute of Molecular Cell Biology and Genetics, Dresden
Pfotenhauerstrasse 108
01307 Dresden
Germany
e-mail: domingue at mpi-cbg.de<mailto:domingue at="" mpi-cbg.de="">
tel. +49 351 210 2481
The Unbearable Lightness of Molecular Biology
--
--
Ant?nio Miguel de Jesus Domingues, PhD
Neugebauer group
Max Planck Institute of Molecular Cell Biology and Genetics, Dresden
Pfotenhauerstrasse 108
01307 Dresden
Germany
e-mail: domingue at mpi-cbg.de<mailto:domingue at="" mpi-cbg.de="">
tel. +49 351 210 2481
The Unbearable Lightness of Molecular Biology
[[alternative HTML version deleted]]
------------------------------
_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org<mailto:bioconductor at="" r-project.org="">
https://stat.ethz.ch/mailman/listinfo/bioconductor
End of Bioconductor Digest, Vol 115, Issue 14
*********************************************
NOTICE AND DISCLAIMER
This e-mail (including any attachments) is intended for the above-
named person(s). If you are not the intended recipient, notify the
sender immediately, delete this email from your system and do not
disclose or use for any purpose.
We may monitor all incoming and outgoing emails in line with current
legislation. We have taken steps to ensure that this email and
attachments are free from any virus, but it remains your
responsibility to ensure that viruses do not adversely affect you.
Cancer Research UK
Registered charity in England and Wales (1089464), Scotland (SC041666)
and the Isle of Man (1103)
A company limited by guarantee. Registered company in England and
Wales (4325234) and the Isle of Man (5713F).
Registered Office Address: Angel Building, 407 St John Street, London
EC1V 4AD.