Hi Roberta-
If you want to look a little deeper, there are a couple of things you
can do. It is always nice to look at the MA plot and see if there
appear to be any cbig changes. Next you can get a report of all the
sites but setting th=1:
> report = dba.report(H3K9Me2_wm4h_input_3, th=1, bCounts=T)
>From this, you can see what the top hits are, and have a look in the
browser to see if they are actually changing consistently.
In general, as you have only two replicates of each condition, you may
not have enough power to detect smaller, systematic changes. If the
replicates vary a lot, then you would need even more replicates to
capture the variance. I would definitely avoid eliminating a replicate
that doesn't "match" as any results obtained this way would be
meaningless.
The final place to look is in the peak calling. You are getting 22K-
100k peaks, have you looked at some of these to verify they are
capturing the enrichment you are interested in? If you are
particularly interested in centromeric heterochromatin, perhaps avoid
peak calling, and use a fixed set of windows in the appropriate
genomic regions it may take some experimentation to find the optimal
window size. We did something like this in this paper (cf Figure 3):
"Independence of Repressive Histone Marks and Chromatin Compaction
during Senescent Heterochromatic Layer Formation"
http://www.cell.com/molecular-cell/abstract/S1097-2765(12)00505-9
Cheers-
Rory
From: Roberta Sanfilippo
<rsanfilippo@dti.telethon.it<mailto:rsanfilippo@dti.telethon.it>>
Date: Wed, 16 Apr 2014 16:08:47 +0000
To: Rory Stark
<rory.stark@cruk.cam.ac.uk<mailto:rory.stark@cruk.cam.ac.uk>>
Subject: RE: [BioC] unknown error in DiffBind package
Hi Rory! apparently the problem has been solved only pressing enter
after the last raw(the name of the last bed file) in the csv file.
Now I run
H3K9Me2_wm4h_input =
dba(sampleSheet="~/ChIP_seq_wm4h_DiffBind/H3K9Me2_wm4h_input_1.csv")
germfree_wm4h_H3K9Me2_1 H3K9Me2 germfree 1 raw
Conv_wm4h_H3K9Me2_1 H3K9Me2 conventional 1 raw
germfree_wm4h_H3K9Me2_2 H3K9Me2 germfree 2 raw
Conv_wm4h_H3K9Me2_2 H3K9Me2 conventional 2 raw
> H3K9Me2_wm4h_input
4 Samples, 22069 sites in matrix (99905 total):
ID Factor Condition Replicate Caller
Intervals
1 germfree_wm4h_H3K9Me2_1 H3K9Me2 germfree 1 raw
22746
2 Conv_wm4h_H3K9Me2_1 H3K9Me2 conventional 1 raw
23941
3 germfree_wm4h_H3K9Me2_2 H3K9Me2 germfree 2 raw
51065
4 Conv_wm4h_H3K9Me2_2 H3K9Me2 conventional 2 raw
99385
> H3K9Me2_wm4h_input_2 = dba.count(H3K9Me2_wm4h_input, minOverlap=2)
> H3K9Me2_wm4h_input_2
4 Samples, 22069 sites in matrix:
ID Factor Condition Replicate Caller
Intervals SN
1 germfree_wm4h_H3K9Me2_1 H3K9Me2 germfree 1 counts
22069 0.18
2 Conv_wm4h_H3K9Me2_1 H3K9Me2 conventional 1 counts
22069 0.14
3 germfree_wm4h_H3K9Me2_2 H3K9Me2 germfree 2 counts
22069 0.11
4 Conv_wm4h_H3K9Me2_2 H3K9Me2 conventional 2 counts
22069 0.11
> H3K9Me2_wm4h_input_3 = dba.contrast(H3K9Me2_wm4h_input_2,
minMembers=2, categories=DBA_CONDITION)
> H3K9Me2_wm4h_input_3 = dba.contrast(H3K9Me2_wm4h_input_2,
minMembers=2, categories=DBA_CONDITION)
> H3K9Me2_wm4h_input_3
4 Samples, 22069 sites in matrix:
ID Factor Condition Replicate Caller
Intervals SN
1 germfree_wm4h_H3K9Me2_1 H3K9Me2 germfree 1 counts
22069 0.18
2 Conv_wm4h_H3K9Me2_1 H3K9Me2 conventional 1 counts
22069 0.14
3 germfree_wm4h_H3K9Me2_2 H3K9Me2 germfree 2 counts
22069 0.11
4 Conv_wm4h_H3K9Me2_2 H3K9Me2 conventional 2 counts
22069 0.11
1 Contrast:
Group1 Members1 Group2 Members2
1 germfree 2 conventional 2
> H3K9Me2_wm4h_input_4 = dba.analyze(H3K9Me2_wm4h_input_3)
Warning message:
In dba.analyze(H3K9Me2_wm4h_input_3) :
No correlation heatmap plotted -- contrast 1 does not have enough
differentially bound sites.
and the same results came up setting method=DBA_ALL_METHODS
I have the feeling that my peaks are not so abundant since I am
analyzing a histone modification that is a hallmark of centromeric
heterochromatin, so it is not very well dsiributed among all
chromosomes and so probably I need to use more "relaxed" parameters to
check for differences.
Moreover my question is:I have a replica for each condition (germ free
and conventional), how can I exclude that one is not so rapresentative
and so I am not able to find enough peaks to look for differential
binding?
I really appreciate your help
Cheers,
ROberta
________________________________
Da: Rory Stark
[Rory.Stark@cruk.cam.ac.uk<mailto:rory.stark@cruk.cam.ac.uk>]
Inviato: martedì 15 aprile 2014 18.10
A: Roberta Sanfilippo
Oggetto: Re: [BioC] unknown error in DiffBind package
Hi Roberta-
This one is difficult to debug without having all the bam files here.
I;m pretty sure the error is in a function that is checking for the
existence of 8 files:
[1] "/farm/home/sanfil01/ChIP_seq_wm4h_17.6.13/sortedbamfiles/1rmdupwm
4_GF_filteredIP.sorted.bam"
[2] "/farm/home/sanfil01/ChIP_seq_wm4h_17.6.13/sortedbamfiles/1rmdupwm
4conv_filteredIP.sorted.bam"
[3] "/farm/home/sanfil01/ChIP_seq_wm4h_26.2.14/sortedbamfiles/2rmdup_w
m4_GF_filteredIP.sorted.bam"
[4] "/farm/home/sanfil01/ChIP_seq_wm4h_26.2.14/sortedbamfiles/2rmdup_w
m4_conv_filteredIP.sorted.bam"
[5] "/farm/home/sanfil01/ChIP_seq_wm4h_17.6.13/sortedbamfiles/1rmdup_w
m4_GF_filteredinput.sorted.bam"
[6] "/farm/home/sanfil01/ChIP_seq_wm4h_17.6.13/sortedbamfiles/1rmdupwm
4_conv_filteredinput.sorted.bam"
[7] "/farm/home/sanfil01/ChIP_seq_wm4h_26.2.14/sortedbamfiles/2rmdup_w
m4_GF_filteredinput.sorted.bam"
[8] "/farm/home/sanfil01/ChIP_seq_wm4h_26.2.14/sortedbamfiles/2rmdup_w
m4_conv_filteredinput.sorted.bam"
Are you sure all eight of these files are there? Some have underscores
and some do not, is the naming correct?
The odd thing is that even if these files aren't there, it should
report the problem cleanly and not throw an error.
Without having these eight files myself, there are only a few things
to try:
1. Try running dba.count with bParallel=FALSE. This is a long shot
as the parallization occurs after the file check that is failing.
2. Try running without control bams (remove the column form the
sample sheet) to see if it has something to do with that?
3. Try running this on a reduced set. You can either make a sample
sheet with only two samples, or make a new DBA object:
> reduced = dba(H3K9Me2_wm4h_input , mask=1:2)
> count s= dba.count(reduced,bParallel=F)
-Rory
From: Roberta Sanfilippo
<rsanfilippo@dti.telethon.it<mailto:rsanfilippo@dti.telethon.it>>
Date: Tue, 15 Apr 2014 16:00:09 +0000
To: Rory Stark
<rory.stark@cruk.cam.ac.uk<mailto:rory.stark@cruk.cam.ac.uk>>
Subject: RE: [BioC] unknown error in DiffBind package
Hi Rory,
here it is the file, even if apparently the error was related to the
csv file, I just press enter after the last bed file and it worked :)
________________________________
Da: Rory Stark
[Rory.Stark@cruk.cam.ac.uk<mailto:rory.stark@cruk.cam.ac.uk>]
Inviato: martedì 15 aprile 2014 16.37
A: Roberta Sanfilippo
Oggetto: Re: [BioC] unknown error in DiffBind package
Hi Roberta-
Can you send me the DBA object "H3K9Me2_wm4h_input ", preferably saved
using dba,.save()?
Thanks-
Rory
From: Roberta Sanfilippo
<rsanfilippo@dti.telethon.it<mailto:rsanfilippo@dti.telethon.it>>
Date: Tue, 15 Apr 2014 09:28:22 +0000
To: Rory Stark
<rory.stark@cruk.cam.ac.uk<mailto:rory.stark@cruk.cam.ac.uk>>
Subject: RE: [BioC] unknown error in DiffBind package
Hello Rory,
I don't know if I am supposed to write you sing this e-mail address,
please let me know if i have to use the bioconductor mailing lists.
sorry to bore you again but apparently the lacking of .bai files
wasn't the solution.
As you suggested, I am sending You my csv file (you can find it
attached).
ANd in the following part you wll find the commands from R:
H3K9Me2_wm4h_input = dba(sampleSheet="H3K9Me2_wm4h_input_1.csv")
germfree_wm4h_H3K9Me2_1 H3K9Me2 germ free 1 raw
Conv_wm4h_H3K9Me2_1 H3K9Me2 conventional 1 raw
germfree_wm4h_H3K9Me2_2 H3K9Me2 germ free 2 raw
Conv_wm4h_H3K9Me2_2 H3K9Me2 conventional 2 raw
Warning message:
In read.table(samplesheet, sep = ",", stringsAsFactors = F, header =
T) :
incomplete final line found by readTableHeader on
'H3K9Me2_wm4h_input_1.csv'
> H3K9Me2_wm4h_input
4 Samples, 22069 sites in matrix (99905 total):
ID Factor Condition Replicate Caller
Intervals
1 germfree_wm4h_H3K9Me2_1 H3K9Me2 germ free 1 raw
22746
2 Conv_wm4h_H3K9Me2_1 H3K9Me2 conventional 1 raw
23941
3 germfree_wm4h_H3K9Me2_2 H3K9Me2 germ free 2 raw
51065
4 Conv_wm4h_H3K9Me2_2 H3K9Me2 conventional 2 raw
99385
plot(H3K9Me2_wm4h_input)
> pdf("ChIP_seq_wm4h_DiffBind/H3K9Me2_wm4h_input.pdf")
> plot(H3K9Me2_wm4h_input)
> dev.off()
X11cairo
2
> H3K9Me2_wm4h_input = dba.count(H3K9Me2_wm4h_input, minOverlap=2)
Error in if (res[i] == -1) { : missing value where TRUE/FALSE needed
> H3K9Me2_wm4h_input = dba.count(H3K9Me2_wm4h_input, minOverlap=1)
Error in if (res[i] == -1) { : missing value where TRUE/FALSE needed
> H3K9Me2_wm4h_input = dba.count(H3K9Me2_wm4h_input, minOverlap=0)
Error in if (res[i] == -1) { : missing value where TRUE/FALSE needed
> dba()
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file 'dba_samples.csv': No such file or directory
> dba(H3K9Me2_wm4h_input)
4 Samples, 22069 sites in matrix (99905 total):
ID Factor Condition Replicate Caller
Intervals
1 germfree_wm4h_H3K9Me2_1 H3K9Me2 germ free 1 raw
22746
2 Conv_wm4h_H3K9Me2_1 H3K9Me2 conventional 1 raw
23941
3 germfree_wm4h_H3K9Me2_2 H3K9Me2 germ free 2 raw
51065
4 Conv_wm4h_H3K9Me2_2 H3K9Me2 conventional 2 raw
99385
>
________________________________
Da: Roberta Sanfilippo
Inviato: lunedì 14 aprile 2014 19.26
A: Rory.Stark@cruk.cam.ac.uk<mailto:rory.stark@cruk.cam.ac.uk>
Oggetto: RE: [BioC] unknown error in DiffBind package
very fast answer, Thanks a lot! no, actually the index bai files are
not in the same folder as the BAM ones :)
Sorry, beginner's mistakes :)
Cheers,
Roberta
________________________________
Da: Bori Mifsud [nagybori@gmail.com<mailto:nagybori@gmail.com>]
Inviato: lunedì 14 aprile 2014 19.19
A: Roberta Sanfilippo
Oggetto: Fwd: [BioC] unknown error in DiffBind package
---------- Forwarded message ----------
From: Rory Stark
<rory.stark@cruk.cam.ac.uk<mailto:rory.stark@cruk.cam.ac.uk>>
Date: 14 April 2014 19:10
Subject: Re: [BioC] unknown error in DiffBind package
To: "bioconductor@r-project.org<mailto:bioconductor@r-project.org>"
<bioconductor@r-project.org<mailto:bioconductor@r-project.org>>
Cc: Rory Stark
<rory.stark@cruk.cam.ac.uk<mailto:rory.stark@cruk.cam.ac.uk>>
Hello Roberta-
Looking at where this line occurs in the code, it is in a check for
the existence of the .bam file (and possibly the associated .bai
file). Are the file paths correct? Is the index (.bam/bai) file there
as well?
If not, I can have a look at what may be happening if I can get access
to some of your project data. I'm interesting in seeing your sample
sheet (csv file) and the DBA object you got back form dba() before
you call dba.count(). You should be able to mail these to me.
Cheers-
Rory
On Mon Apr 14 18:41:52 CEST 2014 Roberta Sanfilippo [guest] guest at
bioconductor.org<http: bioconductor.org=""> <mailto:bioconductor%40r- project.org<mailto:bioconductor%2540r-project.org="">?Subject=Re%3A%20%5B
BioC%5D%20unknown%20error%20in%20DiffBind%20package&In-Reply-To=%3C201
40414164152.EE26A147A80%40mamba.fhcrc.org<http: 40mamba.fhcrc.org="">%3E
> wrote:
> Hello!
> I am using the DiffBind package to search for differences in histone
modifications of ChIPseq peak data, I created a .csv files using
sorted.bam > files as both bamReads and Bamcontrol and bed files as
Peaks column. I tried to count reads using the dba.count function and
changing the value > minOverlap from 0 to 4, but every time I
obtained:
> Error in if (res[i] == -1) { : missing value where TRUE/FALSE needed
> WOuld you help me please?
> thank you
> Roberta
[[alternative HTML version deleted]]
_______________________________________________
Bioconductor mailing list
Bioconductor@r-project.org<mailto:bioconductor@r-project.org>
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
________________________________
Dona il tuo 5xmille a Telethon per far esistere una cura per le
malattie genetiche rare.
Inserisci la tua firma e il C.F. 04879781005 nel riquadro della
dichiarazione dei redditi Finanziamento della ricerca scientifica e
delle università.
www.telethon.it
________________________________
Dona il tuo 5xmille a Telethon per far esistere una cura per le
malattie genetiche rare.
Inserisci la tua firma e il C.F. 04879781005 nel riquadro della
dichiarazione dei redditi Finanziamento della ricerca scientifica e
delle università.
www.telethon.it
________________________________
Dona il tuo 5xmille a Telethon per far esistere una cura per le
malattie genetiche rare.
Inserisci la tua firma e il C.F. 04879781005 nel riquadro della
dichiarazione dei redditi Finanziamento della ricerca scientifica e
delle università.
www.telethon.it
[[alternative HTML version deleted]]
HI both,
I have the same problem as Roberta. My bam.bai files are in the same folder as my bam alignments.
I ran the following script: library("DiffBind")
read csv of sample sheet into dataframe
samples <-read.csv(ExtremePhenotypeH3K27Ac.csv) samples
Load and read the sample sheet
DBdata <-dba(sampleSheet = "ExtremePhenotypeH3K27Ac.csv")
My sample sheet looks like this:
1 C0011 PBMCCancer H3K4Me3 Resistant ExtremePhenotype 1 1059D00OLSwanseaCS001H3K27Achsi85trimbowtie2UP(copy 1).bam C0011c 3059S00OLSwanseaInputhsi87trimbowtie2UP.bam CSH3K27Acsummits.bed bed 2 C0012 PBMCCancer H3K4Me4 Resistant ExtremePhenotype 2 1059D00OLSwanseaCS001H3K27Achsi85trimbowtie2UP.bam C0011c 3059S00OLSwanseaInputhsi87trimbowtie2UP (copy 1).bam CSH3K27Acsummits(copy 1).bed bed 3 Con0011 PBMCNormal H3K4Me5 Normal Non-extremePhenotype 1 CENCFF350ISYH3K27Ac.bam Con0011c CENCFF380MFXinp.bam ConH3K27Acsummits.bed bed 4 Con0012 PBMCNormal H3K4Me6 Normal Non-extremePhenotype 2 CENCFF350ISYH3K27Ac(copy 1).bam Con0011c CENCFF380MFXinp(copy 1).bam ConH3K27Acsummits(copy 1).bed bed
I hope you don't mind me appending this.
Best wishes,
Jason