unknown error in DiffBind package

0

Entering edit mode

Guest User ★ 13k

@guest-user-4897

Last seen 9.6 years ago

Hello! I am using the DiffBind package to search for differences in histone modifications of ChIPseq peak data, I created a .csv files using sorted.bam files as both bamReads and Bamcontrol and bed files as Peaks column. I tried to count reads using the dba.count function and changing the value minOverlap from 0 to 4, but every time I obtained: Error in if (res[i] == -1) { : missing value where TRUE/FALSE needed WOuld you help me please? thank you Roberta -- output of sessionInfo(): R version 3.0.2 (2013-09-25) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] DiffBind_1.8.4 GenomicRanges_1.14.4 XVector_0.2.0 [4] IRanges_1.20.7 BiocGenerics_0.8.0 loaded via a namespace (and not attached): [1] amap_0.8-12 bitops_1.0-6 caTools_1.16 edgeR_3.4.2 [5] gdata_2.13.2 gplots_2.12.1 gtools_3.3.1 KernSmooth_2.23-10 [9] limma_3.18.9 RColorBrewer_1.0-5 stats4_3.0.2 tools_3.0.2 [13] zlibbioc_1.8.0 -- Sent via the guest posting facility at bioconductor.org.

ChIPSeq chipseq DiffBind ChIPSeq chipseq DiffBind • 3.9k views

ADD COMMENT • link updated 10.0 years ago by Rory Stark ★ 5.1k • written 10.0 years ago by Guest User ★ 13k

0

Entering edit mode

Rory Stark ★ 5.1k

@rory-stark-5741

Last seen 12 days ago

Cambridge, UK

Hello Roberta- Looking at where this line occurs in the code, it is in a check for the existence of the .bam file (and possibly the associated .bai file). Are the file paths correct? Is the index (.bam/bai) file there as well? If not, I can have a look at what may be happening if I can get access to some of your project data. I'm interesting in seeing your sample sheet (csv file) and the DBA object you got back form dba() before you call dba.count(). You should be able to mail these to me. Cheers- Rory On Mon Apr 14 18:41:52 CEST 2014 Roberta Sanfilippo [guest] guest at bioconductor.org <mailto:bioconductor%40r-project.org?subject=re%3a%20 %5bbioc%5d%20unknown%20error%20in%20diffbind%20package&in-reply-="" to="%3C20140414164152.EE26A147A80%40mamba.fhcrc.org%3E"> wrote: > Hello! > I am using the DiffBind package to search for differences in histone modifications of ChIPseq peak data, I created a .csv files using sorted.bam > files as both bamReads and Bamcontrol and bed files as Peaks column. I tried to count reads using the dba.count function and changing the value > minOverlap from 0 to 4, but every time I obtained: > Error in if (res[i] == -1) { : missing value where TRUE/FALSE needed > WOuld you help me please? > thank you > Roberta [[alternative HTML version deleted]]

ADD COMMENT • link 10.0 years ago Rory Stark ★ 5.1k

0

Entering edit mode

Rory Stark ★ 5.1k

@rory-stark-5741

Last seen 12 days ago

Cambridge, UK

Hi Roberta- If you want to look a little deeper, there are a couple of things you can do. It is always nice to look at the MA plot and see if there appear to be any cbig changes. Next you can get a report of all the sites but setting th=1: > report = dba.report(H3K9Me2_wm4h_input_3, th=1, bCounts=T) >From this, you can see what the top hits are, and have a look in the browser to see if they are actually changing consistently. In general, as you have only two replicates of each condition, you may not have enough power to detect smaller, systematic changes. If the replicates vary a lot, then you would need even more replicates to capture the variance. I would definitely avoid eliminating a replicate that doesn't "match" as any results obtained this way would be meaningless. The final place to look is in the peak calling. You are getting 22K- 100k peaks, have you looked at some of these to verify they are capturing the enrichment you are interested in? If you are particularly interested in centromeric heterochromatin, perhaps avoid peak calling, and use a fixed set of windows in the appropriate genomic regions it may take some experimentation to find the optimal window size. We did something like this in this paper (cf Figure 3): "Independence of Repressive Histone Marks and Chromatin Compaction during Senescent Heterochromatic Layer Formation" http://www.cell.com/molecular-cell/abstract/S1097-2765(12)00505-9 Cheers- Rory From: Roberta Sanfilippo <rsanfilippo@dti.telethon.it<mailto:rsanfilippo@dti.telethon.it>> Date: Wed, 16 Apr 2014 16:08:47 +0000 To: Rory Stark <rory.stark@cruk.cam.ac.uk<mailto:rory.stark@cruk.cam.ac.uk>> Subject: RE: [BioC] unknown error in DiffBind package Hi Rory! apparently the problem has been solved only pressing enter after the last raw(the name of the last bed file) in the csv file. Now I run H3K9Me2_wm4h_input = dba(sampleSheet="~/ChIP_seq_wm4h_DiffBind/H3K9Me2_wm4h_input_1.csv") germfree_wm4h_H3K9Me2_1 H3K9Me2 germfree 1 raw Conv_wm4h_H3K9Me2_1 H3K9Me2 conventional 1 raw germfree_wm4h_H3K9Me2_2 H3K9Me2 germfree 2 raw Conv_wm4h_H3K9Me2_2 H3K9Me2 conventional 2 raw > H3K9Me2_wm4h_input 4 Samples, 22069 sites in matrix (99905 total): ID Factor Condition Replicate Caller Intervals 1 germfree_wm4h_H3K9Me2_1 H3K9Me2 germfree 1 raw 22746 2 Conv_wm4h_H3K9Me2_1 H3K9Me2 conventional 1 raw 23941 3 germfree_wm4h_H3K9Me2_2 H3K9Me2 germfree 2 raw 51065 4 Conv_wm4h_H3K9Me2_2 H3K9Me2 conventional 2 raw 99385 > H3K9Me2_wm4h_input_2 = dba.count(H3K9Me2_wm4h_input, minOverlap=2) > H3K9Me2_wm4h_input_2 4 Samples, 22069 sites in matrix: ID Factor Condition Replicate Caller Intervals SN 1 germfree_wm4h_H3K9Me2_1 H3K9Me2 germfree 1 counts 22069 0.18 2 Conv_wm4h_H3K9Me2_1 H3K9Me2 conventional 1 counts 22069 0.14 3 germfree_wm4h_H3K9Me2_2 H3K9Me2 germfree 2 counts 22069 0.11 4 Conv_wm4h_H3K9Me2_2 H3K9Me2 conventional 2 counts 22069 0.11 > H3K9Me2_wm4h_input_3 = dba.contrast(H3K9Me2_wm4h_input_2, minMembers=2, categories=DBA_CONDITION) > H3K9Me2_wm4h_input_3 = dba.contrast(H3K9Me2_wm4h_input_2, minMembers=2, categories=DBA_CONDITION) > H3K9Me2_wm4h_input_3 4 Samples, 22069 sites in matrix: ID Factor Condition Replicate Caller Intervals SN 1 germfree_wm4h_H3K9Me2_1 H3K9Me2 germfree 1 counts 22069 0.18 2 Conv_wm4h_H3K9Me2_1 H3K9Me2 conventional 1 counts 22069 0.14 3 germfree_wm4h_H3K9Me2_2 H3K9Me2 germfree 2 counts 22069 0.11 4 Conv_wm4h_H3K9Me2_2 H3K9Me2 conventional 2 counts 22069 0.11 1 Contrast: Group1 Members1 Group2 Members2 1 germfree 2 conventional 2 > H3K9Me2_wm4h_input_4 = dba.analyze(H3K9Me2_wm4h_input_3) Warning message: In dba.analyze(H3K9Me2_wm4h_input_3) : No correlation heatmap plotted -- contrast 1 does not have enough differentially bound sites. and the same results came up setting method=DBA_ALL_METHODS I have the feeling that my peaks are not so abundant since I am analyzing a histone modification that is a hallmark of centromeric heterochromatin, so it is not very well dsiributed among all chromosomes and so probably I need to use more "relaxed" parameters to check for differences. Moreover my question is:I have a replica for each condition (germ free and conventional), how can I exclude that one is not so rapresentative and so I am not able to find enough peaks to look for differential binding? I really appreciate your help Cheers, ROberta ________________________________ Da: Rory Stark [Rory.Stark@cruk.cam.ac.uk<mailto:rory.stark@cruk.cam.ac.uk>] Inviato: martedì 15 aprile 2014 18.10 A: Roberta Sanfilippo Oggetto: Re: [BioC] unknown error in DiffBind package Hi Roberta- This one is difficult to debug without having all the bam files here. I;m pretty sure the error is in a function that is checking for the existence of 8 files: [1] "/farm/home/sanfil01/ChIP_seq_wm4h_17.6.13/sortedbamfiles/1rmdupwm 4_GF_filteredIP.sorted.bam" [2] "/farm/home/sanfil01/ChIP_seq_wm4h_17.6.13/sortedbamfiles/1rmdupwm 4conv_filteredIP.sorted.bam" [3] "/farm/home/sanfil01/ChIP_seq_wm4h_26.2.14/sortedbamfiles/2rmdup_w m4_GF_filteredIP.sorted.bam" [4] "/farm/home/sanfil01/ChIP_seq_wm4h_26.2.14/sortedbamfiles/2rmdup_w m4_conv_filteredIP.sorted.bam" [5] "/farm/home/sanfil01/ChIP_seq_wm4h_17.6.13/sortedbamfiles/1rmdup_w m4_GF_filteredinput.sorted.bam" [6] "/farm/home/sanfil01/ChIP_seq_wm4h_17.6.13/sortedbamfiles/1rmdupwm 4_conv_filteredinput.sorted.bam" [7] "/farm/home/sanfil01/ChIP_seq_wm4h_26.2.14/sortedbamfiles/2rmdup_w m4_GF_filteredinput.sorted.bam" [8] "/farm/home/sanfil01/ChIP_seq_wm4h_26.2.14/sortedbamfiles/2rmdup_w m4_conv_filteredinput.sorted.bam" Are you sure all eight of these files are there? Some have underscores and some do not, is the naming correct? The odd thing is that even if these files aren't there, it should report the problem cleanly and not throw an error. Without having these eight files myself, there are only a few things to try: 1. Try running dba.count with bParallel=FALSE. This is a long shot as the parallization occurs after the file check that is failing. 2. Try running without control bams (remove the column form the sample sheet) to see if it has something to do with that? 3. Try running this on a reduced set. You can either make a sample sheet with only two samples, or make a new DBA object: > reduced = dba(H3K9Me2_wm4h_input , mask=1:2) > count s= dba.count(reduced,bParallel=F) -Rory From: Roberta Sanfilippo <rsanfilippo@dti.telethon.it<mailto:rsanfilippo@dti.telethon.it>> Date: Tue, 15 Apr 2014 16:00:09 +0000 To: Rory Stark <rory.stark@cruk.cam.ac.uk<mailto:rory.stark@cruk.cam.ac.uk>> Subject: RE: [BioC] unknown error in DiffBind package Hi Rory, here it is the file, even if apparently the error was related to the csv file, I just press enter after the last bed file and it worked :) ________________________________ Da: Rory Stark [Rory.Stark@cruk.cam.ac.uk<mailto:rory.stark@cruk.cam.ac.uk>] Inviato: martedì 15 aprile 2014 16.37 A: Roberta Sanfilippo Oggetto: Re: [BioC] unknown error in DiffBind package Hi Roberta- Can you send me the DBA object "H3K9Me2_wm4h_input ", preferably saved using dba,.save()? Thanks- Rory From: Roberta Sanfilippo <rsanfilippo@dti.telethon.it<mailto:rsanfilippo@dti.telethon.it>> Date: Tue, 15 Apr 2014 09:28:22 +0000 To: Rory Stark <rory.stark@cruk.cam.ac.uk<mailto:rory.stark@cruk.cam.ac.uk>> Subject: RE: [BioC] unknown error in DiffBind package Hello Rory, I don't know if I am supposed to write you sing this e-mail address, please let me know if i have to use the bioconductor mailing lists. sorry to bore you again but apparently the lacking of .bai files wasn't the solution. As you suggested, I am sending You my csv file (you can find it attached). ANd in the following part you wll find the commands from R: H3K9Me2_wm4h_input = dba(sampleSheet="H3K9Me2_wm4h_input_1.csv") germfree_wm4h_H3K9Me2_1 H3K9Me2 germ free 1 raw Conv_wm4h_H3K9Me2_1 H3K9Me2 conventional 1 raw germfree_wm4h_H3K9Me2_2 H3K9Me2 germ free 2 raw Conv_wm4h_H3K9Me2_2 H3K9Me2 conventional 2 raw Warning message: In read.table(samplesheet, sep = ",", stringsAsFactors = F, header = T) : incomplete final line found by readTableHeader on 'H3K9Me2_wm4h_input_1.csv' > H3K9Me2_wm4h_input 4 Samples, 22069 sites in matrix (99905 total): ID Factor Condition Replicate Caller Intervals 1 germfree_wm4h_H3K9Me2_1 H3K9Me2 germ free 1 raw 22746 2 Conv_wm4h_H3K9Me2_1 H3K9Me2 conventional 1 raw 23941 3 germfree_wm4h_H3K9Me2_2 H3K9Me2 germ free 2 raw 51065 4 Conv_wm4h_H3K9Me2_2 H3K9Me2 conventional 2 raw 99385 plot(H3K9Me2_wm4h_input) > pdf("ChIP_seq_wm4h_DiffBind/H3K9Me2_wm4h_input.pdf") > plot(H3K9Me2_wm4h_input) > dev.off() X11cairo 2 > H3K9Me2_wm4h_input = dba.count(H3K9Me2_wm4h_input, minOverlap=2) Error in if (res[i] == -1) { : missing value where TRUE/FALSE needed > H3K9Me2_wm4h_input = dba.count(H3K9Me2_wm4h_input, minOverlap=1) Error in if (res[i] == -1) { : missing value where TRUE/FALSE needed > H3K9Me2_wm4h_input = dba.count(H3K9Me2_wm4h_input, minOverlap=0) Error in if (res[i] == -1) { : missing value where TRUE/FALSE needed > dba() Error in file(file, "rt") : cannot open the connection In addition: Warning message: In file(file, "rt") : cannot open file 'dba_samples.csv': No such file or directory > dba(H3K9Me2_wm4h_input) 4 Samples, 22069 sites in matrix (99905 total): ID Factor Condition Replicate Caller Intervals 1 germfree_wm4h_H3K9Me2_1 H3K9Me2 germ free 1 raw 22746 2 Conv_wm4h_H3K9Me2_1 H3K9Me2 conventional 1 raw 23941 3 germfree_wm4h_H3K9Me2_2 H3K9Me2 germ free 2 raw 51065 4 Conv_wm4h_H3K9Me2_2 H3K9Me2 conventional 2 raw 99385 > ________________________________ Da: Roberta Sanfilippo Inviato: lunedì 14 aprile 2014 19.26 A: Rory.Stark@cruk.cam.ac.uk<mailto:rory.stark@cruk.cam.ac.uk> Oggetto: RE: [BioC] unknown error in DiffBind package very fast answer, Thanks a lot! no, actually the index bai files are not in the same folder as the BAM ones :) Sorry, beginner's mistakes :) Cheers, Roberta ________________________________ Da: Bori Mifsud [nagybori@gmail.com<mailto:nagybori@gmail.com>] Inviato: lunedì 14 aprile 2014 19.19 A: Roberta Sanfilippo Oggetto: Fwd: [BioC] unknown error in DiffBind package ---------- Forwarded message ---------- From: Rory Stark <rory.stark@cruk.cam.ac.uk<mailto:rory.stark@cruk.cam.ac.uk>> Date: 14 April 2014 19:10 Subject: Re: [BioC] unknown error in DiffBind package To: "bioconductor@r-project.org<mailto:bioconductor@r-project.org>" <bioconductor@r-project.org<mailto:bioconductor@r-project.org>> Cc: Rory Stark <rory.stark@cruk.cam.ac.uk<mailto:rory.stark@cruk.cam.ac.uk>> Hello Roberta- Looking at where this line occurs in the code, it is in a check for the existence of the .bam file (and possibly the associated .bai file). Are the file paths correct? Is the index (.bam/bai) file there as well? If not, I can have a look at what may be happening if I can get access to some of your project data. I'm interesting in seeing your sample sheet (csv file) and the DBA object you got back form dba() before you call dba.count(). You should be able to mail these to me. Cheers- Rory On Mon Apr 14 18:41:52 CEST 2014 Roberta Sanfilippo [guest] guest at bioconductor.org<http: bioconductor.org=""> <mailto:bioconductor%40r- project.org<mailto:bioconductor%2540r-project.org="">?Subject=Re%3A%20%5B BioC%5D%20unknown%20error%20in%20DiffBind%20package&In-Reply-To=%3C201 40414164152.EE26A147A80%40mamba.fhcrc.org<http: 40mamba.fhcrc.org="">%3E > wrote: > Hello! > I am using the DiffBind package to search for differences in histone modifications of ChIPseq peak data, I created a .csv files using sorted.bam > files as both bamReads and Bamcontrol and bed files as Peaks column. I tried to count reads using the dba.count function and changing the value > minOverlap from 0 to 4, but every time I obtained: > Error in if (res[i] == -1) { : missing value where TRUE/FALSE needed > WOuld you help me please? > thank you > Roberta [[alternative HTML version deleted]] _______________________________________________ Bioconductor mailing list Bioconductor@r-project.org<mailto:bioconductor@r-project.org> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor ________________________________ Dona il tuo 5xmille a Telethon per far esistere una cura per le malattie genetiche rare. Inserisci la tua firma e il C.F. 04879781005 nel riquadro della dichiarazione dei redditi Finanziamento della ricerca scientifica e delle università. www.telethon.it ________________________________ Dona il tuo 5xmille a Telethon per far esistere una cura per le malattie genetiche rare. Inserisci la tua firma e il C.F. 04879781005 nel riquadro della dichiarazione dei redditi Finanziamento della ricerca scientifica e delle università. www.telethon.it ________________________________ Dona il tuo 5xmille a Telethon per far esistere una cura per le malattie genetiche rare. Inserisci la tua firma e il C.F. 04879781005 nel riquadro della dichiarazione dei redditi Finanziamento della ricerca scientifica e delle università. www.telethon.it [[alternative HTML version deleted]]

ADD COMMENT • link 10.0 years ago Rory Stark ★ 5.1k

0

Entering edit mode

HI both,

I have the same problem as Roberta. My bam.bai files are in the same folder as my bam alignments.

I ran the following script: library("DiffBind")

read csv of sample sheet into dataframe

samples <-read.csv(ExtremePhenotypeH3K27Ac.csv) samples

Load and read the sample sheet

DBdata <-dba(sampleSheet = "ExtremePhenotypeH3K27Ac.csv")

My sample sheet looks like this:

SampleID    Tissue  Factor  Condition   Treatment   Replicate   bamreads    Control ID  bamControl  Peaks   PeakCaller

1 C0011 PBMCCancer H3K4Me3 Resistant ExtremePhenotype 1 1059D00OLSwanseaCS001H3K27Achsi85trimbowtie2UP(copy 1).bam C0011c 3059S00OLSwanseaInputhsi87trimbowtie2UP.bam CSH3K27Acsummits.bed bed 2 C0012 PBMCCancer H3K4Me4 Resistant ExtremePhenotype 2 1059D00OLSwanseaCS001H3K27Achsi85trimbowtie2UP.bam C0011c 3059S00OLSwanseaInputhsi87trimbowtie2UP (copy 1).bam CSH3K27Acsummits(copy 1).bed bed 3 Con0011 PBMCNormal H3K4Me5 Normal Non-extremePhenotype 1 CENCFF350ISYH3K27Ac.bam Con0011c CENCFF380MFXinp.bam ConH3K27Acsummits.bed bed 4 Con0012 PBMCNormal H3K4Me6 Normal Non-extremePhenotype 2 CENCFF350ISYH3K27Ac(copy 1).bam Con0011c CENCFF380MFXinp(copy 1).bam ConH3K27Acsummits(copy 1).bed bed