DiffBind: error with dba.count
@anitha-sundararajan-6152
Hi,

I have been trying to use DiffBind to analyze our ChIP-seq data and keep running into the same error.

I first created a samplesheet.csv describing my samples. It looks like this:

SampleID,Tissue,Factor,Condition,Replicate,bamReads,bamControl,Peaks,PeakCaller
meio.1,meiocytes,H3K4me3,N,1,M_meiocytes_H3K4me3.bam,InM_input_meiocytes.bam,meio.vs.in.rep1.def_peaks.bed,MACS
seed.1,seedlings,H3K4me3,N,1,S_seedling_H3K4me3.bam,InS_input_seedling.bam,seed.vs.in.rep1.def_peaks.bed,MACS

I only have two samples (and their respective inputs), with one replicate each. Peaks were called with MACS v2, and the .bed files it generated were used in DiffBind.

After setting the working directory in R, I read in the sample sheet:

> H3K4.B73=dba(sampleSheet='samplesheet2.csv',peakFormat='bed')
> H3K4.B73
2 Samples, 38870 sites in matrix (45304 total):
      ID    Tissue  Factor Condition Replicate Peak.caller Intervals
1 meio.1 meiocytes H3K4me3         N         1        MACS     44124
2 seed.1 seedlings H3K4me3         N         1        MACS     41596

and generated a plot:

> plot(H3K4.B73)

But when I try to run dba.count, it fails every time. I searched previous threads for similar posts and could not find a solution. I tried the following commands:

> H3K4.B73=dba.count(H3K4.B73, minOverlap=3)
> H3K4.B73=dba.count(H3K4.B73, minOverlap=3, bLowMem=TRUE)
> H3K4.B73=dba.count(H3K4.B73, minOverlap=3, bLowMem=FALSE)

All three failed with the same error:

Error in read.table(fn, skip = skipnum) : no lines available in input

Please let me know if you have any insights. Thanks so much in advance for your help.

Anitha Sundararajan Ph.D.
Research Scientist
National Center for Genome Resources
Santa Fe, NM 87505
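For reference, the two-row sample sheet above can be rebuilt and sanity-checked in base R before calling dba(). This is a minimal sketch: the column names and file names are copied from the post, while writing to a temporary file is an assumption made only so the example runs anywhere. Counting the rows also gives the largest minOverlap value that can possibly match a site.

```r
# Rebuild the sample sheet from the post (one row per ChIP sample).
samples <- data.frame(
  SampleID   = c("meio.1", "seed.1"),
  Tissue     = c("meiocytes", "seedlings"),
  Factor     = c("H3K4me3", "H3K4me3"),
  Condition  = c("N", "N"),
  Replicate  = c(1L, 1L),
  bamReads   = c("M_meiocytes_H3K4me3.bam", "S_seedling_H3K4me3.bam"),
  bamControl = c("InM_input_meiocytes.bam", "InS_input_seedling.bam"),
  Peaks      = c("meio.vs.in.rep1.def_peaks.bed", "seed.vs.in.rep1.def_peaks.bed"),
  PeakCaller = c("MACS", "MACS"),
  stringsAsFactors = FALSE
)

# Write it out (to a temp file here, so the sketch runs anywhere) and read it back
# to confirm the header and row count before handing the file to dba().
sheet_path <- file.path(tempdir(), "samplesheet.csv")
write.csv(samples, sheet_path, row.names = FALSE)
check <- read.csv(sheet_path, stringsAsFactors = FALSE)

# A site can be present in at most nrow(check) samples, so any integer
# minOverlap larger than this cannot select anything.
max_min_overlap <- nrow(check)
max_min_overlap
```

With only two samples, max_min_overlap is 2, which is why minOverlap=3 leaves nothing to count.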
Gord Brown
@gord-brown-5664
United Kingdom
Hi, Anitha,

The basic problem is that you have two samples, but you're asking for a minOverlap of 3 (i.e., only peaks that occur in at least 3 samples). No location can satisfy that criterion, so you end up with an empty set of peaks.

The message is obscure, I will admit. (It happens because DiffBind writes out the unified set of peaks and reads it back in, for tedious implementation reasons. When it reads the file back in, there are no peaks, hence "no lines available in input".)

Try using minOverlap=2. But... having said that, I'm not sure how useful DiffBind will be to you without replicates.

Cheers,

- Gord Brown

>Date: Fri, 13 Sep 2013 12:21:02 -0600
>From: Anitha Sundararajan <asundara at ncgr.org>
>To: bioconductor at r-project.org
>Subject: [BioC] DiffBind -error with dba.counts
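The failure mode described above can be reproduced with a toy example in base R. This is a hedged sketch and not DiffBind's code: the presence matrix and the consensus_sites() helper are hypothetical. It shows that a threshold above the number of samples selects no sites, and that reading an empty file with read.table() produces the same error text seen in the thread.

```r
# Toy presence/absence matrix: rows are merged peak sites, columns are samples.
# (Hypothetical data; DiffBind builds its own merged peak set internally.)
presence <- matrix(
  c(TRUE,  TRUE,
    TRUE,  FALSE,
    FALSE, TRUE),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("site", 1:3), c("meio.1", "seed.1"))
)

# Hypothetical helper: keep sites found in at least `min_overlap` samples.
consensus_sites <- function(presence, min_overlap) {
  rownames(presence)[rowSums(presence) >= min_overlap]
}

consensus_sites(presence, 2)  # "site1"
consensus_sites(presence, 3)  # character(0): impossible with only 2 samples

# Writing an empty peak set to disk and reading it back gives the same message.
tmp <- tempfile(fileext = ".txt")
writeLines(character(0), tmp)
msg <- tryCatch(read.table(tmp), error = function(e) conditionMessage(e))
print(msg)  # "no lines available in input"
```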
Hi Gordon,

I am now trying to run both replicates for each sample, despite their low correlation. When I try

>B73.H3K4=dba.count(B73.H3K4, minOverlap=3)

the R session just freezes and there is no response for hours. I am not sure whether anything is wrong with my input files; the sample sheet is read in without any errors.

Just FYI, my BED file (from MACS2) looks like this:

chr1 9128 9552 MACS_peak_1 105.25
chr1 9918 10127 MACS_peak_2 4.72
chr1 79482 79691 MACS_peak_3 5.10
chr1 86963 87514 MACS_peak_4 50.23
chr1 94579 94781 MACS_peak_5 5.10
chr1 103763 103997 MACS_peak_6 5.10
chr1 110722 111047 MACS_peak_7 97.69
chr1 144929 145568 MACS_peak_8 127.78
chr1 161344 162320 MACS_peak_9 136.89
chr1 222479 223058 MACS_peak_10 77.67
chr1 227130 227628 MACS_peak_11 17.02
chr1 263835 263971 MACS_peak_12 12.60
chr1 264068 264518 MACS_peak_13 58.01
chr1 264625 265056 MACS_peak_14 68.16
chr1 270509 271086 MACS_peak_15 47.15
chr1 277629 277789 MACS_peak_16 13.25

Could this be the problem?

Thanks so much.

Anitha

(In reply to Gordon Brown, 9/16/13 3:51 AM.)
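One quick way to rule out the peak file is to parse a few of its lines in base R and confirm that the intervals are well formed. This is a minimal sketch using the first lines shown above. The column names are an assumption based on the five-column layout (chromosome, start, end, name, score). Gord's later reply says the peak format doesn't matter here, so a clean result is expected.

```r
# First lines of the MACS2 peak file from the post.
bed_text <- "chr1 9128 9552 MACS_peak_1 105.25
chr1 9918 10127 MACS_peak_2 4.72
chr1 79482 79691 MACS_peak_3 5.10
chr1 86963 87514 MACS_peak_4 50.23"

# Assumed column names for a 5-column BED file.
peaks <- read.table(text = bed_text,
                    col.names = c("chrom", "start", "end", "name", "score"),
                    stringsAsFactors = FALSE)

# Basic sanity checks: five columns, positive-width intervals, numeric scores.
stopifnot(ncol(peaks) == 5, all(peaks$end > peaks$start), is.numeric(peaks$score))

# Peak widths and per-chromosome counts.
widths <- peaks$end - peaks$start
summary(widths)
table(peaks$chrom)
```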
Sorry, I did actually use minOverlap=2. I didn't correct it when I wrote the email, my bad.

(In reply to Anitha Sundararajan, 9/16/13 1:59 PM.)
Hi, Anitha,

What version of Bioconductor/DiffBind are you running, and how much memory does your computer have? Older versions of DiffBind use a *lot* of memory in the counting stage. If your computer is short on RAM, it could easily run out of memory and start swapping to disk, which slows it down by orders of magnitude. Does everything else on the machine slow down as well?

Can you send the output of the sessionInfo() command?

If possible, also upgrade to the latest version of DiffBind (if you're not there already) and try the bLowMem option of dba.count.

Other than that, I can't think of any reason it should take hours, unless you have *really* big data files. Roughly how many reads are in them?

- Gord

(In reply to Anitha Sundararajan, 2013-09-16 21:21.)
Hi Gordon,

Please see the session info below:

> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] DiffBind_1.6.2 Biobase_2.20.1 GenomicRanges_1.12.5 IRanges_1.18.3 BiocGenerics_0.6.0 BiocInstaller_1.10.3

loaded via a namespace (and not attached):
[1] amap_0.8-7 edgeR_3.2.4 gdata_2.13.2 gplots_2.11.3 gtools_3.0.0 limma_3.16.7 RColorBrewer_1.0-5 stats4_3.0.1
[9] tools_3.0.1 zlibbioc_1.6.0

My samples have anywhere from 30 to 55 million reads. And yes, everything else on the machine slows down quite a bit.

I am now running R locally, as we do not have R 3.0.1 installed on the command line. Not sure if that matters.

Thanks for all your help.

Anitha

(In reply to Gordon Brown, 9/17/13 3:05 AM.)
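With 30 to 55 million reads per file, a back-of-envelope calculation shows how quickly memory adds up when several files are held in RAM at once. This is a hedged sketch. The 16 bytes per read is an assumption chosen only for illustration and is not a figure from DiffBind. The helper estimate_gib() and the choice of four concurrent files are likewise hypothetical.

```r
# Hypothetical helper: rough memory needed to hold reads in RAM, in GiB.
# ASSUMPTION: bytes_per_read is illustrative only; DiffBind's actual
# in-memory representation of reads is not described in this thread.
estimate_gib <- function(n_reads, bytes_per_read, n_concurrent = 1) {
  n_reads * bytes_per_read * n_concurrent / 1024^3
}

# 55 million reads at an assumed 16 bytes/read (e.g. four 4-byte integers),
# counted one file at a time versus four files at once.
one_at_a_time <- estimate_gib(55e6, 16, n_concurrent = 1)
four_parallel <- estimate_gib(55e6, 16, n_concurrent = 4)
round(c(one_at_a_time = one_at_a_time, four_parallel = four_parallel), 2)
# roughly 0.82 and 3.28 GiB
```

In this toy model, memory grows linearly with both read count and the number of files processed concurrently. Counting one file at a time corresponds to n_concurrent = 1.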
Hi, Anitha,

It's almost certainly running out of memory, then. If your reads are in BAM format, you can try the bLowMem option on dba.count. This reduces memory usage significantly at some cost in performance, though in your case it should speed things up dramatically. (The format of the peaks doesn't matter, but the reads must be sorted, indexed BAM files.) From the dba.count documentation:

"bLowMem: logical indicating that the low-memory options should be used for counting (using 'summarizeOverlaps'). This option is slower but memory use does not increase with the number of reads to count. If 'TRUE', all read files must be BAM (.bam extension), with associated index files (.bam.bai extension). 'insertLength' must be absent."

Also try bParallel=FALSE. By default, dba.count runs as many parallel counting threads as there are processors in your computer. Setting bParallel=FALSE ensures that it runs only one at a time, and hence uses much less memory.

Hope this helps. We plan for the next release to remove the requirement that reads be in BAM format for the bLowMem option.

Cheers,

- Gord

(In reply to Anitha Sundararajan, 2013-09-17 18:01.)
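Because bLowMem=TRUE requires every read file to be a .bam with a matching .bam.bai index, a quick pre-flight check over the sample sheet can catch problems before a long counting run. This is a minimal base-R sketch. The helper bam_files_missing_index() is hypothetical, and it only checks file names and existence. It does not verify that the BAM files are actually sorted.

```r
# Hypothetical pre-flight check for dba.count(..., bLowMem=TRUE): every read
# file listed in the sample sheet should end in .bam and have a .bam.bai index.
bam_files_missing_index <- function(sheet) {
  cols  <- intersect(c("bamReads", "bamControl"), names(sheet))
  # as.character guards against factor columns from older read.csv defaults.
  files <- unique(unlist(lapply(sheet[cols], as.character), use.names = FALSE))
  files <- files[!is.na(files) & nzchar(files)]
  is_bam <- grepl("\\.bam$", files)
  list(
    not_bam     = files[!is_bam],
    missing_bai = files[is_bam & !file.exists(paste0(files, ".bai"))]
  )
}

# Example using file names from this thread's sample sheet (relative paths,
# so they are resolved against the current working directory).
sheet <- read.csv(text = paste(
  "SampleID,bamReads,bamControl",
  "meio.1,M_meiocytes_H3K4me3.bam,InM_input_meiocytes.bam",
  "seed.1,S_seedling_H3K4me3.bam,InS_input_seedling.bam",
  sep = "\n"), stringsAsFactors = FALSE)
str(bam_files_missing_index(sheet))
```

If both lists come back empty, the counting call Gord suggests would look something like dba.count(H3K4.B73, minOverlap=2, bLowMem=TRUE, bParallel=FALSE). Check the argument names against the dba.count help page for your installed DiffBind version, because they may change between releases.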