Entering edit mode
kentanaka@chiba-u.jp
▴
30
@kentanakachiba-ujp-5955
Last seen 10.2 years ago
Dear Dr. Brown,
I asked the questions earlier regarding the DiffBind last week.
I have versioned up to R3.0.1 and Bioconductor 2.12, and I checked to
see how DiffBind 1.6.2 works. But the dba.count() still crashes in the
bed data and I attached the logs of the bed data below.
Since the versioned up DiffBind 1.6.2 output the GEO DATA as an
Alignment Data Sample, I downloaded the data and I checked to see what
that is. And I found that the BWA bam files were created.
I converted the fastq data on Th2 immune cells to bam files and I
checked to see how it works. And finally, I could obtain the
dba.report
().
Thank you very much for all your advice and I really appreciate for
your
kindness.
My Best Regards,
Ken Tanaka
----------------------------------------------------------------------
--
------------------------
> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-suse-linux-gnu (64-bit)
locale:
[1] C
attached base packages:
[1] parallel stats graphics grDevices utils datasets
methods
[8] base
other attached packages:
[1] DiffBind_1.6.2 Biobase_2.20.0 GenomicRanges_1.12.4
[4] IRanges_1.18.1 BiocGenerics_0.6.0
loaded via a namespace (and not attached):
[1] RColorBrewer_1.0-5 amap_0.8-7 edgeR_3.2.3
gdata_2.12.
0.2
[5] gplots_2.11.0.1 gtools_2.7.1 limma_3.16.5
stats4_3.0.
1
[9] zlibbioc_1.6.0
> th2 = dba(sampleSheet="th2diffbind.csv")
GATA3_Ab GATA3_Ab Th2 Resistant Full_Media 1 macs
H3K4me3 H3K4me3 Th2 Responsive Full_Media 1 macs
H3K9Ac H3K9Ac Th2 Responsive Full_Media 1 macs
> th2
3 Samples, 16806 sites in matrix (35676 total):
ID Tissue Factor Condition Treatment Replicate Peak.caller
1 GATA3_Ab GATA3_Ab Th2 Resistant Full_Media 1 macs
2 H3K4me3 H3K4me3 Th2 Responsive Full_Media 1 macs
3 H3K9Ac H3K9Ac Th2 Responsive Full_Media 1 macs
Intervals
1 464
2 25875
3 32569
> th2 = dba.count(th2,minOverlap=3, bParallel=F)
Sample: databed/Th2_GATA3_Ab.bed.gz
*** caught segfault ***
address (nil), cause 'memory not mapped'
Traceback:
1: .Call("croi_load_reads", as.character(bamfile), as.integer(
insertLength))
2: pv.getCounts(job, bed, insertLength, bWithoutDupes =
bWithoutDupes,
bLowMem, yieldSize, mode, singleEnd, scanbamparam)
3: pv.listadd(results, pv.getCounts(job, bed, insertLength,
bWithoutDupes = bWithoutDupes, bLowMem, yieldSize, mode,
singleEnd,
scanbamparam))
4: pv.counts(DBA, peaks = peaks, minOverlap = minOverlap,
defaultScore
= score, bLog = bLog, insertLength = insertLength, bOnlyCounts =
T,
bCalledMasks = bCalledMasks, minMaxval = filter, bParallel =
bParallel, bUseLast = bUseLast, bWithoutDupes = bRemoveDuplicates,
bScaleControl = bScaleControl, filterFun = filterFun, bLowMem =
bLowMem)
5: dba.count(th2, minOverlap = 3, bParallel = F)
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:
----------------------------------------------------------------------
--
------------------------
# cat th2diffbind.csv
SampleID,Tissue,Factor,Condition,Treatment,Replicate,bamReads,bamContr
ol,
ControlID,Peaks,PeakCaller,PeakFormat
GATA3_Ab,GATA3_Ab,Th2,Resistant,Full_Media,1,databed/Th2_GATA3_Ab.bed.
gz,
databed/Th2_WCE.bed.gz,Th2_WCE_Control,peaks/GATA3_Ab_peaks.xls,macs,
macs
H3K4me3,H3K4me3,Th2,Responsive,Full_Media,1,databed/Th2_WT_anti_H3K4me
3.
bed.gz,databed/Th2_WT_input.bed.gz,Th2_WT_Control,peaks/H3K4me3_peaks.
xls,macs,macs
H3K9Ac,H3K9Ac,Th2,Responsive,Full_Media,1,databed/Th2_WT_anti_H3K9Ac.b
ed.
gz,databed/Th2_WT_input.bed.gz,Th2_WT_Control,peaks/H3K9Ac_peaks.xls,
macs,macs
#zmore Th2_GATA3_Ab.bed.gz |head
------> Th2_GATA3_Ab.bed.gz <------
chrY 28285 28404 Th2_GATA3_Ab 1
chrY 28363 28482 Th2_GATA3_Ab 1
chrY 28466 28585 Th2_GATA3_Ab 1
chrY 30469 30588 Th2_GATA3_Ab 1
chrY 99937 100056 Th2_GATA3_Ab 1
chrY 114597 114716 Th2_GATA3_Ab 1
chrY 269098 269217 Th2_GATA3_Ab 1
chrY 269773 269892 Th2_GATA3_Ab 1
chrY 410005 410124 Th2_GATA3_Ab 1
# head -30 GATA3_Ab_peaks.xls
# This file is generated by MACS version 1.4.2 20120305
# ARGUMENTS LIST:
# name = GATA3_Ab
# format = BED
# ChIP-seq file = ../databed/Tst_GATA3_Ab.bed
# control file = ../databed/Tst_WCE.bed
# effective genome size = 1.87e+09
# band width = 300
# model fold = 10,30
# pvalue cutoff = 1.00e-05
# Large dataset will be scaled towards smaller dataset.
# Range for calculating regional lambda is: 1000 bps and 10000 bps
# tag size is determined as 119 bps
# total tags in treatment: 353649
# tags after filtering in treatment: 353649
# maximum duplicate tags at the same position in treatment = 1
# Redundant rate in treatment: 0.00
# total tags in control: 481971
# tags after filtering in control: 481971
# maximum duplicate tags at the same position in control = 1
# Redundant rate in control: 0.00
# d = 200
chr start end length summit tags -10*log10(pvalue)
fold_enrichment FDR(%)
chr1 3704028 3704311 284 142 4 53.27 19.47 13.74
chr1 3787290 3787781 492 299 4 69.45 68.14 10.89
chr1 5694647 5695942 1296 189 11 104.10 25.55 8.33
chr1 5794055 5794296 242 121 4 85.91 38.94 4.17
chr1 6125574 6126093 520 337 7 56.83 16.22 13.71
chr1 6444066 6445227 1162 558 19 63.10 10.22 14.29
------------------------------
------------------------------------------------------------------
----- Original Message -----
> Hi, Ken,
>
> This bug is fixed in the current version of DiffBind, so if you
upgrade,
> it should go away. If upgrading is not feasible, ensure that the
bed
> files all have 6 columns to work around the bug.
>
> Cheers,
>
> - Gord
>
>
> >Message: 23
> >Date: Sat, 25 May 2013 16:55:00 +0900
> >From: <kentanaka at="" chiba-u.jp="">
> >To: <bioconductor at="" r-project.org="">
> >Subject: [BioC] About the DiffBind dba.count() crash problems
> >Message-ID: <20130525075500.0000701E.0473 at chiba-u.jp>
> >Content-Type: text/plain; charset=UTF-8
> >
> >Hi, I'm Ken Tanaka.
> >
> >I'm currently interested in analyzing the DiffBind analysis by
using
the
> >ChIP-seq data from Th2 immune cell samples.
> >
> >To be more specific, I would like to analyze this data (GSE28292)
by
> >using DiffBind analysis.
> >
> >I have questions regarding the dba.count().
> >When I execute the dba.count(), it crashes.
> >
> >The bed data which I'm using doesn't include the 6th strand column.
> >So, I suppose the crash problem doesn't originate from the problems
> >regarding the columns.
> >
> >I would like to know how to modify the bed data which the DiffBind
can
> >read the bed file specifications.
> >If you can inform me of these DiffBind bed file specifications
which
can
> >read the bed data, I think I will be able to make the perl script
for
> >conversions.
> >So, could you kindly please let me know of these DiffBind bed file
> >specifications which can read the bed data?
> >
> >I attached below the data and logs which I used for this analysis
as
> >follows.
> >
> >My Best Regards,
> >Ken Tanaka
> >
> >----------------------------------------------------------------
> ># ChIP-seq bed data files.
> >GSM773482_Th2_GATA3_Ab.bed.gz
> >GSM773480_Th2_control_Ab.bed.gz
> >GSM773484_Th2_WCE.bed.gz (The 2 bed files listed above are the
> >controls.)
> >
> >GSM773486_Th2_WT_anti_H3K27me3.bed.gz
> >GSM773490_Th2_WT_anti_H3K9Ac.bed.gz
> >GSM773492_Th2_WT_anti_H3K4me3.bed.gz
> >GSM773488_Th2_WT_input.bed.gz (The 3 bed files listed above are the
> >controls.)
> >
> >
> ># macs14 1.4.2 20120305 peak calling output files.
> >GATA3_Ab_peaks.bed
> >control_Ab_peaks.bed
> >
> >H3K27me3_peaks.bed
> >H3K4me3_peaks.bed
> >H3K9Ac_peaks.bed
> >
> >
> ># DiffBind sampleSheet file.
> >%cat th2diffbind.csv
> >SampleID,Tissue,Factor,Condition,Treatment,Replicate,bamReads,
bamControl,
> >ControlID,Peaks,PeakCaller,PeakFormat
> >GATA3_Ab,GATA3_Ab,Th2,Resistant,Full_Media,1,databed/Th2_GATA3_Ab.b
ed.
gz,
>
>databed/Th2_WCE.bed.gz,Th2_WCE_Control,peaks/GATA3_Ab_peaks.bed,macs,
raw
>
>control_Ab,control_Ab,Th2,Resistant,Full_Media,1,databed/Th2_control_
Ab.
>
>bed.gz,databed/Th2_WCE.bed.gz,Th2_WCE_Control,peaks/control_Ab_peaks.
bed,
> >macs,raw
> >H3K27me3,H3K27me3,Th2,Responsive,Full_Media,1,databed/Th2_WT_anti_
> >H3K27me3.bed.gz,databed/Th2_WT_input.bed.gz,Th2_WT_Control,peaks/
> >H3K27me3_peaks.bed,macs,raw
> >H3K4me3,H3K4me3,Th2,Responsive,Full_Media,1,databed/Th2_WT_anti_
H3K4me3.
> >bed.gz,databed/Th2_WT_input.bed.gz,Th2_WT_Control,peaks/H3K4me3_pea
ks.
> >bed,macs,raw
>
>H3K9Ac,H3K9Ac,Th2,Responsive,Full_Media,1,databed/Th2_WT_anti_H3K9Ac.
bed.
>
>gz,databed/Th2_WT_input.bed.gz,Th2_WT_Control,peaks/H3K9Ac_peaks.bed,
> >macs,raw
> >
> >
> >
> >
> >> th2 = dba(sampleSheet="th2diffbind.csv")
> >GATA3_Ab GATA3_Ab Th2 Resistant Full_Media 1 macs
> >control_Ab control_Ab Th2 Resistant Full_Media 1 macs
> >H3K27me3 H3K27me3 Th2 Responsive Full_Media 1 macs
> >H3K4me3 H3K4me3 Th2 Responsive Full_Media 1 macs
> >H3K9Ac H3K9Ac Th2 Responsive Full_Media 1 macs
> >>
> >> #th2
> >> #str(th2)
> >> #plot(th2)
> >>
> >> # peaks counting reads
> >> #th2 = dba.count(th2, bParallel=F)
> >> th2 = dba.count(th2,minOverlap=3, bParallel=F)
> >Sample: databed/Th2_GATA3_Ab.bed.gz
> >
> > *** caught segfault ***
> >address 0x10, cause 'memory not mapped'
> >
> >Traceback:
> > 1: .Call("croi_load_reads", as.character(bamfile), as.integer(
> >insertLength))
> > 2: pv.getCounts(job, bed, insertLength, bWithoutDupes =
bWithoutDupes)
> > 3: pv.listadd(results, pv.getCounts(job, bed, insertLength,
> >bWithoutDupes = bWithoutDupes))
> > 4: pv.counts(DBA, peaks = peaks, minOverlap = minOverlap,
defaultScore
> >= score, bLog = bLog, insertLength = insertLength, bOnlyCounts
=
T,
> > bCalledMasks = bCalledMasks, minMaxval = maxFilter, bParallel =
> >bParallel, bUseLast = bUseLast, bWithoutDupes =
bRemoveDuplicates,
> >bScaleControl = bScaleControl)
> > 5: dba.count(th2, minOverlap = 3, bParallel = F)
> >
> >Possible actions:
> >1: abort (with core dump, if enabled)
> >2: normal R exit
> >3: exit R without saving workspace
> >4: exit R saving workspace
> >Selection: 1
> >
> >
> >
> >> sessionInfo()
> >R version 2.15.2 (2012-10-26)
> >Platform: x86_64-suse-linux-gnu (64-bit)
> >
> >locale:
> > [1] LC_CTYPE=ja_JP.UTF-8 LC_NUMERIC=C
> > [3] LC_TIME=ja_JP.UTF-8 LC_COLLATE=ja_JP.UTF-8
> > [5] LC_MONETARY=ja_JP.UTF-8 LC_MESSAGES=ja_JP.UTF-8
> > [7] LC_PAPER=C LC_NAME=C
> > [9] LC_ADDRESS=C LC_TELEPHONE=C
> >[11] LC_MEASUREMENT=ja_JP.UTF-8 LC_IDENTIFICATION=C
> >
> >attached base packages:
> >[1] stats graphics grDevices utils datasets methods
base
> >
> >other attached packages:
> >[1] DiffBind_1.4.2 Biobase_2.18.0 GenomicRanges_1.10.7
> >[4] IRanges_1.16.6 BiocGenerics_0.4.0
> >
> >loaded via a namespace (and not attached):
> > [1] RColorBrewer_1.0-5 amap_0.8-7 edgeR_3.0.8
gdata_2.
12.
> >0
> > [5] gplots_2.11.0 gtools_2.7.0 limma_3.14.4
parallel_2.
> >15.2
> > [9] stats4_2.15.2 zlibbioc_1.4.0
> >>
>
>---------------------------------------------------------------------
---
> >---------
> >
> >--------------------------------------
> >Ken Tanaka
> >MD-PhD Candidate
> >Chiba University Medical School
> >
> >
> >
> >------------------------------
> >
> >_______________________________________________
> >Bioconductor mailing list
> >Bioconductor at r-project.org
> >https://stat.ethz.ch/mailman/listinfo/bioconductor
> >
> >
> >End of Bioconductor Digest, Vol 123, Issue 26
> >*********************************************
>
>