Question

DiffBind - Scores for peakset are all the same

@antonio-miguel-de-jesus-domingues-5182

Germany

Dear Gordon, Rory and Bioconductors,

I was using DiffBind successfully last year, and I am now picking up where I left off, but things are not going smoothly. Loading the samples seems to be OK:

#############
library(DiffBind)
H3K4m3 = dba(sampleSheet="samplesheet_all.csv")
H3K4m3
# 8 Samples, 19885 sites in matrix (24260 total):
#          ID Tissue  Factor  Condition Replicate Peak.caller Intervals
# 1  woJarid1   Hela H3K4me3  wo_Jarids         1       QuEST     14111
# 2  woJarid2   Hela H3K4me3  wo_Jarids         2       QuEST     13771
# 3 indJarid1   Hela H3K4me3 ind_Jarids         1       QuEST     14865
# 4 indJarid2   Hela H3K4me3 ind_Jarids         2       QuEST     13393
# 5  woJarid1   Hela H3K4me3  wo_Jarids         1        MACS     19144
# 6  woJarid2   Hela H3K4me3  wo_Jarids         2        MACS     20391
# 7 indJarid1   Hela H3K4me3 ind_Jarids         1        MACS     22899
# 8 indJarid2   Hela H3K4me3 ind_Jarids         2        MACS     24616

The problem is in dba.count:

H3K4m3 = dba.count(H3K4m3)
# Warning messages:
# 1: Scores for peakset indJarid1 are all the same -- correlations set to zero.
# 2: Scores for peakset indJarid2 are all the same -- correlations set to zero.
# 3: Scores for peakset woJarid1 are all the same -- correlations set to zero.
# 4: Scores for peakset woJarid2 are all the same -- correlations set to zero.

I have no idea why this is happening. I wonder whether something changed in the most recent version or whether I am missing something. Could you please help and point me in the right direction? Maybe I am missing something silly. The sample sheet is attached.
Cheers,
António

> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] parallel  grid      grDevices datasets  utils     graphics  stats
[8] methods   base

other attached packages:
 [1] ChIPpeakAnno_2.6.0                  limma_3.14.4
 [3] org.Hs.eg.db_2.8.0                  GO.db_2.8.0
 [5] RSQLite_0.11.2                      DBI_0.2-5
 [7] BSgenome.Ecoli.NCBI.20080805_1.3.17 BSgenome_1.26.1
 [9] Biostrings_2.26.3                   multtest_2.14.0
[11] biomaRt_2.14.0                      VennDiagram_1.5.1
[13] DiffBind_1.4.1                      GenomicFeatures_1.10.1
[15] GenomicRanges_1.10.5                IRanges_1.16.4
[17] data.table_1.8.6                    stringr_0.6.2
[19] ggplot2_0.9.3                       plyr_1.8
[21] AnnotationDbi_1.20.2                Biobase_2.18.0
[23] BiocGenerics_0.4.0

loaded via a namespace (and not attached):
 [1] MASS_7.3-23        RColorBrewer_1.0-5 RCurl_1.95-3       Rsamtools_1.10.2
 [5] XML_3.95-0.1       amap_0.8-7         bitops_1.0-5       colorspace_1.2-1
 [9] dichromat_2.0-0    digest_0.6.2       edgeR_2.4.0        gdata_2.12.0
[13] gplots_2.11.0      gtable_0.1.2       gtools_2.7.0       labeling_0.1
[17] munsell_0.4        proto_0.3-10       reshape2_1.2.2     rtracklayer_1.18.1
[21] scales_0.2.3       splines_2.15.2     stats4_2.15.2      survival_2.37-2
[25] tools_2.15.2       zlibbioc_1.4.0

--
António Miguel de Jesus Domingues, PhD
Neugebauer group
Max Planck Institute of Molecular Cell Biology and Genetics
Pfotenhauerstrasse 108, 01307 Dresden, Germany
http://people.mpi-cbg.de/domingue/home.html
e-mail: domingue at mpi-cbg.de
tel. +49 351 210 2481
The Unbearable Lightness of Molecular Biology


Updated 11.2 years ago by Rory Stark • written 11.2 years ago by António Miguel de Jesus Domingues

score 0 · Answer 1 · 2013-02-09

Hi António-

I've seen something like this once before. Could you share your DBA object H3K4m3 (after the call to dba.count) with me, either as an FTP or Dropbox, so I can troubleshoot this?

Cheers-
Rory

score 0 · Answer 2 · 2013-02-12

Hi António-

The files you sent me had mismatched chromosome names ("chr" prepended in the peaksets, and just the number/letter in the BAM file). There is an easy "backdoor" way to work around this. In your DBA object H3K4m3 there is a field called chrmap:

> H3K4m3$chrmap
 [1] "chr1" "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17" "chr18" "chr19" "chr2" "chr20" "chr21" "chr22" "chr3" "chr4" "chr5" "chr6" "chr7" "chr8" "chr9" "chrX"
[24] "chrY"

This holds the names of the chromosomes in a vector. If you replace this vector of chromosome names with the corresponding numbers (as strings) before you call dba.count, it should work the way you want:

> H3K4m3 = dba(sampleSheet="samplesheet_all.csv")
> chrmap.save = H3K4m3$chrmap
> H3K4m3$chrmap = c("1","10","11","12","13","14","15","16","17","18","19",
+                   "2","20","21","22","3","4","5","6","7","8","9","X","Y")
> H3K4m3 = dba.count(H3K4m3)
> H3K4m3$chrmap = chrmap.save

Cheers-
Rory

________________________________
From: António Domingues [amjdomingues@gmail.com]
Sent: 12 February 2013 11:47
To: Rory Stark
Cc: Gordon Brown
Subject: Re: DiffBind - Scores for peakset are all the same

Hi Rory and Gord,

The mapping was initially done with a UCSC reference (chr*), but when DiffBind failed I used a collaborator's mapping. Unfortunately I forgot that he tends to map against an Ensembl reference. Just to be clear, the original DiffBind error happened with BAM files that should have had chromosome names matching the peaks. I'll redo the whole thing from scratch and report later if it works.

Best,
António

On 12/02/13 10:35, Rory Stark wrote:

Thanks Gord, this one was staring me in the face and I couldn't see it! António, I wonder how this happened? Does your reference genome not use chr1 etc.?
Cheers-
Rory

________________________________
From: Gordon Brown
Sent: 12 February 2013 09:21
To: Rory Stark; amjdomingues@gmail.com
Subject: RE: DiffBind - Scores for peakset are all the same

Hi, folks,

Sort order doesn't matter, but consistent naming of chromosomes does. The BAM file has chromosome names "1", "2", "X" and so on, but the peaks are described using names like "chr1", "chr2", "chrX". The software isn't clever enough to know that "1" and "chr1" are two names for the same chromosome. So none of the reads matches a peak and all counts come out zero. Probably the quickest fix is to edit the peaks files to remove the "chr" via search/replace, then re-run.

Hope this helps...

- Gord

________________________________
From: Gordon Brown
Sent: 11 February 2013 22:48
To: Rory Stark; amjdomingues@gmail.com
Cc: Gordon Brown
Subject: Re: DiffBind - Scores for peakset are all the same

Hi, folks,

Sort order shouldn't matter; that's the purpose of building the tree. But I'll double-check tomorrow.

- Gord

Rory Stark <rory.stark@cruk.cam.ac.uk> wrote:

Just to be clear, the file you sent is already sorted, in the order given below. I expected chromosome 1 to come first but don't know how important this is - hopefully Gord knows if this is the issue.

-R

On 11 Feb 2013, at 17:32, "António Domingues" <amjdomingues@gmail.com> wrote:

Humm, I thought the BAM files were sorted. I can sort them and have another go. Thanks for taking the time to help me.

António

On 11/02/13 17:54, Rory Stark wrote:

Hi António-

The first thing I notice is that the sort order of this file is not what I expect. I'll need Gord to weigh in on how important that is. The order of chromosomes is:

10 11 12 13 14 15 16 17 18 19 1 20 21 22 2 3 4 5 6 7 8 9 MT X Y

with chromosome 1 in the middle. Gord, can you take a look?
I've got the files in: /lustre/mib-cri/stark01/Support/mapped_reads

Cheers-
Rory

From: António Domingues <amjdomingues@gmail.com>
Date: Sun, 10 Feb 2013 17:31:08 +0100
To: Rory Stark <rory.stark@cruk.cam.ac.uk>
Subject: Re: DiffBind - Scores for peakset are all the same

Hi Rory,

Thanks a lot for this. I am sending the DBA object H3K4m3 and the correlation plot as attachments. The R object is very small. I wonder if that already hints at the problem.

Best,
António
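For anyone who hits the same warning, the naming mismatch Gord describes can be checked before calling dba.count. The following is only a rough base-R sketch, not part of DiffBind: the helper names strip_chr and shared_chroms are made up for illustration, and the chromosome vectors are toy values standing in for whatever names your peak files and BAM headers actually contain.

```r
# Hypothetical helper (not a DiffBind function): drop a leading "chr" so that
# UCSC-style names ("chr1") become Ensembl-style names ("1").
strip_chr <- function(chroms) {
  sub("^chr", "", chroms)
}

# Hypothetical helper: chromosome names present in both the peaks and the reads.
# An empty result means no read can ever be assigned to a peak.
shared_chroms <- function(peak_chroms, bam_chroms) {
  intersect(unique(peak_chroms), unique(bam_chroms))
}

# Toy values for illustration only (assumed, not taken from the real files).
peak_chroms <- c("chr1", "chr10", "chr2", "chrX", "chrY")
bam_chroms  <- c("10", "1", "2", "MT", "X", "Y")

shared_before <- shared_chroms(peak_chroms, bam_chroms)
shared_after  <- shared_chroms(strip_chr(peak_chroms), bam_chroms)

print(length(shared_before))  # no names in common before renaming
print(shared_after)           # names in common after stripping "chr"

# The same renaming applied to a peak table, which is the "search/replace on
# the peaks files" fix done in R instead of a text editor.
peaks <- data.frame(chrom = c("chr1", "chrX"),
                    start = c(100L, 500L),
                    end   = c(200L, 650L),
                    stringsAsFactors = FALSE)
peaks$chrom <- strip_chr(peaks$chrom)
print(peaks)
```

The same sub() call could in principle replace the hand-typed vector in Rory's chrmap workaround, but that relies on the internal chrmap field described in this thread. Note also that stripping "chr" does not reconcile the mitochondrial chromosome (UCSC "chrM" versus Ensembl "MT"), so that one would need separate handling.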