Hi António-
The files you sent me had mismatched chromosome names ("chr"
prepended in the peaksets, but just the number/letter in the BAM
file).
There is an easy "backdoor" way to try this. Your DBA object H3K4m3
has a field called chrmap:
> H3K4m3$chrmap
[1] "chr1" "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16"
"chr17" "chr18" "chr19" "chr2" "chr20" "chr21" "chr22" "chr3" "chr4"
"chr5" "chr6" "chr7" "chr8" "chr9" "chrX"
[24] "chrY"
This holds the chromosome names as a vector. If you replace this
vector of chromosome names with the corresponding numbers (as strings)
before you call dba.count, it should work the way you want:
> H3K4m3 = dba(sampleSheet="samplesheet_all.csv")
> chrmap.save = H3K4m3$chrmap
> H3K4m3$chrmap =
c("1","10","11","12","13","14","15","16","17","18","19",
"2","20","21","22","3","4","5","6","7","8","9","X","Y")
> H3K4m3 = dba.count(H3K4m3)
> H3K4m3$chrmap = chrmap.save
Cheers-
Rory
________________________________
From: António Domingues
[amjdomingues@gmail.com]
Sent: 12 February 2013 11:47
To: Rory Stark
Cc: Gordon Brown
Subject: Re: DiffBind - Scores for peakset are all the same
Hi Rory and Gord,
the mapping was initially done with a UCSC reference (chr*), but when
DiffBind failed I used a collaborator's mapping. Unfortunately, I forgot
that he tends to map against an Ensembl reference.
Just to be clear, the DiffBind error happened when the BAM files
should have had chromosome names matching the peaks. I'll redo
the whole thing from scratch and report back on whether it works.
Best,
António
On 12/02/13 10:35, Rory Stark wrote:
Thanks Gord, this one was staring me in the face and I couldn't see
it!
António, I wonder how this happened? Does your reference genome not
use chr1 etc.?
Cheers-
Rory
________________________________
From: Gordon Brown
Sent: 12 February 2013 09:21
To: Rory Stark; amjdomingues@gmail.com
Subject: RE: DiffBind - Scores for peakset are all the same
Hi, folks,
Sort order doesn't matter, but consistent naming of chromosomes does.
The BAM file has chromosome names "1", "2", "X" and so on, but the
peaks are described using names like "chr1", "chr2", "chrX". The
software isn't clever enough to know that "1" and "chr1" are two names
for the same chromosome. So none of the reads matches a peak and all
counts come out zero.
Probably the quickest fix is to edit the peak files to remove the
"chr" via search/replace, then re-run.
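The search/replace can also be done in R. The sketch below assumes a tab-separated peak file with no header or track lines, whose first column holds the chromosome name (as in a BED file); the function names strip_chr_prefix and fix_peak_file are hypothetical. Anchoring the pattern with ^ ensures only a leading "chr" is removed.

```r
# Hypothetical helper: remove a leading "chr" from chromosome names,
# e.g. "chr1" -> "1", "chrX" -> "X". Names without the prefix are unchanged.
strip_chr_prefix <- function(chroms) {
  sub("^chr", "", chroms)
}

# Hypothetical helper: rewrite a BED-like peak file (tab-separated, no
# header, chromosome in column 1) so its names match a BAM file that
# uses "1", "2", "X", ...
fix_peak_file <- function(infile, outfile) {
  peaks <- read.table(infile, sep = "\t", header = FALSE,
                      stringsAsFactors = FALSE, quote = "")
  peaks[[1]] <- strip_chr_prefix(peaks[[1]])
  write.table(peaks, outfile, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
  invisible(outfile)
}

strip_chr_prefix(c("chr1", "chr10", "chrX", "MT"))
# returns c("1", "10", "X", "MT")
```

Note that UCSC's mitochondrial chromosome "chrM" would become "M" rather than Ensembl's "MT", so that name may need separate handling.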
Hope this helps...
- Gord
________________________________
From: Gordon Brown
Sent: 11 February 2013 22:48
To: Rory Stark; amjdomingues@gmail.com
Cc: Gordon Brown
Subject: Re: DiffBind - Scores for peakset are all the same
Hi, folks,
Sort order shouldn't matter; that's the purpose of building the tree.
But I'll double-check tomorrow.
- Gord
Insert Witty Signature Here
Rory Stark
<rory.stark@cruk.cam.ac.uk> wrote:
Just to be clear, the file you sent is already sorted, in the order
given below. I expected chromosome 1 to come first but don't know how
important this is - hopefully Gord knows if this is the issue.
-R
On 11 Feb 2013, at 17:32, "António Domingues"
<amjdomingues@gmail.com> wrote:
Hmm, I thought the BAM files were sorted. I can sort them and have
another go.
Thanks for taking the time to help me.
António
On 11/02/13 17:54, Rory Stark wrote:
Hi António-
The first thing I notice is that the sort order of this file is not
what I expect. I'll need Gord to weigh in on how important that is.
The order of chromosomes is:
10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 1, 20, 21, 22, 2, 3, 4, 5, 6,
7, 8, 9, MT, X, Y
With chromosome 1 in the middle.
Gord, can you take a look? I've got the files in:
/lustre/mib-cri/stark01/Support/mapped_reads
Cheers-
Rory
From: António Domingues
<amjdomingues@gmail.com>
Date: Sun, 10 Feb 2013 17:31:08 +0100
To: Rory Stark
<rory.stark@cruk.cam.ac.uk>
Subject: Re: DiffBind - Scores for peakset are all the same
Hi Rory,
Thanks a lot for this. I am sending the DBA object H3K4m3 and the
correlation plot as attachments. The R object is very small; I wonder
if that already hints at the problem.
Best,
António
On 09/02/13 19:09, Rory Stark wrote:
Hi António-
I've seen something like this once before.
Could you share your DBA object H3K4m3 (after the call to dba.count)
with me, either via FTP or Dropbox, so I can troubleshoot this?
Cheers-
Rory
________________________________
From: António Domingues
[amjdomingues@gmail.com]
Sent: 09 February 2013 16:32
To: Gordon Brown;
bioconductor@r-project.org;
Rory.Stark@cancer.org.uk
Subject: DiffBind - Scores for peakset are all the same
Dear Gordon, Rory and Bioconductors,
I was using DiffBind successfully last year, and I was picking up
where I left off, but things are not going smoothly. Loading the
samples seems to be OK:
#############
library(DiffBind)
H3K4m3 = dba(sampleSheet="samplesheet_all.csv")
H3K4m3
# 8 Samples, 19885 sites in matrix (24260 total):
#          ID Tissue  Factor  Condition Replicate Peak.caller Intervals
# 1  woJarid1   Hela H3K4me3  wo_Jarids         1       QuEST     14111
# 2  woJarid2   Hela H3K4me3  wo_Jarids         2       QuEST     13771
# 3 indJarid1   Hela H3K4me3 ind_Jarids         1       QuEST     14865
# 4 indJarid2   Hela H3K4me3 ind_Jarids         2       QuEST     13393
# 5  woJarid1   Hela H3K4me3  wo_Jarids         1        MACS     19144
# 6  woJarid2   Hela H3K4me3  wo_Jarids         2        MACS     20391
# 7 indJarid1   Hela H3K4me3 ind_Jarids         1        MACS     22899
# 8 indJarid2   Hela H3K4me3 ind_Jarids         2        MACS     24616
And the problem is in dba.count:
H3K4m3 = dba.count(H3K4m3)
# Warning messages:
# 1: Scores for peakset indJarid1 are all the same -- correlations set to zero.
# 2: Scores for peakset indJarid2 are all the same -- correlations set to zero.
# 3: Scores for peakset woJarid1 are all the same -- correlations set to zero.
# 4: Scores for peakset woJarid2 are all the same -- correlations set to zero.
I have no idea why this is happening. I wonder whether something
changed in the most recent version or whether I am missing something.
Could you please help and point me in the right direction? It may be
something silly. The sample sheet is attached.
Cheers,
António
> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] C
attached base packages:
[1] parallel grid grDevices datasets utils graphics stats
[8] methods base
other attached packages:
[1] ChIPpeakAnno_2.6.0 limma_3.14.4
[3] org.Hs.eg.db_2.8.0 GO.db_2.8.0
[5] RSQLite_0.11.2 DBI_0.2-5
[7] BSgenome.Ecoli.NCBI.20080805_1.3.17 BSgenome_1.26.1
[9] Biostrings_2.26.3 multtest_2.14.0
[11] biomaRt_2.14.0 VennDiagram_1.5.1
[13] DiffBind_1.4.1 GenomicFeatures_1.10.1
[15] GenomicRanges_1.10.5 IRanges_1.16.4
[17] data.table_1.8.6 stringr_0.6.2
[19] ggplot2_0.9.3 plyr_1.8
[21] AnnotationDbi_1.20.2 Biobase_2.18.0
[23] BiocGenerics_0.4.0
loaded via a namespace (and not attached):
 [1] MASS_7.3-23        RColorBrewer_1.0-5 RCurl_1.95-3       Rsamtools_1.10.2
 [5] XML_3.95-0.1       amap_0.8-7         bitops_1.0-5       colorspace_1.2-1
 [9] dichromat_2.0-0    digest_0.6.2       edgeR_2.4.0        gdata_2.12.0
[13] gplots_2.11.0      gtable_0.1.2       gtools_2.7.0       labeling_0.1
[17] munsell_0.4        proto_0.3-10       reshape2_1.2.2     rtracklayer_1.18.1
[21] scales_0.2.3       splines_2.15.2     stats4_2.15.2      survival_2.37-2
[25] tools_2.15.2       zlibbioc_1.4.0
>
--
---------------------------------------------------------------------
António Miguel de Jesus Domingues, PhD
Neugebauer group
Max Planck Institute of Molecular Cell Biology and Genetics
Pfotenhauerstrasse 108
01307 Dresden
Germany
http://people.mpi-cbg.de/domingue/home.html
e-mail: domingue@mpi-cbg.de
tel. +49 351 210 2481
The Unbearable Lightness of Molecular Biology