I have mapped my data to drosophila genome BDGP6.28. Hence, the bam files and peaks' chromosome names are 2L, 2R, 211000022278421, 211000022278482, et cetera. When I try to filter blacklists with dba.blacklist function
dba.blacklist(dbObj_initial, blacklist = DBA_BLACKLIST_DM6, greylist=FALSE)
I am informed that no blacklists are found, even though I have looked at the peaks, and saw that some do fall at blacklists' area from here.
More importantly, when I try to use greylists
dba.blacklist(dbObj_initial, blacklist = FALSE, greylist=DBA_BLACKLIST_DM6)
Error in value[[3L]](cond) : GreyListChIP error: Error in tileGenome(obj@karyotype, tilewidth = tileSize/2): 'seqlengths' contains NAs or negative values
I think both issues stem from the difference in chromosome names - in my peaks (and in my bam files) the chromosome names do not have "chr", and I guess that in the chromosome names DiffBind uses they do.
Is it possible to circumvent it and what would it take?
I can clean the blacklisted peaks by myself, using bedtools; but when it comes to greylists, I would be glad to enjoy the automatic use of greylists in DiffBind.
Is there any solution for it (perhaps using the diffloop package)?
It worked!
For some reason, getting the karyotype sizes via Seqinfo did not work for me with this specific genome, so I have downloaded a file which lists the sizes for each chromosome, and created a seqinfo object (using just the chromosome and chromosome length columns). I have passed the seqinfo object as the greylist parameter, and set the pvalue of the greylist to 0.99 (dbObj_greylist$config$greylist.pval), as in the GreyListChIP package (instead of the default 0.999 in DiffBind).
Then I have checked the greylist regions via igv, and indeed, they show a very large number of reads in the control samples. Since the number of reads is large in the greylisted regions (hundreds of reads), I think that a pvalue of 0.99 is not very permissive.
Glad to hear this worked. I'm going to document the ability to pass a
Seqinfo
object, and how to obtain one for a reference genome that are not explicitly supported in aBSgenome
object.In general I try to be conservative with the defaults, in the sense of altering the raw experimental data as little as possible, which is why the default pvalue is set at 0.999 instead of 0.99. In your case I think you are doing the right thing in restoring it to 0.99.
Is there a way to assess whether the GreyListChIP package did a good enough job? In my case, with the drosophila genome, and p-value of 0.99, the master greylist (of 3 control samples) had only 11 areas declared as grey . The coverage of the greylists was 0.03% for each of the samples only. Does that sound as "too little" to you?
Good question. I haven't run GreyListChIP on Drosophila, personally. However, it's a pretty compact genome, so it's plausible enough that there are relatively few "grey" regions. I'll give it a try on Monday and see what I get.
Can the greylist and blacklist be applied after dba.count ?
After using the same seqinfo object on another computer and after some time, now I get the following output :
I don't understand what the "2 combined objects have no sequence levels in common" message means. (It seems that it did manage to remove some peaks base on the greylist.) How do I double-check that all is fine?
When I look at the correlation values between samples
r_vals <-dba.plotHeatmap(dbObj)
the values are exactly the same whether I use the values before normalization (dba.normalize) or after it. I think that normalization should change the values. Does the heatmap show the values without normalization being applied? If so, how can normalized values be used to plot a correlation heatmap?
Yes, blacklists and greylists can be applied after
dba.count()
.Looking at previously discussed issue #101694, it would appear that there is some disagreement between the reference genome used for the bam file and the supplied
Seqinfo
, is that possible?By default, the scores used in
dba.plotHeatmap()
are normalized. If you haven't calleddba.normalize()
, then the default normalization is applied (as ifdba.normalize()
has been called with all default parameters). You can see what the heatmap looks like using non-normalized values by changing the score. You can change the score without re-counting by callingdba.count()
withpeaks=NULL
and setting thescore
parameter, eg. toscore=DBA_SCORE_READS
(raw read counts) orscore=DBA_SCORE_READS_MINUS
(raw reads minus control reads, minimum valueminCount
). You can change them back to normalized values usingscore=DBA_SCORE_NORMALIZED
which will respect whatever you set up usingdba.normalize()
.dba.count
). I would like to ask - can I assume that removing greylists before dba.count is the preferred (usual) order ?Regarding the "2 combined objects have no sequence levels in common" message : I have changed the Seqinfo object to be read via
seq_ktype <- Seqinfo(genome="GCF_000001215.4")
, but the error message persistsIt is possible for the Greylist to be calculated differently depending on whether you do it before or after counting, if there are chromosomes that have no peaks remaining after a consensus is taken. So, for example, if none of the peaks initially identified on chromosome 211000022278451 were included in the consensus, then the Greylist calculation would include reads on chromosome 211000022278451 in its model in the "before" case but not in the "after" case.
I've had a look and now I don't think that the warning is actually a problem, but I'm not 100% sure. To track it down further I'd need access to copies of your object before and after the greylist has been applied.
But the bottom line should be the same, shouldn't it? E.g. chromosome 211000022278451 would not be included after counting after both greylists and counting (no matter the order). In my case, the correlation heatmap gives different results depending on the order.
In the pre-counting case, chromosome 211000022278451 is incorporated into the greylist model of what constitutes anomalous enrichment, while in the post-consensus case, it will not be included in the model. So the cut-offs for anomalies will be different.
The warning appears only in case that I apply a blacklist first, and a greylist after it. So it's the combination of grey+blacklist which generates this warning. If you would like to look into it, can I send the objects to you by email ?
Yes if you send me links to the objects (with all peaks, blacklisted/greylisted before counting, blacklisted/greylisted after counting), I can get to the bottom of this.
Hi, I have sent the links to rory DOT stark AT cruk.cam.ac.uk
I've checked in a fix that should handle this warning condition, it is active in the currently downloadable version if you do a
BiocManager::install("DiffBind")
. The fix should be there from versionDiffBind_3.4.4
onwards.So I understand that the warning was actually of no importance and could be ignored ?
That is correct.