First things first: thanks a lot for providing the community with such a useful ChIP-seq tool as DiffBind :)
It seems I am missing something in how the score column values are computed in the 1st step of reading in the datasets with the dba() function.
My design table is in a csv file, and the peaksets are derived from MACS2 xls output files (broad peaks).
I ran the following command to read in the design table and the MACS2 peaksets:
mydba <- dba(sampleSheet = filename, peakCaller = "macs", peakFormat = "macs", scoreCol = 7)
head(mydba$peaks[[1]])   # MACS2 input peak coordinates
  chr  start    end X.log10.pvalue.
1   1 778427 778623      0.07637541
The same peak info line in the original MACS2 xls file looks like this:
chr  start   end     length  abs_summit  pileup  -log10(pvalue)  fold_enrichment  -log10(qvalue)  name
1    778427  778623  197     778528      19.00   16.93802        8.87306          14.41193        myfile
Basically, the "score", i.e. the -log10(pvalue) from MACS2, is 16.93802, while dba() gives 0.0763 as the score.
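For what it's worth, here is a quick diagnostic I could run to see whether the stored score is just a global rescaling of the raw MACS2 column. This is only a sketch of a sanity check, not DiffBind's documented behaviour; the xls filename and the `X.log10.pvalue.` column name (R's mangling of `-log10(pvalue)`) are my assumptions:

```r
# Hypothetical check: if dba() rescales the MACS2 -log10(p-value)
# column by a single factor, the per-peak ratio raw/score should be
# (roughly) constant across all peaks.
raw <- read.delim("myfile_peaks.xls", comment.char = "#")  # MACS2 broad-peak xls (name assumed)
dbascores <- mydba$peaks[[1]]$X.log10.pvalue.              # scores as loaded by dba()
ratio <- raw$X.log10.pvalue. / dbascores
summary(ratio)  # a near-constant ratio would point to a global rescaling
```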
I thought that normalization/processing was performed later with dba.count(), and that the dba() step only loaded the data?
My question therefore is: how does dba() compute its score from the MACS2 -log10(pvalue)?
University of Fribourg, CH