I am puzzled by the documentation of dba.peakset(), where the detailed description of peak.format argument in the help shows:
“narrow”: narrowPeaks file; scoreCol=8
shows that column 8 of NarrowPeaks file is the p-value.
Would it be better to use the signalValue column (column 7 instead of 8)? I wonder what are the pros and cons of using (unadjusted) p-value vs signalValue?
The detailed description of scoreCol argument in the help shows:
peak column to normalize to 0...1 scale when adding a peakset; 0 indicates no normalization
Does that mean peak column is doing two things: the first is to override the column choice given above, so that I can for instance force DiffBind to use signalValue column; the second is to indicate normalization is TRUE/FALSE.
Teck Por Lim
National University of Singapore
The default scoring for each peakset type is as you describe. Using the p-value means that the highest confidence peaks (taking into account eg background) will have the highest scores. It is worth noting that for the narrowPeak and MACS xls formats, the p-values are reported as -log10(p), so higher scores indicate higher confidence peaks. As you suggest, other values can be used as scores, and as you figured out, you can override the default column using the scoreCol parameter in dba.peakset (or a column called ScoreCol when using a samplesheet with dba()).
Looking at this now, I see that there is actually an error in the documentation for the scoreCol parameter for dba.peakset(). Setting this to zero will not prevent the scores from being normalized, rather it will prevent any scores from being extracted from the peak file (in which case all scores will be 1 for samples where the peak is called). I have fixed this in the man page moving forward.
A minor comment on "signalValue" versus -log10(p-value): at least in MACS, signalValue is fold change, which can be quite unstable, particularly for small peaks (1 more or fewer background reads can change the apparent fold change dramatically if there are only a few reads). -log10(p-value) is probably more consistent. My $0.02, anyway... :)