Why is the "detection" value given by lumidat the inverse (i.e. 1-x) of the "detection_pval" given by GenomeStudio?
2
3
Entering edit mode
willj ▴ 30
@willj-8763
Last seen 4.9 years ago
France

I've ran several idat files of microarray data through lumidat, including the example idat files provided in the lumidat package. The "Detection" column always has it's greatest values (i.e. values at or close to 1) for the probes having the strongest signal strength. For example, below I paste output for the three probes having the strongest signal in the lumidat example idat file 5356583020_A. This output was generated using the example given in the lumidat vignette: example(preprocess.illumina.idat)

I have Genome Studio output for some of the idats that I have run. There are minimal changes in the ranking of probes according to signal strength but the Genome Studio "Detection Pval" for the strongest probes is usually close to 0. This makes sense to me: except if I've misunderstood something, the probes with the strongest signal would be the most confidently detected and thus have the lowest detection p-values. So, why is it that lumidat gives a "detection" value close to 1 and not close to 0?  Is the lumidat "Detection" column intended to represent the detection p-value, or does it represent something else? (e.g. detection type 2 error, which I guess is 1-p?). Also (and more importantly) is this difference between Genome Studio and lumidat automatically accounted for in downstream workflows such as Lumi?

edit: I should add that the title may be a little misleading because the lumidat "detection" value is not precisely 1-x of the "detection_pval" given by GenomeStudio - it is approximately 1-x, in the examples I have checked.

 TargetID ProbeID MIN_Signal-5356583020_A AVG_Signal-5356583020_A MAX_Signal-5356583020_A NARRAYS-5356583020_A ARRAY_STDEV-5356583020_A BEAD_STDEV-5356583020_A Avg_NBEADS-5356583020_A Detection-5356583020_A HBG2 TIPRV4rkp9ekgjrh6I 42501.53 42501.53 42501.53 1 NaN 15260.67 23 1 UBB oV5F615l4qunvD30tM 40860.58 40860.58 40860.58 1 NaN 10581.83 25 1 EEF1A1 97viJ90hzUX_rZ.nvY 40495.68 40495.68 40495.68 1 NaN 12339.05 30 1
lumidat idat microarray illumina • 1.7k views
3
Entering edit mode
Mark Cowley ▴ 400
@mark-cowley-2858
Last seen 6.9 years ago
Australia

Hi Bill,

My goal for lumidat was to recapitulate the behaviour of GenomeStudio, for better or for worse. As noted by Gordon, and as you can see below in the code, the actual behaviour of GS contradicted the software manual. I like Gordon's idea of detecting & converting them to proper p-values. I usually used lumi for downstream processing, and felt that lumi handled the detection p-values appropriately produced by lumidat. For use with the limma pipeline, perhaps 1-p is more appropriate. I'm afraid that level of nuance has escaped me after so many years of looking at this code.

Here's an excerpt from the underling Java code:

    /**
* According to the GenomeStudio manual, Detection p-value = 1-R/N,
* where R is the rank of the gene signal relative to negative controls and N is number of negative controls.
* if intensity < all negControls, R=0, p=1
* else if intensity > all negControls, R=N, p=0
* else R = sum(negControls < intensity). in other words, R is the 0-based index that it would have been, if it was in the vector
*
* However, in comparison to GenomeStudio version 1.8.0 actual output, it's clear that for the Reporter probes,
* P=R/N; ie higher expressed genes have P=1; very lowly expressed genes have P=0.
*
* Since this function aims to reproduce the ACTUAL GenomeStudio output, we write out P=R/N
*
* References:
* Chapter 4, Page 106 (page 122/186 in PDF) GenomeStudio_GX_Module_v1.0_UG_11319121_RevA_tcm240-21462.pdf
*
* @param R the rank of the gene
* @param N the number of negative controls
* @return The Pvalue of a gene with rank R, relative to the N negative controls.
*/
public static float getPValue(int R, int N) {
float P = 1.0f - ((float)R / (float)N);
P = 1 - P; // In comparing to GenomeStudio output, it's clear that they've use P=R/N, so highly expressed genes have P=1
return (P);
}

best regards,

Mark

1
Entering edit mode

Mark, thanks, that makes it clear.

limma will work fine with Detection as either p or 1-p. I guess that lumi will also.

0
Entering edit mode

Mark - thanks very much for responding to my mail! And thanks to Gordon and you both for advice on how this will work with limma and lumi.

4
Entering edit mode
@gordon-smyth
Last seen 1 hour ago
WEHI, Melbourne, Australia

lumidat is not a Bioconductor package, so this is not the place to ask questions about it.

GenomeStudio is not consistent about how it outputs the Detection column. Some versions of GenomeStudio have written a p-value in this column, and some write 1-p instead.

My lab uses the read.idat() function in the limma package to read Illumina idat files. This feeds straight into the neqc() function for background correcting and normalizing the expression values. These functions automatically check for how the detection p-value has been represented in the idat file. They automatically convert back to proper p-values if 1-p has been given.

I don't know about lumidat, but it is possible that they have chosen to always convert to 1-p instead of p. You would have to write to the lumidat authors to find out.

neqc() doesn't damp down the signal as much as the lumi procedures do, see:

Edit: limma::read.idat() and lumidat::read.ilmn.idat() seem to be similar in intention, and both should work fine. read.idat() will pass through the Detection column as found in the idat file, whereas read.ilmn.idat() resets the Detection column. Either should work fine with neqc() and downstream functions.

BTW, neqc() has the nice feature that it figures out what the negative control values must have been from the detection p-values, then uses the negative control values to set the parameters of the normexp background correction step.

0
Entering edit mode

Dear Gordon, thanks very much for explaining this about the GenomeStudio output and how it is interpreted by read.idat(), and for your advice on using your neqc function. I will try to find a way to contact the lumidat authors (edit: Mark Cowley commented below), but I do suggest that it was still good to post the question here because it is relevant to downstream analyses in Bioconductor packages like limma and lumi.