Search
Question: Why is the "detection" value given by lumidat the inverse (i.e. 1-x) of the "detection_pval" given by GenomeStudio?
3
gravatar for willj
2.2 years ago by
willj30
France
willj30 wrote:

I've ran several idat files of microarray data through lumidat, including the example idat files provided in the lumidat package. The "Detection" column always has it's greatest values (i.e. values at or close to 1) for the probes having the strongest signal strength. For example, below I paste output for the three probes having the strongest signal in the lumidat example idat file 5356583020_A. This output was generated using the example given in the lumidat vignette: example(preprocess.illumina.idat)

I have Genome Studio output for some of the idats that I have run. There are minimal changes in the ranking of probes according to signal strength but the Genome Studio "Detection Pval" for the strongest probes is usually close to 0. This makes sense to me: except if I've misunderstood something, the probes with the strongest signal would be the most confidently detected and thus have the lowest detection p-values. So, why is it that lumidat gives a "detection" value close to 1 and not close to 0?  Is the lumidat "Detection" column intended to represent the detection p-value, or does it represent something else? (e.g. detection type 2 error, which I guess is 1-p?). Also (and more importantly) is this difference between Genome Studio and lumidat automatically accounted for in downstream workflows such as Lumi?

edit: I should add that the title may be a little misleading because the lumidat "detection" value is not precisely 1-x of the "detection_pval" given by GenomeStudio - it is approximately 1-x, in the examples I have checked.

TargetID ProbeID MIN_Signal-5356583020_A AVG_Signal-5356583020_A MAX_Signal-5356583020_A NARRAYS-5356583020_A ARRAY_STDEV-5356583020_A BEAD_STDEV-5356583020_A Avg_NBEADS-5356583020_A Detection-5356583020_A
HBG2 TIPRV4rkp9ekgjrh6I 42501.53 42501.53 42501.53 1 NaN 15260.67 23 1
UBB oV5F615l4qunvD30tM 40860.58 40860.58 40860.58 1 NaN 10581.83 25 1
EEF1A1 97viJ90hzUX_rZ.nvY 40495.68 40495.68 40495.68 1 NaN 12339.05 30 1
ADD COMMENTlink modified 2.2 years ago by Mark Cowley400 • written 2.2 years ago by willj30
3
gravatar for Mark Cowley
2.2 years ago by
Mark Cowley400
Australia
Mark Cowley400 wrote:

Hi Bill,

My goal for lumidat was to recapitulate the behaviour of GenomeStudio, for better or for worse. As noted by Gordon, and as you can see below in the code, the actual behaviour of GS contradicted the software manual. I like Gordon's idea of detecting & converting them to proper p-values. I usually used lumi for downstream processing, and felt that lumi handled the detection p-values appropriately produced by lumidat. For use with the limma pipeline, perhaps 1-p is more appropriate. I'm afraid that level of nuance has escaped me after so many years of looking at this code.

Here's an excerpt from the underling Java code:

    /**
     * According to the GenomeStudio manual, Detection p-value = 1-R/N,
     * where R is the rank of the gene signal relative to negative controls and N is number of negative controls.
     * if intensity < all negControls, R=0, p=1
     * else if intensity > all negControls, R=N, p=0
     * else R = sum(negControls < intensity). in other words, R is the 0-based index that it would have been, if it was in the vector
     *
     * However, in comparison to GenomeStudio version 1.8.0 actual output, it's clear that for the Reporter probes,
     * P=R/N; ie higher expressed genes have P=1; very lowly expressed genes have P=0.
     *
     * Since this function aims to reproduce the ACTUAL GenomeStudio output, we write out P=R/N
     *
     * References:
     * Chapter 4, Page 106 (page 122/186 in PDF) GenomeStudio_GX_Module_v1.0_UG_11319121_RevA_tcm240-21462.pdf
     *
     * @param R the rank of the gene
     * @param N the number of negative controls
     * @return The Pvalue of a gene with rank R, relative to the N negative controls.
     */
    public static float getPValue(int R, int N) {
        float P = 1.0f - ((float)R / (float)N);
        P = 1 - P; // In comparing to GenomeStudio output, it's clear that they've use P=R/N, so highly expressed genes have P=1
        return (P);
    }

best regards,

Mark

ADD COMMENTlink modified 2.2 years ago by Gordon Smyth32k • written 2.2 years ago by Mark Cowley400
1

Mark, thanks, that makes it clear.

limma will work fine with Detection as either p or 1-p. I guess that lumi will also.

ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by Gordon Smyth32k

Mark - thanks very much for responding to my mail! And thanks to Gordon and you both for advice on how this will work with limma and lumi.

ADD REPLYlink written 2.2 years ago by willj30
4
gravatar for Gordon Smyth
2.2 years ago by
Gordon Smyth32k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth32k wrote:

lumidat is not a Bioconductor package, so this is not the place to ask questions about it.

GenomeStudio is not consistent about how it outputs the Detection column. Some versions of GenomeStudio have written a p-value in this column, and some write 1-p instead.

My lab uses the read.idat() function in the limma package to read Illumina idat files. This feeds straight into the neqc() function for background correcting and normalizing the expression values. These functions automatically check for how the detection p-value has been represented in the idat file. They automatically convert back to proper p-values if 1-p has been given.

I don't know about lumidat, but it is possible that they have chosen to always convert to 1-p instead of p. You would have to write to the lumidat authors to find out.

neqc() doesn't damp down the signal as much as the lumi procedures do, see:

  http://www.ncbi.nlm.nih.gov/pubmed/20929874

Edit: limma::read.idat() and lumidat::read.ilmn.idat() seem to be similar in intention, and both should work fine. read.idat() will pass through the Detection column as found in the idat file, whereas read.ilmn.idat() resets the Detection column. Either should work fine with neqc() and downstream functions.

BTW, neqc() has the nice feature that it figures out what the negative control values must have been from the detection p-values, then uses the negative control values to set the parameters of the normexp background correction step.

ADD COMMENTlink modified 2.2 years ago • written 2.2 years ago by Gordon Smyth32k

Dear Gordon, thanks very much for explaining this about the GenomeStudio output and how it is interpreted by read.idat(), and for your advice on using your neqc function. I will try to find a way to contact the lumidat authors (edit: Mark Cowley commented below), but I do suggest that it was still good to post the question here because it is relevant to downstream analyses in Bioconductor packages like limma and lumi.

 

ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by willj30
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 148 users visited in the last hour