Question

Illumina HT-12.v4: Averaging of technical replicates

Eleni Christodoulou

Singapore

Hello Bioconductor people,

I am analyzing a microarray gene expression dataset generated on the Illumina HumanHT-12 v4 platform, which contains several technical and biological replicates. I first load the raw image data into Genome Studio and generate the corresponding group and sample matrices. I am interested in the probe-level measurements. I export the raw data and then use Bioconductor's lumi and limma packages for preprocessing and differential expression analysis, respectively.

My question is whether I should either i) average the technical replicates in Genome Studio, use the Group_Probe_Profile as input to lumi, and then log-transform and normalize that table, OR ii) load the Sample_Probe_Profile into lumi, log-transform and normalize it, and then average the technical replicates using avearrays() from limma?

In brief: should I average the technical replicates in Genome Studio or in R? What do you usually do?

Thank you very much!

Eleni

Question by Eleni Christodoulou, 10.0 years ago (last updated by Gordon Smyth)
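For illustration, option ii) can be sketched in R with limma's avearrays(), which averages arrays (columns) that share an ID. This is a minimal sketch under stated assumptions: the matrix `E` is a simulated stand-in for already log-transformed, normalized expression values, and the sample layout (three samples, two arrays each) is hypothetical.

```r
# Sketch of option ii): normalize all arrays first, then average the
# technical replicates in R. The data below are simulated placeholders.
library(limma)

set.seed(1)
# Hypothetical layout: 3 samples (A, B, C), each hybridized on 2 arrays
sample.id <- c("A", "A", "B", "B", "C", "C")

# Stand-in for normalized log2 expression values (probes x arrays)
E <- matrix(rnorm(100 * 6, mean = 8), nrow = 100,
            dimnames = list(paste0("probe", 1:100), paste0("array", 1:6)))

# Average the arrays (columns) that share a sample ID;
# the result has one column per unique sample
E.avg <- avearrays(E, ID = sample.id)
dim(E.avg)       # 100 probes x 3 samples
colnames(E.avg)  # "A" "B" "C"
```

In a real analysis, `ID` would come from the sample sheet rather than being typed in by hand.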

score 1 · Answer 1 · 2016-01-08

Gordon Smyth

WEHI, Melbourne, Australia

Definitely average in R rather than in Genome Studio.

Do you need to average the technical replicates at all? In some cases it is better to keep them as separate arrays in the analysis and instead account for the correlation between them using duplicateCorrelation().

Have you tried using neqc() for the preprocessing?

Answer by Gordon Smyth, 10.0 years ago
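The duplicateCorrelation() alternative mentioned in the answer can be sketched as follows. This is a minimal example on simulated data, assuming a hypothetical layout of four biological samples (two per group), each hybridized twice. The sample ID is passed as `block`, so arrays from the same sample are treated as correlated rather than independent.

```r
# Sketch: keep technical replicates as separate arrays and model their
# correlation with duplicateCorrelation(), instead of averaging them.
# All data below are simulated placeholders.
library(limma)

set.seed(2)
ngenes <- 500
# Hypothetical design: 4 biological samples, 2 per group,
# each hybridized on 2 arrays (8 arrays in total)
sample.id <- factor(rep(c("S1", "S2", "S3", "S4"), each = 2))
group <- factor(rep(c("ctrl", "trt"), each = 4))

# Simulated log2 expression: a shared per-sample effect makes
# arrays from the same sample correlated
sample.effect <- matrix(rnorm(ngenes * 4, sd = 0.5), ngenes, 4)
E <- sample.effect[, as.integer(sample.id)] +
  matrix(rnorm(ngenes * 8, mean = 8, sd = 0.3), ngenes, 8)

design <- model.matrix(~ group)

# Estimate the consensus correlation between arrays from the same sample
corfit <- duplicateCorrelation(E, design, block = sample.id)
corfit$consensus.correlation

# Fit the linear model using that correlation, then moderate the t-statistics
fit <- lmFit(E, design, block = sample.id,
             correlation = corfit$consensus.correlation)
fit <- eBayes(fit)
topTable(fit, coef = "grouptrt")
```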

Dear Gordon,

Thank you very much for your prompt and to-the-point response. I have not tried neqc() for preprocessing. What are its advantages? I will reconsider whether I need to average at all; thank you so much for the comment!

Reply by Eleni Christodoulou, 10.0 years ago

neqc() is fast and easy, and makes good use of the control probes. It gives excellent noise control, as vst does, but doesn't attenuate the signal as much. See:

http://nar.oxfordjournals.org/content/38/22/e204

for a comparison of the different Illumina preprocessing methods.

Reply by Gordon Smyth, 10.0 years ago

Thank you, Gordon, for the information. It looks interesting. I have been following the procedure suggested in the lumi user guide (from Bioconductor), which does not mention neqc(). To make things clear in my mind, do you suggest that I:

1) Load the Sample_Probe_Profile that Genome Studio exports

2) Run neqc() instead of the log2 or vst transformation, and

3) Then proceed to normalization (e.g. quantile)?

Thank you very much,

Eleni

Reply by Eleni Christodoulou, 10.0 years ago

Just follow the case study in Section 17.3 of the limma User's Guide. neqc() already performs the normalization (background correction followed by quantile normalization), so there is no need for step 3. Ideally you would have the control_probe_profile file as well, but neqc() can work without it.

Reply by Gordon Smyth, 10.0 years ago
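A minimal sketch of the neqc() step on a small simulated EListRaw, assuming the negative control probes are flagged in `genes$Status` (as they would be when a control probe profile is read in). The probe counts, intensity distributions and the detection filter (p < 0.05 on at least 2 arrays) are hypothetical choices for illustration, not values taken from the limma User's Guide.

```r
# Sketch of the neqc() route. In real use x would come from read.ilmn(),
# e.g. read.ilmn(files = <sample probe profile>, ctrlfiles = <control probe
# profile>, other.columns = "Detection"); here a small EListRaw is simulated.
library(limma)

set.seed(3)
narrays <- 4
nneg <- 200   # hypothetical number of negative control probes
nreg <- 1000  # hypothetical number of regular probes
status <- c(rep("negative", nneg), rep("regular", nreg))

# Raw intensities: normal background for all probes plus an
# exponential signal for regular probes only
bg <- matrix(rnorm((nneg + nreg) * narrays, mean = 100, sd = 10), ncol = narrays)
signal <- rbind(matrix(0, nneg, narrays),
                matrix(rexp(nreg * narrays, rate = 1 / 200), nreg, narrays))
raw <- bg + signal
colnames(raw) <- paste0("array", seq_len(narrays))

# Simulated detection p-value: fraction of negative controls on the same
# array that are brighter than the probe
detp <- sapply(seq_len(narrays), function(j) {
  neg <- raw[status == "negative", j]
  vapply(raw[, j], function(v) mean(neg > v), numeric(1))
})

x <- new("EListRaw", list(E = raw,
                          genes = data.frame(Status = status, stringsAsFactors = FALSE),
                          other = list(Detection = detp)))

# neqc(): normexp background correction using the negative controls,
# quantile normalization and log2 transformation in one call,
# so no separate normalization step follows
y <- neqc(x)

# Keep probes detected (p < 0.05) on at least 2 arrays (arbitrary threshold)
expressed <- rowSums(y$other$Detection < 0.05) >= 2
y <- y[expressed, ]
dim(y$E)
```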

Great, thank you very much! I don't have the control_probe_profile file but, as you say, I can do without it.

Best wishes,

Eleni

Reply by Eleni Christodoulou, 10.0 years ago

Since you are running Genome Studio yourself, you must be able to export the control probe profiles.

Nevertheless, if you can't figure out how to do that, neqc() will infer what the control profiles must have been from the detection p-values.

Reply by Gordon Smyth, 10.0 years ago

Thank you very much!

Reply by Eleni Christodoulou, 10.0 years ago

Hmm, I realized I need to combine this dataset with an older one, so I think I will need to pool the data first and then normalize, right? Just to make sure: I take the expressed y object from each dataset, extract its $E component, bind the two tables together, and then apply neqc() to the combined table. Is this correct? I am sorry if this is obvious; I just wanted to be sure.

Thank you very much,

Eleni

Reply by Eleni Christodoulou, 10.0 years ago

No, you can't simply extract an $E component from a dataset and then use neqc(), because the $E component doesn't contain any information about control probes.

Ideally you would read in all the combined data again, both new and old data, and preprocess it all together from scratch.

Reply by Gordon Smyth, 10.0 years ago
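To illustrate preprocessing everything together from scratch, here is a sketch that pools two raw datasets before running neqc() once. It assumes both datasets are EListRaw objects that still contain their negative control probes and share the same probes in the same order; `simulate_raw()` is a hypothetical helper that only generates placeholder data.

```r
# Sketch: pool two raw datasets and preprocess them together with neqc(),
# rather than binding already-normalized $E matrices. Data are simulated;
# in practice both would be read from Genome Studio exports with read.ilmn().
library(limma)

set.seed(4)
nneg <- 200   # hypothetical number of negative control probes
nreg <- 1000  # hypothetical number of regular probes
status <- c(rep("negative", nneg), rep("regular", nreg))

# Hypothetical helper: simulate a raw EListRaw with negative controls.
# Both batches share the same probes in the same order.
simulate_raw <- function(narrays, bg.mean) {
  bg <- matrix(rnorm((nneg + nreg) * narrays, mean = bg.mean, sd = 10),
               ncol = narrays)
  signal <- rbind(matrix(0, nneg, narrays),
                  matrix(rexp(nreg * narrays, rate = 1 / 200), nreg, narrays))
  new("EListRaw", list(E = bg + signal,
                       genes = data.frame(Status = status, stringsAsFactors = FALSE)))
}

new.raw <- simulate_raw(narrays = 4, bg.mean = 100)
old.raw <- simulate_raw(narrays = 3, bg.mean = 120)

# Combine at the raw level: the control probes are still present,
# so neqc() can use them for background correction
combined <- cbind(new.raw, old.raw)

# One neqc() call: background correction, quantile normalization
# across all 7 arrays, and log2 transformation
y <- neqc(combined)
dim(y$E)  # one column per array (7 in total)
```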

Thank you very much, Gordon! Good thing I asked!

Reply by Eleni Christodoulou, 10.0 years ago

Dear Gordon,

I am sorry to come back to this again, but something is not clear in my mind. After applying neqc() and keeping only the truly expressed probes, we have the normalized, expressed data in the $E component, right? Can I use only this $E component to create the design matrix and fit a model?

Thank you very much

Reply by Eleni Christodoulou, 10.0 years ago