Technical normalization of single-channel array data with limma?
@darioveneziano-8833
United States

Greetings,

As I have mentioned in other questions I asked recently, I'm working with single-channel array data on small ncRNAs, detected with a custom array and analyzed with GenePix. I have several GPR files that I've managed to process with the limma package: I've read in the data and performed background correction, and now I need to normalize before I proceed with differential expression analysis. So my question is: is it possible to take the control probes into account and perform a "within-array" normalization before calling normalizeBetweenArrays()?

Each array has positive and negative controls. The negative controls are essentially randomers that should not be expressed. So what I'm trying to understand is whether it is possible, or advisable, to perform a sort of "technical" normalization on each array on the basis of these negative controls, before performing between-array normalization.

Forgive me if this is a trivial question. I am aware that within-array normalization is not applied to single-channel data, but I'm a newbie at this, and I feel the custom array I'm dealing with puts me in a particular context that I'm not sure how best to handle. I would like to take into account (normalize on) the negative controls (randomers) before normalizing between arrays. Is this possible? If so, how advisable is it? At what point in the limma pipeline, if ever, would these controls be taken into account? Am I missing something?

Thanks so much for any help/suggestion/elucidation you can provide!

~Dario

 

limma single genepix normalization technical
@gordon-smyth
WEHI, Melbourne, Australia

Personally, I don't think that trying to "normalize" on control probes within each array is likely to be a good idea. I don't do that myself for any type of single-channel array. I do recommend using control probes for between-array normalization, if there are enough controls and they cover a wide enough range of intensities; see below.

It is possible to use negative control probes for background correction. This, however, requires that the negative controls behave like true probes for unexpressed transcripts, which is not always the case. (It works well for Illumina arrays, but they are an exception.)
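To make the normexp idea concrete, here is a minimal Python sketch (limma is an R package; this only illustrates the computation and is not limma's code). It assumes the convolution model normexp is named after: observed intensity X = S + B, with signal S exponential with mean alpha and background B normal with mean mu and standard deviation sigma. The corrected value is the conditional mean E[S | X = x], which is always positive. The parameters are crude moment estimates from the control probes (mean and SD of the negatives, mean excess of the regular probes over mu); that estimator and the function names are assumptions, not necessarily what neqc() does internally.

```python
import math
from statistics import mean, stdev


def fit_normexp_from_controls(negatives, regulars):
    """Crude moment estimates of the normexp parameters (an assumption):
    mu and sigma from the negative controls, alpha as the mean excess of
    the regular probes over mu."""
    mu = mean(negatives)
    sigma = stdev(negatives)
    if sigma <= 0:
        raise ValueError("negative controls must vary to estimate sigma")
    alpha = max(mean(regulars) - mu, 1e-6)  # keep the exponential mean positive
    return mu, sigma, alpha


def normexp_signal(x, mu, sigma, alpha):
    """E[S | X = x] for X = S + B, S ~ Exp(mean alpha), B ~ N(mu, sigma^2).

    Given x, S is normal with mean mu_sf and sd sigma, truncated to S > 0,
    so its mean is mu_sf + sigma * phi(z) / Phi(z) with z = mu_sf / sigma."""
    mu_sf = x - mu - sigma * sigma / alpha
    z = mu_sf / sigma
    if z < -30:
        # Far below background Phi(z) underflows; use the asymptotic value.
        return sigma * sigma / abs(mu_sf)
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    Phi = 0.5 * math.erfc(-z / math.sqrt(2.0))  # accurate in the lower tail
    return mu_sf + sigma * phi / Phi


# Hypothetical intensities for one array.
negatives = [95, 100, 105, 100, 98, 102]
regulars = [100, 150, 300, 1000, 5000]
params = fit_normexp_from_controls(negatives, regulars)
corrected = [normexp_signal(x, *params) for x in [80, 100, 120, 500, 5000]]
```

Values below the background level are shrunk towards small positive numbers rather than set to zero or made negative, and bright spots end up close to x minus mu. On real data the equivalent limma step would be neqc() or backgroundCorrect() with method="normexp".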

As a first step, I think it would be a good idea to at least try a reasonably standard normalization pipeline (e.g. with normexp background correction and quantile normalization) and see how the results look. You could also try neqc(), which stands for "normexp background correction and quantile normalization using control probes".
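Quantile normalization, the between-array step in that standard pipeline, forces every array to have the same intensity distribution: the k-th smallest value on each array is replaced by the mean of the k-th smallest values across all arrays. A minimal Python sketch, assuming background-corrected values for the same probes on every array (in limma the step is normalizeBetweenArrays() with method="quantile"). Ties are broken by position here, which is a simplification; the function name is hypothetical.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of arrays, each a list of intensities for
    the same probes in the same order."""
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must measure the same probes")
    # For each array, probe indices ordered from smallest to largest value.
    orders = [sorted(range(n_probes), key=a.__getitem__) for a in arrays]
    # Reference distribution: mean across arrays of the values at each rank.
    reference = [
        sum(a[order[k]] for a, order in zip(arrays, orders)) / len(arrays)
        for k in range(n_probes)
    ]
    # Give every probe the reference value for its rank on its own array.
    normalized = []
    for order in orders:
        out = [0.0] * n_probes
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


result = quantile_normalize([[5, 2, 3], [4, 1, 6]])
# -> [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```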

As a second possibility: some years ago I tried hard to do a good job of background correcting and normalizing Affymetrix microRNA arrays. We eventually recommended cyclic loess normalization with upweighting of selected control probes; see:

  http://www.ncbi.nlm.nih.gov/pubmed/23709276

This is what I would generally recommend for small RNA arrays with a wide range of positive and negative controls. However my overall conclusion was that normalization is hard for small RNA microarrays, so it is probably best to forget microarrays for small RNAs. Better to use RNA-seq instead.

So there are a few main suggestions: normexp or neqc for background correction, and quantile or cyclic loess normalization. Cyclic loess can use control probes in very flexible ways. Be aware, though, that if your arrays are so special that they need a custom normalization method, then we won't be able to design it for you, not having intimate knowledge of your arrays.
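The idea of cyclic loess with upweighted control probes can be sketched in Python as follows. This is a simplified illustration, not limma's implementation: for every pair of arrays it computes M (difference) and A (average) of the log intensities, fits a weighted local-linear smoother of M on A, and splits the fitted trend evenly between the two arrays, repeating for a few cycles. The prior weights let selected control probes count more in the fit; which probes get extra weight, and how much, are assumptions. In limma itself the corresponding function is normalizeCyclicLoess(), which takes a weights argument (check its help page for the exact options). Function names below are hypothetical.

```python
import math


def local_linear_fit(a, m, weights, span=0.7):
    """Loess-style smoother: at each A value, fit a weighted straight line
    to the points closer than the q-th nearest neighbour (q = span * n,
    rounded up), using tricube distance weights times the prior weights."""
    n = len(a)
    q = min(n, max(2, math.ceil(span * n)))
    fitted = []
    for a0 in a:
        radius = sorted(abs(ai - a0) for ai in a)[q - 1] or 1e-12
        sw = swa = swm = swaa = swam = 0.0
        for ai, mi, wi in zip(a, m, weights):
            u = abs(ai - a0) / radius
            kw = wi * (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0
            sw += kw
            swa += kw * ai
            swm += kw * mi
            swaa += kw * ai * ai
            swam += kw * ai * mi
        if sw <= 0.0:
            fitted.append(0.0)  # no weighted neighbours: no adjustment
            continue
        mean_a, mean_m = swa / sw, swm / sw
        var_a = swaa / sw - mean_a ** 2
        if var_a <= 1e-12:
            fitted.append(mean_m)  # neighbours share one A value: local mean
        else:
            slope = (swam / sw - mean_a * mean_m) / var_a
            fitted.append(mean_m + slope * (a0 - mean_a))
    return fitted


def cyclic_loess(arrays, weights=None, span=0.7, iterations=3):
    """Pairwise cyclic loess on log-scale arrays (lists of equal length)."""
    x = [list(map(float, arr)) for arr in arrays]
    w = weights if weights is not None else [1.0] * len(x[0])
    for _ in range(iterations):
        for i in range(len(x) - 1):
            for j in range(i + 1, len(x)):
                m = [xi - xj for xi, xj in zip(x[i], x[j])]
                a = [(xi + xj) / 2.0 for xi, xj in zip(x[i], x[j])]
                trend = local_linear_fit(a, m, w, span)
                # Remove the fitted M trend, half from each array.
                x[i] = [v - t / 2.0 for v, t in zip(x[i], trend)]
                x[j] = [v + t / 2.0 for v, t in zip(x[j], trend)]
    return x


# Hypothetical log intensities with an intensity-dependent bias on array 2;
# probes flagged as controls get five times the weight in every fit.
base = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
arrays = [base, [1.1 * v for v in base]]
is_control = [True, False, False, True, False, False, False, True]
w = [5.0 if c else 1.0 for c in is_control]
out = cyclic_loess(arrays, weights=w)
```

With weights of zero for all non-control probes, the trend would be estimated from the controls alone; intermediate weights, as here, let the controls dominate without ignoring the other probes.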

Note: Originally I assumed your arrays had just a handful of control probes. You later explained that more than a third of all the probes on your arrays are control probes. I have therefore edited my answer and widened my suggestions to include cyclic loess normalization.


Thank you for all your advice, Prof. Smyth.

The randomers used as negative controls on this array turned out to be not that random at times! A small but consistent minority of them have unexpectedly high counts. For this reason, I don't believe they would serve well for background correction. Unless there were a way to pinpoint and exclude that minority, perform neqc() and see what comes out.

At this point, I'd rather stick with the standard normalization pipeline you suggested. 

Also, I did not mention that the number of ncRNA molecules we intend to analyze with this custom microarray experiment is barely over 100. So there are very few spots for them, and about half as many randomers. What would you suggest in this case? I ask because, as you rightly pointed out in another post (https://stat.ethz.ch/pipermail/bioconductor/2013-November/056090.html), normalization can be trickier in this situation.

Thanks again for your help.

~Dario


Unless there were a way to pinpoint and exclude that minority, perform neqc() and see what comes out.

That's exactly what the robust=TRUE option to neqc() does. It is designed to ignore a minority of negative control probes with overly high intensities.
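To see why a robust option helps when a few randomers are bright, compare classical and robust estimates of the negative-control background level and spread. This Python sketch uses the median and the MAD (scaled by 1.4826 so it estimates the SD for normal data) as the robust estimators; that choice is an illustrative assumption, and limma's robust=TRUE may use a different robust estimator internally. The function name and data are hypothetical.

```python
from statistics import mean, median, stdev


def background_estimates(negatives, robust=False):
    """Return (location, scale) of the negative-control intensities.

    Classical: mean and standard deviation.
    Robust (illustrative choice): median and scaled MAD."""
    if not robust:
        return mean(negatives), stdev(negatives)
    centre = median(negatives)
    mad = median([abs(v - centre) for v in negatives])
    return centre, 1.4826 * mad


# Seven well-behaved randomers plus two "not so random" bright ones.
negatives = [10, 11, 9, 10, 12, 8, 10, 200, 250]
classical = background_estimates(negatives)
robust = background_estimates(negatives, robust=True)
```

The two bright randomers drag the classical mean above 50 and inflate the SD, whereas the robust estimates stay at the level of the well-behaved controls (a median of 10 and a scale of about 1.5).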

Whether this will work well for your arrays, of course, I can't say.


Thank you for the additional suggestions and references. I will take the time to study them thoroughly. However, I'm on a tight schedule and need to find an "optimal" solution under these circumstances.

At this point I'm tempted to try the standard pipeline and compare it to the neqc one. 

I realize that normexp plus cyclic loess normalization, as suggested in your paper, would be the best approach, but honestly I wouldn't know where to start: I'd need a general code example showing how to upweight the selected control probes and apply cyclic loess normalization with the limma package (I can't find anything like this in the User's Guide).

Last but not least, regarding intimate knowledge of the arrays, I thought I'd add something I've just realized is "pretty" important: my arrays have triplicate spots per gene. Each array is a collection of 16 12x12 subarrays, with 148 distinct small RNAs I wish to investigate, 38 distinct negative controls and 6 distinct positive controls; the rest are blanks. Every probe is present on each array in triplicate.

Given this very important piece of information (which I stupidly neglected until now), I suppose I must take this replication into account after background correction. I've read the example in section 16.4 of the limma User's Guide, but that is for two-color arrays. How can I account for the triplicates when normalizing between arrays (whether with quantile normalization, with or without the negative controls, or with cyclic loess) for these single-channel arrays, and then fit the best linear model for differential expression analysis?
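A quick check of the layout described above, assuming "16 12x12 subarrays" means 144 spots per subarray: 16 × 144 = 2,304 spots per array, of which (148 + 38 + 6) × 3 = 576 carry probes, leaving 1,728 blank spots. The simplest way to handle the triplicates is to average the three spots of each probe on the log scale after background correction and normalization, giving one value per probe per array; limma's avereps() does this for a matrix given a vector of probe IDs. The Python sketch below shows the same operation for a single array, assuming blank spots have already been dropped; the probe IDs, values and function name are made up.

```python
from collections import defaultdict


def average_replicate_spots(probe_ids, log_values):
    """Collapse spots that share a probe ID into their mean (log scale).
    Returns the probe IDs in first-seen order and the matching averages."""
    groups = defaultdict(list)
    for pid, value in zip(probe_ids, log_values):
        groups[pid].append(value)
    ids = list(groups)  # dicts preserve insertion order
    return ids, [sum(groups[p]) / len(groups[p]) for p in ids]


# Hypothetical array: one small RNA and one randomer, each spotted three times.
ids = ["smallRNA_1", "smallRNA_1", "smallRNA_1", "neg_1", "neg_1", "neg_1"]
values = [8.0, 8.2, 7.8, 3.0, 3.3, 2.7]
probes, averages = average_replicate_spots(ids, values)
```

Averaging throws away the information about how well the replicate spots agree. The limma alternative that keeps every spot is duplicateCorrelation(), whose consensus correlation is then passed to lmFit(); with so few probes, which route works better is a judgment call.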

I know this has gotten very complicated. I'm a total newbie, and although I'm learning a lot, I'm at a loss in such a particular context. Thank you so much for your patience and all the help you can give me; I truly appreciate it.

~Dario
