Question

Array Normalization on focussed array in Limma using R

reubenmcgregor88 • 0

I have been analysing protein array data with hundreds and thousands of proteins using Limma in R.

For normalisation I have been using the following:

y <- normalizeBetweenArrays(log2(exprs), method="quantile")

followed by box plots and density plots for QC, and then by model fitting for differential expression analysis in Limma.

However, we then chose the most promising 35 proteins and had a "focussed" array synthesised. We chose the 35 proteins that were highest in patients vs controls and ran them for many more patients and their controls. When we got the data back, I had a think about the analysis: normalising between arrays may be fine when there are many random proteins to bring between-array intensities to similar levels.

However, it seems to me (I am relatively new to array analysis, so I may be wrong) that if we have specifically chosen proteins based on low expression in some samples and high expression in others, this normalisation would not be valid, as the assumption behind it is that genes are expected to have low variation between samples. Is this correct?

If so, what kind of normalisation is more appropriate for this type of analysis?

Any guidance much appreciated.

EDIT: I have been toying with the idea of using:

y <- normalizeBetweenArrays(log2(exprs), method="cyclicloess")

Might this be more appropriate?

EDIT2: The array was a ProtoArray, and the analysis has actually already been done by someone from the provider of the service. I managed to repeat their analysis, getting the same values in Limma with the quantile normalisation mentioned above. My concern is that they may simply have run a standard analysis pipeline for large arrays without putting much thought into the different design of this array.

Note: all values are log2 transformations of the fluorescence data.

[Figure: boxplots of data before normalisation]

[Figure: boxplots of data after quantile normalisation]

[Figure: boxplots of data after cyclicloess normalisation]

limma microarray r • 1.9k views

Updated 5.9 years ago by Gordon Smyth • written 5.9 years ago by reubenmcgregor88

score 2 · Accepted Answer · 2019-01-09

You are right to recognise that there is a problem here, because the focussed array design is entirely confounded with DE for patients vs controls.

When we have made focused arrays in the past, we included control probes in order to permit normalization, see

https://genomebiology.biomedcentral.com/articles/10.1186/gb-2007-8-1-r2

Without control probes, there is frankly not much you can do. Switching to cyclic loess normalization will be slightly better because it is more tolerant of DE all in one direction than quantile is. Other than that, you just have to proceed and recognize that the patient vs control log fold changes will be under-estimated (less positive or more negative) because the changes will be partly normalized out.
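To make the "normalized out" effect concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration only: the data are simulated rather than taken from the array in question, every one of the 35 proteins is made higher in patients, and names such as `true_lfc` and `mean_lfc` are hypothetical helpers. Because quantile normalization forces every array to have the same distribution of values, the average patient vs control log fold change across proteins is driven to essentially zero in this setting. The cyclic loess figure is printed for comparison, and how much it recovers will depend on the data.

```r
library(limma)

set.seed(1)
n_proteins <- 35
n_per_group <- 10
group <- factor(rep(c("control", "patient"), each = n_per_group))
is_patient <- group == "patient"

# Simulated log2 intensities (assumption): a protein-specific baseline plus noise
baseline <- runif(n_proteins, min = 6, max = 12)
y <- matrix(rnorm(n_proteins * length(group), mean = baseline, sd = 0.3),
            nrow = n_proteins)

# Every protein is higher in patients by 0.5 to 2 log2 units (assumption)
true_lfc <- runif(n_proteins, min = 0.5, max = 2)
y[, is_patient] <- y[, is_patient] + true_lfc

# Average patient-minus-control log fold change over all proteins
mean_lfc <- function(x) {
  mean(rowMeans(x[, is_patient]) - rowMeans(x[, !is_patient]))
}

y_quantile <- normalizeBetweenArrays(y, method = "quantile")
y_cyclic   <- normalizeBetweenArrays(y, method = "cyclicloess")

lfc_summary <- c(raw         = mean_lfc(y),
                 quantile    = mean_lfc(y_quantile),
                 cyclicloess = mean_lfc(y_cyclic))
print(round(lfc_summary, 3))
```

Individual proteins can still show non-zero fold changes after normalization, but they are shifted relative to the truth, which is the under-estimation described above.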

The boxplots that you've done don't really help. They don't allow you to see the problem. An MA or MD plot comparing patient samples to control samples might show an increasing trend, which would be symptomatic of the problem.
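A group-level MD plot of this kind can be sketched in base R as follows. This is only an illustration under stated assumptions: the matrix `y` is simulated, the group labels are made up, and `md_values` is a hypothetical helper, not a limma function. It plots the patient-minus-control difference in mean log2 intensity (M) against the overall mean log2 intensity (A), with a lowess curve to make any trend easier to see.

```r
set.seed(2)
n_proteins <- 35
group <- factor(rep(c("control", "patient"), each = 10))

# Simulated log2 intensities (assumption); replace with your own matrix
y <- matrix(rnorm(n_proteins * length(group),
                  mean = runif(n_proteins, min = 6, max = 12), sd = 0.3),
            nrow = n_proteins)

# Hypothetical helper: per-protein average (A) and group difference (M)
md_values <- function(x, group, case = "patient", ref = "control") {
  data.frame(
    A = rowMeans(x),
    M = rowMeans(x[, group == case, drop = FALSE]) -
        rowMeans(x[, group == ref, drop = FALSE])
  )
}

md <- md_values(y, group)

plot(md$A, md$M,
     xlab = "Average log2 intensity (A)",
     ylab = "Patient - control log2 difference (M)",
     main = "MD plot: patients vs controls")
abline(h = 0, lty = 2)                       # reference line at no difference
lines(lowess(md$A, md$M), col = "red")       # smoothed trend through the points
```

With real data, pass the log2 matrix and the sample grouping in place of the simulated `y` and `group`. limma also provides `plotMD` for related plots.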