I have been analysing protein array data with hundreds to thousands of proteins using limma in R.
For normalisation I have been using the following:
y <- normalizeBetweenArrays(log2(exprs), method="quantile")
followed by box plots and density plots for QC, and then model fitting for differential expression analysis in limma.
However, we then chose the 35 most promising proteins and had a "focussed" array synthesised: the 35 proteins with the highest expression in patients versus controls, run on many more patients and their controls. When we got the data back I thought about the analysis, and it occurred to me that normalising between arrays may be fine when there are many, mostly non-differential proteins to bring the between-array intensities to similar levels.
However, it seems to me (I am relatively new to array analysis, so I may be wrong) that if we have specifically chosen proteins because they are low in some samples and high in others, this normalisation would not be valid, as it assumes that most proteins show little variation between samples. Is this correct?
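To make the concern concrete, here is a small simulated sketch (toy numbers, not the real data) of what quantile normalisation does when most proteins on a focused array are genuinely higher in one group: it forces every array onto the same distribution, which removes the real group-wide shift.

```r
library(limma)

# Simulate a 35-protein focused array: 4 control and 4 patient samples,
# with a genuine +2 (log2) shift in patients for most proteins.
set.seed(1)
n <- 35
control <- matrix(rnorm(n * 4, mean = 8), n, 4)
patient <- matrix(rnorm(n * 4, mean = 8) + 2, n, 4)
exprs   <- cbind(control, patient)

norm <- normalizeBetweenArrays(exprs, method = "quantile")

# Before normalisation the patient-vs-control difference is ~2;
# after quantile normalisation every column has an identical distribution,
# so the group difference in means is exactly 0.
mean(exprs[, 5:8]) - mean(exprs[, 1:4])
mean(norm[, 5:8])  - mean(norm[, 1:4])
```

This is only a caricature (real arrays have technical as well as biological differences between columns), but it shows why applying a distribution-equalising normalisation to a panel selected for differential expression is risky.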
If so, what kind of normalisation is more appropriate for this type of array?
Any guidance much appreciated.
EDIT: I have been toying with the idea of using:
y <- normalizeBetweenArrays(log2(exprs), method="cyclicloess")
Would this be more appropriate?
EDIT 2: The array was a ProtoArray, and the analysis has actually already been done by someone from the provider of the service. I managed to reproduce their analysis in limma, getting the same values with the quantile normalisation mentioned above. My concern is that they may simply have run a standard pipeline for large arrays without putting much thought into the different design of this array.
Note: all values are log2-transformed fluorescence intensities.
Boxplots of data pre normalisation:
Boxplots of data post quantile normalisation:
Boxplots of data post cyclicloess normalisation: