RMA on different disease labels
Joshua • 0
@f858240e
United States

I'm working with HGU133Plus2 datasets, and I'm trying to determine the best normalization procedure.

RMA seems to be the standard in the literature; however, some papers have been moving away from global normalization procedures:

I'm not familiar with how the statistics behind RMA work.

So my question is: can I safely apply RMA to each disease group separately (rma on cases, rma on controls) and combine the results afterward? Or would an algorithm that accounts for disease label differences (e.g. qsmooth) be a better option?

Thanks for any help

rma background microarray normalization • 300 views
@gordon-smyth
WEHI, Melbourne, Australia

RMA is still by far the most accepted method of preprocessing Affymetrix data. Personally, I would use RMA and not give it a second thought. qsmooth might also be good but I am not familiar with it.

The papers you cite have not received a high level of acceptance. I have not read them in detail, but I suspect that they are rediscovering the well-known fact that global normalization assumes that the majority of genes are not differentially expressed (DE), and that some other strategy may be required if you want to detect global expression changes in one direction.
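To make that assumption concrete, here is a minimal sketch of plain quantile normalization in Python. It is not the affy or preprocessCore code; the function name and the toy values are made up for illustration. It shows that a shift affecting every gene in one sample disappears after normalization, which is why a global change in one direction cannot be detected this way.

```python
from statistics import mean

def quantile_normalize(columns):
    """Quantile-normalize equal-length samples (one list of values per array)."""
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [mean(col[k] for col in sorted_cols) for k in range(n)]
    normalized = []
    for col in columns:
        # Rank each value within its own sample (ties broken by position).
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized

# Hypothetical toy data: sample_b is sample_a with every gene shifted up by 2,
# i.e. a global expression change in one direction.
sample_a = [5.0, 7.0, 6.0, 9.0, 8.0]
sample_b = [x + 2.0 for x in sample_a]

norm_a, norm_b = quantile_normalize([sample_a, sample_b])
print(norm_a)
print(norm_b)  # identical to norm_a: the global shift has been normalized away
```

Every sample is forced onto the same reference distribution, so only changes in the relative ranking of genes within a sample survive.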

Applying RMA on each disease group separately sounds very dangerous to me.
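One way to see the danger, at least for the quantile normalization step, is the following Python sketch. It is an illustration under assumptions, not a reproduction of RMA: the helper names and the toy values are hypothetical, and the probe-level model that RMA also fits per run is ignored. Normalizing each group on its own gives each group its own reference distribution, so any offset between the groups, technical or biological, is left in the data and is confounded with the disease label.

```python
from statistics import mean

def quantile_normalize(columns):
    """Quantile-normalize equal-length samples (one list of values per array)."""
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    reference = [mean(col[k] for col in sorted_cols) for k in range(n)]
    result = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        result.append(out)
    return result

def group_gap(arrays):
    """Per-gene mean(case) - mean(control); arrays ordered control, control, case, case."""
    n_genes = len(arrays[0])
    return [
        mean([arrays[2][g], arrays[3][g]]) - mean([arrays[0][g], arrays[1][g]])
        for g in range(n_genes)
    ]

# Hypothetical log2 values: same gene ranking on every array, but the case
# arrays are uniformly brighter by 1 unit (for example a scanner or batch effect).
control = [[4.0, 6.0, 5.0, 8.0], [4.5, 6.5, 5.5, 8.5]]
case = [[5.0, 7.0, 6.0, 9.0], [5.5, 7.5, 6.5, 9.5]]

joint = quantile_normalize(control + case)
separate = quantile_normalize(control) + quantile_normalize(case)

joint_gap = group_gap(joint)        # offset removed for every gene
separate_gap = group_gap(separate)  # offset kept for every gene
print(joint_gap, separate_gap)
```

With separate normalization every gene appears to differ between cases and controls, even though the only difference in this toy data is a uniform offset.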


Thanks for the recommendation Gordon,

I'd like to add an addendum in case anyone else has this question. As you recommended, I did not run RMA on each disease group separately, but I did try qsmooth. The RMA pipeline consists of:

1. Background correction
2. Normalization
3. PM correction
4. Summarization
5. Log2 transformation

Step 2 is done with global quantile normalization. To see whether qsmooth had a big effect, I ran this pipeline twice: once with global quantile normalization on the PM probes and once with qsmooth on the PM probes in step 2. I then compared the two results by computing, for each sample, the correlation between its expression values under the two pipelines, using diag(cor(exprs(global), exprs(qsmoothed)))

The range of correlation values for data when preprocessed with quantile normalization or qsmooth was 0.9999979-0.9999996.
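For readers who do not use R, the comparison above can be sketched in Python as follows. cor() on two matrices correlates every column of the first with every column of the second, and diag() keeps only the entries where a sample is paired with itself. The function names and the toy numbers below are hypothetical, not the values from my data.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def per_sample_correlation(expr_a, expr_b):
    """Correlate each sample with itself across two preprocessing pipelines.

    expr_a and expr_b map sample name -> expression values, with genes in the
    same order in both.
    """
    return {sample: pearson(expr_a[sample], expr_b[sample]) for sample in expr_a}

# Hypothetical toy values standing in for exprs(global) and exprs(qsmoothed).
global_norm = {"s1": [1.0, 2.0, 3.0, 4.0], "s2": [2.0, 4.0, 6.0, 9.0]}
qsmoothed = {"s1": [1.1, 2.0, 3.2, 3.9], "s2": [2.0, 4.0, 6.0, 9.0]}

cors = per_sample_correlation(global_norm, qsmoothed)
print(min(cors.values()), max(cors.values()))  # range of per-sample correlations
```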

So if your case and control phenotypes differ only slightly across a wide range of genes, as in complex neurodegenerative disease, then using qsmooth instead of standard global quantile normalization makes a negligible difference, and I would recommend against it in these cases.


Just for the record, the log2 transformation comes before summarization. I am less clear about PM correction, but I believe it comes before normalization if done at all.


There is no 'PM correction' done when running RMA, assuming that the OP means something like subtracting the MM probe values, as was done in MAS5.0.


Checking the source code for rma in affy, the "PM correction" step amounts to just taking the PM probe subset, with no subtraction or anything like that. Is this correct? https://rdrr.io/bioc/affy/src/R/rma.R


Yes. I wouldn't characterize simply extracting the PM probes as a 'correction', but that is what happens.
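Putting the corrections in this thread together, the order is: background-correct the PM probes, normalize, take log2, then summarize. The Python sketch below covers only the last two steps for a single probe set, using Tukey's median polish for the summarization. It is a rough illustration under assumptions, not the compiled code that affy runs: the function name and intensities are hypothetical, and the input is assumed to be already background-corrected and normalized.

```python
from math import log2
from statistics import median

def median_polish_summary(rows, max_iter=10, tol=1e-8):
    """Tukey's median polish on a probes x samples matrix of log2 values.

    Returns one summary per sample: overall effect + that sample's column effect.
    """
    resid = [list(r) for r in rows]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(max_iter):
        # Sweep out row (probe) medians.
        row_med = [median(r) for r in resid]
        for i in range(n_rows):
            resid[i] = [v - row_med[i] for v in resid[i]]
            row_eff[i] += row_med[i]
        shift = median(col_eff)
        col_eff = [c - shift for c in col_eff]
        overall += shift
        # Sweep out column (sample) medians.
        col_med = [median(resid[i][j] for i in range(n_rows)) for j in range(n_cols)]
        for i in range(n_rows):
            resid[i] = [resid[i][j] - col_med[j] for j in range(n_cols)]
        col_eff = [c + m for c, m in zip(col_eff, col_med)]
        shift = median(row_eff)
        row_eff = [r - shift for r in row_eff]
        overall += shift
        if max(abs(m) for m in row_med + col_med) < tol:
            break
    return [overall + c for c in col_eff]

# Hypothetical PM intensities for one probe set: 3 probes x 3 samples, assumed
# already background-corrected and normalized.
pm = [
    [2.0 ** 8, 2.0 ** 9, 2.0 ** 10],
    [2.0 ** 9, 2.0 ** 10, 2.0 ** 11],
    [2.0 ** 7, 2.0 ** 8, 2.0 ** 9],
]

log_pm = [[log2(v) for v in row] for row in pm]  # log2 comes first ...
summary = median_polish_summary(log_pm)          # ... then summarization
print(summary)
```

Because the polish is fitted on the log2 scale, probe affinities act as additive row effects and drop out of the per-sample summaries.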