I'm not familiar with how the statistics behind RMA work.
So my question is: can I safely apply RMA to each disease group separately (RMA on cases, RMA on controls) and combine the results afterward? Or would an algorithm that accounts for differences between disease labels (e.g. qsmooth) be a better option?
RMA is still by far the most accepted method of preprocessing Affymetrix data. Personally, I would use RMA and not give it a second thought. qsmooth might also be good but I am not familiar with it.
The papers you cite have not received a high level of acceptance. I have not read them in detail but I suspect that they are rediscovering the well-known fact that global normalization assumes that the majority of genes are not DE, and some other strategy may be required if you want to detect global expression changes in one direction.
Applying RMA on each disease group separately sounds very dangerous to me.
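One way to see both points is a small base-R simulation of the normalization step alone. This is only a sketch under stated assumptions: `quantile_normalize` is a hypothetical helper written for this illustration (it is not the code used inside `rma`), the data are simulated, and the 0.5 offset in the case arrays is an invented technical shift with no true differential expression behind it.

```r
# Hypothetical helper: plain quantile normalization of a probes x arrays matrix.
quantile_normalize <- function(x) {
  sorted <- apply(x, 2, sort)            # sort each array's values
  ref <- rowMeans(sorted)                # reference distribution shared by all arrays
  out <- apply(x, 2, function(col) ref[rank(col, ties.method = "first")])
  dimnames(out) <- dimnames(x)
  out
}

set.seed(1)
n_probes <- 1000
group <- rep(c("ctrl", "case"), each = 3)

# Simulated log2 intensities: identical true signal in both groups (no DE),
# plus an assumed technical offset of 0.5 on the case arrays.
truth  <- rnorm(n_probes, mean = 8)
noise  <- matrix(rnorm(n_probes * 6, sd = 0.1), ncol = 6)
offset <- c(0, 0, 0, 0.5, 0.5, 0.5)
x <- truth + noise + rep(offset, each = n_probes)
colnames(x) <- paste0(group, 1:3)

# Normalize all arrays together versus each group on its own.
joint <- quantile_normalize(x)
separate <- cbind(quantile_normalize(x[, group == "ctrl"]),
                  quantile_normalize(x[, group == "case"]))

# Average case-minus-control difference across all probes.
group_diff <- function(m) {
  mean(rowMeans(m[, group == "case"]) - rowMeans(m[, group == "ctrl"]))
}

group_diff(joint)     # offset removed
group_diff(separate)  # offset still present for every probe
```

In this toy setting, joint normalization forces every array onto the same distribution, so the offset disappears. Per-group normalization gives each group its own reference distribution, so the offset survives and every probe looks shifted. The same mechanism is why joint normalization would also erase a genuine global change in one direction, which is the assumption mentioned above.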
Thanks for the recommendation, Gordon. I'd like to add an addendum in case anyone else has this question.
Like you recommended, I did not do RMA on each disease case separately, but I tried to use qsmooth. In the RMA pipeline:
1. Background correction
2. Normalization
3. PM correction
4. Summarization
5. Log2 transformation
Step 2 is done with global quantile normalization. To see whether qsmooth had a big effect, I ran this pipeline twice: once with global quantile normalization on the PM probes in step 2, and once with qsmooth on the PM probes in step 2. I then compared the two results by computing, for each sample, the correlation between its expression values from the two runs:
diag(cor(exprs(global), exprs(qsmoothed)))
The per-sample correlations between the quantile-normalized and qsmooth-normalized data ranged from 0.9999979 to 0.9999996.
So if your case and control phenotypes differ by small amounts across a wide range of genes, as in a complex neurodegenerative disease, then using qsmooth instead of standard global quantile normalization makes a negligible difference, and I would recommend against it in such cases.
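For readers unfamiliar with the comparison, here is a minimal base-R sketch of what `diag(cor(A, B))` computes. The matrices `a` and `b` are simulated stand-ins for `exprs(global)` and `exprs(qsmoothed)`; the sizes and noise level are assumptions chosen only for illustration.

```r
set.seed(2)
# Two hypothetical expression matrices (500 features x 4 samples)
# that are nearly identical, as two normalizations of the same arrays might be.
a <- matrix(rnorm(500 * 4, mean = 8), ncol = 4,
            dimnames = list(NULL, paste0("s", 1:4)))
b <- a + matrix(rnorm(500 * 4, sd = 0.01), ncol = 4)

# cor(a, b) correlates every column of a with every column of b;
# the diagonal keeps only the matching pairs (sample j versus sample j).
per_sample <- diag(cor(a, b))
range(per_sample)

# The same values, computed one sample at a time.
per_sample2 <- sapply(seq_len(ncol(a)), function(j) cor(a[, j], b[, j]))
```

This relies on both matrices having their samples in the same column order; otherwise the diagonal pairs up the wrong samples.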
Just for the record, the log2 transformation comes before summarization. I am less clear about PM correction, but I believe it comes before normalization if done at all.
There is no 'PM correction' done when running RMA, assuming that the OP means something like subtracting the MM probe values like they did with MAS5.0.
Checking from the source code for rma in affy, the "PM Correction" that was done equates to just taking the PM probe subset, no subtraction or anything like that. Is this correct? https://rdrr.io/bioc/affy/src/R/rma.R
Yes. I wouldn't characterize simply extracting the PM probes as a 'correction', but that is what happens.
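To make the step order concrete, here is a rough base-R sketch of the last two stages (log2 transform, then summarization) for a single probeset. It is an illustration under assumptions, not the `affy` implementation: the PM intensities are simulated, they are assumed to be already background-corrected and quantile-normalized, and `stats::medpolish` stands in for the median-polish summarization.

```r
set.seed(3)
# Simulated PM intensities for one probeset: 11 probes x 4 arrays.
probe_effect <- rnorm(11, sd = 0.5)        # probe-specific affinities (log2 scale)
array_level  <- c(7, 7.5, 8, 9)            # assumed true log2 expression per array
log_pm <- outer(probe_effect, array_level, "+") +
  matrix(rnorm(44, sd = 0.05), nrow = 11)
pm <- 2^log_pm                             # back to the raw intensity scale

# Take log2 first, then summarize probes with median polish
# (rows = probes, columns = arrays).
fit <- medpolish(log2(pm), trace.iter = FALSE)

# One log2 expression value per array: overall level plus the array (column) effect.
expr <- fit$overall + fit$col
diff(expr)                                 # compare with diff(array_level)
```

Only the PM values enter this calculation; there is no subtraction of MM values anywhere in the sketch, consistent with the answers above.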