Dear Community,
My current research involves developing an ML-based predictor model, for which I have chosen DESeq2 for normalization. I would appreciate any advice regarding some challenges I am facing.
In my study, I trained the model on RNA from blood samples of healthy donors (and validated it on an additional healthy cohort). I then tested the model using RNA from virally infected patients to quantify the degree of change.
For normalization, I used all the samples (both training and test) together in order to account for global RNA perturbations caused by infection, as suggested by our prior studies. Given this, I used all genes as "control genes," and normalizing only the healthy donors wasn't a viable option for me.
However, I am now encountering issues with using external datasets. Normalizing each external dataset separately, with its own RNA composition, seems nonsensical for testing. Alternatively, combining them with my current dataset and redoing the normalization would change the model (both for this and future data).
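To make the dilemma concrete, here is a minimal sketch in Python with NumPy, not DESeq2 itself. It assumes median-of-ratios size factors (the DESeq2 default) computed over all genes; the function names and toy counts are hypothetical. It contrasts re-normalizing a pooled dataset with scaling a new sample against a "frozen" reference, meaning the per-gene geometric means of the original dataset are simply reused.

```python
import numpy as np

def log_geo_means(counts):
    """Per-gene log geometric mean across samples (matrix is genes x samples)."""
    counts = np.asarray(counts, dtype=float)
    with np.errstate(divide="ignore"):
        log_counts = np.log(counts)
    # A gene with a zero count in any sample gets -inf and is skipped later.
    return log_counts.mean(axis=1)

def size_factors(counts, ref_log_geo_means):
    """Median-of-ratios size factor for each sample against a fixed reference."""
    counts = np.asarray(counts, dtype=float)
    usable_ref = np.isfinite(ref_log_geo_means)
    factors = []
    for j in range(counts.shape[1]):
        col = counts[:, j]
        keep = usable_ref & (col > 0)
        log_ratios = np.log(col[keep]) - ref_log_geo_means[keep]
        factors.append(np.exp(np.median(log_ratios)))
    return np.array(factors)

# Toy data (hypothetical): 4 genes x 3 samples standing in for the original dataset.
original = np.array([
    [10, 20, 40],
    [100, 200, 400],
    [5, 10, 20],
    [50, 100, 200],
])
# One hypothetical external sample measured on the same 4 genes.
external = np.array([[40], [400], [20], [200]])

# Reference computed once from the original dataset and then kept fixed.
frozen_ref = log_geo_means(original)
sf_original = size_factors(original, frozen_ref)
sf_external_frozen = size_factors(external, frozen_ref)

# Alternative: pool everything and recompute the reference from scratch.
pooled = np.hstack([original, external])
sf_pooled = size_factors(pooled, log_geo_means(pooled))

print("original:", sf_original)
print("external, frozen reference:", sf_external_frozen)
print("pooled re-normalization:", sf_pooled)
```

In this toy case the frozen reference leaves the original samples' size factors untouched, while pooling shifts them, which is the change to the model described above. In DESeq2, `estimateSizeFactors` accepts a `geoMeans` argument that might serve the same purpose, though whether that suits this design is an assumption to check.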
I would be very grateful for any suggestions to resolve this problem.
Cheers,
Alan
Thanks for the advice, Michael. I will try this with my future test datasets.