aec
Dear all,
I have more than 500 RNA-seq samples and need to compare cases vs. controls. I first ran SVA to remove unknown variation and found >500 surrogate variables. Is it good practice to perform an LRT with DESeq2, where the full model is ~case+SV1+SV2+...+SVn and the reduced model is ~case, to decide how many surrogate variables to add without overfitting? The idea would be to first add SV1 to the full model, then SV1+SV2, then SV1+SV2+SV3, and so on, and stop when the number of differentially expressed genes decreases relative to the previous model.
I think something went wrong with your estimation of SVs. Can you post all your code and sessionInfo()?
What do you get with the default method "be"?
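For reference, a minimal sketch of what checking the "be" estimate could look like. Everything here is an assumption for illustration: the counts are simulated (with one hidden two-level factor), and the object names `cts`, `case`, `mod`, `mod0` and `n_sv` are hypothetical, not taken from the original poster's code. `num.sv()` accepts `method = "be"` (the default) or `method = "leek"`; since `num.sv()` works on continuous data, the counts are log-transformed first.

```r
library(sva)

set.seed(1)
n_genes   <- 1000
n_samples <- 40

# Simulated design: known case/control label plus one hidden factor.
case   <- factor(rep(c("control", "case"), each = n_samples / 2),
                 levels = c("control", "case"))
hidden <- rep(c(0, 1), times = n_samples / 2)

# Simulated negative binomial counts; the hidden factor shifts 200 genes.
mu <- matrix(100, nrow = n_genes, ncol = n_samples)
mu[1:200, hidden == 1] <- 300
cts <- matrix(rnbinom(n_genes * n_samples, mu = mu, size = 10),
              nrow = n_genes)

# Full model contains the variable of interest, null model only the intercept.
mod  <- model.matrix(~ case)
mod0 <- model.matrix(~ 1, data = data.frame(case))

# Estimate the number of surrogate variables with the default "be" method.
n_sv <- num.sv(log(cts + 1), mod, method = "be")
print(n_sv)
```

If the value printed here is far smaller than what you got before, that would point to the estimation method rather than the data.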
I'll wait to see Jeff's answer, but this seems to be an issue.
I typically use a small number of SVs. Even with hundreds of samples, I usually find that 1-10 SVs or RUV factors are sufficient to capture technical variance.
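As an illustration of fixing a small number of SVs, here is a rough sketch on simulated data. The choice of two SVs, the simulated counts, and the names `cts`, `coldata`, `SV1` and `SV2` are all assumptions made for this example; how many SVs suit a real dataset is a judgment call.

```r
library(DESeq2)
library(sva)

set.seed(42)
n_genes   <- 1000
n_samples <- 40

# Simulated sample table with the case/control label.
coldata <- data.frame(
  case = factor(rep(c("control", "case"), each = n_samples / 2),
                levels = c("control", "case")),
  row.names = paste0("sample", seq_len(n_samples))
)

# Simulated counts with a hidden two-level factor affecting 200 genes.
hidden    <- rep(c(0, 1), times = n_samples / 2)
base_mean <- 2^runif(n_genes, 5, 10)
mu <- matrix(base_mean, nrow = n_genes, ncol = n_samples)
mu[1:200, hidden == 1] <- mu[1:200, hidden == 1] * 3
cts <- matrix(rnbinom(n_genes * n_samples, mu = mu, size = 10),
              nrow = n_genes,
              dimnames = list(paste0("gene", seq_len(n_genes)),
                              rownames(coldata)))

dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata,
                              design = ~ case)
dds <- estimateSizeFactors(dds)

# Estimate SVs on normalized counts, dropping very low-count genes.
norm_cts <- counts(dds, normalized = TRUE)
keep     <- rowMeans(norm_cts) > 1
mod  <- model.matrix(~ case, colData(dds))
mod0 <- model.matrix(~ 1, colData(dds))
sv_fit <- svaseq(norm_cts[keep, ], mod, mod0, n.sv = 2)  # fixed, small n.sv

# Add the SVs as covariates and test the case effect while adjusting for them.
dds$SV1 <- sv_fit$sv[, 1]
dds$SV2 <- sv_fit$sv[, 2]
design(dds) <- ~ SV1 + SV2 + case
dds <- DESeq(dds)
res <- results(dds, contrast = c("case", "case", "control"))
summary(res)
```

Putting the variable of interest last in the design keeps the SVs as adjustment covariates while the reported contrast remains case vs. control.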
Thanks, Michael.