Hi all / Michael Love,
Is there a way in the DESeq2 package to help with this kind of problem? For example, I can think of tens of variables that might affect gene expression (genotypes of a couple of genes, sex, age, PMI, RIN, RIN^2, mapping rate, batch, ...). Obviously, I should include only a limited number of them in the design.
How should I choose these variables? And how does the number of samples limit how many variables I can include while still getting robust estimates?
In the literature, I have seen people use PC1 (the first principal component of the expression data) as the response variable and then fit an ANOVA model to test how much each candidate variable contributes to PC1. This seems reasonable, but it has obvious limitations. In particular, PC1 is sometimes dominated by a single factor (such as batch), in which case the later components (PC2, PC3, etc.) may also be needed to identify other factors.
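
To make the PC1 idea concrete, here is a rough sketch in plain Python (not DESeq2 code) of what I understand the approach to be. The toy expression matrix, the `batch` and `sex` labels, and the helper names `pc1_scores` and `anova_f` are all made up for illustration. Real counts would first need normalization and a variance-stabilizing transformation, and in R one would presumably use something like `prcomp` and `aov` instead.

```python
# Hypothetical toy data: 4 genes (rows) x 6 samples (columns), already on a log-like scale.
expr = [
    [5.0, 5.2, 4.9, 8.1, 8.0, 7.9],  # strong batch effect
    [3.0, 3.1, 2.9, 6.2, 5.9, 6.0],  # strong batch effect
    [7.0, 7.4, 7.1, 7.3, 7.0, 7.5],  # mild sex effect
    [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # noise only
]
batch = ["A", "A", "A", "B", "B", "B"]  # assumed sample annotations
sex = ["M", "F", "M", "F", "M", "F"]


def pc1_scores(matrix, n_iter=200):
    """Return unit-norm PC1 scores per sample (sign is arbitrary)."""
    n = len(matrix[0])
    # Center each gene across samples.
    centered = []
    for row in matrix:
        mean = sum(row) / n
        centered.append([v - mean for v in row])
    # Sample-by-sample Gram matrix; its top eigenvector gives the PC1 scores.
    gram = [[sum(r[i] * r[j] for r in centered) for j in range(n)]
            for i in range(n)]
    # Power iteration from a fixed, non-constant starting vector.
    vec = [float(i + 1) for i in range(n)]
    for _ in range(n_iter):
        new = [sum(gram[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in new) ** 0.5
        vec = [x / norm for x in new]
    return vec


def anova_f(values, groups):
    """One-way ANOVA F-statistic of `values` grouped by `groups`."""
    grand = sum(values) / len(values)
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    means = {g: sum(vs) / len(vs) for g, vs in by_group.items()}
    ss_between = sum(len(vs) * (means[g] - grand) ** 2
                     for g, vs in by_group.items())
    ss_within = sum((v - means[g]) ** 2
                    for g, vs in by_group.items() for v in vs)
    df_between = len(by_group) - 1
    df_within = len(values) - len(by_group)
    return (ss_between / df_between) / (ss_within / df_within)


scores = pc1_scores(expr)
f_batch = anova_f(scores, batch)
f_sex = anova_f(scores, sex)
print("F for batch on PC1:", f_batch)
print("F for sex on PC1:", f_sex)
```

On this toy data the F-statistic for batch comes out much larger than the one for sex, which is exactly the "PC1 dominated by batch" situation described above.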
Any suggestions? (I know this is not purely a DESeq2 question, but I guess Michael may have some ideas about it :-) )
Thanks in advance,