I'm trying to figure out which is the best model to go with in an experiment, so I'd appreciate any advice people can give!
I have a 450K experiment with ~200 samples. The samples are split by disease type (A, B, C, and D). I'm aiming to find probes that correlate with age.
I'm interested in looking at each disease type against age, as well as grouping them to look for probes that correlate with age in two or more disease types (A and B together, for example). So far I can see two ways to approach this:
1. Subset the input matrix to just the disease types I'm interested in, i.e. A, or A and B. Use the model ~Age and use topTable to look at the second coefficient (the first being the intercept). Advantages: simple, relatively easy to understand and trace back. Disadvantage: a new model has to be fitted for each test I want to carry out.
2. Throw all samples in and use an interaction model ~SampleType:Age. I can then look at each individual SampleType's correlation with age (topTable on the relevant coefficient). Then a contrast matrix such as (A+B+C+D)/4 would give me probes that, on average, have similar gradients across all SampleTypes? (Or should I be using a no-intercept model for this?)

Do the p-values from a regression against a continuous variable reflect how little variance there is around the fitted line?
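A minimal sketch of the two setups described above, on simulated data, may make the comparison easier to discuss. Everything in it is an assumption for illustration: the sample count, the simulated `Age` and `SampleType` vectors, the fake `mvals` matrix, and the renamed design columns are all hypothetical stand-ins for the real 450K data. The design matrices are built with base R; the limma calls run only if limma is installed.

```r
# Hypothetical stand-in data (not the real experiment).
set.seed(1)
n <- 40
SampleType <- factor(rep(c("A", "B", "C", "D"), each = n / 4))
Age <- round(runif(n, 20, 80))

# Approach 1: subset to the disease types of interest, then fit ~Age.
keep <- SampleType %in% c("A", "B")
design1 <- model.matrix(~Age, data = data.frame(Age = Age[keep]))
colnames(design1)  # "(Intercept)" "Age"; coefficient 2 is the age slope

# Approach 2: all samples, one age slope per disease type.
# As written, ~SampleType:Age gives a shared intercept plus four slopes.
design2 <- model.matrix(~SampleType:Age)
# Rename columns to plain names (illustrative naming choice).
colnames(design2) <- c("Intercept", "A_Age", "B_Age", "C_Age", "D_Age")

# Contrast averaging the four slopes, i.e. (A+B+C+D)/4.
avg_slope <- cbind(AvgSlope = c(Intercept = 0, A_Age = 0.25, B_Age = 0.25,
                                C_Age = 0.25, D_Age = 0.25))

# The corresponding limma calls, skipped if limma is not available.
if (requireNamespace("limma", quietly = TRUE)) {
  mvals <- matrix(rnorm(100 * n), nrow = 100)  # 100 fake probes
  # Approach 1: age slope within the A and B subset.
  fit1 <- limma::eBayes(limma::lmFit(mvals[, keep], design1))
  top1 <- limma::topTable(fit1, coef = 2)
  # Approach 2: average slope across all four disease types.
  fit2 <- limma::lmFit(mvals, design2)
  fit2c <- limma::eBayes(limma::contrasts.fit(fit2, contrasts = avg_slope))
  top2 <- limma::topTable(fit2c, coef = "AvgSlope")
}
```

In the second design, individual disease-type slopes can be inspected with `topTable` on `A_Age`, `B_Age`, and so on after `eBayes`, so one fit covers several tests, whereas the first approach needs a new subset and fit per question.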
Are my assumptions correct?
Is there a better approach that I've missed? (If not, which method would you recommend I go with?)