5 months ago by
United Kingdom / London / Francis Crick Institute
This might be worth getting a local statistician on board for, because there are multiple subtly different approaches depending on precisely what you mean. For example, in question 1, let's just take the case of CSF, so we subset the data down to 24hr & CSF. We could then fit a model ~ severityFactor, where levels(severityFactor)=c("ctrl", "A","B","C"). We could do an LRT against a reduced model of ~1, take the significant genes from that, put them into a heatmap, and then see the relative sizes of the clusters that correspond to the patterns you're looking for in question 2. Or we could do individual Wald tests on the original model, and combine the gene sets to look at Q2. Or we could take the model ~as.integer(severityFactor), and use the Wald test to look for linear trends (on a log-expression scale!) going with severity. To address the 'AND/OR' part - you could repeat this on the other subset of the data, or you could nest the model within the CSFBlood factor (at which point you'd want to model the patient effect).
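To make the difference between those codings concrete, here is a rough sketch. The sample sheet (three samples per level) and the function name `fit_severity_models` are made up for illustration; the function assumes you already have a count matrix `counts` and a matching `coldata` for the 24hr CSF subset, and it is not called here.

```r
# Toy sample sheet for the 24hr CSF subset (hypothetical: 3 samples per level).
samples <- data.frame(
  severityFactor = factor(rep(c("ctrl", "A", "B", "C"), each = 3),
                          levels = c("ctrl", "A", "B", "C"))
)

# Factor coding: an intercept (ctrl) plus one coefficient per injury severity.
design_factor <- model.matrix(~ severityFactor, data = samples)

# Integer coding: a single slope, i.e. the same log-fold-change per severity step.
samples$severityScore <- as.integer(samples$severityFactor)  # ctrl=1, A=2, B=3, C=4
design_trend <- model.matrix(~ severityScore, data = samples)

# Sketch of the three fits with DESeq2 (assumes DESeq2 is installed and that
# `coldata` has a severityFactor column with the levels above).
fit_severity_models <- function(counts, coldata) {
  coldata$severityScore <- as.integer(coldata$severityFactor)
  dds_factor <- DESeq2::DESeqDataSetFromMatrix(
    countData = counts, colData = coldata, design = ~ severityFactor)
  dds_trend <- DESeq2::DESeqDataSetFromMatrix(
    countData = counts, colData = coldata, design = ~ severityScore)

  lrt   <- DESeq2::DESeq(dds_factor, test = "LRT", reduced = ~ 1)  # any difference at all
  wald  <- DESeq2::DESeq(dds_factor)                               # per-level Wald tests
  trend <- DESeq2::DESeq(dds_trend)                                # linear trend

  list(
    any_difference = DESeq2::results(lrt),
    C_vs_ctrl      = DESeq2::results(wald, contrast = c("severityFactor", "C", "ctrl")),
    linear_trend   = DESeq2::results(trend, name = "severityScore")
  )
}
```

The factor design spends three coefficients on severity, so the LRT asks "does anything differ?", whereas the integer design spends one, so it only has power for genes that move steadily from ctrl through to C.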
Again, question 3 is ambiguous as to the meaning of "between A, B, C and Control" - do you mean between any of the severities, or between pairs of severities, or between each of the injuries vs the no-injury control? I'd advise limiting yourself to the control and C samples for the time being, as there is only one interpretation there, and one would expect that comparison to show the biggest effect. You'd then want an LRT of ~ Severity + TimeFactor + Severity:TimeFactor, compared to a reduced model of ~ Severity + TimeFactor, and that would give you a list of genes where the time-profile isn't consistent across the severities. Or you might want to include a Patient main effect in both models, to adjust for different patient baselines. If you used a numeric version of time, you could test for polynomial trends of log-expression against time, ...
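A sketch of that full-vs-reduced comparison, under some assumptions: the time points other than 24hr ("48hr", "72hr") and the two replicates per cell are invented for the example, and `test_time_profiles` is a hypothetical wrapper that expects your own `counts` and `coldata` (restricted to ctrl and C). The Patient term is left out to keep it short.

```r
# Toy layout: ctrl vs C at three time points (48hr/72hr are made-up labels),
# two replicates per combination.
samples <- expand.grid(
  Severity   = factor(c("ctrl", "C"), levels = c("ctrl", "C")),
  TimeFactor = factor(c("24hr", "48hr", "72hr"),
                      levels = c("24hr", "48hr", "72hr")),
  replicate  = 1:2
)

full    <- model.matrix(~ Severity + TimeFactor + Severity:TimeFactor, data = samples)
reduced <- model.matrix(~ Severity + TimeFactor, data = samples)

# The LRT tests exactly the coefficients that the reduced model drops.
interaction_terms <- setdiff(colnames(full), colnames(reduced))

# Sketch of the corresponding DESeq2 call (assumes DESeq2 is installed and
# `coldata` has Severity and TimeFactor columns coded as above).
test_time_profiles <- function(counts, coldata) {
  dds <- DESeq2::DESeqDataSetFromMatrix(
    countData = counts, colData = coldata,
    design = ~ Severity + TimeFactor + Severity:TimeFactor)
  dds <- DESeq2::DESeq(dds, test = "LRT", reduced = ~ Severity + TimeFactor)
  DESeq2::results(dds)  # small p-value: time-profile differs between ctrl and C
}
```

Here `interaction_terms` holds the two Severity:TimeFactor columns, so a significant gene is one whose change over time in C is not just the ctrl change shifted by a constant.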
Q4: yes, you can keep all samples in the same experiment, in which case you'll need to decide whether you're pooling the tissues (so no need for a term in the model), analysing them separately (nest the models within tissue), looking for commonality against different baselines (include a main effect for tissue), or looking for differences in response between the tissues (include an interaction between tissue and the response you're looking for).
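Those four options map onto design formulas roughly as follows. This is only an illustration in base R: the column name `Tissue`, its levels, and the two replicates per cell are assumptions, and severity stands in for "the response you're looking for".

```r
# Toy layout: four severities in two tissues, two replicates each (hypothetical).
samples <- expand.grid(
  Severity  = factor(c("ctrl", "A", "B", "C"), levels = c("ctrl", "A", "B", "C")),
  Tissue    = factor(c("CSF", "Blood"), levels = c("CSF", "Blood")),
  replicate = 1:2
)

designs <- list(
  pooled          = ~ Severity,                            # tissue ignored
  common_response = ~ Tissue + Severity,                   # separate baselines, shared response
  tissue_specific = ~ Tissue + Severity + Tissue:Severity  # response may differ by tissue
)

# Number of coefficients each design estimates.
n_coef <- sapply(designs, function(f) ncol(model.matrix(f, data = samples)))

# Analysing the tissues separately: one ~ Severity design per tissue subset.
per_tissue <- lapply(split(samples, samples$Tissue),
                     function(d) model.matrix(~ Severity, data = d))
```

Comparing `n_coef` across the designs shows what each choice costs: the main effect adds one baseline shift for tissue, while the interaction adds a separate severity effect per tissue.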
You can treat severity as a 'dose', but you will need to model some quantitative relationship along the progression of severities. The 'as.integer' approach I've alluded to above is an arbitrary way of doing this, as it assumes equal steps between consecutive levels. Same for incorporating time - you need to decide whether you're looking at pairs of time-points, a specific time-trend, or just arbitrary time profiles (so a Wald test on a factor; modelling time as a numeric, with possibly quadratic terms in your model; or an LRT on a factor, respectively).
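As a small illustration of why the scoring is a modelling choice, the sketch below puts the equal-step coding next to an alternative set of scores, and a factor coding of time next to a numeric one with a quadratic term. The `dose_scores` values and the four time points are invented purely for the example.

```r
# Toy layout: four severities at four time points (only 24hr is from the question;
# the others are made up).
samples <- expand.grid(
  severityFactor = factor(c("ctrl", "A", "B", "C"), levels = c("ctrl", "A", "B", "C")),
  time_hr        = c(24, 48, 72, 96)
)

# Equal-step scoring: ctrl=1, A=2, B=3, C=4.
samples$severityEqual <- as.integer(samples$severityFactor)

# An alternative, equally arbitrary scoring (hypothetical values): C counts double.
dose_scores <- c(ctrl = 0, A = 1, B = 2, C = 4)
samples$severityDose <- unname(dose_scores[as.character(samples$severityFactor)])

# Time as a factor: one coefficient per time point vs the first (pairs or profiles).
samples$TimeFactor <- factor(samples$time_hr)
time_as_factor <- model.matrix(~ TimeFactor, data = samples)

# Time as a number with a quadratic term: a specific smooth trend.
time_as_trend <- model.matrix(~ time_hr + I(time_hr^2), data = samples)
```

With the factor coding you would pick out individual coefficients (Wald) or test them all at once (LRT); with the numeric coding the test is on the `time_hr` and `I(time_hr^2)` coefficients, so it only picks up genes that follow that shape.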