Dear community,
I am analyzing microarray data of patients followed from birth up to an age of 3 years. The samples were taken at various age points for each child.
I want to identify non constant gene expression over time within these children, so genes that are differentially expressed over time.
For this I have applied a mixed effects model with splines(df=3) in limma adjusting for sex as a covariate.
X <- ns(age, df=3)
The topTable function results in various differentially expressed genes, I have set the coef parameter to coef=1:3, including the columns X1,X2 X3.
My question is how are the log2foldchanges in the result table of topTable() interpreted?
Many thanks in advance


Are you sure that
coef=1:3is correct? Usuallycoef=1is the intercept.Thanks for your help! Actually I am not so sure, as from my background I am not particular strong in mathematics. I have fitted the model without an intercept and used the columns of the output matrix of the ns() function as coef=1:3
No, that is wrong. The
ns()function assumes that an intercept is included in the model, as is the default in R. Why would you remove it?The default in the
ns()function isintercept = FALSE, but the lm function adds than an intercept, right ? I played a bit with the example that was given by James MacDonald to get a better understanding and feeling for the analysis. Would the following approach be correct ?In this case what would the lfc threshold mean ? It is not really interpretable as far as I have understood from the answer below because it refers to coefficients in the spline model and not the difference between two conditions ?
Would it be reasonable to take the maximum log2 foldchange within each predicted trajectory as a measure of how strong a gene is differentially expressed and also as a measure to cut off genes from the list that show very little variation in gene expression over time ?
You mean maximum logFC between any pair of times? Yes, you could do that. I do not generally recommend ad hoc cutoffs like that however. Usually it is better to simply rank the genes by p-value. The empirical Bayes prcoedure of limma already gives you a lot of protection against identifying small fold changes as DE so adding an ad hoc fold-change cutoff is usually unnecessary.
Thanks. If I consider all significant genes without any maximum logFC criteria, the gene with the lowest expression change over time, has about 4 % percent maximum change over time ~0.06 logFC. Can this still be physiological relevant ?
I have never seen such a small logFC coming up as significant in any of my own analyses. You must have a large number of samples and, if you do, then applying a max logFC cutoff could be reasonable. Unfortunately we don't have a TREAT method for spline trends.
Thanks, but what is the lowest that you have seen in such circumstances ? And did you have systematically calculated it actually or just read off for some genes from the predicted curves? So far I have only worked with treatment experiments, and I know in these cases the foldchanges are often enormous, but I am lacking any feeling for genes that are just observed over age without any treatment. Regarding the change I don't know if a 4 % change can have physiological effects. If I think of a guy who is 180 cm and a guy that is 187.2 cm, the difference is quite obvious, I don't know how much this translates to the world of transcriptomics. The biggest lFC in my data is around 1, even this I consider quite low from what I have observed in previous treatment experiments