Dear community,
I am analyzing microarray data of patients followed from birth up to an age of 3 years. The samples were taken at various age points for each child.
I want to identify non constant gene expression over time within these children, so genes that are differentially expressed over time.
For this I have applied a mixed effects model with splines(df=3) in limma adjusting for sex as a covariate.
X <- ns(age, df=3)
The topTable function results in various differentially expressed genes, I have set the coef parameter to coef=1:3, including the columns X1,X2 X3.
My question is how are the log2foldchanges in the result table of topTable() interpreted?
Many thanks in advance
Are you sure that
coef=1:3
is correct? Usuallycoef=1
is the intercept.Thanks for your help! Actually I am not so sure, as from my background I am not particular strong in mathematics. I have fitted the model without an intercept and used the columns of the output matrix of the ns() function as coef=1:3
No, that is wrong. The
ns()
function assumes that an intercept is included in the model, as is the default in R. Why would you remove it?The default in the
ns()
function isintercept = FALSE
, but the lm function adds than an intercept, right ? I played a bit with the example that was given by James MacDonald to get a better understanding and feeling for the analysis. Would the following approach be correct ?In this case what would the lfc threshold mean ? It is not really interpretable as far as I have understood from the answer below because it refers to coefficients in the spline model and not the difference between two conditions ?
Would it be reasonable to take the maximum log2 foldchange within each predicted trajectory as a measure of how strong a gene is differentially expressed and also as a measure to cut off genes from the list that show very little variation in gene expression over time ?
You mean maximum logFC between any pair of times? Yes, you could do that. I do not generally recommend ad hoc cutoffs like that however. Usually it is better to simply rank the genes by p-value. The empirical Bayes prcoedure of limma already gives you a lot of protection against identifying small fold changes as DE so adding an ad hoc fold-change cutoff is usually unnecessary.
Thanks. If I consider all significant genes without any maximum logFC criteria, the gene with the lowest expression change over time, has about 4 % percent maximum change over time ~0.06 logFC. Can this still be physiological relevant ?
I have never seen such a small logFC coming up as significant in any of my own analyses. You must have a large number of samples and, if you do, then applying a max logFC cutoff could be reasonable. Unfortunately we don't have a TREAT method for spline trends.
Thanks, but what is the lowest that you have seen in such circumstances ? And did you have systematically calculated it actually or just read off for some genes from the predicted curves? So far I have only worked with treatment experiments, and I know in these cases the foldchanges are often enormous, but I am lacking any feeling for genes that are just observed over age without any treatment. Regarding the change I don't know if a 4 % change can have physiological effects. If I think of a guy who is 180 cm and a guy that is 187.2 cm, the difference is quite obvious, I don't know how much this translates to the world of transcriptomics. The biggest lFC in my data is around 1, even this I consider quite low from what I have observed in previous treatment experiments