Dear Prof. Smyth and limma experts,
I would like to check whether my design and logic in applying limma are correct. I have a time course in which blood was taken from participants before vaccination (Day0), after the first vaccination (Day1) and after the second vaccination eight weeks later (Day57). These participants were subsequently challenged with the infectious pathogen and monitored.
Some individuals were protected while others developed the disease (and were treated). Our aim is to look for genes correlated with protection status.
If we subset the dataset by timepoint and analyze each subset for protection status, only a couple of genes are significant. However, when we analyze all timepoints together with adjustment for timepoint, we obtain many more significant genes. The phenotype data look something like this:
SubjectID  Timepoint  Status        Age  Sex  Batch
1001       Day0       NotProtected  20   M    B1
1001       Day1       NotProtected  20   M    B2
1001       Day57      NotProtected  20   M    B3
1015       Day0       Protected     35   F    B2
1015       Day1       Protected     35   F    B3
1015       Day57      Protected     35   F    B1
...        ...        ...           ...  ...  ...
And here is the model matrix and limma call for analyzing all time points jointly.
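mm <- model.matrix( ~ -1 + Age + Sex + Batch + Status + Timepoint, data=pheno )
# estimate the within-subject correlation across all genes
dupCor <- duplicateCorrelation( EXPRS, design=mm, block=pheno$SubjectID )$consensus.correlation
# block on the same SubjectID column used above
fit <- lmFit( EXPRS, design=mm, block=pheno$SubjectID, correlation=dupCor )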
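For reference, here is a minimal sketch of how a fit like the one above could be carried through to a per-gene test of protection status. The simulated pheno and EXPRS below are placeholders invented for illustration (12 hypothetical subjects, 200 hypothetical genes), not the real data, and the coefficient name StatusProtected assumes that NotProtected is the reference level of Status.

```r
library(limma)

set.seed(1)
n_subj  <- 12
n_genes <- 200

# Hypothetical phenotype table with the same columns as the one shown above
subj_idx <- rep(seq_len(n_subj), each = 3)
time_idx <- rep(0:2, times = n_subj)
pheno <- data.frame(
  SubjectID = sprintf("S%02d", subj_idx),
  Timepoint = factor(c("Day0", "Day1", "Day57")[time_idx + 1]),
  Status    = factor(rep(rep(c("NotProtected", "Protected"), each = 3), times = n_subj / 2)),
  Age       = rep(c(20, 35, 28, 41, 33, 25, 30, 22, 44, 38, 27, 31), each = 3),
  Sex       = factor(rep(rep(c("M", "F"), each = 6), times = n_subj / 4)),
  Batch     = factor(paste0("B", (subj_idx + time_idx) %% 3 + 1))
)

# Simulated log-expression: a per-subject random shift plus independent noise,
# so that samples from the same subject are positively correlated
subj_eff <- matrix(rnorm(n_genes * n_subj), nrow = n_genes)[, subj_idx]
EXPRS <- subj_eff + matrix(rnorm(n_genes * nrow(pheno)), nrow = n_genes)
rownames(EXPRS) <- paste0("gene", seq_len(n_genes))

# Same design and blocking as in the post
mm     <- model.matrix(~ -1 + Age + Sex + Batch + Status + Timepoint, data = pheno)
dupCor <- duplicateCorrelation(EXPRS, design = mm, block = pheno$SubjectID)
fit    <- lmFit(EXPRS, design = mm, block = pheno$SubjectID,
                correlation = dupCor$consensus.correlation)

# Moderated t-tests, then the table for the Status coefficient
fit <- eBayes(fit)
tab <- topTable(fit, coef = "StatusProtected", number = Inf)
head(tab)
```

With this design, the Status coefficient is the Protected versus NotProtected difference averaged over timepoints and adjusted for Age, Sex and Batch. It does not test whether that difference changes over time, which would need a Status:Timepoint interaction.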
I am slightly concerned that we are violating some basic assumption here. Or is the blocking for subject and duplicateCorrelation sufficient? Any advice is greatly appreciated. Many thanks.
Regards, Adai
--
Adaikalavan Ramasamy
Senior Leadership Fellow in Bioinformatics
Head of the Transcriptomics Core Facility
The Jenner Institute, University of Oxford
Roosevelt Drive, Oxford OX3 7DQ
Email: adaikalavan.ramasamy@ndm.ox.ac.uk
Office: 01865 617 100
Mob: 07906 308 465
Dear Aaron,
Thank you very much for the useful answer, and apologies for the delay. We included the Age covariate to account for a weak correlation between age and protection status (probably due to the small sample size). You are right that age is implicitly accounted for when blocking on SubjectID, but I am not sure how best to account for the age variation.
Thank you for confirming that duplicateCorrelation is probably the best way to go, and for the advice on subsetting.
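To illustrate the point about Age and SubjectID, here is a small base R sketch on a made-up phenotype table (the four subjects and their ages are hypothetical). Because Age takes a single value per subject, it cannot be estimated alongside SubjectID as a fixed effect, but it remains estimable when subject is handled through block= and duplicateCorrelation instead.

```r
# Toy phenotype table (hypothetical values, same layout as in the original post)
pheno <- data.frame(
  SubjectID = factor(rep(c("1001", "1015", "1022", "1030"), each = 3)),
  Timepoint = factor(rep(c("Day0", "Day1", "Day57"), times = 4)),
  Status    = factor(rep(c("NotProtected", "Protected", "Protected", "NotProtected"), each = 3)),
  Age       = rep(c(20, 35, 28, 41), each = 3)
)

# Number of distinct Age values per subject (1 for every subject)
ages_per_subject <- tapply(pheno$Age, pheno$SubjectID, function(x) length(unique(x)))

# Subject as a fixed effect: Age lies in the span of the subject columns,
# so the design matrix is rank deficient
mm_fixed   <- model.matrix(~ SubjectID + Timepoint + Age, data = pheno)
rank_fixed <- qr(mm_fixed)$rank

# Subject left out of the design (to be supplied via block= instead):
# Age and Status are both estimable
mm_block   <- model.matrix(~ Age + Status + Timepoint, data = pheno)
rank_block <- qr(mm_block)$rank

c(cols_fixed = ncol(mm_fixed), rank_fixed = rank_fixed,
  cols_block = ncol(mm_block), rank_block = rank_block)
```

The same reasoning applies to Status and Sex, which are also constant within subject.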
Regards, Adai