Edited the original question to make it a little clearer:
I have a gene expression data set of 165 samples. Each sample is annotated with the subject, the time point at which it was collected (a given number of days after a given dose of the drug), the sample source, and whether or not it was CD33/34 enriched.
A subset of this actual data looks like this:
I concatenated time point, sample source and sample type as the grouping factor:
design <- model.matrix(~0+TimePoint+Source+Sample.Type)
My problem is that not all subjects have the same time points collected, or even the same sample source. For example, I want to look at the differences between CD33/34+ PBMC samples collected after Dose1Day8 and those collected after Dose1Day1. There are 20 CD33/34+ PBMC samples from Dose1Day1 but only 7 from Dose1Day8.
Furthermore, only 5 subjects are present in both time points.
Do I only look at those 5 subjects (which makes the comparison balanced, with 10 samples in total)? If so, do I have to filter the samples first instead of fitting all the data in one glm model?
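To make the filtering option concrete, here is a minimal base R sketch of what restricting to subjects present at both time points might look like. The sample sheet, its column names and its values are made up for illustration and are not the real data; the paired design `~ Subject + TimePoint` is one common way to block on subject, shown here as an assumption rather than a recommendation.

```r
# Toy sample sheet; column names and values are hypothetical.
sampleInfo <- data.frame(
  Subject   = c("S1", "S1", "S2", "S2", "S3", "S4"),
  TimePoint = c("Dose1Day1", "Dose1Day8", "Dose1Day1",
                "Dose1Day8", "Dose1Day1", "Dose1Day8"),
  stringsAsFactors = FALSE
)

# Subjects that have a sample at both time points.
day1   <- sampleInfo$Subject[sampleInfo$TimePoint == "Dose1Day1"]
day8   <- sampleInfo$Subject[sampleInfo$TimePoint == "Dose1Day8"]
paired <- intersect(day1, day8)

# Keep only the samples from those subjects.
pairedInfo <- sampleInfo[sampleInfo$Subject %in% paired, ]

# Paired design: subject blocking terms plus the Day8 vs Day1 effect.
Subject   <- factor(pairedInfo$Subject)
TimePoint <- factor(pairedInfo$TimePoint,
                    levels = c("Dose1Day1", "Dose1Day8"))
pairedDesign <- model.matrix(~ Subject + TimePoint)
```

In this sketch the last column of `pairedDesign` (`TimePointDose1Day8`) would carry the within-subject Day8 vs Day1 difference.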
Or do I look at all the available samples within each group and make the contrast look something like this:
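Group <- factor(sampleInfoAll$Group)
design <- model.matrix(~0+Group)
colnames(design) <- levels(Group)  # drop the "Group" prefix so the contrast names match the columns
my.contrasts <- makeContrasts(
  Dose1Day8.PBMC.CD33_34pos_vs_Dose1Day1.PBMC.CD33_34pos = Dose1Day8.PBMC.CD33_34pos - Dose1Day1.PBMC.CD33_34pos,
  levels = design)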
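For completeness, this is roughly how a concatenated `Group` column like the one used above could be built and checked, using base R only. The toy rows are invented for illustration; only the column names `TimePoint`, `Source`, `Sample.Type` and `Group` come from my data.

```r
# Toy annotation table; the rows are hypothetical.
sampleInfoAll <- data.frame(
  TimePoint   = c("Dose1Day1", "Dose1Day1", "Dose1Day1",
                  "Dose1Day8", "Dose1Day8", "Dose1Day1"),
  Source      = c("PBMC", "PBMC", "PBMC", "PBMC", "PBMC", "BM"),
  Sample.Type = c("CD33_34pos", "CD33_34pos", "CD33_34pos",
                  "CD33_34pos", "CD33_34pos", "CD33_34neg"),
  stringsAsFactors = FALSE
)

# Concatenate the three annotations into one grouping label per sample.
sampleInfoAll$Group <- paste(sampleInfoAll$TimePoint,
                             sampleInfoAll$Source,
                             sampleInfoAll$Sample.Type,
                             sep = ".")

# One design column per group, with no intercept.
Group  <- factor(sampleInfoAll$Group)
design <- model.matrix(~0 + Group)
colnames(design) <- levels(Group)

# Number of samples in each group; the groups need not be the same size
# for the design matrix to be built.
groupSizes <- colSums(design)
```

Here `groupSizes` shows the unbalanced group sizes directly, which is the situation I am asking about.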