Hello,
I recently had a question regarding repeated measures RNA-seq analysis, which was thoroughly answered through an extension of Section 3.5 of the edgeR manual. However, this has led me to another question as I attempted to extend those concepts to another experiment in which the number of samples differs between groups. For example, here is a data frame modified from the edgeR user manual section on between- and within-subject comparisons (Section 3.5), and another containing specific time points to illustrate my point. Both data frames are re-numbered as recommended by the manual.
> targets
    Disease Patient Treatment
1   Healthy       1      None
2   Healthy       1   Hormone
3   Healthy       2      None
4   Healthy       2   Hormone
5   Healthy       3      None
6   Healthy       3   Hormone
7  Disease1       1      None
8  Disease1       1   Hormone
9  Disease1       2      None
10 Disease1       2   Hormone
11 Disease2       1      None
12 Disease2       1   Hormone
13 Disease2       2      None
14 Disease2       2   Hormone
15 Disease2       3      None
16 Disease2       3   Hormone
> sample_data
   Condition Subject Time
1    control       1  0hr
2    control       1  1hr
3    control       1  2hr
4    control       2  0hr
5    control       2  1hr
6    control       2  2hr
7    control       3  0hr
8    control       3  1hr
9    control       3  2hr
10   control       4  0hr
11   control       4  1hr
12   control       4  2hr
13   Disease       1  0hr
14   Disease       1  1hr
15   Disease       1  2hr
16   Disease       2  0hr
17   Disease       2  1hr
18   Disease       2  2hr
I have read the initial posting that led to this section of the manual, and it said to drop samples so that the groups have equal numbers. This doesn't seem to be a big deal if it only means dropping a sample or two from one group, but it could be a problem in a case such as the one above, where dropping four or six samples seems more of a sacrifice. I can also imagine experiments (assuming repeated/dependent samples) in which group sizes vary even more because samples are difficult to acquire. Are there any recommendations from the community regarding such a situation? Everything I have found assumes that the number of samples in each group is equal.
Regards,
--
Charles Determan
Integrated Biosciences PhD Candidate
University of Minnesota

Gordon,
The reason I ask is that I get an error when I use the design formula (~group + group:subject + group:time) and then run estimateGLMCommonDisp(dge, design).
The mailing list post I am referring to, with the same error, is at the following link:
https://stat.ethz.ch/pipermail/bioconductor/2012-November/049055.html
Am I simply writing the design formula incorrectly to still account for the subject variation?
Regards,
Charles
Dear Charles,
The link you give is to a user question. I replied to that post explaining how to solve the problem without removing samples:
https://stat.ethz.ch/pipermail/bioconductor/2012-November/049087.html
The advice that I gave there applies also to your data.
The problem is that the model.matrix() function in R adds superfluous columns to the design matrix, and these have to be removed manually. In your case you have to remove the design columns for disease patients 3 and 4, because there are no such patients. It is beyond the scope of the edgeR package to rewrite the model.matrix() function, which is maintained by R core, so I can only advise on work-arounds.
Best wishes
Gordon
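
For readers following the thread, the work-around described above might be sketched as follows. This is only an illustrative sketch, not code from the original posts: it rebuilds the sample_data table from the question, and the variable names (design_full, is_empty) and the all-zero-column test are assumptions chosen for the example. The formula uses the column names of sample_data (Condition, Subject, Time) in place of the group, subject and time names quoted in the thread.

```r
# Rebuild the sample_data table from the question: 4 control subjects and
# 2 disease subjects, each measured at 3 time points.
sample_data <- data.frame(
  Condition = factor(rep(c("control", "Disease"), times = c(12, 6)),
                     levels = c("control", "Disease")),
  Subject   = factor(c(rep(1:4, each = 3), rep(1:2, each = 3))),
  Time      = factor(rep(c("0hr", "1hr", "2hr"), times = 6))
)

# model.matrix() creates a column for every Condition:Subject combination,
# including Disease subjects 3 and 4, which do not exist in the data.
design_full <- model.matrix(~ Condition + Condition:Subject + Condition:Time,
                            data = sample_data)

# Flag columns that contain only zeros (assumed here to be the superfluous
# ones) and print their names for inspection before dropping them.
is_empty <- colSums(design_full != 0) == 0
print(colnames(design_full)[is_empty])

# Remove the superfluous columns; no samples (rows) are dropped.
design <- design_full[, !is_empty]

# Sanity check: the trimmed design should be of full column rank.
stopifnot(qr(design)$rank == ncol(design))

# The trimmed matrix would then take the place of the original design in the
# call from the thread, e.g. estimateGLMCommonDisp(dge, design), where dge is
# the poster's DGEList (not constructed in this sketch).
```

Note that only columns of the design matrix are removed; all 18 samples stay in the analysis.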
My apologies, I feel rather silly that I misinterpreted your answer. I mistakenly read it as removing samples from the dataset rather than removing columns from the design matrix. Thank you for clearing up that matter. You have answered my question completely.
Regards,
Charles