I recently asked a question about repeated measures RNA-seq analysis, which was thoroughly answered through an extension of Section 3.5 of the edgeR user manual. However, this has led me to another question, as I tried to extend the same concepts to another experiment in which the sample size differs between groups. For example, here are two data frames: one modified from the edgeR user manual section on comparisons both between and within subjects (Section 3.5), and another containing specific time points to illustrate my point. Both are re-numbered as recommended by the manual.
> targets
    Disease Patient Treatment
1   Healthy       1      None
2   Healthy       1   Hormone
3   Healthy       2      None
4   Healthy       2   Hormone
5   Healthy       3      None
6   Healthy       3   Hormone
7  Disease1       1      None
8  Disease1       1   Hormone
9  Disease1       2      None
10 Disease1       2   Hormone
11 Disease2       1      None
12 Disease2       1   Hormone
13 Disease2       2      None
14 Disease2       2   Hormone
15 Disease2       3      None
16 Disease2       3   Hormone

> sample_data
   Condition Subject Time
1    control       1  0hr
2    control       1  1hr
3    control       1  2hr
4    control       2  0hr
5    control       2  1hr
6    control       2  2hr
7    control       3  0hr
8    control       3  1hr
9    control       3  2hr
10   control       4  0hr
11   control       4  1hr
12   control       4  2hr
13   Disease       1  0hr
14   Disease       1  1hr
15   Disease       1  2hr
16   Disease       2  0hr
17   Disease       2  1hr
18   Disease       2  2hr
I have read the original post that led to this section of the manual, and it said to drop samples so that the groups have equal numbers. That does not seem like a big deal when it means dropping only a sample or two from one group, but it could be a problem in cases like those above, where dropping four or six samples seems more of a sacrifice. I am beginning to think of experiments in which (assuming repeated/dependent samples) group sizes vary even more substantially because samples are difficult to acquire. Are there any recommendations from the community for such a situation? Everything I have found assumes that the number of samples in each group is equal.
Integrated Biosciences PhD Candidate
University of Minnesota