Hi to all,
I have an RNA-seq experimental design where I am testing for differential expression (DE) between two groups of individuals, with multiple biological samples per individual and an unbalanced number of samples per individual. I am currently only interested in finding DE genes between the two groups as a whole, not in any particular questions at the sample level.
The samples are biopsies taken from each individual at the same time, from the same tissue type but at different locations, and some individuals have more biopsies than others. Here is an example design matrix:
   Subject Group Biopsy
1        1    NR      1
2        1    NR      2
3        2     R      1
4        2     R      2
5        2     R      3
6        3    NR      1
7        4     R      1
8        4     R      2
9        5    NR      1
10       5    NR      2
11       5    NR      3
12       5    NR      4
13       6     R      1
14       6     R      2
15       6     R      3
16       7    NR      1
I've searched online and read through a few similar questions, including the relevant sections of the edgeR, limma, and DESeq2 user guides. So far, the designs or DE questions I've found are slightly different from mine, either asking questions about DE at the sample level as well as between groups, or having exactly the same number of samples per individual.
Since the samples within an individual are expected to be correlated, does a proper DE analysis between the groups need to follow a repeated-measures approach (with duplicateCorrelation) or a block design such as ~Subject + Group? Or am I overthinking this and it's simply still a straightforward two-group design ~Group? Could the fact that some individuals have more samples than others potentially skew an ordinary two-group comparison in a way that needs to be handled?
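For concreteness, the repeated-measures variant I have in mind would look roughly like this (just a sketch of my understanding; counts, Group, and Subject are placeholders for my actual objects):

    # Sketch of the repeated-measures approach: a plain ~Group design, with
    # the within-subject correlation handled by duplicateCorrelation.
    # `counts`, `Group`, and `Subject` are placeholders for my own data.
    library(limma)
    library(edgeR)

    dge <- calcNormFactors(DGEList(counts = counts))
    design <- model.matrix(~Group)

    v <- voom(dge, design)
    corfit <- duplicateCorrelation(v, design, block = Subject)
    fit <- lmFit(v, design, block = Subject,
                 correlation = corfit$consensus.correlation)
    fit <- eBayes(fit)
    topTable(fit, coef = 2)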
The limma User's Guide Section 9.7 "Multi-level Experiments" and the edgeR User's Guide Section 3.5 "Comparisons both between and within subjects" have a similar design, though with exactly two samples per subject. The limma guide suggests that looking only at DE between groups is a two-group comparison: "If we only wanted to compare diseased to normal, we could do an ordinary two group comparison." Does this still apply with an unbalanced number of samples per subject?
Thank you very much - I've updated the Biopsy labels in the OP to make it clearer for others in future who have the same question. Apologies too for mixing up the block design formula; I was confusing it with modeling a batch effect, which comes up in my first follow-up question below. I have two follow-up questions:

1. When I now add batches to the OP design (batches that are not confounded with Group or Biopsy): for the same question where I want to test DE between the two groups, is the design ~Batch + Group correct, along with block=Subject?

2. I've looked over the edgeR User's Guide and manual again. From my understanding, edgeR has no block option in glmQLFit, or maybe there is no need for one with the QL model, so for my OP design I would only need to do the ordinary two-group comparison? And what about dealing with the batches?
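For concreteness regarding question 2, the plain edgeR QL analysis I mean would be roughly this (again just a sketch, with the same placeholder objects as in my original post; as far as I can tell there is no block argument anywhere in this pipeline):

    # Ordinary two-group edgeR QL sketch: glmQLFit has no block/correlation
    # argument, so each biopsy is treated as an independent sample.
    library(edgeR)

    y <- DGEList(counts = counts, group = Group)
    y <- calcNormFactors(y)
    design <- model.matrix(~Group)
    y <- estimateDisp(y, design)
    fit <- glmQLFit(y, design)
    qlf <- glmQLFTest(fit, coef = 2)
    topTags(qlf)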
Looks ok, but you should include Batch only if Batch has a large effect, such that Subjects from the same Batch noticeably cluster together. Otherwise, specifying Subject as a random block would be sufficient.
You can only fit correlated blocks in limma because this methodology requires normality. It would be problematic to analyse your experiment in any negative binomial-based package such as edgeR or DESeq2.
Dear Dr Smyth - thanks very much for the answers and advice, particularly regarding the modeling of batch in this repeated-measures design. I will look at MDS plots to see how the data clusters. I have two additional follow-up questions, and sorry in advance for the many questions, but hopefully these two will be the last for a while :-):
My colleague wondered why an unbalanced number of samples per subject does not need any specific handling; it is counterintuitive to statistics laypeople like us. For example, if one subject has 6 samples and another subject has 1, and this imbalance occurs between subjects in either group, how would this not cause subjects with many samples to outweigh, as it were, subjects in the same group with only 1 or 2 samples in the DE analysis?
As I'm waiting for my real datasets to finish being generated, I used a few very similar repeated-measures example datasets and compared standard limma-voom without duplicateCorrelation to a double round with duplicateCorrelation. What I found is that including the correlation and blocks adjusts the statistics and p-values slightly upward, but the ranking of DE genes is pretty much the same. If I am currently interested in selecting from the ranking of DE genes well below a particular FDR target then, just out of curiosity, wouldn't edgeR QL and DESeq2 (which I know is more liberal, like LRT) also exhibit the same behavior as limma-voom without duplicateCorrelation? Or would the negative binomial-based models simply exhibit slightly overconfident statistics? Essentially, if the highest-ranking genes are all very well below the FDR target, does it matter that much that the repeated-sample correlation wasn't modeled? See for example what Aaron Lun said here.
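For reference, the two pipelines I compared look roughly like this (a sketch only; counts, Group, and Subject are placeholders for the example data):

    # Pipeline A: standard limma-voom, repeated measures ignored.
    library(limma)
    library(edgeR)

    dge <- calcNormFactors(DGEList(counts = counts))
    design <- model.matrix(~Group)

    vA <- voom(dge, design)
    fitA <- eBayes(lmFit(vA, design))

    # Pipeline B: the "double round" -- estimate the consensus correlation,
    # re-run voom with it, then re-estimate and fit with the block.
    v1 <- voom(dge, design)
    cor1 <- duplicateCorrelation(v1, design, block = Subject)
    v2 <- voom(dge, design, block = Subject,
               correlation = cor1$consensus.correlation)
    cor2 <- duplicateCorrelation(v2, design, block = Subject)
    fitB <- lmFit(v2, design, block = Subject,
                  correlation = cor2$consensus.correlation)
    fitB <- eBayes(fitB)

    # Compare p-values and rankings gene by gene.
    tA <- topTable(fitA, coef = 2, number = Inf, sort.by = "none")
    tB <- topTable(fitB, coef = 2, number = Inf, sort.by = "none")
    cor(tA$P.Value, tB$P.Value, method = "spearman")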
That's what duplicateCorrelation does -- it reduces the influence of subjects with more biopsies vs those with fewer. And more than that, it decides by how much to reduce the influence of extra repeated measures. Surely that's what you want. Surely you would want extra biopsies for a subject to have some extra influence, otherwise why would you bother collecting them?

duplicateCorrelation might change the order and it might not. The order is more likely to change if the number of repeated measures differs between subjects or if the repeated measures for a subject are contradictory. But, honestly, even if the order doesn't change, why would you not want to get more defensible p-values?

I have to say that your two questions seem to me somewhat contradictory. Your first question seems to come from a pre-conception that unbalanced repeated measures are such a serious problem as to bamboozle any statistical analysis, but your second question posits that repeated measures have so little influence that they can be universally ignored.
I apologize for appearing to be contradictory; it wasn't meant that way. My colleague and I just didn't know in detail how duplicateCorrelation works and that it actually compensates for the imbalance. We didn't believe an imbalance in and of itself would bamboozle the analysis, just thought that a very strong imbalance in the number of samples per individual could somewhat affect the results. Thank you for explaining in more detail how it handles this.

Regarding my second question: I am also using limma-voom DE testing as an ML feature scorer for classification tasks, so in this specific use case how defensible the p-values of the selected features are, if they are all well below the FDR target, is not as critical as it would be for other use cases. The DE ranking (by p-value) is somewhat more important, but still not super critical if it's not drastically different between methods.