Hello DESeqers,
I have two groups of patients whose gene expression is expected to change differently over time. In one group, the change in gene expression is expected to be greater or at least different, but individual patients are also expected to have different baseline levels of gene expression as well. Therefore, I'm interested the interaction of time, and patient, and not in individual patient effects.
In my DESeq2 model I use ~Patient * TimePt, and I would like to do a contrast in which I combine the patient.latetimepoint interactions in one group vs the patient.latetimepoint interactions in the other group. For this, I think I need to use lfcShrink with type="ashr" and a contrast argument, and would also need something along the lines of the listValues argument, but it doesn't appear that lfcShrink has such a thing. However, it does appear that ashr::estimate_mixprop can take a "weights" argument, and that that may be what I want to use. Is this correct, or am I barking up the wrong tree?
Thanks for your help,
Jonathan
Hi Michael,
Thanks for your response. At the risk of going into too much detail, the experiment is set up like this:
I have 9 patients. Patients 10, 58 & 96 experience disease relapse. Patients 16, 18, 23, 50, 55 & 75 remain in remission. For each patient, I have 3 timepoints, T1, T2, and T3. T1 is factor level 1, as is patient 16. Each time point has 2 or 3 replicates. I have 70 samples in all.
Since there is patient to patient variation, I'd like to model for that, as well as model for changes in gene expression over time that are common in all patients. The effects I'm interested in are the difference in the changes between the patients who progress and those who stay in remission.
My analysis looks like this:
This is one of the comparisons that I would like to make to look at the difference in gene expression changes for the relapse patients vs the continued remission in Time T3 vs Time T1. So, I'm using the 3 interaction terms for the relapse patients vs. the 6 continued remission patients:
Note that this is also relative to patient 16, the reference, since they remained in remission.
I tried this:
That fails with the message: Error in is.list(x) : unused argument (listValues = c(1/3, -1/6))
I also tried this:
I'm not sure I'm using the weights argument correctly, nor the optmethod argument, but I was thinking I need a listValues equivalent, but that's not working with lfcShrink, so I used weights. Anyways, this gave some enormous log2 fold changes (e.g. 37, -34, etc.) for the differentially expressed genes, so I think it's wrong. Do you have any suggestions?
One last thing, I'm using package.version('DESeq2') == "1.22.1"
DESeq2 has been extremely useful software for me, and I appreciate you taking the time to help!
Cheers,
Jonathan
hi Jonathan,
One more question before I give some code examples: is it sufficient to account for patient-specific baseline (at first time point), or do you want to have a different time course for each patient? So the changes over time are a random effect? The first one can definitely be done with apeglm and ashr "out of the box" but for the second I would consider perhaps using a mixed effects model or something similar on variance stabilized data.
Also, are the replicates technical (more sequencing of the same library)?
Hi Michael,
The replicates are different cell sorts from the same patient at the same time point, so I've been treating them as separate samples and biological replicates. I have been summing counts from different sequence runs on the same sample. (We're not using UMIs.)
What we're thinking is that we might see is a differential change in gene expression leading up to relapse, and our hypothesis is that at least some genes may be doing more or less the same thing in the three relapse patients vs the six continued remission patients. We're not expecting random effects to predominate, but there certainly may be complicating factors relating to patient and time-point-specific immune states, phasing of time points, etc., so perhaps that's an argument for a mixed effects model.
In an earlier approach, one that I'm revisiting at the moment since we now have deeper sequencing, we looked at longitudinal changes in gene expression in individual patients. We definitely saw differential expression over time, and we sought to find common genes that were significantly up or down in the relapse patients but not in the remission patients, and vice versa. This involved a lot of Venn diagrams that often amounted to zero genes in the union sets. One interpretation is that the dysfunction that leads to relapse in one patient may be different from the dysfunction that leads to relapse in another patient (in which case the project may be doomed), or alternatively, that this kind of signature is hard to tease out of bulk data. The story is complicated by the possibility that there may be some heterogeneity that we do not pick up with our cell-surface markers, which may dilute biological effects of interest.
Another approach to this analysis was to group the continued remission patients together and group the relapse patients together (as patient Type). The model for that was ~Type*TimePt, with Rem and T1 as the reference factor levels, so I'd just use the relapse T3 interaction term to tell me how the relapse time course differed from the continued remission time course:
In this case, I didn't see any differentially expressed genes, at least none which survived the multiple comparison correction. My sense about this comparison, however, was that grouping the patients together this way would give greater dispersions, whereas modeling the individual patients would be more of a paired comparison.
Doing the ~Patient*TimePt model in edgeR, I was able to get this to work, or at least I got a small list of DE genes for T3 vs T1 and T2 vs T1. But maybe it doesn't mean what I think it means.
Thanks again for your continued assistance.
Cheers,
Jonathan
You may want to consider discussing the more complex analysis plan with a statistician and consider how to include random effects over time for individual patients.
My recommendation for the case where you control for patient specific baseline would be to follow the example here:
https://www.bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#group-specific-condition-effects-individuals-nested-within-groups
You would use a design where group corresponds to relapse vs remission, the nested individual variable is 1,2,3 for the relapse group and 1,2,3,4,5,6 for the remission group, and then you include a time variable for the three time points. Because you have different numbers of individuals across group, you would have to form the model matrix and then remove some columns which have all zeros (as discussed in the section above).
But this will give you the relapse-specific coefficients and the remission-specific coefficients for the time effects controlling for patient baseline which you can perform shrinkage on.
If you want to shrink the difference between the relapse and remission-specific coefficients for time, you can swap in a design of
~ grp + condition + grp:ind.n + grp:cnd
. Here the interaction term will represent the difference between the two groups in the time effects. You don't need to re-estimate the dispersions after swapping the design because the designs are equivalent, only changing the interpretation of the coefficients. So you only need to run nbinomWaldTest() and then lfcShrink().