Question

lfcShrink, interactions, and ashr

1

Entering edit mode

jurbach ▴ 10

@jurbach-18770

Last seen 6.1 years ago

Hello DESeqers,

I have two groups of patients whose gene expression is expected to change differently over time. In one group, the change in gene expression is expected to be greater or at least different, but individual patients are also expected to have different baseline levels of gene expression as well. Therefore, I'm interested the interaction of time, and patient, and not in individual patient effects.

In my DESeq2 model I use ~Patient * TimePt, and I would like to do a contrast in which I combine the patient.latetimepoint interactions in one group vs the patient.latetimepoint interactions in the other group. For this, I think I need to use lfcShrink with type="ashr" and a contrast argument, and would also need something along the lines of the listValues argument, but it doesn't appear that lfcShrink has such a thing. However, it does appear that ashr::estimate_mixprop can take a "weights" argument, and that that may be what I want to use. Is this correct, or am I barking up the wrong tree?

Thanks for your help,

Jonathan

deseq2 lfcshrink contrast • 2.2k views

ADD COMMENT • link updated 7.1 years ago by Michael Love 43k • written 7.1 years ago by jurbach ▴ 10

score 2 · Answer 1 · 2018-12-10

2

Entering edit mode

Michael Love 43k

@mikelove

Last seen 3 hours ago

United States

I think you can do this with apeglm or ashr out of the box, but can you post your sample information so I have a better idea? Eg

Patient 1, group 1, t0
Patient 1, group 1, t1
...

ADD COMMENT • link 7.1 years ago Michael Love 43k

0

Entering edit mode

Hi Michael,

Thanks for your response. At the risk of going into too much detail, the experiment is set up like this:

I have 9 patients. Patients 10, 58 & 96 experience disease relapse. Patients 16, 18, 23, 50, 55 & 75 remain in remission. For each patient, I have 3 timepoints, T1, T2, and T3. T1 is factor level 1, as is patient 16. Each time point has 2 or 3 replicates. I have 70 samples in all.

Since there is patient to patient variation, I'd like to model for that, as well as model for changes in gene expression over time that are common in all patients. The effects I'm interested in are the difference in the changes between the patients who progress and those who stay in remission.

My analysis looks like this:

des_EC_AGG_TimeXPatient <- DESeqDataSetFromMatrix( countData = EC_AGG_FILT,
                                                   colData = metadata_AGG,
                                                   design = ~TimePt*Patient )

des_EC_AGG_TimeXPatient <- estimateSizeFactors( object = des_EC_AGG_TimeXPatient )
des_EC_AGG_TimeXPatient <- estimateDispersions( object = des_EC_AGG_TimeXPatient )
des_EC_AGG_TimeXPatient <- nbinomWaldTest( object = des_EC_AGG_TimeXPatient, maxit = 1000 )

Here are the results names:
# [1] "Intercept"                "TimePt_T2_vs_T1"          "TimePt_T3_vs_T1"         
# [4] "Patient_58_vs_16" "Patient_75_vs_16" "Patient_18_vs_16"
# [7] "Patient_23_vs_16" "Patient_55_vs_16" "Patient_10_vs_16"
#[10] "Patient_50_vs_16" "Patient_96_vs_16" "TimePtT2.Patient58"  
#[13] "TimePtT3.Patient58"   "TimePtT2.Patient75"   "TimePtT3.Patient75"  
#[16] "TimePtT2.Patient18"   "TimePtT3.Patient18"   "TimePtT2.Patient23"  
#[19] "TimePtT3.Patient23"   "TimePtT2.Patient55"   "TimePtT3.Patient55"  
#[22] "TimePtT2.Patient10"   "TimePtT3.Patient10"   "TimePtT2.Patient50"  
#[25] "TimePtT3.Patient50"   "TimePtT2.Patient96"   "TimePtT3.Patient96"

This is one of the comparisons that I would like to make to look at the difference in gene expression changes for the relapse patients vs the continued remission in Time T3 vs Time T1. So, I'm using the 3 interaction terms for the relapse patients vs. the 6 continued remission patients:

contrast.T3.Rel = list(c("TimePtT3.Patient10", "TimePtT3.Patient58", "TimePtT3.Patient96"),
                       c("TimePtT3.Patient75", "TimePtT3.Patient18", "TimePtT3.Patient23",
                         "TimePtT3.Patient55", "TimePtT3.Patient50" ) )

Note that this is also relative to patient 16, the reference, since they remained in remission.

I tried this:

res_EC_AGG_FILT_TimeXPatient <- lfcShrink( dds = des_EC_AGG_TimeXPatient,
                                           type = "ashr",
                                           contrast = contrast.T3.Rel,
                                           listValues=c(1/3,-1/6))

That fails with the message: Error in is.list(x) : unused argument (listValues = c(1/3, -1/6))

I also tried this:

res_EC_AGG_FILT_TimeXPatient <- lfcShrink( dds = des_EC_AGG_TimeXPatient,
                                           type = "ashr",
                                           contrast = contrast.T3.Rel,
                                           optmethod = "w_mixEM",
                                           weights = c(1/3,-1/6) )

I'm not sure I'm using the weights argument correctly, nor the optmethod argument, but I was thinking I need a listValues equivalent, but that's not working with lfcShrink, so I used weights. Anyways, this gave some enormous log2 fold changes (e.g. 37, -34, etc.) for the differentially expressed genes, so I think it's wrong. Do you have any suggestions?

One last thing, I'm using package.version('DESeq2') == "1.22.1"

DESeq2 has been extremely useful software for me, and I appreciate you taking the time to help!

Cheers,

Jonathan

ADD REPLY • link 7.1 years ago jurbach ▴ 10

0

Entering edit mode

hi Jonathan,

One more question before I give some code examples: is it sufficient to account for patient-specific baseline (at first time point), or do you want to have a different time course for each patient? So the changes over time are a random effect? The first one can definitely be done with apeglm and ashr "out of the box" but for the second I would consider perhaps using a mixed effects model or something similar on variance stabilized data.

Also, are the replicates technical (more sequencing of the same library)?

ADD REPLY • link 7.1 years ago Michael Love 43k

0

Entering edit mode

Hi Michael,

The replicates are different cell sorts from the same patient at the same time point, so I've been treating them as separate samples and biological replicates. I have been summing counts from different sequence runs on the same sample. (We're not using UMIs.)

What we're thinking is that we might see is a differential change in gene expression leading up to relapse, and our hypothesis is that at least some genes may be doing more or less the same thing in the three relapse patients vs the six continued remission patients. We're not expecting random effects to predominate, but there certainly may be complicating factors relating to patient and time-point-specific immune states, phasing of time points, etc., so perhaps that's an argument for a mixed effects model.

In an earlier approach, one that I'm revisiting at the moment since we now have deeper sequencing, we looked at longitudinal changes in gene expression in individual patients. We definitely saw differential expression over time, and we sought to find common genes that were significantly up or down in the relapse patients but not in the remission patients, and vice versa. This involved a lot of Venn diagrams that often amounted to zero genes in the union sets. One interpretation is that the dysfunction that leads to relapse in one patient may be different from the dysfunction that leads to relapse in another patient (in which case the project may be doomed), or alternatively, that this kind of signature is hard to tease out of bulk data. The story is complicated by the possibility that there may be some heterogeneity that we do not pick up with our cell-surface markers, which may dilute biological effects of interest.

Another approach to this analysis was to group the continued remission patients together and group the relapse patients together (as patient Type). The model for that was ~Type*TimePt, with Rem and T1 as the reference factor levels, so I'd just use the relapse T3 interaction term to tell me how the relapse time course differed from the continued remission time course:

res_EC_AGG_STIM <- lfcShrink( dds = des_EC_AGG_STIM, coef = "TimePtT3.TypeRel", type = "apeglm" )

In this case, I didn't see any differentially expressed genes, at least none which survived the multiple comparison correction. My sense about this comparison, however, was that grouping the patients together this way would give greater dispersions, whereas modeling the individual patients would be more of a paired comparison.

Doing the ~Patient*TimePt model in edgeR, I was able to get this to work, or at least I got a small list of DE genes for T3 vs T1 and T2 vs T1. But maybe it doesn't mean what I think it means.

Thanks again for your continued assistance.

Cheers,

Jonathan

ADD REPLY • link 7.1 years ago jurbach ▴ 10

0

Entering edit mode

You may want to consider discussing the more complex analysis plan with a statistician and consider how to include random effects over time for individual patients.

My recommendation for the case where you control for patient specific baseline would be to follow the example here:

https://www.bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#group-specific-condition-effects-individuals-nested-within-groups

You would use a design where group corresponds to relapse vs remission, the nested individual variable is 1,2,3 for the relapse group and 1,2,3,4,5,6 for the remission group, and then you include a time variable for the three time points. Because you have different numbers of individuals across group, you would have to form the model matrix and then remove some columns which have all zeros (as discussed in the section above).

But this will give you the relapse-specific coefficients and the remission-specific coefficients for the time effects controlling for patient baseline which you can perform shrinkage on.

If you want to shrink the difference between the relapse and remission-specific coefficients for time, you can swap in a design of ~ grp + condition + grp:ind.n + grp:cnd. Here the interaction term will represent the difference between the two groups in the time effects. You don't need to re-estimate the dispersions after swapping the design because the designs are equivalent, only changing the interpretation of the coefficients. So you only need to run nbinomWaldTest() and then lfcShrink().

ADD REPLY • link 7.1 years ago Michael Love 43k