Multi-factorial design for patient replicates from different time points (DESeq2)
alva.james • 0

Dear All,

Since we have a rather complicated experimental design, we would like to get some advice.

We have 70 samples from 40 patients. Each sample was collected at one of two time intervals (ID or RE), depending on the disease stage. For 10 patients only one sample is available. The patients are divided into two subtypes, and we would like to find the differentially expressed genes between subtype1 and subtype2.

For example, this is what our design looks like:

Sample  condition  Patient
A1_ID   Subtype1   A1
A1_RE   Subtype1   A1
B1_ID   Subtype2   B1
B1_RE   Subtype2   B1
C1_ID   Subtype1   C1


Our concern is whether we are boosting certain genes by having two samples from the same patient. How could we account for that using a multifactorial design? Should we model an additive effect (~ condition + patient) or also an interaction (~ condition*patient)?

Thanks for all opinions and suggestions.




Both of those models pool across the time interval, which seems bold given you say that corresponds to disease stage.  For the patients with only one sample, do these correspond to one specific timepoint, or are they censored (ie the disease didn't progress so you only have the initial timepoint, or progressed so rapidly that you don't have the first timepoint) or are they missing entirely at random? If they're missing at random, then I'd think any biases would cancel out, but if not, then we'll need more details.

I'd recommend thoroughly justifying your decision to pool over timepoint/stage, or consider a model including a timepoint factor.  It looks like you'll have difficulty fitting both patient and condition factors as they look confounded from the snippet you give; you can get some of the way using the advice given in section 3.12 of the DESeq2 vignette.  

Roughly, your options for taking timepoint/stage into account are: pool (as you've done); normalise out the stage effect; stratify, so you have separate analyses for each timepoint; or look for interactions (i.e. which genes have a different time profile in the different subtypes).
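To make those options a bit more concrete, here is a minimal base-R sketch of the design matrices that go with two of them. It assumes a `stage` column derived from the sample-name suffix (ID/RE); that column is not in the original table and is purely illustrative. With DESeq2 the same formulas would be passed as the `design` argument when building the dataset.

```r
# Toy sample table taken from the snippet in the question.
meta <- data.frame(
  sample    = c("A1_ID", "A1_RE", "B1_ID", "B1_RE", "C1_ID"),
  condition = factor(c("Subtype1", "Subtype1", "Subtype2", "Subtype2", "Subtype1")),
  patient   = factor(c("A1", "A1", "B1", "B1", "C1"))
)

# Assumption: the stage is encoded as the suffix of the sample name.
meta$stage <- factor(sub(".*_", "", meta$sample), levels = c("ID", "RE"))

# "Normalise out the stage effect": stage enters as an additive covariate.
X_add <- model.matrix(~ stage + condition, data = meta)

# "Look for interactions": does the subtype effect differ between stages?
X_int <- model.matrix(~ stage + condition + stage:condition, data = meta)

print(colnames(X_add))
print(colnames(X_int))
```

Stratifying would simply mean subsetting `meta` (and the count matrix) to one stage at a time and fitting `~ condition` within each subset.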



I agree with Gavin's comments here.

It will help if you can say more about exactly what kind of DE across subtype you are interested in, particularly given the two disease stages.


@Gavin, @Mike, thanks for your opinions. They are much valued. The dropouts are purely technical and not disease-related.

Also, in another analysis we could see that the difference between the stages is minimal and that the subtype is the driving force in the overall expression pattern of the samples. This was our rationale for pooling them together. Having said that, what we are interested in is characterizing the upregulated genes in our subtype of interest.

Regarding the vignette: I do not fully understand what example you are referring to in section 3.12 in the DESeq2 vignette. What scenario are you suggesting?


hi Alva,

It's ultimately up to you how to model disease state. If you are just interested in comparing across subtype, and you want to control for patient effects, this is not actually possible using fixed effects in DESeq2, because patient and subtype are confounded, and you are interested in making comparisons across patients, not within patient (the example Gavin refers to is about making before/after comparisons for a number of patients divided into groups). You could, however, use limma with its duplicateCorrelation() function to account for the correlation between multiple samples from the same patient while comparing across subtype. See the limma User Guide.
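The confounding can be checked directly in base R. The sketch below uses only the five rows shown in the question: because each patient belongs to exactly one subtype, the design matrix for ~ condition + patient has fewer independent columns than coefficients.

```r
# Toy sample table taken from the snippet in the question.
meta <- data.frame(
  condition = factor(c("Subtype1", "Subtype1", "Subtype2", "Subtype2", "Subtype1")),
  patient   = factor(c("A1", "A1", "B1", "B1", "C1"))
)

# Each patient appears in one subtype only (patient is nested in condition).
print(table(meta$patient, meta$condition))

X <- model.matrix(~ condition + patient, data = meta)

n_coef <- ncol(X)      # number of coefficients the model asks for
x_rank <- qr(X)$rank   # number of linearly independent columns

# Here the conditionSubtype2 column is identical to the patientB1 column,
# so the subtype effect cannot be separated from the patient effects.
print(c(coefficients = n_coef, rank = x_rank))
```

On this toy table the model asks for 4 coefficients but the matrix has rank 3, which is the situation in which a fixed-effects fit cannot estimate the subtype effect separately from the patient effects.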


Michael, thanks for the reply, but I think the question wasn't conveyed clearly enough.

Here is my question again.

My goal is to find differentially expressed genes between groups. In my example cohort I have 82 samples from approximately 40 patients. Some patients are paired, meaning they have both ID and RE samples, whereas other patients have only an ID or an RE sample. What I am aiming for is to find differentially expressed (DE) genes between subtype1 and subtype2. Each subtype is defined as a set of patients, for instance:

Subtype1: A1_ID, A1_RE, C1_ID, ... (21 samples)

Subtype2: B1_ID, B1_RE, ... (61 samples)

So in the end I need the DE genes between the two subtypes as defined above, and my question is whether DESeq2 has a built-in method that would account for the patient effects within each subtype. As you can see, there are biological replicates within each subtype.



You can't do that with DESeq2 (account for correlation among samples within groups, and make comparison across group). 

You'd have to use duplicateCorrelation() with limma as in my above post.
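For orientation, a rough sketch of what that limma route could look like is given below. It is not a worked analysis of the real cohort: the counts are simulated, the eight-sample table and all object names are made up for illustration, and normalisation factors are skipped for brevity. It assumes the limma and statmod packages are installed. The functions used (voom, duplicateCorrelation, lmFit, eBayes, topTable) are the ones described in the limma User Guide.

```r
library(limma)

set.seed(1)

# Hypothetical sample table: some patients contribute two samples, some one.
meta <- data.frame(
  sample    = c("A1_ID", "A1_RE", "B1_ID", "B1_RE",
                "C1_ID", "D1_ID", "D1_RE", "E1_RE"),
  condition = factor(c("Subtype1", "Subtype1", "Subtype2", "Subtype2",
                       "Subtype1", "Subtype2", "Subtype2", "Subtype1")),
  patient   = factor(c("A1", "A1", "B1", "B1", "C1", "D1", "D1", "E1"))
)

# Simulated counts (200 genes); a stand-in for the real count matrix.
counts <- matrix(rnbinom(200 * nrow(meta), mu = 100, size = 5),
                 nrow = 200,
                 dimnames = list(paste0("gene", 1:200), meta$sample))

# Only the subtype goes into the design; patient is handled as a block.
design <- model.matrix(~ condition, data = meta)

# Transform counts to log-CPM with precision weights.
v <- voom(counts, design)

# Estimate the consensus correlation between samples from the same patient.
corfit <- duplicateCorrelation(v, design, block = meta$patient)

# Fit the subtype model while accounting for that within-patient correlation.
fit <- lmFit(v, design, block = meta$patient,
             correlation = corfit$consensus.correlation)
fit <- eBayes(fit)

# Table of results for Subtype2 versus Subtype1.
tab <- topTable(fit, coef = "conditionSubtype2", number = Inf)
print(head(tab))
```

The point of the design choice is that patient never appears as a fixed effect, so the confounding with subtype does not arise; the repeated samples are instead treated as correlated observations within a block.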

