Question: Multi-factorial design for patient-replicates from different time points (Deseq2)
alva.james (Germany) wrote, 2.6 years ago:

Dear All,

Since we have a rather complicated experimental design, we would like to get some advice.

We have 70 samples from 40 patients. Samples were collected at two time intervals (ID and RE), defined by the disease stage. For 10 patients, only one sample is available. We would like to find the genes that are differentially expressed between subtype1 and subtype2, the two groups into which the patients are divided.

For example, this is what our design looks like:

Sample   condition   Patient
A1_ID    Subtype1    A1
A1_RE    Subtype1    A1
B1_ID    Subtype2    B1
B1_RE    Subtype2    B1
C1_ID    Subtype1    C1

Our concern is whether we are boosting certain genes by having two samples from the same patient. How could we account for that using a multifactorial design? Should we model an additive effect (~ condition + patient) or also an interaction (~ condition * patient)?

Thanks for any opinions and suggestions.

modified 2.6 years ago by Michael Love • written 2.6 years ago by alva.james

Both of those models pool across the time interval, which seems bold given that you say it corresponds to disease stage. For the patients with only one sample, do these correspond to one specific timepoint, are they censored (i.e. the disease didn't progress, so you only have the initial timepoint, or it progressed so rapidly that you don't have the first timepoint), or are they missing entirely at random? If they're missing at random, then I'd think any biases would cancel out, but if not, then we'll need more details.

I'd recommend thoroughly justifying your decision to pool over timepoint/stage, or considering a model that includes a timepoint factor. It looks like you'll have difficulty fitting both patient and condition factors, as they look confounded from the snippet you give; you can get some of the way using the advice given in section 3.12 of the DESeq2 vignette.

Roughly, your options for taking timepoint/stage into account are: pool (as you've done); normalise out the stage effect; stratify, so you have separate analyses for each timepoint; or look for interactions (i.e. which genes have a different time profile in different subtypes).
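
As a rough illustration of the options above, here is a minimal base R sketch of what the corresponding design matrices could look like. The column name `stage` and the toy sample table are assumptions made for illustration; they are not taken from the actual study.

```r
# Hypothetical toy sample table: 'stage' is an assumed column name for ID/RE.
coldata <- data.frame(
  condition = factor(c("Subtype1", "Subtype1", "Subtype2", "Subtype2", "Subtype1", "Subtype2")),
  stage     = factor(c("ID", "RE", "ID", "RE", "ID", "RE"))
)

# Normalise out the stage effect: stage enters as an additive covariate.
design_add <- model.matrix(~ stage + condition, data = coldata)

# Interaction: lets the subtype effect differ between ID and RE.
design_int <- model.matrix(~ stage * condition, data = coldata)

# Stratify: keep one timepoint only and compare subtypes within it.
coldata_id <- droplevels(subset(coldata, stage == "ID"))
design_id  <- model.matrix(~ condition, data = coldata_id)

print(colnames(design_add))
print(colnames(design_int))
```

The same formulas could in principle be passed as the `design` of a DESeq2 or limma analysis; which one is appropriate depends on the biological question.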

modified 2.6 years ago • written 2.6 years ago by Gavin Kelly

I agree with Gavin's comments here.

It will help if you can say more about exactly what kind of DE across subtype you are interested in, particularly given the two disease stages.

written 2.6 years ago by Michael Love

@Gavin, @Mike, thanks for your opinions; they are much valued. The dropouts are purely technical and not disease-related.

Also, in another analysis, we saw that the difference between the stages is minimal and that the subtype is the driving force in the overall expression pattern of the samples. This was our rationale for pooling them. Having said that, what we are interested in is characterizing the upregulated genes in our subtype in question.

Regarding the vignette: I do not fully understand which example you are referring to in section 3.12 of the DESeq2 vignette. What scenario are you suggesting?

modified 2.6 years ago • written 2.6 years ago by alva.james
Answer: Multi-factorial design for patient-replicates from different time points (Deseq2)
Michael Love (United States) wrote, 2.6 years ago:

Hi Alva,

It's ultimately up to you how to model disease state. If you are just interested in comparing across subtype, and you want to control for patient effects, this is not actually possible using fixed effects in DESeq2, because patient and subtype are confounded and you are interested in making comparisons across patients, not within patient (the example Gavin refers to is about making before/after comparisons for a number of patients divided into groups). You could, however, use limma with its duplicateCorrelation() function to account for the correlation among multiple samples from the same patient while comparing across subtype. See the limma User Guide.
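
The confounding described here can be checked with base R alone. The following is a minimal sketch that uses only the five rows shown in the question's snippet; it builds the additive design and compares the number of columns with the matrix rank.

```r
# The five rows shown in the question's design snippet.
coldata <- data.frame(
  sample    = c("A1_ID", "A1_RE", "B1_ID", "B1_RE", "C1_ID"),
  condition = factor(c("Subtype1", "Subtype1", "Subtype2", "Subtype2", "Subtype1")),
  patient   = factor(c("A1", "A1", "B1", "B1", "C1"))
)

# Additive design: intercept, subtype, and one column per non-reference patient.
design <- model.matrix(~ condition + patient, data = coldata)

# Each patient belongs to exactly one subtype, so the subtype column is a
# linear combination of the patient columns and the matrix is rank-deficient.
n_cols      <- ncol(design)
design_rank <- qr(design)$rank
cat("columns:", n_cols, "rank:", design_rank, "\n")
```

When the rank is lower than the number of columns, as it is here (the subtype column coincides with the column for patient B1), the subtype coefficient cannot be estimated separately from the patient coefficients, and DESeq2 is expected to stop with a "model matrix is not full rank" error for such a design.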

written 2.6 years ago by Michael Love

Michael, thanks for the reply, but I think my question wasn't conveyed clearly enough. Here it is again:

I want to find differentially expressed genes between groups. In my example cohort I have 82 samples from approximately 40 patients. Some patients are paired, meaning they have both ID and RE samples, whereas other patients have only an ID or an RE sample. My aim is to find differentially expressed (DE) genes between subtype1 and subtype2. Each subtype is defined as a set of patients, for instance:

Subtype1: A1_ID, A1_RE, C1_ID, ....... n21 (21 samples)

Subtype2: B1_ID, B1_RE, .... n61 (61 samples)

So in the end I need the DE genes for the above design, and my question is whether DESeq2 has a built-in statistic that would account for the patient effects within each subtype. As you can see, there are biological replicates within each subtype.

modified 2.5 years ago • written 2.5 years ago by alva.james

You can't do that with DESeq2 (account for correlation among samples within groups and make comparisons across groups).

You'd have to use duplicateCorrelation() with limma as in my above post.
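
For orientation, a minimal sketch of that limma route is given below, following the blocking pattern described in the limma User Guide. The count matrix is simulated stand-in data, and the sample layout (4 patients with 2 samples each) is an assumption for illustration only; the real analysis would use the actual counts and sample table.

```r
library(limma)

set.seed(1)

# Simulated stand-in data (an assumption): 200 genes, 8 samples, 4 patients.
patient <- factor(rep(c("A1", "B1", "C1", "D1"), each = 2))
subtype <- factor(rep(c("Subtype1", "Subtype2", "Subtype1", "Subtype2"), each = 2))
counts  <- matrix(
  rnbinom(200 * 8, mu = 100, size = 10), nrow = 200,
  dimnames = list(paste0("gene", 1:200), paste0(patient, c("_ID", "_RE")))
)

# Only subtype is a fixed effect; patient is handled as a blocking factor.
design <- model.matrix(~ subtype)

# voom converts counts to log-CPM with precision weights.
v <- voom(counts, design)

# Estimate the consensus correlation between samples from the same patient.
corfit <- duplicateCorrelation(v, design, block = patient)

# Fit the linear model using that within-patient correlation.
fit <- lmFit(v, design, block = patient,
             correlation = corfit$consensus.correlation)
fit <- eBayes(fit)

# Subtype2 versus Subtype1, all genes.
res <- topTable(fit, coef = "subtypeSubtype2", number = Inf)
head(res)
```

Patients with a single sample can stay in the analysis; they simply contribute no information to the within-patient correlation estimate. The limma User Guide also describes running voom() a second time with the estimated correlation, which is omitted here for brevity.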

written 2.5 years ago by Michael Love