Question

Multi-factorial design for patient replicates from different time points (DESeq2)


alva.james

@alvajames-6967

Germany

Dear All,

Since we have a rather complicated experimental design, we would like to get your advice.

We have 70 samples from 40 patients. Each sample was collected at one of two time points (ID and RE), which correspond to disease stage. For 10 patients only one sample is available. We would like to find the genes that are differentially expressed between subtype1 and subtype2, the two subtypes into which the patients are divided.

For example, our design looks like this:

Sample	condition	Patient
A1_ID	Subtype1	A1
A1_RE	Subtype1	A1
B1_ID	Subtype2	B1
B1_RE	Subtype2	B1
C1_ID	Subtype1	C1

Our concern is that we may be inflating the signal for certain genes by having two samples from the same patient. How could we account for that using a multifactorial design? Should we include only an additive effect (~ condition + patient) or also an interaction (~ condition * patient)?

Thanks for all opinions and suggestions.

deseq2 rnaseq timecourse multiple time points multifactorial design

written 7.2 years ago by alva.james • updated 7.2 years ago by Michael Love


Both of those models pool across the time interval, which seems bold given that you say it corresponds to disease stage. For the patients with only one sample, does that sample correspond to one specific timepoint, or is it censored (i.e. the disease didn't progress, so you only have the initial timepoint, or progressed so rapidly that you don't have the first timepoint), or is the other sample missing entirely at random? If samples are missing at random, then I'd expect any biases to cancel out, but if not, we'll need more details.

I'd recommend thoroughly justifying your decision to pool over timepoint/stage, or consider a model including a timepoint factor. It looks like you'll have difficulty fitting both patient and condition factors as they look confounded from the snippet you give; you can get some of the way using the advice given in section 3.12 of the DESeq2 vignette.

Roughly, your options for taking timepoint/stage into account are: pool (as you've done); normalise out the stage effect; stratify, so you have a separate analysis for each timepoint; or look for interactions (i.e. which genes have a different time profile in different subtypes).
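All four options can be written as fixed-effect designs. As a sketch only, the Python/NumPy code below uses a hypothetical six-sample sheet (patients A1 and B1 paired, C1 with only an ID sample, D1 with only an RE sample, and an explicit stage column that the original table lacks) and checks that each design is estimable: ~ condition pools, ~ stage + condition is one way to normalise out the stage effect, and ~ condition * stage lets the subtype difference depend on stage. Stratifying is the pooled design fitted within each stage.

```python
import numpy as np

# Hypothetical sample sheet: (sample, condition, stage)
sheet = [
    ("A1_ID", "Subtype1", "ID"),
    ("A1_RE", "Subtype1", "RE"),
    ("B1_ID", "Subtype2", "ID"),
    ("B1_RE", "Subtype2", "RE"),
    ("C1_ID", "Subtype1", "ID"),
    ("D1_RE", "Subtype2", "RE"),
]

is_sub2 = np.array([1.0 if c == "Subtype2" else 0.0 for _, c, _ in sheet])
is_re = np.array([1.0 if s == "RE" else 0.0 for _, _, s in sheet])
ones = np.ones(len(sheet))

designs = {
    # pool over stage
    "~ condition": np.column_stack([ones, is_sub2]),
    # adjust for stage, then compare subtypes
    "~ stage + condition": np.column_stack([ones, is_re, is_sub2]),
    # subtype difference allowed to differ between stages
    "~ condition * stage": np.column_stack([ones, is_sub2, is_re, is_sub2 * is_re]),
}

for formula, X in designs.items():
    full = np.linalg.matrix_rank(X) == X.shape[1]
    print(f"{formula}: {X.shape[1]} coefficients, full rank = {full}")

# Stratified alternative: the pooled design fitted separately within each stage
for stage in ("ID", "RE"):
    keep = np.array([s == stage for _, _, s in sheet])
    X = np.column_stack([ones[keep], is_sub2[keep]])
    print(f"stage {stage}: {keep.sum()} samples, rank {np.linalg.matrix_rank(X)}")
```

None of these designs includes patient, so the two samples from a paired patient are still treated as independent observations; the options differ only in how stage enters the model.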

Gavin Kelly • 7.2 years ago


I agree with Gavin's comments here.

It will help if you can say more about exactly what kind of DE across subtype you are interested in, particularly given the two disease stages.

Michael Love • 7.2 years ago


@Gavin, @Mike, thanks for your opinions; they are much valued. The dropouts are purely technical and not disease-related.

Also, in another analysis we saw that the difference between the stages is minimal and that subtype is the driving force behind the overall expression pattern of the samples. This was our rationale for pooling them. That said, what we are interested in is characterizing the upregulated genes in our subtype of interest.

Regarding the vignette: I do not fully understand which example in section 3.12 of the DESeq2 vignette you are referring to. What scenario are you suggesting?

alva.james • 7.2 years ago

score 3 · Accepted Answer · 2017-03-01


Michael Love

@mikelove

United States

Hi Alva,

It's ultimately up to you how to model disease state. If you are just interested in comparing across subtypes and you want to control for patient effects, this is not actually possible using fixed effects in DESeq2, because patient and subtype are confounded and you are interested in making comparisons across patients, not within patients (the example Gavin refers to is about making before/after comparisons for a number of patients divided into groups). You could, however, use limma with its duplicateCorrelation() function to account for the correlation among multiple samples from the same patient while comparing across subtypes. See the limma User's Guide.

Michael Love • 7.2 years ago


Michael, thanks for the reply, but I think the question wasn't conveyed clearly enough.

Here is my question again:

My aim is to find differentially expressed genes between groups. In my example cohort I have 82 samples from approximately 40 patients. Some patients are paired, meaning they have both ID and RE samples, whereas other patients have only an ID or only an RE sample. What I am aiming for is to find differentially expressed (DE) genes between subtype1 and subtype2. Here subtype1 is defined as a set of patients, for instance:

Subtype1: A1_ID, A1_RE, C1_ID, ... (n = 21 samples)

Subtype2: B1_ID, B1_RE, ... (n = 61 samples)

So in the end I need the DE genes between these two groups, and my question is whether DESeq2 has built-in statistics that would account for the patient effects within each subtype. As you can see, there are biological replicates within each subtype.

alva.james • 7.1 years ago


You can't do that with DESeq2 (i.e. account for correlation among samples within groups while making comparisons across groups).

You'd have to use duplicateCorrelation() with limma, as in my post above.

Michael Love • 7.1 years ago