DESeq2 : group specific effect of a treatment while controlling for individual effect
1
0
Entering edit mode
SPbeginner • 0
@spbeginner-15170
Last seen 5.1 years ago

Dear Bioconductor community,

I have timecourse RNA-seq data.
Animals received 4 differents treatments (group = GpA, GpB, GpCD, GpE).
There are 6 animals in each group.
For each animal,blood were collacted before any treatment (D0) and at 3 time points after treatment: D3, D7 and D28.
So, i have a design matrix looking like :

Animal    Time    Treatment Rep
M657    0    GpA 1
M657    3    GpA 1
M657    7    GpA 1
M657    28    GpA 1
M658    0    GpA 2
M658    3    GpA 2
M658    7    GpA 2
M658    28    GpA 2
.
.
.
.
D627    0    GpB 1
D627    3    GpB 1
D627    7    GpB 1
D627    28    GpB 1
D628    0    GpB 2
D628    3    GpB 2
D628    7    GpB 2
D628    28    GpB 2
.
.
.
.

With rep being a factor value for Animal-Group which distinguishes the individual nested within a group.


I want to determine :
1/ Which genes respond at either the Day3, Day7 or day28 in the different group
for exemple :
GpA : Day3vsDay0 Day7vsDay0, Day28vsDay0
do I have to analyse each treatment separatly ? Or I can analyse them all together ?
Because each subject yields a value for each time point, can I take "Animal" effect into my model ?

2/ Which genes respond differently over time in the different groups ie :
Which genes respond differently over time in the GpA relative to the GpB?

Here is what I did :

Design_clean <- data.frame(Treatment=Group,Time=factor(Time), ind.n=factor(order$Rep),row.names=row.names(order))

dds <- DESeqDataSetFromMatrix(countData = Count, colData = Design_clean, design=~Treatment+Treatment:ind.n+Treatment:Time)
dds <- DESeq(dds, minReplicatesForReplace=Inf, modelMatrixType="standard", parallel=TRUE)

Is the design correct ?

Here are a subset of resultsNames(dds) output :

If I understant well, the following comparisons answer to my first question ie, which genes respond at either the Day3, Day7 or Day28 in the different group :

"GpA.Time3" "GpA.Time7" "GpA.Time28"

It's not indicated if the comparison was made against baseline (ie Day0), is it ? Is the within subject variability taken into account in these comparison ?

I also have :

"GpA.ind.n1"  "GpA.ind.n2" "GpA.ind.n3" "GpA.ind.n4" "GpA.ind.n5" "GpA.ind.n6"

Is it within subject variability ?

"GpA_vs_GpB" "GpA_vs_GpCD" "GpA_vs_GpE"

Here, the comparisons are made between the Group, but Time information is not take into account.

How can I do, to have, for example, genes that respond differently over time in the GpA relative to the GpB at day 3 ie GpA_vs_GpB.day3 = (GpA.3-GpA.0)-(GpB.3-GpB.0)

Thanks in advance for your help,

C.

 

deseq2 DESeq design and contrast matrix deseq • 2.1k views
ADD COMMENT
0
Entering edit mode

Is there any option in DESeq to protect against outliers ?

ADD REPLY
1
Entering edit mode

Yes there are automatic procedures to filter or replace outliers, and these have unit tests to make sure they are working. However even saying what is an "outlier" is difficult, and users sometimes want different behavior. If there are 5 samples in a group and 2 have high expression compared to another group, are those 2 samples oultiers? Or is that important biological signal? What if it's 2 out of 6, or 3 out of 6? You can read about the outlier routines in the vignette and in the DESeq2 paper. The short version is that an individual count is labelled as an outlier if the LFC would be very different without that count (as defined by a standard regression diagnostic).

ADD REPLY
0
Entering edit mode

I have another question.

I have the same kind of RNA-seq data to analyse, but this time, each subject doesn't yields a value at each time point. I have animal with missing data at some time points

Can I still set this design ? 

design=~Treatment+Treatment:ind.n+Treatment:Time
ADD REPLY
0
Entering edit mode

I moved your post to a "comment", instead of an "answer".

It's not necessarily possible to fit the same model if there is missing data. If you have some un-paired samples, you can instead use limma-voom with the duplicateCorrelation() function. See the limma user guide and if necessary, make a new post.

ADD REPLY
0
Entering edit mode

How can I analyse these data with DESeq2 ? 

Am I correct :

TS <- paste(Design_clean$Treatment, Design_clean$Time, sep=".")
Design_clean_simple <-data.frame(Group=TS,Animal=factor(Design_clean$Animal),row.names=row.names(Design_clean))

dds <- DESeqDataSetFromMatrix(countData = Count, colData = Design_clean, design=~Group)

ADD REPLY
0
Entering edit mode

If you want to test for the effect of treatment over time while controlling for individual animals, and some measurements are missing, you can't do this with a fixed effects model (you can't use DESeq2).

To be more specific: some animals are missing values along the time series.

ADD REPLY
0
Entering edit mode
@mikelove
Last seen 9 minutes ago
United States

"Is the design correct?"

Yes, this is how I would approach it, following the example in the vignette as it looks you are.

Yes those coefficients are the group specific comparisons of time points against the reference level of time (t0). The within individual variation is taken into account by the Treatment:ind.n term: every individual in every treatment group has its own baseline that is controlled for in the design. To make comparisons across group, you can use 'contrast' instead of 'name' in the results() function, as is described in that section in the vignette:

contrast=list("GpB.Time3", "GpA.Time3")
ADD COMMENT
0
Entering edit mode

Thank you for your reply !

Now I understand !

ADD REPLY

Login before adding your answer.

Traffic: 713 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6