Search
Question: deseq2 course experiment
0
gravatar for riccardo
17 months ago by
riccardo50
riccardo50 wrote:

Hello,

I read this http://www.bioconductor.org/help/workflows/rnaseqGene/#time-course-experiments about time course experiment.

Could you please explain me what is the meaning of these results obtained by resultsNames(ddsTC):

minute_30_vs_0

strain_mut_vs_wt

strainmut.minute15

I tried to read the ?results but it did not help me.

Thank you.

ADD COMMENTlink modified 17 months ago • written 17 months ago by riccardo50

Thank you!

I have some difficulties to choose the correct model to use in DESeq2, this is the condition table:

condition time
WT 2
WT 2
WT 2
A 2
A 2
A 2
A 2
B 2
B 2
B 2
B 2
WT 5
WT 5
WT 5
A 5
A 5
A 5
A 5
B 5
B 5
B 5
B 5

 

I do not have a time 0 and the experiment can be summarized in this way:

WT is the wild type animal, A is the WT animal in which one gene was inserted in the genome and B is the WT animal in which a gene (different from A) was inserted in the genome. A molecule M was given to the animals in order to express the gene in A and in B. M was given to all the animals and after 2 days WT, A and B were sacrificed while other animals WT, A and B were treated for other 3 days (so 5 days) with M and after 5 days these animals were sacrificed.

1)Is this experiment a time course experiment that I have to analyzed as written here: http://www.bioconductor.org/help/workflows/rnaseqGene/#time-course-experiments ?

2) or should I use as model: ~ condition + time?

3) Or should I use this condition table?

condition
WT2
WT2
WT2
A2
A2
A2
A2
B2
B2
B2
B2
WT5
WT5
WT5
A5
A5
A5
A5
B5
B5
B5
B5

4) Is it important the order of the variable in the model?

For example ~ condition + time and ~ condition + time + condition:time are different from ~ condition:time + time + condition?

5) I would really appreciate if you could briefly explain me what DESeq2 does when I use the models that I proposed in 1), 2) or 3) an maybe could be also helpful for other people.

Thank you very much for your time.

ADD REPLYlink written 17 months ago by riccardo50

It's easiest to provide help if you can state what you want to test. What kind of comparisons do you want to make?

The section of the vignette where I explain interactions with figures is the best I can do in terms of explaining interaction models in a general manner. I can hopefully point you on the right track if you let me know what kind of comparisons you are interested in here. In general, I also recommend investigators who are new to linear models to partner with local statisticians who can help to explain and interpret different linear models and contrasts.

ADD REPLYlink modified 17 months ago • written 17 months ago by Michael Love20k

Thanks Michael!

This is is what I would like to test:

A 2 days vs WT 2 days

A 5 days vs WT 5 days

B 2 days vs WT 2 days

B 5 days vs WT 5 days

A 2 days vs B 2 days

A 5 days vs B 5 days

Since I am interested in the pvalue and fold change of these specific comparison I have to use the Wald test after the LRT test as explained here: http://www.bioconductor.org/help/workflows/rnaseqGene/#time-course-experiments because LRT tests the differences considering all the time points and all the conditions together, right?

I read in the vignette https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#interactions    that is possible to use a condition table like that:

condition
WT2
WT2
WT2
A2
A2
A2
A2
B2
B2
B2
B2
WT5
WT5
WT5
A5
A5
A5
A5
B5
B5
B5

B5

and then use the contrast to do the comparisons instead to use the interactions. Is this solution like the first solution that uses interaction, LRT and then Wald test?

Thank you. 

ADD REPLYlink written 17 months ago by riccardo50

You're right. For those comparisons, it's easiest to combine the condition and time into a factor and use a design of ~condition and then make these comparisons with the 'contrast' argument, e.g.:

results(dds, contrast=c("condition", "A2", "WT2"))
ADD REPLYlink written 17 months ago by Michael Love20k

Thanks Michael, I created the model as you suggested me and I performed a PCA rlog blind true:

http://imgur.com/a/cVAXB

As you can see Sample 6, 7 and should be near but 8 is in another group and also Sample 16, 17, 18 and 19 are split in two groups. Is it better to remove the replicates and use only two replicates or is it better to have more replicates even if they are not very similar?

Thank you

ADD REPLYlink written 17 months ago by riccardo50

I wouldn't recommend removing samples when you have so few. Something seems to be going on with PC2, maybe batch related, but that's not a good reason to toss samples.

ADD REPLYlink written 17 months ago by Michael Love20k

Thanks. When is it advisable to remove samples?

 

ADD REPLYlink written 17 months ago by riccardo50

That's a really good question, hard to answer briefly.

I look for technical justification either in sequencing QC measures, or the sample is deviating from all the rest in a PCA plot, etc.

Here, the samples are not deviating away from all the others, only from the others within their biological condition. To me that implies either batches, or a potential sample swap, but I'd want to confirm the latter through other avenues, not just assuming it was a sample swap from the PCA alone. Do you know about batches?

ADD REPLYlink written 17 months ago by Michael Love20k

One difference is that the RNA was extracted after 2 and 5 days because is a time course experiment. Should I use sva?

Thanks

ADD REPLYlink written 17 months ago by riccardo50

The 2 vs 5 difference in addition to condition was encoded in color I thought. I can't see the legend anyway.

SVA would be for finding differences *other* than the design.

ADD REPLYlink written 17 months ago by Michael Love20k
I used the condition merging the treatment and the time as above, for example control2, control5 ecc. In this case sva could remove batch effect that I am not aware? Thank you.
ADD REPLYlink written 17 months ago by riccardo50
yes. See our workflow for example
ADD REPLYlink written 17 months ago by Michael Love20k

Hi Michael, I would like to generate a PCA after to have used sva, I used this code:

dds = DESeqDataSetFromTximport(txi, cond, ~cond)
dds = dds[rowSums(counts(dds)) > 0, ]
dds = estimateSizeFactors(dds)

dat  <- counts(dds, normalized = TRUE)
idx  <- rowMeans(dat) > 1
dat  <- dat[idx, ]
mod  <- model.matrix(~cond, colData(dds))
mod0 <- model.matrix(~1, colData(dds))
svseq <- svaseq(dat, mod, mod0)
#Number of significant surrogate variables is:  3

ddssva <- dds
ddssva$SV1 <- svseq$sv[,1]
ddssva$SV2 <- svseq$sv[,2]
ddssva$SV3 <- svseq$sv[,3]
design(ddssva) <- ~ SV1 + SV2 + SV3 + cond

rldBlindTrue <- rlog(ddssva, blind=T)

plotPCA(rldBlindTrue, intgroup=c("cond"))

but the PCA is the same as without using sva, is it correct the generation of the new PCA?

Thank you

ADD REPLYlink written 17 months ago by riccardo50

rlog does not remove variation associated with the design, for that you can use limma's removeBatchEffect() on the matrix in assay(rld).

ADD REPLYlink modified 17 months ago • written 17 months ago by Michael Love20k

Thanks, is this the correct way to use the information from DESeq2 and SVA in limma?

rldBlindTrueCounts = assay(rldBlindTrue)
rldBlindTrueRemBatch = removeBatchEffect(rldBlindTrueCounts, covariates = as.matrix(ddssva@colData))

Thanks

ADD REPLYlink written 17 months ago by riccardo50

Sorry I think that this is more correct:

rldBlindTrueRemBatch = removeBatchEffect(rldBlindTrueCounts, covariates = as.matrix(ddssva@colData$SV1, ddssva@colData$SV2, ddssva@colData$SV3))

Since my ddsva@colData is:

               cond          SV1         SV2         SV3

           <factor>    <numeric>   <numeric>   <numeric>

Sample-6   control2   -0.2427044 -0.12319065  0.36255386

Sample-7   control2   -0.2951574  0.01406966  0.20129016

and using 

as.matrix(ddssva@colData)

converts the factor in this way:

          cond          SV1         SV2         SV3

Sample-6     1 -0.242704406 -0.12319065  0.36255386

Sample-7     1 -0.295157408  0.01406966  0.20129016

and now the PCA looks better. Is it correct what I wrote?

Thank you.

ADD REPLYlink written 17 months ago by riccardo50

yes                          

ADD REPLYlink modified 17 months ago • written 17 months ago by Michael Love20k

Thank you!

If I wanted the read counts normalized without batch effects could I use this command?

removeBatchEffect(counts(dds, normalized = TRUE), covariates = as.matrix(ddssva@colData$SV1, ddssva@colData$SV2, ddssva@colData$SV3))

Thank you

ADD REPLYlink written 17 months ago by riccardo50
Yes
ADD REPLYlink written 17 months ago by Michael Love20k

Hi I think that there is a problem because some values are negative, what could be the problem? 

I have also another question regarding this code:

dat  <- counts(dds, normalized = TRUE)
idx  <- rowMeans(dat) > 1
dat  <- dat[idx, ]
mod  <- model.matrix(~cond, colData(dds))
mod0 <- model.matrix(~1, colData(dds))
svseq <- svaseq(dat, mod, mod0)

ddssva <- dds
ddssva$SV1 <- svseq$sv[,1]
ddssva$SV2 <- svseq$sv[,2]
ddssva$SV3 <- svseq$sv[,3]
design(ddssva) <- ~ SV1 + SV2 + SV3 + cond

rldBlindTrue <- rlog(ddssva, blind=T)
rldBlindTrueCounts = assay(rldBlindTrue)
rldBlindTrueRemBatch = removeBatchEffect(rldBlindTrueCounts, covariates = as.matrix(ddssva@colData$SV1, ddssva@colData$SV2, ddssva@colData$SV3))

 

Since the SV variables are calculated using the normalized read counts is it correct to use removeBatchEffect with the matrix that come from the function rlog and the SV variable that come from the matrix of the normalized read counts?

Thank you

 

ADD REPLYlink modified 17 months ago • written 17 months ago by riccardo50

Hi Michael, I think that I found the problem, removeBatchEffect wants the log counts so using the log2 of the counts normalized by DESeq2 it seems work, do you agree?

 

ADD REPLYlink written 17 months ago by riccardo50

You can use either rlog() values, vst() values, or simply log2 normalized counts with removeBatchEffect() and using the SV calculated from normalized counts

ADD REPLYlink written 17 months ago by Michael Love20k

Hi Michael, thank you for your help. My WT2 and WT5 are my controls at different times but what if later I will be only interested in the comparison of, for example, A2 vs B2 and A5 vs B5? Should I perform again the analysis excluding WT2 and WT5 from the table of counts used by DESeq2?

Or is it better to leave these samples, even if they will be not used in the comparison, because they add useful information to estimate the dispersion of the genes?

Thank you.

ADD REPLYlink written 17 months ago by riccardo50

This is one of the FAQ in the vignette 

ADD REPLYlink written 17 months ago by Michael Love20k
0
gravatar for Michael Love
17 months ago by
Michael Love20k
United States
Michael Love20k wrote:

Check the vignette section on interactions, there is a diagram of how interaction terms work. This is the 30 vs 0 difference for WT, the mut vs WT difference at time 0, and the interaction for mut and time 15.

 

 

ADD COMMENTlink written 17 months ago by Michael Love20k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 402 users visited in the last hour