Question

deseq2 course experiment

0

Entering edit mode

ribioinfo ▴ 100

@ribioinfo-9434

Last seen 5.5 years ago

Hello,

I read this http://www.bioconductor.org/help/workflows/rnaseqGene/#time-course-experiments about time course experiment.

Could you please explain me what is the meaning of these results obtained by resultsNames(ddsTC):

minute_30_vs_0

strain_mut_vs_wt

strainmut.minute15

I tried to read the ?results but it did not help me.

Thank you.

deseq2 • 4.2k views

ADD COMMENT • link 8.7 years ago ribioinfo ▴ 100

0

Entering edit mode

Thank you!

I have some difficulties to choose the correct model to use in DESeq2, this is the condition table:

condition	time
WT	2
WT	2
WT	2
A	2
A	2
A	2
A	2
B	2
B	2
B	2
B	2
WT	5
WT	5
WT	5
A	5
A	5
A	5
A	5
B	5
B	5
B	5
B	5

I do not have a time 0 and the experiment can be summarized in this way:

WT is the wild type animal, A is the WT animal in which one gene was inserted in the genome and B is the WT animal in which a gene (different from A) was inserted in the genome. A molecule M was given to the animals in order to express the gene in A and in B. M was given to all the animals and after 2 days WT, A and B were sacrificed while other animals WT, A and B were treated for other 3 days (so 5 days) with M and after 5 days these animals were sacrificed.

1)Is this experiment a time course experiment that I have to analyzed as written here: http://www.bioconductor.org/help/workflows/rnaseqGene/#time-course-experiments ?

2) or should I use as model: ~ condition + time?

3) Or should I use this condition table?

condition

WT2

A2

B2

WT5

A5

B5

4) Is it important the order of the variable in the model?

For example ~ condition + time and ~ condition + time + condition:time are different from ~ condition:time + time + condition?

5) I would really appreciate if you could briefly explain me what DESeq2 does when I use the models that I proposed in 1), 2) or 3) an maybe could be also helpful for other people.

Thank you very much for your time.

ADD REPLY • link 8.7 years ago ribioinfo ▴ 100

0

Entering edit mode

It's easiest to provide help if you can state what you want to test. What kind of comparisons do you want to make?

The section of the vignette where I explain interactions with figures is the best I can do in terms of explaining interaction models in a general manner. I can hopefully point you on the right track if you let me know what kind of comparisons you are interested in here. In general, I also recommend investigators who are new to linear models to partner with local statisticians who can help to explain and interpret different linear models and contrasts.

ADD REPLY • link 8.7 years ago Michael Love 43k

0

Entering edit mode

Thanks Michael!

This is is what I would like to test:

A 2 days vs WT 2 days

A 5 days vs WT 5 days

B 2 days vs WT 2 days

B 5 days vs WT 5 days

A 2 days vs B 2 days

A 5 days vs B 5 days

Since I am interested in the pvalue and fold change of these specific comparison I have to use the Wald test after the LRT test as explained here: http://www.bioconductor.org/help/workflows/rnaseqGene/#time-course-experiments because LRT tests the differences considering all the time points and all the conditions together, right?

I read in the vignette https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#interactions that is possible to use a condition table like that:

condition

WT2

A2

B2

WT5

A5

B5

and then use the contrast to do the comparisons instead to use the interactions. Is this solution like the first solution that uses interaction, LRT and then Wald test?

Thank you.

ADD REPLY • link 8.7 years ago ribioinfo ▴ 100

0

Entering edit mode

You're right. For those comparisons, it's easiest to combine the condition and time into a factor and use a design of ~condition and then make these comparisons with the 'contrast' argument, e.g.:

results(dds, contrast=c("condition", "A2", "WT2"))

ADD REPLY • link 8.7 years ago Michael Love 43k

0

Entering edit mode

Thanks Michael, I created the model as you suggested me and I performed a PCA rlog blind true:

http://imgur.com/a/cVAXB

As you can see Sample 6, 7 and should be near but 8 is in another group and also Sample 16, 17, 18 and 19 are split in two groups. Is it better to remove the replicates and use only two replicates or is it better to have more replicates even if they are not very similar?

Thank you

ADD REPLY • link 8.7 years ago ribioinfo ▴ 100

0

Entering edit mode

I wouldn't recommend removing samples when you have so few. Something seems to be going on with PC2, maybe batch related, but that's not a good reason to toss samples.

ADD REPLY • link 8.7 years ago Michael Love 43k

0

Entering edit mode

Thanks. When is it advisable to remove samples?

ADD REPLY • link 8.7 years ago ribioinfo ▴ 100

0

Entering edit mode

That's a really good question, hard to answer briefly.

I look for technical justification either in sequencing QC measures, or the sample is deviating from all the rest in a PCA plot, etc.

Here, the samples are not deviating away from all the others, only from the others within their biological condition. To me that implies either batches, or a potential sample swap, but I'd want to confirm the latter through other avenues, not just assuming it was a sample swap from the PCA alone. Do you know about batches?

ADD REPLY • link 8.7 years ago Michael Love 43k

0

Entering edit mode

One difference is that the RNA was extracted after 2 and 5 days because is a time course experiment. Should I use sva?

Thanks

ADD REPLY • link 8.7 years ago ribioinfo ▴ 100

0

Entering edit mode

The 2 vs 5 difference in addition to condition was encoded in color I thought. I can't see the legend anyway.

SVA would be for finding differences *other* than the design.

ADD REPLY • link 8.7 years ago Michael Love 43k

0

Entering edit mode

I used the condition merging the treatment and the time as above, for example control2, control5 ecc. In this case sva could remove batch effect that I am not aware? Thank you.

ADD REPLY • link 8.7 years ago ribioinfo ▴ 100

0

Entering edit mode

yes. See our workflow for example

ADD REPLY • link 8.7 years ago Michael Love 43k

0

Entering edit mode

Hi Michael, I would like to generate a PCA after to have used sva, I used this code:

dds = DESeqDataSetFromTximport(txi, cond, ~cond)
dds = dds[rowSums(counts(dds)) > 0, ]
dds = estimateSizeFactors(dds)

dat  <- counts(dds, normalized = TRUE)
idx  <- rowMeans(dat) > 1
dat  <- dat[idx, ]
mod  <- model.matrix(~cond, colData(dds))
mod0 <- model.matrix(~1, colData(dds))
svseq <- svaseq(dat, mod, mod0)
#Number of significant surrogate variables is:  3

ddssva <- dds
ddssva$SV1 <- svseq$sv[,1]
ddssva$SV2 <- svseq$sv[,2]
ddssva$SV3 <- svseq$sv[,3]
design(ddssva) <- ~ SV1 + SV2 + SV3 + cond

rldBlindTrue <- rlog(ddssva, blind=T)

plotPCA(rldBlindTrue, intgroup=c("cond"))

but the PCA is the same as without using sva, is it correct the generation of the new PCA?

Thank you

ADD REPLY • link 8.7 years ago ribioinfo ▴ 100

0

Entering edit mode

rlog does not remove variation associated with the design, for that you can use limma's removeBatchEffect() on the matrix in assay(rld).

ADD REPLY • link 8.7 years ago Michael Love 43k

0

Entering edit mode

Thanks, is this the correct way to use the information from DESeq2 and SVA in limma?

rldBlindTrueCounts = assay(rldBlindTrue)
rldBlindTrueRemBatch = removeBatchEffect(rldBlindTrueCounts, covariates = as.matrix(ddssva@colData))

Thanks

ADD REPLY • link 8.7 years ago ribioinfo ▴ 100

0

Entering edit mode

Sorry I think that this is more correct:

rldBlindTrueRemBatch = removeBatchEffect(rldBlindTrueCounts, covariates = as.matrix(ddssva@colData$SV1, ddssva@colData$SV2, ddssva@colData$SV3))

Since my ddsva@colData is:

cond SV1 SV2 SV3

Sample-6 control2 -0.2427044 -0.12319065 0.36255386

Sample-7 control2 -0.2951574 0.01406966 0.20129016

and using

as.matrix(ddssva@colData)

converts the factor in this way:

cond SV1 SV2 SV3

Sample-6 1 -0.242704406 -0.12319065 0.36255386

Sample-7 1 -0.295157408 0.01406966 0.20129016

and now the PCA looks better. Is it correct what I wrote?

Thank you.

ADD REPLY • link 8.7 years ago ribioinfo ▴ 100

0

Entering edit mode

yes

ADD REPLY • link 8.7 years ago Michael Love 43k

0

Entering edit mode

Thank you!

If I wanted the read counts normalized without batch effects could I use this command?

removeBatchEffect(counts(dds, normalized = TRUE), covariates = as.matrix(ddssva@colData$SV1, ddssva@colData$SV2, ddssva@colData$SV3))

Thank you

ADD REPLY • link 8.7 years ago ribioinfo ▴ 100

0

Entering edit mode

Yes

ADD REPLY • link 8.7 years ago Michael Love 43k

0

Entering edit mode

Hi I think that there is a problem because some values are negative, what could be the problem?

I have also another question regarding this code:

dat  <- counts(dds, normalized = TRUE)
idx  <- rowMeans(dat) > 1
dat  <- dat[idx, ]
mod  <- model.matrix(~cond, colData(dds))
mod0 <- model.matrix(~1, colData(dds))
svseq <- svaseq(dat, mod, mod0)

ddssva <- dds
ddssva$SV1 <- svseq$sv[,1]
ddssva$SV2 <- svseq$sv[,2]
ddssva$SV3 <- svseq$sv[,3]
design(ddssva) <- ~ SV1 + SV2 + SV3 + cond

rldBlindTrue <- rlog(ddssva, blind=T)
rldBlindTrueCounts = assay(rldBlindTrue)
rldBlindTrueRemBatch = removeBatchEffect(rldBlindTrueCounts, covariates = as.matrix(ddssva@colData$SV1, ddssva@colData$SV2, ddssva@colData$SV3))

Since the SV variables are calculated using the normalized read counts is it correct to use removeBatchEffect with the matrix that come from the function rlog and the SV variable that come from the matrix of the normalized read counts?

Thank you

ADD REPLY • link 8.7 years ago ribioinfo ▴ 100

0

Entering edit mode

Hi Michael, I think that I found the problem, removeBatchEffect wants the log counts so using the log2 of the counts normalized by DESeq2 it seems work, do you agree?

ADD REPLY • link 8.7 years ago ribioinfo ▴ 100

0

Entering edit mode

You can use either rlog() values, vst() values, or simply log2 normalized counts with removeBatchEffect() and using the SV calculated from normalized counts

ADD REPLY • link 8.7 years ago Michael Love 43k

0

Entering edit mode

Hi Michael, thank you for your help. My WT2 and WT5 are my controls at different times but what if later I will be only interested in the comparison of, for example, A2 vs B2 and A5 vs B5? Should I perform again the analysis excluding WT2 and WT5 from the table of counts used by DESeq2?

Or is it better to leave these samples, even if they will be not used in the comparison, because they add useful information to estimate the dispersion of the genes?

Thank you.

ADD REPLY • link 8.7 years ago ribioinfo ▴ 100

0

Entering edit mode

This is one of the FAQ in the vignette

ADD REPLY • link 8.7 years ago Michael Love 43k

score 0 · Answer 1 · 2017-05-25

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 20 hours ago

United States

Check the vignette section on interactions, there is a diagram of how interaction terms work. This is the 30 vs 0 difference for WT, the mut vs WT difference at time 0, and the interaction for mut and time 15.

ADD COMMENT • link 8.7 years ago Michael Love 43k