deseq2 course experiment
1
0
Entering edit mode
ribioinfo ▴ 100
@ribioinfo-9434
Last seen 21 months ago

Hello,

Could you please explain me what is the meaning of these results obtained by resultsNames(ddsTC):

minute_30_vs_0

strain_mut_vs_wt

strainmut.minute15

I tried to read the ?results but it did not help me.

Thank you.

deseq2 • 1.7k views
0
Entering edit mode

Thank you!

I have some difficulties to choose the correct model to use in DESeq2, this is the condition table:

 condition time WT 2 WT 2 WT 2 A 2 A 2 A 2 A 2 B 2 B 2 B 2 B 2 WT 5 WT 5 WT 5 A 5 A 5 A 5 A 5 B 5 B 5 B 5 B 5

I do not have a time 0 and the experiment can be summarized in this way:

WT is the wild type animal, A is the WT animal in which one gene was inserted in the genome and B is the WT animal in which a gene (different from A) was inserted in the genome. A molecule M was given to the animals in order to express the gene in A and in B. M was given to all the animals and after 2 days WT, A and B were sacrificed while other animals WT, A and B were treated for other 3 days (so 5 days) with M and after 5 days these animals were sacrificed.

1)Is this experiment a time course experiment that I have to analyzed as written here: http://www.bioconductor.org/help/workflows/rnaseqGene/#time-course-experiments ?

2) or should I use as model: ~ condition + time?

3) Or should I use this condition table?

 condition WT2 WT2 WT2 A2 A2 A2 A2 B2 B2 B2 B2 WT5 WT5 WT5 A5 A5 A5 A5 B5 B5 B5 B5

4) Is it important the order of the variable in the model?

For example ~ condition + time and ~ condition + time + condition:time are different from ~ condition:time + time + condition?

5) I would really appreciate if you could briefly explain me what DESeq2 does when I use the models that I proposed in 1), 2) or 3) an maybe could be also helpful for other people.

Thank you very much for your time.

0
Entering edit mode

It's easiest to provide help if you can state what you want to test. What kind of comparisons do you want to make?

The section of the vignette where I explain interactions with figures is the best I can do in terms of explaining interaction models in a general manner. I can hopefully point you on the right track if you let me know what kind of comparisons you are interested in here. In general, I also recommend investigators who are new to linear models to partner with local statisticians who can help to explain and interpret different linear models and contrasts.

0
Entering edit mode

Thanks Michael!

This is is what I would like to test:

A 2 days vs WT 2 days

A 5 days vs WT 5 days

B 2 days vs WT 2 days

B 5 days vs WT 5 days

A 2 days vs B 2 days

A 5 days vs B 5 days

Since I am interested in the pvalue and fold change of these specific comparison I have to use the Wald test after the LRT test as explained here: http://www.bioconductor.org/help/workflows/rnaseqGene/#time-course-experiments because LRT tests the differences considering all the time points and all the conditions together, right?

I read in the vignette https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#interactions    that is possible to use a condition table like that:

 condition WT2 WT2 WT2 A2 A2 A2 A2 B2 B2 B2 B2 WT5 WT5 WT5 A5 A5 A5 A5 B5 B5 B5 B5

and then use the contrast to do the comparisons instead to use the interactions. Is this solution like the first solution that uses interaction, LRT and then Wald test?

Thank you.

0
Entering edit mode

You're right. For those comparisons, it's easiest to combine the condition and time into a factor and use a design of ~condition and then make these comparisons with the 'contrast' argument, e.g.:

results(dds, contrast=c("condition", "A2", "WT2"))
0
Entering edit mode

Thanks Michael, I created the model as you suggested me and I performed a PCA rlog blind true:

http://imgur.com/a/cVAXB

As you can see Sample 6, 7 and should be near but 8 is in another group and also Sample 16, 17, 18 and 19 are split in two groups. Is it better to remove the replicates and use only two replicates or is it better to have more replicates even if they are not very similar?

Thank you

0
Entering edit mode

I wouldn't recommend removing samples when you have so few. Something seems to be going on with PC2, maybe batch related, but that's not a good reason to toss samples.

0
Entering edit mode

Thanks. When is it advisable to remove samples?

0
Entering edit mode

That's a really good question, hard to answer briefly.

I look for technical justification either in sequencing QC measures, or the sample is deviating from all the rest in a PCA plot, etc.

Here, the samples are not deviating away from all the others, only from the others within their biological condition. To me that implies either batches, or a potential sample swap, but I'd want to confirm the latter through other avenues, not just assuming it was a sample swap from the PCA alone. Do you know about batches?

0
Entering edit mode

One difference is that the RNA was extracted after 2 and 5 days because is a time course experiment. Should I use sva?

Thanks

0
Entering edit mode

The 2 vs 5 difference in addition to condition was encoded in color I thought. I can't see the legend anyway.

SVA would be for finding differences *other* than the design.

0
Entering edit mode
I used the condition merging the treatment and the time as above, for example control2, control5 ecc. In this case sva could remove batch effect that I am not aware? Thank you.
0
Entering edit mode
yes. See our workflow for example
0
Entering edit mode

Hi Michael, I would like to generate a PCA after to have used sva, I used this code:

dds = DESeqDataSetFromTximport(txi, cond, ~cond)
dds = dds[rowSums(counts(dds)) > 0, ]
dds = estimateSizeFactors(dds)

dat  <- counts(dds, normalized = TRUE)
idx  <- rowMeans(dat) > 1
dat  <- dat[idx, ]
mod  <- model.matrix(~cond, colData(dds))
mod0 <- model.matrix(~1, colData(dds))
svseq <- svaseq(dat, mod, mod0)
#Number of significant surrogate variables is:  3

ddssva <- dds
ddssva$SV1 <- svseq$sv[,1]
ddssva$SV2 <- svseq$sv[,2]
ddssva$SV3 <- svseq$sv[,3]
design(ddssva) <- ~ SV1 + SV2 + SV3 + cond

rldBlindTrue <- rlog(ddssva, blind=T)

plotPCA(rldBlindTrue, intgroup=c("cond"))

but the PCA is the same as without using sva, is it correct the generation of the new PCA?

Thank you

0
Entering edit mode

rlog does not remove variation associated with the design, for that you can use limma's removeBatchEffect() on the matrix in assay(rld).

0
Entering edit mode

Thanks, is this the correct way to use the information from DESeq2 and SVA in limma?

rldBlindTrueCounts = assay(rldBlindTrue)
rldBlindTrueRemBatch = removeBatchEffect(rldBlindTrueCounts, covariates = as.matrix(ddssva@colData))

Thanks

0
Entering edit mode

Sorry I think that this is more correct:

rldBlindTrueRemBatch = removeBatchEffect(rldBlindTrueCounts, covariates = as.matrix(ddssva@colData$SV1, ddssva@colData$SV2, ddssva@colData$SV3))  Since my ddsva@colData is: cond SV1 SV2 SV3 <factor> <numeric> <numeric> <numeric> Sample-6 control2 -0.2427044 -0.12319065 0.36255386 Sample-7 control2 -0.2951574 0.01406966 0.20129016 and using as.matrix(ddssva@colData) converts the factor in this way: cond SV1 SV2 SV3 Sample-6 1 -0.242704406 -0.12319065 0.36255386 Sample-7 1 -0.295157408 0.01406966 0.20129016 and now the PCA looks better. Is it correct what I wrote? Thank you. ADD REPLY 0 Entering edit mode yes ADD REPLY 0 Entering edit mode Thank you! If I wanted the read counts normalized without batch effects could I use this command? removeBatchEffect(counts(dds, normalized = TRUE), covariates = as.matrix(ddssva@colData$SV1, ddssva@colData$SV2, ddssva@colData$SV3))


Thank you

0
Entering edit mode
Yes
0
Entering edit mode

Hi I think that there is a problem because some values are negative, what could be the problem?

I have also another question regarding this code:

dat  <- counts(dds, normalized = TRUE)
idx  <- rowMeans(dat) > 1
dat  <- dat[idx, ]
mod  <- model.matrix(~cond, colData(dds))
mod0 <- model.matrix(~1, colData(dds))
svseq <- svaseq(dat, mod, mod0)

ddssva <- dds
ddssva$SV1 <- svseq$sv[,1]
ddssva$SV2 <- svseq$sv[,2]
ddssva$SV3 <- svseq$sv[,3]
design(ddssva) <- ~ SV1 + SV2 + SV3 + cond

rldBlindTrue <- rlog(ddssva, blind=T)
rldBlindTrueCounts = assay(rldBlindTrue)
rldBlindTrueRemBatch = removeBatchEffect(rldBlindTrueCounts, covariates = as.matrix(ddssva@colData$SV1, ddssva@colData$SV2, ddssva@colData\$SV3))

Since the SV variables are calculated using the normalized read counts is it correct to use removeBatchEffect with the matrix that come from the function rlog and the SV variable that come from the matrix of the normalized read counts?

Thank you

0
Entering edit mode

Hi Michael, I think that I found the problem, removeBatchEffect wants the log counts so using the log2 of the counts normalized by DESeq2 it seems work, do you agree?

0
Entering edit mode

You can use either rlog() values, vst() values, or simply log2 normalized counts with removeBatchEffect() and using the SV calculated from normalized counts

0
Entering edit mode

Hi Michael, thank you for your help. My WT2 and WT5 are my controls at different times but what if later I will be only interested in the comparison of, for example, A2 vs B2 and A5 vs B5? Should I perform again the analysis excluding WT2 and WT5 from the table of counts used by DESeq2?

Or is it better to leave these samples, even if they will be not used in the comparison, because they add useful information to estimate the dispersion of the genes?

Thank you.

0
Entering edit mode

This is one of the FAQ in the vignette

0
Entering edit mode
@mikelove
Last seen 1 day ago
United States

Check the vignette section on interactions, there is a diagram of how interaction terms work. This is the 30 vs 0 difference for WT, the mut vs WT difference at time 0, and the interaction for mut and time 15.