Question: DESeq2 - paired sample, interaction and fold change
1
5.1 years ago by
France
samuel collombet130 wrote:
Dear all, I would like to make an analysis with a multilevel design with an interaction therm, with paired samples. If I have 2 patients, for which I have "tumour" and "control" samples, and for each I have them treated with treatment A or B. I want to know if treatment A as a specific effect on the tumor versus WT. As a toy example: > dds <- makeExampleDESeqDataSet(n = 1000, m = 8) > dds$patient=c("1","1","1","1","2","2","2","2") > dds$condition=c("WT","WT","tumor","tumor","WT","WT","tumor","tumor") > dds$treatment=c("A","Ct","A","Ct","A","Ct","A","Ct") > dds <- DESeqDataSet(dds, design= ~ patient + condition*treatment ) > dds <- DESeq(dds) for which I got: > colData(dds) DataFrame with 8 rows and 5 columns sample condition patient treatment sizeFactor <character> <factor> <factor> <factor> <numeric> sample1 sample1 WT 1 A 1.047160 sample2 sample2 WT 1 Ct 1.017620 sample3 sample3 tumor 1 A 1.056423 sample4 sample4 tumor 1 Ct 1.046753 sample5 sample5 WT 2 A 1.014201 sample6 sample6 WT 2 Ct 1.059479 sample7 sample7 tumor 2 A 1.035853 sample8 sample8 tumor 2 Ct 1.013360 And: > resultsNames(dds) [1] "Intercept" "patient_2_vs_1" "condition_WT_vs_tumor" "treatment_Ct_vs_A" "conditionWT.treatmentCt" If I do: > res <- results(dds, name="conditionWT.treatmentCt") I will get the gene for which there is a specific effect of treatment in the tumor, am I right? I understood that DESeq2 does make test taking into consideration paired samples, if the pairing information is put as a factor in the design formula (I understood it from http://seqanswers.com/forums/archive/index.php/t-34614.html, post by simon anders, 10-21-2013, 12:28 AM). So in the example above, the fold change would be calculated for each patients, and the test made on these fold changes, I am wrong? Is it possible to get the individual fold change for each patients? Thanks, Samuel deseq2 • 6.9k views ADD COMMENTlink modified 5.1 years ago by Michael Love24k • written 5.1 years ago by samuel collombet130 Answer: DESeq2 - paired sample, interaction and fold change 0 5.1 years ago by Michael Love24k United States Michael Love24k wrote: hi Samuel, On Wed, Jun 4, 2014 at 4:38 AM, samuel collombet <samuelcollombet at="" gmail.com=""> wrote: > Dear all, > > I would like to make an analysis with a multilevel design with an > interaction therm, with paired samples. > > If I have 2 patients, for which I have "tumour" and "control" samples, and > for each I have them treated with treatment A or B. I want to know if > treatment A as a specific effect on the tumor versus WT. As a toy example: >> dds <- makeExampleDESeqDataSet(n = 1000, m = 8) >> dds$patient=c("1","1","1","1","2","2","2","2") >> dds$condition=c("WT","WT","tumor","tumor","WT","WT","tumor","tumor") >> dds$treatment=c("A","Ct","A","Ct","A","Ct","A","Ct") Maybe you left this out because it's a toy example, but we recommend you use dds$treatment=factor(c("A","Ct","A","Ct","A","Ct","A","Ct"), levels=c("Ct","A")), so that the comparisons will be treatment / control. Also with WT / tumor. >> dds <- DESeqDataSet(dds, design= ~ patient + condition*treatment ) >> dds <- DESeq(dds) > > for which I got: >> colData(dds) > DataFrame with 8 rows and 5 columns > sample condition patient treatment sizeFactor > <character> <factor> <factor> <factor> <numeric> > sample1 sample1 WT 1 A 1.047160 > sample2 sample2 WT 1 Ct 1.017620 > sample3 sample3 tumor 1 A 1.056423 > sample4 sample4 tumor 1 Ct 1.046753 > sample5 sample5 WT 2 A 1.014201 > sample6 sample6 WT 2 Ct 1.059479 > sample7 sample7 tumor 2 A 1.035853 > sample8 sample8 tumor 2 Ct 1.013360 > > And: >> resultsNames(dds) > [1] "Intercept" "patient_2_vs_1" "condition_WT_vs_tumor" > "treatment_Ct_vs_A" "conditionWT.treatmentCt" > > If I do: >> res <- results(dds, name="conditionWT.treatmentCt") > > I will get the gene for which there is a specific effect of treatment in the > tumor, am I right? yes (except for a sign change, see note above on factor levels). > > I understood that DESeq2 does make test taking into consideration paired > samples, if the pairing information is put as a factor in the design formula > (I understood it from > http://seqanswers.com/forums/archive/index.php/t-34614.html, post by simon > anders, 10-21-2013, 12:28 AM). yes, by including the patient variable in the design, differences in counts which are due to patient differences are accounted for. > > So in the example above, the fold change would be calculated for each > patients, and the test made on these fold changes, I am wrong? The test above is on the specific effect of treatment in the tumor, accounting for patient differences. > Is it possible to get the individual fold change for each patients? Can you be more specific on which fold change? Are you interested in the tumor-treatment interaction term for each patient? How many replicates do you have for each combination of patient x condition x treatment? If not many, I'd recommend sticking to the general effect above. Mike > > Thanks, > Samuel > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor ADD COMMENTlink written 5.1 years ago by Michael Love24k Thanks Michael! On 05/06/2014 01:49, Michael Love wrote: > Maybe you left this out because it's a toy example, but we recommend > you use dds$treatment=factor(c("A","Ct","A","Ct","A","Ct","A","Ct"), > levels=c("Ct","A")), so that the comparisons will be treatment / > control. Also with WT / tumor. Thanks for the reminder, I did not understood it, now I got why sometimes I got the inverse of what I wanted... >> If I do: >>> res <- results(dds, name="conditionWT.treatmentCt") >> I will get the gene for which there is a specific effect of treatment in the tumor, am I right? > yes (except for a sign change, see note above on factor levels). > >> I understood that DESeq2 does make test taking into consideration paired samples, if the pairing information is put as a factor in the design formula (I understood it from http://seqanswers.com/forums/archive/index.php/t-34614.html, post by simon anders, 10-21-2013, 12:28 AM). > yes, by including the patient variable in the design, differences in > counts which are due to patient differences are accounted for. I am still not sure I get it, certainly because I am not familiar with expanded design matrices... the contrast "conditionWT.treatmentCt" will define the vector of contrast (named c in the doc/paper) that, multiplied by the vector of GLM coefficient Beta, will give the vector of coefficient that will be used to calculate the logarithmic posterior, right? Is it possible to get the contrast vector / pair of contrast vector that correspond to a resultName? > >> Is it possible to get the individual fold change for each patients? > Can you be more specific on which fold change? Are you interested in the tumor-treatment interaction term for each patient? yes > How many replicates do you have for each combination of patient x condition x treatment? 3 > If not many, I'd recommend sticking to the general effect above. We suspect that not all sample respond to the treatment, and so we would like to know if the 'non-significance' of some genes is due to global non-response, or 1 sample that is not responding at all (or if they are all responding differently, so the variability is too high to be significant). I can get this kind of information by getting the rlogTransformation of the counts and comparing the different samples, but these will not be the exact values used for the test... I would like to be able to extract these last ones. Many thanks!
hi Samuel, On Thu, Jun 5, 2014 at 4:35 AM, samuel collombet <samuelcollombet at="" gmail.com=""> wrote: > >> yes, by including the patient variable in the design, differences in >> counts which are due to patient differences are accounted for. > > I am still not sure I get it, certainly because I am not familiar with > expanded design matrices... > the contrast "conditionWT.treatmentCt" will define the vector of contrast > (named c in the doc/paper) that, multiplied by the vector of GLM coefficient > Beta, will give the vector of coefficient that will be used to calculate the > logarithmic posterior, right? > Is it possible to get the contrast vector / pair of contrast vector that > correspond to a resultName? > For the interaction terms, like 'conditionWT.treatmentCt', there is a single coefficient in the model, so we don't have to build a contrast. We just pull out the coefficient and it's standard error. When we want to contrast multiple terms, then the results() function creates a vector 'c'. The GLM coefficient vector beta is the maximum a posterior estimates. c * beta gives the numerator of the contrast as describes in ?results. If you supply a numeric contrast, the numbers should correspond to the terms in resultsNames. This is described in ?results and there are useful examples there. >> >>> Is it possible to get the individual fold change for each patients? >> >> Can you be more specific on which fold change? Are you interested in the >> tumor-treatment interaction term for each patient? > > yes > >> How many replicates do you have for each combination of patient x >> condition x treatment? > > 3 > >> If not many, I'd recommend sticking to the general effect above. > > We suspect that not all sample respond to the treatment, and so we would > like to know if the 'non-significance' of some genes is due to global > non-response, or 1 sample that is not responding at all (or if they are all > responding differently, so the variability is too high to be significant). > I can get this kind of information by getting the rlogTransformation of the > counts and comparing the different samples, but these will not be the exact > values used for the test... I would like to be able to extract these last > ones. > If you have replicates for each combination of patient, treatment and condition (I'd recommend at least 3), then you can use a model with 2nd order interactions. You will have to turn off the betaPrior, as we do not now support shrinkage of log fold changes for models with 2nd or higher order interactions. Here is some example code: # construct an example dataset with 8 combinations x 3 replicates each dds <- makeExampleDESeqDataSet(n = 1000, m = 24) dds$patient <- factor(rep(c("1","2"),each=12),levels=c("1","2")) dds$condition <- factor(rep(rep(c("WT","tumor"),each=6),2),levels=c("WT","tumor")) dds\$treatment <- factor(rep(rep(c("Ct","A"),each=3),4),levels=c("Ct","A")) as.data.frame(colData(dds)) # allow interactions between all factors design(dds) <- ~ patient*condition*treatment # turn off beta prior dds <- DESeq(dds, betaPrior=FALSE) # just to visualize the model matrix in this case (mm <- unname(attr(dds,"modelMatrix"))) par(mar=c(15,1,1,1)) image(t(mm[nrow(mm):1,]),xaxt="n",yaxt="n") axis(1,0:7/7,resultsNames(dds),las=2) # the results names resultsNames(dds) # this is the interaction term for patient 1 # i.e. does treatment A have a different effect in tumor for patient 1 results(dds, name="conditiontumor.treatmentA") # this is the interaction term for patient 2 # i.e. does treatment A have a different effect in tumor for patient 2 # note: we add the interaction from before with the additional interaction for patient 2 # see ?results for details on providing the list() results(dds, contrast=list(c("conditiontumor.treatmentA", "patient2.conditiontumor.treatmentA"),character(0))) # the previous contrast is the same as constructing the following numeric contrast ctrst <- as.numeric(resultsNames(dds) %in% c("conditiontumor.treatmentA", "patient2.conditiontumor.treatmentA")) ctrst results(dds, contrast=ctrst) Mike > Many thanks!