Search
Question: Paired Difference Analysis
0
10 months ago by
nhua0
University of Texas at Austin
nhua0 wrote:

Hello,

I have 79 samples composed of volunteers who received 5 different treatments. The data resembles the following:

 Status Treatment A02BG BG Aq A02AG AG Aq C03BG BG Cr C03AG AG Cr

I wanted to answer the question if the difference between the BG and AG of a treatment significantly different from that of another treatment. Is there a way to do a paired difference analysis using deseq?

I've used the following to answer a similar question: For each treatment, is there a significant difference of composition between the BG and AG samples?

dd_tb$group = factor(paste0(dd_tb$Treatment, dd_tb$Status)) design(dd_tb) = ~ group aq_res = results(dd_tb, contrast = c("group", "AqAG", "AqBG")) cr_res = results(dd_tb, contrast = c("group", "CrAG", "CrBG")) But this question compared the status differences per treatment. Are there any suggestions regarding how I can test if there's a difference among the BG and AG (status) differences between treatments? Yours, Nina ADD COMMENTlink modified 9 months ago • written 10 months ago by nhua0 Thanks! Also clarifying an earlier question, is there a need to filter out samples to control for the donors? I would think so because the gene expression should be compared between the same individuals. So, if Treatment Aq had 10 donors and Treatment Cr had 20 donors, and they only shared 10 of the same donors, as a filtering parameter should I only include the 10 donors that the two treatments shared when I create the DESeq2 object? ADD REPLYlink written 9 months ago by nhua0 You have a number of options: you can limit to just the samples that are paired and include the pair information in the design (you'll have to read up on the vignette section on this, there are also many support site posts on this topic), or you can use limma's duplicateCorrelation to include all samples, and specify those of which are paired (that is separate from DESeq2 entirely, we don't have this functionality), or you can use DESeq2 but not control for pair. This analysis choice is up to you, or you can consult with a local statistician to assist you further. ADD REPLYlink written 9 months ago by Michael Love18k 0 10 months ago by Michael Love18k United States Michael Love18k wrote: How many samples per treatment? Which treatments do you want to compare? All 5 not equal? Or specific pairs? ADD COMMENTlink written 10 months ago by Michael Love18k Hi Michael, These are the total samples we have per treatment. It probably doesn't help that not all of the same samples are found across all of the treatments. All of the treatments contain the same 8 samples, so would we only be able to include those samples in the analysis? The null hypothesis is that there is zero effect of using any one of the treatments on the OTU count data, so we would be testing all five of the treatments.  Treatment Total Samples A 14 C 18 E 19 T 16 W 12 Thanks for getting back with me so quickly! Yours, Nina ADD REPLYlink written 10 months ago by nhua0 I don't see (or can't guess) what is the sample information in the original table you posted. Do you also want to model sample-level variability? ADD REPLYlink written 10 months ago by Michael Love18k Hi Michael, The previous table laid out the different number of samples I had per treatment. They all varied because certain samples weren’t able to get sequenced. If I were to compare the difference of statuses (BG and AG) between the samples of treatments A and C, would I be able to since treatment C has a greater amount of sequenced samples? Yours, Nina ADD REPLYlink written 9 months ago by nhua0 I'm not following where you keep track of the donors, but you can use a design of ~treatment + treatment:status. For the effect in one group, use: results(dds, name="TreatmentCr.StatusBG") and then to do a comparison of the status effect between two groups, use: results(dds, contrast=list("TreatmentCr.StatusBG","TreatmentAq.StatusBG")) ADD REPLYlink written 9 months ago by Michael Love18k I see, but is there a way we could contrast (TreatmentCr.StatusBG, TreatmentCr.StatusAG) to (TreatmentAq.StatusBG, TreatmentAq.StatusAG)? I guess a difference of differences would be a more fitting analysis. ADD REPLYlink modified 9 months ago • written 9 months ago by nhua0 The two results() calls I show above are the BG vs AG effect in one treatment group, and the BG vs AG effect compared across two treatment groups. Take a look at the interactions section of the DESeq2 vignette which has a diagram of how interaction terms work. ADD REPLYlink written 9 months ago by Michael Love18k Referring to the DESeq2 vignette, for gene 2, using the “genotype:condition” interaction term means that the results function should return the difference between the condition effect of genotype II and condition effect for genotype III. And condition effect refers to the difference between condition A and B on that genotype? ADD REPLYlink written 9 months ago by nhua0 yes. if you have ~genotype + condition + genotype:condition, then you get interaction terms for each level of genotype other than the reference level. if you have ~genotype + genotype:condition, you get interaction terms for all levels of genotype. And these are the different condition effects for each genotype. If you go back to your experiment, with the design I specified, you will have the BG vs AG effect in each treatment group as interaction terms. You can build a results table for the BG vs AG effect in each treatment group, or you can compare them to get the BG vs AG effect compared across two treatment groups. This is done with the design above and the results() calls above. ADD REPLYlink written 9 months ago by Michael Love18k Another question. Earlier, I created this design using grouping: dd_tb <- phyloseq_to_deseq2(paired_joined_ps, design = ~ Status + Treatment) dd_tb$group <- factor(paste0(dd_tb$Treatment, dd_tb$Status))
design(dd_tb) <- ~ group

Do the results from this design match the results from:

results(dds, name="TreatmentCr.StatusBG")

Yes if you use the latest version of DESeq2, some of the comparisons can be made easily using the ~group design, but other comparisons (differences in status across treatment) are more difficult to formulate with ~group.

Also clarifying an earlier question, is there a need to filter out samples to control for the the donors? I would think so because the gene expression should be compared between the same individuals. So, if Treatment Aq had 10 donors and Treatment Cr had 20 donors, and they only shared 10 of the same donors, as a filtering parameter should I only include the 10 donors that the two treatments shared when I create the DESeq2 object?

I moved the previous answer to a comment on top level post and answered it above.

Hello,

I wanted to see what the results would be comparing the design that you posted to the results outputted by the group design. They’re the same values, but the log2FoldChange and stat columns are of the opposite signs. I think it’s because the grouping design compares “Crafter vs Crbefore“ while the other design compares “Crbefore vs Crafter”. Is there a way to specify the design in “design = ~Treatment + Treatment:Status” to perform a comparison to match the grouping design?

Group Design:
dd_tb <- phyloseq_to_deseq2(paired_joined_ps, design = ~ Status + Treatment)
dd_tb$group <- factor(paste0(dd_tb$Treatment, dd_tb\$Status))
design(dd_tb) <- ~ group

> results(dd_tb, contrast = c("group","Crafter", "Crbefore"))
log2 fold change (MLE): group Crafter vs Crbefore
Wald test p-value: group Crafter vs Crbefore
DataFrame with 1141 rows and 6 columns
baseMean log2FoldChange     lfcSE       stat
<numeric>      <numeric> <numeric>  <numeric>
4372795                        0.4172578      0.6003105 5.3786651  0.1116096

Design:
dds = phyloseq_to_deseq2(paired_joined_ps, design = ~Treatment + Treatment:Status)

> results(dds, name="TreatmentCr.Statusbefore")
log2 fold change (MLE): TreatmentCr.Statusbefore
Wald test p-value: TreatmentCr.Statusbefore
DataFrame with 1141 rows and 6 columns
baseMean log2FoldChange     lfcSE       stat
<numeric>      <numeric> <numeric>  <numeric>
4372795                        0.4172578     -0.6002659 5.3785485 -0.1116037  

Yes, you should level the factors so that the reference level is as you prefer. See the vignette "Note on factor levels"

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#note-on-factor-levels

Good afternoon Dr. Love,

I tried my hand at using Limma, then metagenomeSeq since the fitZig function also allows for duplicate correlation. However, I had a few issues with using that package, so I decided to only include samples found in all of the treatments. However, when I pruned the samples from the phyloseq object, and ran

phyloseq_to_deseq2(inclusive_joined_ps, design = ~treatment + treatment:status)

Error in DESeqDataSet(se, design = design, ignoreRank) :
design contains one or more variables with all samples having the same value,
remove these variables from the design