Question

Paired Difference Analysis

0

Entering edit mode

nhua • 0

@nhua-13743

Last seen 6.1 years ago

University of Texas at Austin

Hello,

I have 79 samples composed of volunteers who received 5 different treatments. The data resembles the following:

	Status	Treatment
A02BG	BG	Aq
A02AG	AG	Aq
C03BG	BG	Cr
C03AG	AG	Cr

I wanted to answer the question if the difference between the BG and AG of a treatment significantly different from that of another treatment. Is there a way to do a paired difference analysis using deseq?

I've used the following to answer a similar question: For each treatment, is there a significant difference of composition between the BG and AG samples?

dd_tb$group = factor(paste0(dd_tb$Treatment, dd_tb$Status))

design(dd_tb) = ~ group 

aq_res = results(dd_tb, contrast = c("group", "AqAG", "AqBG"))

cr_res = results(dd_tb, contrast = c("group", "CrAG", "CrBG"))

But this question compared the status differences per treatment. Are there any suggestions regarding how I can test if there's a difference among the BG and AG (status) differences between treatments?

Yours,

Nina

deseq2 R differential gene expression • 1.9k views

ADD COMMENT • link 6.7 years ago • updated 6.6 years ago nhua • 0

0

Entering edit mode

Thanks!

Also clarifying an earlier question, is there a need to filter out samples to control for the donors? I would think so because the gene expression should be compared between the same individuals. So, if Treatment Aq had 10 donors and Treatment Cr had 20 donors, and they only shared 10 of the same donors, as a filtering parameter should I only include the 10 donors that the two treatments shared when I create the DESeq2 object?

ADD REPLY • link 6.6 years ago nhua • 0

0

Entering edit mode

You have a number of options: you can limit to just the samples that are paired and include the pair information in the design (you'll have to read up on the vignette section on this, there are also many support site posts on this topic), or you can use limma's duplicateCorrelation to include all samples, and specify those of which are paired (that is separate from DESeq2 entirely, we don't have this functionality), or you can use DESeq2 but not control for pair. This analysis choice is up to you, or you can consult with a local statistician to assist you further.

ADD REPLY • link 6.6 years ago Michael Love 41k

score 0 · Answer 1 · 2017-08-15

0

Entering edit mode

Michael Love 41k

@mikelove

Last seen 23 hours ago

United States

How many samples per treatment?

Which treatments do you want to compare? All 5 not equal? Or specific pairs?

ADD COMMENT • link 6.7 years ago Michael Love 41k

0

Entering edit mode

Hi Michael,

These are the total samples we have per treatment. It probably doesn't help that not all of the same samples are found across all of the treatments. All of the treatments contain the same 8 samples, so would we only be able to include those samples in the analysis? The null hypothesis is that there is zero effect of using any one of the treatments on the OTU count data, so we would be testing all five of the treatments.

Treatment	Total Samples
A	14
C	18
E	19
T	16
W	12

Thanks for getting back with me so quickly!

Yours,

Nina

ADD REPLY • link 6.7 years ago nhua • 0

0

Entering edit mode

I don't see (or can't guess) what is the sample information in the original table you posted. Do you also want to model sample-level variability?

ADD REPLY • link 6.7 years ago Michael Love 41k

0

Entering edit mode

Hi Michael,

The previous table laid out the different number of samples I had per treatment. They all varied because certain samples weren’t able to get sequenced. If I were to compare the difference of statuses (BG and AG) between the samples of treatments A and C, would I be able to since treatment C has a greater amount of sequenced samples?

Yours,

Nina

ADD REPLY • link 6.6 years ago nhua • 0

0

Entering edit mode

I'm not following where you keep track of the donors, but you can use a design of ~treatment + treatment:status.

For the effect in one group, use:

results(dds, name="TreatmentCr.StatusBG")

and then to do a comparison of the status effect between two groups, use:

results(dds, contrast=list("TreatmentCr.StatusBG","TreatmentAq.StatusBG"))

ADD REPLY • link 6.6 years ago Michael Love 41k

0

Entering edit mode

I see, but is there a way we could contrast (TreatmentCr.StatusBG, TreatmentCr.StatusAG) to (TreatmentAq.StatusBG, TreatmentAq.StatusAG)? I guess a difference of differences would be a more fitting analysis.

ADD REPLY • link 6.6 years ago nhua • 0

0

Entering edit mode

The two results() calls I show above are the BG vs AG effect in one treatment group, and the BG vs AG effect compared across two treatment groups. Take a look at the interactions section of the DESeq2 vignette which has a diagram of how interaction terms work.

ADD REPLY • link 6.6 years ago Michael Love 41k

0

Entering edit mode

Referring to the DESeq2 vignette, for gene 2, using the “genotype:condition” interaction term means that the results function should return the difference between the condition effect of genotype II and condition effect for genotype III. And condition effect refers to the difference between condition A and B on that genotype?

ADD REPLY • link 6.6 years ago nhua • 0

0

Entering edit mode

yes.

if you have ~genotype + condition + genotype:condition, then you get interaction terms for each level of genotype other than the reference level.

if you have ~genotype + genotype:condition, you get interaction terms for all levels of genotype. And these are the different condition effects for each genotype.

If you go back to your experiment, with the design I specified, you will have the BG vs AG effect in each treatment group as interaction terms. You can build a results table for the BG vs AG effect in each treatment group, or you can compare them to get the BG vs AG effect compared across two treatment groups. This is done with the design above and the results() calls above.

ADD REPLY • link 6.6 years ago Michael Love 41k

0

Entering edit mode

Another question. Earlier, I created this design using grouping:

dd_tb <- phyloseq_to_deseq2(paired_joined_ps, design = ~ Status + Treatment)
dd_tb$group <- factor(paste0(dd_tb$Treatment, dd_tb$Status))
design(dd_tb) <- ~ group

Do the results from this design match the results from:

results(dds, name="TreatmentCr.StatusBG")

ADD REPLY • link 6.6 years ago nhua • 0

0

Entering edit mode

Yes if you use the latest version of DESeq2, some of the comparisons can be made easily using the ~group design, but other comparisons (differences in status across treatment) are more difficult to formulate with ~group.

ADD REPLY • link 6.6 years ago Michael Love 41k

0

Entering edit mode

Also clarifying an earlier question, is there a need to filter out samples to control for the the donors? I would think so because the gene expression should be compared between the same individuals. So, if Treatment Aq had 10 donors and Treatment Cr had 20 donors, and they only shared 10 of the same donors, as a filtering parameter should I only include the 10 donors that the two treatments shared when I create the DESeq2 object?

ADD REPLY • link 6.6 years ago nhua • 0

0

Entering edit mode

I moved the previous answer to a comment on top level post and answered it above.

ADD REPLY • link 6.6 years ago Michael Love 41k

0

Entering edit mode

Hello,

I wanted to see what the results would be comparing the design that you posted to the results outputted by the group design. They’re the same values, but the log2FoldChange and stat columns are of the opposite signs. I think it’s because the grouping design compares “Crafter vs Crbefore“ while the other design compares “Crbefore vs Crafter”. Is there a way to specify the design in “design = ~Treatment + Treatment:Status” to perform a comparison to match the grouping design?

Group Design:
dd_tb <- phyloseq_to_deseq2(paired_joined_ps, design = ~ Status + Treatment)
dd_tb$group <- factor(paste0(dd_tb$Treatment, dd_tb$Status))
design(dd_tb) <- ~ group

> results(dd_tb, contrast = c("group","Crafter", "Crbefore"))
log2 fold change (MLE): group Crafter vs Crbefore 
Wald test p-value: group Crafter vs Crbefore 
DataFrame with 1141 rows and 6 columns
                                baseMean log2FoldChange     lfcSE       stat          
                               <numeric>      <numeric> <numeric>  <numeric>   
4372795                        0.4172578      0.6003105 5.3786651  0.1116096   


Design: 
dds = phyloseq_to_deseq2(paired_joined_ps, design = ~Treatment + Treatment:Status)

> results(dds, name="TreatmentCr.Statusbefore")
log2 fold change (MLE): TreatmentCr.Statusbefore 
Wald test p-value: TreatmentCr.Statusbefore 
DataFrame with 1141 rows and 6 columns
                                baseMean log2FoldChange     lfcSE       stat     
                               <numeric>      <numeric> <numeric>  <numeric>  
4372795                        0.4172578     -0.6002659 5.3785485 -0.1116037

ADD REPLY • link 6.5 years ago nhua • 0

0

Entering edit mode

Yes, you should level the factors so that the reference level is as you prefer. See the vignette "Note on factor levels"

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#note-on-factor-levels

ADD REPLY • link 6.5 years ago Michael Love 41k

0

Entering edit mode

Good afternoon Dr. Love,

I tried my hand at using Limma, then metagenomeSeq since the fitZig function also allows for duplicate correlation. However, I had a few issues with using that package, so I decided to only include samples found in all of the treatments. However, when I pruned the samples from the phyloseq object, and ran

phyloseq_to_deseq2(inclusive_joined_ps, design = ~treatment + treatment:status)

I received this error:

Error in DESeqDataSet(se, design = design, ignoreRank) : 
  design contains one or more variables with all samples having the same value,
  remove these variables from the design

ADD REPLY • link 6.4 years ago nhua • 0

0

Entering edit mode

So... this naturally leads to the question: does the sentence make sense with respect to the variables you are working with?

ADD REPLY • link 6.4 years ago Michael Love 41k