Question

DESeq2 paired multifactor test

2

Entering edit mode

hwillenbrock ▴ 20

@hwillenbrock-6937

Last seen 11.2 years ago

Denmark

Hi,

I am wondering what the correct model formula and analysis approach is with DESeq2 for my multifactor paired design. I have tried to look at previous answers, e.g. DESEq2 comparison with two conditions and paired experiments and DESeq2 multiple factors nested design but I keep getting warnings from DESeq2, so I must be interpreting them wrong, or my design is slightly different.

What I have is 2 different tissue cell cultures, 5 replicates, that have been treated with a chemical. I have RNAseq count data for both the untreated and treated samples, so each treated vs. untreated is a pair:

sample tissue culture treatment
1 tissueX 1 unstimulated
2 tissueX 1 stimulated
3 tissueX 2 unstimulated
4 tissueX 2 stimulated
5 tissueX 3 unstimulated
6 tissueX 3 stimulated
7 tissueX 4 unstimulated
8 tissueX 4 stimulated
9 tissueX 5 unstimulated
10 tissueX 5 stimulated
11 tissueY 6 unstimulated
12 tissueY 6 stimulated
13 tissueY 7 unstimulated
14 tissueY 7 stimulated
15 tissueY 8 unstimulated
16 tissueY 8 stimulated
17 tissueY 9 unstimulated
18 tissueY 9 stimulated
19 tissueY 10 unstimulated
20 tissueY 10 stimulated

The genes that I'm interested in identifying, are those where the stimulation effect differs most between the two tissues. If not taking the paired design into account, I can use the model: ~tissue+treatment+tissue:treatment that is accepted without errors by DESeq2. However, I don't get very significant interactions and I'd like to take advantage of the paired design.

If I just analyze tissueX unstimulated versus stimulated, I can add the "culture" factor to the model and it improves the pvalues considerably: ~subject+treatment

Simply adding the subject term to the model formula as for the simple one tissue paired case, like this: ~subject + tissue+treatment+tissue:treatment gives me an error.

Trying to interpolate a previous answer to my case: ~ subject + treatment + tissue:treatment gives me the same error: "the model matrix is not full rank, so the model cannot be fit as specified."

How do I incorporate the paired design into the model formula and extract the correct significance values?

Thanks in advance,

Hanni

> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-suse-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base

other attached packages:
[1] ggplot2_1.0.0 DESeq2_1.6.1
[3] RcppArmadillo_0.4.450.1.0 Rcpp_0.11.3
[5] GenomicRanges_1.18.1 GenomeInfoDb_1.2.2
[7] IRanges_2.0.0 S4Vectors_0.4.0
[9] Biobase_2.26.0 BiocGenerics_0.12.0

loaded via a namespace (and not attached):
[1] acepack_1.3-3.3 annotate_1.44.0 AnnotationDbi_1.28.0
[4] base64enc_0.1-2 BatchJobs_1.4 BBmisc_1.7
[7] BiocParallel_1.0.0 brew_1.0-6 checkmate_1.5.0
[10] cluster_1.15.3 codetools_0.2-9 colorspace_1.2-4
[13] DBI_0.3.1 digest_0.6.4 fail_1.2
[16] foreach_1.4.2 foreign_0.8-61 Formula_1.1-2
[19] genefilter_1.48.1 geneplotter_1.44.0 grid_3.1.0
[22] gtable_0.1.2 Hmisc_3.14-5 iterators_1.0.7
[25] labeling_0.3 lattice_0.20-29 latticeExtra_0.6-26
[28] locfit_1.5-9.1 MASS_7.3-31 munsell_0.4.2
[31] nnet_7.3-8 plyr_1.8.1 proto_0.3-10
[34] RColorBrewer_1.0-5 reshape2_1.4 rpart_4.1-8
[37] RSQLite_1.0.0 scales_0.2.4 sendmailR_1.2-1
[40] splines_3.1.0 stringr_0.6.2 survival_2.37-7
[43] tools_3.1.0 XML_3.98-1.1 xtable_1.7-4
[46] XVector_0.6.0

deseq2 paired • 17k views

ADD COMMENT • link updated 3.2 years ago by teabonng • 0 • written 11.3 years ago by hwillenbrock ▴ 20

score 5 · Answer 1 · 2014-10-28

hi Hanni,

There is a trick from the edgeR user guide (under the section "Comparisons Both Between and Within Subjects") for building a R formula for such an experiment. For the above column data, add a new column for culture:

dds$culture.nested = factor(rep(rep(1:5,each=2),2))

Make sure that unstimulated is the base level of treatment:

dds$treatment <- relevel(dds$treatment, "unstimulated")

so now the column data looks like:

     tissue culture    treatment culture.nested
 1  tissueX       1 unstimulated              1
 2  tissueX       1   stimulated              1
 3  tissueX       2 unstimulated              2
 4  tissueX       2   stimulated              2
 5  tissueX       3 unstimulated              3
 6  tissueX       3   stimulated              3
 7  tissueX       4 unstimulated              4
 8  tissueX       4   stimulated              4
 9  tissueX       5 unstimulated              5
 10 tissueX       5   stimulated              5
 11 tissueY       6 unstimulated              1
 12 tissueY       6   stimulated              1
 13 tissueY       7 unstimulated              2
 14 tissueY       7   stimulated              2
 15 tissueY       8 unstimulated              3
 16 tissueY       8   stimulated              3
 17 tissueY       9 unstimulated              4
 18 tissueY       9   stimulated              4
 19 tissueY      10 unstimulated              5
 20 tissueY      10   stimulated              5

Then use the design:

design(dds) = ~ tissue + tissue:culture.nested + tissue:treatment

To make the extraction of results simpler, specify a standard model matrix:

dds = DESeq(dds, modelMatrixType="standard")

To test for genes which respond to treatment effect in tissue X:

results(dds, name="tissuetissueX.treatmentstimulated")

To test for genes which respond to treatment effect in tissue Y:

results(dds, name="tissuetissueY.treatmentstimulated")

To test for genes where the treatment effect differs between the two tissues:

results(dds, contrast=list("tissuetissueY.treatmentstimulated", "tissuetissueX.treatmentstimulated"))

score 0 · Answer 2 · 2014-10-29

0

Entering edit mode

hwillenbrock ▴ 20

@hwillenbrock-6937

Last seen 11.2 years ago

Denmark

Hi Michael,

Thanks!

That works, but won't this make the model assume that culture 1 and 6 are the same, as if the tissue samples where taken from the same person (which they are not)?

ADD COMMENT • link 11.3 years ago hwillenbrock ▴ 20

0

Entering edit mode

It's fine because this difference is absorbed in the main tissue effect. For the comparisons you want to make, the interactions are all you need.

ADD REPLY • link 11.3 years ago Michael Love 43k

0

Entering edit mode

Hi,

I believe I have an analogous issue, however I am still not sure what the best way to proceed is and would be grateful for your help.

In my multi-factor design variables are:
treatment (before/after) / patient / disease (A/B) / response (Y/N)

For each patient I have one sample taken before treatment and one after (therefore with the same value for response and for disease).

1) The first question is to compare expression after/before treatment
I have been told by collaborators that the two diseases are expected to behave very differently and it does not make much sense to look at effects considering both together.
Thus, to keep things simple I created two different datasets and used design=~patient+ treatment in each of the them in order to extract the global effect of treatment across all patients, at the same time controlling for patient specific effects.
However, doing so I am ignoring the 'response' variable.

2) The second question is comparing responders (response=Y) vs non-responders, separately before and after treatment
One Naive solution would be to concentrate on only a subportion of the data (e.g. before treatment) and use design=~response
Instead, using all data with a design in which both patient and response variables are present gives problems with matrix rank.

Could you advise on the best design to answer both 1) and 2) ? And further clarify how the 'trick' to avoid problems with matrix rank should be implemented?

Thanks a lot!

ADD REPLY • link 10.5 years ago deut13 ▴ 80

0

Entering edit mode

1) If you want to compare expression before and after treatment, your current approach seems fine.

2) You can use the trick above to see if responders have a different treatment effect, controlling for patient differences. I would continue to split by disease to make things easier. Then follow the steps above, but for your analysis, "culture" is "patient" and "tissue" is "response".

Your design should look like ~ response + response:patient.nested + response:treatment. If you have different number of patients for response = Y and N then you'll have to do some extra work to create your own model matrix using model.matrix() and then to remove columns of the model matrix which have all zeros. See this other thread for how to do that:

C: DESeq2: Extracting interaction term

Run DESeq(dds, betaPrior=FALSE) or if you needed to create your own model matrix because the numbers of responders and non responders were different, run DESeq(dds, full=mm, betaPrior=FALSE). You will have the treatment effect in nonresponders and the treatment effect responders as the two interaction terms of treatment and response. These can be extracted with 'name' like above and contrasted with contrast=list(...) like above. The contrast tests for which genes the treatment effect was different for responders vs non-responders.

ADD REPLY • link 10.5 years ago Michael Love 43k

0

Entering edit mode

Hi Michael,

Thanks so much for your answer!
I would be extremely grateful if you could verify whether I implemented your suggestion correctly and clarify a few doubts regarding the extraction of results

This is how my projectinfo table looks like

sample treatment patient response patient.nested
       2   control       1        Y             1
       7   treated       1        Y             1
       6   control       2        Y             2
      10   treated      2     Y             2
       5   control       3        Y             3
      11   treated      3      Y             3
       1   control       4        N             1
       8   treated       4        N             1
       4   control       5        N             2
       9   treated       5        N             2
       3   control       6        N             3
      12   treated      6    N           3

Differential expression analysis:

> data <-DESeqDataSetFromMatrix(countData = countsTable,
+                                 colData = projectinfo,
+                                 design = ~response+response:patient.nested+response:treatment)
> diffexpr<-DESeq(data, test="Wald", betaPrior=FALSE)
> resultsNames(diffexpr)
[1] "Intercept"                  "response_Y_vs_N"            "responseN.patient.nested2"
[4] "responseY.patient.nested2" "responseN.patient.nested3" "responseY.patient.nested3"
[7] "responseN.treatmenttreated" "responseY.treatmenttreated"

The lists I want to extract

1a) genes with differential expression before/after treatment in responders
i.e. resp=Y&treatment=treated vs resp=Y&treatment=control
Is it correct that this is given by:
results(diffexpr, name="responseY.treatmenttreated") ?

1b) and analogously for non-responders (resp=N&treatment=treated vs resp=N&treatment=control) would be

results(diffexpr, name="responseN.treatmenttreated")

2) genes with differential expression responders/non-responders separately for treated and controls, i.e.

2a) resp=Y&treatment=treated vs resp=N&treatment=treated
and
2b) resp=Y&treatment=control vs resp=N&treatment=control

It is not clear to me whether I can get these out of

results(diffexpr, contrast=list("responseY.treatmenttreated", "responseN.treatmenttreated"))

I realize that these are questions related to linear modeling and not specific to your tool, so I need to improve my knowledge on the topic and hope to find a nearby statistician who can help.

Thanks a lot again for your help!

Fulvia

ADD REPLY • link 10.5 years ago • updated 10.0 years ago deut13 ▴ 80

0

Entering edit mode

1a) and 1b) yes

2a) and 2b) you cannot compare directly across response like this with a model which has fixed effects for controlling patient differences. How can you know if what you see if a change for response=Y and not specific to patient 1,2,3 in response? This you might want to discuss with a statistician if it's hard to see why you can't make this comparison.

The comparison I was referring to, where you contrast the two interaction terms, answers the question: is the treatment effect different for response=Y vs response=N.

ADD REPLY • link 10.5 years ago Michael Love 43k

0

Entering edit mode

Thanks for all the feedback! I have in the meantime been thinking about this and the importance of controlling for patient-to-patient differences.

ADD REPLY • link 10.5 years ago deut13 ▴ 80

score 0 · Answer 3 · 2014-11-20

0

Entering edit mode

alicebdennis • 0

@alicebdennis-7066

Last seen 9.7 years ago

Switzerland

Hi all,

I have a similar problem/nested design, but I think my problem is how to read in colData in the first place!

Is it possible to post your successful code in full here?

Thank you,

Alice

ADD COMMENT • link 11.2 years ago alicebdennis • 0

0

Entering edit mode

hi Alice,

You can post new questions with the "Ask question" button at the top. Your question here is posted as an answer to Hanni's question.

See this section of our DESeq2 workflow (the workflow goes at a slower pace than the vignette, explaining some base R functions as well):

"Typically, we have a table with experimental metadata for our samples. For your own project, you might create such a comma-separated value (CSV) file using a text editor or spreadsheet software such as Excel.

We load this file with read.csv. The parentheses around the last line are used to print the result in addition to storing it to the sampleTable object."

http://bioconductor.org/help/workflows/rnaseqGene/

If you didn't use summarizeOverlaps or htseq-count (i.e. you've already constructed a count matrix) you'll have to make sure that the column names of the counts matrix line up with the row names of the imported sample table.

ADD REPLY • link 11.2 years ago Michael Love 43k

score 0 · Answer 4 · 2014-11-24

Hi Michael,

One of my donor samples might be an outlier and I tried removing the samples from the analysis. This means that I now have the following samples:

    tissue culture    treatment culture.nested
 1  tissueX       1 unstimulated              1
 2  tissueX       1   stimulated              1
 3  tissueX       2 unstimulated              2
 4  tissueX       2   stimulated              2
 5  tissueX       3 unstimulated              3
 6  tissueX       3   stimulated              3
 7  tissueX       4 unstimulated              4
 8  tissueX       4   stimulated              4
 9  tissueX       5 unstimulated              5
 10 tissueX       5   stimulated              5
 11 tissueY       6 unstimulated              1
 12 tissueY       6   stimulated              1
 13 tissueY       7 unstimulated              2
 14 tissueY       7   stimulated              2
 15 tissueY       8 unstimulated              3
 16 tissueY       8   stimulated              3
 19 tissueY      10 unstimulated              5
 20 tissueY      10   stimulated              5

Now, when I try to do this:

design(dds) = ~ tissue + tissue:culture.nested + tissue:treatment

I get the error: "the model matrix is not full rank, so the model cannot be fit as specified."

Is there any way to do the analysis, still exploiting the paired design without having to remove the one of the tissueX cultures as well?

Thanks,

Hanni