Question

Confused by "composite" designs


kjo ▴ 70

@kjo-11078

Last seen 7.3 years ago

For concreteness, suppose that my metadata has the following form

    > # the following is a partial listing!
    > head(metadata, 12)
       drug dose replicate
    1  NONE  NaN         1
    2     A    1         1
    3     A   10         1
    4     A  100         1
    5     B    1         1
    6     B   10         1
    7     B  100         1
    8     C    1         1
    9     C   10         1
    10    C  100         1
    11 NONE  NaN         2
    12    A    1         2

In words, for each (biological) replicate, I have one "no-treatment" observation (drug = NONE, dose = NaN), and 9 "treatment" observations, where the latter result from applying 3 drugs at 3 different doses. (The metadata consists of n blocks of rows, one for each replicate number, and identical in every way, except for the value of the replicate column--which is constant for each block. IOW, if n is the number of replicates, table(metadata$drug, metadata$dose, useNA = "ifany") would produce something like this:

             1 10 100 NaN
        A    n  n   n   0
        B    n  n   n   0
        C    n  n   n   0
        NONE 0  0   0   n

The excerpt shown earlier includes the block for k=1, plus the first two rows of the block for k=2. Also, each value of the replicate column corresponds to a different batch of cells; the cells in each batch are grown together, and then divided into wells, and treated with one of the 10 possible combinations of the drug and dose factors.)
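For anyone who wants to reproduce this layout, the following is a minimal base R sketch that rebuilds a table of the same shape. The value `n = 3` and the construction via `expand.grid` are assumptions made purely for illustration; the actual metadata is only partially shown above.

```r
# Assumption: n = 3 replicates; the post leaves n unspecified.
n <- 3

# Nine treated conditions: 3 drugs x 3 doses (dose varies fastest).
treated <- expand.grid(dose = c(1, 10, 100),
                       drug = c("A", "B", "C"),
                       stringsAsFactors = FALSE)

# One block = the untreated row followed by the nine treated rows.
one_block <- rbind(data.frame(drug = "NONE", dose = NaN,
                              stringsAsFactors = FALSE),
                   treated[, c("drug", "dose")])

# Stack n identical blocks that differ only in the replicate column.
metadata <- do.call(rbind, lapply(seq_len(n), function(k) {
  cbind(one_block, replicate = k)
}))
rownames(metadata) <- NULL

head(metadata, 12)

# Cross-tabulate drug against dose, keeping the untreated (NaN) column.
tab <- table(metadata$drug, metadata$dose, useNA = "ifany")
tab
```

Each treated cell of `tab` should hold `n`, and the treated doses should be empty for `NONE`, matching the table sketched above.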

In its simplest form, the goal of such an experiment is to compare the effect of the various treatments against the untreated condition.

One way to do this would be to add a condition pseudo-factor to the metadata, to encode each combination of drug and dose:

    > # the following is a partial listing!
    > head(augmented_metadata, 12)
       drug dose replicate condition
    1  NONE  NaN         1         0
    2     A    1         1         1
    3     A   10         1         2
    4     A  100         1         3
    5     B    1         1         4
    6     B   10         1         5
    7     B  100         1         6
    8     C    1         1         7
    9     C   10         1         8
    10    C  100         1         9
    11 NONE  NaN         2         0
    12    A    1         2         1

Then I would begin my DESeq2 analysis with

    dds <- DESeqDataSetFromMatrix(countData = counts,
                                  colData = augmented_metadata,
                                  design = ~ condition)

...etc., and would get my results with expressions like

    results(dds, contrast = list("condition1", "condition0"))
    ...

...and so on.

On the upside, this approach is both conceptually straightforward and extensible to more factors (e.g. the case in which, in addition to drug and dose, one also had the time between dosing and measuring; in this case the condition column would encode each combination of drug, dose, and time; etc.)

On the downside, the relationship among the conditions for each drug gets lost; IOW, all the treatment conditions are treated as unrelated to each other.

I know that, instead of creating a condition pseudo-factor, I can specify a "composite" design, like this

    dds <- DESeqDataSetFromMatrix(countData = counts,
                                  colData = metadata,
                                  design = ~ dose + drug)

...but I'm not sure how to specify meaningful contrasts given this design.

More specifically, assuming that resultsNames(dds) returns strings like this

    doseNaN     drugNONE
    dose1       drugA
    dose10      drugB
    dose100     drugC

...are the results I want those given by the following 3 expressions?

    results(dds, contrast = list("drugA", "drugNONE"))
    results(dds, contrast = list("drugB", "drugNONE"))
    results(dds, contrast = list("drugC", "drugNONE"))

If so, how is the dose information taken into account?

deseq2 • 1.1k views

updated 7.6 years ago by Michael Love 41k • written 7.6 years ago by kjo ▴ 70


Can you say more about the replicates? Do you have multiple replicates for all combinations of drug and dose?

    table(dds$drug, dds$dose)

Are the samples with replicate=1 related in any way?

7.6 years ago Michael Love 41k


For each k, the observations with replicate=k are related in that they refer to cells coming from the same batch. IOW, each replicate represents a batch of cells that were grown together, then split into wells, and treated with a particular combination of drug and dose.

For this question, it is OK to assume that the metadata consists of n blocks of rows, one for each replicate number, and identical in every way, except for the value of the replicate column (which is constant for each block). (In my original description, I show the block for k=1, plus the first two rows of the block for k=2.)

(Sorry, I should have made all these points clearer in my original post. I will fix it.)

7.6 years ago kjo ▴ 70


score 1 · Accepted Answer · 2016-09-14


Michael Love 41k

@mikelove

Last seen 19 hours ago

United States

Sorry, I've just been busy in the past 2 days. You say:

"the goal of such an experiment is to compare the effect of the various treatments against the untreated condition"

"the relationship among the conditions for each drug gets lost"

I guess I'm still a bit confused: how do you want to take dose into account when comparing against untreated? There are many different hypotheses to test here, but the simplest is to compare each drug x dosage combination against untreated. If you want to test other null hypotheses, I'd need you to describe them first.

BTW, for the simple analysis, you can include replicate as well in the design, e.g.:

    ~ replicate + drugxdose

Where drugxdose is a new variable combining drug and dose.
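To see what this design estimates, it can help to look at the model matrix in base R before running anything in DESeq2. The sketch below assumes 3 replicates and assumes the untreated level has been recoded as "N" in both columns (so the combined level is "NN"); the data frame `coldata` is hypothetical. It also checks an additive `~ replicate + dose + drug` design for comparison: in this layout the untreated samples are exactly the ones with no drug and no dose, so that design matrix has one redundant column, and DESeq2 reports an error for model matrices that are not full rank.

```r
# Assumptions: 3 replicates; untreated recoded as "N" for both drug and dose.
drug <- c("N", rep(c("A", "B", "C"), each = 3))
dose <- c("N", rep(c("1", "10", "100"), times = 3))

coldata <- data.frame(
  replicate = factor(rep(1:3, each = 10)),
  drug      = factor(rep(drug, 3), levels = c("N", "A", "B", "C")),
  dose      = factor(rep(dose, 3), levels = c("N", "1", "10", "100"))
)

# Combined factor with the untreated level "NN" as reference.
coldata$drugxdose <- relevel(factor(paste0(coldata$drug, coldata$dose)),
                             ref = "NN")

# Combined design: one coefficient per treated condition, all estimable.
mm_combined <- model.matrix(~ replicate + drugxdose, data = coldata)
c(columns = ncol(mm_combined), rank = qr(mm_combined)$rank)

# Additive design: "any dose" and "any drug" flag the same samples,
# so the rank is one less than the number of columns.
mm_additive <- model.matrix(~ replicate + dose + drug, data = coldata)
c(columns = ncol(mm_additive), rank = qr(mm_additive)$rank)
```

With the combined factor, every treated condition gets its own coefficient relative to untreated, which is the comparison described in the question.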

7.6 years ago Michael Love 41k


Thank you for your answer.

Following up on your last suggestion, first I want to clarify that the factor called condition in my original post was meant to be exactly equivalent to the new one you propose, drugxdose. IOW, two rows in the augmented_metadata table have the same value in the condition column if and only if they have the same values in the drug and dose columns.

Now suppose that the design were ~ replicate + condition (which, as I just clarified, is essentially what you suggest at the end of your answer), and that resultsNames(dds) returned strings like

    replicate1    condition0
    replicate2    condition1
    replicate3    condition2
    .             condition3
    .             .
    .             .
                  .

(NB: the lengths of the two columns above are, in general, different.)

What would be the contrasts one would use in this case?

7.6 years ago kjo ▴ 70


If you want to compare drug B at 10 units to untreated while controlling for differences across replicates:

    results(dds, contrast=c("drugxdose","B10","NN"))

...let me suggest you put "N" in the drug and dose columns instead of NONE and NaN. And things will be much easier for you if you use meaningful, descriptive levels (e.g. "B10") rather than numbers. You can construct a column 'drugxdose' like so:

    dds$drugxdose <- factor(paste0(dds$drug, as.character(dds$dose)))
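As a rough illustration of the recoding suggested here, the sketch below starts from the original NONE / NaN coding and builds the combined factor on a plain data frame. The data frame is a hypothetical single-replicate excerpt, and the `relevel` step is an optional addition rather than part of the suggestion above.

```r
# Hypothetical single-replicate excerpt with the original NONE / NaN coding.
metadata <- data.frame(
  drug = c("NONE", "A", "A", "A", "B", "B", "B", "C", "C", "C"),
  dose = c(NaN, 1, 10, 100, 1, 10, 100, 1, 10, 100),
  stringsAsFactors = FALSE
)

# Recode the untreated rows as "N" in both columns.
drug <- ifelse(metadata$drug == "NONE", "N", metadata$drug)
dose <- ifelse(is.nan(metadata$dose), "N", as.character(metadata$dose))

# Descriptive combined levels such as "B10"; untreated becomes "NN".
# Optional: make "NN" the reference level.
metadata$drugxdose <- relevel(factor(paste0(drug, dose)), ref = "NN")
levels(metadata$drugxdose)
```

The contrast `c("drugxdose","B10","NN")` names both levels explicitly, so it does not rely on which level is the reference; setting "NN" as the reference mainly affects which default comparisons appear in `resultsNames(dds)`.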

7.6 years ago Michael Love 41k


Also, make sure that class(dds$replicate) is factor and not numeric.
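The reason this matters can be seen from the design matrix alone. In the minimal base R sketch below (3 replicates of 10 samples each is an assumption), a numeric replicate column contributes a single slope term, i.e. a linear trend across replicate numbers, whereas a factor contributes one indicator per non-reference batch.

```r
# Assumption: 3 replicates with 10 samples each.
replicate_numeric <- rep(1:3, each = 10)
replicate_factor  <- factor(replicate_numeric)

# Numeric: intercept plus one slope column.
mm_numeric <- model.matrix(~ replicate_numeric)
colnames(mm_numeric)

# Factor: intercept plus one indicator column for each of batches 2 and 3.
mm_factor <- model.matrix(~ replicate_factor)
colnames(mm_factor)
```

On a real object the conversion would be something like `dds$replicate <- factor(dds$replicate)`, done before running the analysis.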

7.6 years ago Michael Love 41k