Repeated Measures mRNA expression analysis I
@charles-determan-jr-5949
United States

Greetings,

I need to analyze data collected from an RNA-seq experiment. The design compares two groups (control vs. treatment) with repeated sampling (1 hour, 2 hours, 3 hours). If this were a univariate problem I would use a two-way repeated measures ANOVA, but this is RNA-seq and I have thousands of variables. I am very familiar with several packages for RNA differential expression analysis (e.g. DESeq2, edgeR, limma), but I have been unable to work out the most appropriate way to analyze such data in this situation. The closest answer I can find in the DESeq2 and edgeR manuals (limma is somewhat confusing to me) is to place the main treatment of interest at the end of the design formula, for example:

design(dds) <- formula(~ time + treatment)

Is this what is considered the appropriate way to address repeated measures in mRNA expression experiments? Any thoughts are appreciated.

Regards,

--
Charles Determan
Integrated Biosciences PhD Candidate
University of Minnesota

limma edgeR DESeq2

Charles,

I am looking to do a similar analysis. What was the final method you ended up using to do your repeated measures analysis?

Cheers,
Nate


This question was continued and answered on a later thread, see: Repeated Measures mRNA expression analysis II

@mikelove
United States
Hi Charles,

On Jun 24, 2013, at 10:08 PM, Charles Determan Jr <deter088 at umn.edu> wrote:

> design(dds) <- formula(~ time + treatment)
>
> Is this what is considered the appropriate way to address repeated measures
> in mRNA expression experiments? Any thoughts are appreciated.

Yes, this is the correct design formula for DESeq2 for estimating and testing the effect of treatment over all time points. We use by default a Wald test, and the likelihood ratio test is also implemented (see vignette). This is then a similar approach to calling anova.glm() on a glm fit for a single gene.

Best,
Mike
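
The single-gene analogy can be sketched in base R. This is only an illustration under stated assumptions: the counts are simulated, the variable names are made up, and a Poisson family stands in for the negative binomial model that DESeq2 actually fits. Like the formula above, it treats all samples as independent.

```r
# Simulated counts for ONE hypothetical gene (all values are made up).
set.seed(1)
time      <- factor(rep(c("1h", "2h", "3h"), each = 6))
treatment <- factor(rep(rep(c("control", "treated"), each = 3), times = 3))
# Poisson counts as a stand-in; DESeq2 itself uses a negative binomial GLM.
counts <- rpois(length(time), lambda = ifelse(treatment == "treated", 80, 50))

# Full model with the treatment term last, and a reduced model without it.
full    <- glm(counts ~ time + treatment, family = poisson)
reduced <- glm(counts ~ time, family = poisson)

# Likelihood ratio (analysis of deviance) test for the treatment term.
lrt <- anova(reduced, full, test = "Chisq")
print(lrt)

# Wald test for the same coefficient, taken from the full fit.
wald <- coef(summary(full))["treatmenttreated", ]
print(wald)
```

Both tests ask whether the treatment coefficient is zero after adjusting for time; the likelihood ratio test compares the two fits, while the Wald test uses the estimate and its standard error from the full fit alone.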
@gordon-smyth
WEHI, Melbourne, Australia

Dear Charles,

The term "repeated measures" describes a situation in which repeated measurements are made on the same biological unit. Hence the repeated measurements are correlated. It is not clear from the brief information you give whether this is the case, or whether the different time points derive from independent biological samples. The model you give might or might not be correct, depending on the experimental units and the hypotheses that you plan to test. For most experiments it is not the right approach, for reasons that I have pointed out elsewhere:

https://www.stat.math.ethz.ch/pipermail/bioconductor/2013-June/053297.html

Best wishes
Gordon


Gordon,

I apologize for not being more definitive in my description. Your initial definition is what I intended: consecutive measurements on the same biological units. I will look over the comments in the link you provided. Thank you for your insight; I appreciate any further thoughts you may have.

Regards,
Charles


Charles,

Are there only 2 biological units in your experiment? (One for treatment and one for control?) Or do you have multiple biological units in each group? Surely it must be the latter but, if so, your model does not take this into account.

What questions do you want to test?

Best
Gordon


To help clarify further here is a dataframe of the design.

   subject  group times
1        1 Treated    0hr
2        2 Treated    0hr
3        3 Control    0hr
4        4 Treated    0hr
5        5 Control    0hr
6        6 Control    0hr
7        1 Treated    1hr
8        2 Treated    1hr
9        3 Control    1hr

...

17       5 Control    2hr
18       6 Control    2hr

My thought process has been as follows:

In the edgeR userguide there is the treatment combination example

> targets
    Sample   Treat Time
1  Sample1 Placebo   0h
2  Sample2 Placebo   0h
3  Sample3 Placebo   1h
4  Sample4 Placebo   1h
5  Sample5 Placebo   2h
6  Sample6 Placebo   2h
7  Sample1    Drug   0h
8  Sample2    Drug   0h
9  Sample3    Drug   1h
10 Sample4    Drug   1h
11 Sample5    Drug   2h
12 Sample6    Drug   2h

which combines the treatment and time factors into a single group factor (e.g. Drug.1, Placebo.1, Drug.2, etc.).
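
A minimal sketch of that combined-factor approach, using only base R. The `targets` data frame is rebuilt from the table quoted above; the names `Group` and `design` are chosen here for illustration and may not match the user guide exactly.

```r
# Rebuild the targets table quoted above.
targets <- data.frame(
  Sample = paste0("Sample", rep(1:6, times = 2)),
  Treat  = rep(c("Placebo", "Drug"), each = 6),
  Time   = rep(rep(c("0h", "1h", "2h"), each = 2), times = 2)
)

# One factor level per treatment-by-time combination.
Group <- factor(paste(targets$Treat, targets$Time, sep = "."))

# One column per group and no intercept, so each coefficient is a group mean.
design <- model.matrix(~ 0 + Group)
colnames(design) <- levels(Group)
print(design)
```

With this parameterization, comparisons such as drug versus placebo at a given time are expressed as contrasts between columns.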

This seems potentially appropriate, but it appears to assume independence between samples, whereas my data consist of what you could call 'true repeated measures' on the same subjects. My case therefore seems closer to the paired-samples and blocking examples, which include 'subject' as a factor as well, for example:

design <- model.matrix(~Subject+Treatment)

This leads me to guess that a combination of these techniques is required: perhaps merging the times and group factors in my dataset (see above) into a 'newgroup' factor (e.g. Control.0, Control.1, Treatment.0, etc.) and then creating the design matrix:

design <- model.matrix(~Subject+newgroup)

Does this seem appropriate, or am I way off base and overthinking this? Thanks for any suggestions.
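
One way to examine the proposed design before fitting anything is to build it and compare its rank with its number of columns. The sketch below uses base R only. It assumes the rows elided from the design table follow the same pattern as the rows shown (six subjects, each sampled at 0hr, 1hr and 2hr); the data frame name `samples` is hypothetical.

```r
# ASSUMED reconstruction of the 18-sample design: the rows elided in the post
# are taken to follow the same pattern as the rows that are shown.
group_of_subject <- c("Treated", "Treated", "Control",
                      "Treated", "Control", "Control")
samples <- data.frame(
  subject = factor(rep(1:6, times = 3)),
  group   = rep(group_of_subject, times = 3),
  times   = rep(c("0hr", "1hr", "2hr"), each = 6)
)

# Merge group and time into a single factor, as proposed.
samples$newgroup <- factor(paste(samples$group, samples$times, sep = "."))

# Design matrix with subject as a blocking factor plus the merged factor.
design <- model.matrix(~ subject + newgroup, data = samples)

# Compare the number of coefficients with the rank of the matrix.
n_coef      <- ncol(design)
design_rank <- qr(design)$rank
cat("coefficients:", n_coef, " rank:", design_rank, "\n")
```

Under this reconstruction the matrix has 11 columns but rank 10, because each subject belongs to only one group, so the subject columns already determine the group. One coefficient would therefore not be estimable with the formula exactly as written.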

Regards,
Charles

@gordon-smyth
WEHI, Melbourne, Australia

This question was continued and answered on a later thread, see: Repeated Measures mRNA expression analysis II
