pbachali

Hi, I have a question about the experimental design for a differential gene expression analysis. I have an RNA-seq time-series experiment of 468 samples at three time points (T0, week12, week24) from a single cohort (diseased). I don't have two conditions here, only three time points of the diseased condition. I am using DESeq2 and ran it as follows. I am wondering whether I am going in the right direction.

```
dds <- DESeq(dds, test = "LRT", reduced = ~ 1)  # Option 1: likelihood ratio test. Is this correct?
dds <- DESeq(dds)                               # Option 2: default Wald test
```

I am not getting errors either way. But I am not able to figure out which is more appropriate to my scenario. Any help is much appreciated.

Best

Back up a little bit, how did you create your original dds variable--the DESeq object? What was the design you had for that? You would have used some function like DESeqDataSetFromMatrix() or something similar to it.

With the code you've posted, the only difference is whether or not you're doing a LRT or Wald significance test.

Per https://www.bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#differential-expression-analysis : "The LRT is therefore useful for testing multiple terms at once, for example testing 3 or more levels of a factor at once, or all interactions between two variables. The LRT for count data is conceptually similar to an analysis of variance (ANOVA) calculation in linear regression, except that in the case of the Negative Binomial GLM, we use an analysis of deviance (ANODEV), where the deviance captures the difference in likelihood between a full and a reduced model."

So the LRT is useful if you want to test if some variable with multiple levels is significant as a whole for the differential expression. Whether that's appropriate to your scenario, I can't say--especially because I don't know what your design matrix includes.

Thanks for explaining the difference between the LRT and the Wald test. Yes, I created dds using the DESeqDataSetFromMatrix() function, and the code is as follows: `dds <- DESeqDataSetFromMatrix(countData = counts.data, colData = metastuff_final, design = ~ timepoint)`. And yes, I have multiple timepoints (a factor with more than two levels) but only one condition (only diseased samples, no controls). Since the factor has more than two levels, I wanted to use the LRT. I have lupus samples treated with a drug for 12 weeks and 24 weeks, plus baseline samples. I would like to test differential gene expression of the week12 samples compared to time0, and of week24 compared to time0. Is it appropriate to use the LRT with the reduced model ~1 and the design above?

Sorry for the late reply--got side tracked.

I'm basing this answer off of what I know about general linear model regression testing which should still be applicable--I think. I'm going to explain some regression stuff below for the sake of clarity--forgive me if you already understand this.

When you do full vs reduced model testing, the hypothesis you are testing is: do the terms that are in the full model but not in the reduced model improve the model's fit?

Example:

Full model: Y ~ A + B + C + D

Reduced model: Y ~ A + B

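Null hypothesis: Adding variables C and D does not improve the model (their coefficients are all zero). A significant result rejects this and indicates that at least one of C or D matters.

(Y is the output or dependent variable. The above reads Y regressed on A + B + C + D, where A, B, C, D are independent variables)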

Note: a reduced model where Y ~ 1 (like in your LRT example), is just a regression model where only the intercept is included with no covariates.

In general with linear models, when a categorical variable has more than 2 levels, you first want to test whether including that variable as a whole significantly improves the model. Here that means doing an LRT with the reduced model ~1.

Full model: Y~timepoint

Reduced model: Y~1

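Null hypothesis: Adding timepoint (with all of its levels) does not improve the model compared to one based on the intercept alone, i.e. expression is the same at every timepoint. A small p-value means expression changes at one or more timepoints.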
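
To make the full vs reduced comparison concrete, here is a minimal base-R sketch for a single simulated gene. It is only an analogue: a Poisson GLM stands in for the negative binomial model that DESeq2 fits, and the data, means, and sample sizes are made up for illustration.

```r
# Simulated counts for one hypothetical gene across three timepoints
set.seed(42)
timepoint <- factor(rep(c("T0", "week12", "week24"), each = 20),
                    levels = c("T0", "week12", "week24"))

# The true means differ by timepoint, so the null hypothesis is false here
mu <- c(T0 = 10, week12 = 20, week24 = 40)[as.character(timepoint)]
y  <- rpois(length(timepoint), lambda = mu)

# Full model includes timepoint; reduced model is intercept only
full    <- glm(y ~ timepoint, family = poisson)
reduced <- glm(y ~ 1,         family = poisson)

# Likelihood ratio test: compares the deviance of the two nested models
lrt <- anova(reduced, full, test = "LRT")
print(lrt)

# One p-value for timepoint as a whole, on 2 degrees of freedom (3 levels - 1)
p_value <- lrt[2, "Pr(>Chi)"]
```

Note that the test returns a single p-value for the whole timepoint term; it does not say which timepoints differ from which.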

If you find significant differential expression, then the question becomes: between which levels? For this, you'll move to the Wald test. Here the question becomes: is there significant differential expression at 12 weeks compared to 0 and/or at 24 weeks compared to 0?

Please note for the proper comparisons to be done, your factored timepoint variable should have 0 weeks set as the reference level. You can doublecheck this by looking at "metastuff_final$timepoint".

The reference level will always be the first one reported. Example:

data$factoredvariable

[1] A B C C B B A

Levels: A B C

You can change the reference level with relevel(), or set the full level order explicitly with factor(..., levels = ...). By default R orders factor levels alphabetically, so the alphabetically first level becomes the reference group.
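
A short base-R sketch of checking and changing the reference level. The first vector reuses the A/B/C example above; the timepoint labels are assumed from the original question (T0, week12, week24), and the variable names are hypothetical.

```r
# The A/B/C example: levels default to alphabetical order
x <- factor(c("A", "B", "C", "C", "B", "B", "A"))
levels(x)            # first level listed is the reference

# Make "C" the reference level instead
x_c <- relevel(x, ref = "C")
levels(x_c)

# For the timepoint variable, stating the order explicitly avoids
# relying on alphabetical sorting of the labels
tp <- c("week12", "T0", "week24", "T0", "week12", "week24")
timepoint <- factor(tp, levels = c("T0", "week12", "week24"))
levels(timepoint)[1] # reference level used for the comparisons
```

With these particular labels the alphabetical default already puts T0 first, but setting the levels explicitly keeps the comparisons correct if the labels ever change.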

The Wald test should then tell you whether your genes were significantly differentially expressed at 12 and at 24 weeks compared to 0 weeks.
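
Putting the two steps together, a sketch of how this might look in DESeq2. The dataset here is simulated with makeExampleDESeqDataSet() purely so the code runs on its own; the gene and sample counts are arbitrary, and the timepoint labels are assumed from the original question. With real data you would use your own dds built from `design = ~ timepoint`.

```r
library(DESeq2)

# Simulated stand-in for the real data (hypothetical): 200 genes, 12 samples
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 200, m = 12)
dds$timepoint <- factor(rep(c("T0", "week12", "week24"), each = 4),
                        levels = c("T0", "week12", "week24"))
design(dds) <- ~ timepoint

# Step 1: LRT, asks whether timepoint as a whole affects expression
dds_lrt <- DESeq(dds, test = "LRT", reduced = ~ 1)
res_lrt <- results(dds_lrt)

# Step 2: Wald tests, ask which timepoints differ from baseline
dds_wald <- DESeq(dds)
res_w12  <- results(dds_wald, contrast = c("timepoint", "week12", "T0"))
res_w24  <- results(dds_wald, contrast = c("timepoint", "week24", "T0"))

summary(res_w12)
summary(res_w24)
```

In the LRT results, the p-values refer to the timepoint term as a whole, while the reported log2 fold change is for a single comparison, so the fold change column there should not be read as the thing being tested.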

Hope that helps.