Differential Expression Analysis in edgeR using Anova
@ilovesuperheroes1993-17038

Hi, I have 5 samples:

(1) Not transfected, untreated
(2) Transfected but untreated
(3) Transfected, treated, analyzed after 5 mins
(4) Transfected, treated, analyzed after 60 mins
(5) Transfected, treated, analyzed after 4 hrs

[By transfected I mean a particular vector is present, and treated means treated with an antibody]

I do not have any replicates for any of the conditions. I am looking to perform an ANOVA-like test using edgeR, to see how gene expression changes at the different time points relative to sample 2 (as listed above).

Could anyone tell me the edgeR code I should run to do the test? Normally, I define the groups, normalize the library sizes, create a design matrix from the groups, estimate the dispersion on the normalized DGEList, and then perform the GLM quasi-likelihood (QL) tests. I am confused as to how to proceed in this case, as I have no replicates. I don't know how to define the groups or estimate the dispersion.

I would be very grateful if someone could help me with the edgeR code. Thank you.

Aaron Lun ★ 28k
@alun

I suppose you've realized how much of a pain it is to not have replicates, so I won't harp on that. Suffice to say that you should have some strong words with whoever designed the experiment.

The obvious answer to your question is, as @timedreamer suggested, to read the relevant section of the edgeR user's guide. In this case, the most promising approach may be to manufacture some residual degrees of freedom by assuming that expression is a smooth function of time (i.e., one that can be modelled by a spline with few degrees of freedom). Specifically:

time <- c(0, 0, 5, 60, 240) # treat sample 2 as time '0'
transfected <- c("N", "Y", "Y", "Y", "Y")
spl <- splines::ns(time, df=2)
design <- model.matrix(~transfected + spl)

Lo and behold, this gives us a design matrix with 5 rows and 4 columns, i.e., one residual degree of freedom to estimate the dispersion. You can then proceed with a quasi-likelihood edgeR analysis to either identify the transfection effect (coef=2) or the time effect (coef=3:4). Note that the latter refers to any effect of time; it is not possible to compare specific time points with the above model.
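
For concreteness, the quasi-likelihood analysis described above might look something like the sketch below. This is only an illustration under stated assumptions: the count matrix is simulated (the object `counts` and its dimensions are made up here) and should be replaced with the real data, and a filtering step would normally come before normalization.

```r
library(edgeR)

# Design from the answer above: sample 2 is treated as time 0
time <- c(0, 0, 5, 60, 240)
transfected <- c("N", "Y", "Y", "Y", "Y")
spl <- splines::ns(time, df = 2)
design <- model.matrix(~transfected + spl)

# 5 samples minus 4 estimable coefficients leaves 1 residual degree of freedom
resid_df <- nrow(design) - qr(design)$rank

# ASSUMPTION: simulated counts stand in for the real genes-by-samples matrix
set.seed(1)
counts <- matrix(rnbinom(2000 * 5, mu = 100, size = 10), ncol = 5,
                 dimnames = list(paste0("gene", 1:2000), paste0("sample", 1:5)))

y <- DGEList(counts)
y <- calcNormFactors(y)        # normalization factors for library sizes
y <- estimateDisp(y, design)   # dispersions estimated from the single residual d.f.
fit <- glmQLFit(y, design)

# Transfection effect (second column of the design)
res_transfection <- glmQLFTest(fit, coef = 2)

# Any effect of time (both spline columns tested together)
res_time <- glmQLFTest(fit, coef = 3:4)
topTags(res_time)
```

With only one residual degree of freedom the dispersion estimates are very imprecise, so the resulting rankings should be read with corresponding caution.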

If you want to compare specific time points, then you have no choice but to follow Option 3 in Section 2.11 to the letter, i.e., take the dispersion estimates from the above model and plug them into glmFit (followed by glmLRT) with a design matrix where each sample is its own group. This is not as good because glmLRT does not control the type I error rate correctly.
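
A rough sketch of that plug-in approach is given below. Several things here are assumptions for illustration rather than part of the answer: the counts are simulated, the group labels (`untransfected`, `t0`, `t5`, `t60`, `t240`) are made up, and the trended dispersion is used as the plug-in value where the common or tagwise estimates could be used instead.

```r
library(edgeR)

# ASSUMPTION: simulated counts stand in for the real genes-by-samples matrix
set.seed(1)
counts <- matrix(rnbinom(2000 * 5, mu = 100, size = 10), ncol = 5,
                 dimnames = list(paste0("gene", 1:2000), paste0("sample", 1:5)))

# Step 1: estimate dispersions under the spline model, which has 1 residual d.f.
time <- c(0, 0, 5, 60, 240)
transfected <- c("N", "Y", "Y", "Y", "Y")
design_spline <- model.matrix(~transfected + splines::ns(time, df = 2))

y <- DGEList(counts)
y <- calcNormFactors(y)
y <- estimateDisp(y, design_spline)

# Step 2: a design in which every sample is its own group (hypothetical labels)
lev <- c("untransfected", "t0", "t5", "t60", "t240")
group <- factor(lev, levels = lev)
design_group <- model.matrix(~0 + group)
colnames(design_group) <- lev

# Step 3: plug the earlier dispersions into glmFit; this design has no residual d.f.
fit <- glmFit(y, design_group, dispersion = y$trended.dispersion)

# Example contrast: 60 min versus transfected-untreated (sample 2)
lrt_60_vs_0 <- glmLRT(fit, contrast = c(0, -1, 0, 1, 0))
topTags(lrt_60_vs_0)
```

The other time points can be compared with sample 2 by moving the `1` in the contrast vector to the corresponding column.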

Hi Lun, I have a question: why do you say that the model above cannot compare specific time points? Is it because, when you use coef=3 to compare against the reference, both the time and the transfection condition differ, so it is impossible to say whether the DE comes from time or from transfection? Thank you.

timedreamer ▴ 10
@timedreamer-18140

Hi, I think you can find the answer in Section 2.11 of the edgeR user's guide, "What to do if you have no replicates". Good luck.
