I will have results from three-way RNA-seq experiment. Now I'm trying to prepare for it's analysis.
There will be 4 maize lines (for now denoted: "A", "B", "C", "D") and two treatments: cold ("x") and regrowth ("c"). For each treatment there will be 6 time points.
I see that in the user quide there is example only for single-series analysis.
My question is: does
edgeR allow to compare time-course of RNA expression between eg. Ax vs Ac?
I'll be also interested in question "Do lines react differently to treatments?", eg. do line "A" and "B" react differently to temperature "c" compared to "x" (looking at time profiles).
Sorry for such hi-level question but I want to know if I should go the
edgeR way in my analysis.
Could you, please provide any hints?
Thank you @gordon-smyth . There will be three replicates of each time-series (line x condition). Is it possible to use spline approach and compare groups (combined line and treatment)?
Anything is possible but splines will make the analysis more complex and there is no need for it. What makes you think you need splines?
It is likely that I'm wrong, but I think that analysing time trends would be better than analysing each time point separately. It is intuitive for me (conception, not procedure), although I realized that in fact analysis is complicated in such setting. Also I've read that time-profile analysis eg. spline takes into account correlation between consecutive time points. Do you suggest that I should analyse time points separately?
I did not suggest analysing each time point separately, that would obviously be sub-optimal. In a standard analysis, time would enter as a factor and contrasts would test for time trends. You could certainly choose regression splines as the contrasts if you wanted to.
I have given as much help as I can. The brief information in your question isn't enough to give informed advice. It isn't clear for example whether each time course follows the course of the same organisms/cells or whether each time point profiles different organisms/cells.
Thank you Gordon for your time and patience, I'd try to be more specific. I've read carefully the three first chapters of
edgeRuser guide. It is very clear in describing contrasts. However, still I don't know how to include time. If I treat it like any other factor I can imagine only either contrasts for different time points in one line x treatment combination or for the same time point across line x treatment or just time main effect.
As of experimental design, each time course would be composed from pooled leaves from three plants. So each time point would use different plants.
There will be 4 maize lines (for now denoted: "A", "B", "C", "D") and two treatments: cold ("x") and regrowth ("c"). For each treatment there will be 6 time points (in my simulated data only 4).
Here is "target" file, already combining line and treatment in group variable.
Could you, please, provide an example of comparison of time trends? I'm searching the literature using
edgeRwith time-course, but without success.
What hypothesis are you trying to test?
I'd like to test:
Are there differences between treatment (cold vs regrowth) time profiles for each line?
Do lines react differently to treatments? ie. comparing differences in time-profiles between cold and regrowth across lines.
Does combination of model with splines outlined in
limmasection 9.6.2 "Many time points" and
edgeRwould be appropriate?
As an exercise I've tried to analyze simpler but similar experiment. The data is from real experiment (E-MTAB-11224) comparing wild type and mutant transcriptomes over four time points. I know that such sort series should be rather analyzed point by point but I can't find better test data.
Here are the raw counts
and experimental design (wiek = time)
Construction of model, according to section 9.6.2 "Many time points" form
Reading data into
DGEListobject and normalization
If I understand correctly "Groupwt:X1" "Groupwt:X2" "Groupwt:X3" are group * time interaction coefficients, so to find differences between expression curves I should make following test:
But I've done quick check of time-trends (using raw counts) for two top genes Zm00001eb040040 and Zm00001eb176310 and they are almost identical (for reference I also pasted raw data, it is for all replications but the conclusion would be the same if I'd average them):
Certainly I've messed something or still don't understand contrasts and coefficients...
With 4 time points there are 3 df between time points, so your spline analysis with
df=3is exactly the same as if you had treated time as a factor. The F-statistics and p-values are identical to what you would have obtained from a standand two-way analysis with time as an ordinary factor. Your test for interaction seems fine though.
You can't plot time trends using raw counts. You have to contruct the trend directly from
Xand the spline coefficients. Even if you plot raw data, you have to convert to log2-cpm values.
topTagsresults show they are definitely not identical.
Thank you very much @gordon-smyth ! I'm very happy that my test for interaction is okay. So I'll try to extend my test experiment to three groups with six time points. But before that I'll try to plot log-CPS from data presented above. As a basis I take example from
edgeRuser guide, page 113. But that was experiment without replications and one series. Do you have any suggestions which package I could use to plot observed values and fitted splines? Or I could just compute replicate averages and plot them?
My suggestion was not to use splines, in which case it's all pretty easy. Splines don't offer any essential gains for your replicated experiment.
If you really want to use splines, then you need to learn how to use the coefficients that come from the spline fit. That's not too hard either, but I just don't have time to write out complete code for you at the moment.
Thank you, I've just read https://bioconductor.org/packages/release/workflows/vignettes/RNAseq123/inst/doc/designmatrices.html#smooth-curves stating
So I understand the correct approach is to use stepwise regression inplemented for example by
maSigPro? I've tried it but I'm not sure what count normalization is needed for analysis.
Alternatively, could you please name the method which I should use? In the end I can just compare the expresssion by point-by-point two-way analysis (grouped line and temperature vs time as a factor). What gives better results than time-course analysis for < 8 time points (Spies et al 17). To be specific: I'd have 6 time points for 4 maize lines in two temperature regimes. Each sample will be done in triplicate.
Stepwise regression is completely unnecessary and completely inappropiate. There is nothing in our workflow paper that advised you to use stepwise regression.
It's a completely standard analysis. You just test for interaction as you have already done. The method doesn't need a "name".
I have not read Spies et al but what you say is a false conclusion. The standard analysis in edgeR is as good as is possible.
I just do not understand what the problem is. You are making it more difficult than it actually is. You have a competely standard design that can be easily analysed by standard methods as discussed in the edgeR and limma Guides either with splines or without. I don't see how extending the examples in the User's Guide to four lines and two treatments causes any problems. Your questions can be answered by testing for interaction, as you have shown you already know how to do. Plotting the trend take a few lines of R code, but you seem to know R well enough to be able to do it.
Do you have actually have data or is this is a theoretical exercise?
A the moment I don't have data from my principal investigator's experiment, so I'm trying to analyze other data with similar design.
I'll have full data (4 lines x 2 temperatures x 6 time points, 3 biol reps) after few months. Now I'm trying to create analysis workflow.
To answer that question, you test for interaction bttween time and treatment.
To answer that question you test for 3-way interaction between line, treatment and time.
This is all quite standard. You just fit linear models in edgeR or limma with or without splines.
Thank you @gordon-smyth for your patience and time. I see that I need to reread carefully the user guide so I could follow your advice.
edgeRuser-guide section 3.3 and have made new attempt to analyse test data. This time I combined two experiments for Triticum. Just to get real data with the same design as my future data. I've thought about factorail model, like described in section 3.3.4. But the fact that all interaction terms are relative to reference make comparisons hard for more complicated designs than simple treatment vs reference design. So I tried approach from section 3.3.1 as it indeed is the most intuitive and less prone to errors. So here is my code.
So 14542 genes have different expression profiles in INF vs MOCK treatment for line BASE, right?
Is this workflow okay? I plan to compare treatments (INF vs MOCK) for each line separately. That will answer my first experimental question.
Yes, the workflow looks ok. You have tested for any differences in time response profile between INF and MOCK. Note, you are testing for differences in the shape of the response profiles, but not for baseline expression differences between INF and MOCK. I assume that is indeed what you're after, just pointing it out.
I have few more questions.
eg. make contrasts
and make two tests, one for line BASE
c("INFvsMOCK.BASE6", "INFvsMOCK.BASE12", "INFvsMOCK.BASE24", "INFvsMOCK.BASE36", "INFvsMOCK.BASE48")
and second for line TRA
c("INFvsMOCK.TRA6", "INFvsMOCK.TRA12", "INFvsMOCK.TRA24", "INFvsMOCK.TRA36", "INFvsMOCK.TRA48")
Otherwise, if I must test interaction for each line separately then fdr correction would be incorrect (taking all comparisons as a whole).
ii. Clustering of expression profiles