Question

Time-course, comparison of two groups

0

Entering edit mode

boczniak767 ▴ 720

@maciej-jonczyk-3945

Last seen 28 days ago

Poland

Hi,

I will have results from three-way RNA-seq experiment. Now I'm trying to prepare for it's analysis.

There will be 4 maize lines (for now denoted: "A", "B", "C", "D") and two treatments: cold ("x") and regrowth ("c"). For each treatment there will be 6 time points.

I see that in the user quide there is example only for single-series analysis.

My question is: does edgeR allow to compare time-course of RNA expression between eg. Ax vs Ac? I'll be also interested in question "Do lines react differently to treatments?", eg. do line "A" and "B" react differently to temperature "c" compared to "x" (looking at time profiles).

Sorry for such hi-level question but I want to know if I should go the edgeR way in my analysis. Could you, please provide any hints?

edgeR • 1.9k views

ADD COMMENT • link 13 months ago • updated 12 months ago boczniak767 ▴ 720

score 0 · Answer 1 · 2023-03-19

0

Entering edit mode

Gordon Smyth 50k

@gordon-smyth

Last seen 3 hours ago

WEHI, Melbourne, Australia

edgeR allows any number of explanatory factors, so you can have time-series for any number of treatments.

If you have replicates, then I would analyse the experiment using the standard approach outlined in Section 3.3.1 of the edgeR User's Guide (i.e., combine all the explanatory conditions into one combined factor). If you don't have replicates, then I would use a spline time trend for each combination of maize line and treatment.

ADD COMMENT • link 13 months ago Gordon Smyth 50k

0

Entering edit mode

Thank you @gordon-smyth . There will be three replicates of each time-series (line x condition). Is it possible to use spline approach and compare groups (combined line and treatment)?

ADD REPLY • link 13 months ago boczniak767 ▴ 720

0

Entering edit mode

Anything is possible but splines will make the analysis more complex and there is no need for it. What makes you think you need splines?

ADD REPLY • link 13 months ago Gordon Smyth 50k

0

Entering edit mode

It is likely that I'm wrong, but I think that analysing time trends would be better than analysing each time point separately. It is intuitive for me (conception, not procedure), although I realized that in fact analysis is complicated in such setting. Also I've read that time-profile analysis eg. spline takes into account correlation between consecutive time points. Do you suggest that I should analyse time points separately?

ADD REPLY • link 13 months ago boczniak767 ▴ 720

0

Entering edit mode

I did not suggest analysing each time point separately, that would obviously be sub-optimal. In a standard analysis, time would enter as a factor and contrasts would test for time trends. You could certainly choose regression splines as the contrasts if you wanted to.

I have given as much help as I can. The brief information in your question isn't enough to give informed advice. It isn't clear for example whether each time course follows the course of the same organisms/cells or whether each time point profiles different organisms/cells.

ADD REPLY • link 13 months ago Gordon Smyth 50k

0

Entering edit mode

Thank you Gordon for your time and patience, I'd try to be more specific. I've read carefully the three first chapters of edgeR user guide. It is very clear in describing contrasts. However, still I don't know how to include time. If I treat it like any other factor I can imagine only either contrasts for different time points in one line x treatment combination or for the same time point across line x treatment or just time main effect.

As of experimental design, each time course would be composed from pooled leaves from three plants. So each time point would use different plants.

There will be 4 maize lines (for now denoted: "A", "B", "C", "D") and two treatments: cold ("x") and regrowth ("c"). For each treatment there will be 6 time points (in my simulated data only 4).

Here is "target" file, already combining line and treatment in group variable.

 line treatment time group
Ax15.1     A       x   15    Ax
Ax15.2     A       x   15    Ax
Ax15.3     A       x   15    Ax
Ax22.1     A       x   22    Ax
Ax22.2     A       x   22    Ax
Ax22.3     A       x   22    Ax
Ax29.1     A       x   29    Ax
Ax29.2     A       x   29    Ax
Ax29.3     A       x   29    Ax
Ax36.1     A       x   36    Ax
Ax36.2     A       x   36    Ax
Ax36.3     A       x   36    Ax
Ac15.1     A       c   15    Ac
Ac15.2     A       c   15    Ac
Ac15.3     A       c   15    Ac
Ac22.1     A       c   22    Ac
Ac22.2     A       c   22    Ac
Ac22.3     A       c   22    Ac
Ac29.1     A       c   29    Ac
Ac29.2     A       c   29    Ac
Ac29.3     A       c   29    Ac
Ac36.1     A       c   36    Ac
Ac36.2     A       c   36    Ac
Ac36.3     A       c   36    Ac
Bx15.1     B       x   15    Bx
Bx15.2     B       x   15    Bx
Bx15.3     B       x   15    Bx
Bx22.1     B       x   22    Bx
Bx22.2     B       x   22    Bx
Bx22.3     B       x   22    Bx
Bx29.1     B       x   29    Bx
Bx29.2     B       x   29    Bx
Bx29.3     B       x   29    Bx
Bx36.1     B       x   36    Bx
Bx36.2     B       x   36    Bx
Bx36.3     B       x   36    Bx
Bc15.1     B       c   15    Bc
Bc15.2     B       c   15    Bc
Bc15.3     B       c   15    Bc
Bc22.1     B       c   22    Bc
Bc22.2     B       c   22    Bc
Bc22.3     B       c   22    Bc
Bc29.1     B       c   29    Bc
Bc29.2     B       c   29    Bc
Bc29.3     B       c   29    Bc
Bc36.1     B       c   36    Bc
Bc36.2     B       c   36    Bc
Bc36.3     B       c   36    Bc
Cx15.1     C       x   15    Cx
Cx15.2     C       x   15    Cx
Cx15.3     C       x   15    Cx
Cx22.1     C       x   22    Cx
Cx22.2     C       x   22    Cx
Cx22.3     C       x   22    Cx
Cx29.1     C       x   29    Cx
Cx29.2     C       x   29    Cx
Cx29.3     C       x   29    Cx
Cx36.1     C       x   36    Cx
Cx36.2     C       x   36    Cx
Cx36.3     C       x   36    Cx
Cc15.1     C       c   15    Cc
Cc15.2     C       c   15    Cc
Cc15.3     C       c   15    Cc
Cc22.1     C       c   22    Cc
Cc22.2     C       c   22    Cc
Cc22.3     C       c   22    Cc
Cc29.1     C       c   29    Cc
Cc29.2     C       c   29    Cc
Cc29.3     C       c   29    Cc
Cc36.1     C       c   36    Cc
Cc36.2     C       c   36    Cc
Cc36.3     C       c   36    Cc
Dx15.1     D       x   15    Dx
Dx15.2     D       x   15    Dx
Dx15.3     D       x   15    Dx
Dx22.1     D       x   22    Dx
Dx22.2     D       x   22    Dx
Dx22.3     D       x   22    Dx
Dx29.1     D       x   29    Dx
Dx29.2     D       x   29    Dx
Dx29.3     D       x   29    Dx
Dx36.1     D       x   36    Dx
Dx36.2     D       x   36    Dx
Dx36.3     D       x   36    Dx
Dc15.1     D       c   15    Dc
Dc15.2     D       c   15    Dc
Dc15.3     D       c   15    Dc
Dc22.1     D       c   22    Dc
Dc22.2     D       c   22    Dc
Dc22.3     D       c   22    Dc
Dc29.1     D       c   29    Dc
Dc29.2     D       c   29    Dc
Dc29.3     D       c   29    Dc
Dc36.1     D       c   36    Dc
Dc36.2     D       c   36    Dc
Dc36.3     D       c   36    Dc

Could you, please, provide an example of comparison of time trends? I'm searching the literature using edgeR with time-course, but without success.

ADD REPLY • link 13 months ago boczniak767 ▴ 720

0

Entering edit mode

What hypothesis are you trying to test?

ADD REPLY • link 13 months ago Gordon Smyth 50k

0

Entering edit mode

I'd like to test:

Are there differences between treatment (cold vs regrowth) time profiles for each line?
Do lines react differently to treatments? ie. comparing differences in time-profiles between cold and regrowth across lines.

Does combination of model with splines outlined in limma section 9.6.2 "Many time points" and edgeR would be appropriate?

ADD REPLY • link 13 months ago boczniak767 ▴ 720

0

Entering edit mode

As an exercise I've tried to analyze simpler but similar experiment. The data is from real experiment (E-MTAB-11224) comparing wild type and mutant transcriptomes over four time points. I know that such sort series should be rather analyzed point by point but I can't find better test data.

Here are the raw counts

>head(cts, 2)
                m15.1 m15.2 m15.3 m22.1 m22.2 m22.3 m29.1 m29.2 m29.3 m36.1
Zm00001eb015280    56    67    42    49    52    25    25    12    23    12
Zm00001eb000610     0     0     0     2     0     0     0     0     0     0
                m36.2 m36.3 w15.1 w15.2 w15.3 w22.1 w22.2 w22.3 w29.1 w29.2
Zm00001eb015280     7    11    84    30    48    24    40    36    14    10
Zm00001eb000610     0     0     0     0     0     0     0     0     0     0
                w29.3 w36.1 w36.2 w36.3
Zm00001eb015280    14    14    17     6
Zm00001eb000610     0     0     0     0

and experimental design (wiek = time)

> coldata
      linia wiek
m15.1   vp5   15
m15.2   vp5   15
m15.3   vp5   15
m22.1   vp5   22
m22.2   vp5   22
m22.3   vp5   22
m29.1   vp5   29
m29.2   vp5   29
m29.3   vp5   29
m36.1   vp5   36
m36.2   vp5   36
m36.3   vp5   36
w15.1    wt   15
w15.2    wt   15
w15.3    wt   15
w22.1    wt   22
w22.2    wt   22
w22.3    wt   22
w29.1    wt   29
w29.2    wt   29
w29.3    wt   29
w36.1    wt   36
w36.2    wt   36
w36.3    wt   36

Construction of model, according to section 9.6.2 "Many time points" form limma userguide

#a basis for a natural regression spline
>library(splines)
>X = ns(coldata$wiek, df=3) # In limma 3 to 5 df is recommended so I use 3 for such short series
# Fit separate curves for the control and treatment groups
>Group <- factor(coldata$linia)
>design=model.matrix(~Group*X) # model with intercept

Reading data into DGEList object and normalization

>library(edgeR)
>cts.e=DGEList(counts=cts)
# filtering
>keep = filterByExpr(cts.e, design)
> table(keep)
keep
FALSE  TRUE 
19584 24719
>cts.e.f=cts.e[keep, keep.lib.sizes=F]
# TMM normalization
>cts.tmm <- calcNormFactors(cts.e.f)
# calculation of normalization factors
>cts.tmm=estimateDisp(cts.tmm, design)
# fitting Quasi-likelihood GLM model
>fit <- glmQLFit(cts.tmm, design, robust=TRUE)
# checking coefficients
> colnames(fit$coef)
[1] "(Intercept)" "Groupwt"     "X1"          "X2"          "X3"         
[6] "Groupwt:X1"  "Groupwt:X2"  "Groupwt:X3"

If I understand correctly "Groupwt:X1" "Groupwt:X2" "Groupwt:X3" are group * time interaction coefficients, so to find differences between expression curves I should make following test:

# test according to edgeR userguide section 4.8.8, and limma userguide section 9.6.2
>fit2 <- glmQLFTest(fit, coef=6:8)
# Top ten genes
>topTags(fit2, n=10)
Coefficient:  Groupwt:X1 Groupwt:X2 Groupwt:X3 
                logFC.Groupwt.X1 logFC.Groupwt.X2 logFC.Groupwt.X3   logCPM
Zm00001eb040040       0.78601733       -0.8236045        -1.360529 8.844410
Zm00001eb176310       0.43335761       -1.8882173        -2.491390 7.259827
Zm00001eb108930      -0.20014996       -1.6186250        -1.383365 7.048723
Zm00001eb052130      -0.04039974       -2.0158117        -2.351057 6.024964
Zm00001eb319240       0.85724643       -1.0979758        -1.555110 5.089349
Zm00001eb272820       0.21404487       -1.4681989        -1.374614 5.062015
Zm00001eb415460      -0.01191202       -1.1007960        -1.566832 7.418897
Zm00001eb202830       0.50597560       -0.6134339        -1.292589 8.271701
Zm00001eb371740      -0.69415762        0.7228735         1.397858 7.426577
Zm00001eb057860      -0.45411810       -1.1047224        -2.065202 6.515550
                       F       PValue          FDR
Zm00001eb040040 90.72442 1.411487e-13 3.474614e-09
Zm00001eb176310 87.02573 2.811290e-13 3.474614e-09
Zm00001eb108930 79.20611 6.590094e-13 5.430018e-09
Zm00001eb052130 70.53514 2.417192e-12 1.265101e-08
Zm00001eb319240 70.17509 2.558964e-12 1.265101e-08
Zm00001eb272820 65.53341 5.466304e-12 2.252026e-08
Zm00001eb415460 63.28859 8.027446e-12 2.834721e-08
Zm00001eb202830 61.83556 1.036081e-11 3.201360e-08
Zm00001eb371740 60.32484 1.358491e-11 3.399302e-08
Zm00001eb057860 64.40349 1.404033e-11 3.399302e-08

But I've done quick check of time-trends (using raw counts) for two top genes Zm00001eb040040 and Zm00001eb176310 and they are almost identical (for reference I also pasted raw data, it is for all replications but the conclusion would be the same if I'd average them):

Geneid  m15-1   m15-2   m15-3   m22-1   m22-2   m22-3   m29-1   m29-2   m29-3   m36-1   m36-2   m36-3   w15-1   w15-2   w15-3   w22-1   w22-2   w22-3   w29-1   w29-2   w29-3   w36-1   w36-2   w36-3
Zm00001eb040040 23078   22469   22230   13137   13306   12855   4411    4833    4586    1504    1898    1913    21686   18537   20783   12140   12348   12030   4940    5104    5242    613 622 525
Zm00001eb176310 10153   8910    9339    3433    3600    3482    817 922 932 283 377 381 10425   8385    9444    3331    3194    3116    679 840 803 61  65  38

Certainly I've messed something or still don't understand contrasts and coefficients...

ADD REPLY • link 13 months ago boczniak767 ▴ 720

1

Entering edit mode

With 4 time points there are 3 df between time points, so your spline analysis with df=3 is exactly the same as if you had treated time as a factor. The F-statistics and p-values are identical to what you would have obtained from a standand two-way analysis with time as an ordinary factor. Your test for interaction seems fine though.

I've done quick check of time-trends (using raw counts)

You can't plot time trends using raw counts. You have to contruct the trend directly from X and the spline coefficients. Even if you plot raw data, you have to convert to log2-cpm values.

they are almost identical

The topTags results show they are definitely not identical.

ADD REPLY • link 13 months ago Gordon Smyth 50k

0

Entering edit mode

Thank you very much @gordon-smyth ! I'm very happy that my test for interaction is okay. So I'll try to extend my test experiment to three groups with six time points. But before that I'll try to plot log-CPS from data presented above. As a basis I take example from edgeR user guide, page 113. But that was experiment without replications and one series. Do you have any suggestions which package I could use to plot observed values and fitted splines? Or I could just compute replicate averages and plot them?

ADD REPLY • link 13 months ago boczniak767 ▴ 720

0

Entering edit mode

My suggestion was not to use splines, in which case it's all pretty easy. Splines don't offer any essential gains for your replicated experiment.

If you really want to use splines, then you need to learn how to use the coefficients that come from the spline fit. That's not too hard either, but I just don't have time to write out complete code for you at the moment.

ADD REPLY • link 13 months ago Gordon Smyth 50k

0

Entering edit mode

Thank you, I've just read https://bioconductor.org/packages/release/workflows/vignettes/RNAseq123/inst/doc/designmatrices.html#smooth-curves stating

Keep in mind that in gene expression analyses, the same model is applied to every gene in the dataset, even though a simpler model may be sufficient for some genes, whilst a more complex one is needed for others.

So I understand the correct approach is to use stepwise regression inplemented for example by maSigPro? I've tried it but I'm not sure what count normalization is needed for analysis.

Alternatively, could you please name the method which I should use? In the end I can just compare the expresssion by point-by-point two-way analysis (grouped line and temperature vs time as a factor). What gives better results than time-course analysis for < 8 time points (Spies et al 17). To be specific: I'd have 6 time points for 4 maize lines in two temperature regimes. Each sample will be done in triplicate.

ADD REPLY • link 13 months ago boczniak767 ▴ 720

0

Entering edit mode

So I understand the correct approach is to use stepwise regression inplemented for example by maSigPro?

Stepwise regression is completely unnecessary and completely inappropiate. There is nothing in our workflow paper that advised you to use stepwise regression.

could you please name the method which I should use?

It's a completely standard analysis. You just test for interaction as you have already done. The method doesn't need a "name".

In the end I can just compare the expresssion by point-by-point two-way analysis (grouped line and temperature vs time as a factor). What gives better results than time-course analysis for < 8 time points (Spies et al 17).

I have not read Spies et al but what you say is a false conclusion. The standard analysis in edgeR is as good as is possible.

I just do not understand what the problem is. You are making it more difficult than it actually is. You have a competely standard design that can be easily analysed by standard methods as discussed in the edgeR and limma Guides either with splines or without. I don't see how extending the examples in the User's Guide to four lines and two treatments causes any problems. Your questions can be answered by testing for interaction, as you have shown you already know how to do. Plotting the trend take a few lines of R code, but you seem to know R well enough to be able to do it.

Do you have actually have data or is this is a theoretical exercise?

ADD REPLY • link 13 months ago Gordon Smyth 50k

0

Entering edit mode

Do you have actually have data or is this is a theoretical exercise?

A the moment I don't have data from my principal investigator's experiment, so I'm trying to analyze other data with similar design.

I'll have full data (4 lines x 2 temperatures x 6 time points, 3 biol reps) after few months. Now I'm trying to create analysis workflow.

ADD REPLY • link 13 months ago boczniak767 ▴ 720

0

Entering edit mode

Are there differences between treatment (cold vs regrowth) time profiles for each line?

To answer that question, you test for interaction bttween time and treatment.

Do lines react differently to treatments? ie. comparing differences in time-profiles between cold and regrowth across lines.

To answer that question you test for 3-way interaction between line, treatment and time.

This is all quite standard. You just fit linear models in edgeR or limma with or without splines.

ADD REPLY • link 13 months ago Gordon Smyth 50k

0

Entering edit mode

Thank you @gordon-smyth for your patience and time. I see that I need to reread carefully the user guide so I could follow your advice.

ADD REPLY • link 13 months ago boczniak767 ▴ 720

0

Entering edit mode

I've read edgeR user-guide section 3.3 and have made new attempt to analyse test data. This time I combined two experiments for Triticum. Just to get real data with the same design as my future data. I've thought about factorail model, like described in section 3.3.4. But the fact that all interaction terms are relative to reference make comparisons hard for more complicated designs than simple treatment vs reference design. So I tried approach from section 3.3.1 as it indeed is the most intuitive and less prone to errors. So here is my code.

# It is a test for 4 lines (NIL38, NIL51, BASE, TRA) x 2 treatments (INF, MOCK) x 6 time points (3, 6, 12, 24, 36, 48)
# "targets" file
> coldata=read.table("allsampleinfo", sep="\t", header=T, row.names=1)
# a sample of my "targets" object
> head(coldata, 24)
           linia treat time
ERR1201776 NIL38  MOCK    3
ERR1201777 NIL38  MOCK    3
ERR1201778 NIL38  MOCK    3
ERR1201782 NIL38  MOCK    6
ERR1201783 NIL38  MOCK    6
ERR1201784 NIL38  MOCK    6
ERR1201767 NIL38  MOCK   12
ERR1201768 NIL38  MOCK   12
ERR1201769 NIL38  MOCK   12
ERR1201770 NIL38  MOCK   24
ERR1201771 NIL38  MOCK   24
ERR1201772 NIL38  MOCK   24
ERR1201773 NIL38  MOCK   36
ERR1201774 NIL38  MOCK   36
ERR1201775 NIL38  MOCK   36
ERR1201779 NIL38  MOCK   48
ERR1201780 NIL38  MOCK   48
ERR1201781 NIL38  MOCK   48
ERR1201758 NIL38   INF    3
ERR1201759 NIL38   INF    3
ERR1201760 NIL38   INF    3
ERR1201764 NIL38   INF    6
ERR1201765 NIL38   INF    6
ERR1201766 NIL38   INF    6
# concatenating factors
> Group <- factor(paste(coldata$linia,coldata$treat,coldata$time,sep="."))
# one-way model
> design.one=model.matrix(~0 + Group, data=coldata)
> colnames(design.one) <- levels(Group)
# Reading-in, filtering and normalizing data
> cts <- as.matrix(read.table("allcounts2r", sep="\t", header=T, row.names="Geneid"))
# check data names
> all(rownames(coldata) == colnames(cts))
[1] TRUE
> cts.e=DGEList(counts=cts) # cts is a count matrix
> keep = filterByExpr(cts.e, design.one)
> table(keep)
keep
FALSE  TRUE 
48346 72398 
> cts.e.f=cts.e[keep, keep.lib.sizes=F]
> cts.tmm <- calcNormFactors(cts.e.f)
> cts.tmm=estimateDisp(cts.tmm, design.one)
# fitting one-way model
> fit.one=glmQLFit(cts.tmm, design.one, robust=T)
# coefficients
> colnames(fit.one)
 [1] "BASE.INF.12"   "BASE.INF.24"   "BASE.INF.3"    "BASE.INF.36"  
 [5] "BASE.INF.48"   "BASE.INF.6"    "BASE.MOCK.12"  "BASE.MOCK.24" 
 [9] "BASE.MOCK.3"   "BASE.MOCK.36"  "BASE.MOCK.48"  "BASE.MOCK.6"  
[13] "NIL38.INF.12"  "NIL38.INF.24"  "NIL38.INF.3"   "NIL38.INF.36" 
[17] "NIL38.INF.48"  "NIL38.INF.6"   "NIL38.MOCK.12" "NIL38.MOCK.24"
[21] "NIL38.MOCK.3"  "NIL38.MOCK.36" "NIL38.MOCK.48" "NIL38.MOCK.6" 
[25] "NIL51.INF.12"  "NIL51.INF.24"  "NIL51.INF.3"   "NIL51.INF.36" 
[29] "NIL51.INF.48"  "NIL51.INF.6"   "NIL51.MOCK.12" "NIL51.MOCK.24"
[33] "NIL51.MOCK.3"  "NIL51.MOCK.36" "NIL51.MOCK.48" "NIL51.MOCK.6" 
[37] "TRA.INF.12"    "TRA.INF.24"    "TRA.INF.3"     "TRA.INF.36"   
[41] "TRA.INF.48"    "TRA.INF.6"     "TRA.MOCK.12"   "TRA.MOCK.24"  
[45] "TRA.MOCK.3"    "TRA.MOCK.36"   "TRA.MOCK.48"   "TRA.MOCK.6" 
# contrasts for treat (INF vs MOCK) and time interaction at any point for one line (BASE)
> my.cotrasts=makeContrasts(
+ INFvsMOCK.BASE6 = (BASE.INF.6 - BASE.INF.3) - (BASE.MOCK.6 - BASE.MOCK.3),
+ INFvsMOCK.BASE12 = (BASE.INF.12 - BASE.INF.3) - (BASE.MOCK.12 - BASE.MOCK.3),
+ INFvsMOCK.BASE24 = (BASE.INF.24 - BASE.INF.3) - (BASE.MOCK.24 - BASE.MOCK.3),
+ INFvsMOCK.BASE36 = (BASE.INF.36 - BASE.INF.3) - (BASE.MOCK.36 - BASE.MOCK.3),
+INFvsMOCK.BASE48 = (BASE.INF.48 - BASE.INF.3) - (BASE.MOCK.48 - BASE.MOCK.3),
+ levels=design.one)
# test
> fit.intBASE=glmQLFTest(fit.one, contrast=my.cotrasts[,c("INFvsMOCK.BASE6", "INFvsMOCK.BASE12", "INFvsMOCK.BASE24", "INFvsMOCK.BASE36", "INFvsMOCK.BASE48")])
# top ten genes
> topTags(fit.intBASE, n=10)
Coefficient:  LR test on 5 degrees of freedom 
                   logFC.INFvsMOCK.BASE6 logFC.INFvsMOCK.BASE12
TraesCS2A02G083300            0.04882017            0.077802638
TraesCS7B02G354000           -0.35501617            0.009853977
TraesCS7A02G453500            0.06468774            0.449802787
TraesCS6B02G276800           -0.50561478           -0.898773532
TraesCS6D02G125600            0.32997571            0.239776090
TraesCS4A02G312600           -0.94447993           -1.112350393
TraesCSU02G018900            -0.09986677            0.395649036
TraesCS4A02G107600           -0.23828700           -0.294949139
TraesCS7A02G453600            0.43668509            0.751355133
TraesCS7D02G442900           -0.53533177            0.165202193
                   logFC.INFvsMOCK.BASE24 logFC.INFvsMOCK.BASE36
TraesCS2A02G083300             0.11981955               4.366195
TraesCS7B02G354000            -0.23088941               4.644750
TraesCS7A02G453500            -0.18882314               3.481330
TraesCS6B02G276800            -0.34534753               3.686998
TraesCS6D02G125600             0.15697392               5.131801
TraesCS4A02G312600            -0.39856420               5.083292
TraesCSU02G018900              0.12970085               4.013883
TraesCS4A02G107600            -0.06561859               3.690224
TraesCS7A02G453600             0.59902730               3.992775
TraesCS7D02G442900             0.37459720               7.286937
                   logFC.INFvsMOCK.BASE48   logCPM        F       PValue
TraesCS2A02G083300               4.446887 8.420750 54.65242 9.033951e-28
TraesCS7B02G354000               5.246944 7.967426 52.74892 3.415953e-27
TraesCS7A02G453500               4.434794 8.492496 52.59678 3.804608e-27
TraesCS6B02G276800               3.700509 6.961990 45.86680 5.643929e-25
TraesCS6D02G125600               6.067965 8.406653 45.20288 9.490134e-25
TraesCS4A02G312600               6.097936 2.852610 45.19128 2.104182e-24
TraesCSU02G018900                4.618597 7.714413 43.97739 2.510384e-24
TraesCS4A02G107600               3.833844 8.652186 43.22250 4.611855e-24
TraesCS7A02G453600               4.908181 7.853467 42.65623 7.311615e-24
TraesCS7D02G442900               8.005619 4.841572 44.39876 9.327409e-24
                            FDR
TraesCS2A02G083300 6.540400e-23
TraesCS7B02G354000 9.181533e-23
TraesCS7A02G453500 9.181533e-23
TraesCS6B02G276800 1.021523e-20
TraesCS6D02G125600 1.374133e-20
TraesCS4A02G312600 2.538976e-20
TraesCSU02G018900  2.596382e-20
TraesCS4A02G107600 4.173614e-20
TraesCS7A02G453600 5.881626e-20
TraesCS7D02G442900 6.752858e-20

# significant genes at FD<0.05 level
> x=topTags(fit.intBASE, n=100000)
> x2=subset(x$table, FDR<=0.05)
> dim(x2)
[1] 14542     9

So 14542 genes have different expression profiles in INF vs MOCK treatment for line BASE, right?

Is this workflow okay? I plan to compare treatments (INF vs MOCK) for each line separately. That will answer my first experimental question.

ADD REPLY • link 12 months ago boczniak767 ▴ 720

0

Entering edit mode

Yes, the workflow looks ok. You have tested for any differences in time response profile between INF and MOCK. Note, you are testing for differences in the shape of the response profiles, but not for baseline expression differences between INF and MOCK. I assume that is indeed what you're after, just pointing it out.

ADD REPLY • link 12 months ago Gordon Smyth 50k

0

Entering edit mode

Thank you,

I have few more questions.

Is it possible to test interaction for more than one line at a time?

eg. make contrasts

>  my.cotrasts2=makeContrasts(
+  INFvsMOCK.BASE6 = (BASE.INF.6 - BASE.INF.3) - (BASE.MOCK.6 - BASE.MOCK.3),
+  INFvsMOCK.BASE12 = (BASE.INF.12 - BASE.INF.3) - (BASE.MOCK.12 - BASE.MOCK.3),
+ INFvsMOCK.BASE24 = (BASE.INF.24 - BASE.INF.3) - (BASE.MOCK.24 - BASE.MOCK.3),
+  INFvsMOCK.BASE36 = (BASE.INF.36 - BASE.INF.3) - (BASE.MOCK.36 - BASE.MOCK.3),
+ INFvsMOCK.BASE48 = (BASE.INF.48 - BASE.INF.3) - (BASE.MOCK.48 - BASE.MOCK.3),
+  INFvsMOCK.TRA6 = (TRA.INF.6 - TRA.INF.3) - (TRA.MOCK.6 - TRA.MOCK.3),
+  INFvsMOCK.TRA12 = (TRA.INF.12 - TRA.INF.3) - (TRA.MOCK.12 - TRA.MOCK.3),
+ INFvsMOCK.TRA24 = (TRA.INF.24 - TRA.INF.3) - (TRA.MOCK.24 - TRA.MOCK.3),
+  INFvsMOCK.TRA36 = (TRA.INF.36 - TRA.INF.3) - (TRA.MOCK.36 - TRA.MOCK.3),
+ INFvsMOCK.TRA48 = (TRA.INF.48 - TRA.INF.3) - (TRA.MOCK.48 - TRA.MOCK.3),
+  levels=design.one)

and make two tests, one for line BASE

c("INFvsMOCK.BASE6", "INFvsMOCK.BASE12", "INFvsMOCK.BASE24", "INFvsMOCK.BASE36", "INFvsMOCK.BASE48")

and second for line TRA

c("INFvsMOCK.TRA6", "INFvsMOCK.TRA12", "INFvsMOCK.TRA24", "INFvsMOCK.TRA36", "INFvsMOCK.TRA48")

?

Otherwise, if I must test interaction for each line separately then fdr correction would be incorrect (taking all comparisons as a whole).

In the meantime I realized that it is possible to make interaction contrasts comparing sequential time points. Is it statistically sound? Is it better/worse than interaction relative to time 3 (first time point)?

For example

INFvsMOCK.BASE12 = (BASE.INF.6 - BASE.INF.3) - (BASE.MOCK.6 - BASE.MOCK.3)
INFvsMOCK.BASE12 = (BASE.INF.12 - BASE.INF.6) - (BASE.MOCK.12 - BASE.MOCK.6)
INFvsMOCK.BASE24 = (BASE.INF.24 - BASE.INF.12) - (BASE.MOCK.24 - BASE.MOCK.12)
INFvsMOCK.BASE36 = (BASE.INF.36 - BASE.INF.24) - (BASE.MOCK.36 - BASE.MOCK.24)
INFvsMOCK.BASE48 = (BASE.INF.48 - BASE.INF.36) - (BASE.MOCK.48 - BASE.MOCK.36)

As of pairwise comparison of lines' response to treatment I think about two things. i. Testing simultaneously following contrasts (is it three-way interaction?) example for lines BASE and TRA

INFvsMOCK.BASE.TRA6 = (BASE.INF.6 - BASE.INF.3) - (BASE.MOCK.6 - BASE.MOCK.3) - (TRA.INF.6 - TRA.INF.3) - (TRA.MOCK.6 - TRA.MOCK.3)
INFvsMOCK.BASE12.TRA = (BASE.INF.12 - BASE.INF.3) - (BASE.MOCK.12 - BASE.MOCK.3) - (TRA.INF.12 - TRA.INF.3) - (TRA.MOCK.12 - TRA.MOCK.3)
INFvsMOCK.BASE24.TRA = (BASE.INF.24 - BASE.INF.3) - (BASE.MOCK.24 - BASE.MOCK.3) - (TRA.INF.24 - TRA.INF.3) - (TRA.MOCK.24 - TRA.MOCK.3)
INFvsMOCK.BASE36.TRA = (BASE.INF.36 - BASE.INF.3) - (BASE.MOCK.36 - BASE.MOCK.3) - (TRA.INF.36 - TRA.INF.3) - (TRA.MOCK.36 - TRA.MOCK.3)
INFvsMOCK.BASE48.TRA = (BASE.INF.48 - BASE.INF.3) - (BASE.MOCK.48 - BASE.MOCK.3) - (TRA.INF.48 - TRA.INF.3) - (TRA.MOCK.48 - TRA.MOCK.3)

ii. Clustering of expression profiles

ADD REPLY • link 12 months ago boczniak767 ▴ 720