Question

How to conduct maSigPro analysis with only one condition?

0

Entering edit mode

Jiahao Tian • 0

@b9a7fe93

Last seen 4 months ago

Netherlands

Hi all,

I am trying to use maSigPro to conduct time-course analyis with only one condition. However, it is not feasible. I have checked the guideline or other blogs, but there is no answer. I wonder how to realize this. Thank you!

I use the example data here for practice.

library(maSigPro)
data(data.abiotic) 
data(edesign.abiotic)
edesign.abiotic <- as.data.frame(edesign.abiotic)
design.n <- edesign.abiotic[1:9,1:3]
data.n <- data.abiotic[,rownames(design.n)]

colnames(design.n) <-c("Time","Replicate","Group")
design <- make.design.matrix(design.n)
design$groups.vector
# [1] "Group" "Group"
head(design.n)

#             Time  Replicate   Group
#Control_3H_1   3   1   1   
#Control_3H_2   3   1   1   
#Control_3H_3   3   1   1   
#Control_9H_1   9   2   1   
#Control_9H_2   9   2   1   
#Control_9H_3   9   2   1   

head(data.n)
#        Control_3H_1  Control_3H_2  Control_3H_3  Control_9H_1  Control_9H_2
#STMDF90    0.13735714  -0.36530651 -0.15329448 0.44754535  0.287476796 
#STMCJ24    NA  NA  NA  NA  NA  
#STMJH42    0.07864449  0.10023285  -0.17365488 -0.25279484 0.184855409 
#STMDE66    0.22976991  0.47409748  0.46930716  0.37101059  -0.004992029    
#STMIX74    0.14407618  -0.48018637 -0.07847999 0.05692331  0.013045420 
#STMCL34    0.06494078  0.04472986  0.04280790  0.16128091  0.088048892 

fit <- p.vector(data.n,
                design,
                Q = 0.05,
                MT.adjust = "BH", 
                min.obs = 20)

# Error in dat[i, ] : subscript out of bounds

maSigPro masig • 867 views

ADD COMMENT • link updated 15 months ago by James W. MacDonald 68k • written 15 months ago by Jiahao Tian • 0

score 2 · Accepted Answer · 2024-10-09

2

Entering edit mode

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 1 day ago

United States

You have a data.frame with 9 columns and are specifying a minimum of 20 observations per gene. Since you don't have 20 observations for any gene, it fails.

> fit <- p.vector(data.n,
                design,
                Q = 0.05,
                MT.adjust = "BH", min.obs = 9)
[1] "fitting  gene 100 out of 725"
[1] "fitting  gene 200 out of 725"
[1] "fitting  gene 300 out of 725"
[1] "fitting  gene 400 out of 725"
[1] "fitting  gene 500 out of 725"
[1] "fitting  gene 600 out of 725"
[1] "fitting  gene 700 out of 725"
[1] "no significant genes"
>

ADD COMMENT • link 15 months ago James W. MacDonald 68k

0

Entering edit mode

Yes, you are right! Great thanks! BTW, according to your working experience on maSigPro, do you think it is suitable for differential analysis of time-series RNA-seq data with only one condition or group design? I'm a beginner with this package so I'm not sure if this is appropriate. Thank you again!

ADD REPLY • link 15 months ago Jiahao Tian • 0

1

Entering edit mode

Yeah, it's fine. What maSigPro does in this case is to reformulate the data. Your design has columns that are not useful, and maSigPro is smart enough to ignore them. If you run it under the debugger, you can see what happens to your design matrix:

Browse[1]> design
$dis
              Time Time2
Control_3H_1     3     9
Control_3H_2     3     9
Control_3H_3     3     9
Control_9H_1     9    81
Control_9H_2     9    81
Control_9H_3     9    81
Control_27H_1   27   729
Control_27H_2   27   729
Control_27H_3   27   729

$groups.vector
[1] "Group" "Group"

$edesign
              Time Replicate Group
Control_3H_1     3         1     1
Control_3H_2     3         1     1
Control_3H_3     3         1     1
Control_9H_1     9         2     1
Control_9H_2     9         2     1
Control_9H_3     9         2     1
Control_27H_1   27         3     1
Control_27H_2   27         3     1
Control_27H_3   27         3     1

And then for each gene it fits a model using the reformulated data, which if you look at design$dis is fitting a linear and a quadratic term (time + time^2) to account for any possible curvature.

 model.glm <- glm(y ~ ., data = dis, family = family, epsilon = epsilon)

And the fitted model looks like this, where you can see that it's estimating an intercept, a time and a time^ component:

 model.glm

Call:  glm(formula = y ~ ., family = family, data = dis, epsilon = epsilon)

Coefficients:
(Intercept)         Time        Time2  
  -0.456452     0.121108    -0.003772  

Degrees of Freedom: 8 Total (i.e. Null);  6 Residual
Null Deviance:      0.5132 
Residual Deviance: 0.1998   AIC: -0.7258

It then fits a reduced model

 model.glm.0

Call:  glm(formula = y ~ 1, family = family, epsilon = epsilon)

Coefficients:
(Intercept)  
    0.08806  

Degrees of Freedom: 8 Total (i.e. Null);  8 Residual
Null Deviance:      0.5132 
Residual Deviance: 0.5132   AIC: 3.762

That doesn't contain any time coefficients, and then uses the anova function to do a Chi-square test comparing the full and reduced models, which in effect asks if the model that includes time fits the data better than a model with just an intercept (in this case a flat line at 0.088).

 anova(model.glm.0, model.glm, test = "Chisq")
Analysis of Deviance Table

Model 1: y ~ 1
Model 2: y ~ Time + Time2
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)   
1         8    0.51318                        
2         6    0.19985  2  0.31333 0.009064 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

And you can see that this gene is significant before adjusting for multiplicity, but will not be significant after the adjustment.

This test is a likelihood ratio test (LRT), and is not that common for a gaussian fit - people normally use t-tests for conventional linear regression, like this:

 summary(model.glm)

Call:
glm(formula = y ~ ., family = family, data = dis, epsilon = epsilon)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)  
(Intercept) -0.456452   0.194693  -2.344   0.0575 .
Time         0.121108   0.039486   3.067   0.0220 *
Time2       -0.003772   0.001244  -3.033   0.0230 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 0.03330824)

    Null deviance: 0.51318  on 8  degrees of freedom
Residual deviance: 0.19985  on 6  degrees of freedom
AIC: -0.72585

Where you can see that the p-values for both the linear and quadratic term are significant. But maSigPro allows you to use count data as well, in which case you would use either the negative binomial or Poisson distributions, and in which case one could argue that the LRT is a better statistic than the Wald. In addition, using an LRT provides one p-value rather than two, so it's easier to say if a gene is significant or not based on a single p-value.

ADD REPLY • link 15 months ago James W. MacDonald 68k

1

Entering edit mode

Technically it's fitting an analysis of deviance (ANODEV) test, but the principle is the same - deviance, like the likelihood, measures goodness of fit for a linear regression. The ratio of the deviance (or likelihood) tells you how different the models fit the data, and under the null is distributed Chi-square, so you can test for a difference in the model fit by comparing the ratio to a Chi-square distribution.

ADD REPLY • link 15 months ago James W. MacDonald 68k