Question: single continuous factor
18 months ago, ashley.lu wrote:

Dear all, 

I have a question about how to use a design matrix in edgeR to perform differential expression analysis on a continuous factor.

Here is the description of my data. I have one continuous factor measuring the level of protein A by immunostaining.  

The question I would like to ask is:

Are any genes differentially expressed (that is, strongly affected by protein A) according to the level of protein A measured by immunostaining?

Here is what the data looks like,

> protein=c(rnorm(n = 10,mean = 10,1),rnorm(n = 10,mean = 5,1.5))
> design1=model.matrix(~protein)
> head(design1)
  (Intercept)   protein
1           1 11.165201
2           1  9.504538
3           1 10.516862
4           1 11.914443
5           1 10.842974
6           1 10.311306
> design2=model.matrix(~0+protein)
> head(design2)
    protein
1 11.165201
2  9.504538
3 10.516862
4 11.914443
5 10.842974
6 10.311306

 

I would like to understand the differences between these two design matrices and their underlying assumptions.

Does design2 assume that when the expression of protein A is 0, the gene expression level is also zero?

Whereas design1 makes the more reasonable assumption that, for each gene, there is an estimated expression level at zero protein A?

In addition, I will run glmQLFTest(fit_design1a, coef=2) to conduct the differential expression analysis, but I am not sure how to interpret the logFC it reports. Since the factor here is continuous, do we still interpret it as a logFC?

For example, for a gene Matal1 that is significant with logFC 0.40409621, do I interpret this as the expression level of Matal1 increasing by 0.4 log-fold change for every unit increase in protein A?

Answer: single continuous factor
18 months ago, Aaron Lun (Cambridge, United Kingdom) wrote:

Does design2 assume that when the expression of protein A is 0, the gene expression level is also zero?

Yes, design2 assumes that when your staining intensity for protein A is zero, the log-average count is also zero, i.e., the expected count is 1.

Whereas design1 makes the more reasonable assumption that, for each gene, there is an estimated expression level at zero protein A?

Yes, this is handled by the intercept, which accommodates some non-zero log-average count at zero staining.
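
To make the difference between the two designs concrete, here is a minimal sketch in base R. It is not edgeR itself: it uses simulated Poisson counts and glm() as a simple stand-in for edgeR's negative binomial fit, and the true intercept and slope (b0, b1) are made-up values chosen only for illustration.

```r
set.seed(1)
protein <- c(rnorm(10, mean = 10, sd = 1), rnorm(10, mean = 5, sd = 1.5))

# Hypothetical truth (assumption): log(mean count) = b0 + b1 * protein
b0 <- 3
b1 <- 0.2
counts <- rpois(length(protein), lambda = exp(b0 + b1 * protein))

fit1 <- glm(counts ~ protein, family = poisson)      # like design1: intercept + slope
fit2 <- glm(counts ~ 0 + protein, family = poisson)  # like design2: slope only

# Fitted mean count at protein = 0 under each model
at_zero <- data.frame(protein = 0)
mu0_fit1 <- predict(fit1, newdata = at_zero, type = "response")
mu0_fit2 <- predict(fit2, newdata = at_zero, type = "response")

coef(fit1)
coef(fit2)
c(design1 = unname(mu0_fit1), design2 = unname(mu0_fit2))
```

Without an intercept, the fitted mean at zero staining is forced to exp(0) = 1, so the slope has to absorb the baseline expression level and no longer estimates the effect of protein A on its own.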

Since the factor here is continuous, do we still interpret it as a logFC?

Yes, it's the log-fold change in expression for every unit of increase in protein A staining.
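
As a worked example of this interpretation: edgeR reports logFC on the log2 scale, so the value quoted in the question (0.40409621) corresponds to roughly a 1.32-fold increase in expression per unit of staining, and the effect compounds multiplicatively over larger differences. The 5-unit difference below is only an illustration, taken from the gap between the two simulated group means (10 and 5) in the question.

```r
logFC_per_unit <- 0.40409621           # value quoted in the question

fc_per_unit <- 2^logFC_per_unit        # fold change for a 1-unit increase in staining
fc_5_units  <- 2^(5 * logFC_per_unit)  # fold change for a 5-unit increase

round(c(per_unit = fc_per_unit, five_units = fc_5_units), 2)
```

Because the model is linear on the log scale, a 5-unit increase multiplies expression by the per-unit fold change raised to the fifth power, not by five times the per-unit fold change.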

Answer: single continuous factor
18 months ago, Gordon Smyth (Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia) wrote:

Ashley,

design1 is the standard design matrix for simple linear regression, so it is the one to use. design2 is the design matrix for "regression through the origin", which (as you have guessed) is not likely to be what you want.
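
For reference, a minimal sketch of how design1 might be used in an edgeR quasi-likelihood workflow. The count matrix here is simulated (200 hypothetical genes with no true protein effect) purely so the example is self-contained; with real data you would supply your own counts, and the object names are placeholders rather than anything from the original question.

```r
library(edgeR)

set.seed(1)
protein <- c(rnorm(10, mean = 10, sd = 1), rnorm(10, mean = 5, sd = 1.5))

# Simulated counts (assumption): 200 hypothetical genes x 20 samples
ngenes <- 200
counts <- matrix(rnbinom(ngenes * length(protein), mu = 50, size = 10),
                 nrow = ngenes,
                 dimnames = list(paste0("gene", seq_len(ngenes)), NULL))

design1 <- model.matrix(~ protein)   # intercept + slope for protein

y <- DGEList(counts = counts)
y <- calcNormFactors(y)
y <- estimateDisp(y, design1)
fit <- glmQLFit(y, design1)
res <- glmQLFTest(fit, coef = 2)     # coef = 2 tests the protein slope
topTags(res)
```

The logFC column in the resulting table is the estimated log2-fold change in expression per unit increase in protein A staining.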
