I have a question about how to use a design matrix in edgeR to perform differential expression analysis with a continuous covariate.
Here is a description of my data. I have one continuous variable: the level of protein A, measured by immunostaining.
The question I would like to ask is: are there any genes that are differentially expressed (strongly affected by protein A) according to the level of protein A measured by immunostaining?
Here is what the data and the two design matrices look like:
```
> protein=c(rnorm(n = 10,mean = 10,1),rnorm(n = 10,mean = 5,1.5))
> design1=model.matrix(~protein)
> head(design1)
  (Intercept)   protein
1           1 11.165201
2           1  9.504538
3           1 10.516862
4           1 11.914443
5           1 10.842974
6           1 10.311306
> design2=model.matrix(~0+protein)
> head(design2)
    protein
1 11.165201
2  9.504538
3 10.516862
4 11.914443
5 10.842974
6 10.311306
```
I would like to understand the differences between these two design matrices and the assumptions underlying each.
Does design2 assume that when the level of protein A is 0, the gene expression level is also zero?
And does design1 make the more correct assumption that, for each gene, there is a separately estimated expression level at zero protein A?
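To make the comparison concrete, here is a sketch of the two models, assuming edgeR's standard negative binomial GLM with a log link and an offset equal to the log effective library size. The symbols are introduced here for illustration only: g indexes genes, i indexes samples, p_i is the protein A level of sample i, and N_i is the effective library size.

```latex
% edgeR-style GLM (log link, offset = log effective library size)
\log \mu_{gi} = \mathbf{x}_i^{\top} \boldsymbol{\beta}_g + \log N_i

% design1 = model.matrix(~protein): per-gene intercept and slope
\log \mu_{gi} = \beta_{g0} + \beta_{g1}\, p_i + \log N_i

% design2 = model.matrix(~0 + protein): per-gene slope only, no intercept
\log \mu_{gi} = \beta_{g1}\, p_i + \log N_i
\quad\Longrightarrow\quad
\log\!\left(\mu_{gi} / N_i\right) = 0 \quad \text{when } p_i = 0
```

Under this model, design2 does not pin expression to zero at zero protein A; it pins every gene's log expected proportion (count relative to library size) to the same fixed value of 0 there, whereas design1 gives each gene its own baseline \(\beta_{g0}\).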
In addition, I will run glmQLFTest(fit_design1a, coef=2) to conduct the differential expression analysis, but I am not sure how to interpret the logFC it calculates. Since the covariate here is continuous, do we still interpret it as a log fold change?
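A minimal sketch of that quasi-likelihood workflow, assuming edgeR (Bioconductor) is installed. The count matrix, gene names and effect size are simulated and hypothetical; only `design1`, `fit_design1a` and the `coef = 2` call come from the question.

```r
# Sketch only: simulated, hypothetical counts; assumes edgeR is installed.
library(edgeR)

set.seed(42)
n_samples <- 20
n_genes <- 1000
protein <- c(rnorm(10, mean = 10, sd = 1), rnorm(10, mean = 5, sd = 1.5))

# Hypothetical truth: gene1 rises by 0.4 log2 units per unit of protein A,
# all other genes are flat.
log2_slope <- c(0.4, rep(0, n_genes - 1))
mu <- 50 * 2^(outer(log2_slope, protein - mean(protein)))  # genes x samples

# Negative binomial counts; matrix() fills column-major, matching mu's layout
counts <- matrix(rnbinom(n_genes * n_samples, mu = mu, size = 10),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("sample", seq_len(n_samples))))

design1 <- model.matrix(~protein)       # columns: (Intercept), protein

y <- DGEList(counts = counts)
y <- calcNormFactors(y)                 # TMM normalization factors
y <- estimateDisp(y, design1)           # NB dispersions for this design
fit_design1a <- glmQLFit(y, design1)    # quasi-likelihood GLM fit
qlf <- glmQLFTest(fit_design1a, coef = 2)  # tests the "protein" slope = 0

res <- topTags(qlf, n = 5)
print(res)
```

Because column 2 of design1 is `protein`, `coef = 2` tests whether each gene's slope on protein A is zero, and the reported logFC is that fitted slope expressed on the log2 scale.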
For example, for a gene Matal1 that is significant with logFC 0.40409621, do I interpret this as the expression level of this gene increasing by 0.4 log fold change for every one-unit increase in protein A?
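A small base-R sketch of what a per-unit logFC means, assuming a log-link count model like the one edgeR fits. It simulates one hypothetical gene whose log2 expected count rises by 0.4 per unit of protein A, recovers that slope with `glm()` (a Poisson stand-in for edgeR's negative binomial fit, not edgeR itself), and converts the logFC value quoted above into fold changes. All simulated numbers are made up for illustration.

```r
# Hypothetical single-gene illustration in base R (no edgeR needed).
set.seed(1)
protein <- c(rnorm(10, mean = 10, sd = 1), rnorm(10, mean = 5, sd = 1.5))

true_log2_slope <- 0.4            # assumed effect: log2 units per unit of protein
b0 <- log(100)                    # assumed log expected count at protein = 0
b1 <- true_log2_slope * log(2)    # the same slope on the natural-log scale
counts <- rpois(length(protein), lambda = exp(b0 + b1 * protein))

# Poisson GLM with a log link; its slope is on the natural-log scale,
# so dividing by log(2) puts it on the log2 scale used for logFC
fit <- glm(counts ~ protein, family = poisson())
est_log2_slope <- unname(coef(fit)["protein"]) / log(2)

# Converting the per-unit logFC value quoted in the question
reported_logFC <- 0.40409621
fold_per_unit <- 2^reported_logFC            # change per 1-unit increase
fold_per_5_units <- 2^(5 * reported_logFC)   # e.g. protein 5 -> protein 10

print(c(est_log2_slope = est_log2_slope,
        fold_per_unit = fold_per_unit,
        fold_per_5_units = fold_per_5_units))
```

Under this model the logFC acts as a slope: each one-unit increase in protein A multiplies the expected count by 2^logFC (about 1.32 for 0.404), and a difference of k units multiplies it by 2^(k × logFC).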