Search
Question: Interpretation of voom limma's intercept coefficient
0
2.2 years ago by
rubi90
rubi90 wrote:

Hi,

I have bacterial RNA-seq gene expression count data from from 4 different conditions (3 samples each), which differ from each other in the dosage of a certain gene which I manipulated.

Here's an example of the count data for three genes:

mat <- matrix(c(520,315,490,420,640,875,670,700,280,450,440,370,
5,6,1,3,10,6,10,5,2,7,4,1,
103230,112810,65160,96770,70830,108440,118970,98650,65500,117870,118630,78330),nrow=3)
colnames(mat) <- c("C_1","D_1","B_1","A_1","A_2","C_2","B_2","D_2","C_3","B_3","D_3","A_3")
rownames(mat) <- c("gene_1","gene_2","gene_3")

Here are the corresponding dosages (they are scaled such that the maximum is 1=100% and the minimum is 0=0%) of each sample:

dosage <- c(0.65,0,0.35,1,1,0.65,0.35,0,0.65,0.35,0,1)

And hence the design is:
design <- model.matrix(~dosage)

And then I fit voom:

dge <- DGEList(counts=mat)
dge.norm <- calcNormFactors(dge)
voom.norm <- voom(dge.norm,design,plot=TRUE)
voom.fit <- lmFit(voom.norm,design)
voom.fit <- eBayes(voom.fit)

My question is how to interpret the intercept of voom.fit$coefficients: > voom.fit An object of class "MArrayLM"$coefficients
(Intercept)      dosage
gene_1    18.53699 0.200114684
gene_2    18.45030 0.328356157
gene_3    18.13036 0.007383767

I take it that the dosage coefficient represent the log2 change in response (expression) to a unit of dosage, But the intercept values don't really seem like the mean log2 expression values.

Help would be appreciated,

rubi

Here's my sessionInfo:

> sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] data.table_1.9.6 piano_1.12.0     gridExtra_2.2.1  ggplot2_2.1.0    edgeR_3.14.0     limma_3.28.5

loaded via a namespace (and not attached):
[1] igraph_1.0.1        Rcpp_0.12.5         cluster_2.0.4       magrittr_1.5        BiocGenerics_0.18.0 munsell_0.4.3
[7] colorspace_1.2-6    plyr_1.8.4          caTools_1.17.1      tools_3.3.0         parallel_3.3.0      Biobase_2.32.0
[13] gtable_0.2.0        KernSmooth_2.23-15  marray_1.50.0       gtools_3.5.0        digest_0.6.9        sets_1.0-16
[19] bitops_1.0-6        slam_0.1-34         labeling_0.3        gdata_2.17.0        gplots_3.0.1        scales_0.4.0
[25] relations_0.6-6     chron_2.3-47

modified 2.2 years ago by James W. MacDonald48k • written 2.2 years ago by rubi90
3
2.2 years ago by
United States
James W. MacDonald48k wrote:

Since you are fitting dosage as a continuous variable, the intercept is the 'usual' intercept from the formula for a line. It's the (expected, because there are errors) logCPM value for a particular gene when the dosage is zero.

> head(cpm(dge, log = T))
C_1      D_1      B_1      A_1      A_2      C_2      B_2      D_2
gene_1 18.58216 17.72771 18.63134 18.44614 18.66854 17.26862 19.16603 19.15396
gene_2 17.85901 18.33538 18.69453 18.41372 18.93157 19.00557 18.16604 18.34661
gene_3 18.49643 18.78659 17.37262 18.16375 16.34664 18.26861 16.84413 16.34664
C_3      B_3      D_3      A_3
gene_1 18.48584 18.41933 18.68075 18.51419
gene_2 18.61387 17.96913 18.41055 18.52347
gene_3 17.82204 18.58359 17.81973 17.92464