Question

The coefficients not estimable

0

Entering edit mode

KJ Lim ▴ 420

@kj-lim-5288

Last seen 5.4 years ago

Finland

Good day.

I have two cultivar tree samples at different stages collected from area N and Z. The trees at area Z is 12-year-old while the trees at area Z is 6-year-old. Below is the description of my samples.

files   cv  stage   area    age description
./M1_St5.count.txt  M   5st N   12  stage5
./M2_St5.count.txt  M   5st N   12  stage5
./M3_St5.count.txt  M   5st N   12  stage5
./P1_St5.count.txt  P   5st Z   6   stage5
./P4_St5.count.txt  P   5st Z   6   stage5
./P5_St5.count.txt  P   5st Z   6   stage5
./M1_St6.count.txt  M   6st N   12  stage6
./M2_St6.count.txt  M   6st N   12  stage6
./M3_St6.count.txt  M   6st N   12  stage6
./P1_St6.count.txt  P   6st Z   6   stage6
./P4_St6.count.txt  P   6st Z   6   stage6
./P5_St6.count.txt  P   6st Z   6   stage6

I defined the factors like:

> stage <- factor(nt$sample$stage)
> area <- factor(nt$sample$area)
> age <- factor(nt$sample$age, level=c("12","6"))
> cv <- factor(nt$sample$cv)

I want to learn genes that response to area effect together with stage factor at FDR < 0.05; hence, I described the experiment setup like below and continued with common, trended, tagwise dispersions estimation. I can proceed with my test with no error message reported.

> nt.model2 <- model.matrix(~area+stage)
> nt.model2
   (Intercept) areaZ stageSt6
1            1     0        0
2            1     0        0
3            1     0        0
4            1     1        0
5            1     1        0
6            1     1        0
7            1     0        1
8            1     0        1
9            1     0        1
10           1     1        1
11           1     1        1
12           1     1        1
attr(,"assign")
[1] 0 1 2
attr(,"contrasts")
attr(,"contrasts")$area
[1] "contr.treatment"

> nt.m2 <- estimateGLMCommonDisp(nt, nt.model2, verbose=T)
Disp = 0.165 , BCV = 0.407
> nt.m2 <- estimateGLMTrendedDisp(nt.m2, nt.model2)
> nt.m2 <- estimateGLMTagwiseDisp(nt.m2, nt.model2)
> nt.m2.areaStage <- glmLRT(nt.m2.fit)
> summary(decideTestsDGE(nt.m2.areaStage, adjust.method="BH", p.value=0.05))

However, when I want to learn genes that response to area effect together with age factor with the experiment setup below, I received an error message when I continued with dispersion estimation.

> nt.model3 <- model.matrix(~area+age)
> nt.model3
   (Intercept) areaZ age6
1            1     0    0
2            1     0    0
3            1     0    0
4            1     1    1
5            1     1    1
6            1     1    1
7            1     0    0
8            1     0    0
9            1     0    0
10           1     1    1
11           1     1    1
12           1     1    1
attr(,"assign")
[1] 0 1 2
attr(,"contrasts")
attr(,"contrasts")$area
[1] "contr.treatment"
attr(,"contrasts")$age
[1] "contr.treatment"

> nt.m3 <- estimateGLMCommonDisp(nt, nt.model3, verbose=T)
Error in glmFit.default(y, design = design, dispersion = dispersion, offset = offset,  : 
  Design matrix not of full rank.  The following coefficients not estimable:
 age6

Could the community please help in this case?

Thanks for your time and experience sharing.

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] edgeR_3.20.9 limma_3.34.9

edger • 1.3k views

ADD COMMENT • link updated 5.4 years ago by Gordon Smyth 53k • written 5.4 years ago by KJ Lim ▴ 420

score 0 · Answer 1 · 2020-09-01

0

Entering edit mode

Gordon Smyth 53k

@gordon-smyth

Last seen 2 hours ago

WEHI, Melbourne, Australia

Look at the nt.model3 matrix -- you can see that the second and third columns are identical.

Your experiment has just four distinct groups with three replicates in each group. You should define just one factor to distinguish the four groups and use contrasts to make comparisons between the four groups, i.e., between the levels of the factor. Fitting additive models with multiple factors is a bit pointless.

ADD COMMENT • link 5.4 years ago Gordon Smyth 53k

0

Entering edit mode

Dear Prof. Gordon, thanks for the feedback. Can you please share with me the right way to test for the genes that respond to area effect together with the age factor? Thank you for your time.

ADD REPLY • link 5.4 years ago KJ Lim ▴ 420

0

Entering edit mode

area and age are completely confounded with one another in your experiment. It is obviously impossible to separate the effects of the two factors.

ADD REPLY • link 5.4 years ago Gordon Smyth 53k

0

Entering edit mode

I see. That's pitty then. Thanks for your prompt reply.

ADD REPLY • link 5.4 years ago KJ Lim ▴ 420

0

Entering edit mode

I have edited my answer. Just treat your experiment as having four groups.

ADD REPLY • link 5.4 years ago Gordon Smyth 53k

0

Entering edit mode

All right, good to know that. Thanks for your suggestion and your time 🙂

ADD REPLY • link 5.4 years ago KJ Lim ▴ 420