The coefficients not estimable
1
0
Entering edit mode
KJ Lim ▴ 420
@kj-lim-5288
Last seen 3.7 years ago
Finland

Good day.

I have two cultivar tree samples at different stages collected from area N and Z. The trees at area Z is 12-year-old while the trees at area Z is 6-year-old. Below is the description of my samples.

files   cv  stage   area    age description
./M1_St5.count.txt  M   5st N   12  stage5
./M2_St5.count.txt  M   5st N   12  stage5
./M3_St5.count.txt  M   5st N   12  stage5
./P1_St5.count.txt  P   5st Z   6   stage5
./P4_St5.count.txt  P   5st Z   6   stage5
./P5_St5.count.txt  P   5st Z   6   stage5
./M1_St6.count.txt  M   6st N   12  stage6
./M2_St6.count.txt  M   6st N   12  stage6
./M3_St6.count.txt  M   6st N   12  stage6
./P1_St6.count.txt  P   6st Z   6   stage6
./P4_St6.count.txt  P   6st Z   6   stage6
./P5_St6.count.txt  P   6st Z   6   stage6

I defined the factors like:

> stage <- factor(nt$sample$stage)
> area <- factor(nt$sample$area)
> age <- factor(nt$sample$age, level=c("12","6"))
> cv <- factor(nt$sample$cv)

I want to learn genes that response to area effect together with stage factor at FDR < 0.05; hence, I described the experiment setup like below and continued with common, trended, tagwise dispersions estimation. I can proceed with my test with no error message reported.

> nt.model2 <- model.matrix(~area+stage)
> nt.model2
   (Intercept) areaZ stageSt6
1            1     0        0
2            1     0        0
3            1     0        0
4            1     1        0
5            1     1        0
6            1     1        0
7            1     0        1
8            1     0        1
9            1     0        1
10           1     1        1
11           1     1        1
12           1     1        1
attr(,"assign")
[1] 0 1 2
attr(,"contrasts")
attr(,"contrasts")$area
[1] "contr.treatment"

> nt.m2 <- estimateGLMCommonDisp(nt, nt.model2, verbose=T)
Disp = 0.165 , BCV = 0.407
> nt.m2 <- estimateGLMTrendedDisp(nt.m2, nt.model2)
> nt.m2 <- estimateGLMTagwiseDisp(nt.m2, nt.model2)
> nt.m2.areaStage <- glmLRT(nt.m2.fit)
> summary(decideTestsDGE(nt.m2.areaStage, adjust.method="BH", p.value=0.05))

However, when I want to learn genes that response to area effect together with age factor with the experiment setup below, I received an error message when I continued with dispersion estimation.

> nt.model3 <- model.matrix(~area+age)
> nt.model3
   (Intercept) areaZ age6
1            1     0    0
2            1     0    0
3            1     0    0
4            1     1    1
5            1     1    1
6            1     1    1
7            1     0    0
8            1     0    0
9            1     0    0
10           1     1    1
11           1     1    1
12           1     1    1
attr(,"assign")
[1] 0 1 2
attr(,"contrasts")
attr(,"contrasts")$area
[1] "contr.treatment"
attr(,"contrasts")$age
[1] "contr.treatment"

> nt.m3 <- estimateGLMCommonDisp(nt, nt.model3, verbose=T)
Error in glmFit.default(y, design = design, dispersion = dispersion, offset = offset,  : 
  Design matrix not of full rank.  The following coefficients not estimable:
 age6

Could the community please help in this case?

Thanks for your time and experience sharing.

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] edgeR_3.20.9 limma_3.34.9
edger • 862 views
ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 9 hours ago
WEHI, Melbourne, Australia

Look at the nt.model3 matrix -- you can see that the second and third columns are identical.

Your experiment has just four distinct groups with three replicates in each group. You should define just one factor to distinguish the four groups and use contrasts to make comparisons between the four groups, i.e., between the levels of the factor. Fitting additive models with multiple factors is a bit pointless.

ADD COMMENT
0
Entering edit mode

Dear Prof. Gordon, thanks for the feedback. Can you please share with me the right way to test for the genes that respond to area effect together with the age factor? Thank you for your time.

ADD REPLY
0
Entering edit mode

area and age are completely confounded with one another in your experiment. It is obviously impossible to separate the effects of the two factors.

ADD REPLY
0
Entering edit mode

I see. That's pitty then. Thanks for your prompt reply.

ADD REPLY
0
Entering edit mode

I have edited my answer. Just treat your experiment as having four groups.

ADD REPLY
0
Entering edit mode

All right, good to know that. Thanks for your suggestion and your time 🙂

ADD REPLY

Login before adding your answer.

Traffic: 339 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6