limma: coefficient not estimable when dataset reduced by a few samples
2
0
Entering edit mode
Natasha ▴ 440
@natasha-4640
Last seen 10.3 years ago
Dear List, I have array data for 42 tumorBT - tumorAT patients. Initial analysis (adding patient and batch covariates to the model) I managed to obtain a list of DE genes. However now, I need to reduce this set of 42 to 33 patients pairs and reanalyse the data. This time though it seems that it can't estimate the coefficient of one patient, would this not affect my results as NA coefficients are produced? What I do not understand is what has changed for this to happen now? How do I rectify it? Code: dim(tum.dat.red) 27555 66 ## Overall Sample Descriptions >batch = as.factor(t.info.red$Batch) batch [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 [39] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 Levels: 1 2 >treat = droplevels(t.info.red$Status_Treatment) >treat = as.factor(treat, levels=c("Tumour_BT", "Tumour_AT")) treat [1] Tumour_AT Tumour_BT Tumour_BT Tumour_AT Tumour_AT Tumour_BT Tumour_BT [8] Tumour_AT Tumour_BT Tumour_AT Tumour_AT Tumour_BT Tumour_BT Tumour_AT [15] Tumour_BT Tumour_AT Tumour_AT Tumour_BT Tumour_BT Tumour_AT Tumour_BT [22] Tumour_AT Tumour_AT Tumour_BT Tumour_BT Tumour_AT Tumour_BT Tumour_AT [29] Tumour_BT Tumour_AT Tumour_AT Tumour_BT Tumour_BT Tumour_AT Tumour_AT [36] Tumour_BT Tumour_BT Tumour_BT Tumour_BT Tumour_BT Tumour_BT Tumour_BT [43] Tumour_BT Tumour_BT Tumour_AT Tumour_AT Tumour_AT Tumour_AT Tumour_AT [50] Tumour_AT Tumour_AT Tumour_AT Tumour_BT Tumour_BT Tumour_BT Tumour_BT [57] Tumour_BT Tumour_BT Tumour_BT Tumour_AT Tumour_AT Tumour_AT Tumour_AT [64] Tumour_AT Tumour_AT Tumour_AT Levels: Tumour_BT Tumour_AT >patient = as.factor(t.info.red$Trial.number) patient [1] 13 25 13 25 18 28 18 28 30 39 30 53 39 53 32 42 32 51 42 51 34 45 34 55 45 [26] 55 54 54 37 57 37 57 35 52 35 52 17 7 12 10 14 11 16 15 14 16 15 17 7 12 [51] 10 11 20 26 36 40 48 23 24 23 20 24 26 36 40 48 33 Levels: 7 10 11 12 13 14 15 16 17 18 20 23 24 25 26 28 30 32 34 35 36 ... 57 ## Deisgning model with BATCH effect >design.tum = model.matrix(~treat+batch+patient) ## Fit model & make contrasts > fit.tum <- lmFit(tum.dat.red, design.tum) Coefficients not estimable: patient57 Warning message: Partial NA coefficients for 27555 probe(s) sessionInfo() R version 2.15.0 (2012-03-30) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] limma_3.12.0 loaded via a namespace (and not attached): [1] tools_2.15.0 Many Thanks in advance, Natasha
• 1.9k views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 2 days ago
United States
Hi Natasha, On 8/3/2012 8:11 AM, Natasha Sahgal wrote: > Dear List, > > I have array data for 42 tumorBT - tumorAT patients. Initial analysis > (adding patient and batch covariates to the model) I managed to obtain a > list of DE genes. > > However now, I need to reduce this set of 42 to 33 patients pairs and > reanalyse the data. This time though it seems that it can't estimate the > coefficient of one patient, would this not affect my results as NA > coefficients are produced? > > What I do not understand is what has changed for this to happen now? How > do I rectify it? You don't give any information about the design matrix with all 42 patients, but I wonder how you were able to fit a paired analysis with a batch effect in that case either. When you fit a paired analysis by estimating a per-patient block effect, you by definition are already estimating the maximum number of coefficients that you can with the number of samples in hand. As an example: > treat <- factor(rep(1:2, 42)) > batch <- factor(rep(1:2, each = 42)) > patient <- factor(rep(1:42, each= 2)) > is.fullrank(model.matrix(~treat+patient)) [1] TRUE > is.fullrank(model.matrix(~treat+batch+patient)) [1] FALSE So I don't see how you will be able to fit a paired analysis by blocking on patient if you have a batch effect as well, regardless of the number of patients. However, you can always do a conventional paired analysis, by first calculating the paired differences for each patient and then fitting the model. Note that in this case the coefficient you are estimating is the intra-pair difference, and you are testing that this difference is different from zero (or in other words, the contrast is implicit in the coefficient, so you don't need contrasts.fit()). Best, Jim > > Code: > dim(tum.dat.red) > 27555 66 > > ## Overall Sample Descriptions >> batch = as.factor(t.info.red$Batch) > batch > [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 > 1 2 2 > [39] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 > Levels: 1 2 > > >> treat = droplevels(t.info.red$Status_Treatment) >> treat = as.factor(treat, levels=c("Tumour_BT", "Tumour_AT")) > treat > [1] Tumour_AT Tumour_BT Tumour_BT Tumour_AT Tumour_AT Tumour_BT Tumour_BT > [8] Tumour_AT Tumour_BT Tumour_AT Tumour_AT Tumour_BT Tumour_BT Tumour_AT > [15] Tumour_BT Tumour_AT Tumour_AT Tumour_BT Tumour_BT Tumour_AT Tumour_BT > [22] Tumour_AT Tumour_AT Tumour_BT Tumour_BT Tumour_AT Tumour_BT Tumour_AT > [29] Tumour_BT Tumour_AT Tumour_AT Tumour_BT Tumour_BT Tumour_AT Tumour_AT > [36] Tumour_BT Tumour_BT Tumour_BT Tumour_BT Tumour_BT Tumour_BT Tumour_BT > [43] Tumour_BT Tumour_BT Tumour_AT Tumour_AT Tumour_AT Tumour_AT Tumour_AT > [50] Tumour_AT Tumour_AT Tumour_AT Tumour_BT Tumour_BT Tumour_BT Tumour_BT > [57] Tumour_BT Tumour_BT Tumour_BT Tumour_AT Tumour_AT Tumour_AT Tumour_AT > [64] Tumour_AT Tumour_AT Tumour_AT > Levels: Tumour_BT Tumour_AT > > >> patient = as.factor(t.info.red$Trial.number) > patient > [1] 13 25 13 25 18 28 18 28 30 39 30 53 39 53 32 42 32 51 42 51 34 45 34 > 55 45 > [26] 55 54 54 37 57 37 57 35 52 35 52 17 7 12 10 14 11 16 15 14 16 15 17 > 7 12 > [51] 10 11 20 26 36 40 48 23 24 23 20 24 26 36 40 48 > 33 Levels: 7 10 11 12 13 14 15 16 17 18 20 23 24 25 26 28 30 32 34 35 36 > ... 57 > > > ## Deisgning model with BATCH effect >> design.tum = model.matrix(~treat+batch+patient) > ## Fit model& make contrasts >> fit.tum<- lmFit(tum.dat.red, design.tum) > Coefficients not estimable: patient57 > Warning message: > Partial NA coefficients for 27555 probe(s) > > > sessionInfo() > R version 2.15.0 (2012-03-30) > Platform: x86_64-pc-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 > [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 > [7] LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] limma_3.12.0 > > loaded via a namespace (and not attached): > [1] tools_2.15.0 > > > > Many Thanks in advance, > Natasha > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 48 minutes ago
WEHI, Melbourne, Australia
Dear Natasha, The results limma has given you are completely correct. As James MacDonald has explained, the reason for the warning is that you have put too many variables into the model. There is no need to correct for the batch effect, because you are already correcting for baseline differences between the patients, and this includes the batch effect because the patients are different in the two batches. You will have got exactly the same warning even with 42 patients, unless one or more of the patient appeared in both batches. When you put the unnecessary batch term into model, limma automatically adjusts the model for you, removing the unnecessary terms, and gives you exactly the same paired t-test that you would have got had you not added the batch effect. The warning is just to let you know that an adjustment was made. In other words, limma gives you the correct test whether or not you put in extra unnecessary between-patient terms. The fact that NA coefficients are produced has no effect at all on your tests for the treatment effect. Best wishes Gordon ----------- original message ------------- [BioC] limma: coefficient not estimable when dataset reduced by a few samples Natasha Sahgal nsahgal at well.ox.ac.uk Fri Aug 3 14:11:31 CEST 2012 Dear List, I have array data for 42 tumorBT - tumorAT patients. Initial analysis (adding patient and batch covariates to the model) I managed to obtain a list of DE genes. However now, I need to reduce this set of 42 to 33 patients pairs and reanalyse the data. This time though it seems that it can't estimate the coefficient of one patient, would this not affect my results as NA coefficients are produced? What I do not understand is what has changed for this to happen now? How do I rectify it? Code: dim(tum.dat.red) 27555 66 ## Overall Sample Descriptions >batch = as.factor(t.info.red$Batch) batch [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 [39] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 Levels: 1 2 >treat = droplevels(t.info.red$Status_Treatment) >treat = as.factor(treat, levels=c("Tumour_BT", "Tumour_AT")) treat [1] Tumour_AT Tumour_BT Tumour_BT Tumour_AT Tumour_AT Tumour_BT Tumour_BT [8] Tumour_AT Tumour_BT Tumour_AT Tumour_AT Tumour_BT Tumour_BT Tumour_AT [15] Tumour_BT Tumour_AT Tumour_AT Tumour_BT Tumour_BT Tumour_AT Tumour_BT [22] Tumour_AT Tumour_AT Tumour_BT Tumour_BT Tumour_AT Tumour_BT Tumour_AT [29] Tumour_BT Tumour_AT Tumour_AT Tumour_BT Tumour_BT Tumour_AT Tumour_AT [36] Tumour_BT Tumour_BT Tumour_BT Tumour_BT Tumour_BT Tumour_BT Tumour_BT [43] Tumour_BT Tumour_BT Tumour_AT Tumour_AT Tumour_AT Tumour_AT Tumour_AT [50] Tumour_AT Tumour_AT Tumour_AT Tumour_BT Tumour_BT Tumour_BT Tumour_BT [57] Tumour_BT Tumour_BT Tumour_BT Tumour_AT Tumour_AT Tumour_AT Tumour_AT [64] Tumour_AT Tumour_AT Tumour_AT Levels: Tumour_BT Tumour_AT >patient = as.factor(t.info.red$Trial.number) patient [1] 13 25 13 25 18 28 18 28 30 39 30 53 39 53 32 42 32 51 42 51 34 45 34 55 45 [26] 55 54 54 37 57 37 57 35 52 35 52 17 7 12 10 14 11 16 15 14 16 15 17 7 12 [51] 10 11 20 26 36 40 48 23 24 23 20 24 26 36 40 48 33 Levels: 7 10 11 12 13 14 15 16 17 18 20 23 24 25 26 28 30 32 34 35 36 ... 57 ## Deisgning model with BATCH effect >design.tum = model.matrix(~treat+batch+patient) ## Fit model & make contrasts > fit.tum <- lmFit(tum.dat.red, design.tum) Coefficients not estimable: patient57 Warning message: Partial NA coefficients for 27555 probe(s) sessionInfo() R version 2.15.0 (2012-03-30) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] limma_3.12.0 loaded via a namespace (and not attached): [1] tools_2.15.0 Many Thanks in advance, Natasha ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
ADD COMMENT
0
Entering edit mode
Dear Prof. Gordon, Thank you for your response. I think I know now based on what you said below what went wrong. In the first instance where I have 42 patient pairs, there are a couple of tumour subtypes which are all taken into consideration. Here, as you mentioned I have more than one patient appearing in both batches. Hence I think ~treat+patient+batch worked and did not throw any errors. >head(tumour) Trial.number Tumour.type Status_Treatment Batch 1 50 SCC Tumour_BT 1 2 13 adeno Tumour_AT 1 5 50 SCC Tumour_AT 1 6 25 adeno Tumour_BT 1 9 13 adeno Tumour_BT 1 10 25 adeno Tumour_AT 1 > dim(tumour) [1] 84 4 >table(tumour$Trial.number,tumour$Batch) 1 2 7 0 2 10 0 2 11 0 2 12 0 2 13 2 0 14 0 2 15 0 2 16 0 2 17 0 2 18 2 0 20 0 2 21 1 1 23 0 2 24 0 2 25 2 0 26 0 2 27 1 1 28 2 0 29 1 1 30 2 0 31 1 1 32 2 0 34 2 0 35 2 0 36 0 2 37 2 0 39 2 0 40 0 2 41 1 1 42 2 0 44 1 1 45 2 0 46 0 2 48 0 2 50 2 0 51 2 0 52 2 0 53 2 0 54 2 0 55 2 0 56 2 0 57 2 0 Now however, when I reduce to 33 patient pairs, I look at only one tumor subtype where none of the patients belong to different batches. Thus I see now that in this case batch is an unnecessary term in the model. Thus all I need in this case is: ~treat+patient >head(tumour2) Trial.number Tumour.type Status_Treatment Batch 2 13 adeno Tumour_AT 1 6 25 adeno Tumour_BT 1 9 13 adeno Tumour_BT 1 10 25 adeno Tumour_AT 1 14 18 adeno Tumour_AT 1 18 28 adeno Tumour_BT 1 > dim(tumour2) [1] 66 4 > table(tumour2$Trial.number,tumour2$Batch) 1 2 7 0 2 10 0 2 11 0 2 12 0 2 13 2 0 14 0 2 15 0 2 16 0 2 17 0 2 18 2 0 20 0 2 23 0 2 24 0 2 25 2 0 26 0 2 28 2 0 30 2 0 32 2 0 34 2 0 35 2 0 36 0 2 37 2 0 39 2 0 40 0 2 42 2 0 45 2 0 48 0 2 51 2 0 52 2 0 53 2 0 54 2 0 55 2 0 57 2 0 Also thank you for explaining what the warning actually means and that limma output has no effect on it. Best Wishes, Natasha -----Original Message----- From: Gordon K Smyth [mailto:smyth@wehi.EDU.AU] Sent: 05 August 2012 03:11 To: Natasha Sahgal Cc: Bioconductor mailing list Subject: limma: coefficient not estimable when dataset reduced by a few samples Dear Natasha, The results limma has given you are completely correct. As James MacDonald has explained, the reason for the warning is that you have put too many variables into the model. There is no need to correct for the batch effect, because you are already correcting for baseline differences between the patients, and this includes the batch effect because the patients are different in the two batches. You will have got exactly the same warning even with 42 patients, unless one or more of the patient appeared in both batches. When you put the unnecessary batch term into model, limma automatically adjusts the model for you, removing the unnecessary terms, and gives you exactly the same paired t-test that you would have got had you not added the batch effect. The warning is just to let you know that an adjustment was made. In other words, limma gives you the correct test whether or not you put in extra unnecessary between-patient terms. The fact that NA coefficients are produced has no effect at all on your tests for the treatment effect. Best wishes Gordon ----------- original message ------------- [BioC] limma: coefficient not estimable when dataset reduced by a few samples Natasha Sahgal nsahgal at well.ox.ac.uk Fri Aug 3 14:11:31 CEST 2012 Dear List, I have array data for 42 tumorBT - tumorAT patients. Initial analysis (adding patient and batch covariates to the model) I managed to obtain a list of DE genes. However now, I need to reduce this set of 42 to 33 patients pairs and reanalyse the data. This time though it seems that it can't estimate the coefficient of one patient, would this not affect my results as NA coefficients are produced? What I do not understand is what has changed for this to happen now? How do I rectify it? Code: dim(tum.dat.red) 27555 66 ## Overall Sample Descriptions >batch = as.factor(t.info.red$Batch) batch [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 [39] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 Levels: 1 2 >treat = droplevels(t.info.red$Status_Treatment) >treat = as.factor(treat, levels=c("Tumour_BT", "Tumour_AT")) treat [1] Tumour_AT Tumour_BT Tumour_BT Tumour_AT Tumour_AT Tumour_BT Tumour_BT [8] Tumour_AT Tumour_BT Tumour_AT Tumour_AT Tumour_BT Tumour_BT Tumour_AT [15] Tumour_BT Tumour_AT Tumour_AT Tumour_BT Tumour_BT Tumour_AT Tumour_BT [22] Tumour_AT Tumour_AT Tumour_BT Tumour_BT Tumour_AT Tumour_BT Tumour_AT [29] Tumour_BT Tumour_AT Tumour_AT Tumour_BT Tumour_BT Tumour_AT Tumour_AT [36] Tumour_BT Tumour_BT Tumour_BT Tumour_BT Tumour_BT Tumour_BT Tumour_BT [43] Tumour_BT Tumour_BT Tumour_AT Tumour_AT Tumour_AT Tumour_AT Tumour_AT [50] Tumour_AT Tumour_AT Tumour_AT Tumour_BT Tumour_BT Tumour_BT Tumour_BT [57] Tumour_BT Tumour_BT Tumour_BT Tumour_AT Tumour_AT Tumour_AT Tumour_AT [64] Tumour_AT Tumour_AT Tumour_AT Levels: Tumour_BT Tumour_AT >patient = as.factor(t.info.red$Trial.number) patient [1] 13 25 13 25 18 28 18 28 30 39 30 53 39 53 32 42 32 51 42 51 34 45 34 55 45 [26] 55 54 54 37 57 37 57 35 52 35 52 17 7 12 10 14 11 16 15 14 16 15 17 7 12 [51] 10 11 20 26 36 40 48 23 24 23 20 24 26 36 40 48 33 Levels: 7 10 11 12 13 14 15 16 17 18 20 23 24 25 26 28 30 32 34 35 36 ... 57 ## Deisgning model with BATCH effect >design.tum = model.matrix(~treat+batch+patient) ## Fit model & make contrasts > fit.tum <- lmFit(tum.dat.red, design.tum) Coefficients not estimable: patient57 Warning message: Partial NA coefficients for 27555 probe(s) sessionInfo() R version 2.15.0 (2012-03-30) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] limma_3.12.0 loaded via a namespace (and not attached): [1] tools_2.15.0 Many Thanks in advance, Natasha ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:6}}
ADD REPLY

Login before adding your answer.

Traffic: 276 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6