Question: Correct creation of design matrix in limma regarding a multi-level experiment
gravatar for svlachavas
4.3 years ago by
Greece/Athens/National Hellenic Research Foundation
svlachavas740 wrote:

Dear Bioconductor community,

i will just refer to one of my previous posts as it is the basis of my new questions (Merge different datasets and perform differential expression analysis in limma) -in summary, regarding the merged dataset i have created, combining two colorectal cancer microarray affymetrix datasets[refering to the comparison of the patients cancer samples vs adjucent controls], i recently acquired important phenotypic information, regarding the anatomic location of the tumor of each patient. In detail,

                                   Disease      Location                  Meta_factor       Study
St_1_WL57.CEL           Normal       sigmoid_colon           0                      hgu133plus2
St_2_WL57.CEL           Cancer       sigmoid_colon           0                      hgu133plus2
St_N_EC59.CEL           Normal       sigmoid_colon           0                      hgu133plus2
St_T_EC59.CEL           Cancer       sigmoid_colon           0                      hgu133plus2
St_N_EJ58.CEL           Normal               cecum           0                         hgu133plus2
St_T_EJ58.CEL           Cancer               cecum           0                         hgu133plus2
St_N_FD58.CEL           Normal              rectum           0                         hgu133plus2
St_T_FD58.CEL           Cancer              rectum           0                         hgu133plus2
St_N_FG39.CEL           Normal               cecum           1                        hgu133plus2
St_T_FG39.CEL           Cancer               cecum           1                        hgu133plus2
St_1_GG52.CEL           Normal               cecum           1                        hgu133plus2
St_2_GG52.CEL           Cancer               cecum           1                        hgu133plus2
St_N_HJ33.CEL           Normal    descending_colon           0                   hgu133plus2
St_T_HJ33.CEL           Cancer    descending_colon           0                   hgu133plus2
St_N_HK57.CEL           Normal     ascending_colon           0                   hgu133plus2
St_T_HK57.CEL           Cancer     ascending_colon           0                   hgu133plus2
St_N_KA49.CEL           Normal    descending_colon           0                  hgu133plus2
St_T_KA49.CEL           Cancer    descending_colon           0                  hgu133plus2
St_N_KW40.CEL           Normal       sigmoid_colon           0                   hgu133plus2
St_T_KW40.CEL           Cancer       sigmoid_colon           0                   hgu133plus2
St_1_LH53.CEL           Normal    descending_colon           0                  hgu133plus2
St_2_LH53.CEL           Cancer    descending_colon           0                  hgu133plus2
St_N_LW46.CEL           Normal               cecum           0                      hgu133plus2
St_T_LW46.CEL           Cancer               cecum           0                      hgu133plus2
St_N_NH45.CEL           Normal  rectosigmoid_colon           0                 hgu133plus2
St_T_NH45.CEL           Cancer  rectosigmoid_colon           0                 hgu133plus2
St_N_PH42.CEL           Normal     ascending_colon           0                 hgu133plus2
St_T_PH42.CEL           Cancer     ascending_colon           0                 hgu133plus2
St_1_SSch60.CEL         Normal              rectum           0                     hgu133plus2
St_2_SSch60.CEL         Cancer              rectum           0                     hgu133plus2
St_N_WH37.CEL           Normal     ascending_colon           0                hgu133plus2
St_T_WH37.CEL           Cancer     ascending_colon           0                hgu133plus2
St_N_WM60.CEL           Normal right_colic_flexure           0                 hgu133plus2
St_T_WM60.CEL           Cancer right_colic_flexure           0                 hgu133plus2
7_Tesch.CEL             Normal               cecum           0                       hgu133a
8_Tesch.CEL             Cancer               cecum           0                       hgu133a
0687_04_Blauth.CEL      Normal  rectosigmoid_colon           0              hgu133a
0687_04_Blauth_1.CEL    Cancer  rectosigmoid_colon           0            hgu133a
0863_03_Schmidt.CEL     Normal     ascending_colon           0             hgu133a
0863_03_Schmidt_1.CEL   Cancer     ascending_colon           0           hgu133a
0948_04_Leiber.CEL      Normal     ascending_colon           0               hgu133a
0948_04_Leiber_1.CEL    Cancer     ascending_colon           0             hgu133a
1043_04_Nagel.CEL       Normal right_colic_flexure           0                hgu133a
1043_04_Nagel_1.CEL     Cancer right_colic_flexure           0              hgu133a
1103_03_Braun.CEL       Normal              rectum           1                  hgu133a
1103_03_Braun_1.CEL     Cancer              rectum           1                hgu133a
1234_06_Floersch.CEL    Normal       sigmoid_colon           0             hgu133a
1234_06_Floersch_1.CEL  Cancer       sigmoid_colon           0           hgu133a
1235_06_Hey.CEL         Normal    transverse_colon           0            hgu133a
1235_06_Hey_1.CEL       Cancer    transverse_colon           0          hgu133a
1236_06_Liebich.CEL     Normal               cecum           1               hgu133a
... (i dont include the rest in order to be below the word limit-in total 60 paired samples/30 patients)

So,my first important question is the below:

1. If i want just to perform a paired analysis for cancer vs control samples, adjusted for the tumor location, i believe the following is correct:

condition <- factor(eset.2$Disease, levels=c("Normal","Cancer")) 
pairs <- factor(rep(1:30, each = 2))

location <- eset_COMBAT$Location

[1] "factor"


         St_1_WL57.CEL          St_2_WL57.CEL          St_N_EC59.CEL 
         sigmoid_colon          sigmoid_colon          sigmoid_colon 
         St_T_EC59.CEL          St_N_EJ58.CEL          St_T_EJ58.CEL 
         sigmoid_colon                  cecum                  cecum 
         St_N_FD58.CEL          St_T_FD58.CEL          St_N_FG39.CEL 
                rectum                 rectum                  cecum 
9 Levels: ascending_colon cecum descending_colon ... transverse_colon

and then:

design <- model.matrix(~condition +pairs+location)

fit <- lmFit(eset_COMBAT,design)

but then i get the following message:

Coefficients not estimable: locationcecum locationdescending_colon locationleft_colic_flexure locationrectosigmoid_colon locationrectum locationright_colic_flexure locationsigmoid_colon locationtransverse_colon 
Warning message:
Partial NA coefficients for 22277 probe(s) 

Any help or suggestions would be essential !!

ADD COMMENTlink modified 4.3 years ago by Gordon Smyth39k • written 4.3 years ago by svlachavas740
Answer: Correct creation of design matrix in limma regarding a multi-level experiment
gravatar for Gordon Smyth
4.3 years ago by
Gordon Smyth39k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth39k wrote:

There is no need to include location in the model (and it is completely pointless to do so). You are already allowing every pair to have a different baseline, and hence all the locations are automatically different as well because they are associated with different pairs.

Just removing location will fix the problem. limma has done this for you automatically anyway, but removing it yourself will get rid of the warning.

There is also no strong need to correct for batch effect between the platforms, for the same reason -- adjusting for pairs already does everything.

ADD COMMENTlink modified 4.3 years ago • written 4.3 years ago by Gordon Smyth39k

Dear Mr. Smyth,

firstly thank you for your recommendations-regarding the batch effect between the platforms, i used the guidelines from the inSilicoMerging R package when i merged the two datasets, based on their common probesets, because i wasnt sure about how to adjust for this effect-so your explanation gave me more confidence on this specific issue. On the other hand, regarding the issue with the location information, this leads me to my second very important question:

In detail, as in my current project we are highly motivated in implementing this information in our microarray analysis, as from both biology field and other perspectives, we are interested in identifying-if it is possible and regarding the results- any crusial and interesting gene expression alteration patterns between different anatomic locations of the patiens colorectal tumors involved in our analysis(i.e ascending vs descending colon, which could possibly underline fundamental functional and distinct pathways-groups of genes involved). Thus, i tried in the first place to incorporate in a wrong way this information.

On the other hand, as i searched from the limma vignette[in particular section 9.7, page 48], i thought maybe a similar implementation like the specific tutorial with the multifactor analysis would be correct, in order to perform two possible analyses:

1. To compare specific tumor (locations) vs their adjucent control samples(i.e ascending.tumor vs ascending.control)

2.If it is possible compare different tumor anatomic locations(i.e ascending vs descending tumor samples)


So, if my idea/implementation although naive is correct without any incorrect points, should i proceed like this (?):

combine <- factor(paste(eset_COMBAT$Disease, eset_COMBAT$Location, sep="."))

design <- model.matrix(~0 +combine)

colnames(design) <- levels(combine)


and then to estimate the correlation between the matched samples on the same patient:

corfit <- duplicateCorrelation(eset_COMBAT, design, block=pairs) ?

or regarding the block argument should i firstly incorporate this information in my phenoData object:

eset_COMBAT$Subjects <- pairs

and then corfit <- duplicateCorrelation(eset_COMBAT, design, block=eset_COMBAT$Subjects) ?


AND if the above are fine,

move  with fit <- lmFit(eset_COMBAT,design,block=eset_COMBAT$Subjects,correlation=corfit$consensus) 

and afterwards define the necessary contrasts with makeContrasts etc ??

To summarize, is my implementation regarding the use of the tumor location correct or it is different from the current tutorial i mentioned ??

Thank you for your patience on this matter

Any suggestions or evaluation regarding

my above answer would be beneficial for my analysis!



ADD REPLYlink modified 4.3 years ago • written 4.3 years ago by svlachavas740
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 358 users visited in the last hour