Question

warnings or potential problems in limma procedure


Yi, Ming NIH/NCI [C] ▴ 100


Hi, Dear List:

I am still looking for an explanation or diagnosis of the following potential issue; I am not sure whether what I did is wrong or fine. (I apologize if my previous post was not clear to the list.)

I am using limma to do a paired test in the following setting. My tar object looks like this:

> tar[1:5,]
   AccNum Patient_Type_Comb Type RACE  ER
67 S10184    S10184_N_W_NEG    N    W NEG
66 S10184    S10184_T_W_NEG    T    W NEG
68 S10330    S10330_N_B_NEG    N    B NEG
69 S10330    S10330_T_B_NEG    T    B NEG
74 S10601    S10601_N_W_POS    N    W POS

AccNum is the patient ID; each patient has two types of samples, "N" for normal and "T" for tumor.
RACE has two values in the sample population: W for "White", B for "Black".
ER is the ER status: NEG for negative, POS for positive.
Patient_Type_Comb shows the sample phenotype as one string.

The goal of the analysis is to find the differential gene list for each contrast, for example ER-positive tumor vs ER-positive normal for matched samples from the same patient in the Black (African American) population only, ER-negative tumor vs ER-negative normal for matched samples from the same patient in the White (Caucasian) population only, etc. More details are in my design and contrast matrix setup below. The key point is that the analysis needs to account for the paired samples: each patient has both a tumor and a normal sample (the normal is the normal tissue surrounding the tumor tissue of the same patient), so this is a well-controlled study.

My data matrix (partial, array data) looks like the following:

> mydata[1:5,1:4]
        S10184_N_W_NEG S10184_T_W_NEG S10330_N_B_NEG S10330_T_B_NEG
7936596      10.079810      10.810695      10.733401      11.369506
8037331      10.076718      10.217359      10.921994      10.389894
8023672       8.503989       8.786565       8.936260       9.384205
8128282       5.423744       4.826185       5.872070       4.486140
8063634       5.909231       6.773356       6.653584       6.408861

Here is how I set up my design and contrast matrix:

> group1<-paste(tar$RACE,tar$Type,tar$ER, sep=".");
> unique(group1)
[1] "W.N.NEG" "W.T.NEG" "B.N.NEG" "B.T.NEG" "W.N.POS" "W.T.POS" "B.N.POS" "B.T.POS"
> group<-factor(group1, levels=c("W.N.NEG", "W.T.NEG", "B.N.NEG",
+     "B.T.NEG", "W.N.POS", "W.T.POS", "B.N.POS", "B.T.POS"))
> Samples<-factor(tar$AccNum);
> design<-model.matrix(~-1+group+Samples);
> colnames(design)<-sub("group","",colnames(design));
> colnames(design)<-sub("Samples","",colnames(design));
> con.matrix<-makeContrasts(T.POS_N.POS=B.T.POS+W.T.POS-B.N.POS-W.N.POS,
+     B.T.POS_B.N.POS=B.T.POS-B.N.POS, W.T.POS_W.N.POS=W.T.POS-W.N.POS,
+     T.NEG_N.NEG=B.T.NEG+W.T.NEG-B.N.NEG-W.N.NEG,
+     B.T.NEG_B.N.NEG=B.T.NEG-B.N.NEG, W.T.NEG_W.N.NEG=W.T.NEG-W.N.NEG,
+     levels=design)

Here is the partial contrast matrix:

> con.matrix[,]
         Contrasts
Levels    T.POS_N.POS B.T.POS_B.N.POS W.T.POS_W.N.POS T.NEG_N.NEG B.T.NEG_B.N.NEG W.T.NEG_W.N.NEG
  W.N.NEG           0               0               0          -1               0              -1
  W.T.NEG           0               0               0           1               0               1
  B.N.NEG           0               0               0          -1              -1               0
  B.T.NEG           0               0               0           1               1               0
  W.N.POS          -1               0              -1           0               0               0
  W.T.POS           1               0               1           0               0               0
  B.N.POS          -1              -1               0           0               0               0
  B.T.POS           1               1               0           0               0               0
  S10330            0               0               0           0               0               0
  S10601            0               0               0           0               0               0
  S10618            0               0               0           0               0               0
  S10929            0               0               0           0               0               0
  S10940            0               0               0           0               0               0

However, when I tried to fit the data with the limma model, I ran into the following warning, which is what I am asking about:

> lmFit(mydata,design)->fit1;
Coefficients not estimable: S14697 S14730 S14810
Warning message:
Partial NA coefficients for 26804 probe(s)

This warning does not seem to affect the subsequent steps, as shown below, but I am not sure why I get it. Could the list provide some insights or clues? That would be highly appreciated!

> contrasts.fit(fit1, con.matrix)->fit2
> eBayes(fit2)->fit3
> allContrast<-colnames(fit3);
> allContrast
[1] "T.POS_N.POS"     "B.T.POS_B.N.POS" "W.T.POS_W.N.POS" "T.NEG_N.NEG"     "B.T.NEG_B.N.NEG" "W.T.NEG_W.N.NEG"

I also checked the patients listed in the warning message specifically:

> tar[tar$AccNum %in% c("S14697", "S14730", "S14810"),]
   AccNum Patient_Type_Comb Type RACE  ER
57 S14697    S14697_N_W_POS    N    W POS
58 S14697    S14697_T_W_POS    T    W POS
55 S14730    S14730_N_B_NEG    N    B NEG
56 S14730    S14730_T_B_NEG    T    B NEG
59 S14810    S14810_N_B_POS    N    B POS
60 S14810    S14810_T_B_POS    T    B POS

They look ordinary: all of them have paired samples (T and N), they include both White and Black patients and both ER-negative and ER-positive patients, and they do not seem to fall into any special phenotype category. I also checked their data:

> mydata[1:5,c("S14697_N_W_POS", "S14697_T_W_POS", "S14730_N_B_NEG",
+     "S14730_T_B_NEG", "S14810_N_B_POS", "S14810_T_B_POS")]
        S14697_N_W_POS S14697_T_W_POS S14730_N_B_NEG S14730_T_B_NEG S14810_N_B_POS S14810_T_B_POS
7936596      11.024855      10.954703      10.832579      10.917364      10.631019      10.842098
8037331       9.807050      10.366058      10.285187       9.955208      10.410920      10.620751
8023672       8.734080       8.359230       8.559288       8.245623       8.613978       8.614790
8128282       5.489218       5.703427       5.026220       4.738774       5.362589       5.193500
8063634       6.562237       6.784427       6.632752       6.757525       6.887120       7.095357

These also look normal to me.

Thanks a lot in advance for your advice and suggestions!

Best,
Ming
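For readers who want to reproduce this kind of warning without the real data, here is a minimal sketch on a made-up targets table. It assumes eight hypothetical patients (P1 to P8), two per RACE x ER stratum, and builds the same style of design as above; the helper names `is_estimable` and `make_contrast` are illustrative only and are not limma functions. Because RACE and ER are constant within a patient, the group columns of each stratum overlap with the patient columns, so the design has more columns than its rank. Whether the same mechanism fully explains the warning on the real data is an assumption that should be checked on the actual design matrix.

```r
# Toy targets: 8 hypothetical patients (P1..P8), two per RACE x ER stratum,
# each contributing one normal (N) and one tumor (T) sample.
tar <- data.frame(
  AccNum = rep(paste0("P", 1:8), each = 2),
  Type   = rep(c("N", "T"), times = 8),
  RACE   = rep(c("W", "W", "B", "B"), each = 4),
  ER     = rep(rep(c("NEG", "POS"), each = 2), times = 4)
)

# Same style of design as in the post: one column per RACE.Type.ER group
# plus one column per patient (the first patient is the reference level).
group   <- factor(paste(tar$RACE, tar$Type, tar$ER, sep = "."))
Samples <- factor(tar$AccNum)
design  <- model.matrix(~ 0 + group + Samples)
colnames(design) <- sub("^group|^Samples", "", colnames(design))

# Compare the number of coefficients with the rank of the design.
qrd         <- qr(design)
n_coef      <- ncol(design)
design_rank <- qrd$rank
# Columns pivoted past the rank are linear combinations of earlier columns.
dropped <- colnames(design)[qrd$pivot[-seq_len(design_rank)]]

# A contrast is estimable when it lies in the row space of the design,
# i.e. appending it as an extra row does not increase the rank.
is_estimable <- function(design, contrast) {
  qr(rbind(design, contrast))$rank == qr(design)$rank
}

# Build a contrast vector over all design columns (illustrative helper).
make_contrast <- function(plus, minus) {
  v <- setNames(numeric(ncol(design)), colnames(design))
  v[plus]  <- 1
  v[minus] <- -1
  v
}

within_stratum  <- make_contrast("B.T.POS", "B.N.POS")  # tumor vs normal, same patients
between_stratum <- make_contrast("B.T.POS", "W.T.POS")  # compares different patients

print(c(coefficients = n_coef, rank = design_rank))
print(dropped)
print(is_estimable(design, within_stratum))
print(is_estimable(design, between_stratum))
```

In this toy setup the design has three more columns than its rank, the tumor-vs-normal contrast within a stratum is still estimable, and a contrast between strata is not. If limma is installed, `nonEstimable(design)` should give a similar report of the dependent columns directly.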


score 0 · Answer 1 · 2011-04-13

Ming Yi, Ph.D.
Information System Program
SAIC-Frederick, Inc.
National Cancer Institute at Frederick
Post Office Box B, Frederick, MD 21702