different gal files using limma


Gordon Smyth 52k

@gordon-smyth

Last seen 10 hours ago

WEHI, Melbourne, Australia

Dear Tiandao,

Dealing with multiple GAL files is very tricky, but possible. In limma, you need to read in the GPR files for each GAL file separately, identify control spots separately, and normalize separately. So, if you have two GAL files, you will end up with two normalized MAList objects, MA1 and MA2.

You will then need to align MA1 and MA2 by gene ID. There is a merge command, but very often the situation is too complex for this command to handle. Usually you will need to remove the control spots from MA1 and MA2 separately, to get down to a list of common genes, then sort MA1 to match the gene order of MA2, then cbind them together.

If MA1 and MA2 are of the same length, with the same gene IDs, then something like this will do the merge:

    m <- match(MA2$genes$ID, MA1$genes$ID)
    MA <- cbind(MA1[m,], MA2)

There is an alternative method, which is to use the printorder() function to map spots back to the original 384-well plate positions, then align the arrays by 384-well plate. This method requires that the plates were used in the same order throughout the printing, except for control plates.

You need to be very careful!
Good luck.
Gordon

>Date: Sun, 9 Sep 2007 14:26:47 -0500 (CDT)
>From: Tiandao Li <tiandao.li at usm.edu>
>Subject: [BioC] different gal files using limma
>To: Bioconductor_help <bioconductor at stat.math.ethz.ch>
>Message-ID: <pine.lnx.4.64.0709091401440.32134 at orca.st.usm.edu>
>Content-Type: TEXT/PLAIN; charset=US-ASCII
>
>Hello,
>
>I am analyzing cDNA microarray data using limma. I generated the GAL file
>using the program that comes with chipwriter, and everything looked great.
>However, when I printed the first batch of chips, after the last dip of
>the pins in the first plate, the printer printed, washed, then redipped
>the pins in the first plate from the beginning, printed, washed, and only
>then stopped to change the plate. The company gave us a patch to solve
>this problem. So the GAL file for this batch is a little different from
>the one for the remaining batches of chips: the locations of genes, MSP,
>and controls are different (5%). After hybridization, I used GenePix
>Pro 6.1 for spotfinding. After reading the data into limma, I want to use
>MSP and control spots for normalization. I don't know how to label
>different GAL files using readSpotTypes() in all chips.
>
>Thanks,
>
>Tiandao
>
>I am kind of new to R and limma. The following is my setting.
>
> > sessionInfo()
>R version 2.5.1 (2007-06-27)
>i386-pc-mingw32
>
>locale:
>LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
>attached base packages:
>[1] "stats" "graphics" "grDevices" "utils" "datasets" "methods"
>[7] "base"
>
>other attached packages:
> statmod limma
> "1.3.0" "2.10.5"
>
>Code for analysis
>
>library(limma)
>
>A <- list(R="F635 Median",G="F532 Median",Rb="B635",Gb="B532")
>B <- list("Block", "Column", "Row", "Name", "ID", "X", "Y", "Dia.",
>  "F635 Median", "F635 Mean", "F635 SD", "F635 CV", "B635", "B635 Median",
>  "B635 Mean", "B635 SD", "B635 CV", "% > B635+1SD", "% > B635+2SD",
>  "F635 % Sat.", "F532 Median", "F532 Mean", "F532 SD", "F532 CV", "B532",
>  "B532 Median", "B532 Mean", "B532 SD", "B532 CV", "% > B532+1SD",
>  "% > B532+2SD", "F532 % Sat.", "Ratio of Medians (635/532)",
>  "Ratio of Means (635/532)", "Median of Ratios (635/532)",
>  "Mean of Ratios (635/532)", "Ratios SD (635/532)", "Rgn Ratio (635/532)",
>  "Rgn R2 (635/532)", "F Pixels", "B Pixels", "Circularity",
>  "Sum of Medians (635/532)", "Sum of Means (635/532)",
>  "Log Ratio (635/532)", "F635 Median - B635", "F532 Median - B532",
>  "F635 Mean - B635", "F532 Mean - B532", "F635 Total Intensity",
>  "F532 Total Intensity", "SNR 635", "SNR 532", "Flags", "Normalize",
>  "Autoflag")
>
># read 6 test files
>targets<-readTargets(file="targets.txt", row.name="Name") # 6 test files
>RG <- read.maimages(targets$FileName,source="genepix",ext="gpr",columns=A,other.columns=B)
>spottypes <- readSpotTypes("spottypes3.txt") # short spot types
>RG$genes$Status <- controlStatus(spottypes,RG)
>
>targets
>SlideNumber FileName Cy3 Cy5 Name
>1 13582917 N0 N1 N0N121
>2 13582918 N0 N1 N0N122
>3 13590446 N0 N1 N0N123
>4 13590420 N1 H1 N1H121
>5 13590521 N1 H1 N1H122
>6 13591193 N1 H1 N1H123
>
>spottypes3
>SpotType     ID          Color
>gene         *           black
>Calibration  Calib*      blue
>Ratio        Ratio*      red
>Negative     Neg*|Util*  brown
>MSP          MSP         orange
>Alexa        Alexa*      yellow
>blank        NotDefined  green
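Editor's note: the match-and-cbind step in Gordon's reply can be tried out on toy data before touching real MAList objects. The sketch below is only an illustration in base R; the plain matrices `M1` and `M2`, the IDs, and the array names are hypothetical stand-ins for the M-values of MA1 and MA2, and it assumes both sets carry exactly the same unique IDs.

```r
# Hypothetical stand-ins for the M-value matrices of MA1 and MA2.
ids1 <- c("g3", "g1", "g2")   # gene order on the first layout
ids2 <- c("g1", "g2", "g3")   # gene order on the second layout
M1 <- matrix(c(0.3, 0.1, 0.2), ncol = 1, dimnames = list(ids1, "array1"))
M2 <- matrix(c(1.1, 1.2, 1.3), ncol = 1, dimnames = list(ids2, "array2"))

# For each ID in the second set, find its row position in the first set.
m <- match(ids2, ids1)

# Reorder the first matrix into the second's gene order, then bind columns.
M <- cbind(M1[m, , drop = FALSE], M2)
```

After this, every row of `M` holds values for a single gene across both arrays. The same indexing idea applies to `MA1[m,]`, provided `m` contains no missing values.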

Microarray Normalization limma • 1.6k views

updated 17.6 years ago by Tiandao Li • written 17.6 years ago by Gordon Smyth


Tiandao Li ▴ 260

@tiandao-li-2372

Last seen 10.6 years ago

Dear Dr. Smyth,

Thanks for your help. I read in the GPR files for the 2 GAL files separately, then identified the spot types separately, normalized separately, removed all control spots separately, and kept only the gene type for further analysis. Both MA1 and MA2 use the same gene IDs; however, MA2$genes$ID has 8 more genes than MA1. I used your code to match MA1 to MA2:

    m <- match(MA2$genes$ID, MA1$genes$ID)
    MA <- cbind(MA1[m,], MA2)

I compared MA2 to the MA2 part of MA, and the numbers are identical. However, there are some "NA" values in MA$genes$ID instead of gene IDs from MA2$genes$ID, because MA1 and MA2 don't have the same length and IDs. Could I still use it? There are 4 duplicate spots per gene on the array.

I put the 2 target files together to create a new target file, and used it to build the design matrix for the linear model. Is that OK?

Sincerely,
Tiandao
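Editor's note: the "NA" entries described above are what base R's `match()` returns for any ID that is absent from the table being searched. With replicate spots there is a second, quieter issue: `match()` only ever returns the first occurrence of a repeated ID. A small sketch with hypothetical IDs (two replicates per gene, with "g3" missing from the first set) shows both effects:

```r
# Hypothetical IDs; "g3" was not printed on the first set of arrays.
ids1 <- c("g1", "g2", "g1", "g2")
ids2 <- c("g1", "g2", "g3", "g1", "g2", "g3")

# Position in ids1 of each ID in ids2; NA where the ID is absent.
m <- match(ids2, ids1)

# IDs in the second set that have no counterpart in the first.
missing_ids <- unique(ids2[is.na(m)])

# TRUE if the first set contains replicate spots.
has_replicates <- anyDuplicated(ids1) > 0

# TRUE if the same row of the first set would be reused for several spots.
reused_rows <- any(duplicated(m[!is.na(m)]))
```

Here `m` is `1 2 NA 1 2 NA`: the missing gene produces NA, and both replicates of "g1" point at row 1 of the first set. Checks like `missing_ids` and `reused_rows` are a cheap way to see whether a plain `match()` on the ID column is safe for a given pair of objects.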

written 17.6 years ago by Tiandao Li


Dear Tiandao,

It doesn't necessarily make sense to try to merge MAList objects if they aren't the same length and don't have the same IDs. I suggest you get down to a subset of probes for which this is true, then try the merge command again. This assumes that the ID column of RG$genes has unambiguous identifiers for each probe. (I can't give you a lot of detail, because trying to troubleshoot this over email is very hard.)

BTW, I notice that you're reading the entire GPR files into your RGList objects. This will make huge objects. Do you need to do that? Why not just

    RG <- read.maimages(targets,source="genepix.median",ext="gpr")

Best wishes
Gordon
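Editor's note: one possible way to "get down to a subset of probes" with unambiguous identifiers is sketched below in base R. It is not the method used in the thread. The helper `key()`, the toy matrices, and the replicate-numbering scheme are all hypothetical, and the sketch assumes that replicate spots of a gene appear in the same relative order on both layouts.

```r
# Hypothetical helper: make IDs unique by appending a within-ID replicate
# counter, e.g. "g1" "g2" "g1" becomes "g1.1" "g2.1" "g1.2".
key <- function(ids) {
  paste(ids, ave(seq_along(ids), ids, FUN = seq_along), sep = ".")
}

ids1 <- c("g1", "g2", "g1", "g2")               # "g3" absent here
ids2 <- c("g1", "g2", "g3", "g1", "g2", "g3")

# Toy stand-ins for the M-value matrices, with the unique keys as row names.
M1 <- matrix(1:4,   ncol = 1, dimnames = list(key(ids1), "array1"))
M2 <- matrix(11:16, ncol = 1, dimnames = list(key(ids2), "array2"))

# Keep only the keys present in both sets, in the second set's order.
common <- intersect(rownames(M2), rownames(M1))

# Both pieces now have identical rows, so binding columns is safe.
M <- cbind(M1[common, , drop = FALSE], M2[common, , drop = FALSE])
```

The result has four rows (two replicates each of "g1" and "g2") and no missing values; the spots for "g3" are dropped because only one layout contains them.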

written 17.6 years ago by Gordon Smyth


Dear Dr. Smyth,

MA2 has the full set of IDs (2716 genes), while MA1 has 8 fewer (2708 genes). I want to match MA1 to MA2; however, there are 8 "NA" values in the new MA$genes$ID instead of the IDs from MA2. The rest are the same. I will check whether there is any difference between MA1 and the MA1 part of the new MA.

I am new to R and limma, so I imported the entire GPR files and exported them to check whether I did anything wrong. I used some items as quality controls. Everything is fine, except that "Log Ratio (635/532)" sometimes comes in as "character" instead of "numeric".

Since I had 2 target files for reading in the GPR files, I put them together to create a new target file and used it to build the design matrix for the linear model. Is that OK?

Sincerely,
Tiandao

written 17.6 years ago by Tiandao Li


Dear Dr. Smyth,

MA2 has the full set of IDs (2716 genes), while MA1 has 8 fewer (2708 genes). I want to match MA1 to MA2; however, there are 8 "NA" values in the new MA$genes$ID instead of the IDs from MA2. The rest are the same. I will check whether there is any difference between MA1 and the MA1 part of the new MA.

I used your code to merge the MALists MA1 and MA2, but I couldn't get the correct result:

    m <- match(MA2$genes$ID, MA1$genes$ID)
    MA <- cbind(MA1[m,], MA2)

So I used the following to merge MA1 and MA2. The new MA is the same as the one I joined manually.

    rownames(MA1$M) <- rownames(MA1$A) <- MA1$genes$ID
    MA3 <- new("MAList",list(M=MA1$M,A=MA1$A))
    rownames(MA2$M) <- rownames(MA2$A) <- MA2$genes$ID
    MA4 <- new("MAList",list(M=MA2$M,A=MA2$A))
    MA <- merge(MA4,MA3)

I imported the entire GPR files and exported them to check whether I did anything wrong. I also used some items as quality controls or to make some plots. Everything is fine, except that "Log Ratio (635/532)" sometimes comes in as "character" instead of "numeric". Without importing the entire data, the PrintLayout was always wrong.

Since I had 2 target files for reading in the GPR files, I put them together to create a new target file and used it to build the design matrix for the linear model.

    design <- modelMatrix(targets,ref="N0")
    fit <- lmFit(MA,design)

However, I got the warning message:

    Coefficients not estimable: M55 N6

Would you let me know why some coefficients can't be estimated from the linear model?

Sincerely,
Tiandao
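Editor's note: the thread never establishes why "Log Ratio (635/532)" sometimes arrives as character. One plausible cause, which is purely an assumption here, is that a few entries in the column are text rather than numbers; R then reads the whole column as character. The token "Error" and the values below are invented for illustration only.

```r
# Hypothetical column contents: three numbers and one non-numeric token.
log_ratio <- c("0.52", "-1.10", "Error", "0.07")

# Convert to numeric; entries that do not parse become NA. The warning that
# as.numeric() would raise is suppressed because the NAs are expected here.
vals <- suppressWarnings(as.numeric(log_ratio))

# Flag the entries that failed to parse, so they can be inspected.
bad <- is.na(vals)
```

Looking at `log_ratio[bad]` on the real column would show which entries, if any, are responsible.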

Tiandao Li ▴ 260 • 17.6 years ago


Dear Tiandao,

I'm glad that you've successfully merged your separate gal files. But, please, do not address questions specifically to me. This is a mailing list with many people who might make helpful comments.

You've given us no information at all about the design of your experiment, so no one has any chance of being able to tell you why some of your coefficients cannot be estimated. The message means that your experiment does not provide any information about the difference between these RNA sources and "N0", which you have specified as a reference. You need to give a little more thought to what comparisons you are really trying to make in your experiment.

Best wishes
Gordon

At 07:08 AM 15/09/2007, Tiandao Li wrote:
>Dear Dr. Smyth,
>
>MA2 had the full set of IDs (2716 genes), while MA1 had 8 IDs fewer than
>the full set, 2708 genes. I want to match MA1 to MA2; however, there are
>8 "NA" in the new MA$genes$ID instead of the IDs from MA2. The rest of
>them are the same. I will check if there is any difference between MA1
>and the MA1 part of the new MA.
>
>I used your code to merge the MALists MA1 and MA2, but I can't get the
>correct result file.
>
> m <- match(MA2$genes$ID, MA1$genes$ID)
> MA <- cbind(MA1[m,], MA2)
>
>So I used the following to merge MA1 and MA2. The new MA file is the same
>as the one I joined manually.
>
>rownames(MA1$M) <- rownames(MA1$A) <- MA1$genes$ID
>MA3 <- new("MAList",list(M=MA1$M,A=MA1$A))
>rownames(MA2$M) <- rownames(MA2$A) <- MA2$genes$ID
>MA4 <- new("MAList",list(M=MA2$M,A=MA2$A))
>MA <- merge(MA4,MA3)
>
>I imported the entire gpr files and exported them to see if I did anything
>wrong. I also used some items as quality controls or to make some plots.
>Everything is fine; however, "Log Ratio (635/532)" sometimes gives me
>"character" instead of "numeric". Without importing the entire data, the
>PrintLayout was always wrong.
>
>I had 2 target files to read in the gpr files. Now I have put the 2 target
>files together to create a new target file, and use it to build the design
>matrix for the linear model.
>
>design <- modelMatrix(targets,ref="N0")
>fit <- lmFit(MA,design)
>
>However, I got the warning message:
>
>Coefficients not estimable: M55 N6
>
>Would you let me know what are the reasons that some coefficients can't be
>estimated from the linear model?
>
>Sincerely,
>
>Tiandao
>
>
>On Wed, 12 Sep 2007, Gordon Smyth wrote:
>
>Dear Tiandao,
>
>It doesn't necessarily make sense to try to merge MALists if they aren't
>the same length and don't have the same IDs. I suggest you get down to a
>subset of probes for which this is true, then try the merge command again.
>This assumes that the ID column of RG$genes has unambiguous identifiers for
>each probe. (I can't give you a lot of detail, because trying to
>troubleshoot this over email is very hard.)
>
>BTW, I notice that you're reading the entire GPR files into your RGList
>objects. This will make huge objects. Do you need to do that? Why not just
>
> RG <- read.maimages(targets,source="genepix.median",ext="gpr")
>
>Best wishes
>Gordon
>
>At 07:26 AM 12/09/2007, Tiandao Li wrote:
> > Dear Dr. Smyth,
> >
> > Thanks for your help. I read in the gpr files using 2 gal files
> > separately, then found the spot types separately, normalized
> > separately, and removed all control spots separately, keeping only the
> > gene type for further analysis. Both MA1 and MA2 used the same gene IDs;
> > however, MA2$genes$ID has 8 more genes than MA1. I used your code to
> > match MA1 to MA2
> >
> > m <- match(MA2$genes$ID, MA1$genes$ID)
> > MA <- cbind(MA1[m,], MA2)
> >
> > I compared MA2 to the MA2 part of MA, and the numbers are identical;
> > however, there are some "NA" in MA$genes$ID instead of gene IDs from
> > MA2$genes$ID, because MA1 and MA2 don't have the same length and IDs.
> > Could I still use it?
> > There are 4 duplicate spots per gene on the array.
> >
> > I put 2 target files together to create a new target file, and use it to
> > build the design matrix for the linear model. Is it OK?
> >
> > Sincerely,
> >
> > Tiandao
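One way to see why a coefficient is reported as not estimable is to check which RNA sources are linked to the reference through a chain of hybridizations: in a two-colour design, the comparison of a source with the reference can only be estimated if such a chain exists. The sketch below does that check in base R. The targets table is hypothetical (the real design was not posted); the names M55 and N6 are borrowed from the warning message, and the arrays they appear on are assumed purely for illustration.

```r
# Hypothetical targets table, one row per array (assumed, not the poster's design).
targets <- data.frame(
  Cy3 = c("N0", "N1", "M55"),
  Cy5 = c("N1", "H1", "N6"),
  stringsAsFactors = FALSE
)
ref <- "N0"

# Grow the set of RNA sources reachable from the reference: any array that
# involves an already-reached source also brings in its partner on that array.
reached <- ref
repeat {
  touching <- targets$Cy3 %in% reached | targets$Cy5 %in% reached
  grown <- union(reached, c(targets$Cy3[touching], targets$Cy5[touching]))
  if (length(grown) == length(reached)) break   # nothing new was added
  reached <- grown
}

# Sources never reached have no path of arrays linking them to the reference,
# so the data say nothing about their difference from it.
sources <- unique(c(targets$Cy3, targets$Cy5))
unreachable <- setdiff(sources, reached)
unreachable
```

In this made-up design M55 and N6 are only ever hybridized against each other, so neither can be compared with N0. Adding an array that connects either of them to N0, N1 or H1, or choosing comparisons within the connected group, would remove the problem.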

Gordon Smyth 52k • 17.6 years ago
