Question

Communicating gene-probe relationships to sam

0

Entering edit mode

Charles Crane ▴ 30

@charles-crane-1712

Last seen 9.6 years ago

Dear newsgroup: I am trying to use sam to identify single-feature polymorphisms between two barley cultivars. I am running Bioconductor 1.7, Biobase 1.8.0, affy 1.8.1, and siggenes 1.4.0 under Mac OS X 10.3.9. I can successfully extract probe-level data from my CEL files, background correct the pm values with rma, quantile normalize, and isolate the pm intensities in a data frame, but both sam and sam.snp consistently and instantly fail with "Error in FUN(data, cl, ...) : There should be at least 25 samples." This happens even though there are 22840 genes and 11 probes per gene in the data. What am I overlooking? Does sam need some sort of a mapping from the 11 probes to their parent gene? A session history follows. Thank your very much for your time. Also, do you plan to include directions for using sam for SNP analysis, going all the way from ReadAffy through sam for a case history, in a future vignette? It would save the uninitiated a lot of time and frustration. Charles Crane USDA-ARS, MWA, Crop Production and Pest Control Research Unit And Department of Botany and Plant Pathology, Purdue University > library(affy) > library(siggenes) > sidata <- ReadAffy(filenames = c("000113_5hr_kindered_H20_inoc_2x2.CEL", "000114_12hr_kindered_H20_inoc_3bx1.CEL", + "000122_5hr_peruvian_H20_inoc_14bx1.CEL", "000123_12hr_peruvian_H20_inoc_15bx1.CEL", "000141_Non-host-resistant-barley_16bx2.CEL", + "000157_24hr_Kindered_H2O_Inoc_1.CEL", "000158_0hr_Kindered_H2O_Inoc_4.CEL", "000159_0hr_Kindered_S_passerini_inoc_5.CEL", + "000161_0hr_Peruvian_H2O_inoc_13.CEL", "000162_0hr_Peruvian_S_passerini_17.CEL", "000189_0hr_403-rep2_H2O_inoc_21b.CEL", + "000190_5hr_403-rep2_H2O_inoc_22b.CEL", "000191_12hr_403-rep2_H2O_inoc_23b.CEL", "000192_24hr_403-rep2_H2O_inoc_24b.CEL", + "000193_0hr_403-rep2_inoc-s_pass_25b.CEL", "000201_0hr_405-rep2_H2O_inoc_33b.CEL", "000202_5hr_405-rep2_H2O_inoc_34b.CEL", + "000203_12hr_405-rep2_H2O_inoc_35b.CEL", "000204_24hr_405-rep2_H2O_inoc_36b.CEL", "000205_0hr_405-rep2_inoc-s_pass_37b.CEL"), + sampleNames = c("000113_5hr_kindered_H20_inoc_2x2", "000114_12hr_kindered_H20_inoc_3bx1", "000122_5hr_peruvian_H20_inoc_14bx1", + "000123_12hr_peruvian_H20_inoc_15bx1", "000141_Non-host-resistant-barley_16bx2", "000157_24hr_Kindered_H2O_Inoc_1", + "000158_0hr_Kindered_H2O_Inoc_4", "000159_0hr_Kindered_S_passerini_inoc_5", "000161_0hr_Peruvian_H2O_inoc_13", + "000162_0hr_Peruvian_S_passerini_17", "000189_0hr_403-rep2_H2O_inoc_21b", "000190_5hr_403-rep2_H2O_inoc_22b", + "000191_12hr_403-rep2_H2O_inoc_23b", "000192_24hr_403-rep2_H2O_inoc_24b", "000193_0hr_403-rep2_inoc-s_pass_25b", + "000201_0hr_405-rep2_H2O_inoc_33b", "000202_5hr_405-rep2_H2O_inoc_34b", "000203_12hr_405-rep2_H2O_inoc_35b", + "000204_24hr_405-rep2_H2O_inoc_36b", "000205_0hr_405-rep2_inoc-s_pass_37b"), + phenoData = "Hordeumphenodata.txt") > signames <- geneNames(sidata) > sirma <- bg.correct(sidata, method = "rma") > sinorm <- normalize(sirma) > siprobeintensities <- as.data.frame(probes(sinorm, "pm")) > write.table(siprobeintensities, file = "siprobeintensities.txt") > sicl <- c(1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2) > samout <- sam(siprobeintensities, sicl, method = "cat.stat", gene.names = dimnames(siprobeintensities)[[1]], B = 1000, ran = 11279) Error in FUN(data, cl, ...) : There should be at least 25 samples. > sigenenames <- scan(file = "sigenenamestrimmed.txt", what = 'character') Read 251437 items > sigenenames[1:22] [1] "1200459_Reg_88-1740_at" "1200459_Reg_88-1740_at" [3] "1200459_Reg_88-1740_at" "1200459_Reg_88-1740_at" [5] "1200459_Reg_88-1740_at" "1200459_Reg_88-1740_at" [7] "1200459_Reg_88-1740_at" "1200459_Reg_88-1740_at" [9] "1200459_Reg_88-1740_at" "1200459_Reg_88-1740_at" [11] "1200459_Reg_88-1740_at" "1289374_Reg_826-1545_at" [13] "1289374_Reg_826-1545_at" "1289374_Reg_826-1545_at" [15] "1289374_Reg_826-1545_at" "1289374_Reg_826-1545_at" [17] "1289374_Reg_826-1545_at" "1289374_Reg_826-1545_at" [19] "1289374_Reg_826-1545_at" "1289374_Reg_826-1545_at" [21] "1289374_Reg_826-1545_at" "1289374_Reg_826-1545_at" > samout <- sam(siprobeintensities, sicl, method = "cat.stat", gene.names = sigenenames, B = 1000, ran = 11279) Error in FUN(data, cl, ...) : There should be at least 25 samples.

SNP Biobase siggenes SNP Biobase siggenes • 1.1k views

ADD COMMENT • link 18.0 years ago Charles Crane ▴ 30

score 0 · Answer 1 · 2006-05-05

Hi Charles, Charles Crane <ccrane at="" purdue.edu=""> writes: > I am trying to use sam to identify single-feature polymorphisms between > two barley cultivars. I am running Bioconductor 1.7, Biobase 1.8.0, affy > 1.8.1, and siggenes 1.4.0 under Mac OS X 10.3.9. > probe-level data from my CEL files, background correct the pm values with > rma, quantile normalize, and isolate the pm intensities in a data frame, but > both sam and sam.snp consistently and instantly fail with "Error in > FUN(data, cl, ...) : There should be at least 25 samples." This happens > even though there are 22840 genes and 11 probes per gene in the > data. But how many samples? If I'm reading your transcript correctly, it looks like you have 20 samples and as the method is telling you, it wants 25. I realize this isn't all that helpful, but I suspect the limitation is based on the statistical properties of the method. > Thank your very much for your time. Also, do you plan to include > directions for using sam for SNP analysis, going all the way from ReadAffy > through sam for a case history, in a future vignette? It would save the > uninitiated a lot of time and frustration. That would make a great contributed document (hint, hint) :-) > + sampleNames = c("000113_5hr_kindered_H20_inoc_2x2", > "000114_12hr_kindered_H20_inoc_3bx1", "000122_5hr_peruvian_H20_inoc_14bx1", > + "000123_12hr_peruvian_H20_inoc_15bx1", > "000141_Non-host-resistant-barley_16bx2", "000157_24hr_Kindered_H2O_Inoc_1", > + "000158_0hr_Kindered_H2O_Inoc_4", > "000159_0hr_Kindered_S_passerini_inoc_5", "000161_0hr_Peruvian_H2O_inoc_13", > + "000162_0hr_Peruvian_S_passerini_17", "000189_0hr_403-rep2_H2O_inoc_21b", > "000190_5hr_403-rep2_H2O_inoc_22b", > + "000191_12hr_403-rep2_H2O_inoc_23b", "000192_24hr_403-rep2_H2O_inoc_24b", > "000193_0hr_403-rep2_inoc-s_pass_25b", > + "000201_0hr_405-rep2_H2O_inoc_33b", "000202_5hr_405-rep2_H2O_inoc_34b", > "000203_12hr_405-rep2_H2O_inoc_35b", > + "000204_24hr_405-rep2_H2O_inoc_36b", > "000205_0hr_405-rep2_inoc-s_pass_37b"), Here it looks like you have 20 samples. Do the docs explain anything about the 25 sample requirement? Hope that helps, + seth

score 0 · Answer 2 · 2006-05-05

Dear newsgroup: I am trying to use sam to identify single-feature polymorphisms between two barley cultivars. I am running Bioconductor 1.7, Biobase 1.8.0, affy 1.8.1, and siggenes 1.4.0 under Mac OS X 10.3.9. I can successfully extract probe-level data from my CEL files, background correct the pm values with rma, quantile normalize, and isolate the pm intensities in a data frame, but both sam and sam.snp consistently and instantly fail with "Error in FUN(data, cl, ...) : There should be at least 25 samples." This happens even though there are 22840 genes and 11 probes per gene in the data. What am I overlooking? Does sam need some sort of a mapping from the 11 probes to their parent gene? A session history follows. Thank your very much for your time. Also, do you plan to include directions for using sam for SNP analysis, going all the way from ReadAffy through sam for a case history, in a future vignette? It would save the uninitiated a lot of time and frustration. Charles Crane USDA-ARS, MWA, Crop Production and Pest Control Research Unit And Department of Botany and Plant Pathology, Purdue University > library(affy) > library(siggenes) > sidata <- ReadAffy(filenames = c("000113_5hr_kindered_H20_inoc_2x2.CEL", "000114_12hr_kindered_H20_inoc_3bx1.CEL", + "000122_5hr_peruvian_H20_inoc_14bx1.CEL", "000123_12hr_peruvian_H20_inoc_15bx1.CEL", "000141_Non-host-resistant-barley_16bx2.CEL", + "000157_24hr_Kindered_H2O_Inoc_1.CEL", "000158_0hr_Kindered_H2O_Inoc_4.CEL", "000159_0hr_Kindered_S_passerini_inoc_5.CEL", + "000161_0hr_Peruvian_H2O_inoc_13.CEL", "000162_0hr_Peruvian_S_passerini_17.CEL", "000189_0hr_403-rep2_H2O_inoc_21b.CEL", + "000190_5hr_403-rep2_H2O_inoc_22b.CEL", "000191_12hr_403-rep2_H2O_inoc_23b.CEL", "000192_24hr_403-rep2_H2O_inoc_24b.CEL", + "000193_0hr_403-rep2_inoc-s_pass_25b.CEL", "000201_0hr_405-rep2_H2O_inoc_33b.CEL", "000202_5hr_405-rep2_H2O_inoc_34b.CEL", + "000203_12hr_405-rep2_H2O_inoc_35b.CEL", "000204_24hr_405-rep2_H2O_inoc_36b.CEL", "000205_0hr_405-rep2_inoc-s_pass_37b.CEL"), + sampleNames = c("000113_5hr_kindered_H20_inoc_2x2", "000114_12hr_kindered_H20_inoc_3bx1", "000122_5hr_peruvian_H20_inoc_14bx1", + "000123_12hr_peruvian_H20_inoc_15bx1", "000141_Non-host-resistant-barley_16bx2", "000157_24hr_Kindered_H2O_Inoc_1", + "000158_0hr_Kindered_H2O_Inoc_4", "000159_0hr_Kindered_S_passerini_inoc_5", "000161_0hr_Peruvian_H2O_inoc_13", + "000162_0hr_Peruvian_S_passerini_17", "000189_0hr_403-rep2_H2O_inoc_21b", "000190_5hr_403-rep2_H2O_inoc_22b", + "000191_12hr_403-rep2_H2O_inoc_23b", "000192_24hr_403-rep2_H2O_inoc_24b", "000193_0hr_403-rep2_inoc-s_pass_25b", + "000201_0hr_405-rep2_H2O_inoc_33b", "000202_5hr_405-rep2_H2O_inoc_34b", "000203_12hr_405-rep2_H2O_inoc_35b", + "000204_24hr_405-rep2_H2O_inoc_36b", "000205_0hr_405-rep2_inoc-s_pass_37b"), + phenoData = "Hordeumphenodata.txt") > signames <- geneNames(sidata) > sirma <- bg.correct(sidata, method = "rma") > sinorm <- normalize(sirma) > siprobeintensities <- as.data.frame(probes(sinorm, "pm")) > write.table(siprobeintensities, file = "siprobeintensities.txt") > sicl <- c(1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2) > samout <- sam(siprobeintensities, sicl, method = "cat.stat", gene.names = dimnames(siprobeintensities)[[1]], B = 1000, ran = 11279) Error in FUN(data, cl, ...) : There should be at least 25 samples. > sigenenames <- scan(file = "sigenenamestrimmed.txt", what = 'character') Read 251437 items > sigenenames[1:22] [1] "1200459_Reg_88-1740_at" "1200459_Reg_88-1740_at" [3] "1200459_Reg_88-1740_at" "1200459_Reg_88-1740_at" [5] "1200459_Reg_88-1740_at" "1200459_Reg_88-1740_at" [7] "1200459_Reg_88-1740_at" "1200459_Reg_88-1740_at" [9] "1200459_Reg_88-1740_at" "1200459_Reg_88-1740_at" [11] "1200459_Reg_88-1740_at" "1289374_Reg_826-1545_at" [13] "1289374_Reg_826-1545_at" "1289374_Reg_826-1545_at" [15] "1289374_Reg_826-1545_at" "1289374_Reg_826-1545_at" [17] "1289374_Reg_826-1545_at" "1289374_Reg_826-1545_at" [19] "1289374_Reg_826-1545_at" "1289374_Reg_826-1545_at" [21] "1289374_Reg_826-1545_at" "1289374_Reg_826-1545_at" > samout <- sam(siprobeintensities, sicl, method = "cat.stat", gene.names = sigenenames, B = 1000, ran = 11279) Error in FUN(data, cl, ...) : There should be at least 25 samples.

score 0 · Answer 3 · 2006-05-06

Charles, You used method='cat.stat', which is (only) suitable for SNPs, I think. This seems like an odd choice for detecting SFPs. You would have to do more work to get this to work properly if you actually were counting SNPs - i.e. you would have to **call** the SNPs. (And reasonably enough, sam() doesn't like small sample numbers either.) I suggest you use a linear model (method='d.stat') like the one in our Genome Research paper available here: http://naturalsystems.uchicago.edu/naturalvariation/sfp/ BTW, sam() will not object to very small sample numbers with 'method="d.stat"' HTH, Chuck Berry On Fri, 5 May 2006, Charles Crane wrote: > Dear newsgroup: > I am trying to use sam to identify single-feature polymorphisms between > two barley cultivars. I am running Bioconductor 1.7, Biobase 1.8.0, affy > 1.8.1, and siggenes 1.4.0 under Mac OS X 10.3.9. I can successfully extract > probe-level data from my CEL files, background correct the pm values with > rma, quantile normalize, and isolate the pm intensities in a data frame, but > both sam and sam.snp consistently and instantly fail with "Error in > FUN(data, cl, ...) : There should be at least 25 samples." This happens > even though there are 22840 genes and 11 probes per gene in the data. What > am I overlooking? Does sam need some sort of a mapping from the 11 probes > to their parent gene? A session history follows. > Thank your very much for your time. Also, do you plan to include > directions for using sam for SNP analysis, going all the way from ReadAffy > through sam for a case history, in a future vignette? It would save the > uninitiated a lot of time and frustration. > > Charles Crane > USDA-ARS, MWA, Crop Production and Pest Control Research Unit > And Department of Botany and Plant Pathology, Purdue University > >> library(affy) >> library(siggenes) >> sidata <- ReadAffy(filenames = c("000113_5hr_kindered_H20_inoc_2x2.CEL", > "000114_12hr_kindered_H20_inoc_3bx1.CEL", > + "000122_5hr_peruvian_H20_inoc_14bx1.CEL", > "000123_12hr_peruvian_H20_inoc_15bx1.CEL", > "000141_Non-host-resistant-barley_16bx2.CEL", > + "000157_24hr_Kindered_H2O_Inoc_1.CEL", > "000158_0hr_Kindered_H2O_Inoc_4.CEL", > "000159_0hr_Kindered_S_passerini_inoc_5.CEL", > + "000161_0hr_Peruvian_H2O_inoc_13.CEL", > "000162_0hr_Peruvian_S_passerini_17.CEL", > "000189_0hr_403-rep2_H2O_inoc_21b.CEL", > + "000190_5hr_403-rep2_H2O_inoc_22b.CEL", > "000191_12hr_403-rep2_H2O_inoc_23b.CEL", > "000192_24hr_403-rep2_H2O_inoc_24b.CEL", > + "000193_0hr_403-rep2_inoc-s_pass_25b.CEL", > "000201_0hr_405-rep2_H2O_inoc_33b.CEL", > "000202_5hr_405-rep2_H2O_inoc_34b.CEL", > + "000203_12hr_405-rep2_H2O_inoc_35b.CEL", > "000204_24hr_405-rep2_H2O_inoc_36b.CEL", > "000205_0hr_405-rep2_inoc-s_pass_37b.CEL"), > + sampleNames = c("000113_5hr_kindered_H20_inoc_2x2", > "000114_12hr_kindered_H20_inoc_3bx1", "000122_5hr_peruvian_H20_inoc_14bx1", > + "000123_12hr_peruvian_H20_inoc_15bx1", > "000141_Non-host-resistant-barley_16bx2", "000157_24hr_Kindered_H2O_Inoc_1", > + "000158_0hr_Kindered_H2O_Inoc_4", > "000159_0hr_Kindered_S_passerini_inoc_5", "000161_0hr_Peruvian_H2O_inoc_13", > + "000162_0hr_Peruvian_S_passerini_17", "000189_0hr_403-rep2_H2O_inoc_21b", > "000190_5hr_403-rep2_H2O_inoc_22b", > + "000191_12hr_403-rep2_H2O_inoc_23b", "000192_24hr_403-rep2_H2O_inoc_24b", > "000193_0hr_403-rep2_inoc-s_pass_25b", > + "000201_0hr_405-rep2_H2O_inoc_33b", "000202_5hr_405-rep2_H2O_inoc_34b", > "000203_12hr_405-rep2_H2O_inoc_35b", > + "000204_24hr_405-rep2_H2O_inoc_36b", > "000205_0hr_405-rep2_inoc-s_pass_37b"), > + phenoData = "Hordeumphenodata.txt") >> signames <- geneNames(sidata) >> sirma <- bg.correct(sidata, method = "rma") >> sinorm <- normalize(sirma) >> siprobeintensities <- as.data.frame(probes(sinorm, "pm")) >> write.table(siprobeintensities, file = "siprobeintensities.txt") >> sicl <- c(1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2) >> samout <- sam(siprobeintensities, sicl, method = "cat.stat", gene.names = > dimnames(siprobeintensities)[[1]], B = 1000, ran = 11279) > Error in FUN(data, cl, ...) : There should be at least 25 samples. >> sigenenames <- scan(file = "sigenenamestrimmed.txt", what = 'character') > Read 251437 items >> sigenenames[1:22] > [1] "1200459_Reg_88-1740_at" "1200459_Reg_88-1740_at" > [3] "1200459_Reg_88-1740_at" "1200459_Reg_88-1740_at" > [5] "1200459_Reg_88-1740_at" "1200459_Reg_88-1740_at" > [7] "1200459_Reg_88-1740_at" "1200459_Reg_88-1740_at" > [9] "1200459_Reg_88-1740_at" "1200459_Reg_88-1740_at" > [11] "1200459_Reg_88-1740_at" "1289374_Reg_826-1545_at" > [13] "1289374_Reg_826-1545_at" "1289374_Reg_826-1545_at" > [15] "1289374_Reg_826-1545_at" "1289374_Reg_826-1545_at" > [17] "1289374_Reg_826-1545_at" "1289374_Reg_826-1545_at" > [19] "1289374_Reg_826-1545_at" "1289374_Reg_826-1545_at" > [21] "1289374_Reg_826-1545_at" "1289374_Reg_826-1545_at" >> samout <- sam(siprobeintensities, sicl, method = "cat.stat", gene.names = > sigenenames, B = 1000, ran = 11279) > Error in FUN(data, cl, ...) : There should be at least 25 samples. > > > > [ Part 3.6: "Included Message" ] > Charles C. Berry (858) 534-2098 Dept of Family/Preventive Medicine E mailto:cberry at tajo.ucsd.edu UC San Diego http://biostat.ucsd.edu/~cberry/ La Jolla, San Diego 92093-0717

score 0 · Answer 4 · 2006-05-08

0

Entering edit mode

Justin Borevitz ▴ 170

@justin-borevitz-1002

Last seen 9.6 years ago

An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20060508/ 89d0e21b/attachment.pl

ADD COMMENT • link 18.0 years ago Justin Borevitz ▴ 170