Communicating gene-probe relationships to sam
4
0
Entering edit mode
@charles-crane-1712
Last seen 9.6 years ago
Dear newsgroup: I am trying to use sam to identify single-feature polymorphisms between two barley cultivars. I am running Bioconductor 1.7, Biobase 1.8.0, affy 1.8.1, and siggenes 1.4.0 under Mac OS X 10.3.9. I can successfully extract probe-level data from my CEL files, background correct the pm values with rma, quantile normalize, and isolate the pm intensities in a data frame, but both sam and sam.snp consistently and instantly fail with "Error in FUN(data, cl, ...) : There should be at least 25 samples." This happens even though there are 22840 genes and 11 probes per gene in the data. What am I overlooking? Does sam need some sort of a mapping from the 11 probes to their parent gene? A session history follows. Thank your very much for your time. Also, do you plan to include directions for using sam for SNP analysis, going all the way from ReadAffy through sam for a case history, in a future vignette? It would save the uninitiated a lot of time and frustration. Charles Crane USDA-ARS, MWA, Crop Production and Pest Control Research Unit And Department of Botany and Plant Pathology, Purdue University > library(affy) > library(siggenes) > sidata <- ReadAffy(filenames = c("000113_5hr_kindered_H20_inoc_2x2.CEL", "000114_12hr_kindered_H20_inoc_3bx1.CEL", + "000122_5hr_peruvian_H20_inoc_14bx1.CEL", "000123_12hr_peruvian_H20_inoc_15bx1.CEL", "000141_Non-host-resistant-barley_16bx2.CEL", + "000157_24hr_Kindered_H2O_Inoc_1.CEL", "000158_0hr_Kindered_H2O_Inoc_4.CEL", "000159_0hr_Kindered_S_passerini_inoc_5.CEL", + "000161_0hr_Peruvian_H2O_inoc_13.CEL", "000162_0hr_Peruvian_S_passerini_17.CEL", "000189_0hr_403-rep2_H2O_inoc_21b.CEL", + "000190_5hr_403-rep2_H2O_inoc_22b.CEL", "000191_12hr_403-rep2_H2O_inoc_23b.CEL", "000192_24hr_403-rep2_H2O_inoc_24b.CEL", + "000193_0hr_403-rep2_inoc-s_pass_25b.CEL", "000201_0hr_405-rep2_H2O_inoc_33b.CEL", "000202_5hr_405-rep2_H2O_inoc_34b.CEL", + "000203_12hr_405-rep2_H2O_inoc_35b.CEL", "000204_24hr_405-rep2_H2O_inoc_36b.CEL", "000205_0hr_405-rep2_inoc-s_pass_37b.CEL"), + sampleNames = c("000113_5hr_kindered_H20_inoc_2x2", "000114_12hr_kindered_H20_inoc_3bx1", "000122_5hr_peruvian_H20_inoc_14bx1", + "000123_12hr_peruvian_H20_inoc_15bx1", "000141_Non-host-resistant-barley_16bx2", "000157_24hr_Kindered_H2O_Inoc_1", + "000158_0hr_Kindered_H2O_Inoc_4", "000159_0hr_Kindered_S_passerini_inoc_5", "000161_0hr_Peruvian_H2O_inoc_13", + "000162_0hr_Peruvian_S_passerini_17", "000189_0hr_403-rep2_H2O_inoc_21b", "000190_5hr_403-rep2_H2O_inoc_22b", + "000191_12hr_403-rep2_H2O_inoc_23b", "000192_24hr_403-rep2_H2O_inoc_24b", "000193_0hr_403-rep2_inoc-s_pass_25b", + "000201_0hr_405-rep2_H2O_inoc_33b", "000202_5hr_405-rep2_H2O_inoc_34b", "000203_12hr_405-rep2_H2O_inoc_35b", + "000204_24hr_405-rep2_H2O_inoc_36b", "000205_0hr_405-rep2_inoc-s_pass_37b"), + phenoData = "Hordeumphenodata.txt") > signames <- geneNames(sidata) > sirma <- bg.correct(sidata, method = "rma") > sinorm <- normalize(sirma) > siprobeintensities <- as.data.frame(probes(sinorm, "pm")) > write.table(siprobeintensities, file = "siprobeintensities.txt") > sicl <- c(1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2) > samout <- sam(siprobeintensities, sicl, method = "cat.stat", gene.names = dimnames(siprobeintensities)[[1]], B = 1000, ran = 11279) Error in FUN(data, cl, ...) : There should be at least 25 samples. > sigenenames <- scan(file = "sigenenamestrimmed.txt", what = 'character') Read 251437 items > sigenenames[1:22] [1] "1200459_Reg_88-1740_at" "1200459_Reg_88-1740_at" [3] "1200459_Reg_88-1740_at" "1200459_Reg_88-1740_at" [5] "1200459_Reg_88-1740_at" "1200459_Reg_88-1740_at" [7] "1200459_Reg_88-1740_at" "1200459_Reg_88-1740_at" [9] "1200459_Reg_88-1740_at" "1200459_Reg_88-1740_at" [11] "1200459_Reg_88-1740_at" "1289374_Reg_826-1545_at" [13] "1289374_Reg_826-1545_at" "1289374_Reg_826-1545_at" [15] "1289374_Reg_826-1545_at" "1289374_Reg_826-1545_at" [17] "1289374_Reg_826-1545_at" "1289374_Reg_826-1545_at" [19] "1289374_Reg_826-1545_at" "1289374_Reg_826-1545_at" [21] "1289374_Reg_826-1545_at" "1289374_Reg_826-1545_at" > samout <- sam(siprobeintensities, sicl, method = "cat.stat", gene.names = sigenenames, B = 1000, ran = 11279) Error in FUN(data, cl, ...) : There should be at least 25 samples.
SNP Biobase siggenes SNP Biobase siggenes • 1.1k views
ADD COMMENT
0
Entering edit mode
Seth Falcon ★ 7.4k
@seth-falcon-992
Last seen 9.6 years ago
Hi Charles, Charles Crane <ccrane at="" purdue.edu=""> writes: > I am trying to use sam to identify single-feature polymorphisms between > two barley cultivars. I am running Bioconductor 1.7, Biobase 1.8.0, affy > 1.8.1, and siggenes 1.4.0 under Mac OS X 10.3.9. > probe-level data from my CEL files, background correct the pm values with > rma, quantile normalize, and isolate the pm intensities in a data frame, but > both sam and sam.snp consistently and instantly fail with "Error in > FUN(data, cl, ...) : There should be at least 25 samples." This happens > even though there are 22840 genes and 11 probes per gene in the > data. But how many samples? If I'm reading your transcript correctly, it looks like you have 20 samples and as the method is telling you, it wants 25. I realize this isn't all that helpful, but I suspect the limitation is based on the statistical properties of the method. > Thank your very much for your time. Also, do you plan to include > directions for using sam for SNP analysis, going all the way from ReadAffy > through sam for a case history, in a future vignette? It would save the > uninitiated a lot of time and frustration. That would make a great contributed document (hint, hint) :-) > + sampleNames = c("000113_5hr_kindered_H20_inoc_2x2", > "000114_12hr_kindered_H20_inoc_3bx1", "000122_5hr_peruvian_H20_inoc_14bx1", > + "000123_12hr_peruvian_H20_inoc_15bx1", > "000141_Non-host-resistant-barley_16bx2", "000157_24hr_Kindered_H2O_Inoc_1", > + "000158_0hr_Kindered_H2O_Inoc_4", > "000159_0hr_Kindered_S_passerini_inoc_5", "000161_0hr_Peruvian_H2O_inoc_13", > + "000162_0hr_Peruvian_S_passerini_17", "000189_0hr_403-rep2_H2O_inoc_21b", > "000190_5hr_403-rep2_H2O_inoc_22b", > + "000191_12hr_403-rep2_H2O_inoc_23b", "000192_24hr_403-rep2_H2O_inoc_24b", > "000193_0hr_403-rep2_inoc-s_pass_25b", > + "000201_0hr_405-rep2_H2O_inoc_33b", "000202_5hr_405-rep2_H2O_inoc_34b", > "000203_12hr_405-rep2_H2O_inoc_35b", > + "000204_24hr_405-rep2_H2O_inoc_36b", > "000205_0hr_405-rep2_inoc-s_pass_37b"), Here it looks like you have 20 samples. Do the docs explain anything about the 25 sample requirement? Hope that helps, + seth
ADD COMMENT
0
Entering edit mode
@charles-crane-1712
Last seen 9.6 years ago
Dear newsgroup: I am trying to use sam to identify single-feature polymorphisms between two barley cultivars. I am running Bioconductor 1.7, Biobase 1.8.0, affy 1.8.1, and siggenes 1.4.0 under Mac OS X 10.3.9. I can successfully extract probe-level data from my CEL files, background correct the pm values with rma, quantile normalize, and isolate the pm intensities in a data frame, but both sam and sam.snp consistently and instantly fail with "Error in FUN(data, cl, ...) : There should be at least 25 samples." This happens even though there are 22840 genes and 11 probes per gene in the data. What am I overlooking? Does sam need some sort of a mapping from the 11 probes to their parent gene? A session history follows. Thank your very much for your time. Also, do you plan to include directions for using sam for SNP analysis, going all the way from ReadAffy through sam for a case history, in a future vignette? It would save the uninitiated a lot of time and frustration. Charles Crane USDA-ARS, MWA, Crop Production and Pest Control Research Unit And Department of Botany and Plant Pathology, Purdue University > library(affy) > library(siggenes) > sidata <- ReadAffy(filenames = c("000113_5hr_kindered_H20_inoc_2x2.CEL", "000114_12hr_kindered_H20_inoc_3bx1.CEL", + "000122_5hr_peruvian_H20_inoc_14bx1.CEL", "000123_12hr_peruvian_H20_inoc_15bx1.CEL", "000141_Non-host-resistant-barley_16bx2.CEL", + "000157_24hr_Kindered_H2O_Inoc_1.CEL", "000158_0hr_Kindered_H2O_Inoc_4.CEL", "000159_0hr_Kindered_S_passerini_inoc_5.CEL", + "000161_0hr_Peruvian_H2O_inoc_13.CEL", "000162_0hr_Peruvian_S_passerini_17.CEL", "000189_0hr_403-rep2_H2O_inoc_21b.CEL", + "000190_5hr_403-rep2_H2O_inoc_22b.CEL", "000191_12hr_403-rep2_H2O_inoc_23b.CEL", "000192_24hr_403-rep2_H2O_inoc_24b.CEL", + "000193_0hr_403-rep2_inoc-s_pass_25b.CEL", "000201_0hr_405-rep2_H2O_inoc_33b.CEL", "000202_5hr_405-rep2_H2O_inoc_34b.CEL", + "000203_12hr_405-rep2_H2O_inoc_35b.CEL", "000204_24hr_405-rep2_H2O_inoc_36b.CEL", "000205_0hr_405-rep2_inoc-s_pass_37b.CEL"), + sampleNames = c("000113_5hr_kindered_H20_inoc_2x2", "000114_12hr_kindered_H20_inoc_3bx1", "000122_5hr_peruvian_H20_inoc_14bx1", + "000123_12hr_peruvian_H20_inoc_15bx1", "000141_Non-host-resistant-barley_16bx2", "000157_24hr_Kindered_H2O_Inoc_1", + "000158_0hr_Kindered_H2O_Inoc_4", "000159_0hr_Kindered_S_passerini_inoc_5", "000161_0hr_Peruvian_H2O_inoc_13", + "000162_0hr_Peruvian_S_passerini_17", "000189_0hr_403-rep2_H2O_inoc_21b", "000190_5hr_403-rep2_H2O_inoc_22b", + "000191_12hr_403-rep2_H2O_inoc_23b", "000192_24hr_403-rep2_H2O_inoc_24b", "000193_0hr_403-rep2_inoc-s_pass_25b", + "000201_0hr_405-rep2_H2O_inoc_33b", "000202_5hr_405-rep2_H2O_inoc_34b", "000203_12hr_405-rep2_H2O_inoc_35b", + "000204_24hr_405-rep2_H2O_inoc_36b", "000205_0hr_405-rep2_inoc-s_pass_37b"), + phenoData = "Hordeumphenodata.txt") > signames <- geneNames(sidata) > sirma <- bg.correct(sidata, method = "rma") > sinorm <- normalize(sirma) > siprobeintensities <- as.data.frame(probes(sinorm, "pm")) > write.table(siprobeintensities, file = "siprobeintensities.txt") > sicl <- c(1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2) > samout <- sam(siprobeintensities, sicl, method = "cat.stat", gene.names = dimnames(siprobeintensities)[[1]], B = 1000, ran = 11279) Error in FUN(data, cl, ...) : There should be at least 25 samples. > sigenenames <- scan(file = "sigenenamestrimmed.txt", what = 'character') Read 251437 items > sigenenames[1:22] [1] "1200459_Reg_88-1740_at" "1200459_Reg_88-1740_at" [3] "1200459_Reg_88-1740_at" "1200459_Reg_88-1740_at" [5] "1200459_Reg_88-1740_at" "1200459_Reg_88-1740_at" [7] "1200459_Reg_88-1740_at" "1200459_Reg_88-1740_at" [9] "1200459_Reg_88-1740_at" "1200459_Reg_88-1740_at" [11] "1200459_Reg_88-1740_at" "1289374_Reg_826-1545_at" [13] "1289374_Reg_826-1545_at" "1289374_Reg_826-1545_at" [15] "1289374_Reg_826-1545_at" "1289374_Reg_826-1545_at" [17] "1289374_Reg_826-1545_at" "1289374_Reg_826-1545_at" [19] "1289374_Reg_826-1545_at" "1289374_Reg_826-1545_at" [21] "1289374_Reg_826-1545_at" "1289374_Reg_826-1545_at" > samout <- sam(siprobeintensities, sicl, method = "cat.stat", gene.names = sigenenames, B = 1000, ran = 11279) Error in FUN(data, cl, ...) : There should be at least 25 samples.
ADD COMMENT
0
Entering edit mode
Charles Berry ▴ 290
@charles-berry-5754
Last seen 5.1 years ago
United States
Charles, You used method='cat.stat', which is (only) suitable for SNPs, I think. This seems like an odd choice for detecting SFPs. You would have to do more work to get this to work properly if you actually were counting SNPs - i.e. you would have to **call** the SNPs. (And reasonably enough, sam() doesn't like small sample numbers either.) I suggest you use a linear model (method='d.stat') like the one in our Genome Research paper available here: http://naturalsystems.uchicago.edu/naturalvariation/sfp/ BTW, sam() will not object to very small sample numbers with 'method="d.stat"' HTH, Chuck Berry On Fri, 5 May 2006, Charles Crane wrote: > Dear newsgroup: > I am trying to use sam to identify single-feature polymorphisms between > two barley cultivars. I am running Bioconductor 1.7, Biobase 1.8.0, affy > 1.8.1, and siggenes 1.4.0 under Mac OS X 10.3.9. I can successfully extract > probe-level data from my CEL files, background correct the pm values with > rma, quantile normalize, and isolate the pm intensities in a data frame, but > both sam and sam.snp consistently and instantly fail with "Error in > FUN(data, cl, ...) : There should be at least 25 samples." This happens > even though there are 22840 genes and 11 probes per gene in the data. What > am I overlooking? Does sam need some sort of a mapping from the 11 probes > to their parent gene? A session history follows. > Thank your very much for your time. Also, do you plan to include > directions for using sam for SNP analysis, going all the way from ReadAffy > through sam for a case history, in a future vignette? It would save the > uninitiated a lot of time and frustration. > > Charles Crane > USDA-ARS, MWA, Crop Production and Pest Control Research Unit > And Department of Botany and Plant Pathology, Purdue University > >> library(affy) >> library(siggenes) >> sidata <- ReadAffy(filenames = c("000113_5hr_kindered_H20_inoc_2x2.CEL", > "000114_12hr_kindered_H20_inoc_3bx1.CEL", > + "000122_5hr_peruvian_H20_inoc_14bx1.CEL", > "000123_12hr_peruvian_H20_inoc_15bx1.CEL", > "000141_Non-host-resistant-barley_16bx2.CEL", > + "000157_24hr_Kindered_H2O_Inoc_1.CEL", > "000158_0hr_Kindered_H2O_Inoc_4.CEL", > "000159_0hr_Kindered_S_passerini_inoc_5.CEL", > + "000161_0hr_Peruvian_H2O_inoc_13.CEL", > "000162_0hr_Peruvian_S_passerini_17.CEL", > "000189_0hr_403-rep2_H2O_inoc_21b.CEL", > + "000190_5hr_403-rep2_H2O_inoc_22b.CEL", > "000191_12hr_403-rep2_H2O_inoc_23b.CEL", > "000192_24hr_403-rep2_H2O_inoc_24b.CEL", > + "000193_0hr_403-rep2_inoc-s_pass_25b.CEL", > "000201_0hr_405-rep2_H2O_inoc_33b.CEL", > "000202_5hr_405-rep2_H2O_inoc_34b.CEL", > + "000203_12hr_405-rep2_H2O_inoc_35b.CEL", > "000204_24hr_405-rep2_H2O_inoc_36b.CEL", > "000205_0hr_405-rep2_inoc-s_pass_37b.CEL"), > + sampleNames = c("000113_5hr_kindered_H20_inoc_2x2", > "000114_12hr_kindered_H20_inoc_3bx1", "000122_5hr_peruvian_H20_inoc_14bx1", > + "000123_12hr_peruvian_H20_inoc_15bx1", > "000141_Non-host-resistant-barley_16bx2", "000157_24hr_Kindered_H2O_Inoc_1", > + "000158_0hr_Kindered_H2O_Inoc_4", > "000159_0hr_Kindered_S_passerini_inoc_5", "000161_0hr_Peruvian_H2O_inoc_13", > + "000162_0hr_Peruvian_S_passerini_17", "000189_0hr_403-rep2_H2O_inoc_21b", > "000190_5hr_403-rep2_H2O_inoc_22b", > + "000191_12hr_403-rep2_H2O_inoc_23b", "000192_24hr_403-rep2_H2O_inoc_24b", > "000193_0hr_403-rep2_inoc-s_pass_25b", > + "000201_0hr_405-rep2_H2O_inoc_33b", "000202_5hr_405-rep2_H2O_inoc_34b", > "000203_12hr_405-rep2_H2O_inoc_35b", > + "000204_24hr_405-rep2_H2O_inoc_36b", > "000205_0hr_405-rep2_inoc-s_pass_37b"), > + phenoData = "Hordeumphenodata.txt") >> signames <- geneNames(sidata) >> sirma <- bg.correct(sidata, method = "rma") >> sinorm <- normalize(sirma) >> siprobeintensities <- as.data.frame(probes(sinorm, "pm")) >> write.table(siprobeintensities, file = "siprobeintensities.txt") >> sicl <- c(1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2) >> samout <- sam(siprobeintensities, sicl, method = "cat.stat", gene.names = > dimnames(siprobeintensities)[[1]], B = 1000, ran = 11279) > Error in FUN(data, cl, ...) : There should be at least 25 samples. >> sigenenames <- scan(file = "sigenenamestrimmed.txt", what = 'character') > Read 251437 items >> sigenenames[1:22] > [1] "1200459_Reg_88-1740_at" "1200459_Reg_88-1740_at" > [3] "1200459_Reg_88-1740_at" "1200459_Reg_88-1740_at" > [5] "1200459_Reg_88-1740_at" "1200459_Reg_88-1740_at" > [7] "1200459_Reg_88-1740_at" "1200459_Reg_88-1740_at" > [9] "1200459_Reg_88-1740_at" "1200459_Reg_88-1740_at" > [11] "1200459_Reg_88-1740_at" "1289374_Reg_826-1545_at" > [13] "1289374_Reg_826-1545_at" "1289374_Reg_826-1545_at" > [15] "1289374_Reg_826-1545_at" "1289374_Reg_826-1545_at" > [17] "1289374_Reg_826-1545_at" "1289374_Reg_826-1545_at" > [19] "1289374_Reg_826-1545_at" "1289374_Reg_826-1545_at" > [21] "1289374_Reg_826-1545_at" "1289374_Reg_826-1545_at" >> samout <- sam(siprobeintensities, sicl, method = "cat.stat", gene.names = > sigenenames, B = 1000, ran = 11279) > Error in FUN(data, cl, ...) : There should be at least 25 samples. > > > > [ Part 3.6: "Included Message" ] > Charles C. Berry (858) 534-2098 Dept of Family/Preventive Medicine E mailto:cberry at tajo.ucsd.edu UC San Diego http://biostat.ucsd.edu/~cberry/ La Jolla, San Diego 92093-0717
ADD COMMENT
0
Entering edit mode
@justin-borevitz-1002
Last seen 9.6 years ago
An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20060508/ 89d0e21b/attachment.pl
ADD COMMENT

Login before adding your answer.

Traffic: 938 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6