Inherits(x,"data.frame") error in SamR
3
0
Entering edit mode
McGee, Monnie ▴ 300
@mcgee-monnie-1108
Last seen 7.1 years ago
Dear Group, I trying to use samr. I have read a previous post about the ease of use of siggenes vs. samr. It is so true! I used siggenes originally, but that doesn't help me with the problem I am having. I still need to use samr because I want to assess sample size using sam.assess.samplesize. To assess sample size using sam, I need to supply sam.assess.samplesize with a "data" vector. I can't understand how to form this vector - perhaps a manual would help, but the manual is not on the SAM website as the R-help files claim. I am using a PowerMac G5 with R Version 2.2.1 (2005-12-20 r36812) installed. I would like to use samr to assess the sample size requirements for an experiment I am planning. I have some training data, which is the drosophila spike-in experiment data given in Choe, S. E., Boutros, M., Michelson, A. M., Church, G. M., & Halfon, M. S. (2005). Preferred analysis methods for Affymetrix GeneChips revealed by a wholly defined control dataset. Genome Biology, 6, R16. Here is what I have done: gsbatch = ReadAffy() # the experiment consists of 3 technical replicates from "control" chips # and 3 technical replicates from Spike-in chips on th DrosGenome1 chip > gs.rma = rma(gsbatch) # get expression values ## get the exprSet into a format that samr can manage: > gs.rma.fr = as.data.frame.exprSet(gs.rma) > gs.mat = matrixgs.rma.fr$exprs,nrow=14010,ncol=6) > gs.mat.con = gs.mat[,1:3] > gs.mat.si = gs.mat[,4:6] > gs.mat.sam = rbind(gs.mat.con,gs.mat.si) ## this is a matrix with dim 28020 by 3, control arrays on top, spike-ins on bottom ## grouping vector > y = c(rep(1,14010),rep(2,14010)) > geneid = as.character(1:nrow(gs.mat.sam)) > genenames = gs.rma.fr$genenames[1:14010] > data = list(x=gs.mat.sam, y =y , geneid = geneid, genenames = rep(genenames,2),logged2=TRUE) > samr(data,resp.type="Two class unpaired",nperms=20) Error in inherits(x, "data.frame") : (subscript) logical subscript too long I also tried deleting the geneid & genenames vectors from the "data" list, but still received the same error. I can't figure this out. I am sure the problem is in the way that I defined the "data" list, but, without a manual, I really don't understand what I did wrong. Thank you for your help, Monnie Monnie McGee, Ph.D. Assistant Professor Department of Statistical Science Southern Methodist University Ph: 214-768-2462 Fax: 214-768-4035 -----Original Message----- From: bioconductor-bounces@stat.math.ethz.ch on behalf of bioconductor-request@stat.math.ethz.ch Sent: Thu 2/16/2006 5:00 AM To: bioconductor at stat.math.ethz.ch Subject: Bioconductor Digest, Vol 36, Issue 14 Send Bioconductor mailing list submissions to bioconductor at stat.math.ethz.ch To subscribe or unsubscribe via the World Wide Web, visit https://stat.ethz.ch/mailman/listinfo/bioconductor or, via email, send a message with subject or body 'help' to bioconductor-request at stat.math.ethz.ch You can reach the person managing the list at bioconductor-owner at stat.math.ethz.ch When replying, please edit your Subject line so it is more specific than "Re: Contents of Bioconductor digest..." Today's Topics: 1. Affy Package - MAplot Function help needed... (Joern Wessels) 2. Re: Affy Package - MAplot Function help needed... (James W. MacDonald) 3. Re: RMA normalization when using subsets of samples (Ron Ophir) 4. LPE (Nicolas Servant) 5. Re: Timeseries loop design analysis using Limma or Maanova? (Pete) 6. Re: Timeseries loop design analysis using Limma or Maanova? (James W. MacDonald) 7. interpretation of vsn normalized data (Maurice Melancon) 8. interpretation of vsn normalized data (Maurice Melancon) 9. Fold Change values after RMA (kfbargad at ehu.es) 10. Re: interpretation of vsn normalized data (Wolfgang Huber) 11. Re: Fold Change values after RMA (James W. MacDonald) 12. ANN: BioC2006 Conference Scheduled for August in Seattle (Nianhua Li) 13. Run GOHyperG without specifying a chip (knaxerov at ix.urz.uni-heidelberg.de) 14. Data Frame Error in Affycomp (John.Brozek at it-omics.com) 15. Re: Fold Change values after RMA (Ben Bolstad) 16. Differences: mas5/mas5calls vs. call.expr/pairwise.comparison (Benjamin Otto) 17. comparing correlation coefficients (fwd) (Ilhem Diboun) 18. Re: Differences: mas5/mas5calls vs. call.expr/pairwise.comparison (Benjamin Otto) 19. GeneR (Antoine Lucas) ---------------------------------------------------------------------- Message: 1 Date: Wed, 15 Feb 2006 13:09:19 +0100 From: "Joern Wessels" <wessels@staff.uni-marburg.de> Subject: [BioC] Affy Package - MAplot Function help needed... To: <bioconductor at="" stat.math.ethz.ch=""> Message-ID: <000101c63228$a93548c0$b8acf889 at pc17368> Content-Type: text/plain; charset="iso-8859-1" Hi everybody, this is my very first post to this mailing list :-) I have got a problem I could not solve via manual or google: When I use the Maplot function of the affy package on more than three array data sets, the text in the fields showing Median and IQR is to big to be shown correctly. I tried to use the "cex.main, cex.lab, cex.axis, cex.sub but aside from changing axis text I could not change anything. Using ?MAplot did not produce any useable help (for me as a beginner at least). I used the following line to get my graphs: MAplot(Data, pairs = TRUE) I sombody could help me with that I would be one happy R rookie user. Thanks, J?rn ________________ J?rn We?els Diplom-Biologe Philipps-Universit?t Marburg Fachbereich Biologie - Tierphysiologie Karl-von-Frisch-Str. 8 35032 Marburg an der Lahn Tel.: +49 (0) 6421 28 23547 Fax: +49 (0) 6421 28 28937 Mobil: +49 (0) 170 9346198 E-Mail: wessels at staff.uni-marburg.de Webseite: http://cgi-host.uni- marburg.de/~omtierph/stoff/member.php?mem=wessels&lang=d e ------------------------------ Message: 2 Date: Wed, 15 Feb 2006 09:36:39 -0500 From: "James W. MacDonald" <jmacdon@med.umich.edu> Subject: Re: [BioC] Affy Package - MAplot Function help needed... To: Joern Wessels <wessels at="" staff.uni-marburg.de=""> Cc: bioconductor at stat.math.ethz.ch Message-ID: <43F33C77.8070705 at med.umich.edu> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Hi Joern, Joern Wessels wrote: > Hi everybody, > this is my very first post to this mailing list :-) I have got a problem I > could not solve via manual or google: > When I use the Maplot function of the affy package on more than three array > data sets, the text in the fields showing Median and IQR is to big to be > shown correctly. > I tried to use the "cex.main, cex.lab, cex.axis, cex.sub but aside from > changing axis text I could not change anything. Using ?MAplot did not > produce any useable help (for me as a beginner at least). > > I used the following line to get my graphs: > > MAplot(Data, pairs = TRUE) > > I sombody could help me with that I would be one happy R rookie user. Well, unfortunately the cex arguments for the text is hard coded, so no combination of cex.xxx = ? is going to change things. I will probably fix this so you can pass the cex argument to the function, but that will take a few days to propagate to the devel download repository and would require that you use the devel version of R. Neither of these things is likely to be a good thing for a rookie, so the best fix in this case is for you to make a modified function that will do what you want (which has the added benefit of helping you to learn R). When you call MAplot() with pairs = TRUE, this function calls another function called mva.pairs(). If you type mva.pairs at an R prompt, it will be printed to the screen: > mva.pairs function (x, labels = colnames(x), log.it = TRUE, span = 2/3, family.loess = "gaussian", digits = 3, line.col = 2, main = "MVA plot", ...) { if log.it) x <- log2(x) J <- dim(x)[2] frame() old.par <- par(no.readonly = TRUE) on.exit(par(old.par)) par(mfrow = c(J, J), mgp = c(0, 0.2, 0), mar = c(1, 1, 1, 1), oma = c(1, 1.4, 2, 1)) for (j in 1:(J - 1)) { par(mfg = c(j, j)) plot(1, 1, type = "n", xaxt = "n", yaxt = "n", xlab = "", ylab = "") text(1, 1, labels[j], cex = 2) for (k in (j + 1):J) { par(mfg = c(j, k)) yy <- x[, j] - x[, k] xx <- (x[, j] + x[, k])/2 sigma <- IQR(yy) mean <- median(yy) ma.plot(xx, yy, tck = 0, show.statistics = FALSE, pch = ".", xlab = "", ylab = "", tck = 0, span = span, ...) par(mfg = c(k, j)) txt <- format(sigma, digits = digits) txt2 <- format(mean, digits = digits) plot(c(0, 1), c(0, 1), type = "n", ylab = "", xlab = "", xaxt = "n", yaxt = "n") text(0.5, 0.5, paste(paste("Median:", txt2), paste("IQR:", txt), sep = "\n"), cex = 2) } } par(mfg = c(J, J)) plot(1, 1, type = "n", xaxt = "n", yaxt = "n", xlab = "", ylab = "") text(1, 1, labels[J], cex = 2) mtext("A", 1, outer = TRUE, cex = 1.5) mtext("M", 2, outer = TRUE, cex = 1.5, las = 1) mtext(main, 3, outer = TRUE, cex = 1.5) invisible() } You can then copy this output and paste it into your favorite text editor. Fix the first line to say something like this: my.mva.pairs <- function (x, labels = colnames(x), log.it = TRUE, span = 2/3, family.loess = "gaussian", digits = 3, line.col = 2, main = "MVA plot", cex.text = 2, ...) Note the change in the function name (along with the <- ), and the addition of cex.text = 2. Now change the line that reads text(0.5, 0.5, paste(paste("Median:", txt2), paste("IQR:", txt), sep = "\n"), cex = 2) to say text(0.5, 0.5, paste(paste("Median:", txt2), paste("IQR:", txt), sep = "\n"), cex = cex.text) Now you can either copy/paste this function back into your R session, or save it as something like my.mva.pairs.R and source() it into your R session. You are almost there - MAplot does some pre-processing of the data first that you will have to do by hand. You need to extract the raw intensity data from your AffyBatch and pass that to your new function: pms <- unlist(indexProbes(Data, "both")) x <- intensity(Data)[pms, ] my.mva.pairs(x, cex.text = 1) Poking around in other people's code and seeing what it does/changing it to do slightly different things is one of the best ways IMO to learn R. HTH, Jim > > Thanks, > J?rn > > > ________________ > J?rn We?els > Diplom-Biologe > > Philipps-Universit?t Marburg > Fachbereich Biologie - Tierphysiologie > Karl-von-Frisch-Str. 8 > 35032 Marburg an der Lahn > > Tel.: +49 (0) 6421 28 23547 > Fax: +49 (0) 6421 28 28937 > Mobil: +49 (0) 170 9346198 > E-Mail: wessels at staff.uni-marburg.de > Webseite: > http://cgi-host.uni- marburg.de/~omtierph/stoff/member.php?mem=wessels&lang=d > e > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor -- James W. MacDonald Affymetrix and cDNA Microarray Core University of Michigan Cancer Center 1500 E. Medical Center Drive 7410 CCGC Ann Arbor MI 48109 734-647-5623 ------------------------------ Message: 3 Date: Wed, 15 Feb 2006 17:05:13 +0200 From: "Ron Ophir" <ron.ophir@weizmann.ac.il> Subject: Re: [BioC] RMA normalization when using subsets of samples To: <bioconductor at="" stat.math.ethz.ch=""> Message-ID: <s3f35f58.078 at="" wisemail.weizmann.ac.il=""> Content-Type: text/plain; charset=US-ASCII Dear all, I think that D.R. Godstein has tried to answer Sylvia's question in http://ludwig-sun2.unil.ch/~darlene/ms/prRMA.pdf Ron >>> <larry.lapointe at="" csiro.au=""> 02/15/06 11:55 AM >>> Dear Martin, We have run up to 550 chips achieving a reasonable processing time -- not more than an hour or so. The practical limits seem to be more related to machine RAM and R memory management. RMA normalization of 550 chips occupies about 12 GB or so on our quad processor Opteron- based system. Larry Lawrence LaPointe CSIRO Bioinformatics for Human Health Sydney, Australia -----Original Message----- From: bioconductor-bounces at stat.math.ethz.ch on behalf of martin.schumacher at novartis.com Sent: Wed 2/15/2006 7:43 PM To: bioconductor at stat.math.ethz.ch Cc: Subject: Re: [BioC] RMA normalization when using subsets of samples Dear Colleagues, Greetings from Switzerland ! I agree with the statements of Wolfgang and Adai. Using all chips will certainly put you on the safe side. I wonder what you feel would be the minimal number of chips for a "stable" (meaning that a larger set would give essentially the same results) RMA processing? People from GeneLogic told me that about 20 chips are sufficient. Is it possible to run RMA using Bioconductor with 200 chips and get the results back within a reasonable time? Best regards, Martin Adaikalavan Ramasamy <ramasamy at="" cancer.org.uk=""> Sent by: bioconductor-bounces at stat.math.ethz.ch 15.02.2006 01:01 Please respond to ramasamy To: Wolfgang Huber <huber at="" ebi.ac.uk=""> cc: Sylvia.Merk at ukmuenster.de, bioconductor at stat.math.ethz.ch, (bcc: Martin Schumacher/PH/Novartis) Subject: Re: [BioC] RMA normalization when using subsets of samples Category: This would be a problem if one or more of the resulting subsets is small and contains outliers. My preference is to preprocess all arrays together. My reasoning is that doing this will give RMA median polish (and to a lesser extent with the quantile normalisation) steps much more information to work with. Regards, Adai On Wed, 2006-02-15 at 17:16 +0000, Wolfgang Huber wrote: > Dear Sylvia, > > this might not be the answer that you want to hear, but for the end > result it should not matter (substantially) which of the two > possibilities you take, and I would be worried if it did. > > Best wishes > Wolfgang > > ------------------------------------- > Wolfgang Huber > European Bioinformatics Institute > European Molecular Biology Laboratory > Cambridge CB10 1SD > England > Phone: +44 1223 494642 > Fax: +44 1223 494486 > Http: www.ebi.ac.uk/huber > ------------------------------------- > > Sylvia.Merk at ukmuenster.de wrote: > > Dear bioconductor list, > > > > I have a question concerning RMA-normalization: > > > > There are for example 200 CEL-Files and the clinicians have several > > research questions - each concernig only a subset of the 200 samples > > including the possibility that some samples are included in more than > > one question. > > > > There are two possibilities to normalize the CEL-Files: > > > > 1.: Read all 200 chips in an affybatch-object and normalize all 200 > > chips together and further analyze the required subset. > > > > 2.: Read only the required chips in an affybatch-object, normalize these > > chips and then perform further analysis > > I think that this approach is the better one but it has the disadvantage > > that some samples are included in several normalizations ending in > > different gene expression levels for a single sample. > > > > What is (from a statisticians view) the appropriate approach to > > normalize CEL-Files in this case? > > > > Thank you in advance > > Sylvia > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > _______________________________________________ Bioconductor mailing list Bioconductor at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/bioconductor [[alternative HTML version deleted]] _______________________________________________ Bioconductor mailing list Bioconductor at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/bioconductor _______________________________________________ Bioconductor mailing list Bioconductor at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/bioconductor ------------------------------ Message: 4 Date: Wed, 15 Feb 2006 16:07:16 +0100 From: Nicolas Servant <nicolas.servant@curie.fr> Subject: [BioC] LPE To: bioconductor at stat.math.ethz.ch Message-ID: <43F343A4.9030500 at curie.fr> Content-Type: text/plain; charset=ISO-8859-1; format=flowed -- Nicolas Servant Equipe Bioinformatique Institut Curie 26, rue d'Ulm - 75248 Paris Cedex 05 - FRANCE Email: Nicolas.Servant at curie.fr Tel: 01 44 32 42 75 ------------------------------ Message: 5 Date: Wed, 15 Feb 2006 15:38:57 -0000 From: "Pete" <p.underhill@har.mrc.ac.uk> Subject: Re: [BioC] Timeseries loop design analysis using Limma or Maanova? To: <bioconductor at="" stat.math.ethz.ch="">, "Jenny Drnevich" <drnevich at="" uiuc.edu=""> Message-ID: <03d001c63245$ef9799e0$a2f1f682 at mrc153> Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Thanks for your response, >>Hello all, >> >>I have been asked to analyse a set of timecourse data with an unusual >>incomplete loop design. This is the design of this type I have looked at >>and I'm not entirely sure how to treat it. >> >>The initial (and fairly easy) question asked of the data is, what are the >>differences between the mutant and the control animals at each timepoint? > > I am interested in how you are going to analyze the differences between > mutants and controls at each time point given that there is no replication > of the control animals (only 1 control pool). I just advised a researcher > against this kind of experimental design because I could not think offhand > of a way to analyze it statistically. If there is a statistically valid > method, I would like to know about it. > I'm not quite sure I understand your point here? I was going to treat this as a simple dye swap experiment, ignoring time and comparing mutant to WT. Is this not a statistically valid approach? There are 3 independ mutant samples compared in dyeswaps to the WT pool. I understand that there is no biological replicate for the WT pool, however it is technically replicated at the dyeswap level and cDNA synthesis level. The biological variation of the WT population is not of immediate interest in this case, hence a pool was used. Individual mutant samples were used instead of a pool, because only a limited number of mutants were available. > >>The second question is how the mutant changes across the timeseries. The >>authors >>wish to use a bayesian timeseries clustering algorithmn to analyse this, >>but >>this requires a standardised measure for the mutant at each timepoint. > > How are you going to implement this bayesian timeseries clustering? My > interpretation of clustering algorithms in general is that they should not > be used to determine which genes are "differentially" expressed, but > rather > one should first use a statistical model to determine differential > expression, then only cluster the genes that show a significant difference > somewhere along the time series to find groups of genes that show a > similar > expression pattern. My approach to this situation would be something along > the lines of a single-channel analysis using a mixed model with array + > dye > + treatment + time + treatment*time, and then cluster genes that showed a > significant time effect, using the lsmeans for each mutant*timepoint > group. > The lack of replication of the controls may cause this not to work... > > Cheers, > Jenny I agree with your statement about clustering, and prehaps I didn't word my question very clearly. The timeseires clustering will indeed be performed on genes selected as differential with respect to time. In the past I have used the MAANOVA package to select these differential genes, however in that particular case, the samples were all compared back to a single reference sample rather than multiple references that are then compared to eachother in an incomplete loop. The issue I am concerned with is how to both, select genes that have a time effect, and how/what to use as a standardised expression level for these genes so that it can then be used in the clustering. Cheers Pete > > > >>I am unsure quite how to achieve this second point and welcome any >>suggestions or references that may help. Is this something I could do in >>Limma or MAanova? >> >> >>The data are from spotted, two-colour, oligo arrays. There are 6 >>timepoints. >>At each timepoint, tissue samples from 3 individual mutant animals are >>compared to a control pool of WT animals at the same timepoint, with dye >>swaps. In addition each control pool has then been compared in a dye swap >>to >>the next timepoint control pool. See diagram below (if it comes out >>correctly!) or the table further below where a1 a2 a3 represent any 3 >>individual animals. >> >> >> >>a1t1 a2t1 a3t1 a1t2 a2t2 a3t2 etc............ >> \\ || // \\ || // >> Control t1 ========= Control t2 ==== etc............... >> >>or >> >>SLIDE CY3 CY5 >>1 a1t1 control t1 >>2 control t1 a1t1 >>3 a2t1 control t1 >>4 control t1 a2t1 >>5 a3t1 control t1 >>6 control t1 a3t1 >>7 a1t2 control t2 >>8 control t2 a1t2 >>9 a2t2 control t2 >>10 control t2 a2t2 >>11 a3t2 control t2 >>12 control t2 a3t2 >>13 a1t3 control t3 >>14 control t3 a1t3 >>15 a2t3 control t3 >>16 control t3 a2t3 >>17 a3t3 control t3 >>18 control t3 a3t3 >>19 a1t4 control t4 >>20 control t4 a1t4 >>21 a2t4 control t4 >>22 control t4 a2t4 >>23 a3t4 control t4 >>24 control t4 a3t4 >>25 a1t5 control t5 >>26 control t5 a1t5 >>27 a2t5 control t5 >>28 control t5 a2t5 >>29 a3t5 control t5 >>30 control t5 a3t5 >>31 a1t6 control t6 >>32 control t6 a1t6 >>33 a2t6 control t6 >>34 control t6 a2t6 >>35 a3t6 control t6 >>36 control t6 a3t6 >>37 control t1 control t2 >>38 control t2 control t1 >>39 control t2 control t3 >>40 control t3 control t2 >>41 control t3 control t4 >>42 control t4 control t3 >>43 control t4 control t5 >>44 control t5 control t4 >>45 control t5 control t6 >>46 control t6 control t5 >> >> >>Many thanks >> >>Pete >> >>_______________________________________________ >>Bioconductor mailing list >>Bioconductor at stat.math.ethz.ch >>https://stat.ethz.ch/mailman/listinfo/bioconductor > > Jenny Drnevich, Ph.D. > > Functional Genomics Bioinformatics Specialist > W.M. Keck Center for Comparative and Functional Genomics > Roy J. Carver Biotechnology Center > University of Illinois, Urbana-Champaign > > 330 ERML > 1201 W. Gregory Dr. > Urbana, IL 61801 > USA > > ph: 217-244-7355 > fax: 217-265-5066 > e-mail: drnevich at uiuc.edu > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > ------------------------------ Message: 6 Date: Wed, 15 Feb 2006 11:01:13 -0500 From: "James W. MacDonald" <jmacdon@med.umich.edu> Subject: Re: [BioC] Timeseries loop design analysis using Limma or Maanova? To: Pete <p.underhill at="" har.mrc.ac.uk=""> Cc: bioconductor at stat.math.ethz.ch Message-ID: <43F35049.6000701 at med.umich.edu> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Hi Pete, Pete wrote: > I'm not quite sure I understand your point here? I was going to treat this > as a simple dye swap experiment, ignoring time and comparing mutant to WT. > Is this not a statistically valid approach? There are 3 independ mutant > samples compared in dyeswaps to the WT pool. I understand that there is no > biological replicate for the WT pool, however it is technically replicated > at the dyeswap level and cDNA synthesis level. The biological variation of > the WT population is not of immediate interest in this case, hence a pool > was used. Individual mutant samples were used instead of a pool, because > only a limited number of mutants were available. You can certainly do something like this, but there are some caveats. First, by comparing WT to mutant and ignoring time you are essentially looking at a main effect that might not be of much interest (hence why would you make the effort to do a time series?). Usually a more interesting question is to look for genes that are differentially expressed between mutant and WT at particular times, which I assume is why Jenny said you have no replication. Second, when you compare biological replicates to technical replicates you are underestimating the true variability of the WT samples, which may result in apparent significance where there may have been none had biological replicates been used for WT samples as well. This is usually only a problem when you try to validate the results (using new biologically replicated samples), if there are many genes that fail to validate. Since the validation step is usually much slower and laborious, decreasing the number of false positives in the microarray step is often worth the time and effort. Best, Jim -- James W. MacDonald Affymetrix and cDNA Microarray Core University of Michigan Cancer Center 1500 E. Medical Center Drive 7410 CCGC Ann Arbor MI 48109 734-647-5623 ------------------------------ Message: 7 Date: Wed, 15 Feb 2006 09:49:12 -0800 From: Maurice Melancon <dmso12@gmail.com> Subject: [BioC] interpretation of vsn normalized data To: bioconductor at stat.math.ethz.ch Message-ID: <87ba8bf70602150949s6e9e2af4t5ca024b1b77d394b at mail.gmail.com> Content-Type: text/plain Hello All, I used vsn to normalze my one-channel cDNA microarray experiment. I'm sorry if this is an elementary question (I'm not a math person) but can vsn data be interpreted in similar fashion to log2 data, e.g. 1 log vale = 2-fold induction? What would be the appropriate transformation to get to either log2 or raw data from vsn data? Briefly, what I did was to normalize using vsn, then I used SAS to run anovas with pairwise comparisons and anova slicing. Using the estimate function returns estimated differences between the reported means. I am seeking then to bridge the gap between these estimates and actual fold changes. I think this can be done, but I am unsure about how to either reverse-transform the vsn data or how to interpret it biologically (e.g. 1 log = 2x fold change) WIth thanks Maurice [[alternative HTML version deleted]] ------------------------------ Message: 8 Date: Tue, 14 Feb 2006 16:16:49 -0800 From: Maurice Melancon <dmso12@gmail.com> Subject: [BioC] interpretation of vsn normalized data To: bioconductor at stat.math.ethz.ch Message-ID: <87ba8bf70602141616t5fb5275o29c41c9a950f4825 at mail.gmail.com> Content-Type: text/plain Hello All, I used vsn to normalze my one-channel cDNA microarray experiment. I'm sorry if this is an elementary question (I'm not a math person) but can vsn data be interpreted in similar fashion to log2 data, e.g. 1 log vale = 2-fold induction? What would be the appropriate transformation to get to either log2 or raw data from vsn data? WIth thanks Maurice [[alternative HTML version deleted]] ------------------------------ Message: 9 Date: Tue, 14 Feb 2006 10:30:23 +0100 (CET) From: kfbargad@ehu.es Subject: [BioC] Fold Change values after RMA To: bioconductor at stat.math.ethz.ch Message-ID: <1759269561kfbargad at ehu.es> Content-Type: text/plain; charset="ISO-8859-1" Dear List, I have come across an article (Choudary et al. 2005, PNAS 102,15653- 15658) where they state that the FC values after RMA preprocessing "always" remain below a maximum of 2.0 Is this right? I have performed some analyses using RMA, quantile normalisation and limma and am getting M values higher than 2, and if M = log2(FC), then FC values are higher than 4. What am I missing? Any comments on this? Thanks in advance, David ------------------------------ Message: 10 Date: Thu, 16 Feb 2006 18:16:24 +0000 From: Wolfgang Huber <huber@ebi.ac.uk> Subject: Re: [BioC] interpretation of vsn normalized data To: Maurice Melancon <dmso12 at="" gmail.com=""> Cc: bioconductor at stat.math.ethz.ch Message-ID: <43F4C178.6020305 at ebi.ac.uk> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Hi Maurice, in statistics it is sometimes useful to differentiate between (a) the estimator and (b) the true underlying quantity that you want to estimate. For example, if you want to estimate the expectation value of a symmetric distribution, you can use the mean, or the median as estimators. They are both correct, but depending on the data they can provide different, and more or less appropriate answers. With microarrays, (b) is the fold-change, that is the change in mRNA abundance. The log-ratio of fluorescence intensities is a simple and intuitive estimator for this, but if the fluorescence intensities become small, this estimator can have unpleasant properties, like large variance. The glog-ratio (what vsn provides) is an alternative estimator, which avoids the variance explosion, for the price of being biased towards 0 when the fluorescence intensities are small. Note that the vsn function returns glog to base e (so a glog-ratio of 1 corresponds to an estimated fold change of exp(1) = 2.718..) while many other packages use log2. Hope this helps Wolfgang Maurice Melancon wrote: > Hello All, > > I used vsn to normalze my one-channel cDNA microarray experiment. I'm sorry > if this is an elementary question (I'm not a math person) but can vsn data > be interpreted in similar fashion to log2 data, e.g. 1 log vale = 2-fold > induction? What would be the appropriate transformation to get to either > log2 or raw data from vsn data? > > Briefly, what I did was to normalize using vsn, then I used SAS to run > anovas with pairwise comparisons and anova slicing. Using the estimate > function returns estimated differences between the reported means. I am > seeking then to bridge the gap between these estimates and actual fold > changes. I think this can be done, but I am unsure about how to either > reverse-transform the vsn data or how to interpret it biologically (e.g. 1 > log = 2x fold change) > > WIth thanks > > Maurice > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor -- Best regards Wolfgang ------------------------------------- Wolfgang Huber European Bioinformatics Institute European Molecular Biology Laboratory Cambridge CB10 1SD England Phone: +44 1223 494642 Fax: +44 1223 494486 Http: www.ebi.ac.uk/huber ------------------------------ Message: 11 Date: Wed, 15 Feb 2006 13:37:00 -0500 From: "James W. MacDonald" <jmacdon@med.umich.edu> Subject: Re: [BioC] Fold Change values after RMA To: kfbargad at ehu.es Cc: bioconductor at stat.math.ethz.ch Message-ID: <43F374CC.4060407 at med.umich.edu> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Hi David, kfbargad at ehu.es wrote: > Dear List, > > I have come across an article (Choudary et al. 2005, PNAS 102,15653- > 15658) where they state that the FC values after RMA > preprocessing "always" remain below a maximum of 2.0 Is this right? No, this is not correct. There are other factual errors in that paper as well, which makes me wonder if there was a breakdown in communication between the statisticians and those who wrote the paper. That said, it is my understanding that fold change values in brain are often very small, so they may simply be trying to indicate that using a fold change of two is not reasonable in that context. Best, Jim > > I have performed some analyses using RMA, quantile normalisation and > limma and am getting M values higher than 2, and if M = log2(FC), then > FC values are higher than 4. What am I missing? Any comments on this? > > Thanks in advance, > > David > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor -- James W. MacDonald Affymetrix and cDNA Microarray Core University of Michigan Cancer Center 1500 E. Medical Center Drive 7410 CCGC Ann Arbor MI 48109 734-647-5623 ------------------------------ Message: 12 Date: Wed, 15 Feb 2006 11:28:39 -0800 From: Nianhua Li <nli@fhcrc.org> Subject: [BioC] ANN: BioC2006 Conference Scheduled for August in Seattle To: bioconductor at stat.math.ethz.ch, r-help at stat.math.ethz.ch Message-ID: <43F380E7.9010603 at fhcrc.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed ================================== BioC2006 Where Software and Biology Connect ================================== This conference will highlight current developments within and beyond Bioconductor, a world-wide open source and open development software project for the analysis and comprehension of genomic data. Our goal is to provide a forum in which to discuss the use and design of software for analyzing data arising in biology with a focus on Bioconductor and genomic data. Where: Fred Hutchinson Cancer Research Center Seattle WA. When: August 3 and 4, 2006 What: Morning Talks: 8:30-12:00 Afternoon Practicals: 2:00-5:00 Thursday Evening 5:00-7:30 Posters and Wine & Cheese Fees: 300 USD for attendees registered before July 1 250 USD for Bioconductor package maintainers or FHCRC employees 125 USD for enrolled full-time students The online registration form and conference details are now available at http://www.bioconductor.org/BioC2006 (You will be redirected to our secure server: https://cobra.fhcrc.org/BioC2006 ------------------------------ Message: 13 Date: Wed, 15 Feb 2006 20:41:53 -0500 From: knaxerov@ix.urz.uni-heidelberg.de Subject: [BioC] Run GOHyperG without specifying a chip To: bioconductor at stat.math.ethz.ch Message-ID: <20060215204153.9bs7tl84f4c4s0sw at wwwmail-new.urz.uni- heidelberg.de> Content-Type: text/plain; charset=ISO-8859-1 Hello everyone, here's a really trivial question, but I can't find the answer anywhere: can I use GOHyperG() in GOstats without specifying a particular microarray chip? I'd simply like to pass a list of EntrezGene IDs as my "total population". Thanks!!!! Kamila ------------------------------ Message: 14 Date: Thu, 16 Feb 2006 09:07:58 +0100 From: John.Brozek@it-omics.com Subject: [BioC] Data Frame Error in Affycomp To: "bioconductor at stat.math.ethz.ch" <bioconductor at="" stat.math.ethz.ch=""> Message-ID: <ofbf9d5073.df6d05ad-onc1257117.002c3b0e-c1257117.002cad01 at="" genfit.com=""> Content-Type: text/plain Dear all, I encountered the same problem described by monnie McGee using the Affycomp library. As adviced by the vignette I've used the " read.newspikein" function with HG-U133A spikein data. >library(affy) >library(affycomp) >spike133 <- ReadAffy(...) >eset <- expresso(spike133, bgcorrect.method="rma", normalize.method="quantiles", pmcorrect.method="pmonly", summary.method="medianpolish") >new.eset <- exprs(eset) >write.table(data.frame(new.eset,check.names=FALSE),"rma-133.csv",sep= ",",col.names=NA,quote=FALSE) >read.newspikein("rma-133.csv") Error in "[.data.frame"(s, , rownames(pData(pd))) : undefined columns selected Any suggestions ? Thanks John Brozek [[alternative HTML version deleted]] ------------------------------ Message: 15 Date: Wed, 15 Feb 2006 10:15:34 -0800 From: Ben Bolstad <bmb@bmbolstad.com> Subject: Re: [BioC] Fold Change values after RMA To: kfbargad at ehu.es Cc: bioconductor at stat.math.ethz.ch Message-ID: <1140027334.3781.19.camel at localhost.localdomain> Content-Type: text/plain Two points: 1. In general estimates of FC off microarrays tend to be smaller than the truth, irrespective of processing algorithm. 2. There is no specific reason why RMA should limit to FC values of 2.0 (and it does not do this in general, as you have observed with your own dataset). In the case of Choudary et al they are studying gene expression changes in the brain and my understanding is that these fold changes are typically small, perhaps explaining the comment. Ben On Tue, 2006-02-14 at 10:30 +0100, kfbargad at ehu.es wrote: > Dear List, > > I have come across an article (Choudary et al. 2005, PNAS 102,15653- > 15658) where they state that the FC values after RMA > preprocessing "always" remain below a maximum of 2.0 Is this right? > > I have performed some analyses using RMA, quantile normalisation and > limma and am getting M values higher than 2, and if M = log2(FC), then > FC values are higher than 4. What am I missing? Any comments on this? > > Thanks in advance, > > David > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor ------------------------------ Message: 16 Date: Thu, 16 Feb 2006 10:24:34 +0100 From: "Benjamin Otto" <b.otto@uke.uni-hamburg.de> Subject: [BioC] Differences: mas5/mas5calls vs. call.expr/pairwise.comparison To: "BioClist" <bioconductor at="" stat.math.ethz.ch=""> Message-ID: <noeokkcpbgiaippdonmgceljcbaa.b.otto at="" uke.uni-="" hamburg.de=""> Content-Type: text/plain; charset="iso-8859-1" Dear BioC members, in my last calculations I noticed that the affy-package combination of mas5() and mas5calls() results in different Present/Absent calls than the simpleaffy-package version with call.exprs() and pairwise.comparison(). That was the more surprising for me as I thought simpleaffy was just some wrapper around the affy-package automating some high-level steps. A comparison of the results with the ones returned by the affymetrix software revealed that the simpleaffy version is nearly identical while the affy version is different one. Is there some error in my code? The exact commands I used were: x <- read.affy() #version 1: mas <- mas5(x,sc=sometgt) mas.call <- mas5calls(x) #version 2: simplemas <- call.exprs(x,"mas5",sc=sometgt) simplemas.cmp <- pairwise.comparison(simplemas,"treatment",spots=x) regards Benjamin ------------------------------ Message: 17 Date: Thu, 16 Feb 2006 10:03:50 +0000 (GMT) From: Ilhem Diboun <idiboun@biochemistry.ucl.ac.uk> Subject: [BioC] comparing correlation coefficients (fwd) To: bioconductor at stat.math.ethz.ch Message-ID: <pine.lnx.4.44.0602161001490.19097-100000 at="" w3pain=""> Content-Type: TEXT/PLAIN; charset=US-ASCII Dear all I would greaty appreaciate any help with the following. Can Pearson correlation coefficients from data on different range or scale be compared??.For example, if I compute the (r) value from a pair of intensity ratio datasets spanning the range -10 to +10, can I compare it with an the (r) value from another pair of intensity ratio datasets spanning a different range say -5 to +5. Similarily, can I compare the (r) value from correlating a pair of intensity ratio datasets with that from correlating a pair of absolute intensity datasets (where the scale of the data is different). The question that I would want to address from such comparison is whether the ratios covary better than the raw intensities. Please let me know if this is not clear enough... Many thanks. ------------------------------ Message: 18 Date: Thu, 16 Feb 2006 11:21:44 +0100 From: "Benjamin Otto" <b.otto@uke.uni-hamburg.de> Subject: Re: [BioC] Differences: mas5/mas5calls vs. call.expr/pairwise.comparison To: "Benjamin Otto" <b.otto at="" uke.uni-hamburg.de="">, "BioClist" <bioconductor at="" stat.math.ethz.ch=""> Message-ID: <noeokkcpbgiaippdonmggelkcbaa.b.otto at="" uke.uni-="" hamburg.de=""> Content-Type: text/plain; charset="us-ascii" I just checked the expresson values returned by the two methods. The interesting thing is, mas5(x,sc=sometgt) yields results nearly identical to the affymetrix ones. The simplaffy call.exprs(x,"mas5",sc=sometgt) return values a little bit different different. Looking the resulting values it makes no big difference if I call call.exprs() with "mas5" or "mas5-R". So why do I get such different results? Here are my first five rows: > smp2 <- call.exprs(x,"mas5-R",sc=sometgt) > exprs(smp2)[1:5,] 1026_HG-U133.CEL 62_HG-U133.CEL 1007_s_at 4.850861 4.657905 1053_at 2.438049 3.517322 117_at 4.199809 4.803548 121_at 6.758776 6.241071 1255_g_at 4.450771 1.855238 > exprs(simplemas)[1:5,] 1026_HG-U133.CEL 62_HG-U133.CEL 1007_s_at 4.851125 4.658149 1053_at 2.438313 3.517566 117_at 4.200072 4.803792 121_at 6.759039 6.241314 1255_g_at 4.451034 1.855482 > exprs(mas)[1:5,] 1026_HG-U133.CEL 62_HG-U133.CEL 1007_s_at 28.857236 25.244638 1053_at 5.419085 11.450371 117_at 18.376736 27.926217 121_at 108.291468 75.639653 1255_g_at 21.868322 3.618115 > log(exprs(mas))[1:5,] 1026_HG-U133.CEL 62_HG-U133.CEL 1007_s_at 3.362361 3.228614 1053_at 1.689927 2.438022 117_at 2.911086 3.329566 121_at 4.684826 4.325981 1255_g_at 3.085039 1.285953 and here is a copy of the corresponding function calls of mas5() and call.exprs(): > mas5 function (object, normalize = TRUE, sc = 500, analysis = "absolute", ...) { res <- expresso(object, bgcorrect.method = "mas", pmcorrect.method = "mas", normalize = FALSE, summary.method = "mas", ...) if (normalize) res <- affy.scalevalue.exprSet(res, sc = sc, analysis = analysis) return(res) } #calls.exprs ... else if (algorithm == "mas5-R") { if is.na(method)) { tmp1 <- expresso(x, normalize = FALSE, bgcorrect.method = "mas", pmcorrect.method = "mas", summary.method = "mas") tmp <- affy.scalevalue.exprSet(tmp1, sc = sc) } else { tmp1 <- expresso(x, normalize.method = method, bgcorrect.method = "mas", pmcorrect.method = "mas", summary.method = "mas") tmp <- affy.scalevalue.exprSet(tmp1, sc = sc) } ... else if (algorithm == "mas5") { tmp <- justMAS(x, tgt = sc) if (!do.log) { exprs(tmp) <- 2^exprs(tmp) } ... I don't really see the difference. regards, benjamin > -----Original Message----- > From: bioconductor-bounces at stat.math.ethz.ch > [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of Benjamin > Otto > Sent: 16 February 2006 10:25 > To: BioClist > Subject: [BioC] Differences: mas5/mas5calls vs. > call.expr/pairwise.comparison > > > Dear BioC members, > > in my last calculations I noticed that the affy-package combination of > mas5() and mas5calls() results in different Present/Absent calls than the > simpleaffy-package version with call.exprs() and > pairwise.comparison(). That > was the more surprising for me as I thought simpleaffy was just > some wrapper > around the affy-package automating some high-level steps. A comparison of > the results with the ones returned by the affymetrix software > revealed that > the simpleaffy version is nearly identical while the affy version is > different one. Is there some error in my code? > The exact commands I used were: > > x <- read.affy() > > #version 1: > mas <- mas5(x,sc=sometgt) > mas.call <- mas5calls(x) > > #version 2: > simplemas <- call.exprs(x,"mas5",sc=sometgt) > simplemas.cmp <- pairwise.comparison(simplemas,"treatment",spots=x) > > > regards > Benjamin > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > ------------------------------ Message: 19 Date: Thu, 16 Feb 2006 11:55:11 +0100 From: Antoine Lucas <antoinelucas@libertysurf.fr> Subject: [BioC] GeneR To: bioconductor at stat.math.ethz.ch Message-ID: <20060216115511.1817fcb4.antoinelucas at libertysurf.fr> Content-Type: text/plain; charset=ISO-8859-15 Dear R users, There is a new release of package GeneR. Briefly, GeneR package provides tools to manipulate large DNA/protein sequences (like a whole chromosome). By "manipulate" I mean of course extract, append, concatenate; but also look for "word", orfs or masked positions, or to returns compositions of nono-di-tri nucleotides. It has been designed to work with large vectors so that it can concatenate all exons of a chromosome at once, or the composition of same exons. There are I/O functions to work with standard bank files like Fasta, Genebank and Embl. And last, but not least we provide a complete arithmetic on "segments" i.e. functions like "union", "intersection", "not" between two sets of segments. One application could be to "substract" all CDS from genes and deduce UTR regions. In new release we add some functions working on Embl files (to read headers, or features), and correct some bugs. I hope you will enjoy it ! Regards, Antoine. -- Antoine Lucas Centre de g?n?tique Mol?culaire, CNRS 91198 Gif sur Yvette Cedex Tel: (33)1 69 82 38 89 Fax: (33)1 69 82 38 77 ------------------------------ _______________________________________________ Bioconductor mailing list Bioconductor at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/bioconductor End of Bioconductor Digest, Vol 36, Issue 14 ********************************************
0
Entering edit mode
Seth Falcon ★ 7.4k
@seth-falcon-992
Last seen 7.1 years ago
Hi Monnie, I have a few suggestions, but as I haven't used the functions you want to use I'm not certain they will help. First, if what you want is to get a hold of the expression matrix in the exprSet, you do not need to convert to data.frame first. Instead of this: > gs.rma.fr = as.data.frame.exprSet(gs.rma) > gs.mat = matrixgs.rma.fr$exprs,nrow=14010,ncol=6) Try: gs.mat = exprs(gs.rma) The error msg you saw: > samr(data,resp.type="Two class unpaired",nperms=20) Error in inherits(x, "data.frame") : (subscript) logical subscript too long seems to suggest that samr is expecting a data.frame, not a list. They are similar, but not the same. So you might try: data = data.frame(x=gs.mat.sam, y=y , geneid = geneid, genenames=rep(genenames,2), logged2=TRUE) HTH, + seth ADD COMMENT 0 Entering edit mode @james-w-macdonald-5106 Last seen 2 days ago United States Hi Monnie, McGee, Monnie wrote: > Dear Group, > > I trying to use samr. I have read a previous post about the ease of > use of siggenes vs. samr. It is so true! I used siggenes > originally, but that doesn't help me with the problem I am having. I > still need to use samr because I want to assess sample size using > sam.assess.samplesize. To assess sample size using sam, I need to > supply sam.assess.samplesize with a "data" vector. I can't > understand how to form this vector - perhaps a manual would help, but > the manual is not on the SAM website as the R-help files claim. I > am using a PowerMac G5 with R Version 2.2.1 (2005-12-20 r36812) > installed. > > I would like to use samr to assess the sample size requirements for > an experiment I am planning. I have some training data, which is > the drosophila spike-in experiment data given in Choe, S. E., > Boutros, M., Michelson, A. M., Church, G. M., & Halfon, M. S. (2005). > Preferred analysis methods for Affymetrix GeneChips revealed by a > wholly defined control dataset. Genome Biology, 6, R16. > > Here is what I have done: gsbatch = ReadAffy() # the experiment > consists of 3 technical replicates from "control" chips # and 3 > technical replicates from Spike-in chips on th DrosGenome1 chip > >> gs.rma = rma(gsbatch) # get expression values > > ## get the exprSet into a format that samr can manage: > >> gs.rma.fr = as.data.frame.exprSet(gs.rma) gs.mat = >> matrixgs.rma.fr$exprs,nrow=14010,ncol=6) gs.mat.con = gs.mat[,1:3] >> gs.mat.si = gs.mat[,4:6] gs.mat.sam = rbind(gs.mat.con,gs.mat.si) >> ## this is a matrix with dim 28020 by 3, control arrays on top, >> spike-ins on bottom > > ## grouping vector > >> y = c(rep(1,14010),rep(2,14010)) geneid = >> as.character(1:nrow(gs.mat.sam)) genenames = >> gs.rma.fr$genenames[1:14010] data = list(x=gs.mat.sam, y =y , >> geneid = geneid, genenames = rep(genenames,2),logged2=TRUE) >> samr(data,resp.type="Two class unpaired",nperms=20) > > Error in inherits(x, "data.frame") : (subscript) logical subscript > too long I also tried deleting the geneid & genenames vectors from > the "data" list, but still received the same error. > > I can't figure this out. I am sure the problem is in the way that I > defined the "data" list, but, without a manual, I really don't > understand what I did wrong. Ah, but there is a manual, or at least there are man pages! Terseness is the norm, so you have to be very careful when you read what is written, as the devil is often in the details. You have length(y) == 28020, whereas the man page for samr says: data: Data object with components x- p by n matrix of features, one observation per column (missing values allowed); y- n-vector of outcome measurements; censoring.status- n-vector of censoring censoring.status (1= died or event occurred, 0=survived, or event was censored), needed for a censored survival outcome Note that the y vector is supposed to be an n-vector of outcome measurements whereas you have a p-vector of outcome measurements. Also note that 'x' is supposed to be a matrix, whereas you likely have a data.frame. Sometimes the term matrix is used to mean 'any rectangular arrangement of data', but a data.frame is in fact a list, so if the author really means matrix and doesn't have any error checking to coerce a data.frame to a matrix, an error may occur as well. I find the examples are often more enlightening than Arguments section, so they are always worth reading. I am confused why you are 'stacking' the data like this. Instead, I would think something like this would give you what you want (note that 'data' is a really bad variable name to use, as you are masking the data() function - it is useful to type a possible variable name at an R prompt first to see if a function pops up): Data <- list(x = as.matrix(exprs(gs.rma)), y = rep(1:2, each = 3), genenames = geneNames(gs.rma), geneid = 1:14010) mysamr <- samr(Data, resp.type = "Two class unpaired", nperms = 20) HTH, Jim > > Thank you for your help, Monnie > > Monnie McGee, Ph.D. Assistant Professor Department of Statistical > Science Southern Methodist University Ph: 214-768-2462 Fax: > 214-768-4035 > -- James W. MacDonald Affymetrix and cDNA Microarray Core University of Michigan Cancer Center 1500 E. Medical Center Drive 7410 CCGC Ann Arbor MI 48109 734-647-5623 ADD COMMENT 0 Entering edit mode McGee, Monnie ▴ 300 @mcgee-monnie-1108 Last seen 7.1 years ago Dear Jim, Thanks very much. With a slight modification, your code worked. The modification was this: Instead of: Data <- list(x = as.matrix(exprs(gs.rma)), y = rep(1:2, each = 3), genenames = geneNames(gs.rma), geneid = 1:14010) mysamr <- samr(Data, resp.type = "Two class unpaired", nperms = 20) I needed to use Data <- list(x = as.matrix(exprs(gs.rma)), y = rep(1:2, each = 3), genenames = geneNames(gs.rma), geneid = 1:14010, logged2=TRUE) mysamr <- samr(Data, resp.type = "Two class unpaired", nperms = 20) Sam returns the following error without the "logged2" part in the data statement: perm= 1 Error in if (logged2) { : argument is of length zero I thought I was following the example given in the man pages perfectly, but, evidently, I wasn't. The part about y being an n-vector got away from me. And I got the variable name "data" from the example, by the way. Thanks for the tip on masking R functions. Best & Thanks again, Monnie Monnie McGee, Ph.D. Assistant Professor Department of Statistical Science Southern Methodist University Ph: 214-768-2462 Fax: 214-768-4035 -----Original Message----- From: James W. MacDonald [mailto:jmacdon@med.umich.edu] Sent: Thu 2/16/2006 1:13 PM To: McGee, Monnie Cc: bioconductor at stat.math.ethz.ch Subject: Re: [BioC] Inherits(x,"data.frame") error in SamR Hi Monnie, McGee, Monnie wrote: > Dear Group, > > I trying to use samr. I have read a previous post about the ease of > use of siggenes vs. samr. It is so true! I used siggenes > originally, but that doesn't help me with the problem I am having. I > still need to use samr because I want to assess sample size using > sam.assess.samplesize. To assess sample size using sam, I need to > supply sam.assess.samplesize with a "data" vector. I can't > understand how to form this vector - perhaps a manual would help, but > the manual is not on the SAM website as the R-help files claim. I > am using a PowerMac G5 with R Version 2.2.1 (2005-12-20 r36812) > installed. > > I would like to use samr to assess the sample size requirements for > an experiment I am planning. I have some training data, which is > the drosophila spike-in experiment data given in Choe, S. E., > Boutros, M., Michelson, A. M., Church, G. M., & Halfon, M. S. (2005). > Preferred analysis methods for Affymetrix GeneChips revealed by a > wholly defined control dataset. Genome Biology, 6, R16. > > Here is what I have done: gsbatch = ReadAffy() # the experiment > consists of 3 technical replicates from "control" chips # and 3 > technical replicates from Spike-in chips on th DrosGenome1 chip > >> gs.rma = rma(gsbatch) # get expression values > > ## get the exprSet into a format that samr can manage: > >> gs.rma.fr = as.data.frame.exprSet(gs.rma) gs.mat = >> matrixgs.rma.fr$exprs,nrow=14010,ncol=6) gs.mat.con = gs.mat[,1:3] >> gs.mat.si = gs.mat[,4:6] gs.mat.sam = rbind(gs.mat.con,gs.mat.si) >> ## this is a matrix with dim 28020 by 3, control arrays on top, >> spike-ins on bottom > > ## grouping vector > >> y = c(rep(1,14010),rep(2,14010)) geneid = >> as.character(1:nrow(gs.mat.sam)) genenames = >> gs.rma.fr\$genenames[1:14010] data = list(x=gs.mat.sam, y =y , >> geneid = geneid, genenames = rep(genenames,2),logged2=TRUE) >> samr(data,resp.type="Two class unpaired",nperms=20) > > Error in inherits(x, "data.frame") : (subscript) logical subscript > too long I also tried deleting the geneid & genenames vectors from > the "data" list, but still received the same error. > > I can't figure this out. I am sure the problem is in the way that I > defined the "data" list, but, without a manual, I really don't > understand what I did wrong. Ah, but there is a manual, or at least there are man pages! Terseness is the norm, so you have to be very careful when you read what is written, as the devil is often in the details. You have length(y) == 28020, whereas the man page for samr says: data: Data object with components x- p by n matrix of features, one observation per column (missing values allowed); y- n-vector of outcome measurements; censoring.status- n-vector of censoring censoring.status (1= died or event occurred, 0=survived, or event was censored), needed for a censored survival outcome Note that the y vector is supposed to be an n-vector of outcome measurements whereas you have a p-vector of outcome measurements. Also note that 'x' is supposed to be a matrix, whereas you likely have a data.frame. Sometimes the term matrix is used to mean 'any rectangular arrangement of data', but a data.frame is in fact a list, so if the author really means matrix and doesn't have any error checking to coerce a data.frame to a matrix, an error may occur as well. I find the examples are often more enlightening than Arguments section, so they are always worth reading. I am confused why you are 'stacking' the data like this. Instead, I would think something like this would give you what you want (note that 'data' is a really bad variable name to use, as you are masking the data() function - it is useful to type a possible variable name at an R prompt first to see if a function pops up): Data <- list(x = as.matrix(exprs(gs.rma)), y = rep(1:2, each = 3), genenames = geneNames(gs.rma), geneid = 1:14010) mysamr <- samr(Data, resp.type = "Two class unpaired", nperms = 20) HTH, Jim > > Thank you for your help, Monnie > > Monnie McGee, Ph.D. Assistant Professor Department of Statistical > Science Southern Methodist University Ph: 214-768-2462 Fax: > 214-768-4035 > -- James W. MacDonald Affymetrix and cDNA Microarray Core University of Michigan Cancer Center 1500 E. Medical Center Drive 7410 CCGC Ann Arbor MI 48109 734-647-5623