Options for spatial normalization? (Oliver Homann)
Hi Oliver,

There is also the nnNorm package, which does spatial normalization within print-tips. It uses pseudo-spatial coordinates to avoid over-normalization, and the values of a given spot are not used when computing the amount of bias for that spot (via cross-validation using neural nets). If your arrays do not show spatial artifacts, nnNorm normalization will behave similarly to print-tip loess.

Adi Laurentiu TARCA, Ph.D.
Research Associate, NIH-Perinatology Research Branch, Wayne State University, 3990 John R., Detroit, Michigan 48201
Tel: 1-313-5775305 Cell: 1-313-4043116
http://vortex.cs.wayne.edu/tarca/

> [1] Are there any other methods for spatial normalization of two-color
> data implemented in R?
> [2] In my attempts to develop a normalization pipeline I have been
> stymied by the need to ascertain on a slide-by-slide basis which types
> of normalization are needed (e.g. pin/intensity/spatial). Do any of you
> have a "rule-of-thumb", or better yet a quantitative approach to making
> this decision?

-----Original Message-----
From: bioconductor-bounces@stat.math.ethz.ch [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of bioconductor-request at stat.math.ethz.ch
Sent: Thursday, March 08, 2007 6:00 AM
To: bioconductor at stat.math.ethz.ch
Subject: Bioconductor Digest, Vol 49, Issue 8

Send Bioconductor mailing list submissions to bioconductor at stat.math.ethz.ch

To subscribe or unsubscribe via the World Wide Web, visit https://stat.ethz.ch/mailman/listinfo/bioconductor or, via email, send a message with subject or body 'help' to bioconductor-request at stat.math.ethz.ch

You can reach the person managing the list at bioconductor-owner at stat.math.ethz.ch

When replying, please edit your Subject line so it is more specific than "Re: Contents of Bioconductor digest..."

Today's Topics:

1. Re: crlmm warning (Morten Mattingsdal)
2. Re: crlmm warning (James W. MacDonald)
3. Re: crlmm warning (Benilton Carvalho)
4. 
Re: crlmm warning (Morten Mattingsdal)
5. Re: Unable to compile the impute package on debian with gcc 4.1 (Seth Falcon)
6. Re: rma on new samples (Hassane, Duane)
7. Re: crlmm warning (Benilton Carvalho)
8. NEwbie: How to determine significant enrichment differences of GO term vectors? (Johannes Graumann)
9. Re: rma on new samples (Kuhn, Max)
10. Options for spatial normalization? (Oliver Homann)
11. how to enlarge memory (Yihuan Xu)
12. Re: how to enlarge memory (James W. MacDonald)
13. Re: Options for spatial normalization? (Jay Konieczka)
14. RMA, RefRMA questions (James Anderson)
15. Re: RMA, RefRMA questions (Kuhn, Max)
16. Re: RMA, RefRMA questions (James W. MacDonald)
17. Re: RMA, RefRMA questions (Kuhn, Max)
18. two questions about limma (cont.2) (De-Jian,ZHAO)

----------------------------------------------------------------------

Message: 1
Date: Wed, 07 Mar 2007 13:10:59 +0100
From: Morten Mattingsdal <mortenm@inbox.com>
Subject: Re: [BioC] crlmm warning
To: Morten Mattingsdal <mortenm at inbox.com>
Cc: BioC <bioconductor at stat.math.ethz.ch>
Message-ID: <45EEABD3.70208 at inbox.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hello again,

I'll just answer that myself, since I posted a bit too prematurely. It seems I need to use the makePlatformDesign package, with the code:

> library(makePlatformDesign)
> makePDpackage("Mapping50K_Xba240.CDF", "Mapping50K_Xba240_probe_fasta", "Mapping50K_Xba240_annot.csv", type="SNP")

R CMD INSTALL pdmapping50kxba240

I find it ... non-trivial ... to locate the fasta file for the Nsp and Sty arrays, but I think I'll harass Affymetrix Inc. for that. I am aware that the oligo package is in development, but it would be nice to have some vignettes to read.

regards
morten

Morten Mattingsdal wrote:
> Hello everyone,
>
> I've managed to get the oligo package and the crlmm function up and
> running. I've also installed the meta-data libraries from this URL
> http://www.biostat.jhsph.edu/~bcarvalh/oligoAddOns.tar.gz
> provided by Benilton
>
> but when I run the commands:
>
> >files <- list.celfiles()
> >snpFSet <- read.celfiles(files)
> >
> >Welcome to the pd.Mapping250K_Nsp prototype pdInfo package
> >WARNING: DO NOT USE THIS PACKAGE FOR ANY ANALYSIS.
> >THIS PACKAGE IS FOR INTERFACE PROTOTYPE USE ONLY!
> >THE DATA HAS NOT BEEN VALIDATED AND LIKELY HAS ERRORS.
> >Have fun!
>
> I am happy that this information is warning me, but to my point:
> - When will the "safe" mapping libraries come?
> - Can I build this by myself?
>
> I want to compare brlmm from Affymetrix and crlmm from BioC genotype
> calls for my data
>
> regards
> morten
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

------------------------------

Message: 2
Date: Wed, 07 Mar 2007 08:49:07 -0500
From: "James W. MacDonald" <jmacdon@med.umich.edu>
Subject: Re: [BioC] crlmm warning
To: Morten Mattingsdal <mortenm at inbox.com>
Cc: BioC <bioconductor at stat.math.ethz.ch>
Message-ID: <45EEC2D3.6000504 at med.umich.edu>
Content-Type: text/plain; charset="utf-8"; format=flowed

Morten Mattingsdal wrote:
>
> I find it ... non-trivial ... to locate the fasta file for Nsp and Sty
> arrays, but I think I'll harass Affymetrix Inc. for that.
>

These files are available on the support page for the 500K SNP arrays (near the bottom).

http://www.affymetrix.com/support/technical/byproduct.affx?product=500k

--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E.
Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623

**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues.

------------------------------

Message: 3
Date: Wed, 7 Mar 2007 09:15:27 -0500
From: Benilton Carvalho <bcarvalh@jhsph.edu>
Subject: Re: [BioC] crlmm warning
To: Morten Mattingsdal <mortenm at inbox.com>
Cc: BioC <bioconductor at stat.math.ethz.ch>
Message-ID: <efc72c58-d847-4952-b251-c92a73860ff5 at jhsph.edu>
Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed

Hi Morten,

the 'safe' versions will become available with BioC 2.0.

Regarding the email you sent next (about using makePlatformDesign for SNP arrays): that's not required anymore. Everything you need is in that oligoAddOns.tar.gz. All you need to do is install the packages and use the latest oligo.

Let me know how things go,

b

------------------------------

Message: 4
Date: Wed, 07 Mar 2007 15:55:24 +0100
From: Morten Mattingsdal <mortenm@inbox.com>
Subject: Re: [BioC] crlmm warning
To: Benilton Carvalho <bcarvalh at jhsph.edu>
Cc: BioC <bioconductor at stat.math.ethz.ch>
Message-ID: <45EED25C.8020007 at inbox.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi Benilton,

Reading the data, loading the annotation libraries and RMA normalizing all go perfectly! But I encounter an error in crlmm, which will probably expose my ignorance here: crlmm complains that a "correctionFile is not found".

> crlmm_NSP=crlmm(rma_NSP)
Error in crlmm(rma_NSP) : Provide correctionFile.
If the correctionFile is not found, it will be created and it will contain the EM results.

The error claims that, if not found, it will be created. I don't think it is created, although I have an R object called "reference".

The crlmm doc says: The 'correction' argument is a list with the following elements: 'f0' (scalar), 'fs' (numeric vector), 'pis' (numeric matrix) and 'snr'.

I can't seem to figure out the nature of these correction elements nor the data format of this file. Could you be so kind and explain a bit what this means?
regards
morten

NB I'll just paste all commands and output for your leisure

> library(oligo)
> files <- list.celfiles()
> files
 [1] "1580_Nsp1_090207.CEL" "1620_Nsp1_020307.CEL" "1736_Nsp1_020307.CEL"
 [4] "1812_Nsp1_020307.CEL" "355_Nsp1_090207.CEL"  "4379_Nsp1_020307.CEL"
 [7] "4436_Nsp1_020307.CEL" "5968_Nsp1_020307.CEL" "635_Nsp1_090207.CEL"
[10] "654_Nsp1_090207.CEL"  "659_Nsp1_090207.CEL"  "680_Nsp1_090207.CEL"
> NSP <- read.celfiles(files)
Incompatible phenoData object. Created a new one.
Welcome to the pd.Mapping250K_Nsp prototype pdInfo package
WARNING: DO NOT USE THIS PACKAGE FOR ANY ANALYSIS.
THIS PACKAGE IS FOR INTERFACE PROTOTYPE USE ONLY!
THE DATA HAS NOT BEEN VALIDATED AND LIKELY HAS ERRORS.
Have fun!
Platform design info loaded.
> rma_NSP <- snprma(NSP)
Position -4
Position -2
Position -1
Position 0
Position 1
Position 3
Position 4
Loading required package: pd.mapping250k.nsp.crlmm.regions
Calculating Expression
> crlmm_NSP=crlmm(rma_NSP)
Error in crlmm(rma_NSP) : Provide correctionFile.
If the correctionFile is not found, it will be created and it will contain the EM results.
> sessionInfo()
R version 2.5.0 Under development (unstable) (2007-03-04 r40813)
x86_64-unknown-linux-gnu

locale:
LC_CTYPE=no_NO;LC_NUMERIC=C;LC_TIME=C;LC_COLLATE=C;LC_MONETARY=C;LC_MESSAGES=C;LC_PAPER=C;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=C;LC_IDENTIFICATION=C

attached base packages:
[1] "splines"   "tools"     "stats"     "graphics"  "grDevices" "utils"
[7] "datasets"  "methods"   "base"

other attached packages:
pd.mapping250k.nsp.crlmm.regions               pd.mapping250k.nsp
                         "0.1.0"                          "0.1.5"
                     geneplotter                          lattice
                        "1.13.7"                        "0.14-16"
                        annotate                            oligo
                        "1.13.6"                        "0.99.82"
           BufferedMatrixMethods                   BufferedMatrix
                         "0.1.1"                         "0.1.27"
                         RSQLite                              DBI
                        "0.4-20"                         "0.1-12"
                          affyio                          Biobase
                         "1.3.3"                        "1.13.39"

------------------------------

Message: 5
Date: Wed, 07 Mar 2007 07:14:42 -0800
From: Seth Falcon <sfalcon@fhcrc.org>
Subject: Re: [BioC] Unable to compile the impute package on debian with gcc 4.1
To: Sebastian Bauer <sebastian.bauer at charite.de>
Cc: bioconductor at stat.math.ethz.ch
Message-ID: <m2wt1t2ihp.fsf at ziti.local>
Content-Type: text/plain; charset=us-ascii

Sebastian Bauer <sebastian.bauer at charite.de> writes:
> Hi Seth,
>
> Seth Falcon wrote:
>> We are seeing the same issues on our build systems. This is almost
>> certainly due to gfortran being more strict than g77.
>
> Yes, you're right. When compiling the file manually with g77 it runs
> through. Is there any possibility to alter R to take g77 instead of the
> gfortran compiler?

Well, yes. But I suspect you would want all of R to be using the same Fortran compiler, and that the way to do it is to rebuild R from source, setting a config option. The R Admin manual might have some useful details. Before you recompile, I suppose it wouldn't hurt to edit R's Makeconf file located in R_HOME/etc/Makeconf.

>> I have sent the impute package maintainer an email asking that he take
>> a look.

And I've heard back that he has a patch that should be applied this week.

+ seth

--
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
http://bioconductor.org

------------------------------

Message: 6
Date: Wed, 7 Mar 2007 10:26:18 -0500
From: "Hassane, Duane" <duane_hassane@urmc.rochester.edu>
Subject: Re: [BioC] rma on new samples
To: <malick.paye at eu.biomerieux.com>, <bioconductor at stat.math.ethz.ch>
Message-ID: <f66fedf8d4a99b499577086f6c2e4ffc0187c56a at e2k3ms3.urmc-sh.rochester.edu>
Content-Type: text/plain; charset="iso-8859-1"

Malick,

A few months back, Chris Harbron posted a reply to a related question in which the RefPlus package was suggested for this purpose.

Best,
Duane

_________________________________________________
Duane Hassane, Ph.D.
Center for Pediatric Biomedical Research
University of Rochester Medical Center
Rochester, New York 14642

-----Original Message-----
From: bioconductor-bounces@stat.math.ethz.ch [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Malick.PAYE at eu.biomerieux.com
Sent: Monday, March 05, 2007 1:47 PM
To: bioconductor at stat.math.ethz.ch
Subject: [BioC] rma on new samples

Hello,

I work for an in vitro diagnostics company (www.biomerieux.com) and we are interested in classifying patients based on expression profiles (we are working with Affymetrix chips). I built a classification model based on a training set. I now have new samples, and I want to make them comparable with the training set in order to apply my model. We use RMA to compute expression measures. If someone has code to do this, I would be very grateful. Ideally I want to extract the RMA parameters and apply them to my new samples, unless someone has a better idea.

Thanks in advance.

M.P

Malick Paye | bioMérieux | Biomathematician
Phone: (+33)4 78 87 70 97 | Fax: (+33)4 78 87 53 40
[Centre Christophe Mérieux, 5 Rue des Berges, 38004 Cedex 01 Grenoble, France]

----------------------------------------------------------------------

NOTICE: This message and attachments are intended only for the use of their addressee and may contain confidential information belonging to bioMerieux.
If you are not the intended recipient, you are hereby notified that any reading, dissemination, distribution, or copying of this message, or any attachment, is strictly prohibited. If you have received this message in error, please notify the original sender immediately and delete this message, along with any attachments.

[[alternative HTML version deleted]]

------------------------------

Message: 7
Date: Wed, 7 Mar 2007 11:16:31 -0500
From: Benilton Carvalho <bcarvalh@jhsph.edu>
Subject: Re: [BioC] crlmm warning
To: Morten Mattingsdal <mortenm at inbox.com>
Cc: BioC <bioconductor at stat.math.ethz.ch>
Message-ID: <25D76B7F-2D68-4057-8762-15B337546A78 at jhsph.edu>
Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed

Hi Morten,

I'm writing a vignette to help users with oligo. My bad! Sorry for that.

All you need to do is give a file name: if the file does not exist, it will be created; if it exists, it will be loaded. For example, try the following:

crlmm_NSP=crlmm(rma_NSP, correctionFile="nspCorrection.rda")

The reason this is required (at least for now) is that the EM algorithm may take a long time, depending on the sample size. So, once it is done, it saves the results in this correctionFile, which you can just load later in case you want to run CRLMM again.

If for some reason you need to run CRLMM on the exact same data (rma_NSP), by using

crlmm_NSP=crlmm(rma_NSP, correctionFile="nspCorrection.rda")

the EM step will be skipped and the results loaded from nspCorrection.rda instead.

b

------------------------------

Message: 8
Date: Wed, 07 Mar 2007 17:19:28 +0100
From: Johannes Graumann <johannes_graumann@web.de>
Subject: [BioC] NEwbie: How to determine significant enrichment differences of GO term vectors?
To: bioconductor at stat.math.ethz.ch
Message-ID: <esmomh$lfo$1 at sea.gmane.org>
Content-Type: text/plain; charset=us-ascii

Hello,

Please excuse this naive question, but I would appreciate it if someone could point me at the right function(s) to use: I have two vectors containing all GO terms associated with proteins retrieved in two proteomic experiments and would like to figure out for which categories they differ significantly from each other. I am obviously somewhat limited by not being able to use the 'standard' annotation packages, but I have built my own protein -> GenBank -> GO package using AnnBuilder.

Please let me know how you would tackle this.

Thanks for your patience,
Joh

------------------------------

Message: 9
Date: Wed, 7 Mar 2007 13:05:19 -0500
From: "Kuhn, Max" <max.kuhn@pfizer.com>
Subject: Re: [BioC] rma on new samples
To: <malick.paye at eu.biomerieux.com>, <bioconductor at stat.math.ethz.ch>
Message-ID: <71257D09F114DA4A8E134DEAC70F25D307B70F0C at groamrexm03.amer.pfizer.com>
Content-Type: text/plain; charset="iso-8859-1"

There are a few ways to do this. The basic issue is that the RMA normalization and summarization steps use data across multiple chips.

There is a package called refRMA that normalizes to a pre-defined database of data generated by GeneLogic.
See http://www.biomedcentral.com/content/pdf/1471-2105-7-464.pdf for more details.

I have approached the problem by keeping the background correction the same and then
- normalizing the PM values of new samples to a reference distribution defined by the training set PM values
- computing a trimmed mean to get the summary measure

I'm sure that others have done something similar too. I have code to do this (the normalization is based on some code from limma:::normalizeQuantiles). I will send you (or anyone else) the code in another email if you are interested.

I've compared the performance of my approach, regular RMA and MAS5, and found that the two RMA methods are pretty similar and MAS5 did very poorly. This could have been due to my particular problem and data (I was using the Staph chip), so take that for what it's worth.

Max

----------------------------------------------------------------------
LEGAL NOTICE
Unless expressly stated otherwise, this messag...{{dropped}}

------------------------------

Message: 10
Date: Wed, 07 Mar 2007 10:47:03 -0800
From: "Oliver Homann" <oliver.homann@ucsf.edu>
Subject: [BioC] Options for spatial normalization?
To: bioconductor at stat.math.ethz.ch
Message-ID: <45EF08A7.7040604 at ucsf.edu>
Content-Type: text/plain; charset=iso-8859-1; format=flowed

Hello,

I was wondering if anyone could offer me some advice on the best approach for normalizing my two-color expression arrays. I will be processing a large number of arrays, and ideally I would like to develop a semi-automated normalization pipeline.
Some of my arrays have issues with spatial effects, and currently the only method that I'm aware of for dealing with such effects is in the Maanova package (the "rlowess" method of transform.madata). However, this method is far from ideal for my purposes because it utilizes grid layout rather than 'X' and 'Y' positions to calculate proximity (which causes some problems with gaps between blocks) and because it is coupled to an intensity-based normalization (which limits the flexibility somewhat).

I have a few specific questions:
[1] Are there any other methods for spatial normalization of two-color data implemented in R?
[2] In my attempts to develop a normalization pipeline I have been stymied by the need to ascertain on a slide-by-slide basis which types of normalization are needed (e.g. pin/intensity/spatial). Do any of you have a "rule-of-thumb", or better yet a quantitative approach to making this decision?

Thanks!
Oliver Homann

------------------------------

Message: 11
Date: Wed, 7 Mar 2007 13:55:39 -0500
From: "Yihuan Xu" <yihuan.xu@jefferson.edu>
Subject: [BioC] how to enlarge memory
To: <bioconductor at stat.math.ethz.ch>
Message-ID: <004501c760ea$33029710$1504a60a at XU>
Content-Type: text/plain; charset="iso-8859-1"

Hi, There,

How can I enlarge memory when I use Affy package? Thanks a lot.

Yihuan

------------------------------

Message: 12
Date: Wed, 07 Mar 2007 14:43:09 -0500
From: "James W. MacDonald" <jmacdon@med.umich.edu>
Subject: Re: [BioC] how to enlarge memory
To: Yihuan Xu <yihuan.xu at jefferson.edu>
Cc: bioconductor at stat.math.ethz.ch
Message-ID: <45EF15CD.10001 at med.umich.edu>
Content-Type: text/plain; charset="utf-8"; format=flowed

Yihuan Xu wrote:
> Hi, There,
>
> How can I enlarge memory when I use Affy package? Thanks a lot.

This question has been asked and answered many, many times on this list, so searching first would get you the answer right away rather than asking and waiting. See here:

> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

In addition, without giving us any information about your OS, version of R, etc., this question is almost unanswerable. See here:

http://www.bioconductor.org/docs/postingGuide.html

--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623

------------------------------

Message: 13
Date: Wed, 7 Mar 2007 12:45:41 -0700
From: Jay Konieczka <jayk@u.arizona.edu>
Subject: Re: [BioC] Options for spatial normalization?
To: "Oliver Homann" <oliver.homann at ucsf.edu>
Cc: bioconductor at stat.math.ethz.ch
Message-ID: <97D697B0-521E-460A-9FB5-FF0E46029EFE at u.arizona.edu>
Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed

Hi Oliver,

Take a look at the OLIN package. It takes xy coordinates and uses a machine learning approach to approximate the smoothing parameters for spatial and intensity normalization. I have the same issue and I've had a great deal of success with it. I wouldn't recommend bypassing the slide-by-slide oversight, but you may find a set of parameters supplied to OLIN that is sufficient for the overwhelming majority of your chips.

Cheers,
jay

------------------------------

Message: 14
Date: Wed, 7 Mar 2007 12:09:54 -0800 (PST)
From: James Anderson <janderson_net@yahoo.com>
Subject: [BioC] RMA, RefRMA questions
To: bioconductor <bioconductor at stat.math.ethz.ch>
Message-ID: <614088.55974.qm at web43139.mail.sp1.yahoo.com>
Content-Type: text/plain

Hi,

I roughly understand the issue of RMA and RefRMA. When using RefRMA, one RMA model is generated first based on the samples from one lab.
If some additional arrays measured in a different lab need to be normalized, does RefRMA automatically take care of the systematic difference between the two labs, or, beyond RefRMA, is some extra work still needed to correct the systematic difference between the two labs?

Thanks,

James

------------------------------

Message: 15
Date: Wed, 7 Mar 2007 15:26:00 -0500
From: "Kuhn, Max" <max.kuhn@pfizer.com>
Subject: Re: [BioC] RMA, RefRMA questions <bioconductor at stat.math.ethz.ch>
Message-ID: <71257D09F114DA4A8E134DEAC70F25D307B711C8 at groamrexm03.amer.pfizer.com>
Content-Type: text/plain; charset="us-ascii"

James,

From the presentation that I saw on refRMA (and the paper), the quantile normalization process would coerce the distribution of the low-level probe data to have the same shape as the "reference" data. In GeneLogic's case, they used sets of biological samples from their large database to define this reference distribution.

Assuming that the reference distribution used is acceptable for your samples/problem, this should remove many of the systematic effects that you may have in your data.

After googling, it seems that the package is only available from the authors at this time, so I can't be exact. As I mentioned earlier today, I have similar code that I'd be willing to share. It works very similarly to the affy RMA functions.

Max

-----Original Message-----
From: bioconductor-bounces@stat.math.ethz.ch [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of James Anderson
Sent: Wednesday, March 07, 2007 3:10 PM
To: bioconductor
Subject: [BioC] RMA, RefRMA questions

Hi,

I roughly understand the issue of RMA and RefRMA. When using RefRMA, one RMA model is generated first based on the samples from one lab.
If some additional arrays measured in a different lab need to be normalized, does RefRMA automatically take care of the systematic difference between the two labs, or, beyond RefRMA, is some extra work still needed to correct the systematic difference between the two labs?

Thanks,

James

------------------------------

Message: 16
Date: Wed, 07 Mar 2007 16:09:20 -0500
From: "James W. MacDonald" <jmacdon@med.umich.edu>
Subject: Re: [BioC] RMA, RefRMA questions
To: "Kuhn, Max" <max.kuhn at pfizer.com>
Cc: bioconductor <bioconductor at stat.math.ethz.ch>
Message-ID: <45EF2A00.6030900 at med.umich.edu>
Content-Type: text/plain; charset="utf-8"; format=flowed

Kuhn, Max wrote:

> After googling, it seems that the package is only available from the authors at this time, so I can't be exact.

This package is part of BioC devel:

http://bioconductor.org/packages/2.0/bioc/html/RefPlus.html

It can be installed directly using biocLite() if you are running R-2.5.0devel.

Best,

Jim
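The quantile step Max describes above (forcing a new array's probe-level distribution to match a fixed reference distribution) can be illustrated with a small base-R sketch. This is not the RefPlus API and not the full RMA procedure (no background correction or probe-set summarization); the function name `normalize_to_reference` and the simulated "lab A" / "lab B" data are hypothetical, used only to show the idea.

```r
# Map each value of a new array onto a fixed reference distribution by rank.
# `reference` is assumed to be a numeric vector with one value per probe
# (hypothetical data below; not taken from any real experiment).
normalize_to_reference <- function(x, reference) {
  stopifnot(length(x) == length(reference))
  ref_sorted <- sort(reference)
  # ties.method = "first" gives every probe a unique rank in 1..n
  ref_sorted[rank(x, ties.method = "first")]
}

set.seed(42)
# Simulated log-intensities: three training arrays from "lab A"
training <- matrix(rnorm(3000, mean = 8, sd = 1.5), ncol = 3)
# Reference distribution: average of the sorted training columns
reference <- rowMeans(apply(training, 2, sort))

# A new array from "lab B" with a shifted and stretched distribution
new_array <- rnorm(1000, mean = 9, sd = 2)
normalized <- normalize_to_reference(new_array, reference)

# The normalized array takes on the reference quantiles,
# while the rank order of its probes is preserved.
same_quantiles <- isTRUE(all.equal(sort(normalized), sort(reference)))
same_order <- identical(order(normalized), order(new_array))
print(c(same_quantiles = same_quantiles, same_order = same_order))
```

Because the reference is frozen after the training step, arrays from a second lab are pulled onto the same distribution without refitting. Whether that is enough to remove lab effects still depends, as Max notes, on the reference being appropriate for the new samples.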
------------------------------

Message: 17
Date: Wed, 7 Mar 2007 16:26:38 -0500
From: "Kuhn, Max" <max.kuhn@pfizer.com>
Subject: Re: [BioC] RMA, RefRMA questions
To: "James W. MacDonald" <jmacdon at med.umich.edu>
Cc: bioconductor <bioconductor at stat.math.ethz.ch>
Message-ID: <71257D09F114DA4A8E134DEAC70F25D307B712F4 at groamrexm03.amer.pfizer.com>
Content-Type: text/plain; charset="us-ascii"

So there is no refRMA method. They are called RMA+ and RMA++.

I can't wait for RMA#.

Thanks,

Max

-----Original Message-----
From: James W. MacDonald [mailto:jmacdon@med.umich.edu]
Sent: Wednesday, March 07, 2007 4:09 PM
To: Kuhn, Max
Cc: James Anderson; bioconductor
Subject: Re: [BioC] RMA, RefRMA questions

Kuhn, Max wrote:

> After googling, it seems that the package is only available form the authors at this time, so I can't be exact.

This package is part of BioC devel:

http://bioconductor.org/packages/2.0/bioc/html/RefPlus.html

It can be installed directly using biocLite() if you are running R-2.5.0devel.

Best,

Jim

------------------------------

Message: 18
Date: Thu, 8 Mar 2007 12:19:51 +0800 (CST)
From: "De-Jian,ZHAO" <zhaodj@ioz.ac.cn>
Subject: [BioC] two questions about limma (cont.2)
To: bioconductor at stat.math.ethz.ch
Message-ID: <2070.159.226.67.50.1173327591.squirrel at mail.ioz.ac.cn>
Content-Type: text/plain;charset=gb2312

Dear members,

Thanks to you for your attention to my questions. Special thanks to Dr.
Smyth, the author and maintainer of the limma package, for his detailed answer. However, questions remain.

The two questions first appeared in Bioconductor Digest, Vol 49, Issue 7 on Mar 7, 2007.

-------Question 1: About NaNs after backgroundCorrect------------

I checked the data before and after correction. The components ("R", "G", "Rb" and "Gb") of the RGList are all positive. The dispersed spots at low intensities before backgroundCorrect (plotMA(RG)) shrink to a line or a cluster after backgroundCorrect and normalizeWithinArrays. The NaNs in log(x) occur during the backgroundCorrect step with method "normexp". Therefore I investigated the function backgroundCorrect() and the method normexp. They are as follows:

> RG.b <- backgroundCorrect(RG, method="normexp", offset=0)
> backgroundCorrect
function (RG, method = "subtract", offset = 0, printer = RG$printer,
    verbose = TRUE)
{
    if (is.null(RG$Rb) != is.null(RG$Gb))
        stop("Background values exist for one channel but not the other")
    method <- match.arg(method, c("none", "subtract", "half",
        "minimum", "movingmin", "edwards", "normexp", "rma"))
    if (is.null(RG$Rb) && is.null(RG$Gb))
        method <- "none"
    switch(method, subtract = {
        RG$R <- RG$R - RG$Rb
        RG$G <- RG$G - RG$Gb
    }, half = {
        RG$R <- pmax(RG$R - RG$Rb, 0.5)
        RG$G <- pmax(RG$G - RG$Gb, 0.5)
    }, minimum = {
        RG$R <- as.matrix(RG$R - RG$Rb)
        RG$G <- as.matrix(RG$G - RG$Gb)
        for (slide in 1:ncol(RG$R)) {
            i <- RG$R[, slide] < 1e-18
            if (any(i, na.rm = TRUE)) {
                m <- min(RG$R[!i, slide], na.rm = TRUE)
                RG$R[i, slide] <- m/2
            }
            i <- RG$G[, slide] < 1e-18
            if (any(i, na.rm = TRUE)) {
                m <- min(RG$G[!i, slide], na.rm = TRUE)
                RG$G[i, slide] <- m/2
            }
        }
    }, movingmin = {
        RG$R <- RG$R - ma3x3.spottedarray(RG$Rb, printer = printer,
            FUN = min, na.rm = TRUE)
        RG$G <- RG$G - ma3x3.spottedarray(RG$Gb, printer = printer,
            FUN = min, na.rm = TRUE)
    }, edwards = {
        one <- matrix(1, NROW(RG$R), 1)
        delta.vec <- function(d, f = 0.1) {
            quantile(d, mean(d < 1e-16, na.rm = TRUE) * (1 + f),
                na.rm = TRUE)
        }
        sub <-
            as.matrix(RG$R - RG$Rb)
        delta <- one %*% apply(sub, 2, delta.vec)
        RG$R <- ifelse(sub < delta, delta * exp(1 - (RG$Rb + delta)/RG$R),
            sub)
        sub <- as.matrix(RG$G - RG$Gb)
        delta <- one %*% apply(sub, 2, delta.vec)
        RG$G <- ifelse(sub < delta, delta * exp(1 - (RG$Gb + delta)/RG$G),
            sub)
    }, normexp = {
        for (j in 1:ncol(RG$R)) {
            x <- RG$G[, j] - RG$Gb[, j]
            out <- normexp.fit(x)
            RG$G[, j] <- normexp.signal(out$par, x)
            x <- RG$R[, j] - RG$Rb[, j]
            out <- normexp.fit(x)
            RG$R[, j] <- normexp.signal(out$par, x)
            if (verbose)
                cat("Corrected array", j, "\n")
        }
    }, rma = {
        require("affy")
        RG$R <- apply(RG$R - RG$Rb, 2, bg.adjust)
        RG$G <- apply(RG$G - RG$Gb, 2, bg.adjust)
    })
    RG$Rb <- NULL
    RG$Gb <- NULL
    if (offset) {
        RG$R <- RG$R + offset
        RG$G <- RG$G + offset
    }
    new("RGList", unclass(RG))
}
<environment: namespace:limma>

The method normexp is within the function backgroundCorrect. It is excerpted from the function and pasted here.

    normexp = {
        for (j in 1:ncol(RG$R)) {
            x <- RG$G[, j] - RG$Gb[, j]
            out <- normexp.fit(x)
            RG$G[, j] <- normexp.signal(out$par, x)
            x <- RG$R[, j] - RG$Rb[, j]
            out <- normexp.fit(x)
            RG$R[, j] <- normexp.signal(out$par, x)
            if (verbose)
                cat("Corrected array", j, "\n")
        }
    },

Then I modified the excerpted code and ran the code block as follows:

j=1   # j is from 1 to ncol(RG$R). Manually run the loop.
x <- RG$G[, j] - RG$Gb[, j]
out <- normexp.fit(x)
RG$G[, j] <- normexp.signal(out$par, x)
x <- RG$R[, j] - RG$Rb[, j]
out <- normexp.fit(x)
RG$R[, j] <- normexp.signal(out$par, x)

I found that the warnings of NaNs occur following the code "out <- normexp.fit(x)". The output of this code block follows below.

> j=1
> x <- RG$G[, j] - RG$Gb[, j]
> out <- normexp.fit(x)
> RG$G[, j] <- normexp.signal(out$par, x)
> x <- RG$R[, j] - RG$Rb[, j]
> out <- normexp.fit(x)
Warning message:     <<<<<<<<<<<<<<<<<<<<<<-----Warning!
Produced NaNs in: log(x)
> RG$R[, j] <- normexp.signal(out$par, x)
> out
$par
[1] 105.927694   4.874661   8.145515

$m2loglik
[1] 352762.2

$convergence
[1] 0

Then I tried to trace the origin of the warning by running the function normexp.fit step by step. I packed the if-else clause into a customized function, myfunq1234(). The modified normexp.fit for step-by-step execution is listed after the version embedded in the limma package. The code halts in the middle, and the error message points to something beyond my knowledge.

# normexp.fit embedded in limma
> normexp.fit
function (x, trace = FALSE)
{
    isna <- is.na(x)
    if (any(isna))
        x <- x[!isna]
    if (length(x) < 4)
        stop("Not enough data: need at least 4 non-missing corrected intensities")
    q <- quantile(x, c(0, 0.05, 0.1, 1), na.rm = TRUE, names = FALSE)
    if (q[1] == q[4])
        return(list(beta = q[1], sigma = 1, alpha = 1, m2loglik = NA,
            convergence = 0))
    if (q[2] > q[1]) {
        beta <- q[2]
    }
    else {
        if (q[3] > q[1]) {
            beta <- q[3]
        }
        else {
            beta <- q[1] + 0.05 * (q[4] - q[1])
        }
    }
    sigma <- sqrt(mean((x[x < beta] - beta)^2, na.rm = TRUE))
    alpha <- mean(x, na.rm = TRUE) - beta
    if (alpha <= 0)
        alpha <- 1e-06
    Results <- optim(par = c(beta, log(sigma), log(alpha)), fn = normexp.m2loglik,
        control = list(trace = as.integer(trace)), x = x)
    list(par = Results$par, m2loglik = Results$value, convergence = Results$convergence)
}
<environment: namespace:limma>

# Modified normexp.fit
isna <- is.na(x)
if (any(isna))
    x <- x[!isna]
if (length(x) < 4)
    stop("Not enough data: need at least 4 non-missing corrected intensities")
q <- quantile(x, c(0, 0.05, 0.1, 1), na.rm = TRUE, names = FALSE)
if (q[1] == q[4])
    return(list(beta = q[1], sigma = 1, alpha = 1, m2loglik = NA,
        convergence = 0))
myfunq1234<-function(q1,q2,q3,q4){
    if (q2 > q1) {
        beta <- q2
    } else {
        if (q3 > q1) {
            beta <- q3
        } else {
            beta <- q1 + 0.05 * (q4 - q1)
        }
    }
}
myfunq1234(q[1],q[2],q[3],q[4])
sigma <- sqrt(mean((x[x < beta] - beta)^2, na.rm = TRUE))    <<<<--Error occurs hereafter!
alpha <- mean(x, na.rm = TRUE) - beta
if (alpha <= 0)
    alpha <- 1e-06
Results <- optim(par = c(beta, log(sigma), log(alpha)), fn = normexp.m2loglik,
    control = list(trace = as.integer(trace)), x = x)
list(par = Results$par, m2loglik = Results$value, convergence = Results$convergence)

Since RG$R, RG$Rb, RG$G and RG$Gb are all positive, I think the microarray data themselves can be ruled out as the cause. I wonder whether anyone else has reported this warning.

-------Question 2: Output of Results--------

topTable() can easily output the average logFC, but it cannot select differentially expressed genes based on a combination of p-value and logFC. decideTests() can easily select differentially expressed genes based on a combination of p-value and logFC, but it cannot output the average logFC. Is there a function that combines the two advantages?

Thanks!

------------------------------

End of Bioconductor Digest, Vol 49, Issue 8
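On the step-by-step error in Question 1: a likely cause, assuming the modified code was run exactly as posted, is that myfunq1234() assigns `beta` only inside the function, and the call discards the result. At top level, `beta` then still refers to the base R function beta(), so `x < beta` stops with an error unrelated to the original NaN warning. The sketch below keeps the same branching but stores the result. The helper name `pick_beta` and the simulated intensities are hypothetical and stand in for the poster's data.

```r
# Simulated background-subtracted intensities, for illustration only
# (not the poster's data).
set.seed(1)
x <- c(rnorm(200, mean = 50, sd = 10), rexp(800, rate = 1/500) + 50)

q <- quantile(x, c(0, 0.05, 0.1, 1), na.rm = TRUE, names = FALSE)

# Same branching as the posted myfunq1234(), but the chosen value is
# returned explicitly so that the caller can store it.
pick_beta <- function(q1, q2, q3, q4) {
  if (q2 > q1) {
    q2
  } else if (q3 > q1) {
    q3
  } else {
    q1 + 0.05 * (q4 - q1)
  }
}

# Without this assignment, `beta` would still be base::beta (a function),
# and the comparison `x < beta` below would fail.
beta <- pick_beta(q[1], q[2], q[3], q[4])

sigma <- sqrt(mean((x[x < beta] - beta)^2, na.rm = TRUE))
alpha <- mean(x, na.rm = TRUE) - beta
if (alpha <= 0) alpha <- 1e-06

# Starting values in the form passed to optim() in the posted normexp.fit
start <- c(beta, log(sigma), log(alpha))
print(start)
```

With `beta` defined as a number, the remaining lines of the modified normexp.fit can be run one at a time, which should make it possible to see at which step of the optimization log() receives a negative argument.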