Question: Generating random gene lists: does sample/resample generate random sets
Scott Ochsner wrote (10.1 years ago):
Dear BioC,

I would like feedback on the appropriateness of the following procedure for producing a set of 1000 random gene lists, each of length 2000. The idea is to use the random gene lists to assess how often random gene lists of size x can reproduce or improve on the classification performance of myCuratedList.

# Remove myCuratedList from the universe of possible genes. The "eset" object is a standard ExpressionSet object.
> length(myCuratedList)
[1] 2000
> Index <- setdiff(1:length(rownames(exprs(eset))), myCuratedList)
> length(Index)
[1] 20277

# Generate 1000 random gene lists using the genes in Index. The code for resample is taken from the help page for sample.
> randomMatrix <- replicate(1000, resample(Index, 2000))
> dim(randomMatrix)
[1] 2000 1000

I've verified that no column contains repeated genes, as should be the case when resampling without replacement.

Is there a standard procedure for doing the above, or is what I've done kosher?

Scott A. Ochsner, Ph.D.
NURSA Bioinformatics
Molecular and Cellular Biology
Baylor College of Medicine
Houston, TX. 77030
phone: 713-798-6227
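The procedure described above can be sketched with base R alone. This is a minimal illustration, not the poster's exact code: it assumes myCuratedList holds integer row indices into exprs(eset) (the post implies this but does not state it), it uses made-up stand-in data with the sizes from the post, and it swaps plain sample() in for the resample() helper from the help page, which behaves the same way for vectors longer than one element. The name n_genes is hypothetical.

```r
set.seed(1)  # fixed seed so the random lists are reproducible

n_genes <- 22277                        # 20277 + 2000, the sizes in the post
myCuratedList <- sample(n_genes, 2000)  # stand-in for the real curated indices

# Universe of candidate genes with the curated genes removed
Index <- setdiff(seq_len(n_genes), myCuratedList)

# 1000 random lists of 2000 genes each; sample() draws without
# replacement by default, so no gene repeats within a list
randomMatrix <- replicate(1000, sample(Index, 2000))

dim(randomMatrix)

# TRUE here would mean at least one column contains a repeated gene
any(apply(randomMatrix, 2, anyDuplicated) > 0)
```

Each column of randomMatrix is one random gene list. Genes can still recur across columns, since the columns are drawn independently.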
Scott Ochsner wrote (10.1 years ago):
Sorry, below is my sessionInfo():

> sessionInfo()
R version 2.7.0 (2008-04-22)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] splines tools stats graphics grDevices utils datasets methods base

other attached packages:
[1] MLInterfaces_1.14.1 annotate_1.18.0 xtable_1.5-2 AnnotationDbi_1.2.1 RSQLite_0.6-8 DBI_0.2-4
[7] rda_1.0 rpart_3.1-41 genefilter_1.20.0 survival_2.34-1 MASS_7.2-41 affy_1.18.1
[13] preprocessCore_1.2.0 affyio_1.8.0 Biobase_2.0.1

loaded via a namespace (and not attached):
[1] class_7.2-41

Scott

-----Original Message-----
From: bioconductor-bounces@stat.math.ethz.ch [mailto:bioconductor-bounces@stat.math.ethz.ch] On Behalf Of Ochsner, Scott A
Sent: Wednesday, September 10, 2008 3:03 PM
To: bioconductor at stat.math.ethz.ch
Subject: [BioC] Generating random gene lists: does sample/resample generate random sets

[...]

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
Thomas Hampton wrote (10.1 years ago):
I would not have taken the curated list out. That strikes me as a significant bias. Am I missing something?

Tom

On Sep 10, 2008, at 4:03 PM, Ochsner, Scott A wrote:

> #remove myCuratedList from the universe of possible genes. The
> "eset" object is your standard ExpressionSet object.
>> Index<-setdiff(1:length(rownames(exprs(eset))),myCuratedList)
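For comparison, here is a rough sketch of the alternative Tom describes, where the random lists are drawn from the full gene universe and the curated genes stay eligible. The data are made-up stand-ins with the sizes from the original post, and n_genes, randomFull, and overlap are hypothetical names, so treat this as an illustration of the idea rather than a recommended procedure.

```r
set.seed(2)  # fixed seed so the draws are reproducible

n_genes <- 22277                        # full universe: 20277 + 2000 genes
myCuratedList <- sample(n_genes, 2000)  # stand-in for the real curated indices

# Draw from the full universe; curated genes are eligible like any other gene
randomFull <- replicate(1000, sample(n_genes, 2000))

# Number of curated genes that land in each random list
# (%in% drops the dimensions, so rebuild the 2000-row matrix first)
overlap <- colSums(matrix(randomFull %in% myCuratedList, nrow = 2000))

mean(overlap)          # observed average overlap per random list
2000 * 2000 / n_genes  # overlap expected under simple random sampling
```

With the curated genes left in, each random list shares a modest number of genes with myCuratedList by chance alone. With them removed, as in the original procedure, that overlap is forced to zero.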