custom subset method / handling columns selection as logic in '...' parameter
@martin-morgan-1513
Eric --

Please don't cross-post.

Please simplify your example so that others do not have to work hard to understand what you are asking.

See additional comments below.

"Eric Lecoutre" <ericlecoutre at gmail.com> writes:

> Dear R-helpers & bioconductor
>
> Sorry for cross-posting, this concerns R-programming stuff applied on
> Bioconductor context.
> Also sorry for this long message, I try to be complete in my request.
>
> I am trying to write a subset method for a specific class (ExpressionSet
> from Bioconductor) allowing selection more flexible than "[" method .
>
> The schema I am thinking for is the following:
>
> subset.ExpressionSet <- function(x,subset,...){
>
> }

ExpressionSet is an S4 class, using S4 methods; you will get into trouble mixing S3 (implied above) and S4.

> I will use the subset argument for rows (genes), as in default method.
>
> Now I would like to allow to select different columns (features) based on
> phenotypic data.
> phenotypic data provides detailed information about the columns.

Columns of an ExpressionSet are samples / phenotypes; rows are features.

> Basically, first function I have written allows the following:
>
>> sub1 <- subset(ExpressionSetObject, subset=NULL, V1=value1, v2=value2)
> # subset=NULL takes all rows
>
> See: there are two conditions on two variables belonging to the associated
> data.frame encapsulated in the ExpressionSetObject (to be complete, the
> conditions will be applied on more of 2 columns, as they are used on the
> phylogenic data.frame that concerns all variables)

'phylogenic' is not part of the terminology; you are perhaps aiming for 'phenotypic'?

> To simplify a little bit, this would nearly return:
> ExpressionSetObject[,V1==value & V2==value]

The usual idiom is exactly this; if e is an ExpressionSet instance

    > e[,e$V1==value1 & e$V2==value2]

the '$' is defined to access the phenoData slot of the ExpressionSet.
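To make the idiom concrete without requiring Biobase, here is a minimal sketch using a toy S4 class. The class name ToySet, its slots, and the tiny data set are hypothetical stand-ins invented for illustration; a real ExpressionSet already provides its own '[' and '$' methods, so none of this needs to be defined in practice.

```r
library(methods)

# Hypothetical toy class standing in for ExpressionSet: rows of 'exprs' are
# features, columns are samples; 'pheno' has one row per sample.
setClass("ToySet", representation(exprs = "matrix", pheno = "data.frame"))

# '$' returns a phenotype column, mirroring what '$' does for an ExpressionSet.
setMethod("$", "ToySet", function(x, name) x@pheno[[name]])

# '[' subsets features (i) and samples (j), keeping 'pheno' in step with 'exprs'.
setMethod("[", "ToySet", function(x, i, j, ..., drop = FALSE) {
  if (missing(i)) i <- seq_len(nrow(x@exprs))
  if (missing(j)) j <- seq_len(ncol(x@exprs))
  new("ToySet",
      exprs = x@exprs[i, j, drop = FALSE],
      pheno = x@pheno[j, , drop = FALSE])
})

# Made-up data: 3 features, 4 samples.
e <- new("ToySet",
         exprs = matrix(1:12, nrow = 3,
                        dimnames = list(paste0("g", 1:3), c("A", "B", "C", "D"))),
         pheno = data.frame(sex  = c("Female", "Male", "Male", "Male"),
                            type = c("Control", "Case", "Control", "Case"),
                            row.names = c("A", "B", "C", "D")))

# The idiom: build a logical vector from phenotype columns, use it as 'j'.
sel <- e[, e$sex == "Male" & e$type == "Control"]
colnames(sel@exprs)
```

Because the methods are registered with setMethod(), dispatch stays entirely within S4, which avoids the S3/S4 mixing mentioned above.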
> This is nice as I can already handle any number of conditions on variables
> values thanks to '...'. First step is
> conditions <- list(...) and are then handled later in code
>
> Nevertheless, those conditions are basic (one value).
>
> I would like to handle arbitrary conditions, such as: V1 %in% c(value1,
> value2)
> More simple expression would be passed with V2==value instead of V2=value2
>
> My very problem is that I don't know how to turn '...' into an object
> containing those conditions that could be used later.

I get confused here; can you clarify (this means 'make it simpler', not 'make it longer')? In the future, if this is where your question is, then it would have been appropriate to formulate it in such a way as to avoid involving ExpressionSet, and to post it to the R mailing list.

> My attempt which seems the nearest is:
>
>> foo <- function(...){
>>   as.expression(substitute(list(...)))
>> }
>>foo(x==1,y%in%1:2)
> expression(list(x == 1, y %in% 1:2))
>
> where as I would like to have something like
> list(expression(x==1), expression(y %in% 1:2))
> those expressions being evaluated later on in the context of my specific
> object.
>
> Are there any existing function where '...' are already handled the way I
> want so that I can mimic?
>
> Thanks for any insight.
>
> Eric
>
> ---
>
> For those who have Biobase available, here is my current subset function and
> a demo-case that explains a little bit.
>
> library(Biobase)
> example(ExpressionSet) # create sample object
> print(expressionSet)
>
> # now my subset function as it is
>
> subset.ExpressionSet <- function(x,subset=NULL,verbose=TRUE,...){
>   # subset is used to subset on rows
>   # ... is used to make multiple conditions on columns based on pData
>   # list of conditions is handled in ...
>   stopifnot(is(x,"ExpressionSet"))
>   phenoData <- pData(x)
>   listCriteria <- list(...)
>   if (is.null(subset)) subset <- rep(TRUE,nrow(exprs(x)))
>   subset <- subset & !is.na(subset)
>   retainedCriteria <- list()
>   tmp <- sapply(names(listCriteria), function(critname) {
>     if(!critname %in% colnames(phenoData)){
>       if (verbose) cat("\n*** subsetCompounds: Dropped criteria:",critname, "not in phenoData of object\n")
>     }else{
>       if(is.null(listCriteria[critname])) listCriteria[[critname]]<- unique(phenoData[,critname])
>       retainedCriteria[[critname]] <<- phenoData[,critname] %in% listCriteria[critname]
>     }
>   })
>   criteriaValues <- do.call("cbind",retainedCriteria)
>
>   selectedColumns <- rownames(phenoData)[apply(criteriaValues,1,logic)]
>   ## cbind(phenoData,criteriaValues)
>   out <- x[subset,selectedColumns]
>   if (verbose) cat('\n',length(selectedColumns),' columns selected (',paste(selectedColumns,collapse=' '), ')\n',sep='')
>   invisible(return(out))
> }
>
> # looking at phenotypic data associated with the sample expressionSet
>> pData(expressionSet)
>      sex    type score
> A Female Control  0.75
> B   Male    Case  0.40
> C   Male Control  0.73
> D   Male    Case  0.42
> E Female    Case  0.93
> F   Male Control  0.22
> G   Male    Case  0.96
> H   Male    Case  0.79
> I Female    Case  0.37
> J   Male Control  0.63
> K   Male    Case  0.26
> L Female Control  0.36
> M   Male    Case  0.41
> N   Male    Case  0.80
> O Female    Case  0.10
> P Female Control  0.41
> Q Female    Case  0.16
> R   Male Control  0.72
> S   Male    Case  0.17
> T Female    Case  0.74
> U   Male Control  0.35
> V Female Control  0.77
> W   Male Control  0.27
> X   Male Control  0.98
> Y Female    Case  0.94
> Z Female    Case  0.32
>
> # now the sample use
>> (subset1 =subset(expressionSet,sex="Male",type="Control"))
> 7 columns selected (C F J R U W X)
> ExpressionSet (storageMode: lockedEnvironment)
> assayData: 500 features, 7 samples
>   element names: exprs, se.exprs
> phenoData
>   sampleNames: C, F, ..., X (7 total)
>   varLabels and varMetadata description:
>     sex: Female/Male
>     type: Case/Control
>     score: Testing Score
> featureData
>   featureNames: AFFX-MurIL2_at,
>   AFFX-MurIL10_at, ..., 31739_at (500 total)
>   fvarLabels and fvarMetadata description: none
> experimentData: use 'experimentData(object)'
> Annotation: hgu95av2
>
> # what I would like to allow in use:
> (subset2 = subset(expressionSet, sex=="Male", score > 0.75) # note the ==
> instead of =
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793
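On the quoted question of turning '...' into conditions that can be evaluated later, one possible approach in base R is to take the unevaluated arguments with substitute() and evaluate each one against the phenotype data frame with eval(). The sketch below is not from the thread: the helper names captureConditions and selectRows are hypothetical, and the small data frame simply copies five rows (A, B, C, G, X) of the pData table quoted above so the example runs without Biobase.

```r
# Hypothetical helper (not part of Biobase or base R): return the arguments
# passed through '...' as a list of unevaluated calls.
captureConditions <- function(...) {
  as.list(substitute(list(...)))[-1]   # drop the leading 'list' symbol
}

conds <- captureConditions(sex == "Male", score > 0.75)
# 'conds' now holds two calls that have not been evaluated yet.

# Hypothetical helper: evaluate each condition with the columns of 'data'
# visible by name, and combine the results with logical AND.
selectRows <- function(data, ...) {
  conds  <- as.list(substitute(list(...)))[-1]
  enclos <- parent.frame()             # where non-column names are looked up
  keep   <- rep(TRUE, nrow(data))
  for (cond in conds) {
    value <- eval(cond, data, enclos)
    stopifnot(is.logical(value))
    keep <- keep & value & !is.na(value)   # treat NA as "not selected"
  }
  keep
}

# Five rows copied from the pData table quoted above.
pd <- data.frame(sex   = c("Female", "Male", "Male", "Male", "Male"),
                 type  = c("Control", "Case", "Control", "Case", "Control"),
                 score = c(0.75, 0.40, 0.73, 0.96, 0.98),
                 row.names = c("A", "B", "C", "G", "X"))

keep <- selectRows(pd, sex == "Male", score > 0.75)
rownames(pd)[keep]
```

With an ExpressionSet, the same logical vector could in principle be computed from pData(x) and then used as the column index in x[, keep]; that last step is an assumption about how one might wire it together, not something shown in the thread.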
Cancer Biobase