Gene filtering

Heike Pospisil ▴ 310

Dear list members,

I have a very naive question, and I think it is really simple, but I have not been able to solve the problem :-(

I have 79 Affymetrix chips and performed justRMA() first.

    > data
    Expression Set (exprSet) with
        54675 genes
        79 samples
    phenoData object with 3 variables and 79 cases
    varLabels
        cyto: read from file
        vivo: read from file
        vitro: read from file

Now, I want to filter out those genes showing no significant expression change (decrease or increase). I used the genefilter function:

    ff1<-ttest(data.frame(data),.001,na.rm=TRUE)
    ff2<-filterfun(ff1)
    wh2<-genefilter(exprs(eset), ff2)

Unfortunately, I got the following message: "Error: cannot allocate vector of size 168723 Kb"

Do I have a chance to filter out my genes that way? Or is it necessary to split the expression set? Any hint?

Thanks a lot in advance,
Heike

--
Dr. Heike Pospisil
Center for Bioinformatics, University of Hamburg
Bundesstrasse 43, 20146 Hamburg, Germany
phone: +49-40-42838-7303 fax: +49-40-42838-7312

updated 20.9 years ago by Adaikalavan Ramasamy ★ 1.8k • written 20.9 years ago by Heike Pospisil ▴ 310

Adaikalavan Ramasamy ★ 1.8k

I think justRMA() uses nearly all the memory you have access to, so R is only able to handle small computations afterwards. I would suggest saving the exprSet and exiting, then starting a fresh R session and doing your analysis from there. See below.

On Fri, 2005-02-11 at 10:06 +0100, Heike Pospisil wrote:
> I have 79 Affymetrix chips and performed justRMA() first.
>
> >data
> Expression Set (exprSet) with
>         54675 genes
>         79 samples

To save the object and exit without saving the workspace image, type

    save(data, file="mydata.rda", compress=TRUE)
    quit("no")

Then start R, type

    load("mydata.rda")

and proceed with your analysis.

> Unfortunately, I've got the following message: "Error: cannot allocate
> vector of size 168723 Kb"
>
> Do I have a chance to filter out my genes that way? Or is it necessary
> to split the expression set? Any hint?
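The save-and-reload step can be tried out on a small scale first. The following is only a sketch under stated assumptions: `expr_mat` is a hypothetical stand-in matrix rather than the real exprSet, the file goes to a temporary directory, and removing the object plus calling gc() merely approximates what quitting and restarting R achieves.

```r
# Stand-in for the expression data (hypothetical; the real object in the
# thread is an exprSet with 54675 genes and 79 samples).
set.seed(1)
expr_mat <- matrix(rnorm(200), nrow = 20, ncol = 10)

# Save the object to disk, mirroring save(data, file = "mydata.rda").
rda_path <- file.path(tempdir(), "mydata.rda")
save(expr_mat, file = rda_path, compress = TRUE)

# In the thread, R is closed here with quit("no") and started again.
# Dropping the object and collecting garbage only imitates that step.
rm(expr_mat)
invisible(gc())

# load() restores the object under its original name and returns that name.
loaded_names <- load(rda_path)
print(loaded_names)

# Rough idea of how much memory the restored object occupies.
print(format(object.size(expr_mat), units = "Kb"))
```

A real restart frees everything justRMA() allocated, which rm() and gc() inside the same session may not fully achieve, so the restart remains the safer route.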

Adaikalavan Ramasamy ★ 1.8k • 20.9 years ago

Hello Adaikalavan

> I think justRMA() uses nearly all the memory you have access to, so it
> is only able to handle small computations afterwards. What I would
> suggest is try saving the exprSet and exit. Then start from a fresh R
> session and do your analysis from that. See below.

Thanks for your suggestion. Saving and loading the exprSet works and helps. But, unfortunately, my filter function does not work.

    ff1<-ttest(data,.001,na.rm=TRUE)
    ff2<-filterfun(ff1)
    wh2<-genefilter(exprs(data), ff2)

No idea :-(

Best wishes.
Heike

Heike Pospisil ▴ 310 • 20.9 years ago

Hi,

You really need to give us more details. "My xxx did not work" does not let anyone give you good advice. As someone just said on the R help list, "Imagine that the rest of us are not sitting looking at your computer screen" :-)

Please send the specific errors/warnings and the commands issued (cut and paste: not what you think you did, but what you did).

Robert

On Feb 11, 2005, at 5:32 AM, Heike Pospisil wrote:
> Thanks for your suggestion. Saving and loading the exprSet work and
> help. But, unfortunately, my filter function do not work.
>
> ff1<-ttest(data,.001,na.rm=TRUE)
> ff2<-filterfun(ff1)
> wh2<-genefilter(exprs(data), ff2)
>
> No idea :-(

Robert Gentleman
Head, Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
phone: (206) 667-7700 fax: (206) 667-1319 office: M2-B865
email: rgentlem@fhcrc.org

rgentleman ★ 5.5k • 20.9 years ago

Heike Pospisil wrote:
> Thanks for your suggestion. Saving and loading the exprSet work and
> help. But, unfortunately, my filter function do not work.
>
> ff1<-ttest(data,.001,na.rm=TRUE)
> ff2<-filterfun(ff1)
> wh2<-genefilter(exprs(data), ff2)
>
> No idea :-(

I think you are setting up ff1 incorrectly. The first argument to ttest() describes how the samples are grouped; it is not the data itself. As an example, let's say that your exprSet contains 10 samples, where the first 5 are, e.g., experimental and the second 5 are control. Then you would set up ff1 like this:

    ff1 <- ttest(5, 0.001, na.rm = TRUE)

-or-

    cl <- c(rep(1,5), rep(2,5))
    ff1 <- ttest(cl, 0.001, na.rm = TRUE)

The second method can also be used if the samples are not contiguous (e.g., they are ordered exp, cont, exp, cont, etc.):

    cl <- rep(c(1,2), 5)
    ff1 <- ttest(cl, 0.001, na.rm = TRUE)

HTH,

Jim

--
James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
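As a rough illustration of the grouping-vector approach, the sketch below derives `cl` from sample labels instead of typing it by hand. The `labels` vector and the toy matrix `mat` are hypothetical stand-ins for a real phenotype column and exprs(data), and the filter itself only runs if the genefilter package happens to be installed.

```r
# Hypothetical sample labels in alternating (non-contiguous) order;
# replace with the phenotype column of your own data.
labels <- rep(c("exp", "cont"), times = 5)

# Integer group codes, one per sample (column) of the expression matrix.
cl <- as.integer(factor(labels, levels = c("exp", "cont")))

# Toy expression matrix: 50 genes x 10 samples. A shift is added to the
# "exp" samples of the first 5 genes so that some genes differ by group.
set.seed(42)
mat <- matrix(rnorm(50 * 10), nrow = 50)
mat[1:5, cl == 1] <- mat[1:5, cl == 1] + 5

# Run the filter only when genefilter is available.
if (requireNamespace("genefilter", quietly = TRUE)) {
  ff1 <- genefilter::ttest(cl, p = 0.001, na.rm = TRUE)
  ff2 <- genefilter::filterfun(ff1)
  keep <- genefilter::genefilter(mat, ff2)  # logical vector, one per gene
  print(sum(keep))                          # number of genes passing
}
```

Building `cl` from labels keeps the grouping tied to the sample annotation, so reordering the columns does not silently break the filter as long as the labels are reordered with them.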

James W. MacDonald 68k • 20.9 years ago

I have never used genefilter and filterfun, so I am not able to advise on this, and I hope the suggestions below solve your problem. On a personal note, I just calculate and store the p-values/statistics directly. This is more efficient because I can

* generate various lists of "differentially expressed" genes at different p-value cutoffs. This is often required by biologists, who might want a small and confident subset for biological validation and maybe a bigger subset for computational validation (e.g. pathway analysis)
* rank genes by p-values
* adjust p-values for multiple hypothesis testing

Here is one way you can do this:

    mat <- matrix( rnorm(100000), ncol=100 )
    rownames(mat) <- paste("g", 1:1000, sep="")
    g <- rep(1:2, each=50) # e.g. 50 normal and 50 tumour

    stats <- t( apply( mat, 1, function(z) {
        x <- z[ which( g==1 ) ]
        y <- z[ which( g==2 ) ]

        t.p <- t.test(x, y)$p.value
        w.p <- wilcox.test(x, y)$p.value
        fc  <- mean(x, na.rm=TRUE) - mean(y, na.rm=TRUE)
        return( c(t.pval=t.p, wilcox.pval=w.p, fold.change=fc) )
    }))

On Fri, 2005-02-11 at 10:08 -0500, James W. MacDonald wrote:
> I think you are setting up ff1 incorrectly. As an example, let's say
> that your exprSet contains 10 samples, the first 5 are e.g.,
> experimental, and the second 5 are control. Then you would set up ff1
> like this:
>
> ff1 <- ttest(5, 0.001, na.rm = TRUE)

Adaikalavan Ramasamy ★ 1.8k • 20.9 years ago

[[Ignore the previous mail. I hit the send button too soon]]

I have never used genefilter and filterfun, so I am not able to advise on this, and I hope the suggestions below solve your problem. On a personal note, I just calculate and store the p-values/statistics directly. This may be more convenient because it lets you

1) generate various lists of interesting genes at different p-value cutoffs. This is often required by biologists, who might want a high-confidence subset for biological validation and maybe a broader subset for computational validation (e.g. pathway analysis)

2) rank genes by p-values

3) adjust p-values for multiple hypothesis testing

Here is one way you can do this:

    mat <- matrix( rnorm(100000, sd=5), ncol=100 )
    rownames(mat) <- paste("g", 1:1000, sep="")
    # replace 'mat' with your exprs(data)

    g <- rep(1:2, each=50)
    # Class information e.g. 50 normal and 50 tumour
    # again replace this with your own groups

    stats <- t( apply( mat, 1, function(z) {
        x <- z[ which( g==1 ) ]
        y <- z[ which( g==2 ) ]

        t.p <- t.test(x, y)$p.value
        w.p <- wilcox.test(x, y)$p.value
        fc  <- mean(x, na.rm=TRUE) - mean(y, na.rm=TRUE)
        return( c(t.pval=t.p, wilcox.pval=w.p, fold.change=fc) )
    }))
    # You can modify the above to include further tests etc.

Hopefully you will get something compact like the following (note: your results will vary due to random number generation).

          t.pval wilcox.pval fold.change
    g1 0.2376890   0.2655573   1.0214440
    g2 0.1513874   0.2931174  -1.2895703
    g3 0.4788188   0.5014898  -0.7349789
    g4 0.2021780   0.1302305   1.3201382
    g5 0.2537569   0.2655573  -1.1256882
    g6 0.5881588   0.7020112  -0.5907285
    ..  .........   .........  ..........

Now you can generate various lists such as

    list1 <- names( which( stats[ , "t.pval"] < 0.05 ) )
    list2 <- names( which( stats[ , "fold.change"] > 1 ) )
    intersect( list1, list2 )

I guess this is probably a matter of taste. Hope this helps.

Regards, Adai

On Fri, 2005-02-11 at 10:08 -0500, James W. MacDonald wrote:
> I think you are setting up ff1 incorrectly. As an example, let's say
> that your exprSet contains 10 samples, the first 5 are e.g.,
> experimental, and the second 5 are control. Then you would set up ff1
> like this:
>
> ff1 <- ttest(5, 0.001, na.rm = TRUE)
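Ranking genes and adjusting p-values for multiple testing, mentioned in the mail above without code, could look roughly like this. It is a sketch on simulated data, not the author's own procedure: the Benjamini-Hochberg method and the 0.05 cutoff are assumptions chosen for illustration, and `mat`, `g`, `ranked` and `selected` are hypothetical names.

```r
# Simulated data: 200 genes x 20 samples, two groups of 10 samples each.
set.seed(123)
mat <- matrix(rnorm(200 * 20), nrow = 200, ncol = 20)
rownames(mat) <- paste0("g", seq_len(nrow(mat)))
g <- rep(1:2, each = 10)

# Add a shift to the second group for the first 10 genes so that a few
# genes are truly different between the groups.
mat[1:10, g == 2] <- mat[1:10, g == 2] + 3

# Per-gene Welch t-test p-values (names are taken from the row names).
t.pval <- apply(mat, 1, function(z) t.test(z[g == 1], z[g == 2])$p.value)

# Adjust for multiple testing; "BH" controls the false discovery rate.
adj.pval <- p.adjust(t.pval, method = "BH")

# Rank genes from smallest to largest raw p-value.
ranked <- data.frame(t.pval = t.pval, adj.pval = adj.pval)
ranked <- ranked[order(ranked$t.pval), ]
print(head(ranked))

# Gene lists at a chosen cutoff on the adjusted p-values.
selected <- rownames(ranked)[ranked$adj.pval < 0.05]
print(length(selected))
```

Because the raw and adjusted p-values are stored together, lists at other cutoffs can be produced later without repeating the tests.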

Adaikalavan Ramasamy ★ 1.8k • 20.9 years ago

Hello Adai,

thanks a lot for your help. It works very well!

And once again, sorry for the unclear information I gave at first.

Best wishes,
Heike

Adaikalavan Ramasamy wrote:
> I never used genefilter and filterfun so I would not be able to advice
> on this and hope the suggestions below solves your problem. On a
> personal note, I just calculate and store the p-values/statistics
> directly.

Heike Pospisil ▴ 310 • 20.9 years ago
