Filtering genes using a list
@mcolosimbrandeisedu-880
Last seen 9.7 years ago
This is probably a general R question, but I couldn't find anything useful. I found all sorts of material on filtering with functions based on the values within the matrix, but nothing like this.

I have a list of genes in a file that I want to look at. How can I filter my matrix of genes to match the ones in the list?

gene_list.tab with 250 genes:

    probe{tab}description
    affy_blah1{tab}affy gene of interest 1
    affy_blah2{tab}affy gene of interest 2
    ..

    dim(my.metric)
    [1] 22625 11

    mmfun <- function() # to filter
    ffun <- filterfun(mmfun)
    my.fmetric <- genefilter(my.metric,ffun)
    dim(my.fmetric) ## This should give 250 and 11
@james-w-macdonald-5106
Last seen 1 day ago
United States
I would use the %in% function. This assumes that your matrix of gene values has the gene names attached somehow (row.names, or the first column). Since you are doing affy stuff, the easiest way is to use the exprSet holding your data.

    index <- gene_list.tab[,1] %in% geneNames(eset)

-or-

    index <- gene_list.tab[,1] %in% row.names(my.metric)

Then subset using the index.

    subset.data <- my.metric[index,]

HTH,

Jim

James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623

>>> <mcolosim@brandeis.edu> 08/18/04 11:29AM >>>
> I have a list of genes in a file that I want to look at, how can I filter my matrix of genes to match the ones in the list?

_______________________________________________
Bioconductor mailing list
Bioconductor@stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
@mcolosimbrandeisedu-880
Last seen 9.7 years ago
Jim,

Thanks for the hint about %in%. Where did you find this function? I couldn't find anything about it.

Also, it works the other way:

    index <- geneNames(eset) %in% gene_list.tab[,1]

Marc
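A minimal sketch of the subsetting discussed above, using a small made-up matrix in place of my.metric (the probe names, array names and values are hypothetical). One point worth noting: the logical index needs one element per row of the matrix, so the row names go on the left-hand side of %in%, as in Marc's version. With the gene list on the left, the index has one element per list entry and would be recycled when applied to the rows.

```r
# Hypothetical stand-in for my.metric: 6 probes x 3 arrays
my.metric <- matrix(seq_len(18), nrow = 6,
                    dimnames = list(paste0("affy_blah", 1:6),
                                    paste0("array", 1:3)))

# Hypothetical stand-in for the first column of gene_list.tab
wanted <- c("affy_blah2", "affy_blah5", "not_on_chip")

# One TRUE/FALSE per row of the matrix
index <- rownames(my.metric) %in% wanted

# drop = FALSE keeps a matrix even if only one row is selected
subset.data <- my.metric[index, , drop = FALSE]
dim(subset.data)

# Probes in the list that are absent from the matrix
missing.probes <- setdiff(wanted, rownames(my.metric))
```

Checking missing.probes is a cheap way to see why the result might have fewer rows than the list has entries.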
A.J. Rossini ▴ 810
@aj-rossini-209
Last seen 9.7 years ago
mcolosim@brandeis.edu writes:

> Jim,
>
> Thanks for the hint about %in%, where did you find this function? I couldn't find any thing about it.

    ?%in%

should provide the help page. It does for me (though under Emacs).

(i.e. help("%in%") )

Interesting that help.search("in") isn't too useful (in fact, it seems to miss it).

best,
-tony

--
Anthony Rossini
Research Associate Professor
rossini@u.washington.edu
http://www.analytics.washington.edu/
Biomedical and Health Informatics, University of Washington
Biostatistics, SCHARP/HVTN, Fred Hutchinson Cancer Research Center
UW (Tu/Th/F): 206-616-7630 FAX=206-543-3461 | Voicemail is unreliable
FHCRC (M/W): 206-667-7025 FAX=206-667-4812 | use Email
Quoting "A.J. Rossini" <rossini@blindglobe.net>:

> ?%in%
>
> should provide the help page. It does for me (though under Emacs).
>
> (i.e. help("%in%") )

I'm using an old version of R (1.8.1) and ?%in% doesn't work. However, help("%in%") does. I know it is time to update everything.

Marc
@james-w-macdonald-5106
Last seen 1 day ago
United States
Marc,

I think finding the function you are looking for is usually more of an art than a science. I have no idea how I found %in%, but my usual method for finding functions that I think probably exist goes like this:

1.) help.search("something that I think might be a reasonable name for the function")

2.) Google it to within an inch of its life ;-D. Usually I prepend an R on the Google search to possibly limit the results to actual R functions. There are also search pages on www.r-project.org and www.bioconductor.org that will search the mail list archives.

3.) Look at code for functions that I already know might do something similar and see how they do it.

By this time I have usually found what I am looking for, plus a bunch of other stuff that may come in handy in the future. However, if I am still hitting a wall, I ask on either the BioC or R-help listserv.

Best,

Jim

>>> <mcolosim@brandeis.edu> 08/18/04 01:25PM >>>
> Thanks for the hint about %in%, where did you find this function? I couldn't find any thing about it.
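The search steps above can be sketched in base R roughly as follows. This is only an illustration: the search phrase "value matching" and the pattern "^match" are guesses of the kind step 1 describes, not anything taken from the thread.

```r
# Step 1: search the installed help pages by keyword
# (restricted to package "base" here to keep the search quick)
hits <- help.search("value matching", package = "base")

# A related trick: list objects on the search path whose names match a pattern
found <- apropos("^match")

# Step 3: look at the code of a function that does something similar
print(`%in%`)
```

Printing `hits` at the prompt lists the matching help pages, and `found` can be scanned for likely candidates before reading their help pages.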
S Peri ▴ 320
@s-peri-835
Last seen 9.7 years ago
Dear group,

I did SAM and t-test analyses and obtained p-values. Now, these files look like this:

T-test values:

    > Gli_X0_X1_pvals[1:5]
      100_g_at    1000_at    1001_at  1002_f_at  1003_s_at
    0.80033009 0.31943016 0.33078591 0.05216239 0.08957325

Fold change (Avg. Diff):

    > MyBrain_X0_X1_Exp_FCs[1:5]
     100_g_at   1000_at   1001_at 1002_f_at 1003_s_at
    1.0176023 0.9274588 0.8752550 1.1056984 1.1096023

My questions:

1. Using an annotation package, how can I convert the probe IDs to gene names? How do I put the gene name in place of 100_g_at?

2. How can I choose/filter p-values from the t-test that are below 0.01 (between 0 and 0.01)?

3. How can I write the values into a table with 3 column names: Gene, P-value, Fold change?

I am doing this for the first time. Please help me.

Thank you.

Regards,
PS
See comments below.

On Wed, 2004-08-18 at 18:54, S Peri wrote:

> Using annotation package how can I convert the probe ID's to Gene names. how do i incorporate gene name in place of 100_g_at?

There are annotation packages in Bioconductor. But if you want a quick and dirty solution, get the CDF file from Affymetrix and merge it in Excel. These will have all the information you need but may be slightly outdated.

Can anyone on the list comment on the merits of doing this versus using the Bioconductor annotation package?

> 2. How can I choose/filter P-values from T-test that are less than 0.01 to 0 ?

You can write this yourself with a few ifelse(), which(), subset() commands.

> 3. How can write the values into a table with 3 colnames: Gene, P-value, Fold change

    mat <- cbind( genename, pvalue, foldchange )
    write.table(mat, file="aaa.txt", sep="\t", quote=FALSE)
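A minimal sketch of answers 2 and 3, assuming the p-values and fold changes are named numeric vectors as printed in the question. The object names, the output file location and the 0.1 cutoff are illustrative choices, not part of the thread. A data.frame is used instead of cbind() because binding character gene names to numbers with cbind() gives a character matrix.

```r
# Values copied from the question; names are the probe IDs
pvals <- c("100_g_at" = 0.80033009, "1000_at" = 0.31943016,
           "1001_at" = 0.33078591, "1002_f_at" = 0.05216239,
           "1003_s_at" = 0.08957325)
fcs <- c("100_g_at" = 1.0176023, "1000_at" = 0.9274588,
         "1001_at" = 0.8752550, "1002_f_at" = 1.1056984,
         "1003_s_at" = 1.1096023)

# Align fold changes to the p-values by probe ID rather than by position
results <- data.frame(Gene = names(pvals),
                      P.value = unname(pvals),
                      Fold.change = unname(fcs[names(pvals)]),
                      stringsAsFactors = FALSE)

# Keep rows below a cutoff (0.1 here only so that the five example
# values return something; the question asks for 0.01)
cutoff <- 0.1
selected <- results[results$P.value < cutoff, ]

# Hypothetical output location; row.names = FALSE avoids an extra column
out.file <- tempfile(fileext = ".txt")
write.table(selected, file = out.file, sep = "\t",
            quote = FALSE, row.names = FALSE)
```

The Gene column holds probe IDs here; it could be replaced by gene symbols once those have been looked up in an annotation package.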
On Aug 19, 2004, at 6:36 AM, Adaikalavan Ramasamy wrote:

> There are annotation packages in BioConductor. But if you want a quick and dirty solution, get the CDF file from affymetrix and merge it in excel. These will have all the information you need but may be slightly outdated.
>
> Can anyone on the list comment the merits of doing this versus using the BioConductor annotation package ?

The annotation packages contain much more information than just the gene name (like gene ontology, homologous genes, etc.). If one has a vector of affy IDs from, for example, the hgu95av2 chip, getting the gene symbol is as simple as:

    getSYMBOL(myaffyids,"hgu95av2")

See ?getSYMBOL for more help on getting ids of various types from the affy identifiers.

Also, many of the other packages (GOstats, ontoTools, etc.) make extensive use of the annotation packages. So, while the "quick and dirty" approach will give you simple information, if one is going to do post-processing of results in R, it does pay off to learn how to use the annotation packages. If all one needs is a gene name, either way works (but I still think the annotation package is a more robust solution).

Sean
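As a rough illustration of what such a lookup does, here is a base R sketch that uses a hand-made named vector in place of the annotation package. The symbols GENE_A, GENE_B and GENE_C are placeholders, not real annotations, and probe2symbol is a hypothetical name; in practice the mapping would come from the getSYMBOL() call shown above.

```r
# Placeholder mapping from probe ID to symbol (not real annotation data)
probe2symbol <- c("100_g_at" = "GENE_A", "1000_at" = "GENE_B",
                  "1001_at" = "GENE_C")

myaffyids <- c("1000_at", "100_g_at", "1003_s_at")

# Indexing a named vector by name returns NA for IDs it does not contain
symbols <- unname(probe2symbol[myaffyids])

# Fall back to the probe ID where no symbol is available
labels <- ifelse(is.na(symbols), myaffyids, symbols)
```

Keeping the probe ID as a fallback avoids NA labels in tables and plots when a probe has no symbol.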
Dear group,

I have a list of genes (say ~120) from a pathway. Can I use 'genefilter' functions or any other function to pick the fold-change values, p-values and LocusIDs (only for the genes I need for my pathway) from the output table that I created using write.table?

Thank you all for your valuable suggestions on my previous query about the annotate package and writing the output to a table (REF: Annotate Package: How do I get the gene names and how do I write my matrix). I could make things work on my desk. It was my mistake to increment the element again and again even though I was using a 'for' loop. E.g.:

    for (i in x){
      y <- do something
      i = i+1
    }

I realized later that i = i+1 is not needed.

Thank you
PS
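One possible way to pick the pathway genes, sketched with the same %in% approach used earlier in the thread. The table below is made up: the gene names, LocusID values and column names are hypothetical, and in practice the table would be read back from the file written with write.table (for example with read.delim).

```r
# Hypothetical results table standing in for the one read back from file
results <- data.frame(Gene = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
                      LocusID = c(101, 102, 103, 104),
                      P.value = c(0.80, 0.004, 0.33, 0.009),
                      Fold.change = c(1.02, 0.93, 0.88, 1.11),
                      stringsAsFactors = FALSE)

# Hypothetical pathway list; GENE_Z is not in the table
pathway.genes <- c("GENE_B", "GENE_D", "GENE_Z")

# Keep only the rows whose gene is in the pathway list
pathway.results <- results[results$Gene %in% pathway.genes,
                           c("Gene", "LocusID", "P.value", "Fold.change")]

# for () advances the loop variable itself; no i = i + 1 is needed
for (g in pathway.results$Gene) {
  cat(g, "\n")
}
```

Pathway genes that are absent from the table are simply not returned, so comparing nrow(pathway.results) with length(pathway.genes) shows how many were found.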
A.J. Rossini ▴ 810
@aj-rossini-209
Last seen 9.7 years ago
mcolosim@brandeis.edu writes:

> I'm using an old version of R (1.8.1) and ?%in% doesn't work. However, help("%in%") does. I know it is time up update everything.

    ?"%in%"

might be the right thing. Emacs takes care of that for me.

best,
-tony
@james-w-macdonald-5106
Last seen 1 day ago
United States
?%in% won't work with any version of R (except it appears to work under Emacs - more Rossini magic, I assume?). At the R prompt you have to use

    ?"%in%"

Best,

Jim

>>> <mcolosim@brandeis.edu> 08/18/04 01:58PM >>>
> I'm using an old version of R (1.8.1) and ?%in% doesn't work. However, help("%in%") does.
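A short sketch of the point about quoting, plus what the operator does. The vector x and the name in.set are made up for illustration. In current R the help page for %in% is the same page as match(), and the operator behaves like the match() expression on the last line.

```r
# Quoting the operator lets it be passed as a topic name;
# at the prompt, ?"%in%" is the equivalent shortcut
h <- help("%in%")

# %in% returns one TRUE/FALSE per element of the left-hand side
x <- c("a", "b", "z")
in.set <- x %in% letters[1:3]

# The same test written with match()
via.match <- match(x, letters[1:3], nomatch = 0) > 0
```

Printing `h` at an interactive prompt opens the help page.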