Filtering genes using a list

mcolosim@brandeis.edu ▴ 70

This probably is a general R question, but I couldn't find anything useful. I found all sorts of stuff on how to filter using functions based on the values within the matrix, but nothing like this.

I have a list of genes in a file that I want to look at. How can I filter my matrix of genes to match the ones in the list?

gene_list.tab with 250 genes:

    probe{tab}description
    affy_blah1{tab}affy gene of interest 1
    affy_blah2{tab}affy gene of interest 2
    ..

    dim(my.metric)
    [1] 22625 11

    mmfun <- function() # to filter
    ffun <- filterfun(mmfun)
    my.fmetric <- genefilter(my.metric,ffun)
    dim(my.fmetric) ## This should give 250 and 11

ADD COMMENT • link updated 21.4 years ago by James W. MacDonald 68k • written 21.4 years ago by mcolosim@brandeis.edu ▴ 70

James W. MacDonald 68k

I would use the %in% function. This assumes that your matrix of gene values has the gene names appended somehow (row.names, or the first column). Since you are doing affy stuff, the easiest way is to use the exprSet holding your data.

    index <- gene_list.tab[,1] %in% geneNames(eset)

-or-

    index <- gene_list.tab[,1] %in% row.names(my.metric)

Then subset using the index.

    subset.data <- my.metric[index,]

HTH,

Jim

James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623

_______________________________________________
Bioconductor mailing list
Bioconductor@stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor

ADD COMMENT • link 21.4 years ago James W. MacDonald 68k

mcolosim@brandeis.edu ▴ 70

Jim,

Thanks for the hint about %in%. Where did you find this function? I couldn't find anything about it.

Also, it works the other way:

    index <- geneNames(eset) %in% gene_list.tab[,1]

Marc
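For readers following along, here is a minimal sketch of why the direction of %in% matters for this task. The toy matrix and gene list are made-up stand-ins for my.metric and gene_list.tab (in practice the list would presumably be read with something like read.delim("gene_list.tab")). %in% returns one logical value per element of its left-hand operand, so the row names need to be on the left to get an index that lines up with the rows of the matrix.

```r
# Hypothetical stand-ins for the objects in the thread: 6 probes x 3 chips.
my.metric <- matrix(rnorm(6 * 3), nrow = 6,
                    dimnames = list(paste0("affy_blah", 1:6),
                                    paste0("chip", 1:3)))
gene_list.tab <- data.frame(
  probe = c("affy_blah5", "affy_blah2"),
  description = c("affy gene of interest 5", "affy gene of interest 2"),
  stringsAsFactors = FALSE
)

# Row names on the left: one TRUE/FALSE per matrix row, usable as a row index.
index <- rownames(my.metric) %in% gene_list.tab[, 1]
subset.data <- my.metric[index, , drop = FALSE]

# Gene list on the left: one TRUE/FALSE per listed probe. This only tells
# you which listed probes are present in the matrix.
found <- gene_list.tab[, 1] %in% rownames(my.metric)

# match() returns row positions instead, in the order of the gene list.
ordered.data <- my.metric[match(gene_list.tab[, 1], rownames(my.metric)), ,
                          drop = FALSE]
```

With the gene list on the left, the result has one entry per listed probe, which is handy for checking that every probe in the list was found but does not line up with the matrix rows. match() is an alternative when the rows should come out in the order of the list; it gives NA for probes that are not in the matrix, so checking found first seems sensible.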

ADD COMMENT • link 21.4 years ago mcolosim@brandeis.edu ▴ 70

A.J. Rossini ▴ 810

mcolosim@brandeis.edu writes:

> Jim,
>
> Thanks for the hint about %in%, where did you find this function? I couldn't
> find any thing about it.

    ?%in%

should provide the help page. It does for me (though under Emacs).

(i.e. help("%in%") )

Interesting that help.search("in") isn't too useful (in fact, it seems to miss it).

best,
-tony

--
Anthony Rossini, Research Associate Professor
rossini@u.washington.edu http://www.analytics.washington.edu/
Biomedical and Health Informatics, University of Washington
Biostatistics, SCHARP/HVTN, Fred Hutchinson Cancer Research Center
UW (Tu/Th/F): 206-616-7630 FAX=206-543-3461 | Voicemail is unreliable
FHCRC (M/W): 206-667-7025 FAX=206-667-4812 | use Email

ADD COMMENT • link 21.4 years ago A.J. Rossini ▴ 810

I'm using an old version of R (1.8.1) and ?%in% doesn't work. However, help("%in%") does. I know it is time to update everything.

Marc

ADD REPLY • link 21.4 years ago mcolosim@brandeis.edu ▴ 70

James W. MacDonald 68k

Marc,

I think finding the function you are looking for is usually more of an art than a science. I have no idea how I found %in%, but my usual method for finding functions that I think probably exist goes like this:

1.) help.search("something that I think might be a reasonable name for the function")

2.) Google it to within an inch of its life ;-D. Usually I prepend an R to the Google search to possibly limit the results to actual R functions. There are also search pages on www.r-project.org and www.bioconductor.org that will search the mailing list archives.

3.) Look at the code of functions I already know that might do something similar and see how they do it.

By this time I have usually found what I am looking for, plus a bunch of other stuff that may come in handy in the future. However, if I am still hitting a wall, I ask on either the BioC or R-help listserv.

Best,

Jim
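Steps 1 and 3 can be tried directly at the R prompt. A rough sketch follows, using %in% as the function being hunted for. The search phrase is only an illustrative guess, restricting the search to the base package is my choice to keep it quick, and apropos() is an addition that is not mentioned in the post.

```r
# Step 1: keyword search of the installed help pages.
# "value matching" is just a guess at a sensible phrase.
res <- help.search("value matching", package = "base")

# A related shortcut (not from the post): list visible objects whose
# names match a pattern.
hits <- apropos("%in%")

# Step 3: see how a known function does its job by looking at its definition.
in_fun <- match.fun("%in%")
in_src <- deparse(body(in_fun))
in_src
```

Printing res lists the help pages that matched, and the body of %in% shows that it is a thin wrapper around match(), which is itself a useful function to know about for this kind of problem.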

ADD COMMENT • link 21.4 years ago James W. MacDonald 68k

S Peri ▴ 320

Dear group,

I did SAM and t-test analyses and obtained p-values. Now, these files look like this:

T-test values:

    > Gli_X0_X1_pvals[1:5]
      100_g_at    1000_at    1001_at  1002_f_at  1003_s_at
    0.80033009 0.31943016 0.33078591 0.05216239 0.08957325

Fold change (Avg. Diff):

    > MyBrain_X0_X1_Exp_FCs[1:5]
     100_g_at   1000_at   1001_at 1002_f_at 1003_s_at
    1.0176023 0.9274588 0.8752550 1.1056984 1.1096023

My questions:

1. Using an annotation package, how can I convert the probe IDs to gene names? How do I incorporate the gene name in place of 100_g_at?

2. How can I choose/filter p-values from the t-test that are less than 0.01 (i.e. from 0.01 down to 0)?

3. How can I write the values into a table with 3 column names: Gene, P-value, Fold change?

I am doing this for the first time. Please help me.

Thank you.

Regards,
PS

ADD COMMENT • link 21.4 years ago S Peri ▴ 320

See comments below.

On Wed, 2004-08-18 at 18:54, S Peri wrote:

> Using annotation package how can I convert the probe
> ID's to Gene names. how do i incorporate gene name in
> place of 100_g_at?

There are annotation packages in BioConductor. But if you want a quick and dirty solution, get the CDF file from Affymetrix and merge it in Excel. These will have all the information you need but may be slightly outdated.

Can anyone on the list comment on the merits of doing this versus using the BioConductor annotation package?

> 2. How can I choose/filter P-values from T-test that
> are less than 0.01 to 0 ?

You can write this yourself with a few ifelse(), which(), subset() commands.

> 3. How can write the values into a table with 3
> colnames:
> Gene, P-value, Fold change

    mat <- cbind( genename, pvalue, foldchange )
    write.table(mat, file="aaa.txt", sep="\t", quote=FALSE)
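A minimal sketch of points 2 and 3, using the five values posted in the question. The object names, the output file (a temporary file here) and the column names are my own choices for illustration. A data.frame is used instead of cbind() because cbind() on a character vector and numeric vectors produces a character matrix; the Gene column simply holds the probe IDs as a placeholder for real gene names.

```r
# Values copied from the question; names are the probe IDs.
pvals <- c("100_g_at" = 0.80033009, "1000_at" = 0.31943016,
           "1001_at" = 0.33078591, "1002_f_at" = 0.05216239,
           "1003_s_at" = 0.08957325)
fcs <- c("100_g_at" = 1.0176023, "1000_at" = 0.9274588,
         "1001_at" = 0.8752550, "1002_f_at" = 1.1056984,
         "1003_s_at" = 1.1096023)

# One row per probe; fold changes are looked up by name so that the
# two vectors are guaranteed to be aligned.
results <- data.frame(Gene = names(pvals),
                      P.value = unname(pvals),
                      Fold.change = unname(fcs[names(pvals)]),
                      stringsAsFactors = FALSE)

# Keep rows below a p-value cutoff. None of these five probes passes 0.01,
# so a looser cutoff of 0.1 is used here purely to show a non-empty result.
strict <- results[results$P.value < 0.01, ]
keep   <- results[results$P.value < 0.1, ]

# Tab-delimited output with a header row and no row names.
out <- tempfile(fileext = ".txt")
write.table(keep, file = out, sep = "\t", quote = FALSE, row.names = FALSE)
```

subset(results, P.value < 0.1) would give the same rows as the bracket form; which one to use is mostly a matter of taste.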

ADD REPLY • link 21.4 years ago Adaikalavan Ramasamy ★ 1.8k

On Aug 19, 2004, at 6:36 AM, Adaikalavan Ramasamy wrote:

> Can anyone on the list comment the merits of doing this
> versus using the BioConductor annotation package ?

The annotation packages contain much more information than just the gene name (like gene ontology, homologous genes, etc.). If one has a vector of affy IDs from, for example, the hgu95av2 chip, getting the gene symbol is as simple as:

    getSYMBOL(myaffyids,"hgu95av2")

See ?getSYMBOL for more help on getting IDs of various types from the affy identifiers. Also, many of the other packages (GOstats, ontoTools, etc.) make extensive use of the annotation packages. So, while the "quick and dirty" approach will give you simple information, it does pay off to learn how to use the annotation packages if one is going to do post-processing of results in R. If all one needs is a gene name, either way works (but I still think the annotation package is a more robust solution).

Sean
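A hedged sketch of the lookup described above. It assumes the annotate package and the chip annotation package are installed from Bioconductor. On current Bioconductor releases the chip package is, as far as I know, distributed as hgu95av2.db rather than hgu95av2; that package name is my assumption and does not come from the thread. The guard makes the snippet fall back to NA values when the packages are not available, and the probe IDs are taken from the earlier question.

```r
# Probe IDs taken from the question earlier in the thread.
probe_ids <- c("100_g_at", "1000_at", "1001_at")

if (requireNamespace("annotate", quietly = TRUE) &&
    requireNamespace("hgu95av2.db", quietly = TRUE)) {
  # Assumed modern package name; the thread itself only says "hgu95av2".
  suppressPackageStartupMessages(
    library("hgu95av2.db", character.only = TRUE)
  )
  symbols <- annotate::getSYMBOL(probe_ids, "hgu95av2")
} else {
  # Fallback so the rest of a script can still run without the packages.
  symbols <- setNames(rep(NA_character_, length(probe_ids)), probe_ids)
}

symbols
```

The result can then be used as the gene-name column when building the output table, with probes that have no symbol left as NA.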

ADD REPLY • link 21.4 years ago Sean Davis 21k

Dear group,

I have a list of genes (say ~120) from a pathway. Can I use 'genefilter' functions or any other function to pick (only for the genes I need for my pathway) the fold-change values, p-values and LocusIDs from the output table that I created using write.table?

Thank you all for your valuable suggestions on my previous query about the annotate package and writing the output to a table (REF: Annotate Package: How do I get the gene names and how do I write my matrix). I could make things work on my desk. It was my mistake to advance the element myself again and again even though I was using a 'for' loop. E.g.:

    for (i in x){
        y <- do something
        i = i+1
    }

I realized later that i = i+1 is not needed.

Thank you,
PS
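The point about the loop counter can be checked with a tiny example. In R, for (i in x) assigns each element of x to i in turn, so changing i inside the body has no effect on which element comes next. The vector below is made up for illustration.

```r
x <- c(10, 20, 30)
visited <- numeric(0)

for (i in x) {
  visited <- c(visited, i)  # record the value the loop handed us
  i <- i + 1                # changes i only for the rest of this iteration
}

visited
```

visited ends up holding exactly the elements of x, which shows that the extra increment is harmless for the iteration itself but also unnecessary.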

ADD REPLY • link 21.4 years ago S Peri ▴ 320

A.J. Rossini ▴ 810

mcolosim@brandeis.edu writes:

> I'm using an old version of R (1.8.1) and ?%in% doesn't work. However,
> help("%in%") does. I know it is time up update everything.

    ?"%in%"

might be the right thing. Emacs takes care of that for me.

best,
-tony

ADD COMMENT • link 21.4 years ago A.J. Rossini ▴ 810

James W. MacDonald 68k

?%in% won't work with any version of R (except it appears to work under Emacs - more Rossini magic, I assume?). At the R prompt you have to use ?"%in%"

Best,

Jim
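A small sketch of the quoting issue, assuming a plain R session rather than Emacs. %in% is not a syntactic name, so the operator has to be quoted when asking for its help page. The function form is used in the code so that the snippet also runs in a script; the variable name h is just for illustration.

```r
# At an interactive prompt these two forms ask for the same help page:
#   ?"%in%"
#   help("%in%")
# The unquoted form ?%in% is the one reported in this thread as failing
# at the plain R prompt.

h <- help("%in%")  # holds the location of the help file; printing h shows the page
length(h)          # at least 1 when the topic was found
```

The same quoting applies to other operators and special names, for example help("[[") or help("if").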

ADD COMMENT • link 21.4 years ago James W. MacDonald 68k
