Question: Script for filtering
10.6 years ago by
Alberto Goldoni wrote:
Hi everybody, I have to extract 500 genes from all the genes present in the GenomeWideSNP_6.0 database. If I have the list of these 500 genes, is there a script to extract only these genes from the complete list? Thanks a lot.
10.6 years ago by
United States
James W. MacDonald wrote:
Hi Alberto,

Alberto Goldoni wrote:
> Hi to everybody,
>
> i have to extract 500 genes from all the genes present on the GenomeWideSNP_6.0 database.

I'm not familiar with this database. Could you please give more information? Also, there are no genes measured on the GenomeWideSNP_6.0 chip. This chip measures SNPs, which may or may not be in or near genes.

> If i have the list of these 500 genes, are there a script in order to extract only these genes from the complete list?

This question is too vague to be answered. What is the 'complete list'?

Maybe you are trying to subset a list or data.frame, in which case you should look at

    ?'['
    ?'%in%'

or perhaps

    ?which

Best,

Jim

> Thanks a lot.

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
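As a quick illustration of the help topics mentioned above, here is a minimal sketch of subsetting a vector with `[`, `%in%` and `which`. The vectors `all_ids` and `wanted` are made up for the example and do not come from the original question.

```r
# Hypothetical toy data: IDs on a "complete list" and the IDs of interest.
all_ids <- c("rs1", "rs2", "rs3", "rs4", "rs5")
wanted  <- c("rs2", "rs5", "rs9")

# %in% returns one TRUE/FALSE per element of the left-hand vector.
keep <- all_ids %in% wanted

# '[' with a logical index keeps the elements where keep is TRUE.
hits <- all_ids[keep]

# which() turns the logical vector into integer positions.
positions <- which(keep)

print(hits)
print(positions)
```

The logical index and the integer positions select the same elements; `which` is mostly useful when the positions themselves are needed.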
OK, still not very detailed, so I will make assumptions.

I assume the file GenomeWideSNP_6.0 is the csv file you can download from Affy. I assume the second 'list' of 500 SNPs is actually a vector of the dbSNP RS IDs for 500 SNPs. I assume you want to know how many of the 500 SNPs in the vector of IDs are found on the 6.0 chip. I assume you read both into R, and called the Affy csv file 'GW6' and the vector of RS IDs 'rsid'. I also assume there is a column called 'rsid' in the GW6 data.frame.

    intersection <- GW6[GW6$rsid %in% rsid,]

And if there is a column in GW6 called 'gene' that you are interested in, then you could instead use

    intersection <- GW6[GW6$rsid %in% rsid,"gene"]

to get just that column.

Hopefully that helps. But maybe you see my point about detailed questions. When you want to know how to do something, you are asking a very _specific_ question. If you don't give very specific details about what you are trying to do, preferably with sample code if things aren't working the way you think they should, then people are left to guess what you want and what you have tried.

Best,

Jim

Alberto Goldoni wrote:
> Sorry i'll be more detailed.
>
> in R i'd need to load the file GenomeWideSNP_6.0 containing all the SNPs and in the second time i compare this list with a second list containing SNPs of 500 genes.
>
> I would like to know how many genes (of the second list: 500) are included in the first list (GenomeWideSNP_6.0 database) and
> which SNPs are the same between the two lists.
>
> best regards.
>
> ________________________________________
> From: James W. MacDonald [jmacdon at med.umich.edu]
> Sent: Thursday 23 April 2009 15.35
> To: Alberto Goldoni
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] Script for filtering
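The subsetting described in the reply above can be sketched on a small made-up data.frame. This is only an illustration under the same assumptions Jim states: the names `GW6`, `rsid` and the columns `rsid` and `gene` follow his reply, while the `probeset` column and all values are hypothetical and are not taken from the real Affymetrix annotation file.

```r
# Hypothetical stand-in for the annotation csv; column names and values are assumed.
GW6 <- data.frame(
  probeset = c("P1", "P2", "P3", "P4"),
  rsid     = c("rs10", "rs20", "rs30", "rs40"),
  gene     = c("GENE1", "GENE1", "GENE2", NA),
  stringsAsFactors = FALSE
)

# Hypothetical vector of RS IDs of interest (the "second list").
rsid <- c("rs20", "rs30", "rs99")

# Rows of GW6 whose RS ID appears in the vector of interest.
intersection <- GW6[GW6$rsid %in% rsid, ]

# How many of the SNPs of interest are on the chip.
n_snps_on_chip <- nrow(intersection)

# Distinct, non-missing gene labels among the matching rows.
genes <- intersection$gene
genes_on_chip <- unique(genes[!is.na(genes)])

# RS IDs of interest that are absent from GW6.
not_on_chip <- setdiff(rsid, GW6$rsid)

print(intersection)
print(n_snps_on_chip)
print(genes_on_chip)
print(not_on_chip)
```

Counting rows answers "which SNPs are shared", while `unique` on the gene column gives a rough answer to "how many genes", assuming the annotation really has one gene label per row.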