How to filter a list of genes by their ontology
Guest User ★ 13k
@guest-user-4897
Last seen 9.7 years ago
Hi,

After analysing my arrays for differentially expressed genes, I get a list of gene IDs. To reduce the number of genes in this list even further, I would like to retain only genes related to the immune system. I've looked for packages dealing with "ontology" but I couldn't find any that does this simple task.

Any idea which package or function I could use?

Thanks in advance for your help.

L.P

-- output of sessionInfo():

R version 2.7.0 (2008-04-22)
powerpc-apple-darwin8.10.1

locale:
fr_FR.UTF-8/fr_FR.UTF-8/C/C/fr_FR.UTF-8/fr_FR.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

-- Sent via the guest posting facility at bioconductor.org.
@john-linux-user-4917
Last seen 8.5 years ago
United States
Hi,

Matching functions such as match() and %in%, or the overlap functions for range data, could help. Otherwise, a simple script in Python could be faster.

John
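As a rough illustration of the match() and %in% suggestion, here is a minimal base R sketch. It assumes you already have a reference vector of immune-related gene IDs from some source; both vectors below (`de_genes`, `immune_genes`) are made-up illustrative values, not real annotation.

```r
# Illustrative inputs only: IDs from a differential expression analysis and a
# hypothetical reference set of IDs assumed to be immune-related.
de_genes     <- c("159", "3934", "7157", "3557", "672")
immune_genes <- c("3557", "159", "4057", "3934")

# %in% returns one TRUE/FALSE per element of de_genes
keep <- de_genes %in% immune_genes
immune_de <- de_genes[keep]

# match() returns the position of each ID in the reference set (NA if absent)
pos <- match(de_genes, immune_genes)

print(immune_de)
print(pos)
```

%in% is usually enough for filtering; match() is handy when you also need to pull a matching row out of an annotation table.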
@james-w-macdonald-5106
Last seen 5 days ago
United States
Hi Laurent,

On 9/17/2012 11:15 AM, Laurent Pays [guest] wrote:
> Hi,
> After analysing my arrays for differentially expressed genes, I get a list of genes Id. To reduce even more the number of genes in this list, I would like to retain only genes related to the immune system. I've looked for packages dealing with "ontology" but I couldn't find any doing this simple task...

It's not really a simple task. If you want to assume that GO terms fulfill your criteria, then you can look for terms that contain the word 'immune'. You don't say what species you are working with, so I'll assume Homo sapiens.

  > library(org.Hs.eg.db)
  > ## fake up some gene IDs
  > egids <- Lkeys(org.Hs.egSYMBOL)[sample(1:2e4, 500)]
  > gos <- mget(egids, org.Hs.egGO)
  > goterms <- sapply(gos, function(x) if(!is.null(names(x))) Term(names(x)))
  > ind <- sapply(goterms, function(x) length(grep("immune", x))) > 0
  > sum(ind)
  [1] 17
  > egids[ind]
   [1] "159"   "3934"  "3557"  "4057"  "3806"  "8742"  "8876"  "10581" "57115"
  [10] "55593" "6352"  "959"   "9865"  "3452"  "841"   "3608"  "6935"

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
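To make the term-matching step easier to follow without installing an annotation package, here is a small base R sketch of the same idea on toy data. The list below is a hypothetical stand-in for `goterms` (the gene IDs and term names are invented), and the helper `has_term` is not part of any package. Unlike the grep() call above, this variant matches case-insensitively, which is a design choice of the sketch rather than part of the original answer.

```r
# Hypothetical stand-in for `goterms`: one character vector of GO term names
# per gene ID, and NULL for a gene with no GO annotation.
goterms <- list(
  "101" = c("innate immune response", "protein binding"),
  "102" = c("DNA repair"),
  "103" = NULL,
  "104" = c("Immune system process", "cytosol")
)

# TRUE if any term of a gene matches the pattern; any() of an empty result
# is FALSE, so unannotated genes are dropped automatically.
has_term <- function(terms, pattern) {
  any(grepl(pattern, terms, ignore.case = TRUE))
}

ind <- vapply(goterms, has_term, logical(1), pattern = "immune")
immune_ids <- names(goterms)[ind]
print(immune_ids)
```

Using vapply() with `logical(1)` guarantees one TRUE/FALSE per gene, so the result can be used directly as an index into the ID vector.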
On 09/17/2012 11:51 AM, James W. MacDonald wrote:
> On 9/17/2012 11:15 AM, Laurent Pays [guest] wrote:
>> -- output of sessionInfo():
>>
>> R version 2.7.0 (2008-04-22)
>> powerpc-apple-darwin8.10.1

Also, this R is VERY out of date, and the first thing you'll want to do is update it.

Martin

--
Computational Biology / Fred Hutchinson Cancer Research Center
The vignette of the keggorthology package addresses the use of KEGG terms to organize gene sets, and has a section "Application to gene filtering" that gives a deployment against hgu95av2 chips. There the example involves searching for genes annotated to terms including the token "insulin". One needs to take care that the token(s) used in searching will return all and only relevant genes. The GSEABase package also has relevant facilities.
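The caveat about tokens returning "all and only" the relevant genes can be illustrated with a short base R sketch. The term labels below are invented for illustration and are not taken from KEGG or from the keggorthology vignette; the point is only that a plain substring search, a whole-word search, and a stricter pattern can select different sets.

```r
# Made-up annotation labels, used only to show how the pattern changes the hits
terms <- c(
  "insulin receptor signaling pathway",
  "response to insulin",
  "insulin-like growth factor receptor binding",
  "proinsulin processing"
)

# Plain substring search: matches every label containing "insulin"
loose <- grepl("insulin", terms, fixed = TRUE)

# Whole-word search: drops "proinsulin" but still keeps "insulin-like"
whole <- grepl("\\binsulin\\b", terms, perl = TRUE)

# Stricter pattern: whole word, and not followed by "-like"
strict <- grepl("\\binsulin\\b(?!-like)", terms, perl = TRUE)

print(data.frame(terms, loose, whole, strict))
```

Whichever pattern you choose, it is worth inspecting the matched terms by eye before filtering the gene list on them.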
@james-w-macdonald-5106
Last seen 5 days ago
United States
Hi Laurent,

On 9/17/2012 11:15 AM, Laurent Pays [guest] wrote:
> -- output of sessionInfo():
>
> R version 2.7.0 (2008-04-22)
> powerpc-apple-darwin8.10.1

Seriously, you need to upgrade. A four-year-old version of R is ridiculously outdated.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099