Enrichment of GO terms based on an input gene list

Guest User

Hello Bioconductor group,

I am trying to derive a list of enriched GO terms from a set of genes given by a user. Here is how I understand the task and how I am implementing it. My input is a CSV file containing about 500 genes:

library(topGO)
library(hgu133plus2.db)
all.genes <- ls(hgu133plus2ACCNUM)
data <- read.csv(file.choose(), header = FALSE)  # Here I give an input CSV file containing genes
relevant.genes <- factor(as.integer(all.genes %in% data))
names(relevant.genes) <- all.genes
GOdata.BP <- new("topGOdata", ontology = 'BP', allGenes = relevant.genes,
                 annotationFun = annFUN.db, affyLib = 'hgu133plus2.db')

------------
Error in .local(.Object, ...) : allGenes must be a factor with 2 levels

> str(relevant.genes)
 Factor w/ 1 level "0": 1 1 1 1 1 1 1 1 1 1 ...
 - attr(*, "names")= chr [1:54675] "1007_s_at" "1053_at" "117_at" "121_at" ...
--------------

Can you please tell me where I am going wrong? And is topGO the right tool for this, even though I don't have expression values?

- Joseph

-- output of sessionInfo():

R version 2.15.2 (2012-10-26)
Platform: i686-pc-linux-gnu (32-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] hgu95av2.db_2.8.0    org.Hs.eg.db_2.8.0   topGO_2.10.0
 [4] SparseM_0.96         GO.db_2.8.0          RSQLite_0.11.2
 [7] DBI_0.2-5            AnnotationDbi_1.20.7 Biobase_2.18.0
[10] BiocGenerics_0.4.0   graph_1.36.2

loaded via a namespace (and not attached):
[1] grid_2.15.2     IRanges_1.16.6  lattice_0.20-14 parallel_2.15.2
[5] stats4_2.15.2   tools_2.15.2

--
Sent via the guest posting facility at bioconductor.org.
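A likely cause of the one-level factor, offered as a sketch rather than a definitive diagnosis: `read.csv()` returns a data frame, and `x %in% df` matches each element of `x` against the data frame's columns (treated as whole list elements), not against the IDs stored inside them. With more than one row, nothing matches, every probe gets 0, and `factor()` ends up with the single level "0". Matching against the ID column instead should give the two levels topGO expects. The snippet below uses hypothetical toy IDs and assumes the IDs sit in column 1 of the CSV and are hgu133plus2 probe set IDs.

```r
# Toy stand-ins (hypothetical): a few hgu133plus2-style probe set IDs as the
# universe, and a data frame shaped like read.csv(..., header = FALSE) output.
all.genes <- c("1007_s_at", "1053_at", "117_at", "121_at", "1255_g_at")
data <- data.frame(V1 = c("1053_at", "121_at"), stringsAsFactors = FALSE)

# Matching against the data frame itself compares against its columns,
# not the IDs inside them, so every probe is FALSE -> a one-level factor.
wrong <- factor(as.integer(all.genes %in% data))
nlevels(wrong)  # 1

# Match against the ID column instead (assumed here to be column 1).
# %in% converts factor columns to character, so this also works when
# read.csv() has turned the column into a factor.
relevant.genes <- factor(as.integer(all.genes %in% data[[1]]))
names(relevant.genes) <- all.genes
nlevels(relevant.genes)  # 2
```

With a two-level factor, the original `new("topGOdata", ...)` call should no longer raise the "allGenes must be a factor with 2 levels" error. A 0/1 gene list without expression values can then be tested with Fisher's exact test, for example `runTest(GOdata.BP, algorithm = "classic", statistic = "fisher")`.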

GO hgu95av2

Reema Singh

Hi Joseph,

You can also try the globaltest, GOstats, Category and GeneAnswer packages for GO term enrichment.

What is the content of your CSV file? If it contains Entrez IDs or probe IDs, you can also use GO.db for mapping Entrez IDs to ACCNUM.

Regards,
Reema Singh
PhD Scholar, Computational Biology and Bioinformatics
School of Computational and Integrative Sciences
Jawaharlal Nehru University
New Delhi-110067, India

(In reply to Joseph Nalluri [guest] <guest@bioconductor.org>, Thu, Mar 21, 2013 at 12:17 AM.)
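The question of what the CSV contains matters because only IDs that also appear in the universe (`all.genes`, the hgu133plus2 probe set IDs in the original code) can ever be flagged as 1. A quick overlap check before calling topGO shows how many input IDs actually match. This is a minimal sketch: `check_id_overlap()` is a hypothetical helper, and the IDs are toy values.

```r
# Hypothetical helper: report how many user-supplied IDs are present in the
# universe of IDs used to build the topGO gene factor.
check_id_overlap <- function(user_ids, universe) {
  user_ids <- unique(as.character(user_ids))  # handles factor columns too
  matched <- user_ids %in% universe
  list(n_input   = length(user_ids),
       n_matched = sum(matched),
       unmatched = user_ids[!matched])
}

# Toy example: two probe set IDs and one gene symbol. The symbol cannot
# match a probe-level universe such as ls(hgu133plus2ACCNUM).
universe <- c("1007_s_at", "1053_at", "117_at", "121_at")
res <- check_id_overlap(c("1053_at", "121_at", "TP53"), universe)
res$n_matched  # 2
res$unmatched  # "TP53"
```

In the original setting this would be called as `check_id_overlap(data[[1]], all.genes)`. If most IDs come back unmatched (for example because they are gene symbols or Entrez IDs), they would need to be mapped to probe set IDs first, or the universe switched to the same ID type, before building the factor.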
