topGO
@steven-stadler-6711
Hi! I am new to Bioconductor and topGO. My aim is to run a GO term enrichment analysis on expression data with a control and two different infections. I managed to create my own GO term to gene mapping, but I don't know how to create my own geneList. I have an Excel sheet with p-values, reads and so on. How can I create this geneList in R? I am also a newbie in R ;-)

I would create a CSV file with the gene name and its p-value. But how can I parse it in R/Bioconductor to use it for the creation of a topGO object? It would be nice if someone could tell me the parse command :-) Or an example of how I can create a topGO object with custom data. Thanks.

Greetings,
Steven
@james-w-macdonald-5106
Hi Steven,

One of the best ways to figure out what to do is to see what is required. If you look on page 3 of the topGO vignette (http://bioconductor.org/packages/release/bioc/vignettes/topGO/inst/doc/topGO.pdf), you can see that the vignette uses a 'geneList' that comes with the package. We can inspect this object like this:

    > library(topGO)
    > data(geneList)
    > class(geneList) ## always a good idea to check
    [1] "numeric"
    > head(geneList)
    1095_s_at   1130_at   1196_at 1329_s_at 1340_s_at 1342_g_at
    1.0000000 1.0000000 0.6223795 0.5412240 1.0000000 1.0000000

Being a newbie, you might not recognize this data structure. It is a named vector, where the vector itself is numeric and the names are character. In other words, the top row above (starting with 1095_s_at) contains the names, and the second row contains the values. In this case, the names are the names of the probes, and the numbers are the p-values from a t-test comparing two groups.

Please note that you don't give us much information to go on, so it isn't possible to give you much help. In other words:

- What array are you using?
- Do you have mappings of probe ID to GO terms? If it is a common array, there are likely to be packages in Bioconductor that can help, or you might need to use an organism-level package.
- Speaking of which, what is the species?
- What have you done so far? Did you analyze these data in R? If not, what is the form of your data?
- You say something about a csv file; is that how you have the data right now?

Without knowing some or all of the above, it isn't really possible to give you anything but a general solution. So here is a general solution:

You need to read in your probe identifiers and p-values and then create a named vector. So you need to use one of read.table(), read.csv(), read.delim() or scan() to read these things into R.
Once you have done that (and note that you will almost surely want to set the stringsAsFactors argument to FALSE for any of the read.xxx functions), then you can create the geneList like this:

    geneList <- {p-values go here}
    names(geneList) <- {probe IDs go here}

If you answer the questions above, we can probably give more constructive help.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
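
To make the general recipe concrete, here is a minimal sketch in base R. It assumes a CSV file with two columns named "gene" and "pvalue"; those column names, the toy values, and the 0.01 cutoff are illustrative assumptions, not anything taken from Steven's data. A temporary file stands in for the CSV exported from Excel.

```r
# Hypothetical input: a CSV exported from the Excel sheet, with assumed
# column names "gene" and "pvalue". A temp file stands in for it here.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("gene,pvalue",
             "geneA,0.0004",
             "geneB,0.2100",
             "geneC,0.0310",
             "geneD,1.0000"), csv_path)

# Read the table; keep gene names as character rather than factor.
tab <- read.csv(csv_path, stringsAsFactors = FALSE)

# Drop rows with missing values or duplicated gene names, since the
# names of the vector should identify each gene uniquely.
tab <- tab[!is.na(tab$pvalue) & !is.na(tab$gene) & !duplicated(tab$gene), ]

# Build the named numeric vector: values are p-values, names are gene IDs.
geneList <- tab$pvalue
names(geneList) <- tab$gene

class(geneList)   # check that it is "numeric", as in the vignette example
head(geneList)

# A selection function that flags the genes of interest.
# The 0.01 cutoff is an assumption; choose one that suits your data.
topDiffGenes <- function(p) p < 0.01
sum(topDiffGenes(geneList))   # number of genes passing the cutoff
```

With a vector like this in hand, the topGO vignette describes building the object with new("topGOdata", ...), passing the vector as allGenes, the selection function as geneSel, and, for a custom GO mapping, annot = annFUN.gene2GO together with gene2GO set to your own gene-to-GO list. The gene names in the vector need to match the identifiers used in that mapping.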