How to prepare custom input (data) files for GAGE analysis and do a basic GAGE analysis using those files

Valerie Obenchain ★ 6.8k


Hello,

I think the vignette is clear that you need (1) a gene set and (2) a microarray dataset to run the gage analysis. On page 4 it stresses the importance of using the same ID system for your gene set and your expression data. Once this is done, you can use the gage() function:

    library(gage)

    ## this is the expression data
    data(gse16873)
    cn <- colnames(gse16873)
    hn <- grep('HN', cn, ignore.case = TRUE)      ## control ('HN') columns
    dcis <- grep('DCIS', cn, ignore.case = TRUE)  ## treatment ('DCIS') columns

    ## this is the gene set
    data(kegg.gs)

    ## call to gage() using 'HN' as control and 'DCIS' as treatment
    gse16873.kegg.p <- gage(gse16873, gsets = kegg.gs, ref = hn, samp = dcis)

I believe that if you have only one column of expression data, the 'ref' and 'samp' arguments should be omitted (i.e., left at their default of NULL). Read ?gage for details. Maybe the package author will comment on this; I've cc'd them on this message.

It is still not clear to me what you have tried. It would be helpful to know the following:

(1) What is your analysis question (what are you trying to accomplish)?
(2) What have you tried (what functions have you used)?
(3) What errors have you seen from #2?

Valerie

On 01/16/2012 04:19 PM, Javerjung Sandhu wrote:
> Hi Valerie,
> First of all, thanks a lot for replying and helping me. I really appreciate it. I am sending you the R source code file that the GAGE analysis uses, plus two other documents that explain what the package does.
> These are the data files used by the GAGE analysis:
>
> Data sets in package gage:
> carta.gs    Common gene set data collections
> egSymb      Mapping between Entrez Gene IDs and official symbols
> go.gs       Common gene set data collections
> gse16873    GSE16873: a breast cancer microarray dataset
> kegg.gs     Common gene set data collections
>
> I have only ONE tab-delimited data file, in the form of a MATRIX giving the gene expression values for 173 patients (as columns) and the gene names (as rows).
> I want to know how I can use this package and my data to do the GAGE analysis.
> If you need more information, please tell me. I will be ready to provide it.
> Thanks,
> Jung
>
> From: Valerie Obenchain [vobencha@fhcrc.org]
> Sent: Monday, January 16, 2012 3:18 PM
> To: Javerjung Sandhu
> Cc: bioconductor@r-project.org; luo_weijun@yahoo.com
> Subject: Re: [BioC] How to prepare Custom INPUT(DATA) files for GAGE Analysis and DO a BASIC GAGE analysis using those files
>
> Hi Jung,
>
> Please provide the code you've tried and the error you are seeing. For example, did you read your own data into R, then try to use gage() and get an error? We can help you better if we understand your inputs and the function you're having trouble with.
>
> Valerie
>
> On 01/13/12 13:10, Javerjung Sandhu wrote:
>> Dear List,
>> I would highly appreciate your help on this.
>> For the GAGE analysis package at the link below:
>> http://www.bioconductor.org/packages/release/bioc/html/gage.html
>> could you please tell me how to prepare the custom input files required for this analysis,
>> OR
>> send me SAMPLE DATA files in TXT format, so that I know which format I need to put the data in and how to do a BASIC GAGE analysis using those files? I couldn't figure it out and have been trying for 3 weeks or more.
>> Best Regards,
>> Jung
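For a dataset like the one Jung describes, the two inputs discussed in the reply above can be prepared with base R alone. The sketch below assumes a tab-delimited file with a header row of sample names, gene IDs in the first column and one column per patient; the file contents, the sample names and the `groups` labels are hypothetical stand-ins. It shows that `ref` and `samp` are simply column indices, so they have to come from your own sample annotation rather than from grep() on the vignette's 'HN' and 'DCIS' labels. The gage() call is left as a comment because it also needs a gene set list in the same ID system as the row names (`my.gsets` is a hypothetical name).

```r
# Hypothetical tab-delimited expression file: a header row of sample names,
# gene IDs in the first column, one column per sample (patient).
expr_file <- tempfile(fileext = ".txt")
writeLines(c(
  "gene\tP1\tP2\tP3\tP4",
  "ENSG00000000003\t7.1\t6.9\t8.2\t8.0",
  "ENSG00000000005\t5.0\t5.2\t4.1\t4.3",
  "ENSG00000000419\t9.3\t9.1\t9.4\t9.6"
), expr_file)

# row.names = 1 turns the gene ID column into row names;
# check.names = FALSE keeps the sample names exactly as written.
expr_df <- read.delim(expr_file, row.names = 1, check.names = FALSE)
expr <- as.matrix(expr_df)  # numeric matrix: genes as rows, samples as columns

# Hypothetical sample annotation: one label per column of expr.
groups <- c("normal", "tumor", "normal", "tumor")
ref  <- which(groups == "normal")  # column indices of the control samples
samp <- which(groups == "tumor")   # column indices of the treatment samples

# With a gene set list that uses the same IDs as rownames(expr)
# (my.gsets is a hypothetical name), the call would then look like:
# library(gage)
# expr.p <- gage(expr, gsets = my.gsets, ref = ref, samp = samp)
```

Taking the indices from an explicit annotation vector, rather than pattern-matching column names, keeps the grouping correct even when the sample names carry no group information.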

Microarray GO Cancer Breast gage

written 12.3 years ago by Valerie Obenchain (sent Tuesday, January 17, 2012 10:04 AM) • updated 12.3 years ago by Javerjung Sandhu

Javerjung Sandhu ▴ 200

Hello Valerie,

Thanks for your help. I am sending you the data files (Micro_array_dataset.txt** and Gene_Set.txt) that I want to use for the analysis.

I need to know which format the files should be saved in. For example, http://www.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats explains in great detail the format of the data files required for GSEA (though I am not using GSEA or those file types). In the same way, I want to know which format the data files for GAGE should be in, so that the analysis runs properly. Please tell me which information is missing from these files.

* Yes, I know that "gse16873" is expression data and "kegg.gs" is a gene set, but those are provided by the author; I want to use my own.

1) What I want to accomplish: a basic gage analysis (as in the R script "GAGE.r" and the PDF "gage.pdf"), such as a t-test, rank test, KS test, etc.
2) What I tried: I copied the beginning of the R script provided by the author (attached as GAGE.r), to make sure that it loads all the files successfully, made some changes to it and saved it as my own script (attached as Gage_run.r). When I tried to load the data files (Micro_array_dataset.txt and Gene_Set.txt), I got the errors shown in "R Console.txt".
3) I ran my script (Gage_run.r) first to check that it loads all the input files successfully, so that I can then move ahead with the tests. The output in "R Console.txt" shows the errors and warnings.

If you need any additional information, please tell me. I will be happy to provide it.

** an expression matrix with genes as rows and samples as columns.

Thanks,
Jung
Attachments (scrubbed from the mailing list archive):
Micro_array_dataset.txt: https://stat.ethz.ch/pipermail/bioconductor/attachments/20120117/8ff7a17c/attachment.txt
Gene_Set.txt: https://stat.ethz.ch/pipermail/bioconductor/attachments/20120117/8ff7a17c/attachment-0001.txt
R Console.txt: https://stat.ethz.ch/pipermail/bioconductor/attachments/20120117/8ff7a17c/attachment-0002.txt
gage.pdf (application/pdf, 267508 bytes): https://stat.ethz.ch/pipermail/bioconductor/attachments/20120117/8ff7a17c/attachment.pdf

written 12.3 years ago by Javerjung Sandhu

Hi Jung,

Thank you for sending your files, but there is no need to attach the source files from the gage package (GAGE.r, gage.pdf); I have access to those.

The package vignette is intended only as an example. The data in the package and your data will clearly be very different, so it does not make sense to follow the code exactly "as is" with your data. For example, there is no reason for you to grep for 'HN', 'ADH' and 'DCIS', since they don't exist in your file. These are the treatment groups in the gage sample data and have no bearing on your analysis, which is why you get nothing (i.e., integer(0)) for these variables:

    > Micro_array_dataset <- read.table("Micro_array_dataset.txt")
    > cn=colnames(Micro_array_dataset)
    > hn=grep('HN',cn, ignore.case =T)
    > adh=grep('ADH',cn, ignore.case =T)
    > dcis=grep('DCIS',cn, ignore.case =T)
    > print(hn)
    integer(0)
    > print(dcis)
    integer(0)

The next error occurs because Gene_set is a data.frame, not a list. Single-bracket indexing of a data.frame selects columns, so Gene_set[1:3] asks for its first three columns, and your file did not produce that many. In the vignette, the gene set is a list, so [1:3] selects the first three gene sets and the subsetting works.

    > lapply(Gene_set[1:3],head)
    Error in `[.data.frame`(Gene_set, 1:3) : undefined columns selected

Next, your genes need to be grouped by pathway. The idea is to analyze gene pathways, so you need to provide a list of genes grouped by pathway (like the kegg.gs or go.gs data sets in the vignette). Your gene file consists only of gene names:

    > head(rownames(Micro_array_dataset))
    [1] "ENSG00000000003" "ENSG00000000005" "ENSG00000000419" "ENSG00000000457"
    [5] "ENSG00000000460" "ENSG00000000938"

In R, a list of genes grouped by pathway would look something like this:

    > head(kegg.gs)
    $`hsa00010 Glycolysis / Gluconeogenesis`
     [1] "10327"  "124"    "125"    "126"    "127"    "128"    "130"    "130589"
     [9] "131"    "160287" "1737"   "1738"   "2023"   "2026"   "2027"   "217"
    ...

    $`hsa00020 Citrate cycle (TCA cycle)`
     [1] "1431"   "1737"   "1738"   "1743"   "2271"   "283398" "3417"   "3418"
     [9] "3419"   "3420"   "3421"   "4190"   "4191"   "47"     "48"     "4967"
    ...

Note that your row names are Ensembl gene IDs, while kegg.gs uses Entrez Gene IDs, so one of them has to be mapped to the other's ID system.

You need to identify which pathways you are interested in and group the genes by those pathways. For identifying pathways, take a look at GO.db, KEGG.db or reactome.db. Mapping between gene identifiers can be done with the org.*.db packages:
http://www.bioconductor.org/packages/release/data/annotation/

Some general background on using Bioconductor annotation data is here:
http://www.bioconductor.org/help/workflows/annotation-data/#annotation-resources

Valerie
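As a sketch of what "a list of genes grouped by pathway" means in practice: if pathway membership is stored as a two-column table (one row per pathway/gene pair), base R's split() turns it into the same kind of named list as kegg.gs, and that list can be subset with [1:3] as in the vignette. The table contents and pathway names below are hypothetical; the IDs are Ensembl IDs so that they match row names like those of Micro_array_dataset. The last lines count how many genes of each set are actually present in the expression data, a quick way to catch an ID-system mismatch before calling gage().

```r
# Hypothetical pathway membership table: one row per (pathway, gene) pair.
membership <- data.frame(
  pathway = c("pathway A", "pathway A", "pathway B",
              "pathway B", "pathway B", "pathway C"),
  gene    = c("ENSG00000000003", "ENSG00000000419", "ENSG00000000005",
              "ENSG00000000457", "ENSG00000000460", "ENSG00000000938"),
  stringsAsFactors = FALSE
)

# A two-column data.frame has no third column, so [1:3] fails here with
# the same "undefined columns selected" error seen above.
err <- tryCatch(membership[1:3], error = conditionMessage)

# split() returns a named list with one character vector of gene IDs per
# pathway, the same shape as kegg.gs or go.gs.
gene_sets <- split(membership$gene, membership$pathway)

# On a list, [1:3] selects the first three gene sets, so this works.
first_three <- lapply(gene_sets[1:3], head)

# Hypothetical row names of the expression matrix (the IDs actually measured).
measured <- c("ENSG00000000003", "ENSG00000000005", "ENSG00000000419",
              "ENSG00000000457", "ENSG00000000460")

# Genes per set found in the expression data; zeros across the board would
# point to the gene sets and the expression data using different ID systems.
overlap <- sapply(gene_sets, function(ids) sum(ids %in% measured))
```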

written 12.3 years ago by Valerie Obenchain

Hi Jung,

The gage package provides two functions, readExpData and readList, for reading in microarray data and gene set data. For details, please check the help pages by typing:

    ?readExpData
    ?readList

You can check the demo input files for examples of the proper file format. Their locations can be found by typing:

    system.file("extdata/gse16873.demo", package = "gage")
    system.file("extdata/c2.demo.gmt", package = "gage")

Note that these files are not meant for real analysis; they only show you the file format.

To use the gage package, you need to read the vignette (PDF) and know the basics of R and Bioconductor. The information provided by Valerie is very useful too. You can find a list of the functions available in gage with:

    library(help=gage)

Weijun
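As a rough illustration of the gene set file layout: c2.demo.gmt, as its extension suggests, is a GMT file, the format used by GSEA/MSigDB, in which each line holds a gene set name, a description, and then the member gene IDs, all separated by tabs. The sketch below writes a tiny GMT-style file with hypothetical contents and parses it with a few lines of base R into a named list of gene ID vectors, the same shape as kegg.gs. The parser (`parse_gmt`) is a hypothetical stand-in written for illustration, not gage's readList(); see ?readList for how gage itself reads such files.

```r
# Hypothetical GMT-style gene set file: one gene set per line,
# tab-separated fields: set name, description, then gene IDs.
gmt_file <- tempfile(fileext = ".gmt")
writeLines(c(
  paste("pathway A", "na", "ENSG00000000003", "ENSG00000000419", sep = "\t"),
  paste("pathway B", "na", "ENSG00000000005", "ENSG00000000457",
        "ENSG00000000460", sep = "\t")
), gmt_file)

# Illustrative base-R parser (not gage's readList()): split each line on tabs,
# use field 1 as the list name and keep fields 3 onwards as the gene IDs.
parse_gmt <- function(path) {
  fields <- strsplit(readLines(path), "\t", fixed = TRUE)
  sets <- lapply(fields, function(f) f[-(1:2)])
  names(sets) <- vapply(fields, `[`, character(1), 1)
  sets
}

gene_sets <- parse_gmt(gmt_file)
```

For the expression side, gse16873.demo plays the same role for readExpData(): opening it next to your own file is a quick way to compare the layouts.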

written 12.3 years ago by Luo Weijun
