gene enrichment analysis without a control sample
Wendy Qiao ▴ 360
@wendy-qiao-4501
Last seen 9.6 years ago
Hi all,

I have a microarray dataset compiled from several sources, so I am facing some challenges in identifying the expressed genes of each cell type. I am thinking of using the enriched gene sets of each cell type as the expressed genes of that cell type. However, gene set enrichment analysis (GSEA, http://www.broadinstitute.org/gsea/index.jsp) needs both control and sample data. I am wondering whether there is a gene set enrichment tool for the analysis of one cell type only.

Thank you,
Wendy
Microarray • 1.7k views
Wu, Di ▴ 120
@wu-di-4945
Last seen 8.8 years ago
United States
Hi Wendy,

I am not sure that choosing the right gene set test is your current problem. It seems you want to find the signature genes for each of the cell types, which looks like a differential expression problem to me.

I understand that when you have data from several cell types, you probably don't have one particular cell type that serves as a control group for all the others. I had a similar problem with the mammary gland cell type data (Lim et al. 2009, Nature Medicine). What I did was compare cell type A to each of the other three cell types, then take the genes that were up- (or down-) regulated in all three comparisons. These genes are the signature genes (expressed or lower-expressed genes) for cell type A. The same can be done for the other cell types.

Regarding gene set tests, that is, testing which pathways, GO terms or other gene lists are enriched in your gene list, there are different approaches. Some require the raw expression data (our "roast" and "romer" functions in limma, among others). The geneSetTest function in limma uses only the ranks of the genes.

I will be happy to discuss gene set tests further if that is actually the problem you face, or if you need to use them later.

Hope this helps,
Di

----
Di Wu
Postdoctoral fellow
Harvard University, Statistics Department
Harvard Medical School
Science Center, 1 Oxford Street, Cambridge, MA 02138-2901 USA
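The pairwise-comparison strategy Di describes can be sketched roughly as follows. This is only an illustration in plain Python, not the analysis used in the paper: it assumes log2 expression values, calls a gene "up" or "down" with a simple mean-difference threshold (the hypothetical `min_diff` parameter) instead of a moderated test such as the ones limma provides, and uses made-up cell types, gene names and values.

```python
from statistics import mean

def signature_genes(expr, target, min_diff=1.0):
    """Genes up (or down) in `target` versus EVERY other cell type.

    expr: {cell_type: {gene: [log2 values across replicates]}}
    min_diff: hypothetical log2 fold-change threshold (an assumption,
              standing in for a proper differential expression test).
    """
    up, down = None, None
    for other in expr:
        if other == target:
            continue
        up_here, down_here = set(), set()
        for gene, values in expr[target].items():
            diff = mean(values) - mean(expr[other][gene])
            if diff >= min_diff:
                up_here.add(gene)
            elif diff <= -min_diff:
                down_here.add(gene)
        # Keep only the genes that pass in all comparisons seen so far.
        up = up_here if up is None else up & up_here
        down = down_here if down is None else down & down_here
    return up or set(), down or set()

# Toy data: four cell types, two replicates each (invented values).
expr = {
    "A": {"g1": [9.0, 9.2], "g2": [5.0, 5.1], "g3": [7.0, 7.1], "g4": [3.0, 3.2]},
    "B": {"g1": [6.0, 6.1], "g2": [5.0, 5.2], "g3": [7.1, 7.0], "g4": [6.0, 6.1]},
    "C": {"g1": [5.5, 5.7], "g2": [7.0, 7.2], "g3": [7.0, 6.9], "g4": [5.9, 6.0]},
    "D": {"g1": [6.5, 6.4], "g2": [5.1, 5.0], "g3": [9.0, 9.1], "g4": [6.2, 6.3]},
}
up_A, down_A = signature_genes(expr, "A")
print("up in A:", sorted(up_A), "down in A:", sorted(down_A))
```

Taking the intersection is deliberately conservative: a gene that differs from only one or two of the other cell types (such as g2 or g3 in the toy data) is not reported as a signature gene for A.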
Wendy Qiao ▴ 360
@wendy-qiao-4501
Last seen 9.6 years ago
Hi Di,

Thank you very much for your email. My major challenge in identifying differentially expressed genes is that the microarray data come from different platforms (Illumina and Affymetrix), and those are the only data available for my project. In addition, my question is not necessarily about finding the differentially expressed genes of each cell type; I am more interested in the *expressed genes* of each cell type. I hope to find a way that avoids direct comparison between cell types. I tried ranking the gene expression values within each cell type and setting a cutoff between expressed and unexpressed genes, but the cutoff is arbitrary and affects the downstream analysis. In this case, would you have any suggestions? Any advice on obtaining differentially expressed genes from microarray data on different platforms is also appreciated.

By the way, would you mind sending me the title of the paper that you mentioned?

Thank you very much,
Wendy
Hi Wendy,

Here is the reference to that article:

Lim E, Vaillant F, Wu D, Forrest NC, Pal B, Hart AH, Asselin-Labat M-L, Gyorki DE, Ward T, Partanen A, et al. 2009. Aberrant luminal progenitors as the candidate target population for basal tumor development in BRCA1 mutation carriers. Nat Med 15: 907-913.

Now I understand your concern about the different platforms. It is always a problem. I would think about two strategies.

First, I would check whether all the cell types I am interested in are available on one platform. If so, I would first analyze the data from that platform alone and see what the results look like. If that holds for both platforms, gene set tests can be used to check the reproducibility of the results across platforms.

Second, if some cell types are measured on both platforms, we might be able to use them to remove the batch (platform) effect after matching the gene symbols across platforms. The R function "removeBatchEffect" in the limma package may work. I am not sure, but the "MergeMaid" package may also help to merge the data from the two platforms. After all this, you can do the routine differential expression analysis.

On the other hand, I think comparing the presence/absence of genes within cell types is not very reliable, as you noticed. The fact that the expression value on the array is higher for one gene (A) than for another gene (B) does not necessarily mean that gene A is really expressed at a higher level; it may simply reflect differences between the probes on the array.

Good luck,
Di
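The second strategy (using cell types measured on both platforms to estimate and remove the platform effect) might look something like the sketch below. It is a simplified stand-in written in plain Python, not what limma's removeBatchEffect does internally: it assumes log2 values already matched by gene symbol, one summary value per gene and cell type, and a purely additive per-gene platform offset. The function names, platform variables, cell type labels and numbers are all hypothetical.

```python
from statistics import mean

def platform_offsets(ref, other, shared_cell_types):
    """Per-gene offset (other minus ref), estimated on shared cell types only.

    ref, other: {cell_type: {gene: log2 value}} for the two platforms.
    """
    # Only genes present on both platforms for every shared cell type.
    genes = set.intersection(
        *(set(ref[ct]) & set(other[ct]) for ct in shared_cell_types)
    )
    return {
        g: mean(other[ct][g] - ref[ct][g] for ct in shared_cell_types)
        for g in genes
    }

def remove_offsets(other, offsets):
    """Subtract the estimated offset from every sample of the other platform."""
    return {
        ct: {g: v - offsets[g] for g, v in genes.items() if g in offsets}
        for ct, genes in other.items()
    }

# Toy data: typeA and typeB are on both platforms, typeC on only one.
platform1 = {
    "typeA": {"geneA": 10.0, "geneB": 4.0},
    "typeB": {"geneA": 5.0, "geneB": 11.0},
}
platform2 = {
    "typeA": {"geneA": 12.0, "geneB": 5.0},
    "typeB": {"geneA": 7.0, "geneB": 12.0},
    "typeC": {"geneA": 6.0, "geneB": 6.5},
}
offsets = platform_offsets(platform1, platform2, ["typeA", "typeB"])
corrected = remove_offsets(platform2, offsets)
print(offsets)
print(corrected["typeC"])
```

The point of estimating the offset only on the shared cell types is that any difference between platforms there cannot be biological, so it can be attributed to the platform and then subtracted from the cell types that exist on one platform only.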
Robert Castelo ★ 3.2k
@rcastelo
Last seen 4 days ago
Barcelona/Universitat Pompeu Fabra
hi Wendy,

take a look at the GSVA package; it'll allow you to transform your gene-by-sample matrix of expression values into a gene-set-by-sample matrix of enrichment scores, which can be interpreted as surrogates of the expression level per gene set. take a look at the vignette; i think your situation might be similar to what is described in section 4.2 of the vignette.

cheers,
robert.
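To make the idea of a gene-set-by-sample matrix concrete, here is a small sketch in plain Python. It does not implement the GSVA algorithm; it swaps in a much simpler score (the average of per-gene z-scores within each set) purely to show the shape of the transformation. The function name, gene names, set names and values are invented for illustration.

```python
from statistics import mean, pstdev

def gene_set_scores(expr, gene_sets):
    """Turn a gene-by-sample table into a gene-set-by-sample table.

    expr: {gene: [value per sample]}
    gene_sets: {set_name: [gene, ...]}
    Score = mean z-score of the set's genes in each sample
    (a simple stand-in, NOT the GSVA enrichment statistic).
    """
    # Standardise each gene across samples so genes are comparable.
    z = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        z[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]

    n_samples = len(next(iter(expr.values())))
    scores = {}
    for name, genes in gene_sets.items():
        present = [g for g in genes if g in z]  # ignore unmeasured genes
        if not present:
            continue
        scores[name] = [
            mean(z[g][j] for g in present) for j in range(n_samples)
        ]
    return scores

# Toy data: three genes measured in two samples.
expr = {"g1": [1.0, 3.0], "g2": [2.0, 6.0], "g3": [5.0, 5.0]}
gene_sets = {"setX": ["g1", "g2"], "setY": ["g3", "g9"]}
scores = gene_set_scores(expr, gene_sets)
print(scores)
```

Each row of the result is one gene set and each column one sample, so no control group is needed to obtain a per-sample score; note, however, that scores of this kind are relative to the other samples in the matrix.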