Question

Mapping genes to their gene sets for GSEA

Yoo, Seungyeul

Hi all, I'm new to microarray analysis and am enjoying learning these wonderful techniques for understanding gene activity. I'm working through a gene expression example: 6 samples from 3 patients (two samples each) on an Agilent platform. After struggling from the beginning, I now have a list of probes ranked by the mean and p-value of the log ratio, and I want to run gene set enrichment analysis (GSEA) on this ranked list. However, I still can't find a way to map my genes to their corresponding gene sets, and I've been stuck on this for the last two days. I have two questions:

1) Is there a simple way to map the genes in the list to their gene sets based on a GOCollection, using only their Entrez IDs?

2) When I googled this and searched the Bioconductor mailing list archive, "GeneSetCollection" seemed to be the best approach, but it requires my data to be an ExpressionSet. So I tried to build one from my microarray data (intensities and probe names) as follows:

    # Combine log10 intensities with selected columns of t_table and gene_u
    table_all <- cbind(log10(processed), t_table[, 4:7], gene_u[, 2:3])
    # Drop probes that have no Entrez ID
    table_all <- table_all[!is.na(table_all$EntrezID), ]
    exp <- as.matrix(table_all[, 1:8])
    # Sample annotation: sample name and group label
    df <- data.frame(x = colnames(exp),
                     y = c(rep(c("PB", "SP"), 3), "SP", "SP"),
                     row.names = colnames(exp))
    meta <- data.frame(labelDescription = c("ID", "Character"))
    my_set <- new("ExpressionSet", exprs = exp,
                  phenoData = new("AnnotatedDataFrame", data = df, varMetadata = meta),
                  annotation = "hgug4845a.db")

I created the phenoData using only the sample names. It seems to work (no errors!), but the resulting ExpressionSet, my_set, contains almost 29,000 probes, so running GeneSetCollection() on all of them will take a long time. I am therefore thinking of using only 5,000 probes (2,500 from the top and 2,500 from the bottom of the ranked list) with GeneSetCollection(), but I'm not sure whether this will affect the final results. My understanding of GSEA is that we are only interested in gene sets that overlap genes at the top or bottom of the ranked list. Can anyone tell me whether my logic is right? Sorry for such naive questions.
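Below is a minimal base-R sketch for question 1. It assumes the gene sets are already available as a named list of Entrez ID vectors. For a GSEABase GeneSetCollection, `geneIds()` returns such a list; the GO-to-Entrez mapping `org.Hs.egGO2ALLEGS` in org.Hs.eg.db is another possible source. The set names, Entrez IDs and statistics below are hypothetical toy values. Collapsing duplicate probes by keeping the one with the largest absolute statistic is only one of several reasonable choices.

```r
# Sketch: map a ranked gene list to gene sets by Entrez ID.
# Assumption: gene sets are a named list of character vectors of Entrez IDs.
# All set names, IDs and statistics here are hypothetical toy values.
gene_sets <- list(
  setA = c("101", "102", "103"),
  setB = c("102", "104", "105", "106"),
  setC = c("201", "202")
)

# Hypothetical probe-level table: one row per probe with its ranking statistic.
ranked <- data.frame(
  EntrezID = c("102", "104", "999", "101", "105", "102"),
  stat     = c(5.1, 3.2, 0.4, -0.8, -2.5, -4.0),
  stringsAsFactors = FALSE
)

# Several probes can share one Entrez ID; keep the probe with the largest
# absolute statistic so each gene appears only once in the ranking.
ranked <- ranked[order(-abs(ranked$stat)), ]
ranked <- ranked[!duplicated(ranked$EntrezID), ]
ranked <- ranked[order(-ranked$stat), ]   # re-sort from top to bottom

# Long table of (gene set, Entrez ID) pairs for ranked genes found in each set.
membership <- do.call(rbind, lapply(names(gene_sets), function(s) {
  hit <- ranked$EntrezID[ranked$EntrezID %in% gene_sets[[s]]]
  if (length(hit) == 0) return(NULL)      # sets with no ranked genes are skipped
  data.frame(set = s, EntrezID = hit, stringsAsFactors = FALSE)
}))

# Number of ranked genes falling in each set.
set_sizes <- vapply(gene_sets, function(g) sum(ranked$EntrezID %in% g), integer(1))
print(membership)
print(set_sizes)
```

The probes are collapsed to genes before mapping because gene sets are defined on genes, not probes. A gene measured by several probes would otherwise be counted more than once in the ranking.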
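On the top/bottom 2,500 idea: the original GSEA method (Subramanian et al., 2005) uses a running-sum statistic that walks down the whole ranked list. It steps up at gene-set members and down at every other gene, so genes in the middle of the list still shape the score. The sketch below is a simplified base-R re-implementation with weight exponent p = 1 and no permutation-based significance testing. It is not the GSEABase or Broad implementation. It compares the enrichment score on a hypothetical 10-gene list with the score after keeping only the top 2 and bottom 2 genes.

```r
# Simplified GSEA-style weighted running-sum enrichment score.
# Sketch only: no permutation testing; gene names and scores are hypothetical.
enrichment_score <- function(scores, genes, gene_set, p = 1) {
  ord <- order(scores, decreasing = TRUE)      # rank genes from top to bottom
  scores <- scores[ord]
  genes <- genes[ord]
  in_set <- genes %in% gene_set
  n_hit <- sum(in_set)
  if (n_hit == 0 || n_hit == length(genes)) {
    stop("gene set must overlap the list without covering all of it")
  }
  hit_w <- abs(scores)^p * in_set              # weights for set members only
  p_hit <- cumsum(hit_w) / sum(hit_w)          # step up at each member
  p_miss <- cumsum(!in_set) / (length(genes) - n_hit)  # step down at non-members
  running <- p_hit - p_miss
  running[which.max(abs(running))]             # maximum deviation from zero
}

# Hypothetical ranking statistics for 10 genes and one 3-gene set.
scores <- c(5, 4, 3, 2, 1, -1, -2, -3, -4, -5)
genes <- paste0("g", 1:10)
gene_set <- c("g1", "g3", "g6")

es_full <- enrichment_score(scores, genes, gene_set)

# Keep only the top k and bottom k genes, mimicking the top/bottom 2,500 idea.
k <- 2
ord <- order(scores, decreasing = TRUE)
keep <- c(head(ord, k), tail(ord, k))
es_trunc <- enrichment_score(scores[keep], genes[keep], gene_set)

print(c(full = es_full, truncated = es_trunc))  # full is 47/63 (about 0.746), truncated is 1
```

In this toy case the full list gives an ES of 47/63 (about 0.75), while the truncated list gives an ES of 1. Two of the three set members are no longer counted at all, and the step-down size 1/(N - N_hit) changes because N shrinks. Permutation-based p-values are computed from the same statistic, so they would presumably be affected as well.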
    > sessionInfo()
    R version 2.15.0 (2012-03-30)
    Platform: i386-apple-darwin9.8.0/i386 (32-bit)

    locale:
    [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

    attached base packages:
    [1] tools     grid      stats     graphics
    [5] grDevices utils     datasets  methods
    [9] base

    other attached packages:
     [1] BiocInstaller_1.4.6    Matrix_1.0-6
     [3] lattice_0.20-6         KEGG.db_2.7.1
     [5] GSEABase_1.18.0        annotate_1.34.0
     [7] ALL_1.4.12             BiocCaseStudies_1.18.0
     [9] hgug4845a.db_0.0.3     hgug4112a.db_2.7.1
    [11] xtable_1.7-0           GO.db_2.7.1
    [13] hgu95av2.db_2.7.1      org.Hs.eg.db_2.7.1
    [15] GOstats_2.22.0         RSQLite_0.11.1
    [17] DBI_0.2-5              graph_1.34.0
    [19] Category_2.22.0        AnnotationDbi_1.18.1
    [21] Biobase_2.16.0         BiocGenerics_0.2.0
    [23] genefilter_1.38.0      gplots_2.10.1
    [25] KernSmooth_2.23-7      caTools_1.13
    [27] bitops_1.0-4.1         gdata_2.8.2
    [29] gtools_2.6.2           marray_1.34.0
    [31] limma_3.12.0

    loaded via a namespace (and not attached):
    [1] IRanges_1.14.3   RBGL_1.32.0
    [3] splines_2.15.0   stats4_2.15.0
    [5] survival_2.36-14 XML_3.9-4

Best regards,
Seungyeul Yoo
Postdoc fellow
Institute of Genomics and Multiscale Biology
Department of Genetics and Genomic Sciences
Mount Sinai School of Medicine

Microarray Genetics GO hgu95av2 hgug4112a

Posted 12.8 years ago by Yoo, Seungyeul