Yoo, Seungyeul
Hi all,
I'm a newbie in microarray analysis and am enjoying learning these
wonderful techniques for understanding gene activity.
I'm working with an example gene expression data set of 6 samples from
3 patients (two samples each) on an Agilent platform.
After struggling at the beginning, I now have a list of probes ranked
by the mean and p-value of the log-ratio, and I want to perform gene
set enrichment analysis with the ranked list. But I still can't find a
way to map my genes to their corresponding gene sets. I have been stuck
here for the last two days.
I have two questions.
1) Is there a simple way to map the genes in the list to their gene
sets based on GOCollection, using only their Entrez IDs?
2) When I googled it and searched the Bioconductor mailing list
archive, using "GeneSetCollection" seemed to be the best way,
but it requires my data to be an ExpressionSet. So I tried to build one
from my microarray data, with intensities and probe names, as follows:
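# combine log10 intensities, test statistics and gene annotation
table_all<-cbind(log10(processed),t_table[,4:7],gene_u[,2:3])
# drop probes without an Entrez ID
table_all<-table_all[!is.na(table_all$EntrezID),]
# expression matrix: the first 8 columns of the combined table
exp=as.matrix(table_all[,1:8])
# sample annotation, one row per column of exp
df<-data.frame(x=colnames(exp),y=c(rep(c("PB","SP"),3),"SP","SP"),
row.names=colnames(exp))
# one description per column of df (x and y)
meta<-data.frame(labelDescription=c("ID","Character"))
my_set=new("ExpressionSet", exprs=exp,
phenoData=new("AnnotatedDataFrame",data=df,varMetadata=meta),
annotation="hgug4845a.db")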
I created phenoData using just the sample names. It looks fine
(no errors!), but my_set, the resulting ExpressionSet, contains
almost 29000 probes, which will take a long time to run through
GeneSetCollection(). So I am thinking of using only
5000 probes (2500 from the top and 2500 from the bottom) with
GeneSetCollection, but I'm not sure whether this will affect the final
results. My understanding of GSEA is that we are only interested in
gene sets that overlap the genes at the top or bottom of the ranked
list. Can anyone advise me on whether my logic is right? Sorry for such
naive questions.
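The top/bottom selection described above can be sketched as follows.
This is only an illustration on made-up data: stat, expr, n_each and
the probe and sample names are hypothetical stand-ins, not objects from
the analysis, and n_each would be 2500 for the real list. It shows the
mechanics of the subsetting only and says nothing about whether
truncating the list changes the enrichment result.

```r
# Hypothetical toy data: one ranking statistic per probe
# (for example the mean log-ratio).
set.seed(1)
stat <- setNames(rnorm(20), paste0("probe", 1:20))

# Number of probes to keep from each end of the ranked list
# (2500 in the post; 3 here to keep the example small).
n_each <- 3

# Rank probes from the most positive to the most negative statistic.
ranked <- sort(stat, decreasing = TRUE)

# Keep the first n_each (top) and the last n_each (bottom) probes.
top_ids    <- names(head(ranked, n_each))
bottom_ids <- names(tail(ranked, n_each))
keep_ids   <- c(top_ids, bottom_ids)

# Hypothetical probe-by-sample expression matrix, subset by row name.
expr <- matrix(rnorm(20 * 4), nrow = 20,
               dimnames = list(names(stat), paste0("s", 1:4)))
expr_sub <- expr[keep_ids, , drop = FALSE]
```

Assuming the feature names of my_set are the probe names, the same
character index should also work on the ExpressionSet itself, as in
my_set[keep_ids, ].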
> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: i386-apple-darwin9.8.0/i386 (32-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] tools grid stats graphics
[5] grDevices utils datasets methods
[9] base
other attached packages:
[1] BiocInstaller_1.4.6 Matrix_1.0-6
[3] lattice_0.20-6 KEGG.db_2.7.1
[5] GSEABase_1.18.0 annotate_1.34.0
[7] ALL_1.4.12 BiocCaseStudies_1.18.0
[9] hgug4845a.db_0.0.3 hgug4112a.db_2.7.1
[11] xtable_1.7-0 GO.db_2.7.1
[13] hgu95av2.db_2.7.1 org.Hs.eg.db_2.7.1
[15] GOstats_2.22.0 RSQLite_0.11.1
[17] DBI_0.2-5 graph_1.34.0
[19] Category_2.22.0 AnnotationDbi_1.18.1
[21] Biobase_2.16.0 BiocGenerics_0.2.0
[23] genefilter_1.38.0 gplots_2.10.1
[25] KernSmooth_2.23-7 caTools_1.13
[27] bitops_1.0-4.1 gdata_2.8.2
[29] gtools_2.6.2 marray_1.34.0
[31] limma_3.12.0
loaded via a namespace (and not attached):
[1] IRanges_1.14.3 RBGL_1.32.0
[3] splines_2.15.0 stats4_2.15.0
[5] survival_2.36-14 XML_3.9-4
Best regards,
Seungyeul Yoo
Postdoc fellow
Institute of Genomics and Multiscale Biology
Department of Genetics and Genomic Sciences
Mount Sinai School of Medicine