Hello everybody,
I have a table of microarray results that looks like this
(my "genelist.txt"):
Probe Id      NAME       FC_set1    FC_set2    FC_set3    FC_set4
A_51_P100021  Hivep3    1.048368  -1.085207  -1.013457   1.032816
A_51_P100034  Mif4gd   -1.049719  -1.077773  -1.084012  -1.004941
A_51_P100052  Slitrk2   1.339832   1.063053  -1.157675  -1.003128
A_51_P100063  Lnx1      1.073604   1.010892  -1.058375   1.063377
A_51_P100084  Unknown   1.084544  -1.258876  -1.092571  -1.058791
...
The probe IDs are from Agilent expression arrays. I retrieved the gene
names for the probes with biomaRt, and now I would like to find out
whether any gene sets are overrepresented among the differentially
regulated genes.
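A minimal sketch of how the regulated probes of each column could be pulled out as separate gene sets, assuming a signed fold-change cutoff of 1.2. That cutoff is an arbitrary, hypothetical choice; in practice you would probably also use adjusted p-values from your differential expression analysis. The data frame is a hand-typed copy of the five rows above, and the column name ProbeId is an assumed stand-in for "Probe Id".

```r
# Hand-typed copy of the rows shown above; in practice something like:
# fc <- read.delim("genelist.txt", stringsAsFactors = FALSE)
fc <- data.frame(
  ProbeId = c("A_51_P100021", "A_51_P100034", "A_51_P100052",
              "A_51_P100063", "A_51_P100084"),
  NAME    = c("Hivep3", "Mif4gd", "Slitrk2", "Lnx1", "Unknown"),
  FC_set1 = c(1.048368, -1.049719, 1.339832, 1.073604, 1.084544),
  FC_set2 = c(-1.085207, -1.077773, 1.063053, 1.010892, -1.258876),
  FC_set3 = c(-1.013457, -1.084012, -1.157675, -1.058375, -1.092571),
  FC_set4 = c(1.032816, -1.004941, -1.003128, 1.063377, -1.058791),
  stringsAsFactors = FALSE
)

# Hypothetical cutoff: a probe counts as regulated if |signed FC| >= 1.2
fc_cutoff <- 1.2
fc_cols <- grep("^FC_", names(fc), value = TRUE)

# One character vector of regulated probe IDs per FC column (gene set)
regulated <- lapply(fc_cols, function(col) fc$ProbeId[abs(fc[[col]]) >= fc_cutoff])
names(regulated) <- fc_cols

# All probes on the list; an enrichment test needs this as its universe
universe <- fc$ProbeId
regulated
```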
First, I would like to see whether any GO terms are overrepresented
in these gene lists, separately for each column (gene set).
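One common approach to the GO question is a one-sided hypergeometric test per term: count how many regulated probes fall in the term and compare that with what you would expect when drawing the same number of probes from the universe (all probes on the list or array). The sketch below uses only base R (phyper, p.adjust); the probe IDs and the terms termA/termB are hypothetical placeholders, not real annotations.

```r
# Upper-tail hypergeometric test: is a gene set (e.g. one GO term)
# over-represented among the regulated probes, given the universe?
set_enrichment <- function(set_probes, regulated, universe) {
  set_probes <- intersect(unique(set_probes), universe)
  regulated  <- intersect(unique(regulated), universe)
  k <- length(intersect(set_probes, regulated))  # regulated probes in the set
  m <- length(set_probes)                        # probes in the set
  n <- length(universe) - m                      # probes not in the set
  N <- length(regulated)                         # number of regulated probes
  # P(X >= k) = P(X > k - 1), hence q = k - 1 with lower.tail = FALSE
  p <- phyper(k - 1, m, n, N, lower.tail = FALSE)
  c(in_set_regulated = k, set_size = m, p_value = p)
}

# Hypothetical toy data: 20 probes, two made-up terms, 5 regulated probes
universe  <- paste0("probe", 1:20)
regulated <- paste0("probe", c(1:4, 10))
go_sets   <- list(termA = paste0("probe", 1:5),
                  termB = paste0("probe", 6:15))

# One row per term
res <- t(sapply(go_sets, set_enrichment,
                regulated = regulated, universe = universe))
# Benjamini-Hochberg correction across all tested terms
res <- cbind(res, p_adj = p.adjust(res[, "p_value"], method = "BH"))
res
```

The choice of universe matters: using all probes on the array instead of only the probes on your list can change the p-values noticeably. Dedicated Bioconductor packages such as GOstats or topGO also take the GO hierarchy into account, which this simple per-term test does not.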
Second, I would like to look for enrichment of other kinds of gene
sets among the differentially regulated genes in these lists (for
example kinases or transcription factors, but also subcellular
localization, protein domains, etc.).
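The same idea carries over to non-GO categories such as kinases or transcription factors, as long as each probe can be flagged as in or out of the category. This sketch uses made-up probe IDs and a hypothetical is_kinase flag, builds the 2x2 table (regulated yes/no by kinase yes/no), and applies a one-sided Fisher's exact test, which gives the same p-value as the upper-tail hypergeometric test with the same margins.

```r
# Hypothetical data: 20 probes, a made-up kinase flag, 4 regulated probes
universe  <- paste0("probe", 1:20)
is_kinase <- setNames(universe %in% paste0("probe", c(1:3, 11:13)), universe)
regulated <- paste0("probe", c(1, 2, 3, 7))

# 2x2 contingency table; TRUE listed first so tab[1, 1] is "regulated kinase"
tab <- table(
  regulated = factor(universe %in% regulated, levels = c(TRUE, FALSE)),
  kinase    = factor(is_kinase[universe],     levels = c(TRUE, FALSE))
)

# One-sided test for over-representation of kinases among regulated probes
ft <- fisher.test(tab, alternative = "greater")
tab
ft$p.value
```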
I would like your help in creating the gene sets, either from the GO
terms or from the other annotations.
I know I can retrieve these annotations from biomaRt for each gene,
for example:
library(biomaRt)

mart <- useMart("ensembl")
ensembl <- useDataset("mmusculus_gene_ensembl", mart = mart)

test <- read.delim("genelist.txt")
geneset1 <- read.delim("geneset1_all_signal.txt")
genes <- as.character(geneset1[, 1])  # Agilent probe IDs in the first column

# Query by probe ID; pass the character vector, not the whole data frame
geneNames <- getBM(attributes = c("go_biological_process_id", "name_1006",
                                  "agilent_wholegenome", "external_gene_id",
                                  "ensembl_gene_id", "entrezgene"),
                   filters = "agilent_wholegenome",
                   values = genes,
                   mart = ensembl)
> geneNames
  go_biological_process_id  name_1006
1 GO:0007409                axonogenesis
2 GO:0006511                ubiquitin-dependent protein catabolic process
3 GO:0051260                protein homooligomerization
4 GO:0042787                protein ubiquitination during ubiquitin-dependent protein catabolic process
5 GO:0006417                regulation of translation
6 GO:0016070                RNA metabolic process
7 GO:0016070                RNA metabolic process
  agilent_wholegenome  external_gene_id  ensembl_gene_id     entrezgene
1 A_51_P100052         Slitrk2           ENSMUSG00000036790  245450
2 A_51_P100063         Lnx1              ENSMUSG00000029228  16924
3 A_51_P100063         Lnx1              ENSMUSG00000029228  16924
4 A_51_P100063         Lnx1              ENSMUSG00000029228  16924
5 A_51_P100034         Mif4gd            ENSMUSG00000020743  69674
6 A_51_P100034         Mif4gd            ENSMUSG00000020743  69674
7 A_51_P100034         Mif4gd            ENSMUSG00000020743  NA
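Rows 6 and 7 differ only in entrezgene, so the same (GO term, probe) pair appears twice. A minimal sketch that keeps one row per pair, using the seven rows printed above; dropping empty GO IDs assumes that biomaRt returns an empty string for probes without a GO annotation, which is typical but worth checking on your own output.

```r
# The seven rows printed above (only the columns needed here)
geneNames <- data.frame(
  go_biological_process_id = c("GO:0007409", "GO:0006511", "GO:0051260",
                               "GO:0042787", "GO:0006417", "GO:0016070",
                               "GO:0016070"),
  agilent_wholegenome = c("A_51_P100052", "A_51_P100063", "A_51_P100063",
                          "A_51_P100063", "A_51_P100034", "A_51_P100034",
                          "A_51_P100034"),
  entrezgene = c(245450, 16924, 16924, 16924, 69674, 69674, NA),
  stringsAsFactors = FALSE
)

# Keep one row per (GO term, probe) pair
go2probe <- unique(geneNames[, c("go_biological_process_id", "agilent_wholegenome")])

# Drop probes without a GO term (assumed to come back as "" or NA)
keep <- !is.na(go2probe$go_biological_process_id) &
  go2probe$go_biological_process_id != ""
go2probe <- go2probe[keep, ]
nrow(go2probe)
```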
Now I would like to build gene sets from these GO categories: one line
per GO term, followed by all probes from my list that are annotated
with that term, something like this:
GO:0007409 A_51_P100052 ...
GO:0016070 A_51_P100034 ...
GO:0006417 A_51_P100034 ...
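A minimal sketch of that layout: split() turns a two-column (GO term, probe) table into a named list with one probe vector per term, and each element is then pasted into a single tab-separated line. The mapping below is the de-duplicated GO/probe pairs from the output above, retyped by hand; the column names go and probe and the file name in the comment are hypothetical.

```r
# De-duplicated (GO term, probe) pairs from the getBM() output above
go2probe <- data.frame(
  go    = c("GO:0007409", "GO:0006511", "GO:0051260", "GO:0042787",
            "GO:0006417", "GO:0016070"),
  probe = c("A_51_P100052", "A_51_P100063", "A_51_P100063", "A_51_P100063",
            "A_51_P100034", "A_51_P100034"),
  stringsAsFactors = FALSE
)

# Named list: one character vector of probes per GO term
go_sets <- split(go2probe$probe, go2probe$go)
go_sets <- lapply(go_sets, unique)

# One tab-separated line per GO term: term ID followed by its probes
lines <- vapply(names(go_sets),
                function(term) paste(c(term, go_sets[[term]]), collapse = "\t"),
                character(1))

# Print to the console; to save instead, e.g.
# writeLines(lines, "go_gene_sets.txt")  # hypothetical file name
writeLines(lines)
```

The named list go_sets can also be passed directly to a per-term enrichment test. If you plan to use such files with GSEA, note that its .gmt format expects a description field as the second column of each line.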
Thanks for your help,
Assa