Dear Community,
Based on some initial in vitro experiments and a subsequent analysis of a cancer microarray dataset in R, I would like to perform some gene set tests for specific pathways and ontologies related to my phenotype of interest. Briefly, using a two-group comparison, we are mainly interested in identifying biological processes related to neutrophils, and then more generally to inflammation. The two main approaches under consideration are:
A) Seven GO biological process terms related to neutrophils, which I identified through the Gene Ontology Consortium (http://amigo.geneontology.org/amigo/search/ontology?q=neutrophils)
B) The C7 immunologic signatures from WEHI (rdata files)
My major questions are:
1) In the context of microarrays, and especially for the first approach with the specific GO terms, would fry be more appropriate, or mroast? Alternatively, would mroast be more suitable for the second approach, with its many immunologic gene sets?
2) My second question concerns the microarray platform and its annotation.
The platform is the Agilent SurePrint G3 Human GE v2 8x60k Microarray (Array Design A-MEXP-2320).
As no R annotation package was available for it, I downloaded the latest gene symbol annotation from https://earray.chem.agilent.com/earray/
Both of the above approaches need Entrez Gene IDs, whereas my expression matrix has unique gene symbols as row names. How could I proceed? Below is a small code chunk from the final limma part:
class(final)
"EList"
attr(,"package")
"limma"
dim(final$E)
23339 119
head(final$E)
     US84600244_253949426815_S01_GE1_107_Sep09_1_4
IRX1  4.979257
SAA1  7.548621
H19  13.150892
MBP   8.240486
SAA2  6.692976
CHGA  7.527782
.....
condition <- factor(final$targets$UBE2D3.group, levels = c("LOW.UBE2D3","HIGH.UBE2D3"))
design <- model.matrix(~condition)
fit <- lmFit(final,design)
...
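To make the symbol-to-Entrez step concrete, here is a minimal base R sketch of what is being asked for. It is only an illustration under stated assumptions: the `sym2eg` lookup is typed in by hand for the six symbols shown above and should be checked against a proper annotation source, and `gene_sets` is a hypothetical gene set, not one of the GO or C7 sets. In a real analysis the lookup would come from an annotation resource, and limma's ids2indices() performs the matching step shown at the end.

```r
# Row names of final$E as shown in the head() output above.
symbols <- c("IRX1", "SAA1", "H19", "MBP", "SAA2", "CHGA")

# Hand-typed symbol -> Entrez Gene ID lookup (assumption: illustration only,
# in practice this comes from an annotation resource and should be verified).
sym2eg <- c(IRX1 = "79192", SAA1 = "6288", H19 = "283120",
            MBP = "4155", SAA2 = "6289", CHGA = "1113")

# Entrez Gene ID for every row, in row order; NA where a symbol is unmapped.
entrez <- unname(sym2eg[symbols])

# Hypothetical gene set given as Entrez Gene IDs
# ("999999" stands for a gene that is not on the array).
gene_sets <- list(toy_set = c("6288", "6289", "999999"))

# Row indices of each gene set within the expression matrix, i.e. the kind
# of index list that gene set tests take as input.
index <- lapply(gene_sets, function(ids) which(entrez %in% ids))
index$toy_set
```

Genes of a set that are absent from the array simply drop out of the index, so the size of each set after matching is worth checking before testing.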
Thank you in advance,
Efstathios
Dear Gordon, thank you very much for the very useful comment. I have used alias2SymbolTable in the past, also based on your suggestion, but I had not checked that alias2SymbolUsingNCBI() also returns GeneIDs.
Regarding my initial question about the type of gene set test: would you choose one type of test for each approach? That is, fry for the specific GO terms, and mroast for the large number of gene sets?
Dear Gordon, thank you for your updates on the first part of my question. However, I am facing a specific downstream issue:
Afterwards, when loading the C5 GO gene sets from the WEHI rdata files (http://bioinf.wehi.edu.au/software/MSigDB/):
load("human_c5_v5p2.rdata")
head(Hs.c5)
$`GO_REGULATION_OF_DOPAMINE_METABOLIC_PROCESS`
[1] "5153" "4929" "4129" "1815" "6870" "5071" "1312" "3350"
[9] "2861" "3251" "1141" "6622" "6531" "18" "1812" "25953"
[17] "11315"
$GO_LACTATE_TRANSPORT
[1] "23539" "9121" "9122" "159963" "133418" "6566" "9194"
[8] "387700" "201232" "9120" "9123" "162515"
$GO_POSITIVE_REGULATION_OF_VIRAL_TRANSCRIPTION
[1] "5432" "5439" "9150" "7936" "25920" "51773" "5431" "5433"
[9] "5436" "5435" "5430" "22938" "1105" "5440" "1025" "3725"
[17] "5434" "904" "51176" "5437" "2963" "6829" "3249" "4851"
[25] "2033" "6827" "5441" "5438" "6882" "6598" "5216" "7469"
[33] "51193" "6597" "29969" "51497" "6667" "2962" "7023"
.......
However, how could I subset this list for the specific BP terms, given that my GO identifiers are in a different form (http://amigo.geneontology.org/amigo/search/ontology?q=neutrophils)?
For example, GO:0070488, which has the name neutrophil aggregation.
Or is my approach incorrect, in that these GO gene sets might not contain the above specific GO terms, because they are different, grouped together, or omitted, according to the collection description (http://software.broadinstitute.org/gsea/msigdb/collection_details.jsp#C5),
and should I follow another approach?
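Since the list printed above is named by term description rather than by GO ID, one possible way to subset it is to match on the set names. The sketch below is only an illustration on a toy stand-in for Hs.c5: the first set is copied from the output above, while the second set name and its IDs are hypothetical placeholders. Whether a given term such as neutrophil aggregation is actually present in the collection would still need to be checked in the real list.

```r
# Toy stand-in for Hs.c5. The first set is copied from the printed output;
# the second name and its IDs are hypothetical placeholders.
Hs.c5.toy <- list(
  GO_LACTATE_TRANSPORT = c("23539", "9121", "9122", "159963", "133418", "6566",
                           "9194", "387700", "201232", "9120", "9123", "162515"),
  GO_HYPOTHETICAL_NEUTROPHIL_SET = c("1", "2", "3")
)

# Names of all sets whose description mentions neutrophils.
hits <- grep("NEUTROPHIL", names(Hs.c5.toy), value = TRUE)

# Subset the list by those names; an empty result would mean that no set
# with a matching description exists in the collection.
neutrophil_sets <- Hs.c5.toy[hits]
names(neutrophil_sets)
```

The same pattern with a longer regular expression, for example "NEUTROPHIL|INFLAMMAT", would widen the selection from neutrophil terms to inflammation-related ones.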