Question: Perform limma based gene-set testing for a two-group comparison in a microarray dataset regarding specific biological processes
6 months ago by
Greece/Athens/National Hellenic Research Foundation
svlachavas wrote:

Dear Community,

Based on some initial in vitro experiments and a subsequent cancer microarray dataset analysis in R, I would like to perform some gene-set tests for specific pathways and ontologies related to my phenotype of interest. Briefly, for a two-group comparison, we are mostly interested in identifying biological processes related to neutrophils, and then more generally to inflammation. The two major approaches under consideration are:

A) I have identified, through the Gene Ontology Consortium, 7 GO biological processes that are related to neutrophils

B) The C7 immunologic signatures from WEHI (rdata files)

My major questions are:

1) In the context of microarrays, especially for the first part with the specific GO terms, would fry or mroast be more appropriate? Alternatively, would mroast be more suitable for the second part, with the many immunologic gene sets?

2) My second issue is more specific to the microarray platform and its annotation. The platform is the Agilent SurePrint G3 Human GE v2 8x60k Microarray (Array Design A-MEXP-2320). As no R annotation package was available for it, I downloaded the latest gene symbol annotation from

Both of the above approaches need Entrez Gene IDs, but my expression matrix has unique gene symbols as row names. How should I proceed? Below is a small code chunk from the final limma part:


23339   119

IRX1                                      4.979257
SAA1                                      7.548621
H19                                      13.150892
MBP                                       8.240486
SAA2                                      6.692976
CHGA                                      7.527782.....

condition <- factor(final$targets$,
levels = c("LOW.UBE2D3","HIGH.UBE2D3"))

design <- model.matrix(~condition)

fit <- lmFit(final,design)...


Thank you in advance,



Answer: Perform limma based gene-set testing for a two-group comparison in a microarray
6 months ago by
Gordon Smyth
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth wrote:

1) With 7 particular GO terms, I would use mroast. Why not? roast is designed for focused gene set tests. fry is an approximation to mroast but, with only 7 terms, you may as well use roast itself.

For B) I would use camera.

2) Personally, I use alias2SymbolUsingNCBI() to convert gene symbols to Entrez Gene Ids and anything else I need. For example:

> Symbols <- c("IRX1","SAA1","H19","MBP","SAA2","CHGA")
> alias2SymbolUsingNCBI(Symbols, "Homo_sapiens.gene_info")
      GeneID Symbol                                    description
14710  79192   IRX1                            iroquois homeobox 1
5055    6288   SAA1                               serum amyloid A1
20753 283120    H19 H19, imprinted maternally expressed transcript
3388    4155    MBP                           myelin basic protein
5056    6289   SAA2                               serum amyloid A2
925     1113   CHGA                                 chromogranin A
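
To make the test recommendations above concrete, here is a minimal sketch of how mroast, fry and camera could be called for a two-group design. The expression matrix and the two gene sets are simulated purely for illustration; only the group labels are borrowed from the question, and nothing here reproduces the actual dataset.

```r
library(limma)

set.seed(1)  # mroast uses random rotations, so fix the seed for reproducibility

# Simulated log-expression matrix: 100 genes x 6 arrays (3 per group)
y <- matrix(rnorm(100 * 6), nrow = 100, ncol = 6)
rownames(y) <- paste0("gene", 1:100)

condition <- factor(rep(c("LOW.UBE2D3", "HIGH.UBE2D3"), each = 3),
                    levels = c("LOW.UBE2D3", "HIGH.UBE2D3"))
design <- model.matrix(~condition)

# Shift the first 10 genes upwards in the HIGH group (toy signal)
y[1:10, 4:6] <- y[1:10, 4:6] + 2

# Two toy gene sets, given as row indices into y
index <- list(set1 = 1:10, set2 = 11:30)

# Coefficient 2 of the design is the HIGH vs LOW difference
res.mroast <- mroast(y, index, design, contrast = 2, nrot = 999)  # rotation test
res.fry    <- fry(y, index, design, contrast = 2)                 # fast approximation
res.camera <- camera(y, index, design, contrast = 2)              # competitive test
```

With real data, `index` would normally come from `ids2indices()` applied to a list of Entrez Gene ID vectors and the row identifiers of the expression object.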

Dear Gordon, thank you very much for the very useful comment. I have used alias2SymbolTable in the past, also based on your suggestion, but I had not noticed that alias2SymbolUsingNCBI() also returns GeneIDs.

Moreover, regarding my initial question about the type of gene set test: would you choose, for example, one type of test for each procedure? That is, fry for the specific GOs, and mroast for the large number of gene sets?

Reply written 6 months ago by svlachavas

Dear Gordon, thank you for your updates on the first part of my question. However, I'm facing a specific downstream issue:

Symbols <- rownames(final)
dat <- alias2SymbolUsingNCBI(Symbols, "Homo_sapiens.gene_info")

      GeneID Symbol                                    description
14710  79192   IRX1                            iroquois homeobox 1
5055    6288   SAA1                               serum amyloid A1
20752 283120    H19 H19, imprinted maternally expressed transcript
3388    4155    MBP                           myelin basic protein
5056    6289   SAA2                               serum amyloid A2
925     1113   CHGA                                 chromogranin A

rownames(final) <- as.character(dat$GeneID) # have entrez gene ids
[1] "79192"  "6288"   "283120" "4155"   "6289"   "1113"  
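
One thing that may be worth checking before overwriting the row names: if a symbol fails to map, or two symbols map to the same GeneID, the new row names would contain NA or duplicated values. The sketch below uses an invented toy table in place of the real lookup result, and it assumes that unmatched symbols come back as NA; the symbols "NOT_A_GENE" and "SAA1_ALIAS" are hypothetical.

```r
# Toy stand-in for the data.frame returned by the symbol-to-ID lookup
# (assumption: unmatched symbols give NA in the GeneID column)
map <- data.frame(
  GeneID = c("79192", "6288", NA, "4155", "6288"),
  Symbol = c("IRX1", "SAA1", NA, "MBP", "SAA1"),
  stringsAsFactors = FALSE
)

# Toy expression matrix with one row per input symbol and two arrays
expr <- matrix(1:10, nrow = 5,
               dimnames = list(c("IRX1", "SAA1", "NOT_A_GENE", "MBP", "SAA1_ALIAS"),
                               c("array1", "array2")))

# Keep rows with a GeneID, and only the first row for each GeneID
keep <- !is.na(map$GeneID) & !duplicated(map$GeneID)
expr.entrez <- expr[keep, , drop = FALSE]
rownames(expr.entrez) <- map$GeneID[keep]
```

Keeping the first row per GeneID is only one possible rule; choosing the row with the highest average expression is another common option.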

But afterwards, while loading the GO rdata from WEHI ( gene sets:



 [1] "5153"  "4929"  "4129"  "1815"  "6870"  "5071"  "1312"  "3350" 
 [9] "2861"  "3251"  "1141"  "6622"  "6531"  "18"    "1812"  "25953"
[17] "11315"

 [1] "23539"  "9121"   "9122"   "159963" "133418" "6566"   "9194"  
 [8] "387700" "201232" "9120"   "9123"   "162515"

 [1] "5432"  "5439"  "9150"  "7936"  "25920" "51773" "5431"  "5433" 
 [9] "5436"  "5435"  "5430"  "22938" "1105"  "5440"  "1025"  "3725" 
[17] "5434"  "904"   "51176" "5437"  "2963"  "6829"  "3249"  "4851" 
[25] "2033"  "6827"  "5441"  "5438"  "6882"  "6598"  "5216"  "7469" 
[33] "51193" "6597"  "29969" "51497" "6667"  "2962"  "7023" 


However, how could I subset this list for the specific BP terms, given that my GO identifiers are in a different form? For example, GO:0070488, which has the name neutrophil aggregation.

Or is my approach incorrect, because these GO gene sets may not contain the above specific GO terms (they may be different, grouped together or omitted, depending on their description), so that I should follow another approach?

Reply written 6 months ago by svlachavas
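
As a rough illustration of the subsetting question, the sketch below assumes the gene-set list is named by term description rather than by GO identifier, which may or may not match the WEHI file. It selects sets by a name pattern and converts them to row indices with limma's ids2indices(). The set names and memberships are invented toy values; only the six Entrez Gene IDs are taken from the thread.

```r
library(limma)

# Toy stand-in for a list of gene sets (character vectors of Entrez Gene IDs).
# Names and memberships are invented for illustration only.
gene.sets <- list(
  NEUTROPHIL_TOY_SET_A = c("6288", "6289", "1113"),
  NEUTROPHIL_TOY_SET_B = c("79192", "4155"),
  UNRELATED_TOY_SET    = c("283120", "99999")
)

# Row names of the expression object after conversion to Entrez Gene IDs
entrez.ids <- c("79192", "6288", "283120", "4155", "6289", "1113")

# Keep only the sets whose name mentions the process of interest
keep <- grep("NEUTROPHIL", names(gene.sets), ignore.case = TRUE)
selected <- gene.sets[keep]

# Convert each selected set to row indices of the expression matrix
idx <- ids2indices(selected, identifiers = entrez.ids)
```

The resulting `idx` list is in the form expected by the `index` argument of mroast, fry and camera.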