Perform limma based gene-set testing for a two-group comparison in a microarray dataset regarding specific biological processes
1
1
Entering edit mode
svlachavas ▴ 830
@svlachavas-7225
Last seen 13 months ago
Germany/Heidelberg/German Cancer Resear…

Dear Community,

based on some initial in vitro experiments, and a subsequent cancer microarray dataset analysis in R, i would like to perform some gene-set tests, for specific pathways and ontologies, regarding my phenotype of interest. Briefly, based on a two-group condition, we are mostly interested in identifying biological processes related to neutrophils, and subsequently more generally to inflammation. So the two major approaches under consideration:

A) Have identified through Gene Ontology Consortium, 7 GO-biological processes that are related to netrophils (http://amigo.geneontology.org/amigo/search/ontology?q=neutrophils)

B) The C7 immunologic signatures from WHEI (rdata files)

My major questions are:

1) In the context of microarrays, especially for the first part of the specific GOs: fry would be more appropriate, or mroast ? Alternatively,

would mroast be more suitable for the second part with the many immunologic gene sets ?

2) My second issue, is more specific with the microarray platform and annotation:

in detail, the microarray platform is the Agilent SurePrint G3 Human GE v2 8x60k Microarray (Array Design A-MEXP-2320),

for which as no R annotation package was available, i have downloaded the latest gene symbol annotation from https://earray.chem.agilent.com/earray/

Thus, as both of the above approaches need Entrez Gene ids, how could i proceed ? as my expression matrix, has unique gene symbols in the rows ? Below, is a small code chunk from the final limma part:

class(final)
"EList"
attr(,"package")
"limma"

 dim(final$E)
23339   119

head(final$E)
     US84600244_253949426815_S01_GE1_107_Sep09_1_4
IRX1                                      4.979257
SAA1                                      7.548621
H19                                      13.150892
MBP                                       8.240486
SAA2                                      6.692976
CHGA                                      7.527782.....

condition <- factor(final$targets$UBE2D3.group,
levels = c("LOW.UBE2D3","HIGH.UBE2D3"))

design <- model.matrix(~condition)

fit <- lmFit(final,design)...

 

Thank you in advance,

Efstathios

 

limma agilent microarrays mroast fry gene tests gene set testing • 1.6k views
ADD COMMENT
1
Entering edit mode
@gordon-smyth
Last seen 1 hour ago
WEHI, Melbourne, Australia

1) With 7 particular GO terms, I would use mroast. Why not? roast is designed for focused gene set tests. fry is an approximation to mroast but, with only 7 terms, you may as well use roast itself.

For B) I would use camera.

2) Personally, I use alias2SymbolUsingNCBI() to convert gene symbols to Entrez Gene Ids and anything else I need. For example:

> Symbols <- c("IRX1","SAA1","H19","MBP","SAA2","CHGA")
> alias2SymbolUsingNCBI(Symbols, "Homo_sapiens.gene_info")
      GeneID Symbol                                    description
14710  79192   IRX1                            iroquois homeobox 1
5055    6288   SAA1                               serum amyloid A1
20753 283120    H19 H19, imprinted maternally expressed transcript
3388    4155    MBP                           myelin basic protein
5056    6289   SAA2                               serum amyloid A2
925     1113   CHGA                                 chromogranin A
ADD COMMENT
0
Entering edit mode

Dear Gordon, thank you very much for the very useful comment-i have used in the past-based also on your suggestion-alias2SymbolTable, but i haven't checked that alias2SymbolUsingNCBI() returns also GeneIDs-

moreover, regarding my initial question, concerning the type of gene set ? you would choose for example one "type" of test for each procedure ? that is, fry for the specific GOs, and mroast for the high number of gene sets ?

ADD REPLY
0
Entering edit mode

Dear Gordon, thank you for your updates for my first question part-however, I'm facing a specific downstream issue:

Symbols <- rownames(final)
dat <- alias2SymbolUsingNCBI(Symbols, "Homo_sapiens.gene_info")

head(dat)
      GeneID Symbol                                    description
14710  79192   IRX1                            iroquois homeobox 1
5055    6288   SAA1                               serum amyloid A1
20752 283120    H19 H19, imprinted maternally expressed transcript
3388    4155    MBP                           myelin basic protein
5056    6289   SAA2                               serum amyloid A2
925     1113   CHGA                                 chromogranin A

rownames(final) <- as.character(dat$GeneID) # have entrez gene ids
head(rownames(final))
[1] "79192"  "6288"   "283120" "4155"   "6289"   "1113"  

But afterwards, while loading the GO rdata from WEHI (http://bioinf.wehi.edu.au/software/MSigDB/)-C5 gene sets:

load("human_c5_v5p2.rdata")

 head(Hs.c5)


$`GO_REGULATION_OF_DOPAMINE_METABOLIC_PROCESS`
 [1] "5153"  "4929"  "4129"  "1815"  "6870"  "5071"  "1312"  "3350" 
 [9] "2861"  "3251"  "1141"  "6622"  "6531"  "18"    "1812"  "25953"
[17] "11315"

$GO_LACTATE_TRANSPORT
 [1] "23539"  "9121"   "9122"   "159963" "133418" "6566"   "9194"  
 [8] "387700" "201232" "9120"   "9123"   "162515"

$GO_POSITIVE_REGULATION_OF_VIRAL_TRANSCRIPTION
 [1] "5432"  "5439"  "9150"  "7936"  "25920" "51773" "5431"  "5433" 
 [9] "5436"  "5435"  "5430"  "22938" "1105"  "5440"  "1025"  "3725" 
[17] "5434"  "904"   "51176" "5437"  "2963"  "6829"  "3249"  "4851" 
[25] "2033"  "6827"  "5441"  "5438"  "6882"  "6598"  "5216"  "7469" 
[33] "51193" "6597"  "29969" "51497" "6667"  "2962"  "7023" 

.......

However, how could i subset this list, for the specific BP terms, as my GO identifiers are in a different form ? [http://amigo.geneontology.org/amigo/search/ontology?q=neutrophils

for example, the GO:0070488, which has the name neutrophil aggregation ?

Or my approach is incorrect, and these GO gene sets could not contain the above specific GOs, as they are different, grouped together or omitted, based on the relative description ? (http://software.broadinstitute.org/gsea/msigdb/collection_details.jsp#C5)

and i should follow another approach ?

ADD REPLY

Login before adding your answer.

Traffic: 622 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6