Question

Perform limma-based gene-set testing for a two-group comparison in a microarray dataset, focusing on specific biological processes

svlachavas ▴ 830

@svlachavas-7225

Last seen 6 months ago

Germany/Heidelberg/German Cancer Resear…

Dear Community,

Based on some initial in vitro experiments and a subsequent analysis of a cancer microarray dataset in R, I would like to perform gene-set tests for specific pathways and ontologies related to my phenotype of interest. Briefly, for a two-group comparison, we are mostly interested in identifying biological processes related to neutrophils and, more generally, to inflammation. The two main approaches under consideration are:

A) Seven GO biological processes related to neutrophils, identified through the Gene Ontology Consortium (http://amigo.geneontology.org/amigo/search/ontology?q=neutrophils)

B) The C7 immunologic signatures from WEHI (rdata files)

My major questions are:

1) In the context of microarrays, and especially for the first approach with the specific GO terms, would fry or mroast be more appropriate? Alternatively, would mroast be more suitable for the second approach, with its many immunologic gene sets?

2) My second question is specific to the microarray platform and its annotation. The platform is the Agilent SurePrint G3 Human GE v2 8x60k Microarray (Array Design A-MEXP-2320). As no R annotation package was available for it, I downloaded the latest gene symbol annotation from https://earray.chem.agilent.com/earray/

Both of the above approaches need Entrez Gene IDs, but my expression matrix has unique gene symbols as row names. How should I proceed? Below is a small code chunk from the final limma part:

class(final)
"EList"
attr(,"package")
"limma"

 dim(final$E)
23339   119

head(final$E)
     US84600244_253949426815_S01_GE1_107_Sep09_1_4
IRX1                                      4.979257
SAA1                                      7.548621
H19                                      13.150892
MBP                                       8.240486
SAA2                                      6.692976
CHGA                                      7.527782.....

condition <- factor(final$targets$UBE2D3.group,
levels = c("LOW.UBE2D3","HIGH.UBE2D3"))

design <- model.matrix(~condition)

fit <- lmFit(final,design)...

Thank you in advance,

Efstathios

limma agilent microarrays mroast fry gene tests gene set testing • 1.4k views

ADD COMMENT • link updated 5.5 years ago by Gordon Smyth 50k • written 5.5 years ago by svlachavas ▴ 830

score 1 · Answer 1 · 2018-10-10

Gordon Smyth 50k

@gordon-smyth

Last seen 36 minutes ago

WEHI, Melbourne, Australia

1) With 7 particular GO terms, I would use mroast. Why not? roast is designed for focused gene set tests. fry is an approximation to mroast but, with only 7 terms, you may as well use roast itself.

For B) I would use camera.
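As a rough illustration of how the two recommendations could look in code, here is a minimal sketch on simulated data. The matrix `y`, the stand-in Entrez IDs, and the set names `SET_SHIFTED` and `SET_NULL` are all hypothetical; only the two-group design mirrors the one in the question. In limma, roast is a self-contained test and camera is a competitive test, which is one reason they suit a handful of focused sets and a large collection respectively.

```r
library(limma)

set.seed(1)

# Simulated log-expression matrix: 200 genes x 10 arrays (hypothetical data)
ngenes <- 200
y <- matrix(rnorm(ngenes * 10), nrow = ngenes)
rownames(y) <- as.character(1000 + seq_len(ngenes))  # stand-in Entrez IDs

# Two-group design with the same layout as model.matrix(~condition)
condition <- factor(rep(c("LOW", "HIGH"), each = 5), levels = c("LOW", "HIGH"))
design <- model.matrix(~condition)

# Shift the first 20 genes upwards in the HIGH group
y[1:20, condition == "HIGH"] <- y[1:20, condition == "HIGH"] + 2

# Gene sets as a named list of Entrez IDs (same structure as the WEHI rdata files)
gene.sets <- list(SET_SHIFTED = rownames(y)[1:20],
                  SET_NULL    = rownames(y)[101:130])

# Convert Entrez IDs to row indices of y
idx <- ids2indices(gene.sets, identifiers = rownames(y))

# Rotation test for a few focused sets (coefficient 2 = HIGH vs LOW)
roast.res <- mroast(y, idx, design, contrast = 2, nrot = 999)

# Competitive test, typically used for large collections
camera.res <- camera(y, idx, design, contrast = 2)

print(roast.res)
print(camera.res)
```

With real data, `y` would be the EList (here `final`) with Entrez IDs as row names, and `gene.sets` would be the selected GO sets or the loaded C7 list.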

2) Personally, I use alias2SymbolUsingNCBI() to convert gene symbols to Entrez Gene Ids and anything else I need. For example:

> Symbols <- c("IRX1","SAA1","H19","MBP","SAA2","CHGA")
> alias2SymbolUsingNCBI(Symbols, "Homo_sapiens.gene_info")
      GeneID Symbol                                    description
14710  79192   IRX1                            iroquois homeobox 1
5055    6288   SAA1                               serum amyloid A1
20753 283120    H19 H19, imprinted maternally expressed transcript
3388    4155    MBP                           myelin basic protein
5056    6289   SAA2                               serum amyloid A2
925     1113   CHGA                                 chromogranin A
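When the same conversion is applied to every row name of an expression matrix, some symbols may fail to map or may map to the same Entrez ID, so it is probably worth filtering before using the IDs as row names. The sketch below uses base R only. The matrix `E` and the table `dat` are toy stand-ins with partly made-up entries, and it assumes that unmatched symbols come back with `NA` in the `GeneID` column.

```r
# Toy stand-ins (hypothetical values) for the expression matrix and the mapping table
set.seed(1)
E <- matrix(rnorm(5 * 3), nrow = 5,
            dimnames = list(c("IRX1", "SAA1", "NOTAGENE", "MBP", "SAA1B"),
                            paste0("S", 1:3)))
dat <- data.frame(GeneID = c(79192L, 6288L, NA, 4155L, 6288L),
                  Symbol = c("IRX1", "SAA1", NA, "MBP", "SAA1"),
                  stringsAsFactors = FALSE)

# Keep rows that have an Entrez ID, and only the first row for each ID
keep <- !is.na(dat$GeneID) & !duplicated(dat$GeneID)

E2 <- E[keep, , drop = FALSE]
rownames(E2) <- as.character(dat$GeneID[keep])

print(rownames(E2))
```

The same logical vector `keep` can be used to subset an EList by rows before assigning the new row names. Keeping the first occurrence of a duplicated ID is an arbitrary choice made for this sketch.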

ADD COMMENT • link 5.5 years ago Gordon Smyth 50k

Dear Gordon, thank you very much for the very useful comment. I have used alias2SymbolTable in the past, also on your suggestion, but I had not realized that alias2SymbolUsingNCBI() also returns GeneIDs.

Moreover, regarding my initial question about the type of gene set test: would you choose a different test for each approach? That is, fry for the specific GO terms, and mroast for the large number of gene sets?

ADD REPLY • link 5.5 years ago svlachavas ▴ 830

Dear Gordon, thank you for the updates on the first part of my question. However, I'm facing a specific downstream issue:

Symbols <- rownames(final)
dat <- alias2SymbolUsingNCBI(Symbols, "Homo_sapiens.gene_info")

head(dat)
      GeneID Symbol                                    description
14710  79192   IRX1                            iroquois homeobox 1
5055    6288   SAA1                               serum amyloid A1
20752 283120    H19 H19, imprinted maternally expressed transcript
3388    4155    MBP                           myelin basic protein
5056    6289   SAA2                               serum amyloid A2
925     1113   CHGA                                 chromogranin A

rownames(final) <- as.character(dat$GeneID) # have entrez gene ids
head(rownames(final))
[1] "79192"  "6288"   "283120" "4155"   "6289"   "1113"

But afterwards, when loading the C5 (GO) gene sets from the WEHI rdata files (http://bioinf.wehi.edu.au/software/MSigDB/):

load("human_c5_v5p2.rdata")

head(Hs.c5)

$`GO_REGULATION_OF_DOPAMINE_METABOLIC_PROCESS`
[1] "5153" "4929" "4129" "1815" "6870" "5071" "1312" "3350"
[9] "2861" "3251" "1141" "6622" "6531" "18" "1812" "25953"
[17] "11315"

$GO_LACTATE_TRANSPORT
[1] "23539" "9121" "9122" "159963" "133418" "6566" "9194"
[8] "387700" "201232" "9120" "9123" "162515"

$GO_POSITIVE_REGULATION_OF_VIRAL_TRANSCRIPTION
[1] "5432" "5439" "9150" "7936" "25920" "51773" "5431" "5433"
[9] "5436" "5435" "5430" "22938" "1105" "5440" "1025" "3725"
[17] "5434" "904" "51176" "5437" "2963" "6829" "3249" "4851"
[25] "2033" "6827" "5441" "5438" "6882" "6598" "5216" "7469"
[33] "51193" "6597" "29969" "51497" "6667" "2962" "7023"

.......

However, how could I subset this list to the specific BP terms, given that my GO identifiers are in a different form? [http://amigo.geneontology.org/amigo/search/ontology?q=neutrophils]

For example, GO:0070488, which has the name neutrophil aggregation?

Or is my approach incorrect, because these GO gene sets might not contain the above specific GO terms, which may differ, be grouped together, or be omitted according to the collection description? (http://software.broadinstitute.org/gsea/msigdb/collection_details.jsp#C5)

Should I follow another approach?
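One possible way to approach the subsetting, sketched here under assumptions: the list names shown above are built from GO term names rather than GO accessions, so matching on the name is likely easier than matching on an ID such as GO:0070488. The list `Hs.c5.toy` below is a made-up stand-in with the same structure as `Hs.c5`; its set names and IDs are illustrative only, and a given term may simply be absent from the real collection.

```r
# Toy list with the same structure as Hs.c5 (names and IDs are illustrative)
Hs.c5.toy <- list(
  GO_NEUTROPHIL_CHEMOTAXIS = c("1", "2", "3"),
  GO_LACTATE_TRANSPORT     = c("4", "5"),
  GO_NEUTROPHIL_MIGRATION  = c("2", "6")
)

# Option 1: keep every set whose name mentions neutrophils
neutrophil.sets <- Hs.c5.toy[grep("NEUTROPHIL", names(Hs.c5.toy))]
print(names(neutrophil.sets))

# Option 2: turn a GO term name into the naming style used in the list
# and check whether that exact set is present
term <- "neutrophil aggregation"
msig.name <- paste0("GO_", toupper(gsub("[^A-Za-z0-9]+", "_", term)))
print(msig.name)
print(msig.name %in% names(Hs.c5.toy))
```

If a term of interest is not present under any name, the gene set would have to be built from another source of GO-to-Entrez annotation instead.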

ADD REPLY • link 5.5 years ago svlachavas ▴ 830