In addition to one of my previous posts about using mroast to find "differentially activated" KEGG pathways, I would like to ask two important questions about my current analysis of an Illumina microarray dataset (Human HT-12 v4 BeadChip), which I have pre-processed and analyzed with limma. My first question is whether I could similarly test multiple Gene Ontology gene sets with mroast. My design matrix for the expression set is described in another post (C: Correct construction of design matrix in limma for multiple contrasts for gene e).
My first approach is similar to the one in my post on KEGG with mroast (C: Appropriate use of the function mroast to find KEGG pathways for differential ex):
library(limma)
library(illuminaHumanv4.db)
x <- illuminaHumanv4GO2PROBE
mapped_probes <- mappedkeys(x) # GO IDs that have at least one mapped probe
xx <- as.list(x[mapped_probes])
# filtered.2 = normalized & filtered EListRaw-class object
indices <- ids2indices(xx, rownames(filtered.2))
# contrast=6: in the design matrix from the post linked above, this is the combined therapy of the two compounds vs DMSO (control group)
res <- mroast(filtered.2, indices, design, contrast=6)
My second approach is based on an edX lesson (PH525x series) on gene set testing in R, which I slightly modified. The motivation for the second approach is that I wanted to set a minimum size for each gene set to be tested:
go2probe <- as.list(illuminaHumanv4GO2PROBE)
govector <- unlist(go2probe)
golengths <- sapply(go2probe, length)
idxvector <- match(govector, rownames(filtered.2))
idx <- split(idxvector, rep(names(go2probe),golengths))
idxclean <- lapply(idx,function(x) x[!is.na(x)]) # keep only probes that are present in filtered.2
idxlengths <- sapply(idxclean,length)
idxsub <- idxclean[idxlengths>=10]
res <- mroast(filtered.2,idxsub,design,contrast=6)
Then, in both cases, I would use the GO.db database to find the ontology (MF, BP or CC) to which each GO term belongs.
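A minimal sketch of that GO.db lookup, assuming the row names of the mroast result are GO IDs. The three IDs below are stand-ins chosen for illustration, not results from this dataset:

```r
library(GO.db)  # provides the GO.db annotation object (loads AnnotationDbi)

# Stand-in for rownames(res); with real results use: go_ids <- rownames(res)
go_ids <- c("GO:0006915", "GO:0005634", "GO:0003677")

# Look up the term name and ontology (BP, CC or MF) for each GO ID
ann <- AnnotationDbi::select(GO.db,
                             keys    = go_ids,
                             columns = c("TERM", "ONTOLOGY"),
                             keytype = "GOID")
print(ann)
```

With a real result table, the annotation could then be attached with something like `cbind(ann, res[ann$GOID, ])`, which lines up rows by GO ID.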
To sum up, my first question (I have only been using R for the past 6 months) is whether both of the above approaches are appropriate uses of mroast for finding "DE GO terms".
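The index cleaning and minimum-size filter of the second approach can be checked on toy data before running it on the real object. This is only a sketch: `probe_ids`, `go2probe` and `min_size` are hypothetical stand-ins for `rownames(filtered.2)`, `as.list(illuminaHumanv4GO2PROBE)` and the cutoff of 10.

```r
# Hypothetical stand-ins: probe IDs on the array and a GO-to-probe list
probe_ids <- c("p1", "p2", "p3", "p4", "p5")
go2probe  <- list("GO:A" = c("p1", "p2", "p9"),  # p9 is not on the array
                  "GO:B" = c("p8", "p9"),        # no probe on the array
                  "GO:C" = c("p3", "p4", "p5"))
min_size <- 2  # the post uses 10; lowered here so the toy sets survive

# Row index of each probe; probes missing from the array become NA
idx <- lapply(go2probe, match, table = probe_ids)

# Keep the matched (non-NA) indices. Note the negation: x[is.na(x)]
# would keep only the unmatched entries instead.
idxclean <- lapply(idx, function(x) x[!is.na(x)])

# Drop sets smaller than the minimum size
idxsub <- idxclean[lengths(idxclean) >= min_size]
print(idxsub)
```

Here `GO:B` is dropped because none of its probes are on the array, and `GO:A` keeps only its two matched probes.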
Moreover, my second and also very crucial question is whether either of the above methods could also be used with other curated gene sets as inputs for mroast in R, such as the Molecular Signatures Database (MSigDB) collections from the Broad Institute (http://bioinf.wehi.edu.au/software/MSigDB/). My main reason for asking is that, besides the various pathway analysis methodologies and repositories like KEGG or Reactome, my current dataset describes the use of combinatorial inhibitors in a specific metastatic breast cancer cell line, so I found it extremely interesting and possibly useful to test specific molecular signatures from the Broad Institute, such as the C6 oncogenic signatures. Or are these completely different and irrelevant for my specific analysis?
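A minimal sketch of how such signatures could be turned into an index list for mroast, assuming the signatures come as a named list of Entrez Gene IDs (an assumption about the download format) and that each probe has been mapped to an Entrez Gene ID. The objects `signatures` and `entrez_of_probe` are hypothetical toy data:

```r
library(limma)  # for ids2indices()

# Hypothetical: Entrez Gene ID for each row of the expression object, in row
# order (NA = unannotated probe). With real data this could come from, e.g.,
# mapIds(illuminaHumanv4.db, rownames(filtered.2), "ENTREZID", "PROBEID").
entrez_of_probe <- c("7157", "1956", NA, "672", "7157")

# Hypothetical signatures keyed by Entrez Gene ID
signatures <- list(SIG_A = c("7157", "672"),
                   SIG_B = c("1956", "9999"),
                   SIG_C = c("9999"))          # no gene on the array

# Convert gene IDs to row indices; sets with no matching rows are dropped
idx <- ids2indices(signatures, entrez_of_probe)
print(idx)

# The index list would then be passed to mroast as in the GO case:
# res <- mroast(filtered.2, idx, design, contrast = 6)
```

The key difference from the GO case is the second argument of `ids2indices`: the identifiers must be of the same type as the IDs in the gene sets, so probe IDs cannot be used directly with Entrez-keyed signatures.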
Thank you in advance!