Dear Community, based on a description on a previous post (C: Perform limma based gene-set testing for a two-group comparison in a microarray ) i tried to implement mroast gene set testing in limma with specific Gene Ontology Biological Processes:
# my expression microarray dataset:
class(final) [1] "EList" attr(,"package") [1] "limma" head(final$E) US84600244_253949426815_S01_GE1_107_Sep09_1_4 IRX1 4.979257 SAA1 7.548621 H19 13.150892 MBP 8.240486 SAA2 6.692976 CHGA 7.527782.... Symbols <- rownames(final) dat <- alias2SymbolUsingNCBI(Symbols, "Homo_sapiens.gene_info") final$genes$EntrezID <- dat$GeneID final.2 <- final[!is.na(final$genes$EntrezID),] rownames(final.2) <- as.character(final.2$genes$EntrezID) condition <- factor(final.2$targets$UBE2D3.group, levels = c("LOW.UBE2D3","HIGH.UBE2D3")) design <- model.matrix(~condition) fit <- lmFit(final.2,design) fit2 <- eBayes(fit, trend=TRUE) neutro.go <- c("GO:0030851", "GO:0070488", "GO:0097350", "GO:0001788","GO:0001781","GO:0001780","GO:0001802","GO:0045728") # selected BPs associated with neutrophils term <- select(GO.db, keys=neutro.go, columns="TERM") Rkeys(org.Hs.egGO2ALLEGS) <- neutro.go neutro.go.genes <- as.list(org.Hs.egGO2ALLEGS) indices <- ids2indices(neutro.go.genes, rownames(final.2)) res <- mroast(final.2, indices, design, contrast=2) res NGenes PropDown PropUp Direction PValue FDR PValue.Mixed GO:0045728 2 1.0000000 0.00000000 Down 0.001 0.0040 0.001 GO:0001788 1 1.0000000 0.00000000 Down 0.106 0.4220 0.106 GO:0030851 28 0.2857143 0.21428571 Down 0.427 0.7105 0.001 GO:0070488 2 0.5000000 0.00000000 Down 0.431 0.7105 0.431 GO:0001802 2 0.5000000 0.00000000 Down 0.485 0.7105 0.318 GO:0001780 14 0.2857143 0.07142857 Down 0.672 0.7105 0.021 GO:0001781 7 0.4285714 0.14285714 Down 0.685 0.7105 0.003 GO:0097350 5 0.0000000 0.00000000 Up 0.711 0.7110 0.608 FDR.Mixed GO:0045728 0.002000000 GO:0001788 0.168800000 GO:0030851 0.002000000 GO:0070488 0.492000000 GO:0001802 0.423333333 GO:0001780 0.041000000 GO:0001781 0.006666667 GO:0097350 0.608000000
1) My first question is regarding the interrogation of the mroast testing for these specific GO terms:
as the columns PValue and PValueMixed are appropriate for different questions, and my many concern is, if any of these GO terms are actually DE, i could focus on GOs above like GO:0030851 and GO:0001780 ? which, even that do not show a significant Pvalue, illustrate a significant PValueMixed ? which also are annotated with not a very small number of genes in comparison to the other selected terms ? and essentially justify, that indeed some of the neutrophil processes are significantly pertubed concerning my phenotype of interest ?
2) For subsequent utilization of the barcode function: as i would to create a barcode plot, similar to the plot created in the FRY section-
i should choose from above, a GOterm, that has a more direct proportion/direction of regulation ? like the GO:0001780 ? in order also to illustrate the changes in my 2 conditions used in my design matrix (low and high expressing UBE2D3 groups)?
3) Regarding the actual visualization function:
barcodeplot( fit2$t[,2], index=indices[["GO:0001780"]] ) Error in `.rowNamesDF<-`(x, value = value) : duplicate 'row.names' are not allowed In addition: Warning message: non-unique values when setting 'row.names': ‘100506380’, ‘143872’, ‘149775’, ‘161176’, ‘249’, ‘3373’, ‘341019’, ‘387707’, ‘4253’, ‘4584’, ‘51271’, ‘51614’, ‘57585’, ‘79813’, ‘84458’, ‘84631’, ‘8553’
How can i fix this issue ? i should remove prior the mroast function these duplicates ?
Thank you in advance,
Efstathios
Dear Gordon,
thank you for your clarification, especially on the first 2 questions. If i could re-formulate my initial questions more appropriatelly, my main goal/concern, if with the aim of significance, i could choose the mixed Pvalues-for these specific terms-instead of the one direction Pvalues, as indeed in many cases in various GOterms, there would be mixed proportions of up-and down DEG in any gene set. Thus, the PValue mixed for the above cases would suffice, even if there is a clear but small trend for the downregulation direction, for example in the GO:0001780 ?
Finally, concerning your suggestion, which worked, i added some small labels, and i would like to ask on this:
based on the following link:
https://www.dropbox.com/s/t379eeostepgtn0/GO.0001780.example.mroast.tiff?dl=0
if i interpret correct the plot, as also based on the initial mroast results, that the genes belonging to this specific GO term-neutrophil homeostasis, tend to be-even moderately- upregulated in the HIGH.UBE2D3 group, in comparison to the "LOW.UBE2D3" group on the left ?