Question: Explanation of mroast results for specific biological processes and error with barcodeplot function with a microarray dataset
0
6 months ago by
svlachavas650
Greece/Athens/National Hellenic Research Foundation
svlachavas650 wrote:

Dear Community, based on a description on a previous post (C: Perform limma based gene-set testing for a two-group comparison in a microarray ) i tried to implement mroast gene set testing in limma with specific Gene Ontology Biological Processes:

# my expression microarray dataset:

 class(final)
[1] "EList"
attr(,"package")
[1] "limma"

head(final$E) US84600244_253949426815_S01_GE1_107_Sep09_1_4 IRX1 4.979257 SAA1 7.548621 H19 13.150892 MBP 8.240486 SAA2 6.692976 CHGA 7.527782.... Symbols <- rownames(final) dat <- alias2SymbolUsingNCBI(Symbols, "Homo_sapiens.gene_info") final$genes$EntrezID <- dat$GeneID

final.2 <- final[!is.na(final$genes$EntrezID),]

rownames(final.2) <- as.character(final.2$genes$EntrezID)

condition <- factor(final.2$targets$UBE2D3.group,
levels = c("LOW.UBE2D3","HIGH.UBE2D3"))

design <- model.matrix(~condition)

fit <- lmFit(final.2,design)

fit2 <- eBayes(fit, trend=TRUE)

neutro.go <- c("GO:0030851", "GO:0070488", "GO:0097350",
"GO:0001788","GO:0001781","GO:0001780","GO:0001802","GO:0045728") # selected BPs associated with neutrophils

term <- select(GO.db, keys=neutro.go, columns="TERM")

Rkeys(org.Hs.egGO2ALLEGS) <- neutro.go

neutro.go.genes <- as.list(org.Hs.egGO2ALLEGS)

indices <- ids2indices(neutro.go.genes, rownames(final.2))

res <- mroast(final.2, indices, design, contrast=2)

res
NGenes  PropDown     PropUp Direction PValue    FDR PValue.Mixed
GO:0045728      2 1.0000000 0.00000000      Down  0.001 0.0040        0.001
GO:0001788      1 1.0000000 0.00000000      Down  0.106 0.4220        0.106
GO:0030851     28 0.2857143 0.21428571      Down  0.427 0.7105        0.001
GO:0070488      2 0.5000000 0.00000000      Down  0.431 0.7105        0.431
GO:0001802      2 0.5000000 0.00000000      Down  0.485 0.7105        0.318
GO:0001780     14 0.2857143 0.07142857      Down  0.672 0.7105        0.021
GO:0001781      7 0.4285714 0.14285714      Down  0.685 0.7105        0.003
GO:0097350      5 0.0000000 0.00000000        Up  0.711 0.7110        0.608
FDR.Mixed
GO:0045728 0.002000000
GO:0001788 0.168800000
GO:0030851 0.002000000
GO:0070488 0.492000000
GO:0001802 0.423333333
GO:0001780 0.041000000
GO:0001781 0.006666667
GO:0097350 0.608000000

1) My first question is regarding the interrogation of the mroast testing for these specific GO terms:

as the columns PValue and PValueMixed are appropriate for different questions, and my many concern is, if any of these GO terms are actually DE, i could focus on GOs above like GO:0030851 and GO:0001780 ? which, even that do not show a significant Pvalue, illustrate a significant PValueMixed ? which also are annotated with not a very small number of genes in comparison to the other selected terms ? and essentially justify, that indeed some of the neutrophil processes are significantly pertubed concerning my phenotype of interest ?

2) For subsequent utilization of the barcode function: as i would to create a barcode plot, similar to the plot created in the FRY section-

http://bioconductor.org/packages/release/workflows/vignettes/RnaSeqGeneEdgeRQL/inst/doc/edgeRQL.html#fry-gene-set-tests

i should choose from above, a GOterm, that has a more direct proportion/direction of regulation ? like the GO:0001780 ? in order also to illustrate the changes in my 2 conditions used in my design matrix (low and high expressing UBE2D3 groups)?

3) Regarding the actual visualization function:

barcodeplot( fit2$t[,2], index=indices[["GO:0001780"]] ) Error in .rowNamesDF<-(x, value = value) : duplicate 'row.names' are not allowed In addition: Warning message: non-unique values when setting 'row.names': ‘100506380’, ‘143872’, ‘149775’, ‘161176’, ‘249’, ‘3373’, ‘341019’, ‘387707’, ‘4253’, ‘4584’, ‘51271’, ‘51614’, ‘57585’, ‘79813’, ‘84458’, ‘84631’, ‘8553’  How can i fix this issue ? i should remove prior the mroast function these duplicates ? Thank you in advance, Efstathios ADD COMMENTlink modified 6 months ago by Gordon Smyth37k • written 6 months ago by svlachavas650 Answer: Explanation of mroast results for specific biological processes and error with b 1 6 months ago by Gordon Smyth37k Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia Gordon Smyth37k wrote: Your questions 1) and 2) are biological questions rather than software questions, so you need to make these decisions yourself according to the scientific aims of your study. However, I would note that the genes associated with each GO Term are not constrained to all change in the same direction. Generally speaking, the GO Term might include genes that are negatively as well as positively associated with the biological process. So the use of Directional p-values might not be entirely appropriate. Regarding 3), I see that barcodeplot() is assuming that fit2 has unique rownames, something that isn't necessarily true. You can get around it by: barcodeplot( as.vector(fit2$t[,2]), index=indices[["GO:0001780"]] )

Dear Gordon,

thank you for your clarification, especially on the first 2 questions. If i could re-formulate my initial questions more appropriatelly, my main goal/concern, if with the aim of significance, i could choose the mixed Pvalues-for these specific terms-instead of the one direction Pvalues, as indeed in many cases in various GOterms, there would be mixed proportions of up-and down DEG in any gene set. Thus, the PValue mixed for the above cases would suffice, even if there is a clear but small trend for the downregulation direction, for example in the GO:0001780 ?

Finally, concerning your suggestion, which worked, i added some small labels, and i would like to ask on this:

barcodeplot( as.vector(fit2\$t[,2]), index=indices[["GO:0001780"]],
labels=c("LOW.UBE2D3","HIGH.UBE2D3"),main=neutro.go[6]) # used the condition factor levels with the same order as for the design matrix and subsequent functions

if i interpret correct the plot, as also based on the initial mroast results, that the genes belonging to this specific GO term-neutrophil homeostasis, tend to be-even moderately- upregulated in the HIGH.UBE2D3 group, in comparison to the "LOW.UBE2D3" group on the left ?