Question

Explanation of mroast results for specific biological processes and error with barcodeplot function with a microarray dataset

0

Entering edit mode

svlachavas ▴ 830

@svlachavas-7225

Last seen 6 months ago

Germany/Heidelberg/German Cancer Resear…

Dear Community, based on a description on a previous post (C: Perform limma based gene-set testing for a two-group comparison in a microarray ) i tried to implement mroast gene set testing in limma with specific Gene Ontology Biological Processes:

# my expression microarray dataset:

 class(final)
[1] "EList"
attr(,"package")
[1] "limma"

head(final$E)
     US84600244_253949426815_S01_GE1_107_Sep09_1_4
IRX1                                      4.979257
SAA1                                      7.548621
H19                                      13.150892
MBP                                       8.240486
SAA2                                      6.692976
CHGA                                      7.527782....

Symbols <- rownames(final)
dat <- alias2SymbolUsingNCBI(Symbols, "Homo_sapiens.gene_info")

final$genes$EntrezID <- dat$GeneID

final.2 <- final[!is.na(final$genes$EntrezID),]

rownames(final.2) <- as.character(final.2$genes$EntrezID)

condition <- factor(final.2$targets$UBE2D3.group,
levels = c("LOW.UBE2D3","HIGH.UBE2D3"))

design <- model.matrix(~condition)

fit <- lmFit(final.2,design)

fit2 <- eBayes(fit, trend=TRUE)

neutro.go <- c("GO:0030851", "GO:0070488", "GO:0097350",
"GO:0001788","GO:0001781","GO:0001780","GO:0001802","GO:0045728") # selected BPs associated with neutrophils

term <- select(GO.db, keys=neutro.go, columns="TERM")

Rkeys(org.Hs.egGO2ALLEGS) <- neutro.go

neutro.go.genes <- as.list(org.Hs.egGO2ALLEGS)

indices <- ids2indices(neutro.go.genes, rownames(final.2))

res <- mroast(final.2, indices, design, contrast=2)

res
           NGenes  PropDown     PropUp Direction PValue    FDR PValue.Mixed
GO:0045728      2 1.0000000 0.00000000      Down  0.001 0.0040        0.001
GO:0001788      1 1.0000000 0.00000000      Down  0.106 0.4220        0.106
GO:0030851     28 0.2857143 0.21428571      Down  0.427 0.7105        0.001
GO:0070488      2 0.5000000 0.00000000      Down  0.431 0.7105        0.431
GO:0001802      2 0.5000000 0.00000000      Down  0.485 0.7105        0.318
GO:0001780     14 0.2857143 0.07142857      Down  0.672 0.7105        0.021
GO:0001781      7 0.4285714 0.14285714      Down  0.685 0.7105        0.003
GO:0097350      5 0.0000000 0.00000000        Up  0.711 0.7110        0.608
             FDR.Mixed
GO:0045728 0.002000000
GO:0001788 0.168800000
GO:0030851 0.002000000
GO:0070488 0.492000000
GO:0001802 0.423333333
GO:0001780 0.041000000
GO:0001781 0.006666667
GO:0097350 0.608000000

1) My first question is regarding the interrogation of the mroast testing for these specific GO terms:

as the columns PValue and PValueMixed are appropriate for different questions, and my many concern is, if any of these GO terms are actually DE, i could focus on GOs above like GO:0030851 and GO:0001780 ? which, even that do not show a significant Pvalue, illustrate a significant PValueMixed ? which also are annotated with not a very small number of genes in comparison to the other selected terms ? and essentially justify, that indeed some of the neutrophil processes are significantly pertubed concerning my phenotype of interest ?

2) For subsequent utilization of the barcode function: as i would to create a barcode plot, similar to the plot created in the FRY section-

http://bioconductor.org/packages/release/workflows/vignettes/RnaSeqGeneEdgeRQL/inst/doc/edgeRQL.html#fry-gene-set-tests

i should choose from above, a GOterm, that has a more direct proportion/direction of regulation ? like the GO:0001780 ? in order also to illustrate the changes in my 2 conditions used in my design matrix (low and high expressing UBE2D3 groups)?

3) Regarding the actual visualization function:

barcodeplot( fit2$t[,2], index=indices[["GO:0001780"]] )
Error in `.rowNamesDF<-`(x, value = value) : 
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘100506380’, ‘143872’, ‘149775’, ‘161176’, ‘249’, ‘3373’, ‘341019’, ‘387707’, ‘4253’, ‘4584’, ‘51271’, ‘51614’, ‘57585’, ‘79813’, ‘84458’, ‘84631’, ‘8553’

How can i fix this issue ? i should remove prior the mroast function these duplicates ?

Thank you in advance,

Efstathios

limma mroast agilent microarrays GO.BP barcodeplot • 1.3k views

ADD COMMENT • link updated 5.5 years ago by Gordon Smyth 50k • written 5.5 years ago by svlachavas ▴ 830

score 1 · Answer 1 · 2018-10-14

1

Entering edit mode

Gordon Smyth 50k

@gordon-smyth

Last seen 22 minutes ago

WEHI, Melbourne, Australia

Your questions 1) and 2) are biological questions rather than software questions, so you need to make these decisions yourself according to the scientific aims of your study. However, I would note that the genes associated with each GO Term are not constrained to all change in the same direction. Generally speaking, the GO Term might include genes that are negatively as well as positively associated with the biological process. So the use of Directional p-values might not be entirely appropriate.

Regarding 3), I see that barcodeplot() is assuming that fit2 has unique rownames, something that isn't necessarily true. You can get around it by:

barcodeplot( as.vector(fit2$t[,2]), index=indices[["GO:0001780"]] )

ADD COMMENT • link 5.5 years ago Gordon Smyth 50k

0

Entering edit mode

Dear Gordon,

thank you for your clarification, especially on the first 2 questions. If i could re-formulate my initial questions more appropriatelly, my main goal/concern, if with the aim of significance, i could choose the mixed Pvalues-for these specific terms-instead of the one direction Pvalues, as indeed in many cases in various GOterms, there would be mixed proportions of up-and down DEG in any gene set. Thus, the PValue mixed for the above cases would suffice, even if there is a clear but small trend for the downregulation direction, for example in the GO:0001780 ?

Finally, concerning your suggestion, which worked, i added some small labels, and i would like to ask on this:

barcodeplot( as.vector(fit2$t[,2]), index=indices[["GO:0001780"]],
labels=c("LOW.UBE2D3","HIGH.UBE2D3"),main=neutro.go[6]) # used the condition factor levels with the same order as for the design matrix and subsequent functions

based on the following link:

https://www.dropbox.com/s/t379eeostepgtn0/GO.0001780.example.mroast.tiff?dl=0

if i interpret correct the plot, as also based on the initial mroast results, that the genes belonging to this specific GO term-neutrophil homeostasis, tend to be-even moderately- upregulated in the HIGH.UBE2D3 group, in comparison to the "LOW.UBE2D3" group on the left ?

ADD REPLY • link 5.5 years ago svlachavas ▴ 830