Question

IHW multiple testing correction

0

Entering edit mode

lara.h.urban • 0

@larahurban-11390

Last seen 7.0 years ago

Dear All,

I currently try to use IHW for multiple testing correction of my association study that associates gene expression of various genes with a binary molecular signature. As covariate for adjusting for multiple testing, I take the variance of the expression of each gene across all samples, but as a result the adjusted p-values are the same as after normal Benjamini-Hochberg adjustment, and the weight does not change across strata: https://imgur.com/a/EtNSZm5.

My data does not look like the examples shown in the IHW vignette (see P-values over rank of the covariate: https://imgur.com/a/prGaWnL). However, I would still like to do the same adjustment as applied in the vignette, i.e. get a smaller p-value for those genes with high variance.

I am now not sure what the problem is: Can I not use the variance of gene expression as covariate here? Is there just no difference due to variance (what I would not expect)? Or do I do sth completely wrong?

Here is my code, and you can find the pfins dataset here (every row is one gene): https://www.dropbox.com/s/bs8oehc17qwkbkr/pfins.tsv?dl=0

library("IHW")
pfin = read.table('pfins.tsv', sep='\t', header=TRUE, row.names = 1)
ihwRes <- ihw(p ~ var,  data = pfin, alpha = 0.1)
pfin$padj = adj_pvalues(ihwRes)
pfin$BH = p.adjust(pfin$p, method = "BH")
head(pfin)
plot(ihwRes)
ggplot(aes(x = rank(var), y = -log10(p)), data=pfin) + geom_hex(bins = 100) + ylab(expression(-log[10]~p))

I would appreciate any help a lot, many thanks!

Best regards,

Lara

IHW multiple testing correction covariate distribution • 1.9k views

ADD COMMENT • link updated 7.0 years ago by Wolfgang Huber ★ 13k • written 7.0 years ago by lara.h.urban • 0

0

Entering edit mode

The second link should be https://imgur.com/a/prGaWnL - no right parenthesis.

ADD REPLY • link 7.0 years ago Wolfgang Huber ★ 13k

score 1 · Answer 1 · 2018-11-18

1

Entering edit mode

Wolfgang Huber ★ 13k

@wolfgang-huber-3550

Last seen 9 weeks ago

EMBL European Molecular Biology Laborat…

Dear Lara

Thanks for the clear posting (reproducible script with small working example, post of the plots).

Indeed it looks like your variable var is not informative on power or prior probability of the tests that generate your p-values. One can glean this from the plot of -log10(p) over rank(var); another way is looking at the stratified histograms or ECDFs (https://bioconductor.org/packages/release/bioc/vignettes/IHW/inst/doc/introduction_to_ihw.html#stratified-p-value-histograms ). Please see here for more: http://rpubs.com/WolfgangHuber/440682

It looks like var is not a good covariate. If this is surprising, it might be worth to dig into the (pre)processing of the data to make sure it is desirable.

ADD COMMENT • link 7.0 years ago Wolfgang Huber ★ 13k

0

Entering edit mode

---
title: "Lara Urban IHW question 2018-11-18"
author: "Wolfgang Huber"
output:
  BiocStyle::html_document:
    toc_float: TRUE
---

# Setup

```{r global_options, include = FALSE}
knitr::opts_chunk$set(cache = TRUE, autodep = TRUE)
options(error = recover)
```

```{r message = FALSE}
library("IHW")
library("ggplot2")
```

# Lara's post

```{r}
pfin <- read.table('pfins.tsv', sep='\t', header=TRUE, row.names = 1)
ihwRes <- ihw(p ~ var,  data = pfin, alpha = 0.1)
pfin$padj <- adj_pvalues(ihwRes)
pfin$BH <- p.adjust(pfin$p, method = "BH")
head(pfin)
plot(ihwRes)
ggplot(aes(x = rank(var), y = -log10(p)), data=pfin) + geom_hex(bins = 100) + ylab(expression(-log[10]~p))
```

# Histogram of p and Schweder-Spj&#xF8;tvoll plot

```{r}
ggplot(pfin, aes(x = p)) + geom_histogram(binwidth = 0.005, boundary = 0)
```

There seems to be a small enrichment of small p-values; we can also verify this using the Schweder-Spj&#xF8;tvoll plot, see the orange line.

```{r}
metap::schweder(pfin$p, drawline = c("ls", "bh"),
                ls.col = "red", ls.lty = 1, ls.control = list(frac = 1/3),
                bh.col = "blue")
```

Also the BH method indicates that there are a few discoveries to be made, but only with FDR of 40\% or higher.

```{r}
ggplot(dplyr::filter(pfin, BH < 1), aes(x = BH)) +
  geom_histogram(binwidth = 0.01, boundary = 0)
```

# Stratification by var

```{r}
ggplot(pfin, aes(x = var)) + geom_histogram(binwidth = 0.01)
```

```{r}
ggplot(pfin, aes(x = p)) + geom_histogram(binwidth = 0.040, boundary = 0) +
  facet_wrap(~ groups_by_filter(pfin$var, 8), nrow = 2)
```

There seems to be a weak trend that the enrichment of small p-values is strongest for small `var`.
Perhaps this could also be a normalization / data preprocessing artefact?

# Conclusion

`var` is not a strong covariate. You'll have to search for others.

# Session info

```{r sesssion, eval = TRUE}
devtools::session_info()
```

ADD REPLY • link 7.0 years ago Wolfgang Huber ★ 13k