Question: IHW multiple testing correction
0
gravatar for lara.h.urban
7 months ago by
lara.h.urban0 wrote:

Dear All, 

I currently try to use IHW for multiple testing correction of my association study that associates gene expression of various genes with a binary molecular signature. As covariate for adjusting for multiple testing, I take the variance of the expression of each gene across all samples, but as a result the adjusted p-values are the same as after normal Benjamini-Hochberg adjustment, and the weight does not change across strata: https://imgur.com/a/EtNSZm5.

My data does not look like the examples shown in the IHW vignette (see P-values over rank of the covariate: https://imgur.com/a/prGaWnL). However, I would still like to do the same adjustment as applied in the vignette, i.e. get a smaller p-value for those genes with high variance.

I am now not sure what the problem is: Can I not use the variance of gene expression as covariate here? Is there just no difference due to variance (what I would not expect)? Or do I do sth completely wrong? 

Here is my code, and you can find the pfins dataset here (every row is one gene): https://www.dropbox.com/s/bs8oehc17qwkbkr/pfins.tsv?dl=0

library("IHW")
pfin = read.table('pfins.tsv', sep='\t', header=TRUE, row.names = 1)
ihwRes <- ihw(p ~ var,  data = pfin, alpha = 0.1)
pfin$padj = adj_pvalues(ihwRes)
pfin$BH = p.adjust(pfin$p, method = "BH")
head(pfin)
plot(ihwRes)
ggplot(aes(x = rank(var), y = -log10(p)), data=pfin) + geom_hex(bins = 100) + ylab(expression(-log[10]~p))

I would appreciate any help a lot, many thanks!

Best regards,

Lara

 

ADD COMMENTlink modified 6 months ago by Wolfgang Huber13k • written 7 months ago by lara.h.urban0

The second link should be https://imgur.com/a/prGaWnL - no right parenthesis.

ADD REPLYlink written 6 months ago by Wolfgang Huber13k
Answer: IHW multiple testing correction
1
gravatar for Wolfgang Huber
6 months ago by
EMBL European Molecular Biology Laboratory
Wolfgang Huber13k wrote:

Dear Lara

Thanks for the clear posting (reproducible script with small working example, post of the plots).

Indeed it looks like your variable var is not informative on power or prior probability of the tests that generate your p-values. One can glean this from the plot of -log10(p) over rank(var); another way is looking at the stratified histograms or ECDFs (https://bioconductor.org/packages/release/bioc/vignettes/IHW/inst/doc/introduction_to_ihw.html#stratified-p-value-histograms ). Please see here for more: http://rpubs.com/WolfgangHuber/440682

It looks like var is not a good covariate. If this is surprising, it might be worth to dig into the (pre)processing of the data to make sure it is desirable.

 

 

ADD COMMENTlink written 6 months ago by Wolfgang Huber13k
---
title: "Lara Urban IHW question 2018-11-18"
author: "Wolfgang Huber"
output:
  BiocStyle::html_document:
    toc_float: TRUE
---

# Setup

```{r global_options, include = FALSE}
knitr::opts_chunk$set(cache = TRUE, autodep = TRUE)
options(error = recover)
```

```{r message = FALSE}
library("IHW")
library("ggplot2")
```

# Lara's post

```{r}
pfin <- read.table('pfins.tsv', sep='\t', header=TRUE, row.names = 1)
ihwRes <- ihw(p ~ var,  data = pfin, alpha = 0.1)
pfin$padj <- adj_pvalues(ihwRes)
pfin$BH <- p.adjust(pfin$p, method = "BH")
head(pfin)
plot(ihwRes)
ggplot(aes(x = rank(var), y = -log10(p)), data=pfin) + geom_hex(bins = 100) + ylab(expression(-log[10]~p))
```

# Histogram of p and Schweder-Spj&#xF8;tvoll plot

```{r}
ggplot(pfin, aes(x = p)) + geom_histogram(binwidth = 0.005, boundary = 0)
```

There seems to be a small enrichment of small p-values; we can also verify this using the Schweder-Spj&#xF8;tvoll plot, see the orange line.

```{r}
metap::schweder(pfin$p, drawline = c("ls", "bh"),
                ls.col = "red", ls.lty = 1, ls.control = list(frac = 1/3),
                bh.col = "blue")
```

Also the BH method indicates that there are a few discoveries to be made, but only with FDR of 40\% or higher.

```{r}
ggplot(dplyr::filter(pfin, BH < 1), aes(x = BH)) +
  geom_histogram(binwidth = 0.01, boundary = 0)
```

# Stratification by var

```{r}
ggplot(pfin, aes(x = var)) + geom_histogram(binwidth = 0.01)
```

```{r}
ggplot(pfin, aes(x = p)) + geom_histogram(binwidth = 0.040, boundary = 0) +
  facet_wrap(~ groups_by_filter(pfin$var, 8), nrow = 2)
```

There seems to be a weak trend that the enrichment of small p-values is strongest for small `var`.
Perhaps this could also be a normalization / data preprocessing artefact?

# Conclusion

`var` is not a strong covariate. You'll have to search for others.

# Session info

```{r sesssion, eval = TRUE}
devtools::session_info()
```

 
ADD REPLYlink modified 6 months ago • written 6 months ago by Wolfgang Huber13k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 168 users visited in the last hour