IHW: retain annotation columns in data when running `ihw(pvalue ~ covariate, data = ...)`
0
0
Entering edit mode
@davidlaehnemann-23351
Last seen 4.0 years ago

This is not an error, but more of a feature request:

We are using independent hypothesis weighting (IHW) in a differential expression pipeline. To be able to interpret results, we need the gene names corresponding to the weighted p-values in the results -- and I assume this is what you would usually want, or am I missing some important point here?

However, from what we can see in the documentation and the code, the ihwResult-class provides the original pvalues, weights and weighted pvalues, but not any further annotation that came with it. Optimally, using the formula type of calling ihw() we would use:

ihw_results <- ihw(pvalue ~ covariate, data = per_gene_data)

Where per_gene_data is a data frame that contains the columns pvalue and covariate along with further annotation, e.g. the gene names associated with a pair of pvalue and covariate, so e.g. something like:

ens_gene   pvalue   covariate
ENS1234    0.0123   3.45678
ENS2345    0.0234   4.56789

For printing out the results into a table (TSV in our case), we turn ihw_results into a data frame with as.data.frame(ihw_results). However, this does not contain the ens_gene column any more, even though this is essential for the interpretation of the resulting weighted pvalues. We can of course add that info back in manually (and currently do using the dplyr _join methods on the pvalue, the covariate and the grouping assigned by IHW), but this feels patchy and hacky. We would prefer if ihw() itself did not drop any columns from per_gene_data, but just kept all of its columns along for the ride and associated to the right values, including when turning the IHWresults into a data frame.

Am I making sense? Please do not hesitate to check back for further explanations on any of this.

(For cross-reference, I initially posted this as an issue in the IHW GitHub code repository.)

ihw independent hypothesis weighting • 935 views
ADD COMMENT
1
Entering edit mode

Dear David,

Thank you! This sounds like a reasonable request, and I'll discuss with Nikos if and how this can be implemented without breaking anything.

Please note that a better way to align the output with the input data is not join, but simply cbind(per_gene_data, as.data.frame(ihw_results)), since the function respects the shape and order of the input.

The reason for this current interface is that we wanted the use of the ihw function to be as similar as possible to stats::p.adjust.

ADD REPLY
1
Entering edit mode

Dear Wolfgang,

thanks for considering this and thanks for the alternative to align input and output. We did notice that the order stayed the same but weren't sure if this is a guaranteed feature or might go away in the future. So we went for the following, which should be just as safe (and might be of interest for other users, unless you see any of it to be unsafe;).

We have the input data with a grouping already done with a groupings = groups_by_filter() column, so a table gene_data like this:

ens_gene   pval     mean_obs   grouping
ENS1234    0.0123   3.45678    1
ENS2345    0.0234   4.56789    2

As the ihw() output will contain all the columns (although with slightly different names) except for ens_gene we can match on their values to accurately put back the ens_gene values:

ihw_res <- ihw(pvalue ~ covariate, data = gene_data)

ihw_res_annotated <- as.data.frame(ihw_res) %>% 
  right_join(gene_data, by = c(pvalue = "pval", covariate = "mean_obs", group = "grouping")) %>%
  # needed, because occasionally different genes in the same group have exactly the same
  # pval and mean_obs, creating identical duplicate entries
  unique()

In any case, looking forward to a version, where we can simply print out the ihw() output as a data.frame!

ADD REPLY
1
Entering edit mode

We did notice that the order stayed the same but weren't sure if this is a guaranteed feature or might go away in the future.

Yes, this is guaranteed. I see nothing wrong with your approach either, except that it is more complicated (in terms of code and CPU work).

ADD REPLY

Login before adding your answer.

Traffic: 820 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6