This is not an error, but more of a feature request:
We are using independent hypothesis weighting (IHW) in a differential expression pipeline. To be able to interpret results, we need the gene names corresponding to the weighted p-values in the results -- and I assume this is what you would usually want, or am I missing some important point here?
However, from what we can see in the documentation and the code, the ihwResult-class
provides the original pvalues, weights and weighted pvalues, but not any further annotation that came with it. Optimally, using the formula type of calling ihw() we would use:
ihw_results <- ihw(pvalue ~ covariate, data = per_gene_data)
Where per_gene_data
is a data frame that contains the columns pvalue
and covariate
along with further annotation, e.g. the gene names associated with a pair of pvalue and covariate, so e.g. something like:
ens_gene pvalue covariate
ENS1234 0.0123 3.45678
ENS2345 0.0234 4.56789
For printing out the results into a table (TSV in our case), we turn ihw_results
into a data frame with as.data.frame(ihw_results)
. However, this does not contain the ens_gene
column any more, even though this is essential for the interpretation of the resulting weighted pvalues. We can of course add that info back in manually (and currently do using the dplyr
_join
methods on the pvalue
, the covariate
and the grouping
assigned by IHW), but this feels patchy and hacky. We would prefer if ihw()
itself did not drop any columns from per_gene_data
, but just kept all of its columns along for the ride and associated to the right values, including when turning the IHWresults
into a data frame.
Am I making sense? Please do not hesitate to check back for further explanations on any of this.
(For cross-reference, I initially posted this as an issue in the IHW GitHub code repository.)
Dear David,
Thank you! This sounds like a reasonable request, and I'll discuss with Nikos if and how this can be implemented without breaking anything.
Please note that a better way to align the output with the input data is not
join
, but simplycbind(per_gene_data, as.data.frame(ihw_results))
, since the function respects the shape and order of the input.The reason for this current interface is that we wanted the use of the
ihw
function to be as similar as possible tostats::p.adjust
.Dear Wolfgang,
thanks for considering this and thanks for the alternative to align input and output. We did notice that the order stayed the same but weren't sure if this is a guaranteed feature or might go away in the future. So we went for the following, which should be just as safe (and might be of interest for other users, unless you see any of it to be unsafe;).
We have the input data with a grouping already done with a
groupings = groups_by_filter()
column, so a tablegene_data
like this:As the
ihw()
output will contain all the columns (although with slightly different names) except forens_gene
we can match on their values to accurately put back theens_gene
values:In any case, looking forward to a version, where we can simply print out the
ihw()
output as adata.frame
!Yes, this is guaranteed. I see nothing wrong with your approach either, except that it is more complicated (in terms of code and CPU work).