Extract only dataframes from AnnData objects
merv ▴ 120
@mmfansler-13248
Last seen 6 months ago
MSKCC | New York, NY

Is there existing functionality to extract only the dataframes from AnnData HDF5 objects? That is, directly extract the equivalents of colData and rowData from .h5ad files.

The anndata package only seems to load the full object (plus, I very much dislike the fact that it is a wrapper for the Python package, i.e., it requires a Python installation). The HDF5Array package can import the matrix as a DelayedArray, but it doesn't include the dataframes.

For now, I am using the following code, but would prefer if this functionality was packaged somewhere.

library(tidyverse)
library(rhdf5)

read_ad_df <- function (file, name) {
    x_attrs <- h5readAttributes(file, name)

    ## check requested entry is a dataframe
    ## TODO: do we need to check encoding-version?
    stopifnot(x_attrs[['encoding-type']] == "dataframe")

    ## rownames and columns in order
    idx_cols <- unlist(x_attrs[c("_index", "column-order")], use.names=FALSE)

    ## load the factor levels
    x_levels <- h5read(file, str_c(name, "/__categories"))

    ## load dataframe
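    h5read(file, name)[idx_cols] %>% as_tibble() %>%
        ## replace categorical columns with proper factors; codes are 0-based,
        ## and a code of -1 is assumed to mark a missing value, so negative
        ## codes map to NA and the stored category order is kept as the levels
        mutate(across(any_of(names(x_levels)), ~ {
            lv <- x_levels[[cur_column()]]
            factor(lv[ifelse(.x < 0L, NA_integer_, .x + 1L)], levels=lv)
        }))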
}

Here, read_ad_df(FILE, "/obs") retrieves the equivalent of colData and read_ad_df(FILE, "/var") retrieves the equivalent of rowData.
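The factor-decoding step in the function above can be tried in isolation. The following is a minimal base R sketch of the same idea, not part of the original function: the helper name decode_codes and the toy data are made up for illustration, and it assumes that codes are 0-based and that a code of -1 marks a missing value.

```r
## Minimal sketch: turn AnnData-style categorical codes into an R factor.
## Assumptions: `decode_codes` is a hypothetical helper name; codes are
## 0-based integers and -1 marks a missing value.
decode_codes <- function(codes, categories) {
    ## shift to R's 1-based indexing; negative codes become NA
    idx <- ifelse(codes < 0L, NA_integer_, codes + 1L)
    ## keep the stored category order as the factor levels
    factor(categories[idx], levels = categories)
}

## toy data standing in for one column of /obs and its categories
codes <- c(0L, 2L, -1L, 1L)
categories <- c("T cell", "B cell", "NK cell")
cell_type <- decode_codes(codes, categories)
```

Passing levels explicitly matters here: without it, factor() would sort the levels alphabetically instead of keeping the order stored in the file.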

rhdf5 HDF5Array anndata SingleCellExperiment • 1.1k views
Peter Hickey ▴ 740
@petehaitch
Last seen 7 days ago
WEHI, Melbourne, Australia

> For now, I am using the following code, but would prefer if this functionality was packaged somewhere.

Have you considered creating a pull request to zellkonverter (https://github.com/theislab/zellkonverter/pulls)? I expect the authors would be interested to receive and review such a request that implements a tested and documented version of this functionality.

> The anndata package only seems to load the full object (plus, I very much dislike the fact that it is a wrapper for the Python package, i.e., requires a Python installation).

A side note: have you seen the R-based reader available with zellkonverter::readH5AD(reader = "R")? As documented, this is experimental/under-development, but it does not require a Python installation.
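As a rough sketch of how that reader could be used to get at the two dataframes: the helper name extract_ad_dfs below is hypothetical, and the use of use_hdf5 = TRUE to keep the matrix on disk is an assumption about how the experimental R reader behaves, so check the zellkonverter documentation for your version. Note that this still constructs a full SingleCellExperiment before pulling out colData and rowData.

```r
## Sketch only: `extract_ad_dfs` is a hypothetical helper, not a
## zellkonverter function. Requires the zellkonverter and
## SummarizedExperiment packages to be installed when it is called.
extract_ad_dfs <- function(file) {
    ## reader = "R" avoids the Python dependency (experimental);
    ## use_hdf5 = TRUE is intended to keep the assay data HDF5-backed
    ## rather than loading the matrix into memory
    sce <- zellkonverter::readH5AD(file, reader = "R", use_hdf5 = TRUE)
    list(
        colData = SummarizedExperiment::colData(sce),
        rowData = SummarizedExperiment::rowData(sce)
    )
}

## usage (file name is a placeholder):
## dfs <- extract_ad_dfs("example.h5ad")
```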
