Question

rownames for categoryReshape

alison.sara.waller • 0

Dear All, I am trying to make an incidence matrix in which each row represents a pathway and each column a gene. An entry is 1 if the gene is in that pathway and 0 otherwise.

I am using the function categoryReshape from the package plotrix as follows:

> library(plotrix)
> df_incid <- t(categoryReshape(df))
> head(df)
gene_name    pwy
gene1        pwy1
gene2        pwy2

In the resulting incidence matrix (df_incid) I can see that the rows are labelled with the pathway names; however, there are no column names.

I know the columns correspond to the gene names in the first column of my original data frame, but I don't know whether they are sorted in any way.

The help for this function in the user manual is quite limited.

Any help from anyone who has used this function is appreciated.

Thanks

updated 11.2 years ago by Martin Morgan • written 11.2 years ago by alison.sara.waller

score 2 · Accepted Answer · 2014-11-27

I don't know of an existing function for this, but here's some example data:

df <- data.frame(
    Gene=sample(letters, 100, TRUE),
    Pathway=sample(LETTERS[1:10], 100, TRUE),
    stringsAsFactors=FALSE)

From this I worked out the sorted sets of unique genes and pathways:

genes <- sort(unique(df$Gene))
pathways <- sort(unique(df$Pathway))

Then I created a matrix of the appropriate dimensions and dimnames, initially all FALSE:

m <- matrix(FALSE, length(pathways), length(genes),
            dimnames=list(pathways, genes))

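Finally, I used the fact that a matrix can be indexed by a two-column matrix, each row of which identifies one (row, column) pair. With a character index matrix, the pairs are matched against the dimnames: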

m[cbind(df$Pathway, df$Gene)] <- TRUE
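To see what the matrix indexing step does, here is a minimal sketch on a tiny data set. The toy values (g1, g2, pA, pB) are invented for illustration and are not from the original post.

```r
# Toy data, invented for illustration only.
toy <- data.frame(
    Gene = c("g1", "g2", "g1"),
    Pathway = c("pA", "pA", "pB"),
    stringsAsFactors = FALSE)

# 2 x 2 matrix of FALSE, rows named by pathway and columns by gene.
m_toy <- matrix(FALSE, 2, 2,
                dimnames = list(c("pA", "pB"), c("g1", "g2")))

# Each row of idx is one (row name, column name) pair to set.
idx <- cbind(toy$Pathway, toy$Gene)
m_toy[idx] <- TRUE
m_toy
```

Only the three (pathway, gene) pairs present in toy are set to TRUE; the pB/g2 cell stays FALSE.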

A complete helper function is

incidence <-
    function(Gene, Pathway)
{
    stopifnot(length(Gene) == length(Pathway))
    genes <- sort(unique(Gene))
    pathways <- sort(unique(Pathway))
    m <- matrix(FALSE, length(pathways), length(genes),
                dimnames=list(pathways, genes))
    idx <- cbind(as.character(Pathway), as.character(Gene))
    m[idx] <- TRUE
    m
}

invoked as incidence(df$Gene, df$Pathway).
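As a rough usage sketch with the column names from the question (gene_name, pwy), the helper can be called and its logical result converted to the 1/0 coding the question asks for. The four data rows below are hypothetical, and the helper is repeated only so the block runs on its own.

```r
# Helper copied from the answer above so this block is self-contained.
incidence <- function(Gene, Pathway) {
    stopifnot(length(Gene) == length(Pathway))
    genes <- sort(unique(Gene))
    pathways <- sort(unique(Pathway))
    m <- matrix(FALSE, length(pathways), length(genes),
                dimnames = list(pathways, genes))
    idx <- cbind(as.character(Pathway), as.character(Gene))
    m[idx] <- TRUE
    m
}

# Hypothetical data using the column names from the question.
df_q <- data.frame(
    gene_name = c("gene3", "gene1", "gene2", "gene1"),
    pwy = c("pwy1", "pwy1", "pwy2", "pwy2"),
    stringsAsFactors = FALSE)

m_q <- incidence(df_q$gene_name, df_q$pwy)

# Convert TRUE/FALSE to 1/0; dimensions and dimnames are kept.
m01 <- m_q
storage.mode(m01) <- "integer"

rownames(m01)  # pathway names, in sorted order
colnames(m01)  # gene names, in sorted order
m01
```

Because the helper builds the dimnames from sort(unique(...)), both the row and column labels are present and their order is known, regardless of the order of rows in the input.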