rownames for categoryReshape
@alisonsarawaller-7103
Germany

Dear All, I am trying to make an incidence matrix in which each row represents a pathway and each column a gene: an entry is 1 if that gene is in the pathway, and 0 otherwise.

I am using the function categoryReshape from the package plotrix as follows:

df_incid <- t(categoryReshape(df))

> head(df)
gene_name    pwy
gene1        pwy1
gene2        pwy2

In the resulting incidence matrix (df_incid) I can see that the rows are labelled with the pathway names, however there are no column names.

I know these names are from the first column in my original dataframe, but I don't know if they are sorted in any way.

The help page for this function in the user manual is quite limited.

Any help from anyone who has used this function would be appreciated,

thanks

@martin-morgan-1513
United States

I'm not sure about a function, but here's some data:

df <- data.frame(
    Gene=sample(letters, 100, TRUE),
    Pathway=sample(LETTERS[1:10], 100, TRUE),
    stringsAsFactors=FALSE)

From this I worked out the sets of genes and pathways:

genes <- sort(unique(df$Gene))
pathways <- sort(unique(df$Pathway))

Then created an incidence matrix of the appropriate dimensions and dimnames

m <- matrix(FALSE, length(pathways), length(genes),
            dimnames=list(pathways, genes))

And finally used the fact that a matrix can be subset by another matrix to access row,column pairs

m[cbind(df$Pathway, df$Gene)] <- TRUE
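To make the indexing step concrete, here is a minimal sketch on a made-up 2 x 3 matrix (the names P1, P2, a, b, c are purely illustrative). It relies on the behaviour described in ?Extract: a two-column character matrix used as an index is matched row by row against the dimnames.

```r
# Toy logical matrix: rows are pathways, columns are genes (made-up names).
m <- matrix(FALSE, 2, 3,
            dimnames = list(c("P1", "P2"), c("a", "b", "c")))

# Each row of idx is one (row name, column name) pair to switch on.
idx <- cbind(c("P1", "P2", "P2"),
             c("a",  "a",  "c"))
m[idx] <- TRUE

# Only the three named cells are TRUE; everything else stays FALSE.
m
```

Repeated pairs in idx are harmless here, since they just set the same cell to TRUE more than once.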

A complete helper function is

incidence <-
    function(Gene, Pathway)
{
    stopifnot(length(Gene) == length(Pathway))
    genes <- sort(unique(Gene))
    pathways <- sort(unique(Pathway))
    m <- matrix(FALSE, length(pathways), length(genes),
                dimnames=list(pathways, genes))
    idx <- cbind(as.character(Pathway), as.character(Gene))
    m[idx] <- TRUE
    m
}

invoked as incidence(df$Gene, df$Pathway).
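For the 1/0 coding asked for in the question, a possible usage sketch follows. The toy data frame and its values are invented for illustration (only the column names gene_name and pwy come from the question), and the helper is repeated so the snippet runs on its own.

```r
# Helper repeated from the answer so this snippet is self-contained.
incidence <- function(Gene, Pathway) {
    stopifnot(length(Gene) == length(Pathway))
    genes <- sort(unique(Gene))
    pathways <- sort(unique(Pathway))
    m <- matrix(FALSE, length(pathways), length(genes),
                dimnames = list(pathways, genes))
    m[cbind(as.character(Pathway), as.character(Gene))] <- TRUE
    m
}

# Invented toy data using the column names from the question.
toy <- data.frame(
    gene_name = c("gene1", "gene2", "gene2", "gene3"),
    pwy       = c("pwy1",  "pwy2",  "pwy1",  "pwy2"),
    stringsAsFactors = FALSE)

m <- incidence(toy$gene_name, toy$pwy)

# Adding 0L turns TRUE/FALSE into 1/0 and keeps the row and column names.
m01 <- m + 0L
m01
```

Keeping the matrix logical and converting only at the end is a matter of taste; the logical form is convenient for subsetting, the integer form for sums and matrix products.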

 


Thanks, this is great, and much faster than the categoryReshape function in plotrix.

In case anyone was wondering: by comparing the results, I can see that the column names from categoryReshape are in the order of

sort(unique(df$Gene)).
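If you want to double-check the ordering without plotrix, one option (a sketch, not something taken from either function's documentation) is to cross-tabulate with base R's table(). It converts character vectors to factors, whose levels are sort(unique(x)), so its dimnames should come out in the same order. The data below mimic the random example from the answer, with a seed added for reproducibility.

```r
set.seed(1)  # seed added here only so the example is reproducible
df <- data.frame(
    Gene = sample(letters, 100, TRUE),
    Pathway = sample(LETTERS[1:10], 100, TRUE),
    stringsAsFactors = FALSE)

# Cross-tabulate pathway by gene; > 0 turns counts into presence/absence.
tab <- table(df$Pathway, df$Gene) > 0

# Compare the dimnames with the sorted unique labels.
same_cols <- identical(colnames(tab), sort(unique(df$Gene)))
same_rows <- identical(rownames(tab), sort(unique(df$Pathway)))
c(columns = same_cols, rows = same_rows)
```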