Question: SPIA error - names of de are not in reference array
gravatar for  A Eelmur
4.8 years ago by
A Eelmur0
United States
A Eelmur0 wrote:

I'm trying to use the SPIA package in R  to complete a pathway analysis of differentially expressed genes that considers fold change and ORA. I'm new to R generally, and continually run into an error that states my de names are not in the reference array (exact wording below) when I run the analysis. 

I've done many checks, my data is clear of duplicate entries, and NAs. The entrezgene IDs are valid. In fact, I can run the analysis on Graphite's online version of SPIA using the "all_genes" and "sig_genes" vectors/files listed below without any problem. I'd like to use R for the analysis, though as it gives better graphing options. 

I think the problem has something to do with my creation of the named vector sig_genes(below), which contains the log2FoldChange values named with Entrezgene IDs. 

I've searched around and can't seem to find a solution online that fixes the problem. 

I've posted the columns of my original data frame (which is very large), my code and sessioninfo() below

Does anyone see an error in my code  or session info () that might be leading to this problem?


A. Eelmur


My original data frame (DF) contains 12000+ rows (not sure if a sample would, therefore, be useful here), but columns names are as so


baseMean   log2FoldChange     lfcSE     stat        pvalue   padj   ensembl_gene_id

 Here is the code and error.  


> # read in the data

DF = read.delim("DF.txt", header=T, row.names=1)

>#define mart

ensembl <- useMart(biomart="ensembl", dataset="hsapiens_gene_ensembl")

># get entrez gene IDs for the DF file
getgenes <- getBM(attributes = c("ensembl_gene_id", "entrezgene"), filters = "ensembl_gene_id", values = DF$ensembl_gene_id, mart = ensembl, uniqueRows=FALSE)

>#remove duplicates
getgenes2 = getgenes[!duplicated(getgenes[,1]),]

>#join entrez genes and DE data frame by ensemble gene ID

DF_with_EntrezID= join(DF, getgenes2, type = "left")

>#subset DF_with_EntrezID to just Entrez gene ID, log2 Fold Change and the padj columns and omit any NAs

DF_with_EntrezID_subset <- na.omit(subset(DF_with_EntrezID, select=c(entrezgene, log2FoldChange, padj))

>#make a vector of log2FoldChange values with a padj <0.1

sig_genes <- subset(DF_with_EntrezID_subset, padj<0.1)$log2FoldChange

>#change sig_genes to numeric

sig_genes <- as.numeric(sig_genes)

>#make a vector of Entregene IDs for DE genes with a padj <0.1

n <- subset(DF_with_EntrezID_subset, padj<0.1)$entrezgene

># change n to character

n <- as.character(n)

>#create named vector of log2FoldChange values with Entrezgene ID names

names(sig_genes)<- n

>#make reference array based on all Entrezgene IDs

all_genes <- as.character(DF_with_EntrezID_subsetentrezgene)

>res=spia(de=sig_genes, all=all_genes, organism="hsa",plots=TRUE)

Error in spia(de = DE, all = all_genes, organism = "hsa", plots = TRUE) : 
  de must be a vector of log2 fold changes. The names of de should be included in the refference array!


Here is my sessioninfo()

R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)

[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] plyr_1.8.1          SPIA_2.16.0         KEGGgraph_1.22.1    graph_1.42.0        XML_3.98-1.1        limma_3.20.9       
 [7] GEOquery_2.30.1     Biobase_2.24.0      BiocGenerics_0.10.0 biomaRt_2.20.0     

loaded via a namespace (and not attached):
[1] AnnotationDbi_1.26.1 DBI_0.3.1            GenomeInfoDb_1.0.2   IRanges_1.22.10      Rcpp_0.11.3         
[6] RCurl_1.95-4.3       RSQLite_1.0.0        stats4_3.1.1         tools_3.1.1         






ADD COMMENTlink modified 4.8 years ago • written 4.8 years ago by A Eelmur0
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 251 users visited in the last hour