Hello,
I have run the viper workflow on the test dataset, but I'm having trouble running it on my own data with a regulatory network generated by ARACNe-AP. Unfortunately, ARACNe-AP no longer seems to output the adj files used in the viper vignette. Instead, I have a text file representing a consolidated ARACNe-AP network built from 100 bootstraps:
  Regulator Target   MI        pvalue
1 RNF11     FAM177A1 0.4596049 0.0000000000
2 SERPINE2  SERPING1 0.5120350 0.0000000000
3 GPRC5A    RRAD     0.6129169 0.0000000000
I suspect a formatting issue is preventing the genes in the expression data from matching those in the ARACNe network, so the regulon object cannot be generated. Do you have any advice on how to resolve this?
Load the library:
library(viper)
Create the ExpressionSet:
exprs <- as.matrix(read.table("RICHSnormbatch.txt", header = TRUE, sep = "\t", row.names = 1, as.is = TRUE, check.names = FALSE))
head(exprs[1:5, 1:5])
pData <- read.csv("../../Covariates.csv", row.names = 1, header = TRUE)
phenoData <- new("AnnotatedDataFrame", data = pData)
dset <- ExpressionSet(assayData = exprs, phenoData = phenoData, annotation = "hg19")
dset
ExpressionSet (storageMode: lockedEnvironment)
assayData: 12135 features, 200 samples
  element names: exprs
protocolData: none
phenoData
  sampleNames: 21001 21004 ... 22078 (200 total)
  varLabels: ID idx ... U_MoBT (53 total)
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation: hg19
Set the location of the ARACNe network and build the regulon:
adjfile <- "viper/RICHS_ARACNe.txt"
regul <- aracne2regulon(adjfile, dset, format = "3col", verbose = TRUE)
Loading the dataset...
Generating the regulon objects...
Error in tapply(1:nrow(tmp), tmp$tf, function(pos, tmp) { :
  arguments must have same length
Stepping through the aracne2regulon() source code, it looks like the network text file may not be in the expected format:
adjfile <- "viper/RICHS_ARACNe.txt"
tmp <- t(sapply(strsplit(readLines(adjfile), "\t"), function(x) x[1:3]))
head(tmp)
     [,1]           [,2]           [,3]
[1,] "\"RNF11\""    "\"FAM177A1\"" "0.45960485911494"
[2,] "\"SERPINE2\"" "\"SERPING1\"" "0.512034971750754"
[3,] "\"GPRC5A\""   "\"RRAD\""     "0.612916935813727"
aracne <- data.frame(tf = tmp[, 1], target = tmp[, 2], mi = as.numeric(tmp[, 3]) / max(as.numeric(tmp[, 3])))
head(aracne)
          tf     target        mi
1    "RNF11" "FAM177A1" 0.3013416
2 "SERPINE2" "SERPING1" 0.3357176
3   "GPRC5A"     "RRAD" 0.4018612
tmp <- aracne[!is.na(aracne$mi), ]
head(aracne)
          tf     target        mi
1    "RNF11" "FAM177A1" 0.3013416
2 "SERPINE2" "SERPING1" 0.3357176
3   "GPRC5A"     "RRAD" 0.4018612
str(rownames(exprs))
 chr [1:12135] "NOC2L" "KLHL17" "HES4" "ISG15" "AGRN"
str(tmp)
'data.frame':  43095 obs. of  3 variables:
 $ tf    : Factor w/ 179 levels "\"ABHD12\"","\"AFG3L1P\"",..: 133 145 62 44 60 169 75 79 86 172 ...
 $ target: Factor w/ 11522 levels "\"A2LD1\"","\"A2M\"",..: 3372 8671 8373 11374 1817 10933 9535 7678 9794 575 ...
 $ mi    : num  0.301 0.336 0.402 0.177 0.212 ...
tmp <- tmp[rowSums(matrix(as.matrix(tmp[, 1:2]) %in% rownames(exprs), nrow(tmp), 2)) == 2, ]
tmp
[1] tf     target mi
<0 rows> (or 0-length row.names)
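The str(tmp) output points to the likely cause: the gene names still carry literal double quotes (levels such as "\"ABHD12\""), while rownames(exprs) do not, so the %in% test matches nothing and the filter leaves 0 rows. With an empty table, 1:nrow(tmp) evaluates to c(1, 0), a length-2 vector, whereas tmp$tf has length 0; that mismatch is what makes tapply() stop with "arguments must have same length". Below is a minimal base-R sketch of the check on a small hypothetical toy network (not the real file). Stripping the quotes in R like this is mainly a way to confirm the diagnosis; re-exporting the network without quotes is probably the cleaner fix.

```r
# Hypothetical toy stand-ins: a parsed network whose gene names kept their
# literal double quotes (as in str(tmp) above), and some expression row names
net <- data.frame(
  tf     = c("\"RNF11\"", "\"SERPINE2\"", "\"GPRC5A\""),
  target = c("\"FAM177A1\"", "\"SERPING1\"", "\"RRAD\""),
  mi     = c(0.30, 0.34, 0.40),
  stringsAsFactors = FALSE
)
expr_genes <- c("RNF11", "FAM177A1", "SERPINE2", "SERPING1", "GPRC5A", "NOC2L")

# Same condition as the aracne2regulon() filter: regulator AND target present
n_matched <- function(net, genes) sum(net$tf %in% genes & net$target %in% genes)

n_matched(net, expr_genes)   # 0: no edge survives, so the filtered table is empty

# Remove the literal double quotes from both gene-name columns and re-check
strip_quotes <- function(x) gsub("\"", "", as.character(x), fixed = TRUE)
net$tf     <- strip_quotes(net$tf)
net$target <- strip_quotes(net$target)
n_matched(net, expr_genes)   # 2: GPRC5A -> RRAD is dropped because RRAD is absent
```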
sessionInfo
sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 7 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] grid      parallel  stats     graphics  grDevices utils     datasets  methods
[9] base

other attached packages:
[1] viper_1.16.0        Biobase_2.42.0      Rgraphviz_2.26.0    graph_1.60.0
[5] BiocGenerics_0.28.0 tidyr_0.8.3         dplyr_0.8.0.1       minet_3.40.0

loaded via a namespace (and not attached):
 [1] fansi_0.4.0        splines_3.5.1      R6_2.4.0           assertthat_0.2.0
 [5] utf8_1.1.4         e1071_1.7-0.1      knitr_1.21         survival_2.42-3
 [9] cli_1.0.1          tidyselect_0.2.5   pillar_1.3.1       segmented_1.0-0
[13] compiler_3.5.1     tibble_2.0.1       lattice_0.20-35    pkgconfig_2.0.2
[17] Matrix_1.2-14      purrr_0.3.1        KernSmooth_2.23-15 rstudioapi_0.9.0
[21] MASS_7.3-50        glue_1.3.0         xfun_0.5           stats4_3.5.1
[25] BiocManager_1.30.4 magrittr_1.5       rlang_0.3.1        yaml_2.2.0
[29] tools_3.5.1        mixtools_1.1.0     crayon_1.3.4       class_7.3-15
[33] Rcpp_1.0.0
A little late, but in case it helps anyone else: I had to do three things to use ARACNe-AP output with aracne2regulon():
1. drop the p-value column,
2. remove the header, and
3. set format = "3col" in the call to aracne2regulon().
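The steps above can be done in base R. This is a minimal sketch, assuming the network file has the Regulator/Target/MI/pvalue header shown in the original question; aracne_ap_to_3col() is a hypothetical helper name, not part of viper, and the input file here is a small toy example.

```r
# Hypothetical helper: convert an ARACNe-AP network.txt (header row with
# Regulator, Target, MI, pvalue) into a headerless, tab-separated 3-column file
aracne_ap_to_3col <- function(infile, outfile) {
  net <- read.delim(infile, header = TRUE, stringsAsFactors = FALSE, quote = "")
  # Step 1: keep regulator, target and MI only (drops the p-value column)
  net <- net[, c("Regulator", "Target", "MI")]
  # Step 2: write without header, row names or quotes, tab-separated
  write.table(net, outfile, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
  invisible(outfile)
}

# Toy input mimicking the ARACNe-AP layout
infile <- tempfile(fileext = ".txt")
writeLines(c("Regulator\tTarget\tMI\tpvalue",
             "RNF11\tFAM177A1\t0.4596049\t0",
             "SERPINE2\tSERPING1\t0.5120350\t0"), infile)
outfile <- aracne_ap_to_3col(infile, tempfile(fileext = ".txt"))
readLines(outfile)

# Step 3, on the real data:
# regul <- aracne2regulon(outfile, dset, format = "3col", verbose = TRUE)
```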
Hello Keith,
I am currently trying to use my ARACNe-AP-generated networks in viper. I did all three steps you mentioned above, but my problem is that the network and the gene expression matrix differ in length. When I use the gene expression matrix as is and pass the 3col ARACNe-AP output, I get the following error:
I could subset the gene expression matrix to the regulators in the network, but I don't think that is very reasonable, since it would no longer be the same gene expression matrix that I fed into ARACNe-AP. Have you come across the same problem, and if so, did you subset the gene expression matrix?
Thank you in advance!
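On the subsetting question: the filtering line quoted from the viper source in the original question (tmp[rowSums(...) == 2, ]) indicates that aracne2regulon() itself drops any edge whose regulator or target is absent from the expression data, so a size difference between the two should not by itself require subsetting the matrix; the "arguments must have same length" error is consistent with no edge surviving that filter at all. A rough sketch for measuring the mismatch before calling aracne2regulon(), using a hypothetical helper and toy data:

```r
# Hypothetical helper: how many network genes are missing from the expression data?
network_gene_overlap <- function(net, genes) {
  # columns 1 and 2 are assumed to hold regulator and target symbols
  net_genes <- unique(c(as.character(net[[1]]), as.character(net[[2]])))
  missing   <- setdiff(net_genes, genes)
  list(n_network_genes = length(net_genes),
       n_missing       = length(missing),
       missing         = missing)
}

# Toy network (regulator, target) and expression row names
net <- data.frame(tf     = c("RNF11", "SERPINE2", "GPRC5A"),
                  target = c("FAM177A1", "SERPING1", "RRAD"),
                  stringsAsFactors = FALSE)
res <- network_gene_overlap(net, genes = c("RNF11", "FAM177A1", "SERPINE2",
                                           "SERPING1", "GPRC5A"))
res$n_missing   # 1
res$missing     # "RRAD"
# For the real data: network_gene_overlap(net, rownames(exprs))
```

If n_missing is close to n_network_genes, the gene identifiers themselves (quotes, Excel-mangled symbols, different ID types) are the more likely problem than the matrix size.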
Hi, I'm afraid not, so I don't think I can be much help, unfortunately. In the past, the authors of ARACNe-AP were generally quite responsive and helpful though, so you might consider reaching out to them directly. Best of luck!
Thank you!
Hi Luna_P, I think I am in a very similar situation. I modified my network file as described above, but I still get the same "arguments must have same length" error. Did you find a solution, by any chance?
Many thanks!
Best,
Mika
As @Keith Hughitt mentioned, the ARACNe network.txt output file has to be pre-processed before running the aracne2regulon() function, and format = "3col" has to be set in the call to aracne2regulon(). Removing the index solved the "arguments must have same length" error. Hope it helps!
Theo
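If network.txt has been read into R and written back out with write.table(), note that the defaults (quote = TRUE, row.names = TRUE, col.names = TRUE) add a header, quoted gene names, and a leading row-index column. That is one plausible source of both the quoted names in the original question and an extra index column, although the thread does not confirm how those files were produced. A small sketch with hypothetical toy data comparing the defaults with explicit settings:

```r
# Hypothetical toy network
net <- data.frame(Regulator = c("RNF11", "SERPINE2"),
                  Target    = c("FAM177A1", "SERPING1"),
                  MI        = c(0.4596049, 0.5120350),
                  stringsAsFactors = FALSE)

# write.table() defaults: quoted strings, a header row and a row-index column
f_default <- tempfile(fileext = ".txt")
write.table(net, f_default, sep = "\t")
readLines(f_default)[2]   # begins with the quoted row index "1", then "RNF11" in quotes

# Explicit settings that give a plain, headerless 3-column layout
f_clean <- tempfile(fileext = ".txt")
write.table(net, f_clean, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
readLines(f_clean)
```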
You are right! We should probably also check the gene symbols, because when I open the expression matrix in Excel, it converts some gene names into months, like SEP, Mar, etc.
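A quick way to look for symbols that Excel has already turned into dates is to scan the row names for month-like strings. This is a rough heuristic sketch: the patterns (a month abbreviation, optionally with a leading day or a trailing number, such as "2-Sep", "Sep-02" or "Mar") are assumptions, since Excel's exact output depends on version and locale, and the example gene list is hypothetical.

```r
# Flag gene symbols that look like Excel date conversions (candidates to review by hand)
date_like <- function(genes) {
  mon <- paste(month.abb, collapse = "|")   # "Jan|Feb|...|Dec"
  pattern <- paste0("^([0-9]{1,2}-)?(", mon, ")(-[0-9]{1,4})?$")
  grepl(pattern, genes, ignore.case = TRUE)
}

genes <- c("SEPT2", "2-Sep", "MARCH1", "1-Mar", "NOC2L", "Sep-02", "SEP")
genes[date_like(genes)]   # "2-Sep" "1-Mar" "Sep-02" "SEP"
# For the real data: rownames(exprs)[date_like(rownames(exprs))]
```

The safer habit is to avoid opening the expression file in Excel at all, or to import the gene column explicitly as text.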
Hi, do you mind sharing the line of code you used to remove the index from the network.txt file? Thanks!
You are probably not interested anymore, but it might help somebody else.
If you run the following after getting the network file, it works fine (it keeps every line except the first, i.e. drops the header):
awk 'NR>1' network.txt > tmp && mv tmp network.txt
I had been removing the last column as some suggested, but that switched the file from tab-separated to space-separated. I checked the viper code, and indeed it splits each line on tabs (strsplit) and doesn't look past the third column, so that step is unnecessary.
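For those who prefer to stay in R, the awk command above can be mirrored with readLines()/writeLines(). Running the parsing line quoted from the viper source in the original question on the result shows the fourth (p-value) field being ignored. A sketch on a hypothetical toy file:

```r
# Toy network.txt mimicking the ARACNe-AP layout (hypothetical content)
f <- tempfile(fileext = ".txt")
writeLines(c("Regulator\tTarget\tMI\tpvalue",
             "RNF11\tFAM177A1\t0.4596049\t0",
             "GPRC5A\tRRAD\t0.6129169\t0"), f)

# R equivalent of: awk 'NR>1' network.txt > tmp && mv tmp network.txt
lines <- readLines(f)
writeLines(lines[-1], f)   # drop only the header; tabs and all 4 columns are kept

# Parsing step as quoted from the viper source: split on tabs, keep fields 1 to 3
tmp <- t(sapply(strsplit(readLines(f), "\t"), function(x) x[1:3]))
tmp                        # 2 x 3 character matrix; the pvalue field never appears
```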
I had a similar issue generating the regulon object from the ARACNe network.txt output file and the same expression matrix used to build the network. The error message I ran into was "Loading the dataset... Error in readLines(afile) : 'con' is not a connection". I solved this by storing the path to the network.txt file as a string and passing that string to aracne2regulon(). I also pre-processed network.txt as suggested by @Keith Hughitt and @Theo. A warning then appeared, "In readLines(afile) : incomplete final line found on 'network.txt'", which decreases the number of iterations when generating the regulon object. I saved the pre-processed network.txt file using the write_delim() function from the readr package. For anyone who still has or runs into this issue, here is what I did to fix the problem:
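library(dplyr)   # %>% and select()
library(tibble)  # column_to_rownames()
library(readr)   # write_delim()
library(viper)   # aracne2regulon()
count.mt <- read.table(file = "/path/expression_counts.txt", sep = "\t", header = TRUE, quote = "")
count.mt2 <- count.mt %>% column_to_rownames(var = "gene") %>% as.matrix()   # genes as row names
# read the ARACNe-AP network and drop the p-value column
net <- read.table(file = "network.txt", sep = "\t", header = TRUE, quote = "")
net2 <- net %>% dplyr::select(!pvalue)
# write it back tab-separated, without header or quotes, each line ending in "\n"
write_delim(net2, file = "network2.txt", na = "NA", append = FALSE, col_names = FALSE, delim = "\t", quote = "none", escape = "none", eol = "\n")
net3 <- "/path/ARACNe-AP/output/network2.txt"   # path passed as a character string
regulons <- aracne2regulon(afile = net3, eset = count.mt2, format = "3col")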
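The "incomplete final line" warning mentioned above means the file's last line has no trailing newline. If you would rather not round-trip the file through readr, a base-R sketch for detecting and repairing that (the helper names are hypothetical, not part of viper or readr):

```r
# TRUE if the file's last byte is a newline ("\n", 0x0a)
ends_with_newline <- function(path) {
  size <- file.size(path)
  if (is.na(size) || size == 0) return(FALSE)
  bytes <- readBin(path, what = "raw", n = size)
  bytes[size] == as.raw(0x0a)
}

# Rewrite the file so every line, including the last, is newline-terminated
ensure_final_newline <- function(path) {
  if (!ends_with_newline(path)) {
    lines <- readLines(path, warn = FALSE)   # warn = FALSE silences the incomplete-line warning
    writeLines(lines, path)                  # writeLines() ends each line with a newline
  }
  invisible(path)
}

# Toy file without a trailing newline (hypothetical content)
f <- tempfile(fileext = ".txt")
cat("RNF11\tFAM177A1\t0.4596049", file = f)
ends_with_newline(f)      # FALSE
ensure_final_newline(f)
ends_with_newline(f)      # TRUE
```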