Question

Data input for aracne2regulon function in viper package

maya.kappil ▴ 30

@mayakappil-18569

Last seen 5.6 years ago

Hello,

I have run the viper workflow using the test dataset, but I'm having trouble using my own data with a regulatory network generated by ARACNe-AP. Unfortunately, ARACNe-AP no longer seems to output adj files as used in the viper vignette. Instead, I have a text file that represents a consolidated ARACNe-AP network based on 100 bootstraps:

  Regulator  Target    MI         pvalue
1 RNF11      FAM177A1  0.4596049  0.0000000000
2 SERPINE2   SERPING1  0.5120350  0.0000000000
3 GPRC5A     RRAD      0.6129169  0.0000000000

I think there might be a formatting issue that prevents matching between the expression data and the ARACNe network when generating the regulon object. Would you have any advice on how I can resolve this issue?

Load library

library(viper)

create expression eSet

exprs <- as.matrix(read.table("RICHSnormbatch.txt", header=TRUE, sep="\t", row.names=1, as.is=TRUE, check.names=FALSE))
head(exprs[1:5,1:5])
pData <- read.csv("../../Covariates.csv", row.names=1, header=TRUE)
phenoData <- new("AnnotatedDataFrame", data=pData)
dset <- ExpressionSet(assayData=exprs, phenoData=phenoData, annotation="hg19")
dset
ExpressionSet (storageMode: lockedEnvironment)
assayData: 12135 features, 200 samples
  element names: exprs
protocolData: none
phenoData
  sampleNames: 21001 21004 ... 22078 (200 total)
  varLabels: ID idx ... U_MoBT (53 total)
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation: hg19

set location of ARACNe network

adjfile <- "viper/RICHS_ARACNe.txt"
regul <- aracne2regulon(adjfile, dset, format="3col", verbose = TRUE)
Loading the dataset...
Generating the regulon objects...
Error in tapply(1:nrow(tmp), tmp$tf, function(pos, tmp) { :
  arguments must have same length

Looking at the source code, the network text file may not be in the correct format:

adjfile <- "viper/RICHS_ARACNe.txt"
tmp <- t(sapply(strsplit(readLines(adjfile), "\t"), function(x) x[1:3]))
head(tmp)
     [,1]           [,2]           [,3]
[1,] "\"RNF11\""    "\"FAM177A1\"" "0.45960485911494"
[2,] "\"SERPINE2\"" "\"SERPING1\"" "0.512034971750754"
[3,] "\"GPRC5A\""   "\"RRAD\""     "0.612916935813727"

aracne <- data.frame(tf = tmp[, 1], target = tmp[, 2], mi = as.numeric(tmp[, 3])/max(as.numeric(tmp[, 3])))
head(aracne)
          tf     target        mi
1    "RNF11" "FAM177A1" 0.3013416
2 "SERPINE2" "SERPING1" 0.3357176
3   "GPRC5A"     "RRAD" 0.4018612

tmp <- aracne[!is.na(aracne$mi), ]
head(aracne)
          tf     target        mi
1    "RNF11" "FAM177A1" 0.3013416
2 "SERPINE2" "SERPING1" 0.3357176
3   "GPRC5A"     "RRAD" 0.4018612

str(rownames(exprs))
 chr [1:12135] "NOC2L" "KLHL17" "HES4" "ISG15" "AGRN"
str(tmp)
'data.frame': 43095 obs. of 3 variables:
 $ tf    : Factor w/ 179 levels "\"ABHD12\"","\"AFG3L1P\"",..: 133 145 62 44 60 169 75 79 86 172 ...
 $ target: Factor w/ 11522 levels "\"A2LD1\"","\"A2M\"",..: 3372 8671 8373 11374 1817 10933 9535 7678 9794 575 ...
 $ mi    : num 0.301 0.336 0.402 0.177 0.212 ...

tmp <- tmp[rowSums(matrix(as.matrix(tmp[, 1:2]) %in% rownames(exprs), nrow(tmp), 2)) == 2, ]
tmp
[1] tf     target mi
<0 rows> (or 0-length row.names)
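
The printed `tmp` above shows gene symbols wrapped in literal double quotes (for example `"\"RNF11\""`), which cannot equal the unquoted row names of `exprs`, so the `%in%` filter drops every row. The following is only a minimal base-R sketch on made-up toy vectors (`net_ids`, `expr_ids` and `clean_ids` are illustrative names, not part of viper) showing the mismatch and one possible way to strip the quotes:

```r
# Toy stand-ins for the network ID column and the expression row names
net_ids  <- c("\"RNF11\"", "\"SERPINE2\"", "\"GPRC5A\"")
expr_ids <- c("RNF11", "SERPINE2", "GPRC5A", "NOC2L")

# Quoted symbols never match the unquoted row names
n_before <- sum(net_ids %in% expr_ids)

# Strip the literal double quotes, then compare again
clean_ids <- gsub("\"", "", net_ids, fixed = TRUE)
n_after   <- sum(clean_ids %in% expr_ids)

cat("matches before:", n_before, "after:", n_after, "\n")
```

If the network file is written from R, passing `quote = FALSE` to `write.table()` should avoid the quotes in the first place.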

sessionInfo

sessionInfo()

R version 3.5.1 (2018-07-02)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 7 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] grid      parallel  stats     graphics  grDevices utils     datasets  methods
[9] base

other attached packages:
[1] viper_1.16.0        Biobase_2.42.0      Rgraphviz_2.26.0    graph_1.60.0
[5] BiocGenerics_0.28.0 tidyr_0.8.3         dplyr_0.8.0.1       minet_3.40.0

loaded via a namespace (and not attached):
 [1] fansi_0.4.0        splines_3.5.1      R6_2.4.0           assertthat_0.2.0
 [5] utf8_1.1.4         e1071_1.7-0.1      knitr_1.21         survival_2.42-3
 [9] cli_1.0.1          tidyselect_0.2.5   pillar_1.3.1       segmented_1.0-0
[13] compiler_3.5.1     tibble_2.0.1       lattice_0.20-35    pkgconfig_2.0.2
[17] Matrix_1.2-14      purrr_0.3.1        KernSmooth_2.23-15 rstudioapi_0.9.0
[21] MASS_7.3-50        glue_1.3.0         xfun_0.5           stats4_3.5.1
[25] BiocManager_1.30.4 magrittr_1.5       rlang_0.3.1        yaml_2.2.0
[29] tools_3.5.1        mixtools_1.1.0     crayon_1.3.4       class_7.3-15
[33] Rcpp_1.0.0

viper • 4.6k views

ADD COMMENT • link updated 14 months ago by YNPAN910 • 0 • written 5.6 years ago by maya.kappil ▴ 30

A little late, but in case it helps anyone else: I had to do three things to use output from ARACNe-AP with aracne2regulon: 1. drop the p-value column, 2. remove the header, and 3. set format="3col" in the call to aracne2regulon().

ADD REPLY • link 5.1 years ago Keith Hughitt ▴ 180

Hello Keith,

I am currently trying to use my ARACNe-AP generated networks in viper. I did all three steps you mentioned above, but my problem lies in the lengths of the network and the gene expression matrix. When I use the gene expression matrix as is and supply the 3col ARACNe-AP output, I get the following error:

Error in tapply(1:nrow(tmp), as.vector(tmp$tf), function(pos, tmp) { : arguments must have same length

I can subset the gene expression matrix according to the regulators in the network, but I don't think that is very reasonable, since it wouldn't be the same gene expression matrix that I fed into ARACNe-AP. I was wondering whether you came across the same problem and, if so, whether you subset the gene expression matrix.

Thank you in advance!

ADD REPLY • link 3.8 years ago Luna_P • 0

Hi, I'm afraid not, so I don't think I can be much help, unfortunately. In the past, the authors of ARACNe-AP were generally quite responsive and helpful though, so you might consider reaching out to them directly. Best of luck!

ADD REPLY • link 3.8 years ago Keith Hughitt ▴ 180

Thank you!

ADD REPLY • link 3.8 years ago Luna_P • 0

Hi Luna_P, I think I am in a very similar situation. I get the same error message and modified my network file as mentioned above, but I still get the "arguments must have same length" message. Did you come up with a solution by any chance? Many thanks. Best, Mika

ADD REPLY • link 3.7 years ago Mikael ▴ 10

As @Keith Hughitt mentioned, the ARACNe network.txt output file has to be pre-processed before running the aracne2regulon() function:

1. drop the p-value column,
2. remove the header,
3. remove the index, and
4. set format="3col" in the call to aracne2regulon().

Removing the index solved the "arguments must have same length" error.
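
The pre-processing steps above can be sketched in base R roughly as follows. This is an illustration under assumptions, not the poster's actual code: the toy file contents, the column names `Regulator`, `Target`, `MI`, `pvalue` (taken from the question) and the temporary file paths are all stand-ins for a real network file.

```r
# Toy network file: header plus an index column (the row names), as written from R
raw_file <- tempfile(fileext = ".txt")
toy <- data.frame(
  Regulator = c("RNF11", "SERPINE2", "GPRC5A"),
  Target    = c("FAM177A1", "SERPING1", "RRAD"),
  MI        = c(0.4596049, 0.5120350, 0.6129169),
  pvalue    = c(0, 0, 0)
)
write.table(toy, raw_file, sep = "\t", quote = FALSE)

# The header has one field fewer than the data rows, so read.table()
# uses the first (index) column as row names
net <- read.table(raw_file, sep = "\t", header = TRUE, stringsAsFactors = FALSE)

# Keep regulator, target and MI; write without header, index or quotes
clean_file <- tempfile(fileext = ".txt")
write.table(net[, c("Regulator", "Target", "MI")], clean_file,
            sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)

first_line <- readLines(clean_file, n = 1)
print(first_line)
```

The cleaned path would then be what gets passed to `aracne2regulon()` together with `format = "3col"`.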

Hope it helps!

Theo

ADD REPLY • link 3.7 years ago tb2928 • 0

You are right! And maybe we should also check the gene symbols, because when I open the expression matrix in Excel, it changes some gene names into months, like SEP, Mar, etc.
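
One rough way to screen for such converted symbols is a pattern match on date-like strings. The helper below, `looks_like_excel_date()`, is a hypothetical name introduced here for illustration, and the pattern assumes forms such as `2-Sep` or `Sep-02`; what Excel actually produces depends on its locale and settings, so treat this as a sketch only.

```r
# Hypothetical helper: flag IDs that look like Excel-converted dates
looks_like_excel_date <- function(ids) {
  months  <- "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
  pattern <- paste0("^([0-9]{1,2}-(", months, ")|(", months, ")-[0-9]{1,2})$")
  grepl(pattern, ids, ignore.case = TRUE)
}

# Toy row names: two genuine-looking symbols mangled into dates
ids <- c("SEPT2", "2-Sep", "1-Mar", "MARCH1", "RNF11")
flagged <- ids[looks_like_excel_date(ids)]
print(flagged)
```

Running this on `rownames()` of the expression matrix and on the network columns would show whether any IDs need to be restored before matching.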

ADD REPLY • link 2.6 years ago xlal • 0

Hi, do you mind sharing the line of code you used to remove the index from the network.txt file? Thanks!

ADD REPLY • link 2.5 years ago emedinacastaned • 0

You are probably not interested anymore, but it might help somebody else.

If you run the following after getting the network file, it works fine:

awk 'NR>1' network.txt > tmp && mv tmp network.txt

I was removing the last column like some suggested, but that caused the file to switch from tab separated to space separated. I checked the viper code, and indeed it 'strsplits' the file with tab and it doesn't look past the third column, so this step is unnecessary.
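
Since the file is split on tabs, a quick check that every line still has at least three tab-separated fields can catch an accidental switch to spaces. A minimal sketch on two toy files (the helper `n_fields()` and the file contents are illustrative assumptions):

```r
# Toy files: one tab-separated, one accidentally space-separated
tab_file   <- tempfile(fileext = ".txt")
space_file <- tempfile(fileext = ".txt")
writeLines(c("RNF11\tFAM177A1\t0.46\t0", "GPRC5A\tRRAD\t0.61\t0"), tab_file)
writeLines(c("RNF11 FAM177A1 0.46", "GPRC5A RRAD 0.61"), space_file)

# Number of tab-separated fields per line, mirroring a split on "\t"
n_fields <- function(path) {
  lengths(strsplit(readLines(path), "\t", fixed = TRUE))
}

tab_ok   <- all(n_fields(tab_file) >= 3)
space_ok <- all(n_fields(space_file) >= 3)
cat("tab file ok:", tab_ok, " space file ok:", space_ok, "\n")
```

A file that fails this check yields a single field per line when split on tabs, so the target and MI columns come back as NA.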

ADD REPLY • link 23 months ago sofia.storres • 0

I had a similar issue generating the regulon object from the ARACNe network.txt output file and the same expression matrix used to make the network file. The error message I ran into was "Loading the dataset... Error in readLines(afile) : 'con' is not a connection". I solved this by storing the path to the network.txt file as a string and passing that string to aracne2regulon(). I also pre-processed the network.txt file as suggested by @Keith Hughitt and @Theo. A warning message then appeared, "In readLines(afile): incomplete final line found on 'network.txt'", which decreases the number of iterations when generating the regulon object. I therefore saved the pre-processed network.txt file using the write_delim() function from the readr package, with an end-of-line character. For those who still have or come across this issue, here's what I did to fix the problem.

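#Load the packages used below (viper, plus dplyr, tibble and readr for the data handling)

library(viper)
library(dplyr)
library(tibble)
library(readr)

#Load expression matrix used to make network.txt file in ARACNe-AP

count.mt <- read.table(file = "/path/expression_counts.txt", sep = "\t", header = T, quote = "")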

#set gene symbol column as row names

count.mt2 <- count.mt %>% column_to_rownames(var = "gene") %>% as.matrix()

setwd("/path/ARACNe-AP/output/")

#Load network.txt file generated from ARACNe-AP

net <- read.table(file = "network.txt", sep = "\t", header = T, quote = "")

#Remove pvalue column

net2 <- net %>% dplyr::select(! pvalue)

#save with no row names or col names, insert end of line character

write_delim(net2, file = "network2.txt", na = "NA", append = F, col_names = F, delim = "\t", quote = "none", escape = "none", eol = "\n")

#Set connection to pre-processed network file

net3 <- "/path/ARACNe-AP/output/network2.txt"

#Make regulon object

regulons <- aracne2regulon(afile = net3, eset = count.mt2, format = "3col")

ADD REPLY • link 2.5 years ago emedinacastaned • 0

score 0 · Answer 1 · 2021-06-24

reef103 • 0

@reef103-8824

Last seen 3.7 years ago

United States

Hi Luna_P, I'm sorry I didn't see this before. The error might be due to a lack of match between the genes in the network (regulators and targets) and the gene IDs in your expression matrix (rownames). Can you please check whether you are using the same gene IDs in the ARACNe network and the expression matrix? Best, Mariano
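
A check of this kind could look roughly like the sketch below. The helper `check_id_overlap()` is a hypothetical name, not a viper function, and the toy network and matrix are made up; it assumes the first two network columns hold the regulator and target IDs.

```r
# Hypothetical helper: how many network IDs appear in the expression row names?
check_id_overlap <- function(net, expr) {
  ids   <- unique(c(as.character(net[[1]]), as.character(net[[2]])))
  found <- ids %in% rownames(expr)
  list(n_ids = length(ids), n_found = sum(found), missing = ids[!found])
}

# Toy network (regulator, target) and toy expression matrix
net <- data.frame(tf = c("RNF11", "GPRC5A"),
                  target = c("FAM177A1", "RRAD"),
                  stringsAsFactors = FALSE)
expr <- matrix(rnorm(6), nrow = 3,
               dimnames = list(c("RNF11", "FAM177A1", "RRAD"), c("s1", "s2")))

res <- check_id_overlap(net, expr)
str(res)
```

If `n_found` is zero or far below `n_ids`, the IDs probably differ in type or formatting (quotes, whitespace, symbols versus Entrez IDs) rather than in content.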

ADD COMMENT • link 3.7 years ago reef103 • 0

Hi Mariano, thank you for your input. I have checked the gene IDs as you mentioned and they are the same. Would you be able to share a reproducible example (matrix and network) that I could run and compare to my input files, please? Many thanks, Mikael

ADD REPLY • link 3.7 years ago Mikael ▴ 10

Hi Mariano,

I'm in the same boat. I'm getting a bunch of errors trying to use a network provided by aracne.network with Entrez gene ID expression data, and I don't know if it's related to the input file. The gene IDs are the same, but I'm getting the following error:

Error in cor(t(expset[rownames(expset) %in% tf, ]), t(expset[rownames(expset) %in%  : 
  incompatible dimensions

My problem is very similar to those in this question, can you share a reproducible example with me? Preferably from text files as I suspect it's something with my file formatting and indexes getting mixed up.

Thanks! -Adam

ADD REPLY • link 3.2 years ago Adam • 0

score 0 · Answer 2 · 2024-01-05

I was also struggling with the same bug this afternoon.

Loading the dataset...
Generating the regulon objects...
Error in tapply(1:nrow(tmp), as.vector(tmp$tf), function(pos, tmp) { : 
  arguments must have same length

I finally fixed it after realizing that the first column of the gene expression matrix I fed in contained the row names. So I ran these:

mice_bulk_exc_lcpm2 <- mice_bulk_exc_lcpm2 %>% column_to_rownames(var = 'X')
mice_bulk_exc_lcpm2 <- as.matrix(mice_bulk_exc_lcpm2)

That solved the problem. I hope this helps someone who comes across the same issue.
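
To illustrate why the ID column matters, here is a small base-R sketch on a made-up data frame (the column name `X` follows the post; everything else is a toy assumption). Converting a data frame that still contains the ID column gives a character matrix, while moving the IDs to the row names first gives a numeric one:

```r
# Toy data frame as read from a file whose first column ("X") holds gene IDs
df <- data.frame(X = c("RNF11", "RRAD"),
                 s1 = c(1.2, 3.4),
                 s2 = c(5.6, 7.8),
                 stringsAsFactors = FALSE)

# Converting directly yields a character matrix with the IDs as a data column
bad <- as.matrix(df)

# Base-R equivalent of column_to_rownames(): move the IDs to row names first
good <- as.matrix(df[, -1])
rownames(good) <- df$X

cat("bad is numeric:", is.numeric(bad), " good is numeric:", is.numeric(good), "\n")
```

Checking `is.numeric()` and `head(rownames())` on the matrix before calling `aracne2regulon()` is a cheap way to spot this.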