How to fix 'long vectors not supported yet' error in R
r.tor • 0
@rtor-19887

Hello. I have asked this question on Stack Overflow, but the problem is still unsolved. I'm trying to run some R code, and it crashes with a long-vector error. I'm running R 3.5.1 on Ubuntu 18.04.1 LTS and get the following error:

"Error in for (n in 1:k) { : long vectors not supported yet: eval.c:6393"

The input FASTA file is around 1 GB. The error appears as soon as the for loop over 1:k starts. I tried making the input file smaller, but that did not help, so I guess the problem is related to the packages used rather than to the file size. The code that causes the trouble is the following:

library(biomaRt) #version 2.36.1
library(biomartr)#version 0.8.0
library(R.utils) #version 2.7.0
library(seqinr)  #version 3.4.5

genmt <- read.fasta("genymt.fa")

gensize1 <- 16900
subsize1 <- 22*2

BinToDec <- function(x) {
  sum(2^(which(rev(unlist(strsplit(x, "")) == 1)) - 1))
}

DecToBin <- function(x) {
  b <- intToBin(x)
  while (nchar(b) < subsize1) {
    b <- paste("0", b, sep = "")
  }
  b
}

bin1 <- gsub('A','00',genmt)
bin1 <- gsub('T','01',bin1)
bin1 <- gsub('C','10',bin1)
bin1 <- gsub('G','11',bin1)

for (i in 1:((gensize1*2)-subsize1)) {

  print(i)
  beg1 <- i
  end1 <- i+(subsize1-1)

  sub1 <- substr(bin1, beg1, end1)

  dec1 <- BinToDec(sub1)

  if (i == 1) {
    exists1 <- dec1
    rep1 <- 1
  } else {
    flag1 <- any(exists1 == dec1)

    if (flag1) {
      ind1 <- which(exists1 == dec1)
      rep1[ind1] <- rep1[ind1]+1
    } else {
      exists1 <- c(exists1,dec1)
      rep1 <- c(rep1,1)
    }
  }
}

dec_res <- -1
k <- 2^subsize1
for (n in 1:k) {
  print(n)
  flag1 <- any(exists1 == n)

  if (!flag1) {
    dec_res <- n
    break
  }
}

bin_res <- DecToBin(dec_res)

gen_res <- matrix(, nrow = 0, ncol = subsize1/2)
ind <- 0
for (i in seq(1, subsize1, 2)) {
  ind <- ind + 1
  ifelse(substr(bin_res,i,i+1) == "00", gen_res[ind] <- "A",
         ifelse(substr(bin_res,i,i+1) == "01", gen_res[ind] <- "T",
                ifelse(substr(bin_res,i,i+1) == "10", gen_res[ind] <- "C", gen_res[ind] <- "G")))
}

Clearly k is large, but its size follows from the length of the desired output (k is 2^subsize1). Is there a technical point that I have missed, or is there an alternative solution?

Mike Smith ★ 6.5k
@mike-smith
EMBL Heidelberg

You are seeing this error because the for loop iterates over every entry of a vector, and 1:k has more than 2^31-1 entries. You can verify that this is where the message comes from with the following example:

for(i in 1:(2^44)) { 
    print(i)
}
Error in for (i in 1:(2^44)) { : 
  long vectors not supported yet: eval.c:6387
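
If it helps, here is a small sketch of how one might check the numbers behind that limit. The variable names (int_max, exceeds_limit, ratio) are just illustrative and not part of the original code; k matches the 2^44 from the question.

```r
# Largest value an R integer can hold; vectors longer than this are "long vectors"
int_max <- .Machine$integer.max   # equals 2^31 - 1

# Same size as in the question (subsize1 = 44)
k <- 2^44

# TRUE: 1:k has more elements than a regular (non-long) vector allows
exceeds_limit <- k > int_max

# How many times larger k is than 2^31
ratio <- k / 2^31

print(int_max)
print(exceeds_limit)
print(ratio)
```

So the sequence in the question is several thousand times longer than what the for loop accepts here.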

You can get around this limitation by using a while loop, e.g.

i <- 1
while(i <= 2^44) { 
    print(i)
    i <- i+1 
}
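
As a rough illustration of why the while-loop version is not affected: the counter is a single double rather than an element of a long vector, and doubles hold every whole number exactly well beyond 2^44. The names below (counter_type, still_exact, overflowed) are made up for this sketch.

```r
# A numeric literal without the "L" suffix is stored as a double
i <- 1
counter_type <- typeof(i)

# Doubles represent whole numbers exactly up to 2^53, so adding 1
# near 2^44 still moves the counter forward by exactly 1
still_exact <- (2^44 + 1) - 2^44 == 1

# An integer counter, by contrast, overflows to NA (with a warning)
# once it passes 2^31 - 1
overflowed <- suppressWarnings(.Machine$integer.max + 1L)

print(counter_type)
print(still_exact)
print(overflowed)
```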

However, I'd expect this to take a really long time (> 20 years) to loop through and print every value, so it's probably best to find another strategy for whatever it is you're trying to do.
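
One possible strategy, assuming the goal of the second loop is only to find the smallest positive integer that does not occur in exists1: if exists1 holds m distinct values, at least one of 1..(m+1) must be missing, so there is no need to scan all the way to 2^44. The function first_missing and the toy vector exists1_demo below are hypothetical and not from the original post; treat this as a sketch rather than a drop-in replacement.

```r
# Hypothetical helper: smallest positive integer that does not occur in `seen`
first_missing <- function(seen) {
  # With m distinct values, at least one of 1..(m + 1) is absent,
  # so only that short range needs to be checked
  candidates <- seq_len(length(unique(seen)) + 1)
  min(setdiff(candidates, seen))
}

# Toy data standing in for exists1 from the question
exists1_demo <- c(3, 1, 2, 7, 1e12)
dec_res_demo <- first_missing(exists1_demo)
print(dec_res_demo)
```

With the roughly 34,000 windows produced by the first loop in the question, this checks a few tens of thousands of candidates at most instead of 2^44.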
