How to fix 'long vectors not supported yet' error in R
r.tor • 0

Hello. I have asked this question on Stack Overflow, but the problem is still unsolved. I'm trying to run some R code and it crashes with a long vector error. I'm running R 3.5.1 on Ubuntu 18.04.1 LTS and get the following error:

"Error in for (n in 1:k) { : long vectors not supported yet: eval.c:6393"

The input FASTA file is around 1 GB. The error appears right after running the for loop. I tried making the input file smaller, but the error persisted, so I suspect it is related to the packages I'm using rather than the input size. The code that causes the problem is the following:

library(biomaRt) #version 2.36.1
library(biomartr)#version 0.8.0
library(R.utils) #version 2.7.0
library(seqinr)  #version 3.4.5

genmt <- read.fasta("genymt.fa")

gensize1 <- 16900
subsize1 <- 22*2

BinToDec <- function(x)
  sum(2^(which(rev(unlist(strsplit(x, "")) == 1))-1))

DecToBin <- function(x)
{
  b <- intToBin(x)
  while(nchar(b) < subsize1)
    b <- paste("0",b,sep = "")
  b
}

bin1 <- gsub('A','00',genmt)
bin1 <- gsub('T','01',bin1)
bin1 <- gsub('C','10',bin1)
bin1 <- gsub('G','11',bin1)

for (i in 1:((gensize1*2)-subsize1)) {

  print(i)
  beg1 <- i
  end1 <- i+(subsize1-1)

  sub1 <- substr(bin1, beg1, end1)

  dec1 <- BinToDec(sub1)

  if (i == 1) {
    exists1 <- dec1
    rep1 <- 1
  } else {
    flag1 <- any(exists1 == dec1)

    if (flag1) {
      ind1 <- which(exists1 == dec1)
      rep1[ind1] <- rep1[ind1]+1
    } else {
      exists1 <- c(exists1,dec1)
      rep1 <- c(rep1,1)
    }
  }
}

dec_res <- -1
k <- 2^subsize1
for (n in 1:k) {
  print(n)
  flag1 <- any(exists1 == n)

  if (!flag1) {
    dec_res <- n
    break
  }
}

bin_res <- DecToBin(dec_res)

gen_res <- matrix(,nrow = 0,ncol = subsize1/2)
ind <- 0
for(i in seq(1,subsize1,2)) {
  ind <- ind + 1
  ifelse(substr(bin_res,i,i+1) == "00",gen_res[ind] <- "A",
         ifelse(substr(bin_res,i,i+1) == "01",gen_res[ind] <- "T",
                ifelse(substr(bin_res,i,i+1) == "10",gen_res[ind] <-"C",gen_res[ind] <- "G")))
}

Clearly "k" is large, but its size is dictated by the length of the proposed output. Is there a technical point that I have missed, or is there an alternative solution?

Mike Smith ★ 6.6k
EMBL Heidelberg

You are seeing this error because the for-loop iterates over every entry in a vector that is longer than 2^31-1 (here 1:k, with k = 2^44). You can verify that's where the message comes from with the following example:

for(i in 1:(2^44)) { 
    print(i)
}
Error in for (i in 1:(2^44)) { : 
  long vectors not supported yet: eval.c:6387
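
As a quick check of the numbers involved, the following sketch compares the loop bound from the question with the largest value of R's integer type. The variable names max_len and k are only illustrative.

```r
# Largest value of R's integer type, which is 2^31 - 1.
max_len <- .Machine$integer.max

# Same value as 2^subsize1 in the question (subsize1 is 22 * 2 = 44).
k <- 2^(22 * 2)

print(max_len == 2^31 - 1)  # the limit mentioned above
print(k > max_len)          # the sequence 1:k is longer than that limit
```

Both comparisons are expected to be TRUE, which is why 1:k counts as a "long vector".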

You can get round this limitation by using a while-loop e.g.

i <- 1
while(i <= 2^44) { 
    print(i)
    i <- i+1 
}

However, 2^44 is roughly 1.76 x 10^13 iterations, so I expect it would take a really long time (> 20 years) to loop through and print every value. It's probably best to find another strategy for whatever it is you're doing.
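
One possible alternative strategy, sketched here under assumptions: the loop in the question only needs the smallest value in 1:k that is absent from exists1. Among m distinct observed values, at least one of the candidates 1 to m + 1 must be absent, so it should be enough to test those candidates instead of all 2^44 values. The helper name first_missing and the toy values are hypothetical and not part of the original code.

```r
# Hypothetical helper (not from the original post): return the smallest
# integer in 1..upper that does not occur in `seen`, or -1 if there is none.
first_missing <- function(seen, upper) {
  # With m distinct observed values, one of 1..(m + 1) must be absent,
  # so there is no need to look beyond m + 1 (or beyond `upper`).
  limit <- min(length(unique(seen)) + 1, upper)
  candidates <- seq_len(limit)
  missing <- candidates[!(candidates %in% seen)]
  if (length(missing) == 0) -1 else missing[1]
}

# Toy values standing in for `exists1` from the question.
exists1 <- c(3, 1, 2, 7, 5)
dec_res <- first_missing(exists1, upper = 2^44)
print(dec_res)
```

Because the candidate vector is never longer than length(exists1) + 1, this avoids building a sequence of length 2^44 and needs no explicit loop.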

Amruta • 0
Germany

Mike Smith, I am trying to run this code on my FASTQ files:

matrixOfQualities <- as(quals,"matrix")
rowSums(matrixOfQualities)[1]

In my case [1] is 3725. Please see slide 29 at the link below:

https://rockefelleruniversity.github.io/Bioconductor_Introduction/presentations/slides/FastQInBioconductor.html#29

But I get an error:

Error in asMethod(object) : long vectors not supported yet: memory.c:3888

Can you please help me troubleshoot?

My goal is to find the reads in my FASTQ files that correspond to a particular transcript in a FASTA file.
