Hello. I have asked this question on Stack Overflow, but the problem is still unsolved. I'm trying to run some R code, and it crashes with a long-vector error. I'm running R 3.5.1 on Ubuntu 18.04.1 LTS and get the following error:
"Error in for (n in 1:k) { : long vectors not supported yet: eval.c:6393"
The input FASTA file is around 1 GB. The error appears as soon as the for (n in 1:k) loop is reached. I tried making the input file smaller, but that did not help, so I guess the problem may be related to the packages I use rather than to the input size. The code that causes the trouble is the following:
library(biomaRt)  # version 2.36.1
library(biomartr) # version 0.8.0
library(R.utils)  # version 2.7.0 (provides intToBin)
library(seqinr)   # version 3.4.5 (provides read.fasta)
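# first FASTA record as one upper-case string; by default read.fasta() returns a
# list of lower-case character vectors, which the gsub() calls below cannot encode
genmt <- toupper(as.character(read.fasta("genymt.fa", as.string = TRUE)[[1]]))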
gensize1 <- 16900  # sequence length in bases (hard-coded)
subsize1 <- 22*2   # window length in bits: 22 bases x 2 bits per base
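# binary string -> decimal, e.g. "101" -> 5
BinToDec <- function(x)
  sum(2^(which(rev(unlist(strsplit(x, "")) == 1)) - 1))

# decimal -> binary string, left-padded with zeros to subsize1 bits
DecToBin <- function(x) {
  b <- intToBin(x)
  while (nchar(b) < subsize1)
    b <- paste("0", b, sep = "")
  b
}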
# encode each base as 2 bits: A = 00, T = 01, C = 10, G = 11
bin1 <- gsub('A', '00', genmt)
bin1 <- gsub('T', '01', bin1)
bin1 <- gsub('C', '10', bin1)
bin1 <- gsub('G', '11', bin1)
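# slide a 22-base (44-bit) window along the sequence one base (2 bits) at a time,
# recording each distinct window value (exists1) and how often it occurs (rep1)
for (i in seq(1, (gensize1*2) - subsize1 + 1, by = 2)) {
  print(i)
  beg1 <- i
  end1 <- i + (subsize1 - 1)
  sub1 <- substr(bin1, beg1, end1)
  dec1 <- BinToDec(sub1)
  if (i == 1) {
    exists1 <- dec1
    rep1 <- 1
  } else {
    flag1 <- any(exists1 == dec1)
    if (flag1) {
      ind1 <- which(exists1 == dec1)
      rep1[ind1] <- rep1[ind1] + 1
    } else {
      exists1 <- c(exists1, dec1)
      rep1 <- c(rep1, 1)
    }
  }
}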
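# find the smallest code in 1..k that never occurs in the sequence;
# k = 2^44, and this is the loop that fails with the long-vector error
dec_res <- -1
k <- 2^subsize1
for (n in 1:k) {
  print(n)
  flag1 <- any(exists1 == n)
  if (!flag1) {
    dec_res <- n
    break
  }
}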
# convert the result back to bases, two bits at a time
bin_res <- DecToBin(dec_res)
gen_res <- character(subsize1/2)
ind <- 0
for (i in seq(1, subsize1, 2)) {
  ind <- ind + 1
  gen_res[ind] <- switch(substr(bin_res, i, i + 1),
                         "00" = "A", "01" = "T", "10" = "C", "G")
}
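For reference, a minimal sketch of a bounded search, assuming the only purpose of the for (n in 1:k) loop is to find the smallest positive code that never occurs in exists1. With k = 2^44 (about 1.76e13), 1:k has far more elements than .Machine$integer.max (2^31 - 1), which is what the error message appears to be complaining about. However, if exists1 holds m distinct values, at least one integer in 1..(m + 1) must be missing (pigeonhole principle), so checking that short range is enough. The function name first_missing_code is hypothetical, not from any package.

```r
# Hypothetical helper (not from any package): smallest code in 1..k that does
# not occur in `observed`, or -1 if every code in 1..k occurs.
first_missing_code <- function(observed, k) {
  observed <- unique(observed)
  # Pigeonhole: m distinct values cannot cover all of 1..(m + 1),
  # so the answer (if any) lies in this short range.
  bound <- min(length(observed) + 1, k)
  candidates <- seq_len(bound)
  missing <- candidates[!(candidates %in% observed)]
  if (length(missing) == 0) -1 else missing[1]
}

k <- 2^44
print(k > .Machine$integer.max)              # TRUE: 1:k would be a long vector
print(first_missing_code(c(1, 2, 3, 5), k))  # 4
print(first_missing_code(c(0, 2, 7), k))     # 1
```

In the script above, the whole for (n in 1:k) loop could then be replaced by dec_res <- first_missing_code(exists1, k), which keeps the -1 fallback of the original.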
As is clear, k is large, but its size follows from the length of the desired output: k = 2^44, one value for each possible 22-base sequence. Is there a technical point I have missed, or is there an alternative solution?
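One possible alternative, sketched under assumptions: assuming the intent is to scan whole 22-base windows and report the smallest non-zero 22-mer code that is absent (the for (n in 1:k) loop starts at 1, so code 0, i.e. all A, is skipped), the binary strings can be avoided entirely. Each base becomes a base-4 digit in the same order as the 2-bit mapping above (A = 0, T = 1, C = 2, G = 3), all window codes are computed with a few vector operations, and only codes up to the number of distinct windows + 1 are searched, so k is never materialised. The names kmer_codes, code_to_kmer and smallest_absent_kmer are hypothetical; dropping windows that contain letters other than A/C/G/T (e.g. N) is also an assumption.

```r
# Hypothetical helpers (not from any package). Bases are base-4 digits in the
# same order as the 2-bit mapping: A=0 (00), T=1 (01), C=2 (10), G=3 (11).
bases <- c("A", "T", "C", "G")

# Distinct codes of all kmer_len-base windows; 22-mer codes stay below
# 4^22 < 2^53, so double arithmetic is exact.
kmer_codes <- function(seq_string, kmer_len) {
  digits <- match(strsplit(toupper(seq_string), "")[[1]], bases) - 1
  n_kmers <- length(digits) - kmer_len + 1
  if (n_kmers < 1) return(numeric(0))
  codes <- numeric(n_kmers)
  # Horner's rule, vectorised over every start position at once
  for (j in seq_len(kmer_len)) {
    codes <- codes * 4 + digits[j:(j + n_kmers - 1)]
  }
  unique(codes[!is.na(codes)])  # windows containing N etc. are NA; drop them
}

# Inverse of the encoding: code -> kmer_len-base string
code_to_kmer <- function(code, kmer_len) {
  out <- character(kmer_len)
  for (j in kmer_len:1) {
    out[j] <- bases[code %% 4 + 1]
    code <- code %/% 4
  }
  paste(out, collapse = "")
}

# Smallest absent k-mer, skipping code 0 like the loop in the question;
# NA if every code from 1 to 4^kmer_len - 1 occurs.
smallest_absent_kmer <- function(seq_string, kmer_len) {
  observed <- kmer_codes(seq_string, kmer_len)
  bound <- min(length(observed) + 1, 4^kmer_len - 1)  # pigeonhole bound
  candidates <- seq_len(bound)
  missing <- candidates[!(candidates %in% observed)]
  if (length(missing) == 0) NA_character_ else code_to_kmer(missing[1], kmer_len)
}

print(smallest_absent_kmer("ACGT", 2))             # "AT"
print(code_to_kmer(kmer_codes("GATTACA", 7), 7))   # "GATTACA"
```

For the file in the question, seq_string could presumably be read.fasta("genymt.fa", as.string = TRUE)[[1]] from seqinr (first record only), followed by smallest_absent_kmer(seq_string, 22). For a very large input, strsplit() on one huge string is memory-hungry, so treat this as a sketch rather than a tuned solution.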
My goal is to find reads in FASTQ files that correspond to a particular transcript from a FASTA file.