Reading large file (15GB) in R
nabiyogesh ▴ 10
@nabiyogesh-11718
United Kingdom

Hi all,

I am trying to read a large file, but I cannot read the complete file using fread. I can only read a small part of it by limiting the number of rows:

> mydt10 <- fread("df.num.txt", nrows = 5)
> dim(mydt10)
[1]      5 844488
> str(mydt10)
Classes ‘data.table’ and 'data.frame':  5 obs. of  844488 variables:
 $ cg14361672      : num  0.974 0.967 0.961 0.96 0.963
 $ cg12950382      : num  0.718 0.766 0.848 0.723 0.94
 $ cg02115394      : num  0.0337 0.0258 0.025 0.0317 0.0357
 $ cg12480843      : num  0.0182 0.0189 0.0137 0.0167 0.0151
 $ cg26724186      : num  0.98 0.977 0.982 0.982 0.978
 $ cg00617867      : num  0.96 0.979 0.98 0.977 0.977
 $ cg13773083      : num  0.313 0.246 0.253 0.234 0.372
 $ cg17236668      : num  0.974 0.975 0.975 0.979 0.978
 $ cg19607165      : num  0.0866 0.0966 0.0804 0.1162 0.0792
 $ cg08770523      : num  0.0243 0.0213 0.0203 0.0194 0.0197
> table(sapply(mydt10, typeof))

double
844488

I also tried bigreadr, but it got stuck at 0%:

data <- big_read("df.num.txt",select=1:844880, progress=TRUE)
Will read the file in 30 parts.
| | 0%

Any help will be highly appreciated.

Many thanks, nabiyogesh

R RStudio Bioconductor statistics

You don't mention what the issue is with data.table::fread. Do you have enough memory to actually read the file? What error do you get with fread?
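
To make the memory question concrete: a rough lower bound on the in-memory size of an all-numeric table is rows × columns × 8 bytes, since R stores each double in 8 bytes. The sketch below assumes every column is double, as in the 5-row sample, and uses a hypothetical row count (`n_rows = 1000`) because the thread does not say how many rows the file has. `estimate_gb` is an illustrative helper, not part of data.table.

```r
# Rough in-memory size of an all-double table, in GiB.
# estimate_gb is a hypothetical helper for illustration only.
estimate_gb <- function(n_rows, n_cols, bytes_per_value = 8) {
  n_rows * n_cols * bytes_per_value / 1024^3
}

n_cols <- 844488   # column count reported by dim(mydt10)
n_rows <- 1000     # ASSUMPTION: the real row count is not given in the thread

size_gb <- estimate_gb(n_rows, n_cols)
print(round(size_gb, 2))
```

Peak usage while parsing may be higher than the size of the final object, so it is worth comparing the estimate against the memory actually granted to the job rather than the node total.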


Thanks Sean,

I get the error below with fread. I am running this on an HPC system with 500GB of memory.

> library(data.table)
data.table 1.14.2 using 1 threads (see ?getDTthreads).  Latest news: r-datatable.com
> phyloseq <- fread("df.num.txt",header=T,sep='\t',check.names=F,fill=TRUE)

 *** caught segfault ***
address 0x7efdf4a26d44, cause 'memory not mapped'

Traceback:
 1: fread("df.num.txt", header = T, sep = "\t", check.names = F, fill = TRUE)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:

Clearly not a Bioconductor problem, so I'd suggest following up with the data.table folks.
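
For anyone hitting the same segfault, one thing that might be worth trying alongside a data.table report is reading the file in blocks of columns with fread's `select` argument and binding the pieces. This is only a sketch: it uses a small temporary stand-in file rather than df.num.txt, the chunk size is arbitrary, it assumes a rectangular tab-separated file with a header (so `fill=TRUE` is dropped), and there is no guarantee it avoids the crash.

```r
library(data.table)

# Small stand-in for df.num.txt: 5 rows, 12 numeric columns, tab-separated.
tmp <- tempfile(fileext = ".txt")
demo <- as.data.table(matrix(seq_len(5 * 12) / 100, nrow = 5))
fwrite(demo, tmp, sep = "\t")

# nrows = 0 reads only the column names, which gives the column count cheaply.
n_cols <- ncol(fread(tmp, sep = "\t", header = TRUE, nrows = 0))

# ASSUMPTION: chunk_size is arbitrary; tune it to the memory available.
chunk_size <- 5
starts <- seq(1, n_cols, by = chunk_size)

# Read one block of columns per call, selecting columns by position.
chunks <- lapply(starts, function(s) {
  cols <- s:min(s + chunk_size - 1, n_cols)
  fread(tmp, sep = "\t", header = TRUE, select = cols)
})

# Bind the blocks side by side into one wide table.
wide <- do.call(cbind, chunks)
print(dim(wide))
```

Each call rescans the file, so on a 15GB input larger chunks mean fewer passes. If the job has more than one core, `setDTthreads()` can also raise the thread count from the single thread shown in the startup message.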
