Read big file!
@mohamed-lajnef-3515
Dear R-Users,

I would like to read a big file (1,000 lines and 1,200,000 columns) into R, but this seems to be impossible! Does someone have a magic solution to my problem?

Otherwise, I would like to try a function that reads this large file a few columns at a time (not line by line).

Regards,
M
@martin-morgan-1513
On Linux, something like

scan(pipe("cut -f 1000-2000 myfile.txt", open="r"), integer())

can sometimes be an effective way to select some of a very large number of columns. This sounds like SNP data, where personally I would parse the file into a netCDF file for easy future access using the ncdf package.

Martin
--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
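The cut-based approach can be sketched as follows. This is only an illustration under stated assumptions: the file is tab-separated, contains integers only, and has no header or row names; a Unix-like system with `cut` is available; the helper name `read_column_range` and the small demo file are made up for the example and stand in for the real 1,000 x 1,200,000 file.

```r
# Sketch: let `cut` select the columns before R ever sees the data.
# Assumes a Unix-like system and a tab-separated, integer-only file.

# Small stand-in file (3 rows, 6 columns) so the example runs on its own.
demo_file <- tempfile(fileext = ".txt")
demo <- matrix(1:18, nrow = 3, byrow = TRUE)
write.table(demo, demo_file, sep = "\t", row.names = FALSE, col.names = FALSE)

# Hypothetical helper: read columns first..last as an integer matrix.
read_column_range <- function(path, first, last) {
  cmd <- sprintf("cut -f %d-%d %s", first, last, shQuote(path))
  con <- pipe(cmd, open = "r")
  on.exit(close(con))
  values <- scan(con, what = integer(), sep = "\t", quiet = TRUE)
  # scan() returns the values row by row, so fill the matrix by row.
  matrix(values, ncol = last - first + 1, byrow = TRUE)
}

cols_2_to_4 <- read_column_range(demo_file, 2, 4)
```

Calling the helper in a loop over successive column ranges would process the whole file in blocks without ever holding all columns in memory at once.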
@stephen-piccolo-6761
Hi Mohamed,

I have run into this problem when handling large files. My best solution has been to preprocess the data using Python or Java scripts and then do the final processing in R. Without more detail on your problem, though, it is hard to know exactly what to suggest (see the posting guide).

Also, your question would probably be better suited to the R-help mailing list than this one: https://stat.ethz.ch/mailman/listinfo/r-help

-Steve

-----Original Message-----
From: Mohamed Lajnef
Sent: Wednesday, August 12, 2009 8:29 AM
To: bioconductor at stat.math.ethz.ch
Subject: [BioC] Read big file!
@sean-davis-490
This is better posted to the R-help list, but have you tried playing with the colClasses argument of read.table()?

Sean
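For readers unfamiliar with colClasses, here is a minimal sketch of how it can be used to skip columns: setting an entry to "NULL" tells read.table() to drop that column. The demo file, the column count, and the choice of columns to keep are assumptions made for illustration. Note that read.table() still has to parse every field of every line, so this may remain slow for a file with 1,200,000 columns.

```r
# Sketch: skip unwanted columns with colClasses = "NULL".
# Assumes a tab-separated, integer-only file without a header.

# Small stand-in file (3 rows, 6 columns) so the example runs on its own.
demo_file <- tempfile(fileext = ".txt")
demo <- matrix(1:18, nrow = 3, byrow = TRUE)
write.table(demo, demo_file, sep = "\t", row.names = FALSE, col.names = FALSE)

n_cols <- 6          # total number of columns in the file
keep <- c(2, 5)      # columns we actually want (illustrative choice)

# "NULL" (as a string) means "do not read this column".
col_classes <- rep("NULL", n_cols)
col_classes[keep] <- "integer"

subset_df <- read.table(demo_file, sep = "\t", header = FALSE,
                        colClasses = col_classes)
```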
It is also possible to use a buffered reading/filtering approach. Look carefully at scan(), readLines(), and friends.

--
Vincent Carey, PhD
Biostatistics, Channing Lab
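One way to read "buffered reading/filtering" is: open a connection, pull in a few lines at a time with readLines(), and keep only the wanted fields from each line before moving on. The sketch below takes that reading; the helper name `read_columns_buffered`, the chunk size, and the demo file are assumptions for illustration, and the file is assumed to be tab-separated integers.

```r
# Sketch: read a few lines at a time and keep only selected fields.
# Assumes a tab-separated, integer-only file without a header.

# Small stand-in file (3 rows, 6 columns) so the example runs on its own.
demo_file <- tempfile(fileext = ".txt")
demo <- matrix(1:18, nrow = 3, byrow = TRUE)
write.table(demo, demo_file, sep = "\t", row.names = FALSE, col.names = FALSE)

# Hypothetical helper: returns an integer matrix with one column per
# entry of `keep`, reading `chunk_lines` lines per iteration.
read_columns_buffered <- function(path, keep, chunk_lines = 2) {
  con <- file(path, open = "r")
  on.exit(close(con))
  pieces <- list()
  repeat {
    lines <- readLines(con, n = chunk_lines)
    if (length(lines) == 0) break          # end of file reached
    fields <- strsplit(lines, "\t", fixed = TRUE)
    # Keep only the wanted fields of each line; discard the rest.
    kept <- lapply(fields, function(f) as.integer(f[keep]))
    pieces[[length(pieces) + 1]] <-
      matrix(unlist(kept), ncol = length(keep), byrow = TRUE)
  }
  do.call(rbind, pieces)
}

kept_cols <- read_columns_buffered(demo_file, keep = c(1, 6))
```

Only one chunk of full-width lines is held in memory at a time; the accumulated result contains just the selected columns.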
Dear All,

I think it would be better to use SAS to process the large file.

Sean: I tried the colClasses option, but the computer still failed to read the file.

Thank you for your help,
M
With a file of 1,200,000 columns, read.table() will likely take forever (no matter which options you give it), and so will readLines(). The same file transposed (1,000 columns by 1,200,000 rows) would not be such a problem to read in.

Using scan() (and then adding a dimension to the resulting vector to make it a matrix again), or the system call using cut given earlier in this thread (looping across columns), could be the simplest way.

L.
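The scan()-then-add-a-dimension idea can be sketched like this, assuming the transposed file already exists, is tab-separated, and holds integers only; the demo data and variable names are illustrative. Because scan() returns values line by line while R fills matrices column by column, setting `dim` to (fields per line, number of lines) yields a matrix whose columns are the lines of the file, which here restores the original orientation.

```r
# Sketch: read a transposed file with scan() and restore matrix shape.
# Assumes a tab-separated, integer-only file without a header.

# Stand-in for the original layout: 3 rows, 6 columns.
original <- matrix(1:18, nrow = 3, byrow = TRUE)

# Write the transposed version (6 lines of 3 fields), as suggested above.
transposed_file <- tempfile(fileext = ".txt")
write.table(t(original), transposed_file, sep = "\t",
            row.names = FALSE, col.names = FALSE)

# scan() returns one long vector, in the order the values appear in the file.
values <- scan(transposed_file, what = integer(), sep = "\t", quiet = TRUE)

# Each line of the transposed file becomes one column of the matrix.
n_fields <- 3                       # fields per line in the transposed file
dim(values) <- c(n_fields, length(values) / n_fields)
```

After this, `values` has the same shape and contents as `original`, without an explicit call to t() on the large object.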
@carlos-j-gil-bellosta-1569
Hello,

I would recommend the "colbycol" package on CRAN, which reads files column by column and lets you select which columns to load into memory. It may suit you better than the regular read functions.

As a disclaimer: I am the author of the "colbycol" package, which was motivated by problems similar to this one. I would love to hear how it fares and to get feedback from users so that I can improve it.

Best regards,
Carlos J. Gil Bellosta
http://www.datanalytics.com