Dear R-Users,
I would like to read a big file (1,000 lines and 1,200,000 columns) with R, but this seems impossible! Does someone have a magic solution to my problem?
Otherwise, is there a function to read this large file a few columns at a time (not line by line)?
Regards
M
Mohamed Lajnef wrote:
> Dear R-Users
>
> i would like to read a big file (1000 lines and 1200000 columns) with R?
> but this is impossible ! Does someone have a magic solution to my problem?
>
> Otherwise I try a function to read by few columns this large file (no
> by lines!)?
On Linux, something like

scan(pipe("cut -f 1000-2000 myfile.txt", open="r"), integer())

can sometimes be an effective way to select some of a very large number of columns. This sounds like SNP data, which I would personally parse into a netCDF file for easy future access using the ncdf package.
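A minimal sketch of the cut/scan idea, using a small throwaway file in place of the real one (the file name and column range are illustrative only):

```r
# Build a small tab-delimited demo file, then pull out a range of
# columns through a shell pipe with 'cut' (tab is cut's default delimiter).
f <- tempfile(fileext = ".txt")
m <- matrix(1:20, nrow = 4, ncol = 5)          # 4 lines, 5 columns
write.table(m, f, sep = "\t", row.names = FALSE, col.names = FALSE)

# 'cut -f 2-3' keeps only columns 2 and 3; scan() reads the piped output
con <- pipe(paste("cut -f 2-3", f), open = "r")
x <- scan(con, integer())
close(con)

# Reshape the flat vector back into a 4 x 2 matrix
# (byrow = TRUE because scan() returns the values row by row)
sub <- matrix(x, nrow = 4, byrow = TRUE)
print(sub)
```

Looping over successive column ranges this way keeps only one slice in memory at a time.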
Martin
>
> Regards
> M
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
Hi Mohamed,
I have experienced this problem when trying to handle large files. My best solution has been to preprocess the data with Python or Java scripts and then do the final processing in R. But without more detail about your problem, it is hard to know exactly what to suggest (see the posting guide).
Also, your question would probably be better suited to the R mailing list than this one: https://stat.ethz.ch/mailman/listinfo/r-help
-Steve
-----Original Message-----
From: bioconductor-bounces@stat.math.ethz.ch [mailto:bioconductor-
bounces@stat.math.ethz.ch] On Behalf Of Mohamed Lajnef
Sent: Wednesday, August 12, 2009 8:29 AM
To: bioconductor at stat.math.ethz.ch
Subject: [BioC] Read big file!
Dear R-Users,
I would like to read a big file (1,000 lines and 1,200,000 columns) with R, but this seems impossible! Does someone have a magic solution to my problem?
Otherwise, is there a function to read this large file a few columns at a time (not line by line)?
Regards
M
On Wed, Aug 12, 2009 at 10:28 AM, Mohamed Lajnef <mohamed.lajnef at inserm.fr> wrote:
> Dear R-Users
>
> i would like to read a big file (1000 lines and 1200000 columns) with R? but
> this is impossible ! Does someone have a magic solution to my problem?
>
> Otherwise I try a function to read by few columns this large file (no by
> lines!)?
This is better posted to the R-help list, but have you tried playing
with colClasses from read.table()?
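For reference, a minimal sketch of the colClasses trick: marking a column's class as "NULL" makes read.table() drop it while parsing, so skipped columns never take up memory (the demo file and column types are illustrative):

```r
# Tiny 3-column demo file: integer, character, numeric
f <- tempfile(fileext = ".txt")
writeLines(c("a\tb\tc", "1\tx\t2.5", "3\ty\t4.5"), f)

# Keep columns 1 and 3; drop the character column in the middle
df <- read.table(f, header = TRUE, sep = "\t",
                 colClasses = c("integer", "NULL", "numeric"))
print(df)   # two columns: a (integer) and c (numeric)
```

Declaring explicit classes for the kept columns also spares read.table() the cost of guessing types column by column.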
Sean
It is also possible to use a buffered reading/filtering approach. Look carefully at scan(), readLines(), and friends.
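A minimal sketch of such a buffered approach, assuming a toy numeric file and a chunk size of 4 lines (both illustrative):

```r
# Process a file in fixed-size chunks instead of loading it all at once
f <- tempfile(fileext = ".txt")
writeLines(as.character(1:10), f)

con <- file(f, open = "r")
total <- 0L
repeat {
  chunk <- readLines(con, n = 4)      # read up to 4 lines per pass
  if (length(chunk) == 0) break       # end of file reached
  total <- total + sum(as.integer(chunk))   # filter/aggregate each chunk here
}
close(con)
print(total)   # 55, the sum of 1..10
```

Because the connection stays open between calls, each readLines() picks up where the previous one stopped, so memory use is bounded by the chunk size.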
On Wed, Aug 12, 2009 at 11:14 AM, Sean Davis <seandavi at gmail.com> wrote:
> On Wed, Aug 12, 2009 at 10:28 AM, Mohamed
> Lajnef <mohamed.lajnef at inserm.fr> wrote:
>> Dear R-Users
>>
>> i would like to read a big file (1000 lines and 1200000 columns) with R? but
>> this is impossible ! Does someone have a magic solution to my problem?
>>
>> Otherwise I try a function to read by few columns this large file (no by
>> lines!)?
>
> This is better posted to the R-help list, but have you tried playing
> with colClasses from read.table()?
>
> Sean
>
--
Vincent Carey, PhD
Biostatistics, Channing Lab
617 525 2265
Dear All,
I think it would be better to use SAS to treat such a large file.
Sean: I tried the colClasses option, but the computer failed to complete the read.
Thank you for your help
M
Vincent Carey wrote:
> it is also possible to use a buffered reading/filtering approach.
> look carefully at scan(), readLines()
> and friends.
>
> On Wed, Aug 12, 2009 at 11:14 AM, Sean Davis <seandavi at gmail.com> wrote:
>
>> On Wed, Aug 12, 2009 at 10:28 AM, Mohamed
>> Lajnef <mohamed.lajnef at inserm.fr> wrote:
>>
>>> Dear R-Users
>>>
>>> i would like to read a big file (1000 lines and 1200000 columns) with R? but
>>> this is impossible ! Does someone have a magic solution to my problem?
>>>
>>> Otherwise I try a function to read by few columns this large file (no by
>>> lines!)?
>>>
>> This is better posted to the R-help list, but have you tried playing
>> with colClasses from read.table()?
>>
>> Sean
>>
With a file of 1,200,000 columns, read.table() will likely take forever (no matter which options you give it), and so will readLines(). The same file transposed (1,000 columns / 1,200,000 rows) would not be such a problem to read in.
Using scan() (and then adding a dim attribute to your vector to make it a matrix again), or the system call using cut given earlier in this thread (looping across columns), could be the simplest way.
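A minimal sketch of the scan()-then-reshape route, with a small 4 x 5 demo file standing in for the real 1,000 x 1,200,000 one:

```r
# Write a 4 x 5 numeric demo table, then read it back with scan()
f <- tempfile(fileext = ".txt")
m <- matrix(as.numeric(1:20), nrow = 4)   # 4 x 5 demo matrix
write.table(m, f, sep = "\t", row.names = FALSE, col.names = FALSE)

x <- scan(f)                              # one flat numeric vector, read row by row
mat <- matrix(x, nrow = 4, byrow = TRUE)  # byrow = TRUE because scan() returns row-major data
print(all(mat == m))                      # the round trip preserves the matrix
```

Note that a plain `dim(x) <- c(4, 5)` would fill column-major and scramble the values; byrow = TRUE (or reading the transposed file) keeps the layout right.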
L.
> Dear All,
> I think it would be better to use SAS to trait the large file.
> Sean: i tried colClasse option, but the computer fail to compile
> Thank you for your help
> M
>
>
> Vincent Carey wrote:
> > it is also possible to use a buffered reading/filtering approach.
> > look carefully at scan(), readLines()
> > and friends.
> >
> > On Wed, Aug 12, 2009 at 11:14 AM, Sean Davis <seandavi at gmail.com> wrote:
> >
> >> On Wed, Aug 12, 2009 at 10:28 AM, Mohamed
> >> Lajnef <mohamed.lajnef at inserm.fr> wrote:
> >>
> >>> Dear R-Users
> >>>
> >>> i would like to read a big file (1000 lines and 1200000 columns) with R? but
> >>> this is impossible ! Does someone have a magic solution to my problem?
> >>>
> >>> Otherwise I try a function to read by few columns this large file (no by
> >>> lines!)?
> >>>
> >> This is better posted to the R-help list, but have you tried playing
> >> with colClasses from read.table()?
> >>
> >> Sean
> >>
2009/8/12 Mohamed Lajnef <mohamed.lajnef at inserm.fr>:
> Dear R-Users
>
> i would like to read a big file (1000 lines and 1200000 columns) with R? but
> this is impossible ! Does someone have a magic solution to my problem?
>
> Otherwise I try a function to read by few columns this large file (no by
> lines!)?
Hello,
I would recommend the package "colbycol" on CRAN, which reads files column by column and lets you select which ones to load into memory.
It may be better suited to your needs than the regular read functions.
As a disclaimer: I am the author of the "colbycol" package, which was motivated by problems similar to this one. I would love to hear how it fares and to get feedback from users so that I can improve it.
Best regards,
Carlos J. Gil Bellosta
http://www.datanalytics.com