Hello,
I need to purchase a desktop computer for the analysis of
high-throughput genomic data using R/Bioconductor. Which
specifications should I look for (for example RAM, processor type,
number of cores)? Can you mention specific examples of
state-of-the-art machines suited to this purpose?
Thank you very much.
Alberto Capurro
Marie Curie Research Fellow
Department of Cell Physiology and Pharmacology
College of Medicine, Biological Sciences and Psychology
Maurice Shock Medical Sciences Building Room 319
University of Leicester
Leicester LE1 9HN
United Kingdom
https://sites.google.com/site/albertocapurro/
Hi,
On Thu, Dec 27, 2012 at 11:14 AM, Capurro, Alberto (Dr.)
<ac331 at leicester.ac.uk> wrote:
> Hello,
>
> I need to purchase a desktop computer for the analysis of
> high-throughput genomic data using R/Bioconductor. Which
> specifications should I look for (for example RAM, processor type,
> number of cores)? Can you mention specific examples of
> state-of-the-art machines suited to this purpose?
This is a hard question to answer.
Is the machine for yourself, or a lab, more users?
What types of analysis will you be doing? Will you be working with raw
NGS data -- if so, just sequence alignment, assembly, etc? Do you have
a compute cluster to push those types of jobs over to?
Anyway -- absent any of that information, for us mere mortals who shop
at "pro-sumer" levels, I wouldn't complain too much if I had, say, a
12 core, 32-64gb ram osx or linux machine sitting underneath my desk.
If you're working w/ NGS data, no matter how much HD space you get,
you will always run out, so you will constantly have to be adding
upgrades there, but start with several terabytes ... also ... you'll
have to think about how you plan to back up your data, but I'll leave
that as another topic.
But, as I said, the more (of everything) the merrier.
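
A minimal sketch of how one might check a candidate machine against the ballpark figures mentioned above (12 cores, 32 GB RAM, several terabytes of disk). The function names and the 2 TB disk floor are assumptions made for illustration, not part of the advice itself; the RAM query relies on os.sysconf, which is not available on Windows.

```python
import os
import shutil

# Ballpark targets from the reply above. The disk floor is an assumption:
# "several terabytes" is not a precise number.
TARGET_CORES = 12
TARGET_RAM_GB = 32
TARGET_DISK_TB = 2


def total_ram_gb():
    """Return installed RAM in GiB, or None where it cannot be queried."""
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        return None  # e.g. Windows, which has no os.sysconf
    return pages * page_size / 1024**3


def machine_report(path="."):
    """Collect core count, RAM, and capacity of the volume holding `path`."""
    return {
        "cores": os.cpu_count(),  # may be None if undetermined
        "ram_gb": total_ram_gb(),
        "disk_tb": shutil.disk_usage(path).total / 1024**4,
    }


def meets_targets(report):
    """Map each resource to True/False, or None when the value is unknown."""
    targets = {
        "cores": TARGET_CORES,
        "ram_gb": TARGET_RAM_GB,
        "disk_tb": TARGET_DISK_TB,
    }
    return {
        key: None if report[key] is None else report[key] >= wanted
        for key, wanted in targets.items()
    }


if __name__ == "__main__":
    report = machine_report()
    for key, ok in meets_targets(report).items():
        print(f"{key:8s} {report[key]!s:>24s}  meets target: {ok}")
```

The targets are only a starting point; as the reply says, the right numbers depend on who uses the machine and on whether alignment or assembly will run locally.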
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
Thank you very much. I will do microarray analysis at first, but in the
future we are also interested in sequencing. The computer is for the
lab and I will be in charge of the processing. I have experience in
computational neuroscience but not in genomics, so I am learning now.
I think that the Uni usually buys Windows machines. Regarding the
operating system, is there an important reason to use Linux instead of
Windows 7 to run Bioconductor and R? I can use Linux if it is better.
I can get 10 TB and back up to an external disk and to space provided
by the Uni network.
Thank you very much.
Best,
Alberto
Alberto Capurro
Marie Curie Research Fellow
Department of Cell Physiology and Pharmacology
College of Medicine, Biological Sciences and Psychology
Maurice Shock Medical Sciences Building Room 319
University of Leicester
Leicester LE1 9HN
United Kingdom
Tel +44 (0)116 252 2673
E-mail: ac331 at le.ac.uk
https://sites.google.com/site/albertocapurro/
________________________________________
From: Steve Lianoglou [mailinglist.honeypot@gmail.com]
Sent: Thursday, December 27, 2012 8:55 PM
To: Capurro, Alberto (Dr.)
Cc: bioconductor at r-project.org
Subject: Re: [BioC] Computer for the analysis of high-throughput
genomic data
Hi,
On Fri, Dec 28, 2012 at 4:36 AM, Capurro, Alberto (Dr.)
<ac331 at leicester.ac.uk> wrote:
> Thank you very much. I will do microarray analysis at first, but in
> the future we are also interested in sequencing. The computer is for
> the lab and I will be in charge of the processing. I have experience
> in computational neuroscience but not in genomics, so I am learning
> now. I think that the Uni usually buys Windows machines. Regarding
> the operating system, is there an important reason to use Linux
> instead of Windows 7 to run Bioconductor and R? I can use Linux if it
> is better. I can get 10 TB and back up to an external disk and to
> space provided by the Uni network.
Without inciting a flamewar, I don't think it's too controversial to
say that most scientific tools in this space are written for linux
first, then tweaked to run on osx (us osx folks are, by default, stuck
on an older version of gcc, so some tweaks are harder than others),
and windows is likely an afterthought.
Look at, for example, some of the aligners out there.
* Bowtie provides compiled binaries for linux and osx, no windows:
http://sourceforge.net/projects/bowtie-bio/files/bowtie2/2.0.4/
* The STAR aligner runs on linux, and recently was tweaked to run on
osx (not sure if it's entirely working).
* bwa's SF page suggests it only runs on linux and BSD (osx).
* "A unix system" is listed as a prerequisite for installing GSNAP.
For the most part, however, this platform gap doesn't apply to the
R/bioconductor packages you will likely be using. AFAIK, the majority
of the bioc packages work just fine on unix, osx, and windows.
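
A small sketch of how one might check which of the aligners listed above are installed on a given machine. The executable names (bowtie2, STAR, bwa, gsnap) are assumptions about how these tools are typically installed; adjust them to match your own setup.

```python
import platform
import shutil

# Display name -> executable name. The executable names are assumptions
# about a typical install and may differ on your system.
ALIGNERS = {
    "Bowtie 2": "bowtie2",
    "STAR": "STAR",
    "BWA": "bwa",
    "GSNAP": "gsnap",
}


def find_aligners(tools=ALIGNERS):
    """Return the full path of each executable, or None if not on PATH."""
    return {label: shutil.which(exe) for label, exe in tools.items()}


if __name__ == "__main__":
    print(f"Operating system: {platform.system()}")
    for label, path in find_aligners().items():
        print(f"{label:10s} {path or 'not found on PATH'}")
```

This only reports what is on the PATH; it says nothing about whether a tool can be built or run on a platform where it is missing.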
Also, if you're planning on having several people log into the machine
to do work, then I think a *nix is likely going to be your best bet.
So, to be honest, even though I have a slight osx bent, if I were in
your shoes and was put in a position to buy a workhorse machine, I'd
go linux. I assume you, and the other members in the lab, will have
their own desktops/laptops to do downstream analysis -- which can be
the OS of your choosing.
After doing some of the heavy lifting on a compute-server (I'm
thinking of alignment/assembly), you can likely do most of your
work on a lower powered machine -- especially if we're talking about
more "canned"/routine analysis. I've done lots of downstream
analysis on my 8gb ram, dual core macbook pro, for instance, although
having access to some big iron to do some heavy computing at times is
totally necessary.
HTH,
-steve
Thank you very much Steve, I will go for a linux operating system
then.
Best,
Alberto
________________________________________
From: Steve Lianoglou [mailinglist.honeypot@gmail.com]
Sent: Friday, December 28, 2012 3:52 PM
To: Capurro, Alberto (Dr.)
Cc: bioconductor at r-project.org
Subject: Re: [BioC] Computer for the analysis of high-throughput
genomic data