Hello all, I'm new to Amazon Web Services and cloud computing in general, so please bear with me. My question is: how can I access CEL files stored in an Amazon S3 bucket using the functions in the "oligo" package to create AffyBatch objects from those CEL files? In other words, in the file path argument of the list.celfiles() function, how do I point it to the bucket that holds my CEL files?
For a little more background: I want to analyze a microarray dataset from NCBI GEO that my local machine does not have the capacity for, so I will be using the Bioconductor Cloud AMI to run R and Amazon S3 to store the raw CEL files. The platform is the Affymetrix Human Exon 1.0 ST Array. The raw files total about 11.5 GB, if that makes a difference. Let me know if you need more information. Thanks in advance.
The location of a file in an AWS S3 bucket is defined by a URL. If the object is publicly readable (or you generate a presigned URL for it), you can download the file by passing this URL to a function that supports remote downloads, e.g., `download.file()`.
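A minimal sketch of that download step in R. The bucket URL and file names are hypothetical placeholders, and it assumes the objects are publicly readable or that you pass presigned URLs. It copies each file to a local directory on the instance and returns the local paths.

```r
# Sketch: copy CEL files from an S3 bucket to the instance's local disk.
# Assumptions: the bucket name and file names below are hypothetical, and the
# objects are publicly readable (or the URLs are presigned).
bucket_url <- "https://my-cel-bucket.s3.amazonaws.com"  # hypothetical bucket
cel_names  <- c("sample1.CEL.gz", "sample2.CEL.gz")     # hypothetical keys

# Download each URL into dest_dir and return the local file paths.
download_cels <- function(urls, dest_dir) {
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  # Drop any query string (e.g. presigned-URL parameters) before taking the name.
  dest <- file.path(dest_dir, basename(sub("\\?.*$", "", urls)))
  for (i in seq_along(urls)) {
    # mode = "wb" writes the binary CEL data unchanged (matters on Windows).
    download.file(urls[i], destfile = dest[i], mode = "wb")
  }
  invisible(dest)
}

# Usage (not run here because it needs network access):
# local_files <- download_cels(paste0(bucket_url, "/", cel_names), "celfiles")
```

For private buckets, copying the files with the AWS CLI on the instance (`aws s3 sync s3://<bucket>/<prefix> <local dir>`) is another option, assuming the CLI is installed and configured with credentials.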
I'm not familiar with the functions in oligo, so I'm not sure whether they offer a way to download remote CEL files and create an AffyBatch directly. Even if they don't, you can still do this yourself: download the files, read them with oligo functions, convert to an AffyBatch, etc.
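A possible sketch of the reading step, assuming the Bioconductor packages oligo and pd.huex.1.0.st.v2 (the annotation package for the Human Exon 1.0 ST array) are installed. `list.celfiles()` passes its arguments on to `list.files()`, so it only searches a local directory and cannot be pointed at an S3 URL; the files need to be on the instance's disk first. Note that for Exon ST arrays `read.celfiles()` returns an ExonFeatureSet rather than an affy AffyBatch. The function name `read_local_cels` is hypothetical.

```r
# Sketch: read locally stored CEL files with oligo.
# Assumes oligo and pd.huex.1.0.st.v2 are installed; read_local_cels is a
# hypothetical helper name.
read_local_cels <- function(cel_dir) {
  # list.celfiles() only searches a local directory; it cannot read an S3 URL.
  # listGzipped = TRUE also picks up *.CEL.gz files, as GEO typically supplies.
  cel_files <- oligo::list.celfiles(cel_dir, full.names = TRUE,
                                    listGzipped = TRUE)
  # For Exon ST arrays this returns an ExonFeatureSet, not an affy AffyBatch.
  oligo::read.celfiles(cel_files)
}

# Usage (assuming the files were downloaded to the "celfiles" directory):
# raw_data <- read_local_cels("celfiles")
# eset <- oligo::rma(raw_data)
```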
When you launch an instance from the AMI, be sure to choose an instance type with enough disk space and memory to support the computations. You'll probably want an instance from the 'm' family of size 'xlarge' or larger.
https://aws.amazon.com/ec2/instance-types/
Valerie