Would you please give me a few minutes for my question?
However, I am not sure that I can ask my question here!
Anyway, could you please show me how to download and process a microarray data file (GSE) from GEO?
Thank you so much!
Hi, for most microarray datasets, there should be a blue Analyze with GEO2R button on the main accession page. If you click on that, and then go to the R script tab, you should see code that allows you to obtain the data, usually via the Series Matrix File.
This data should already be normalised, but GEO cannot guarantee this for every study, so you need to check the authors' methods and the other information in the GEO record to confirm that the data is normalised. Plotting the data as histograms and box-and-whisker plots can help, too: if the data is from a microarray and RMA-normalised, the quantile-normalised distribution is easy to recognise in a box-and-whisker plot.
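As a rough illustration of the check described above, here is a minimal R sketch. It assumes the GEOquery and Biobase packages are installed; the helper names `get_series_matrix` and `check_distributions` are made up for this example, and the accession in the usage comment is only a placeholder.

```r
# Sketch only: assumes the GEOquery and Biobase packages are installed.

# Fetch the Series Matrix for a GEO series and return its expression matrix.
get_series_matrix <- function(gse_id) {
  gse <- GEOquery::getGEO(gse_id, GSEMatrix = TRUE)  # list of ExpressionSet objects
  Biobase::exprs(gse[[1]])                           # first platform only
}

# Draw a boxplot and a histogram, and return per-sample quartiles, to judge
# whether the samples share a common distribution.
check_distributions <- function(x) {
  boxplot(x, las = 2, ylab = "expression")
  hist(as.vector(x), breaks = 50, main = "All samples", xlab = "expression")
  apply(x, 2, quantile, probs = c(0.25, 0.5, 0.75), na.rm = TRUE)
}

# Usage (needs network access; the accession is a placeholder):
# x <- get_series_matrix("GSExxxxx")
# check_distributions(x)
```

If the boxes line up closely across samples and the quartiles are nearly identical, that is consistent with quantile normalisation, but it does not replace reading the authors' methods.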
Thanks, but the microarray dataset I am looking for does not have this "Analyze with GEO2R" button.
What are the IDs?
Here are the ID and the link: GSE117134 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi
That's not a microarray data set. It's RNA-Seq.
Thanks James, and sorry for the misunderstanding! However, how can I get the data into my R environment? Is there a Bioconductor package that helps me download the files? Best, AD
In principle, as Kevin said, GEOquery should download the processed expression profiles using the function getGEO(). However, I'm afraid that in the case of GSE117134 this is not going to work, because those expression profiles seem to be stored as a supplementary file in GEO. To access them you can download that supplementary file and read it with read.csv(). Because read.csv() outputs a data.frame object, you might want to transform it into a matrix, here called y, and then you're ready to analyze the gene expression data matrix in y. A starting point for a beginner may be any of the available workflows for this kind of data, such as this one or this other one, taking into account that your starting point is not RNA-seq raw counts but RNA-seq processed expression profiles in some kind of continuous units of expression. Check the associated publication to learn what kind of units those are and how they have been processed.
Thank you so much! It was a great help!
Beware that these appear to be RPKM values rather than read counts, so they don't input directly into any of the Bioconductor workflows. See https://support.bioconductor.org/p/56275/ for discussion and work-around suggestions.
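For the supplementary-file route described above, a minimal R sketch might look like the following. It assumes GEOquery is installed, that the series has at least one supplementary file ending in .csv or .csv.gz, and that the first column of that file holds the gene identifiers; none of this is confirmed for GSE117134, so inspect the downloaded files first. The helper names `read_geo_supp_csv` and `expr_df_to_matrix` are made up for this example.

```r
# Sketch only: assumes the GEOquery package is installed and that the
# supplementary file is a (possibly gzipped) CSV whose first column holds gene IDs.

# Convert a data.frame whose first column holds feature IDs into a numeric matrix.
expr_df_to_matrix <- function(df) {
  y <- as.matrix(df[, -1, drop = FALSE])
  storage.mode(y) <- "numeric"
  rownames(y) <- as.character(df[[1]])
  y
}

# Download the supplementary files of a GEO series and read the first CSV found.
read_geo_supp_csv <- function(gse_id, dest_dir = tempdir()) {
  files <- GEOquery::getGEOSuppFiles(gse_id, baseDir = dest_dir)
  # Row names of the returned data.frame are the paths of the downloaded files.
  csv_files <- grep("\\.csv(\\.gz)?$", rownames(files), value = TRUE)
  stopifnot(length(csv_files) > 0)
  read.csv(csv_files[1], stringsAsFactors = FALSE, check.names = FALSE)
}

# Usage (needs network access):
# dat <- read_geo_supp_csv("GSE117134")
# y <- expr_df_to_matrix(dat)
# dim(y)
```

The matrix y then holds whatever processed units the authors deposited (apparently RPKM, as noted above), not read counts, so it should not be passed to count-based workflows as is.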