Hello,
I am a computer science student working with expression data for the first time. I went to GEO to download the raw data (because I want to merge different series). However, the raw files have different extensions: some are .CEL, others are .CSV or .GPR. When I looked at the .CSV files, I saw that they are ScanArray Express files. The data is a matrix with headers such as "Chx Log Ratio", "Chx mean", "Chx Median", etc. For the .GPR files, the headers are, for example, "F633 Median" or "F543 Median". So, if I want to normalize the data myself, which entries should I consider for each file type?
Also, I cannot find any tutorials online that explain how to analyze .gpr and .csv files with Bioconductor. If anybody can point me to a good beginner's tutorial on microarray data analysis with R, that would help me a lot.
Thank you very much.
I can't speak to that, as I have no idea how you plan to construct a gene regulatory network. The point I am making is much simpler: the expression values you get from a microarray are highly processed data that reflect how much mRNA was in the original sample, but also a whole host of technical variability that has nothing whatsoever to do with the amount of mRNA. If you make comparisons within a set of samples that were processed at the same time, on the same platform, etc., then you can assume that most of the technical variability is consistent across arrays, and that any differences are mostly due to biological differences (e.g., changes in the amount of underlying mRNA in the sample). And given certain assumptions, you can attempt to account for any inconsistent technical variability using one of the various normalization procedures that have been developed over time.
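To make "normalization procedure" a bit more concrete, here is a minimal sketch of quantile normalization, one common between-array method (it is, for example, one step of RMA). It assumes you already have a genes x arrays matrix of log-scale intensities from arrays processed together; it is a toy numpy version written for illustration, not the Bioconductor implementation, and it breaks ties by sort order rather than averaging them.

```python
import numpy as np


def quantile_normalize(x):
    """Quantile-normalize the columns (arrays) of a genes x arrays matrix.

    Every column is mapped onto the same reference distribution: the
    row-wise mean of the column-sorted matrix. Ties are broken by sort
    order (a simplification; Bioconductor implementations handle ties
    more carefully).
    """
    x = np.asarray(x, dtype=float)
    # Rank of each value within its own column (0 = smallest).
    order = np.argsort(x, axis=0, kind="stable")
    ranks = np.argsort(order, axis=0, kind="stable")
    # Reference distribution shared by all arrays.
    reference = np.sort(x, axis=0).mean(axis=1)
    # Replace each value with the reference value at the same rank.
    return reference[ranks]
```

After this step every array has an identical distribution of values, so differences between arrays are confined to which genes sit at which rank. That is only a sensible assumption when the arrays come from one platform and one batch, which is exactly the situation described above.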
But if you want to compare between experiments, the technical differences are likely to be as large as or larger than any biological differences that may exist, and in many cases you cannot distinguish between the two. There are tools like frozen RMA (fRMA) or SCAN.UPC that are intended to account for batch effects, which may help you combine different data sets run on the same platform, but combining data from many different platforms is likely to be a daunting task.
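The point about not being able to separate technical from biological differences can be illustrated with a deliberately crude batch adjustment: subtracting each gene's mean within each batch. This is a hypothetical toy sketch, not how fRMA or SCAN.UPC work, and the values below are made-up numbers chosen only to show the effect.

```python
import numpy as np


def center_within_batches(x, batches):
    """Subtract each gene's per-batch mean from a genes x arrays matrix.

    A crude, illustrative batch adjustment only; real tools model batch
    effects very differently.
    """
    x = np.asarray(x, dtype=float)
    batches = np.asarray(batches)
    out = np.empty_like(x)
    for b in np.unique(batches):
        cols = batches == b  # boolean mask selecting this batch's arrays
        out[:, cols] = x[:, cols] - x[:, cols].mean(axis=1, keepdims=True)
    return out


# Balanced design: each batch holds one control and one treated array.
# The batch offset (about 10 units) is removed; the treated-minus-control
# difference of 2 survives in both batches.
balanced = center_within_batches([[10.0, 12.0, 20.0, 22.0]], ["A", "A", "B", "B"])

# Confounded design: batch A is all control, batch B is all treated.
# Centering wipes out the treatment effect together with the batch effect.
confounded = center_within_batches([[10.0, 10.0, 22.0, 22.0]], ["A", "A", "B", "B"])
```

When each series in GEO comes from its own lab and its own conditions, the situation is usually closer to the confounded case, which is why merging series is so much harder than normalizing within one.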