Question

New to analysing microarray data

kawess • 0

@kawess-9219

Last seen 4.5 years ago

Canada

Hello,

I am a computer science student working with expression data for the first time. I went to GEO to download the raw data (because I want to merge different series). However, the raw files have different extensions: some are .CEL, others are .CSV or .GPR. I looked at the .CSV files and saw that they are ScanArray Express files. The data is a matrix with different headers such as "Chx Log Ratio", "Chx mean", "Chx Median", etc. For the .GPR files, the headers are, for example, "F633 Median" or "F543 Median". So, if I want to normalize the data myself, which columns should I use for each file type?

Also, I cannot find any tutorials on the internet that explain how to analyze .gpr and .csv files with Bioconductor. So if anybody can point me to a good tutorial for beginners in microarray data analysis with R, that would help me a lot.

Thank you very much.

microarray • 821 views

ADD COMMENT • link 8.4 years ago kawess • 0

score 0 · Answer 1 · 2015-11-17

You probably don't want to merge different series, even if they are the same array type. That usually makes very little sense from a statistical standpoint, although there are ways to analyze data simultaneously without merging in a conventional sense.

Anyway, for Affymetrix arrays, see this workflow. For data read using an Axon scanner (the .GPR files), see the limma User's Guide.
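As a rough illustration of which columns matter in a two-colour file, the sketch below computes a background-subtracted log-ratio (M) and an average log-intensity (A) for each spot. It is plain Python, not limma, and it does not claim to reproduce what limma does internally. The `B633 Median` and `B543 Median` background columns, the tiny made-up table, and the helper name `ma_values` are assumptions for the example; only `F633 Median` and `F543 Median` are mentioned in the question. It also assumes that 633 nm is the red channel and 543 nm the green one.

```python
import csv
import io
import math


def ma_values(rows, red="633", green="543"):
    """Return one (M, A) pair per spot from foreground/background medians.

    M = log2(R) - log2(G) and A = (log2(R) + log2(G)) / 2, where R and G are
    background-subtracted intensities. Spots whose corrected intensity is not
    positive get (None, None) because their logarithm is undefined.
    """
    out = []
    for row in rows:
        r = float(row[f"F{red} Median"]) - float(row[f"B{red} Median"])
        g = float(row[f"F{green} Median"]) - float(row[f"B{green} Median"])
        if r <= 0 or g <= 0:
            out.append((None, None))
            continue
        log_r, log_g = math.log2(r), math.log2(g)
        out.append((log_r - log_g, 0.5 * (log_r + log_g)))
    return out


# Hypothetical tab-delimited spot table (not a real GPR file, which also
# carries header records before the column names).
SAMPLE = (
    "Name\tF633 Median\tB633 Median\tF543 Median\tB543 Median\n"
    "geneA\t1100\t100\t600\t100\n"
    "geneB\t356\t100\t356\t100\n"
    "geneC\t90\t100\t400\t100\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))
values = ma_values(rows)
for row, (m, a) in zip(rows, values):
    print(row["Name"], m, a)
```

The M values still need within-array and possibly between-array normalization before any comparison, which is the part the limma User's Guide covers.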

score 0 · Answer 2 · 2015-11-17

Thank you, James, for your response and for the links. Since I should not merge the series, what should I do if I want to construct a large dataset (with many samples)? The available series do not have enough samples. My objective is to reconstruct the gene regulatory network.

ADD COMMENT • link 8.4 years ago kawess • 0

I can't speak to that, as I have no idea how you plan to construct a gene regulatory network. The point I am making is much simpler than that - the expression values you get from a microarray are basically highly processed data that measure how much mRNA was in the original sample, as well as a whole host of other technical variability that has nothing whatsoever to do with the amount of mRNA. If you make comparisons within a set of samples that were processed at the same time, on the same platform, etc. then you can make the assumption that most of the technical variability is consistent across arrays, and any differences are mostly due to biological differences (e.g., changes in the amount of underlying mRNA in the sample). And given certain assumptions, you can attempt to account for any inconsistent technical variability using one of the various normalization procedures that have been developed over time.

But if you want to compare between experiments, then all the technical differences are likely to be as large or larger than any biological differences that may exist, and you cannot in many cases distinguish between the two. There are tools like frozen RMA or SCAN.UPC that are intended to account for batch effects, which may help you to combine different data sets run on the same platform, but combining data from lots of different platforms is likely to be a daunting task.
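To make the confounding concrete, here is a small simulation, a sketch under stated assumptions rather than a model of any real data set. Two genes are generated independently of each other, so there is no biological relationship between them. The second "experiment" simply adds the same technical shift to every measurement. The shift of 3 log2 units, the noise level, the sample sizes, and the helper names are all invented for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable


def simulate_batch(n_samples, shift):
    """Two unrelated genes measured in one experiment with a technical shift."""
    gene1 = [8.0 + shift + rng.gauss(0, 1) for _ in range(n_samples)]
    gene2 = [6.0 + shift + rng.gauss(0, 1) for _ in range(n_samples)]
    return gene1, gene2


a1, a2 = simulate_batch(200, shift=0.0)  # experiment A
b1, b2 = simulate_batch(200, shift=3.0)  # experiment B, shifted upward

r_within_a = pearson(a1, a2)
r_within_b = pearson(b1, b2)
r_pooled = pearson(a1 + b1, a2 + b2)  # naive merge of both experiments

print("within A:", round(r_within_a, 3))
print("within B:", round(r_within_b, 3))
print("pooled:  ", round(r_pooled, 3))
```

Within each experiment the correlation stays near zero, while the pooled correlation is strongly positive purely because of the shared shift. That is the kind of technical signal that cannot be told apart from biology after a naive merge.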

ADD REPLY • link 8.4 years ago James W. MacDonald 65k

score 0 · Answer 3 · 2015-11-17

Thank you for your help. When reconstructing my network, I don't want to compare the samples but rather to find correlations between the expression profiles of genes. Here, the expression profile of a particular gene is its expression values across all the samples.
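A minimal sketch of that idea, assuming Pearson correlation as the similarity measure (the post does not say which measure is intended): each gene maps to its values across the same ordered set of samples, and every pair of genes is scored. The gene names and numbers are made up, and `pearson` is a hand-written helper, not a library call.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical expression profiles: one list per gene, one value per sample,
# with the samples in the same order for every gene.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.8],  # rises together with geneA
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],  # falls as geneA rises
}

# Score every unordered pair of genes once.
correlations = {
    (g1, g2): pearson(expression[g1], expression[g2])
    for g1, g2 in combinations(sorted(expression), 2)
}

for pair, r in correlations.items():
    print(pair, round(r, 3))
```

Any systematic shift between series feeds directly into these scores, so the profiles would need to be on a comparable scale before pairs are ranked.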

ADD COMMENT • link 8.4 years ago kawess • 0