Questions about handling and annotating microarray files deposited at GEO
@c6e10469
Portugal

Dear colleagues, I need help.

I have no experience with microarray files deposited in GEO, and I have some questions.

I don't want to do a differential expression analysis. I want to map the probes to gene names and obtain one expression value per gene in each sample, then plot a heatmap of all samples for a selection of genes. I already have a script for the plots.

I did a "manual annotation" using the spreadsheet's VLOOKUP (PROCV) function and noticed that some genes are represented by more than one probe, each with different expression values. How should I handle this kind of data?

My other question is how to annotate datasets such as GSE77930, in which the probe IDs in the expression-value file differ from the IDs in the GPL21289 gene annotation file.

Thanks in advance to anyone willing to help me. Best regards,

Michele Breton

MicroarrayData Annotation GEOdata • 490 views
@james-w-macdonald-5106
United States

Your first question is something that you will have to answer for yourself. There are any number of reasons that an array manufacturer will add multiple probes for the same gene. You could hypothetically do a deep dive on the array and inspect each of the duplicated probes and decide for yourself which one is to be preferred (or if they are equivalent) and use that information to decide which one(s) to retain. Or you could use the probes with the highest overall intensity (they bind better, so maybe they ARE better?). Or you could just average them. Or just randomly exclude one. Each has tradeoffs, and since you are doing the work, it's up to you to decide.
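To make two of those options concrete, here is a minimal sketch in plain Python (not R/Bioconductor), assuming the expression values are already loaded as one list of per-sample values per probe. The function names, probe IDs, and gene names are hypothetical and exist only for illustration; this is one possible way to collapse probes, not a recommended standard.

```python
from collections import defaultdict
from statistics import mean


def collapse_by_mean(expr, probe_to_gene):
    """Average all probes that map to the same gene, sample by sample."""
    grouped = defaultdict(list)
    for probe, values in expr.items():
        gene = probe_to_gene.get(probe)
        if gene:  # probes without a gene annotation are skipped
            grouped[gene].append(values)
    # zip(*rows) walks the probes of one gene column by column (per sample)
    return {gene: [mean(col) for col in zip(*rows)]
            for gene, rows in grouped.items()}


def collapse_by_top_probe(expr, probe_to_gene):
    """Keep, per gene, the probe with the highest mean intensity across samples."""
    best = {}
    for probe, values in expr.items():
        gene = probe_to_gene.get(probe)
        if gene and (gene not in best or mean(values) > mean(best[gene])):
            best[gene] = values
    return best


# Hypothetical toy data: three probes, three samples
expr = {
    "probe_1": [8.0, 9.0, 10.0],
    "probe_2": [6.0, 7.0, 8.0],
    "probe_3": [5.0, 5.5, 6.0],
}
probe_to_gene = {"probe_1": "GENE_A", "probe_2": "GENE_A", "probe_3": "GENE_B"}

averaged = collapse_by_mean(expr, probe_to_gene)
top_probe = collapse_by_top_probe(expr, probe_to_gene)
```

Averaging blends all probes for a gene, while the top-probe rule keeps a single measured profile; which is more appropriate depends on why the duplicate probes are on the array.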

For your second question, the experiment you link to used two different Agilent arrays. There are 320 total arrays, and some unknown (to me) number were run on one platform, and some on the other. It's unclear to me what might be in the series matrix file, particularly since one array has over 411K probes and the other has around 38K probes. That sounds like a situation where downloading the raw data and processing separately is the smart play. The limma package is your friend in that case.
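As a rough illustration of the "process separately" idea, the sketch below groups samples by platform so that each group can be annotated with its own probe table. It is plain Python rather than limma, and it assumes you have already built a sample-to-platform mapping yourself; the sample names and the `PLATFORM_B` label are hypothetical placeholders, and only GPL21289 comes from the question.

```python
from collections import defaultdict


def group_samples_by_platform(sample_platform):
    """Return {platform: [samples]} so each platform can be handled on its own."""
    groups = defaultdict(list)
    for sample, platform in sample_platform.items():
        groups[platform].append(sample)
    return dict(groups)


# Hypothetical mapping; in practice this would come from the sample metadata
sample_platform = {
    "sample_1": "GPL21289",
    "sample_2": "PLATFORM_B",  # placeholder for the second array type
    "sample_3": "GPL21289",
}

by_platform = group_samples_by_platform(sample_platform)
```

Each resulting group would then be read, normalized, and annotated against the probe IDs of its own platform before any comparison across groups.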
