Getting annotated and normalized data.
jslow • 0


I am trying to analyze microarray data with R and am stuck at the annotation step. I was wondering if anyone could help.

Here's the code I have so far:




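library(oligo)                           # rma()
library(affycoretools)                   # getMainProbes()
library(mogene10sttranscriptcluster.db)  # annotation package queried by select()

OligoEset <- rma(OligoRaw) # 35556 features, 24 samples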


ID <- getMainProbes(OligoEset)
annot <- select(mogene10sttranscriptcluster.db, featureNames(ID),
                c("SYMBOL","GENENAME","ENTREZID")) # 36631


I am stuck trying to merge "annot" with "OligoEset". I would like to have the annotated, normalized data set as a data frame or a .txt/.xls file that I can analyze.

I'd very much appreciate any help.



microarray rstudio annotation • 756 views

You actually don't want the annotated and normalized data in any of those forms. If you are going to use Bioconductor for the analysis, then you need to learn to use the tools that are supplied.

The ExpressionSet containing your data is a perfect input to, say, the limma package. So you now need to define what comparisons you want to make, and express them as a design matrix. See the limma user's guide.

What you would tend to do is something like

design <- model.matrix(~<args go here>)

fit <- lmFit(ID, design)  # ID is the ExpressionSet returned by getMainProbes() above

fit2 <- eBayes(fit)
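
As a rough illustration of the design matrix step, here is a minimal base R sketch. The grouping is an assumption made up for the example (12 control samples followed by 12 treated samples, to match the 24 samples mentioned above); the real layout depends on your experiment.

```r
# Hypothetical layout: 24 samples, 12 control followed by 12 treated.
# Replace this with the actual grouping of your arrays.
group <- factor(rep(c("control", "treated"), each = 12),
                levels = c("control", "treated"))

# Column 1 is the control mean (intercept);
# column 2 is the treated-minus-control difference.
design <- model.matrix(~ group)

dim(design)       # 24 rows (samples), 2 columns (coefficients)
colnames(design)  # "(Intercept)" "grouptreated"
```

With a design like this, the second coefficient is the treated versus control comparison, which is the one you would ask topTable() for with coef = 2.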

You will have duplicates in your annot data.frame, so you have to deal with that. The most naive thing you could do is choose the first one:

annot <- annot[!duplicated(annot[,1]),]

fit2$genes <- annot

Now your topTable() output will have annotations, as well as statistics.

topTable(fit2, coef = 2)
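
To see what the duplicated() filter above does, here is a small sketch on a toy annotation table. The probe IDs and gene symbols are invented for illustration; a real table from select() would have the probe IDs in its first column in the same way.

```r
# Toy annotation table: probe "p1" maps to two genes, so it appears twice.
annot <- data.frame(
  PROBEID = c("p1", "p1", "p2", "p3"),
  SYMBOL  = c("GeneA", "GeneB", "GeneC", NA),
  stringsAsFactors = FALSE
)

# Keep only the first row for each probe ID (the naive choice).
annot_unique <- annot[!duplicated(annot[, 1]), ]

nrow(annot_unique)    # 3
annot_unique$SYMBOL   # "GeneA" "GeneC" NA

# Hypothetical feature names of the fitted data; the annotation rows
# should line up with them before being attached to the fit.
feature_ids <- c("p1", "p2", "p3")
stopifnot(identical(annot_unique$PROBEID, feature_ids))
```

The final check is worth keeping in a real analysis, since the annotation rows are matched to the fit by position and not by name.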




Thanks James,

But suppose I want a collated file that non-R users can read and understand. I'm thinking of a file containing the normalized data, so that a layperson can compare values and do their own analysis.

Is there a way I can get this done?


Sure. If you are already using affycoretools, see ?writeFit.
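
If you just want a flat table by hand, something along these lines might work. This is only a sketch on toy data: the matrix stands in for what exprs() would return from the normalized ExpressionSet, the annotation table stands in for the de-duplicated annot, and the file name is made up.

```r
# Toy stand-ins for the normalized expression matrix and the annotation.
expr <- matrix(c(7.1, 8.3, 5.2, 7.4, 8.0, 5.5),
               nrow = 3,
               dimnames = list(c("p1", "p2", "p3"), c("sample1", "sample2")))
annot <- data.frame(PROBEID = c("p3", "p1", "p2"),
                    SYMBOL  = c(NA, "GeneA", "GeneB"),
                    stringsAsFactors = FALSE)

# Reorder the annotation rows so they follow the expression rows by probe ID.
annot <- annot[match(rownames(expr), annot$PROBEID), ]

# Put annotation columns and expression columns side by side.
collated <- data.frame(annot, expr, check.names = FALSE, row.names = NULL)

# Write a tab-delimited text file (hypothetical file name, temp directory).
out_file <- file.path(tempdir(), "normalized_annotated.txt")
write.table(collated, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

Using match() on the probe IDs avoids relying on the two objects happening to be in the same order.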

I am generally not enthusiastic about giving normalized data to 'laymen' so they can make their own analyses. Generating summarized data from raw CEL files is not usually the part of the analysis that requires the most sophistication (although the QC part does take some base knowledge). Instead, fitting models to the data and ensuring that statistically unsophisticated collaborators understand what was done and why is the main deliverable in my line of work.

Because of that, I much prefer giving people either HTML or Excel spreadsheets that already contain the comparisons they wanted. The ReportingTools package makes it very easy to generate HTML tables that are easy to work with. The openxlsx package makes it easy to output Excel spreadsheets directly, which allows you to circumvent Excel's tendency to convert gene symbols that look like dates into actual dates, when people import data incorrectly (as an example, SEPT1 is helpfully converted to 9/1/2015, because obviously).
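
A minimal sketch of the openxlsx route, assuming the package is installed (the code skips the write otherwise). The results table here is invented; in practice it would be something like topTable() output.

```r
# Invented results table with a gene symbol that looks like a date.
results <- data.frame(SYMBOL = c("SEPT1", "MARCH1", "Actb"),
                      logFC  = c(1.2, -0.8, 0.1),
                      stringsAsFactors = FALSE)

# Fresh temporary file so nothing existing is overwritten.
out_file <- tempfile(fileext = ".xlsx")

if (requireNamespace("openxlsx", quietly = TRUE)) {
  # Character columns are written as text cells, so the symbols are
  # stored as strings in the workbook itself.
  openxlsx::write.xlsx(results, out_file)
} else {
  message("openxlsx is not installed; skipping the Excel export")
}
```

Because the workbook is written directly, nobody has to go through Excel's text import step, which is where the date conversion tends to happen.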

