I am a Chinese student and my English is not very good, so some parts may be unclear. I hope you can understand.
Recently, I've been using the oligo package to analyze Affymetrix Mouse Gene 2.0 ST Array chips, but I have not been able to convert the probe IDs into gene IDs. This problem has been bothering me for a long time. I have looked for information but could not solve it. Is there anyone who can help me? Thank you very much.
Here's the code I'm using:
library(oligo)
celFiles <- list.celfiles()
affyRaw <- read.celfiles(celFiles)
library(pd.mogene.2.0.st)
eset <- rma(affyRaw)
library(limma)
design <- model.matrix(~ 0+factor(c(1,1,1,2,2,2)))
colnames(design) <- c("group1", "group2")
contrast.matrix <- makeContrasts(contrasts="group2-group1",levels=design)
design
fit <- lmFit(eset, design)
fit1 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit1)
dif <- topTable(fit2, coef = "group2-group1", n = nrow(fit2), lfc = log2(2))
dif <- dif[dif[, "adj.P.Val"] < 0.05, ]
head(dif)
This is as far as I can get. I do not know how to do the ID conversion. Can anyone help me? Thank you again.
First of all, thank you very much for your answer. I have followed your method, but the problem does not seem to be solved: there are many NA values, and I don't know what causes them.
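One common way to do this conversion can be sketched as follows. It is only a sketch, assuming the affycoretools package and pd.mogene.2.0.st are installed, and it is not necessarily the exact approach suggested elsewhere in this thread. The idea is to attach the annotation to the ExpressionSet right after rma() with annotateEset(); because lmFit() carries the feature annotation along, the added columns then show up in the topTable() output.

```r
library(Biobase)
library(affycoretools)
library(pd.mogene.2.0.st)

# 'eset' is the ExpressionSet returned by rma(affyRaw) in the code above.
# annotateEset() fills fData(eset) with annotation taken from the pdInfo
# package (columns such as PROBEID, ID, SYMBOL and GENENAME).
eset <- annotateEset(eset, pd.mogene.2.0.st)

# Inspect the added annotation; probesets without annotation show NA.
head(fData(eset))

# Then run lmFit(), contrasts.fit(), eBayes() and topTable() on this
# annotated 'eset' as before; the annotation columns are included in the
# topTable() result.
```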
Well, I don't fully agree with you. Your annotation 'problem' HAS been solved, because SYMBOLs and GENENAMEs were retrieved and added to your output. I agree with you regarding the many NAs that are present. However, this has (solely) to do with the limited annotation information Affymetrix provides for this array. In other words, you have to 'blame' Affymetrix for providing such a poorly annotated CSV file... (which is the basis of all annotation files).
In the thread "affycoretools annotateEset problem using Clariom D arrays", James MacDonald provides an informative line of code that will show you the fraction of your data that could be annotated:
To reduce the number of non-annotated probe IDs, you might consider using the so-called custom-defined array definitions made by Manhong Dai from the Brain Array group. Manhong remaps all probes present on the array to a current genome build available at e.g. the NCBI or ENSEMBL databases. In addition to filtering out probes that are not specific, another advantage is that (almost) all probe IDs are annotated. If you would like to go that way, below is some code to get you started (note: this code uses the remapped probes based on the ENTREZG database from NCBI):
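The exact line from that thread is not reproduced here. A minimal sketch of the idea, using a small made-up data frame as a stand-in for fData(eset), could look like this (all IDs and symbols below are illustration values, not real array content):

```r
# Toy stand-in for fData(eset): probeset annotation with missing values.
# ASSUMPTION: the column names mimic those added by annotateEset().
annot <- data.frame(
  PROBEID  = c("p1", "p2", "p3", "p4", "p5"),
  SYMBOL   = c("GeneA", NA, "GeneC", NA, "GeneE"),
  GENENAME = c("gene A", NA, "gene C", NA, NA),
  stringsAsFactors = FALSE
)

# Fraction of non-missing entries in each annotation column.
frac_annotated <- colMeans(!is.na(annot))
print(frac_annotated)
```

With real data, `annot` would be replaced by `fData(eset)` after the annotation step.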
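The code from the original answer is not preserved in this copy of the thread. As a rough sketch of the general idea only: once the Brain Array pdInfo package for this array has been installed from the Brain Array download page, its name can be passed to read.celfiles() through the pkgname argument. The package name used below is an assumption and should be checked against the download page. Feature names from ENTREZG-based definitions are expected to be Entrez Gene IDs with an "_at" suffix, so they can be mapped to symbols with org.Mm.eg.db.

```r
library(oligo)
library(org.Mm.eg.db)

# ASSUMPTION: name of the installed Brain Array pdInfo package; verify the
# exact name (and version) on the Brain Array download page.
custom_pd <- "pd.mogene20st.mm.entrezg"

celFiles <- list.celfiles()
affyRaw <- read.celfiles(celFiles, pkgname = custom_pd)
eset <- rma(affyRaw)

# Probeset IDs are expected to look like "<EntrezID>_at"; strip the suffix.
entrez <- sub("_at$", "", featureNames(eset))

# Look up gene symbols; IDs that are not Entrez Gene IDs (e.g. control
# probesets) come back as NA.
symbols <- mapIds(org.Mm.eg.db, keys = entrez, column = "SYMBOL",
                  keytype = "ENTREZID", multiVals = "first")

head(data.frame(PROBEID = featureNames(eset),
                ENTREZID = entrez,
                SYMBOL = unname(symbols)))
```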
Thank you very much for your reply. I'll take a closer look at it. Thank you very much
Sorry, there's another question I'd like to ask you. I used the code above to annotate the data, but there are some small problems in the result.
-4.90076
What does XR-398539 in the ID column mean? Also, some genes have annotated names in the result but have no name in the GPL annotation file. What is the reason?
Sorry, my English is not very good. Were you able to understand my description of the problem?
Mmm, you also need to explore things yourself a bit...
XR is one of the 9 RefSeq annotation categories; the abbreviation XR is used to describe a 'predicted ncRNA model' that has been given the (numerical) ID 398539. Please note that this is a computational prediction, so no experimental evidence (yet) exists for this gene (model). See also: https://en.wikipedia.org/wiki/RefSeq (or, if that link does not work for you, here or here).
Regarding the absence of info in the GPL annotation file: I think this has to do with the fact that the annotation info at GEO was last updated in 2013 (Jan 30, 2013: annotation table updated with netaffx build 33), whereas the PdInfo package has been created with the latest Affymetrix information available, which is from January 2017 (netaffx build 36). In other words, the annotation info available at GEO is outdated.
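To make the prefix convention concrete, here is a small sketch that classifies an accession by the letters before the separator. The lookup table covers only a few common RefSeq prefixes, and the helper name classify_refseq is made up for illustration.

```r
# Lookup table for a few common RefSeq accession prefixes (not exhaustive).
refseq_prefix <- c(
  NM = "mRNA",
  NR = "non-coding RNA",
  XM = "predicted mRNA model",
  XR = "predicted non-coding RNA model",
  NP = "protein",
  XP = "predicted protein model"
)

# Hypothetical helper: take everything before the first "_" or "-" as the
# prefix and look it up; unknown prefixes return NA.
classify_refseq <- function(id) {
  prefix <- toupper(sub("[_-].*$", "", id))
  unname(refseq_prefix[prefix])
}

# "XR-398539" is the ID as written in the question above.
classify_refseq(c("XR-398539", "XR_398539"))
```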
Thank you very much for your answer. I am teaching myself bioinformatics, and the teachers and students at my school do not know the field well, so there are many problems I cannot solve locally and I can only ask for help online. I will find some material to study. Thank you very much for your help.