I have expression data from an Affymetrix Human Gene 1.1 ST Array. According to the CDF, the chip is called hugene11st. I would like to perform a differential expression analysis in limma after parsing the CEL files with oligo.
While following limma's user guide, I ran into problems annotating the data. I already have an ExpressionSet object named eset. I found an annotation package called hugene11stprobeset.db, so I ran:
BiocManager::install("hugene11stprobeset.db")
library(hugene11stprobeset.db)
library(annotate)
ids <- featureNames(eset)
symbol <- getSYMBOL(ids, "hugene11stprobeset.db")
However, the resulting symbol vector is full of NAs.
It is worth mentioning that, comparing the outputs of keys(hugene11stprobeset.db) and featureNames(eset), the numbers seem to match; thus, it seems that the database is the problem.
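Rather than comparing printed output by eye, the overlap between the two sets of IDs can be computed directly. The following is only a minimal sketch: id_overlap is a hypothetical helper, and the toy vectors are made-up stand-ins for featureNames(eset) and keys(hugene11stprobeset.db).

```r
# Hypothetical helper: fraction of array IDs that also occur as keys
# in the annotation package.
id_overlap <- function(array_ids, db_keys) {
  mean(as.character(array_ids) %in% as.character(db_keys))
}

# Toy stand-ins (made-up IDs). With real data one would pass
# featureNames(eset) and keys(hugene11stprobeset.db) instead.
array_ids <- c("7892501", "7892502", "7896736")
db_keys   <- c("7892501", "7896736", "7896738")

overlap <- id_overlap(array_ids, db_keys)
```

Note that a high overlap only shows that the IDs exist as keys in the package; it does not by itself mean that each key maps to a gene symbol.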
Besides, I have a GPL22286_hugene11st_Hs_ENSG.cdf.gz file. Is there a way to parse it into R to annotate the ExpressionSet object?
Any recommendation will be highly appreciated. Thank you.
I did as you suggested, and after running
eset.annot <- annotateEset(eset, hugene11sttranscriptcluster.db)
I got the following output:
In fact, after running fData(eset.annot), the ENTREZID, SYMBOL and GENENAME columns are still a bunch of NAs.
Let's take a closer look.
The annotations provided in the various ChipDb packages supplied for Affymetrix arrays are based on data supplied by Affymetrix (now Thermo Fisher Scientific). We are just providing those data in an easily consumable form, and make no representations about the accuracy of the data. But do note that not all of the probesets on this array should measure something recognizable.
So all of those probesets are controls or normgene exon or intron probesets, none of which should be annotated anyway.
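To see how many rows lack annotation, and to set them aside before model fitting, something along these lines may help. It is a sketch under assumptions: the toy data frame fd is a made-up stand-in for fData(eset.annot), and dropping unannotated rows is one common option rather than something prescribed above.

```r
# Toy stand-in for fData(eset.annot); the IDs and symbols are made up
# purely for illustration.
fd <- data.frame(
  PROBEID = c("p1", "p2", "p3", "p4"),
  SYMBOL  = c("GENE_A", NA, NA, "GENE_B"),
  stringsAsFactors = FALSE
)

# Count annotated versus unannotated rows.
na_counts <- table(annotated = !is.na(fd$SYMBOL))

# Keep only the rows that carry a gene symbol.
fd_annotated <- fd[!is.na(fd$SYMBOL), , drop = FALSE]
```

With a real ExpressionSet the same logical index can be applied to the object itself, for example eset.annot[!is.na(fData(eset.annot)$SYMBOL), ]. The affycoretools package also provides getMainProbes() for removing control probesets from Gene ST data, which may be worth checking as an alternative.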
Thank you so much!! This was such a complete and useful explanation, and now everything is clear. Thank you for all the information provided (as well as all the packages).