Azby Cdex
@azby-cdex-5038
Dear Friends,
First of all, let me say that I am not an expert bioinformatician. I would like to do some basic microarray analysis using R and Bioconductor with CEL files obtained on the Affymetrix HuGene-1.0-ST-v1 platform. I have many questions, and I have searched and read several threads on the Bioconductor help list and other web pages. My questions are related to, or the same as, those in many previous threads, but after reading several of the answers my questions remain almost the same.
The main question concerns the number of genes probed on this platform. According to the Affymetrix data sheet for this platform, there are 764,885 distinct probes and 28,869 estimated genes. When I use affy with the functions ReadAffy() and rma(), I get an expression set with 32321 features. Very different from 28,869!
I read in most of the replies to previous threads that affy should not be used to analyze this platform. (It would be great if somebody could explain, or point to relevant literature on, the reasons for these differences.) However, affy automatically identifies what looks like the correct annotation file (at least by name, hugene10stv1) and processes the CEL file without giving any error message or warning.
As suggested in many threads and on the Bioconductor website, I used the package oligo to process my HuGene10STv1 CEL file. After summarizing at the core level with the rma function, I obtained an expression set object with 33297 features, which is of course neither 28,869 nor 32321. Here the annotation used is pd.hugene.1.0.st.v1 instead of the hugene10stv1 used in the previous case.
I am fine with using oligo. [See, I am blindly using software, like most people! I have found papers, even in prestigious journals, that use affy to process CEL files from the hugene10stv1 chip. Please help open my eyes and enlighten me (and many others)!] However, when I want to get the gene symbols corresponding to the transcripts, there is again a number mismatch. For example, with the package 'hugene10sttranscriptcluster.db' I found that 21995 keys out of 33295 (not 33297) can be mapped to gene symbols. What happened to the other two? Or, with oligo, do I have to use something other than 'hugene10sttranscriptcluster.db' to convert transcript IDs to SYMBOLs or ENTREZIDs?
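One way to track down the two missing IDs might be to compare the two ID sets directly. The following is only a sketch with made-up IDs. It assumes that featureNames() on the oligo expression set and keys() on the SYMBOL map return comparable transcript cluster IDs, which I have not verified for this annotation package.

```r
# Toy stand-ins (hypothetical IDs). With the real objects these would be,
# under the assumption stated above:
#   feature_ids <- featureNames(bset)                       # from oligo's rma()
#   annot_ids   <- keys(hugene10sttranscriptclusterSYMBOL)  # annotation keys
feature_ids <- c("7892501", "7892502", "7892503", "7892504", "7892505")
annot_ids   <- c("7892501", "7892502", "7892504")

# IDs present in the expression set but absent from the annotation package
missing_from_annot <- setdiff(feature_ids, annot_ids)

# IDs present in the annotation package but absent from the expression set
missing_from_eset <- setdiff(annot_ids, feature_ids)

print(missing_from_annot)
print(length(missing_from_eset))
```

If the first vector contains exactly the two unaccounted-for features, looking at those IDs should show what kind of features they are.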
I read that affy can be used with "hugene10stv1.r3cdf", but there is no such package among the annotation packages on the Bioconductor website. Maybe that applied to an older Bioconductor release, as those threads were 2-3 years old. Doesn't that imply that the currently available hugene10stv1 is the correct one to use with affy? On the other hand, if it cannot be used, why is it there in Bioconductor? Where do we use the annotation hugene10stv1?
I read that there are other packages such as aroma-affymetrix, xps, etc., but I am trying to do some simple things with standard, official Bioconductor packages. Any suggestions and helpful hints are highly appreciated.
Here are the commands that I used with Bioconductor version 2.8 (R 2.13) [Yes, I will update to the most recent version soon!]. As an example, I used the CEL file 'GSM857535.CEL.gz', downloaded from GEO.
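> library(affy)
> as <- ReadAffy('GSM857535.CEL.gz')        # read the CEL file with affy
> as
> aset <- rma(as)                           # RMA summarization
> aset
> library('hugene10sttranscriptcluster.db')
> x <- hugene10sttranscriptclusterSYMBOL
> xx <- x[mappedkeys(x)]                    # keep only keys mapped to a gene symbol
> length(x)
[1] 33295
> length(xx)
[1] 21995
> library(oligo)
> bs <- read.celfiles('GSM857535.CEL.gz')   # read the same CEL file with oligo
> bs
> bset <- rma(bs, target='core')            # summarize at the core level
> bset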
Thanks,
Asha