Hu Gene 1.0 ST v1 microarray processing and analysis
Azby Cdex ▴ 10
@azby-cdex-5038
Dear Friends,

First of all, let me say that I am not an expert bioinformatician. I would like to do some basic microarray analysis using R and Bioconductor with CEL files obtained from the Affymetrix HuGene-1.0-ST-v1 platform. I have many questions, and I have searched and read several threads on the Bioconductor help list and other web pages. My questions are related to, or the same as, those in many previous threads, but after reading several of the answers my questions remain almost the same.

The main question concerns the number of genes probed on this platform. According to the Affymetrix data sheet for this platform, there are 764,885 distinct probes and 28,869 estimated genes. When I use 'affy' with the functions 'ReadAffy()' and 'rma', I get an expression set with 32321 features. Very different from 28,869!

I read in most of the replies to previous threads that 'affy' should not be used for the analysis of this platform. (It would be great if somebody could explain, or point to relevant literature on, the reasons for these differences.) However, 'affy' automatically identifies the correct annotation file (at least the name 'hugene10stv1') and processes the CEL file without giving any error message or warning.

As suggested in many threads and on the Bioconductor website, I used the package 'oligo' to process my HuGene10STv1 CEL file. After summarizing at the core level using the 'rma' function, I obtained an expression set object with 33297 features, which of course is neither 28,869 nor 32321. Here the annotation used is 'pd.hugene.1.0.st.v1' instead of 'hugene10stv1' as in the previous case.

I am fine with using 'oligo'. [See, I am 'blindly' using software, like most people! I found papers, even in prestigious journals, that use 'affy' to process CEL files obtained with the 'hugene10stv1' chip. Please help me to open my eyes or enlighten me (and many others)!]
However, when I want to get gene symbols corresponding to the transcripts, there is again a 'number mismatch'. For example, when I used the package 'hugene10sttranscriptcluster.db', I found that 21995 keys out of 33295 (not 33297) can be mapped to gene symbols. What happened to the other two? Or, with 'oligo', do I have to use something other than 'hugene10sttranscriptcluster.db' to convert transcript IDs to SYMBOLs or ENTREZIDs?

I read that 'affy' can be used with "hugene10stv1.r3cdf", but no such thing is available among the annotation packages on the Bioconductor website. Maybe that applied to an older Bioconductor release, as those threads were 2-3 years old. Doesn't that imply that the currently available 'hugene10stv1' is the correct one to use with 'affy'? On the other hand, if it cannot be used, why is it in Bioconductor? Where do we use the annotation 'hugene10stv1'?

I read that there are other packages such as 'aroma-affymetrix', xps, etc., but I am trying to do some simple things with standard, official Bioconductor packages. Any suggestions and helpful hints are highly appreciated.

Here are the commands that I used in Bioconductor version 2.8 (with R 2.13) [Yes, I will update to the most recent version soon!]. As an example, I used the CEL file 'GSM857535.CEL.gz', downloaded from GEO.

    > library('affy')
    > as <- ReadAffy('GSM857535.CEL.gz')
    > as
    > aset <- rma(as)
    > aset
    > library('hugene10sttranscriptcluster.db')
    > x <- hugene10sttranscriptclusterSYMBOL
    > xx <- x[mappedkeys(x)]
    > length(x)
    [1] 33295
    > length(xx)
    [1] 21995
    > library('oligo')
    > bs <- read.celfiles('GSM857535.CEL.gz')
    > bs
    > bset <- rma(bs, target='core')
    > bset

Thanks,
Asha
Microarray Annotation PROcess xps • 2.2k views
@james-w-macdonald-5106
United States
Hi Asha,

On 1/9/2012 5:17 PM, Azby Cdex wrote:
> Dear Friends,
>
> First of all let me tell that I am not an expert bioinformatician. I would
> like to do some basic microarray analysis using R & Bioconductor with CEL
> files obtained using Affymetrix HuGene-1.0-ST-v1 platform. I have so many
> questions and I tried to search and read several threads in the
> Bioconductor Help List and other webpages. My questions are related to or
> the same as many of the previous threads but after reading several of those
> answers, questions remain almost the same.
>
> The main question is regarding the number of genes probed in this platform.
> According to Affymetrix Data sheet on this platform there are 764,885
> distinct probes and 28,869 estimated genes. When I use 'affy' and use the
> function 'ReadAffy()' and 'rma' I get an expression set with 32321
> features. Very different from 28,869!

There is a difference between the number of genes interrogated and the number of probesets because there can be more than one probeset that interrogates a particular gene. Remember that this chip is supposed to interrogate transcripts, and that may include different splice variants.

> I read in most of the replies to previous threads that 'affy' should not be
> used for the analysis of this platform. (It will be great if somebody can
> explain or point to relevant literature on the reasons for these
> differences). However, with 'affy' it automatically identifies the correct
> annotation file (at least the name 'hugene10stv1') and processes the CEL
> file without giving any error message or warning.

The reason to use oligo instead of affy is that the affy package (actually the makecdfenv package, which makes the cdf packages) was designed for an older chip style that never re-used probes for different probesets. In both the Gene and Exon chips, there are some probes that are part of more than one probeset. If you use the affy package, these probes will only be assigned to a single probeset.

> As suggested in many threads and in Bioconductor website I used the package
> 'oligo' for processing my HuGene10STv1 based CEL file. After summarizing at
> the core level using 'rma' function, I obtained an expression set object
> with 33297 features, and of course it is neither 28,869 nor 32321. Here the
> annotation used is 'pd.hugene.1.0.st.v1' instead of the 'hugene10stv1' in
> the previous case.
>
> I am fine with using 'oligo'. [See, I am 'blindly' using a software, like
> most of the people! I found papers, even in prestigious journals, using
> 'affy' to process CEL files obtained using 'hugene10stv1' chip. Please help
> me to open my eyes or enlighten me (and many others)!] However, when I want
> to get gene Symbols corresponding to the transcripts, again there is a
> 'number mismatch'. For example when I used the package
> 'hugene10sttranscriptcluster.db' , I found that there are 21995 keys out of
> 33295 (not 33297) can be mapped to gene symbols. What happened to two of
> them? Or, with 'oligo' I have to use something else to convert 'transcript
> ids' to SYMBOLS or ENTREZIDs, than 'hugene10sttranscriptcluster.db'?

No, if you use oligo and the 'core' transcripts, then you want to use the hugene10sttranscriptcluster.db annotation package. Note that the annotation packages are made by taking the manufacturer's mapping of probesets to (usually) Entrez Gene IDs, and then using that mapping to get all the other annotation data. So any lack of probeset -> gene mapping is usually due to a lack of annotation by the manufacturer.

> I read that 'affy' can be used with "hugene10stv1.r3cdf" but there is
> no such thing available in bioconductor website among the annotation
> packages. May be that was applicable to an older Bioconductor release as
> those threads were 2-3 years old. Doesn't it imply that the currently
> available 'hugene10stv1' is the correct one to use with 'affy'? On the
> other hand, if it cannot be used why is it there in Bioconductor? Where do
> we use the annotation 'hugene10stv1'?

I am not sure what version of the unsupported cdf we used to create the cdf package. I see that there is in fact an unsupported cdf on the Affy website with an 'r3' in the file name. You could hypothetically download that cdf file and use the makecdfenv package to create a cdf package yourself. However, this will suffer from the same shortcomings as the cdf that we supply.

Note also that this isn't an annotation package. Instead, it is a package that tells the affy package which probe belongs in which probeset, used during the summarization step.

Best,
Jim

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
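
The point in the answer about probes that belong to more than one probeset can be illustrated with a small base R sketch. Everything here is hypothetical toy data: the probe and probeset IDs are made up, and keeping the "first" assignment of a shared probe is only an assumption for illustration, not a description of what the cdf package actually does internally.

```r
# Toy probe-to-probeset table (hypothetical IDs, not real HuGene content).
# Probe "p3" is shared by two probesets, as can happen on Gene/Exon ST arrays.
probe_map <- data.frame(
  probe    = c("p1", "p2", "p3", "p3", "p4", "p5"),
  probeset = c("psA", "psA", "psA", "psB", "psB", "psB"),
  stringsAsFactors = FALSE
)

# Many-to-many view: every probeset keeps all of its probes.
shared <- split(probe_map$probe, probe_map$probeset)

# One-probeset-per-probe view: keep only the first assignment of each probe
# (an assumption), mimicking a design that cannot represent shared probes.
unique_map <- probe_map[!duplicated(probe_map$probe), ]
exclusive <- split(unique_map$probe, unique_map$probeset)

# Number of probes contributing to each probeset under each view.
n_shared <- lengths(shared)
n_exclusive <- lengths(exclusive)
print(rbind(shared = n_shared, exclusive = n_exclusive))
```

In the second view, probeset "psB" is summarized from fewer probes than it was designed with, which is the kind of shortcoming described above for cdf-based processing of these arrays.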