How to do Affy ST array analysis
Hi all!

I am feeling a little bit stupid, but I have been searching for two days now (maybe I am searching wrong?!) and could not figure it out. I want to analyze a Human Gene ST array. I know that there is the oligo package, and I found the annotation package pd.hugene.2.0.st, but I do not know how to do the steps. I am used to the affy package and affy pipelines.

All I find when searching for solutions are instructions for building your own annotation package, which should not be necessary, I think, because I found pd.hugene.2.0.st. Or am I wrong? Somehow I can't use it the same way I use, for example, the hgu133a.db package that provides the annotations. I'm really lost...

I want to do:

- probe-level analysis (similar to affyPLM)
- RMA normalization (oligo does this somehow, I think)
- filtering of control probes (as one does with affy: AFFX, for hgu133a)
- annotation of probesets (normally I would use the IQR filter to get unique Entrez IDs, but how do I do this with the ST array?)

I know there is something about probe versus transcript level to be aware of, and "core"? But I cannot connect the workflow.

I would be so happy if someone helped me or pointed me to the right docs. (The oligo user guide is not so helpful for me because I still don't understand what to do with what and when...) Sorry!

Thanks!

Ninni
Hi Ninni,

I guess a very simple workflow would be:

1. Read the CEL files.

    library(oligo)
    rawData = read.celfiles(< character vector of celfiles >)

2. Perform RMA and get "transcript cluster" summarized data back, using only "core" genes ("safely" annotated genes according to Affy). This is the default in oligo.

    Eset = rma(rawData, target = "core")

3. Load the annotation package and annotate the "transcript clusters" with some of the information contained in that package.

    ## load annotation package
    library("hugene20sttranscriptcluster.db")

    ## look up one annotation column ('what') for every feature in Eset;
    ## features without a mapping get the value of 'missing'
    annotateGene = function(db, what, missing) {
      tab = toTable(db[intersect(featureNames(Eset), mappedkeys(db))])
      mt = match(featureNames(Eset), tab$probe_id)
      ifelse(is.na(mt), missing, tab[[what]][mt])
    }

    fData(Eset)$symbol = annotateGene(hugene20sttranscriptclusterSYMBOL, "symbol", missing = NA)
    fData(Eset)$genename = annotateGene(hugene20sttranscriptclusterGENENAME, "gene_name", missing = NA)
    fData(Eset)$ensembl = annotateGene(hugene20sttranscriptclusterENSEMBL, "ensembl_id", missing = NA)

4. After that, keep only the "transcript clusters" that have an Ensembl gene ID (for example).

Hope that helps,
Bernd

On Wed, 7 May 2014 05:06:00 -0700 (PDT) "Ninni Nahm [guest]" <guest at bioconductor.org> wrote:
> [...]
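Step 4 above, and the IQR filter for unique gene IDs asked about in the original question, can be sketched in base R as follows. This is only an illustration under stated assumptions: `expr` and `feat` are small made-up stand-ins for `exprs(Eset)` and `fData(Eset)`, and the IDs and values in them are hypothetical.

```r
# Toy stand-ins (hypothetical values) for exprs(Eset) and fData(Eset)
expr <- matrix(
  c(1, 2, 3, 4,
    5, 5, 5, 5,
    2, 8, 2, 8,
    7, 7, 9, 9),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("tc1", "tc2", "tc3", "tc4"), paste0("s", 1:4))
)
feat <- data.frame(
  ensembl = c("ENSG_A", NA, "ENSG_A", "ENSG_B"),
  row.names = rownames(expr),
  stringsAsFactors = FALSE
)

# Step 4: keep only transcript clusters that have an Ensembl gene ID
keep <- !is.na(feat$ensembl)
expr <- expr[keep, , drop = FALSE]
feat <- feat[keep, , drop = FALSE]

# IQR filter: per gene, keep the transcript cluster with the largest IQR
iqr <- apply(expr, 1, IQR)
best <- tapply(names(iqr), feat$ensembl,
               function(ids) ids[which.max(iqr[ids])])
expr_unique <- expr[as.vector(best), , drop = FALSE]
```

On the real object, the same logical index can be applied directly to the ExpressionSet, e.g. `Eset[!is.na(fData(Eset)$ensembl), ]`.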
Thank you! That was very, very helpful! Can I use the hugene20sttranscriptcluster.db package for all HuGene arrays? I have one more to analyze, which is an ST 1.0 array.
Hi Ninni,

no, you need to switch: there is an annotation database for every single platform (thanks to Jim MacDonald!). Search here: http://bioconductor.org/packages/release/BiocViews.html#___AnnotationData

The 1.0 one is here: http://bioconductor.org/packages/release/data/annotation/html/hugene10sttranscriptcluster.db.html
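One way to keep the platform-to-package pairing explicit when analyzing several array types is a small lookup table. The sketch below uses only the two package names mentioned in this thread; the platform labels and the helper `pick_annotation_db` are hypothetical names chosen for illustration.

```r
# Lookup table built only from the package names mentioned in this thread;
# the platform labels used as keys are hypothetical
annotation_db <- c(
  "HuGene 2.0 ST" = "hugene20sttranscriptcluster.db",
  "HuGene 1.0 ST" = "hugene10sttranscriptcluster.db"
)

# Hypothetical helper: return the annotation package name for a platform label
pick_annotation_db <- function(platform) {
  if (!platform %in% names(annotation_db)) {
    stop("No annotation package registered for platform: ", platform)
  }
  annotation_db[[platform]]
}

pkg <- pick_annotation_db("HuGene 1.0 ST")
# The package could then be loaded by name (requires it to be installed):
# library(pkg, character.only = TRUE)
```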
Thank you, Bernd! and of course Jim!! :)
Hello,

can you give me an example of a pipeline that uses rma(rawData, target='probeset') for exon-level summarization?

Thanks
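A minimal, untested sketch of such a pipeline, following the same steps as the transcript-cluster workflow earlier in the thread. It assumes a HuGene 2.0 ST array and assumes that hugene20stprobeset.db is the matching probeset-level annotation package; the function name `run_probeset_pipeline` and its argument are hypothetical. The code is wrapped in a function so that nothing runs until CEL files are supplied.

```r
# Hypothetical wrapper; needs oligo, Biobase, AnnotationDbi and
# hugene20stprobeset.db to be installed when it is actually called
run_probeset_pipeline <- function(cel_files) {
  # 1. Read the CEL files
  raw <- oligo::read.celfiles(cel_files)

  # 2. RMA summarized at the probeset (roughly exon) level
  #    instead of the default "core" transcript-cluster level
  eset <- oligo::rma(raw, target = "probeset")

  # 3. Annotate probesets; multiVals = "first" keeps one value per probeset
  ids <- Biobase::featureNames(eset)
  db <- hugene20stprobeset.db::hugene20stprobeset.db
  annot <- data.frame(
    probeset = ids,
    symbol = AnnotationDbi::mapIds(db, keys = ids, column = "SYMBOL",
                                   keytype = "PROBEID", multiVals = "first"),
    ensembl = AnnotationDbi::mapIds(db, keys = ids, column = "ENSEMBL",
                                    keytype = "PROBEID", multiVals = "first"),
    stringsAsFactors = FALSE
  )

  # Return both the expression data and the annotation table
  list(eset = eset, annotation = annot)
}
```

Calling it would look like `res <- run_probeset_pipeline(<character vector of celfiles>)`; many probesets on these arrays have no gene mapping, so expect NA entries in the annotation table.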