How to do Affy ST array analysis
Guest User (Ninni Nahm), 7 May 2014
Hi all! I feel a little stupid, but I have been searching for two days now (maybe I'm searching wrong?!) and could not figure this out.

I want to analyze a Human Gene ST array. I know there is the oligo package, and I found the annotation package pd.hugene.2.0.st, but I do not know how to carry out the steps. I am used to the affy package and affy pipelines. All I find when searching are instructions for making your own annotation package, which I think is unnecessary because pd.hugene.2.0.st already exists. Or am I wrong? Somehow I can't use it the way I use, for example, the hgu133a.db package, which provides me with the annotations. I'm really lost...

I want to do:
- probe-level analysis (similar to affyPLM)
- RMA normalization (oligo does this somehow, I think)
- filter out control probes (as one does with affy by removing the AFFX probe sets on hgu133a)
- annotation of probe sets (normally I would use an IQR filter to get unique Entrez IDs, but how do I do this with an ST array?)

I know there is something about probes vs. transcripts, and "core", to be aware of, but I cannot connect the steps into a workflow.

I would be so happy if someone helped me or pointed me to the right docs. (The oligo user guide is not very helpful for me because I still don't understand what to use with what, and when...) Sorry!

Thanks!
Ninni
Bernd Klaus (Germany), 7 May 2014
Hi Ninni,

I guess a very simple workflow would be:

1. Read the CEL files:

    library(oligo)
    ## "cel_dir" is a hypothetical directory holding your CEL files
    celFiles <- list.celfiles("cel_dir", full.names = TRUE)
    rawData <- read.celfiles(celFiles)

2. Perform RMA and get "transcript cluster"-level summaries back, using only the "core" genes (the "safely" annotated genes according to Affymetrix). This is the default in oligo:

    Eset <- rma(rawData, target = "core")

3. Load the annotation package and annotate the transcript clusters with some of the information it contains:

    ## load the annotation package (this also attaches AnnotationDbi: toTable, mappedkeys)
    library(hugene20sttranscriptcluster.db)

    annotateGene <- function(db, what, missing = NA, eset = Eset) {
      ## table of the transcript clusters that this map actually covers
      tab <- toTable(db[intersect(featureNames(eset), mappedkeys(db))])
      ## row of each feature in that table (NA if unannotated); match() takes
      ## the first hit, so a cluster mapped to several genes gets the first one
      mt <- match(featureNames(eset), tab$probe_id)
      ifelse(is.na(mt), missing, tab[[what]][mt])
    }

    fData(Eset)$symbol   <- annotateGene(hugene20sttranscriptclusterSYMBOL,   "symbol")
    fData(Eset)$genename <- annotateGene(hugene20sttranscriptclusterGENENAME, "gene_name")
    fData(Eset)$ensembl  <- annotateGene(hugene20sttranscriptclusterENSEMBL,  "ensembl_id")

4. After that, keep only the transcript clusters that have an Ensembl gene ID (for example).

Hope that helps,
Bernd
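A minimal base-R sketch of steps 3-4, plus the IQR-based reduction to one row per gene that the question asks about. It runs on a small made-up matrix, so it needs no Bioconductor packages; `expr`, `annoTab`, and `lookupAnno()` are hypothetical stand-ins for `exprs(Eset)`, the output of `toTable(hugene20sttranscriptclusterENSEMBL)`, and the `annotateGene()` helper. Keeping the highest-IQR transcript cluster per gene is one common choice (similar in spirit to the duplicate removal in genefilter's `nsFilter()`), not the only one.

```r
## Toy data (hypothetical): 4 transcript clusters x 4 samples of log2 expression.
## In a real analysis this would be exprs(Eset).
expr <- matrix(
  c(5, 6, 7, 8,
    2, 9, 3, 8,
    4, 4, 4, 4,
    1, 2, 1, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("tc1", "tc2", "tc3", "tc4"), paste0("s", 1:4))
)

## Toy annotation table with the same columns toTable() returns for the
## ENSEMBL map (probe_id, ensembl_id); tc3 is deliberately unannotated.
annoTab <- data.frame(
  probe_id   = c("tc1", "tc2", "tc4"),
  ensembl_id = c("GENE_A", "GENE_A", "GENE_B"),
  stringsAsFactors = FALSE
)

## Same match()/ifelse() logic as annotateGene(): one value per row of expr,
## `missing` where the transcript cluster has no annotation.
lookupAnno <- function(ids, tab, what, missing = NA) {
  mt <- match(ids, tab$probe_id)
  ifelse(is.na(mt), missing, tab[[what]][mt])
}
ensembl <- lookupAnno(rownames(expr), annoTab, "ensembl_id")

## Step 4: keep only transcript clusters with a gene ID.
keep    <- !is.na(ensembl)
expr    <- expr[keep, , drop = FALSE]
ensembl <- ensembl[keep]

## One row per gene: rank rows by IQR across samples (largest first) and keep
## the first occurrence of each gene ID.
iqrs <- apply(expr, 1, IQR)
ord  <- order(iqrs, decreasing = TRUE)
best <- ord[!duplicated(ensembl[ord])]
exprUnique <- expr[best, , drop = FALSE]
rownames(exprUnique) <- ensembl[best]

exprUnique
```

Here tc3 is dropped as unannotated, and of the two clusters mapped to GENE_A, tc2 (IQR 5.5) is kept over tc1 (IQR 1.5). With the real ExpressionSet, the same logical and index vectors can subset `Eset` directly, e.g. `Eset[keep, ]`.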
Thank you! That was very, very helpful! Can I use the hugene20sttranscriptcluster.db package for all Human Gene arrays? I have one more to analyze, which is an ST 1 array.

Best,
Ninni
Hi Ninni,

No, you need to switch: there is an annotation database for every single platform (thanks to Jim MacDonald!). Search here:

http://bioconductor.org/packages/release/BiocViews.html#___AnnotationData

The 1.0 one is here:

http://bioconductor.org/packages/release/data/annotation/html/hugene10sttranscriptcluster.db.html

Best wishes,
Bernd
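Since each Human Gene ST version needs its own annotation package, one way to avoid loading the wrong one is to derive it from the platform design package that oligo records for the raw data. The sketch below is a hypothetical helper, `annoPkgFor()`, covering only the two platforms discussed in this thread; the pd package names (`pd.hugene.1.0.st.v1`, `pd.hugene.2.0.st`) are the usual Bioconductor ones but should be checked against `annotation(rawData)` in your own session.

```r
## Hypothetical helper: map a platform design (pd.*) package name to the
## matching transcript-cluster annotation package. Only the two Human Gene ST
## platforms discussed in this thread are listed; extend as needed.
annoPkgFor <- function(pdName) {
  stopifnot(is.character(pdName), length(pdName) == 1)
  known <- c(
    "pd.hugene.1.0.st.v1" = "hugene10sttranscriptcluster.db",
    "pd.hugene.2.0.st"    = "hugene20sttranscriptcluster.db"
  )
  pkg <- unname(known[pdName])  # NA when the platform is not in the table
  if (is.na(pkg)) {
    stop("No transcript-cluster annotation package recorded for ", pdName)
  }
  pkg
}

annoPkgFor("pd.hugene.2.0.st")
annoPkgFor("pd.hugene.1.0.st.v1")
```

In a real session this could be used as `library(annoPkgFor(annotation(rawData)), character.only = TRUE)`; an unknown platform stops with an error instead of silently loading an annotation package for a different array.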
Thank you, Bernd! And of course Jim!! :)
@benhrifoussama-8085

Hello,

Can you give me an example of a pipeline that uses rma(rawData, target = 'probeset') for exon-level summarization?

Thanks
