How to do Affy ST array analysis
Guest User (@guest-user-4897)
Hi all!

I am feeling a little bit stupid, but I have been searching for two days now (maybe I'm searching wrong?!) and could not figure it out. I want to analyze a Human Gene ST array. I know that there is the oligo package, and I found the package pd.hugene.2.0.st, but I do not know how to do the steps. I am used to the affy package and affy pipelines.

All I find when searching for solutions are ways to make your own annotation package. That is not necessary, I think, because I found pd.hugene.2.0.st. Or am I wrong? Somehow I can't use it the same way as, for example, the hgu133a.db package that provides me the annotations.

I'm really lost...

I want to do:

- probe-level analysis (similar to affyPLM)
- RMA normalization (somehow oligo does this, I think)
- filtering of control probes (as one does with affy: AFFX, for hgu133a)
- annotation of probesets (normally, I would use the IQR filter to get unique Entrez IDs, but how do I do this with the ST array?)

I know that there is something about probe and transcript levels, and "core", to be aware of, but I cannot connect the workflow.

I would be so happy if someone helped me or pointed me to the right docs. (The oligo user guide is not so helpful for me, because I still don't understand what to do with what and when...) Sorry!

Thanks!
Ninni
Bernd Klaus (@bernd-klaus-6281)
Germany
Hi Ninni,

I guess a very simple workflow would be:

1. Read the CEL files:

    library(oligo)
    ## celFiles: character vector of CEL file paths (the directory is a placeholder)
    celFiles <- list.celfiles("path/to/cel/dir", full.names = TRUE)
    rawData <- read.celfiles(celFiles)

2. Perform RMA and get back data summarized at the "transcript cluster" level, using only the "core" genes (genes that Affymetrix considers "safely" annotated). This is the default in oligo:

    Eset <- rma(rawData, target = "core")

3. Load the annotation package and annotate the transcript clusters with some of the information contained in that package:

    ## load annotation package
    library(hugene20sttranscriptcluster.db)

    ## Look up column `what` of map `db` for every feature of `eset`;
    ## features without a mapping get `missing`. If a feature maps to
    ## several values, match() keeps the first one.
    annotateGene <- function(eset, db, what, missing = NA) {
      tab <- toTable(db[intersect(featureNames(eset), mappedkeys(db))])
      mt <- match(featureNames(eset), tab$probe_id)
      ifelse(is.na(mt), missing, tab[[what]][mt])
    }

    fData(Eset)$symbol   <- annotateGene(Eset, hugene20sttranscriptclusterSYMBOL, "symbol")
    fData(Eset)$genename <- annotateGene(Eset, hugene20sttranscriptclusterGENENAME, "gene_name")
    fData(Eset)$ensembl  <- annotateGene(Eset, hugene20sttranscriptclusterENSEMBL, "ensembl_id")

4. After that, keep only the transcript clusters that have an Ensembl gene ID (for example).

Hope that helps,

Bernd
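Step 4, together with the IQR filter Ninni mentions for getting one row per gene, can be sketched in base R. This is a minimal sketch on made-up toy data, not part of Bernd's answer: the matrix `expr` stands in for `exprs(Eset)`, the data frame `tab` mimics a `toTable()` result with `probe_id`/`ensembl_id` columns, and the helper `collapseByIQR()` is a hypothetical name. For each gene ID it keeps the transcript cluster with the largest interquartile range across arrays.

```r
## Toy stand-in for exprs(Eset): 5 transcript clusters x 4 arrays (made-up values)
expr <- matrix(c(5, 6, 7, 8,
                 5, 5, 5, 6,
                 9, 2, 8, 1,
                 4, 4, 4, 4,
                 7, 3, 6, 2),
               nrow = 5, byrow = TRUE,
               dimnames = list(paste0("tc", 1:5), paste0("array", 1:4)))

## Toy stand-in for a toTable() result of an ENSEMBL map; tc4 has no mapping
tab <- data.frame(probe_id   = c("tc1", "tc2", "tc3", "tc5"),
                  ensembl_id = c("GENE_A", "GENE_A", "GENE_B", "GENE_C"),
                  stringsAsFactors = FALSE)

## Same match() idea as annotateGene(): NA where a feature has no mapping
ensembl <- tab$ensembl_id[match(rownames(expr), tab$probe_id)]

## Hypothetical helper: drop unannotated rows (step 4), then keep the
## row with the largest IQR for each gene ID
collapseByIQR <- function(expr, geneIds) {
  stopifnot(nrow(expr) == length(geneIds))
  keep <- !is.na(geneIds)
  expr <- expr[keep, , drop = FALSE]
  geneIds <- geneIds[keep]
  iqr <- apply(expr, 1, IQR)                    # per-row interquartile range
  ord <- order(iqr, decreasing = TRUE)          # most variable rows first
  best <- sort(ord[!duplicated(geneIds[ord])])  # first row seen per gene = max IQR
  list(expr = expr[best, , drop = FALSE], geneIds = geneIds[best])
}

res <- collapseByIQR(expr, ensembl)
rownames(res$expr)  # "tc1" "tc3" "tc5" (tc2 loses to tc1, tc4 is unannotated)
```

With real data, something like `res <- collapseByIQR(exprs(Eset), fData(Eset)$ensembl)` followed by `Eset[rownames(res$expr), ]` should give the filtered ExpressionSet. Keeping the most variable row per gene is one common convention, not the only possible choice.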
Thank you! That was very, very helpful! Can I use the hugene20sttranscriptcluster.db package for all HuGene arrays? I have one more to analyze, which is an ST 1.0 array.

Best,
Ninni
Hi Ninni,

No, you need to switch: there is an annotation database for every single platform (thanks to Jim MacDonald!). Search here:

http://bioconductor.org/packages/release/BiocViews.html#___AnnotationData

The 1.0 one is here:

http://bioconductor.org/packages/release/data/annotation/html/hugene10sttranscriptcluster.db.html

Best wishes,

Bernd
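Because each platform has its own package, it can help to derive the annotation package name from the platform design (pd.*) package that oligo records for the raw data. This is a hypothetical sketch: `annotationDbFor` and `pickAnnotationDb()` are illustrative names, and the lookup table only covers the two HuGene ST platforms discussed in this thread (the pd names are the ones I believe oligo uses for these arrays; check `annotation(rawData)` on your own data).

```r
## Hypothetical lookup: platform design (pd.*) package -> transcript-cluster .db package.
## Only the two HuGene ST platforms from this thread are listed; extend as needed.
annotationDbFor <- c(
  "pd.hugene.1.0.st.v1" = "hugene10sttranscriptcluster.db",
  "pd.hugene.2.0.st"    = "hugene20sttranscriptcluster.db"
)

## Return the .db package name for one pd package name, or stop if unknown
pickAnnotationDb <- function(pdName) {
  stopifnot(is.character(pdName), length(pdName) == 1)
  db <- unname(annotationDbFor[pdName])
  if (is.na(db)) {
    stop("no transcript-cluster .db package recorded for ", pdName)
  }
  db
}

pickAnnotationDb("pd.hugene.1.0.st.v1")  # "hugene10sttranscriptcluster.db"
```

With real data, `pickAnnotationDb(annotation(rawData))` should give a name that can be passed to `library(..., character.only = TRUE)`. The map objects inside then carry that package's prefix (e.g. `hugene10sttranscriptclusterSYMBOL` instead of `hugene20sttranscriptclusterSYMBOL`), so the `annotateGene()` calls need to be adjusted accordingly.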
Thank you, Bernd! And of course Jim!! :)
@benhrifoussama-8085

Hello,

Can you give me an example of a pipeline where I use rma(rawData, target = 'probeset') for exon-level summarization?

Thanks
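The thread contains no answer to this, so here is only a minimal sketch, assuming oligo and the matching pd.* package are installed and that `celFiles` is a character vector of CEL file paths (a placeholder). The wrapper name `runProbesetRMA()` is hypothetical; the only change from the transcript-cluster workflow earlier in the thread is `target = "probeset"` in `rma()`, which summarizes to probeset IDs (roughly exon level) instead of transcript clusters.

```r
## Hypothetical wrapper: RMA summarized at the probeset (exon-like) level.
## oligo and the matching pd.* package are only needed when it is called.
runProbesetRMA <- function(celFiles) {
  rawData <- oligo::read.celfiles(celFiles)
  ## target = "probeset" instead of "core": one row per probeset ID
  oligo::rma(rawData, target = "probeset")
}

## Example call (placeholder path):
## esetPs <- runProbesetRMA(oligo::list.celfiles("path/to/cel/dir", full.names = TRUE))
```

Probeset IDs are not transcript-cluster IDs, so the transcriptcluster .db packages will not annotate them; a probeset-level package such as hugene20stprobeset.db (for HuGene 2.0 ST) would be needed instead, assuming one exists for your platform.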