pd.hugene.1.0.st.v1

Mark Robinson ★ 1.1k

Hi all.

I wonder if it makes more sense to have the *transcript* version of this package, instead of the *probeset* version, available when you install via:

source("http://bioconductor.org/biocLite.R")
biocLite("pd.hugene.1.0.st.v1")

It seems like, as a default, more people would want gene-level summaries for these arrays ... especially since ~200k (~80%) of the probesets have 3 or fewer probes.

Of course I (and everyone around the world) could build this package locally from scratch using the transcript CSV, but it seems like there would be enough demand to make it available directly from BioC. Just a thought. Does anyone agree?

Or am I missing something that will allow me to do gene-level analysis from this package?

My session is below.

Thanks in advance.
Mark

----------------------
mac1618:Desktop mrobinson$ wc -l HuGene-1_0-st-v1.na29.*.csv
 257449 HuGene-1_0-st-v1.na29.hg18.probeset.csv
  33317 HuGene-1_0-st-v1.na29.hg18.transcript.csv
----------------------

----------------------
> library(oligo)
Loading required package: oligoClasses
Loading required package: Biobase

Welcome to Bioconductor

  Vignettes contain introductory material. To view, type
  'openVignette()'. To cite Bioconductor, see
  'citation("Biobase")' and for packages 'citation(pkgname)'.

Loading required package: preprocessCore
Welcome to oligo version 1.8.1
> cf <- dir(celPath,"CEL")
> fs <- read.celfiles( file.path(celPath,cf) )
Loading required package: pd.hugene.1.0.st.v1
Loading required package: RSQLite
Loading required package: DBI
Platform design info loaded.
Reading in : rawData/cell_line/HuGene-1_0-st-v1//cancer1.CEL
Reading in : rawData/cell_line/HuGene-1_0-st-v1//cancer2.CEL
Reading in : rawData/cell_line/HuGene-1_0-st-v1//normal1.CEL
Reading in : rawData/cell_line/HuGene-1_0-st-v1//normal2.CEL
> rmaOligo <- oligo::rma(fs)
Background correcting
Normalizing
Calculating Expression
> dmOligo <- exprs(rmaOligo)
> dim(rmaOligo)
Features  Samples
  253002        4
> sessionInfo()
R version 2.9.0 (2009-04-17)
i386-apple-darwin8.11.1

locale:
en_AU.UTF-8/en_AU.UTF-8/C/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] pd.hugene.1.0.st.v1_2.4.1 RSQLite_0.7-1
[3] DBI_0.2-4                 oligo_1.8.1
[5] preprocessCore_1.6.0      oligoClasses_1.6.0
[7] Biobase_2.4.1

loaded via a namespace (and not attached):
[1] affxparser_1.15.6 affyio_1.12.0     Biostrings_2.12.1 IRanges_1.2.2
[5] splines_2.9.0
----------------------

------------------------------
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robinson at garvan.org.au
e: mrobinson at wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852

Written 14.7 years ago by Mark Robinson; updated 14.7 years ago by Vincent J. Carey, Jr.

Vincent J. Carey, Jr. 6.7k

On Fri, Jul 31, 2009 at 12:48 AM, Mark Robinson <mrobinson at wehi.edu.au> wrote:
> Hi all.
>
> I wonder if it makes more sense to have the *transcript* version of this
> package, instead of the *probeset* version available when you install via:

This merits further discussion. Note that under the current approach you can obtain the transcript cluster indices for summarization using fData on the output of rma:

> class(tismix)
[1] "GeneFeatureSet"
attr(,"package")
[1] "oligoClasses"
> class(tismixRMA)
[1] "ExpressionSet"
attr(,"package")
[1] "Biobase"
> fData(tismixRMA)[1:4,]
         fsetid  exon_id transcript_cluster_id level crosshyb_type chrom
7896737 7896737 96595542               7896736    NA             3     1
7896739 7896739 96595544               7896738    NA             3     1
7896741 7896741 96595546               7896740    NA             3     1
7896743 7896743 96595548               7896742    NA             3     1
        accessions
7896737 <NA>
7896739 <NA>
7896741 BC136848,BC136907,ENST00000318050,ENST00000326183,ENST00000335137,NM_001004195,NM_001005240,NM_001005484
7896743 BC118988,ENST00000279067
> dim(fData(tismixRMA))
[1] 253002      7
> dim(exprs(tismixRMA))
[1] 253002     33

Annotation packages are available at both the probeset and transcript cluster levels, thanks to folks at City of Hope (e.g., http://www.bioconductor.org/packages/release/data/annotation/html/hugene10sttranscriptcluster.db.html).
--
Vincent Carey, PhD
Biostatistics, Channing Lab
617 525 2265

Posted 14.7 years ago by Vincent J. Carey, Jr.
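The mapping Carey describes (each probeset row of the rma output carries a transcript_cluster_id in fData) can be sketched in plain R. This is a minimal illustration under stated assumptions, not oligo's own method: it assumes only that fData() of the probeset-level result has fsetid and transcript_cluster_id columns, as in the printout above, and it hard-codes those four printed rows so it runs without the annotation package. The names fd, byCluster and nPerCluster are hypothetical.

```r
# Hypothetical stand-in for fData(tismixRMA): the fsetid and
# transcript_cluster_id values copied from the four rows printed above.
fd <- data.frame(
  fsetid = c("7896737", "7896739", "7896741", "7896743"),
  transcript_cluster_id = c("7896736", "7896738", "7896740", "7896742"),
  stringsAsFactors = FALSE
)
rownames(fd) <- fd$fsetid

# Group probeset (row) identifiers by the transcript cluster they map to.
# With the real object you would start from fd <- fData(tismixRMA) instead.
byCluster <- split(rownames(fd), fd$transcript_cluster_id)

# Number of probesets mapping to each transcript cluster
# (sapply/length rather than lengths() so it also works on old R versions).
nPerCluster <- sapply(byCluster, length)
print(nPerCluster)
```

In these four printed rows every probeset belongs to a different cluster, so each count is 1. Grouping rows this way only identifies which probesets belong to which transcript cluster; it is not equivalent to transcript-level RMA, which summarizes all of a cluster's probes together rather than combining probeset-level summaries.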
