Filtering pmSequence based on probe target level for HTA 2.0 arrays

Stephen Piccolo

List members,

I am working with some Affymetrix HTA 2.0 arrays. I have installed the draft annotation package described here: http://grokbase.com/t/r/bioconductor/1428394w2d/bioc-draft-support-for-hta-2-0-with-oligo

I am using the following commands from the oligo package to extract intensity values and PM sequences. However, I am running into a problem because oligo::pmSequence() doesn't let me specify a target probe level for these arrays. By default, oligo::pm() uses the "core" probes, whereas oligo::pmSequence() only lets me use the "probeset" probes. For the ST arrays, in contrast, I am able to specify the target.

    library(oligo)
    affyExpressionFS <- read.celfiles(celFilePath)
    pint  <- oligo::pm(affyExpressionFS, target = "core")          # core-level PM intensities
    pmSeq <- oligo::pmSequence(affyExpressionFS, target = "core")  # fails: target is not accepted

Below is the error message I get.

    Loading required package: pd.hta.2.0
    Loading required package: RSQLite
    Loading required package: DBI
    Platform design info loaded.
    Reading in : testInputData/HTA2.CEL.gz
    Error in { : task 1 failed - "unused argument (target = "probeset")"

Below is my session info. Any help would be appreciated.
    R version 3.1.0 (2014-04-10)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] parallel  methods   stats     graphics  grDevices utils     datasets
    [8] base

    other attached packages:
     [1] pd.hta.2.0_3.8.0    RSQLite_0.11.4      DBI_0.2-7
     [4] GEOquery_2.30.1     sva_3.10.0          mgcv_1.8-2
     [7] nlme_3.1-117        corpcor_1.6.6       foreach_1.4.2
    [10] oligo_1.28.2        Biostrings_2.32.1   XVector_0.4.0
    [13] IRanges_1.22.10     Biobase_2.24.0      oligoClasses_1.26.0
    [16] BiocGenerics_0.10.0

    loaded via a namespace (and not attached):
     [1] affxparser_1.36.0     affyio_1.32.0         BiocInstaller_1.14.2
     [4] bit_1.1-12            codetools_0.2-8       compiler_3.1.0
     [7] ff_2.2-13             GenomeInfoDb_1.0.2    GenomicRanges_1.16.4
    [10] grid_3.1.0            iterators_1.0.7       lattice_0.20-29
    [13] Matrix_1.1-4          preprocessCore_1.26.1 RCurl_1.95-4.3
    [16] splines_3.1.0         stats4_3.1.0          XML_3.98-1.1
    [19] zlibbioc_1.10.0

Regards,
-Steve

Stephen Piccolo, Ph.D.
Postdoctoral Research Associate

Affiliations:
Department of Pharmacology and Toxicology, University of Utah
Division of Computational Biomedicine, Boston University School of Medicine
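For context on the error above: R reports `unused argument` when an S4 generic dispatches to a method whose formal arguments do not include an argument the caller passed. The sketch below reproduces that mechanism using only the base `methods` package. The classes and generic (`ToyFeatureSet`, `ToyHTAFeatureSet`, `toyPmSequence`) are hypothetical toys, not part of oligo. The subclass defines no method of its own and so inherits its parent's method. That method declares only `object`, so `setMethod()` wraps its body in a `.local()` helper that rejects `target`.

```r
library(methods)

# Hypothetical toy classes, not oligo's: a parent "feature set" class and a
# subclass that defines no pmSequence-like method of its own.
setClass("ToyFeatureSet", slots = c(name = "character"))
setClass("ToyHTAFeatureSet", contains = "ToyFeatureSet")

# A generic whose signature is (object, ...), like oligo's pmSequence().
setGeneric("toyPmSequence", function(object, ...) standardGeneric("toyPmSequence"))

# The parent-class method declares only `object`. Because the generic also has
# `...`, setMethod() rewrites it as { .local <- <this function>; .local(object, ...) },
# so any extra named argument is forwarded to .local() and rejected there.
setMethod("toyPmSequence", "ToyFeatureSet", function(object) {
  paste("probeset-level sequences for", object@name)
})

hta <- new("ToyHTAFeatureSet", name = "HTA")

# Dispatch finds the inherited ToyFeatureSet method; with no extra argument it works.
ok <- toyPmSequence(hta)

# Passing `target` triggers the same kind of "unused argument" error.
msg <- tryCatch(toyPmSequence(hta, target = "core"),
                error = function(e) conditionMessage(e))
print(ok)
print(msg)
```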


James W. MacDonald

Hi Steve,

It looks like pmSequence() for HTAFeatureSet objects dispatches on the FeatureSet class:

    > showMethods(pmSequence, class="FeatureSet", includeDefs = TRUE)
    Function: pmSequence (package oligo)
    object="FeatureSet"
    function (object, ...)
    {
        .local <- function (object)
        {
            pmSequence(getPlatformDesign(object))
        }
        .local(object, ...)
    }

which doesn't allow for a target argument. I haven't looked closer to see why the dispatch is off, but it appears it should use the stArrayDBPDInfo class:

    > showMethods(pmSequence)
    Function: pmSequence (package oligo)
    object="AffyGenePDInfo"
    object="AffyHTAPDInfo"
        (inherited from: object="stArrayDBPDInfo")
    object="AffySNPPDInfo"
    object="DBPDInfo"
    object="ExonFeatureSet"
    object="FeatureSet"
    object="GeneFeatureSet"
    object="HTAFeatureSet"
        (inherited from: object="FeatureSet")
    object="stArrayDBPDInfo"

We can force that dispatch by doing something like

    z <- pmSequence(getPD(dat), target="probeset")

where 'dat' is an HTAFeatureSet. But we still get more probe sequences than I would expect:

    > pmid1 <- pmindex(dat, target="core")
    > pmid2 <- pmindex(dat, target="probeset")
    > length(pmid1)
    [1] 6058440
    > length(pmid2)
    [1] 7576209

But since both pmid1 and pmid2 are ordered, I think you should be able to get the PM sequences for just the probes that will be summarized at the 'core' level by subsetting:

    > z.core <- z[pmid2 %in% pmid1,]
    > z.core
      A DNAStringSet instance of length 6056075
                width seq
          [1]      25 GATTAATCTTAAATCAGGATGATCC
          [2]      25 CAAAATCTAAACCCGGACTGTACCT
          [3]      25 CACACTATTCACACCCGCACCGAAG
          [4]      25 CCGTACCTTTCAAGGTCGGCCAAGC
          [5]      25 ACCCCTTGACTAAGGACGGTTGTTG
          ...     ... ...
    [6056071]      25 TCACCGTGTGTCGACGCCGGACACA
    [6056072]      25 AGGTTCCTGGGACCTCGTGAGTACA
    [6056073]      25 GACCCAGAGTGTAGCTCGACGACCT
    [6056074]      25 ACCACAGGTACGACACTACTAAGGA
    [6056075]      25 TGGCCTTCCGTGCATATCTGCACCT

Best,
Jim

On Wed, Aug 20, 2014 at 10:55 AM, Steve Piccolo <stephen.piccolo at hsc.utah.edu> wrote:
> List members,
>
> I am working with some Affymetrix HTA 2.0 arrays. [...]

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
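The subsetting step in Jim's answer can be illustrated without the array data. The following is a minimal base-R sketch with made-up index values, using short character strings in place of the DNAStringSet. It assumes, as the answer does, that `z[i]` is the sequence of the probe at `pmid2[i]` and that both index vectors are sorted. It also computes two counts that may help explain why `length(z.core)` can come out smaller than `length(pmid1)`: duplicated core indices, and core indices with no probeset-level sequence.

```r
# Toy stand-ins (hypothetical values) for the real objects in the answer:
#   pmid2 <- pmindex(dat, target = "probeset")
#   pmid1 <- pmindex(dat, target = "core")
#   z     <- pmSequence(getPD(dat), target = "probeset")
pmid2 <- c(3L, 7L, 10L, 15L, 21L, 30L)
z     <- c("AAAA", "CCCC", "GGGG", "TTTT", "ACGT", "TGCA")  # assumed: z[i] is the probe at pmid2[i]
pmid1 <- c(7L, 15L, 30L, 42L)

# Logical mask over pmid2: TRUE where that probe is also a core-level probe.
keep   <- pmid2 %in% pmid1
z.core <- z[keep]

# The mask preserves pmid2's order, so z.core[i] belongs to pmid2[keep][i].
# With both index vectors sorted, that is also the order of the core
# indices that have a probeset-level sequence.
stopifnot(identical(pmid2[keep], pmid1[pmid1 %in% pmid2]))

# Counts that can make length(z.core) smaller than length(pmid1):
n_dup     <- sum(duplicated(pmid1))        # core indices listed more than once
n_missing <- length(setdiff(pmid1, pmid2)) # core indices with no probeset-level sequence
print(c(z.core = length(z.core), core = length(pmid1), dup = n_dup, missing = n_missing))
```

When `pmid2` itself has no duplicates, `length(z.core)` equals `length(pmid1) - n_dup - n_missing`, which gives a quick way to account for a difference between the two lengths.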
