Question

Clariom D Human Microarray CDF file to package

jayme.rickman (@jaymerickman-11439)

I am using R 3.3.1 (in RStudio) and attempting to use 'make.cdf.package' to build a package from the CDF file Clariom_D_Human.r1.Gene.cdf, as obtained from Affymetrix, following this tutorial.

After calling 'make.cdf.package':

make.cdf.package("Clariom_D_Human.r1.Gene.CDF", cdf.path = "C:/Users/..../Clariom_D_Human.r1.Gene.CDF", compress = FALSE, species = "Homo_sapiens", package.path = pkgpath)

I get the following error:

"Error in isCDFXDA(file.path(path.expand(cdf.path), filename)):
Unable to open this file C:/Users/.../Clariom_D_Human.r1.Gene.CDF"

I did enter the full path; I have abbreviated it here only for ease of typing. Because of this error, I am unable to process or normalize the data. I have contacted both Affymetrix and the laboratory that performed the microarray, and was told, respectively, "we do not support this on third party programs" and "we couldn't get it to work either so we just used affy expression console".
Others have suggested that it is perhaps not yet supported by Bioconductor, but as I am attempting to treat it as a custom CDF, this should not be an issue.

Thank you for any help.
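One possible cause, assuming the call was made exactly as shown: in makecdfenv, cdf.path is the directory that contains the CDF, and make.cdf.package() joins it with the file name itself (the error message shows file.path(path.expand(cdf.path), filename)). Passing the full file path as cdf.path would therefore make it look for the file name twice. The path below is a hypothetical stand-in for the abbreviated one in the question, not the real location.

```r
# Hypothetical full path; substitute the real (abbreviated above) location.
full_path <- "C:/data/clariom/Clariom_D_Human.r1.Gene.CDF"

cdf_dir  <- dirname(full_path)   # folder only: "C:/data/clariom"
cdf_name <- basename(full_path)  # file only: "Clariom_D_Human.r1.Gene.CDF"

# How make.cdf.package() builds the path it opens, per the error message:
print(file.path(path.expand(cdf_dir), cdf_name))    # the intended file
print(file.path(path.expand(full_path), cdf_name))  # file name appended twice

# Sketch of the adjusted call (not run; needs makecdfenv and the CDF):
# make.cdf.package(cdf_name, cdf.path = cdf_dir, compress = FALSE,
#                  species = "Homo_sapiens", package.path = pkgpath)
```

This only accounts for the "Unable to open this file" message; the answer further down suggests makecdfenv is not the right route for this array in any case.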

Tags: simpleaffy, makecdfenv

Posted by jayme.rickman, 8.6 years ago

So I ran the code and am getting a strsplit error. As this is not a table within R but a file outside of R, I am not sure of the preferred method to coerce it to a character argument.

> seed <- new("AffyHTAPDInfoPkgSeed", pgfFile = "Clariom_D_Human.r1.pgf", clfFile = "Clariom_D_Human.r1.clf", coreMps = "Clariom_D_Human.r1.mps", transFile = "Clariom_D_Human.na36.hg38.transcript.csv", probeFile = "Clariom_D_Human.na36.hg38.probeset.csv", author = "my name", email = "my email", version = "0.0.1")
> makePdInfoPackage(seed)
============================================
Building annotation package for Affymetrix HTA Array
PGF.........: Clariom_D_Human.r1.pgf
CLF.........: Clariom_D_Human.r1.clf
Probeset....: Clariom_D_Human.na36.hg38.probeset.csv
Transcript..: Clariom_D_Human.na36.hg38.transcript.csv
Core MPS....: Clariom_D_Human.r1.mps
============================================
Parsing file: Clariom_D_Human.r1.pgf... OK
Parsing file: Clariom_D_Human.r1.clf... OK
Creating initial table for probes... OK
Creating dictionaries... OK
Parsing file: Clariom_D_Human.na36.hg38.probeset.csv... OK
Parsing file: Clariom_D_Human.r1.mps... Error in strsplit(mps[["probeset_list"]], " ") : non-character argument

edited: formatting code.

Posted by jayme.rickman, 8.6 years ago

If you want to comment on a post, please click the ADD COMMENT button and use the box that pops up. The box below is intended for people to use for answering questions.

Anyway, here's what I get:

> dir()
 [1] "Clariom_D_Human.affymetrix.report_thresholds"
 [2] "Clariom_D_Human_Analysis.r1.zip"
 [3] "Clariom_D_Human.exon_analysis_configuration"
 [4] "Clariom_D_Human.na36.hg38.probeset.csv"
 [5] "Clariom_D_Human.na36.hg38.probeset.csv.zip"
 [6] "Clariom_D_Human.na36.hg38.transcript.csv"
 [7] "Clariom_D_Human.na36.hg38.transcript.csv.zip"
 [8] "Clariom_D_Human.r1.bgp"
 [9] "Clariom_D_Human.r1.clf"
[10] "Clariom_D_Human.r1.mps"
[11] "Clariom_D_Human.r1.pgf"
[12] "Clariom_D_Human.r1.ps"
[13] "Clariom_D_Human.r1.qcc"
[14] "XTAArray_NetAffx-CSV-Files.README.txt"

> seed <- new("AffyHTAPDInfoPkgSeed", clfFile = dir()[9], pgfFile = dir()[11], coreMps = dir()[10], transFile = "Clariom_D_Human.na36.hg38.transcript.csv", probeFile = "Clariom_D_Human.na36.hg38.probeset.csv", author = "me", email = "me@mine.org", version = "0.0.1")
> makePdInfoPackage(seed)
================================================================================
Building annotation package for Affymetrix HTA Array
PGF.........: Clariom_D_Human.r1.pgf
CLF.........: Clariom_D_Human.r1.clf
Probeset....: Clariom_D_Human.na36.hg38.probeset.csv
Transcript..: Clariom_D_Human.na36.hg38.transcript.csv
Core MPS....: Clariom_D_Human.r1.mps
================================================================================
Parsing file: Clariom_D_Human.r1.pgf... OK
Parsing file: Clariom_D_Human.r1.clf... OK
Creating initial table for probes... OK
Creating dictionaries... OK
Parsing file: Clariom_D_Human.na36.hg38.probeset.csv... OK
Parsing file: Clariom_D_Human.r1.mps... OK
Creating package in ./pd.clariom.d.human
Inserting 950 rows into table chrom_dict... OK
Inserting 5 rows into table level_dict... OK
Inserting 3 rows into table type_dict... OK
Inserting 138745 rows into table core_mps... OK
Inserting 1562457 rows into table featureSet... OK
Inserting 8132393 rows into table pmfeature... OK
Inserting 711 rows into table mmfeature... OK
Counting rows in chrom_dict
Counting rows in core_mps
Counting rows in featureSet
Counting rows in level_dict
Counting rows in mmfeature
Counting rows in pmfeature
Counting rows in type_dict
Creating index idx_pmfsetid on pmfeature... OK
Creating index idx_pmfid on pmfeature... OK
Creating index idx_fsfsetid on featureSet... OK
Creating index idx_core_meta_fsetid on core_mps... OK
Creating index idx_core_fsetid on core_mps... OK
Creating index idx_mmfsetid on mmfeature... OK
Creating index idx_mmfid on mmfeature... OK
Saving DataFrame object for PM.
Saving DataFrame object for MM.
Saving NetAffx Annotation... OK
Done.
> install.packages("pd.clariom.d.human/", repos = NULL)
* installing *source* package ‘pd.clariom.d.human’ ...
** R
** data
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (pd.clariom.d.human)

> sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] pd.hta.2.0_3.12.1        pd.clariom.d.human_0.0.1 pdInfoBuilder_1.37.1
 [4] oligo_1.37.2             Biostrings_2.40.2        XVector_0.12.0
 [7] IRanges_2.6.1            S4Vectors_0.10.2         oligoClasses_1.35.0
[10] affxparser_1.44.0        RSQLite_1.0.0            DBI_0.4-1
[13] Biobase_2.32.0           BiocGenerics_0.18.0

loaded via a namespace (and not attached):
 [1] GenomicRanges_1.24.2       splines_3.3.0
 [3] zlibbioc_1.18.0            bit_1.1-12
 [5] foreach_1.4.3              GenomeInfoDb_1.8.3
 [7] tools_3.3.0                SummarizedExperiment_1.2.3
 [9] ff_2.2-13                  iterators_1.0.8
[11] preprocessCore_1.34.0      affyio_1.42.0
[13] codetools_0.2-14           BiocInstaller_1.22.3

If you are using an outdated version of R/BioC, then you need to upgrade.
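On the strsplit error from earlier in the thread: the message comes from base R's strsplit(), which rejects anything that is not a character vector, a factor included. The MPS column is read inside pdInfoBuilder, so there is nothing for the user to coerce by hand. That the column arrived as a non-character type in the older pdInfoBuilder is an assumption here; upgrading, as suggested, is what the thread relies on. A minimal reproduction of the message, using a made-up value:

```r
# Made-up stand-in for one entry of mps[["probeset_list"]]
x <- factor("PSR0001 PSR0002")

# strsplit() on a factor fails with the same message as makePdInfoPackage()
msg <- tryCatch(strsplit(x, " "), error = function(e) conditionMessage(e))
print(msg)

# The same value as character splits as intended
parts <- strsplit(as.character(x), " ")[[1]]
print(parts)
```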

Posted by James W. MacDonald, 8.6 years ago

I did upgrade, and that seemed to make it work up until installing the created package, at which point I got an error stating that the package was not compatible with R 3.3.1. (So I tried it all again in R 3.3.0 and got the same error.)

Posted by jayme.rickman, 8.6 years ago

Did you build the package using the same R/Bioc installation that you tried to install on? As I showed, this builds and installs on R-3.3.0/Bioc-3.3 without problems. You are being mysterious and neglecting to show the code and output you got, as well as sessionInfo(), so I have to resort to guesses.

Posted by James W. MacDonald, 8.6 years ago

My apologies. When I build it with R 3.3.1, that is also where I install it; likewise with R 3.3.0, I generate it and attempt to install it there. I have posted the code below, followed by my general question. I have also included a traceback and the session info.

Creating index idx_pmfsetid on pmfeature... OK
Creating index idx_pmfid on pmfeature... OK
Creating index idx_fsfsetid on featureSet... OK
Creating index idx_core_meta_fsetid on core_mps... OK
Creating index idx_core_fsetid on core_mps... OK
Creating index idx_mmfsetid on mmfeature... OK
Creating index idx_mmfid on mmfeature... OK
Saving DataFrame object for PM.
Saving DataFrame object for MM.
Saving NetAffx Annotation... OK
Done.
> install.packages("pd.clariom.d.human/", repos = NULL)
Installing package into ‘C:/Users/David C Aughton/Documents/R/win-library/3.3’
(as ‘lib’ is unspecified)
Error in install.packages : type == "both" cannot be used with 'repos = NULL'
> traceback()
4: stop("directory '", pkgdir, "' exists; use unlink=TRUE ", "to remove it, or choose another destination directory")
3: createPackage(pkgname = pkgName, destinationDir = destDir, originDir = templateDir, symbolValues = syms, quiet = quiet)
2: makePdInfoPackage(seed)
1: makePdInfoPackage(seed)
Warning messages:
1: In .HTMLsearch(query) : Unrecognized search field: title
2: In .HTMLsearch(query) : Unrecognized search field: keyword
3: In .HTMLsearch(query) : Unrecognized search field: alias
4: In .HTMLsearch(query) : Unrecognized search field: title
5: In .HTMLsearch(query) : Unrecognized search field: keyword
6: In .HTMLsearch(query) : Unrecognized search field: alias
> sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats4    parallel  stats     graphics
[5] grDevices utils     datasets  methods
[9] base

other attached packages:
 [1] pdInfoBuilder_1.37.1  oligo_1.37.2
 [3] Biostrings_2.41.4     XVector_0.13.7
 [5] IRanges_2.7.14        S4Vectors_0.11.13
 [7] oligoClasses_1.35.0   affxparser_1.45.0
 [9] RSQLite_1.0.0         DBI_0.5
[11] Biobase_2.33.3        BiocGenerics_0.19.2
[13] BiocInstaller_1.23.9  affyio_1.43.0

loaded via a namespace (and not attached):
 [1] splines_3.3.0               GenomicRanges_1.25.93
 [3] zlibbioc_1.19.0             bit_1.1-12
 [5] lattice_0.20-33             foreach_1.4.3
 [7] GenomeInfoDb_1.9.8          tools_3.3.0
 [9] SummarizedExperiment_1.3.82 grid_3.3.0
[11] ff_2.2-13                   iterators_1.0.8
[13] preprocessCore_1.35.0       Matrix_1.2-7.1
[15] codetools_0.2-14

From reading the install.packages() help page, I know that repos = NULL means the package can come from something other than a URL. Yet this is also the line of code that the error points to. (I had to remove some of the package-building output above due to character limits.)
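Separately, the traceback in that post belongs to a different failure: createPackage(), called from makePdInfoPackage(seed), stopped because the output directory already existed from an earlier build. A minimal base-R sketch of clearing it before rebuilding, assuming the default output location (the working directory) and the package name reported in the build log earlier in the thread:

```r
# Output directory left behind by an earlier makePdInfoPackage(seed) run
# (name taken from the build log; assumed to be in the working directory)
pkg_dir <- "pd.clariom.d.human"

if (dir.exists(pkg_dir)) {
  # Deletes the stale package so createPackage() no longer refuses to run
  unlink(pkg_dir, recursive = TRUE)
}
stopifnot(!dir.exists(pkg_dir))

# makePdInfoPackage(seed)   # not run here; needs pdInfoBuilder and the Affymetrix files
```

The error message itself also mentions choosing another destination directory, which avoids deleting anything.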

Posted by jayme.rickman, 8.6 years ago

Here's the part of interest.

> install.packages("pd.clariom.d.human/", repos = NULL)
Installing package into ‘C:/Users/David C Aughton/Documents/R/win-library/3.3’
(as ‘lib’ is unspecified)
Error in install.packages : type == "both" cannot be used with 'repos = NULL'

And you could have read that and said 'what is this type argument you speak of, R?'. And then looked at the help page for install.packages, which says

Usage:

     install.packages(pkgs, lib, repos = getOption("repos"),
                      contriburl = contrib.url(repos, type),
                      method, available = NULL, destdir = NULL,
                      dependencies = NA, type = getOption("pkgType"),
                      configure.args = getOption("configure.args"),
                      configure.vars = getOption("configure.vars"),
                      clean = FALSE, Ncpus = getOption("Ncpus", 1L),
                      verbose = getOption("verbose"),
                      libs_only = FALSE, INSTALL_opts, quiet = FALSE,
                      keep_outputs = FALSE, ...)

Which is suitably mysterious. I mean, what is this getOption("pkgType") business?

> getOption("pkgType")
[1] "both"

So by default the type argument is 'both'. And if you then read the blahblahblah under Binary packages in the help file, it will bore you with all this stuff about Windows and MacOS having binary and source packages and whatnot. The upshot being that you can hypothetically install either a source package (which is what you have) or a binary package (which is not what you have), and if you say repos = NULL, you can't say you have 'both' because obviously you don't. Instead, you have to say what type of package you do have, which means that you need to include a type = "source" to over-ride the default.
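Put together for the Windows install shown above, a minimal sketch. The directory name is the one from the build log; the dir.exists() guard is only there so the snippet does nothing where the package has not been built.

```r
# Source package directory written by makePdInfoPackage(seed)
pkg_path <- "pd.clariom.d.human"

# "both" on the Windows install above; "source" on a typical Linux build
print(getOption("pkgType"))

# repos = NULL installs from a local path, so state the package type explicitly
if (dir.exists(pkg_path)) {
  install.packages(pkg_path, repos = NULL, type = "source")
}
```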

Posted by James W. MacDonald, 8.6 years ago

Answer 1 (score 1), 2016-09-07

The Clariom D arrays are just HTA-3.0 arrays with a new name, and I am not sure without checking that they aren't actually HTA-2.0 arrays with a new name. Regardless, you will have to use either pdInfoBuilder/oligo or (possibly, I don't really know) xps to analyze these arrays.

For pdInfoBuilder/oligo, you first need to build a pdInfoPackage. First, download the installer package from Affymetrix and unzip it somewhere. You also need to get the probeset and transcript csv files, which you should unzip as well. Then open R in that directory and run the following (note that anything in <brackets> is something you have to fill in for yourself):

library(pdInfoBuilder)

seed <- new("AffyHTAPDInfoPkgSeed",
            pgfFile = "Clariom_D_Human.r1.pgf",
            clfFile = "Clariom_D_Human.r1.clf",
            coreMps = "Clariom_D_Human.r1.mps",
            transFile = "Clariom_D_Human.na36.hg38.transcript.csv",
            probeFile = "Clariom_D_Human.na36.hg38.probeset.csv",
            author = "<your name goes here>", email = "<your email>",
            version = "0.0.1")

makePdInfoPackage(seed)

And then wait for a while. Once it's done, you need to do

install.packages("<whatever makePdInfoPackage calls it>", repos = NULL, type = "source")

Et voila! Up and running with the cool kids. The above assumes you are on a reasonable OS (by which I mean Linux, natch). You might be able to get away with a modern version of Windows or macOS, but I avoid both for real work, so I have no real idea whether this works seamlessly on those OSes.
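Once the pd package is installed, what remains of the original question is reading and normalizing the data. A minimal oligo sketch, assuming the CEL files sit in the working directory and the package was built under the name seen in the build log earlier in the thread. read.celfiles() and rma() are oligo functions; whether default rma() settings suit these arrays is not something this thread establishes.

```r
# CEL files in the (hypothetical) working directory
cel_files <- list.files(".", pattern = "\\.cel$", ignore.case = TRUE,
                        full.names = TRUE)

have_pkgs <- requireNamespace("oligo", quietly = TRUE) &&
  requireNamespace("pd.clariom.d.human", quietly = TRUE)

if (length(cel_files) > 0 && have_pkgs) {
  # Read the arrays, pointing oligo at the locally built annotation package
  raw <- oligo::read.celfiles(filenames = cel_files,
                              pkgname = "pd.clariom.d.human")
  # RMA: background correction, quantile normalization, summarization
  eset <- oligo::rma(raw)
}
```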