I am attempting to use the xps package to perform background correction and probeset-level summarization on Affymetrix DroGene 1.1 ST arrays, but I hit an error as soon as I try to build a scheme file for these arrays as 'exon' arrays, i.e., using both the probeset and transcript annotations.
I am guessing that the issue may involve changes in the annotation files for na36 that are producing some sort of mismatch with the available PGF and CLF files, but I am not savvy enough with these formats to find the issue. Here's what the error looks like:
> library(xps)
Welcome to xps version 1.40.0
an R wrapper for XPS - eXpression Profiling System
(c) Copyright 2001-2018 by Christian Stratowa
>
> libdir <- "/Users/chrki23/Documents/Becky Data/XPS/DroGene-1_1-st"
> anndir <- "/Users/chrki23/Documents/Becky Data/XPS/DroGene-1_1-st-v1-na36-dm3-transcript-csv"
> anndir2 <- "/Users/chrki23/Documents/Becky Data/XPS/DroGene-1_1-st-v1-na36-dm3-probeset-csv"
> scmdir <- "/Users/chrki23/Documents/Becky Data/XPS/Schemes"
> ### DroGene-1_1-st-v1.na36.dm3: as exon array
> scheme.drogene11stv1.na36.dm3 <- import.exon.scheme("Scheme_DroGene11stv1", filedir = scmdir,
+ layoutfile = file.path(libdir, "DroGene-1_1-st.clf"),
+ schemefile = file.path(libdir, "DroGene-1_1-st.pgf"),
+ probeset = file.path(anndir2, "DroGene-1_1-st-v1.na36.dm3.probeset.csv"),
+ transcript = file.path(anndir, "DroGene-1_1-st-v1.na36.dm3.transcript.csv"), add.mask = T, verbose = T)
Creating new file </Users/chrki23/Documents/Becky Data/XPS/Schemes/Scheme_DroGene11stv1.root>...
Importing </Users/chrki23/Documents/Becky Data/XPS/DroGene-1_1-st/DroGene-1_1-st.clf> as <DroGene-1_1-st.cxy>...
<1416100> records imported...Finished
New dataset <DroGene-1_1-st> is added to Content...
Importing </Users/chrki23/Documents/Becky Data/XPS/DroGene-1_1-st-v1-na36-dm3-probeset-csv/DroGene-1_1-st-v1.na36.dm3.probeset.csv> as <DroGene-1_1-st.anp>...
Number of probesets is <176275>.
<176275> records read...Finished
<175929> records imported...Finished
<71433> exon annotations imported.
Importing </Users/chrki23/Documents/Becky Data/XPS/DroGene-1_1-st/DroGene-1_1-st.pgf> as <DroGene-1_1-st.scm>...
Reading data from input file...
Number of probesets is <176275>.
<176275> records read...Finished
Sorting data for probeset_type and position...
Total number of controls is <23>
Filling trees with data for probeset type: control->chip...
Number of control->chip items is <0>.
Filling trees with data for probeset type: control->bgp...
Filling trees with data for probeset type: control->affx...
Number of control->affx probesets is <167>.
Error: Number of control->affx imported <167> is not equal to number of annotated AFFX controls <75>.
Error: CDF with version/magic number </Users/chrki23/Documents/Becky Data/XPS/DroGene-1_1-st/DroGene-1_1-st.pgf> is not supported.
Error in import.exon.scheme("Scheme_DroGene11stv1", filedir = scmdir, :
  error in function ‘ImportExonSchemes’
My session info looks like this:
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X Yosemite 10.10.5

locale:
 en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
 stats graphics grDevices utils datasets methods base

other attached packages:
 xps_1.40.0

loaded via a namespace (and not attached):
 tools_3.3.2 yaml_2.2.0
The mention of 167 affx controls vs. 75 makes me think that maybe something in the pgf/clf files is indicating there are also affx->ercc spike-in controls, which I think are present in the HuGeneST arrays (where there are 167 affx controls), but these are not present in the DroGene annotation files. I don't think they are actually present on the DroGene arrays either, though it would be nice if they were. Any ideas how to resolve this problem and successfully build a DroGeneST scheme? Do I somehow have to cook up a new pgf or clf?
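In case it helps with diagnosis, here is a rough base-R sketch of how the probeset types declared in the PGF could be tallied, to see whether the file itself really lists 167 control->affx entries. It assumes the usual PGF layout (header lines starting with "#", unindented probeset lines of the form probeset_id, type, probeset_name separated by tabs, and tab-indented atom and probe lines). The function name count_pgf_types and the toy input lines are made up for illustration; this is not part of xps.

```r
# Tally top-level probeset entries per type from the lines of a PGF file.
# Assumption: header lines start with "#", atom/probe lines start with a tab,
# and probeset lines are "probeset_id<TAB>type[<TAB>probeset_name]".
count_pgf_types <- function(lines) {
  # Keep only non-empty, unindented, non-header lines (the probeset level).
  keep <- nzchar(lines) & !grepl("^[#\t]", lines)
  fields <- strsplit(lines[keep], "\t", fixed = TRUE)
  # The type is the second tab-separated field; NA if the field is missing.
  types <- vapply(
    fields,
    function(x) if (length(x) >= 2) x[2] else NA_character_,
    character(1)
  )
  table(types, useNA = "ifany")
}

# Toy input (hypothetical, only to show the expected shape of a PGF file).
toy_lines <- c(
  "#%chip_type=Toy",
  "#%header0=probeset_id\ttype\tprobeset_name",
  "1\tmain",
  "\t1",
  "\t\t101\tpm:st\t12\t25\t13\tACGT",
  "2\tcontrol->affx\tAFFX-a",
  "\t2",
  "3\tcontrol->affx\tAFFX-b",
  "\t3",
  "4\tcontrol->bgp->antigenomic",
  "\t4"
)
toy_counts <- count_pgf_types(toy_lines)
print(toy_counts)

# On the real file, something like this (libdir as defined in my session):
# count_pgf_types(readLines(file.path(libdir, "DroGene-1_1-st.pgf")))
```

If the control->affx count from the real PGF comes out at 167, the mismatch would be between the PGF and the na36 annotation CSV rather than something xps is miscounting, and comparing the probeset IDs of those entries against the CSV should show which 92 controls lack annotation.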