Error Oligo package asking for pd.2.0 package annotation
@alantb_cederj-6768
United States

I am running microarray analyses with the same code on several experiments that used the Affymetrix Human Genome U133 Plus 2.0 Array. The code has worked fine for all of them except one: GSE28829. When I run read.celfiles, it returns the error shown below. It seems to be asking for a pd.2.0 annotation package that I cannot find anywhere. Is there a way to work around this? Thanks in advance.

library(oligo)
library(GEOquery)
library(icesTAF)
library(genefilter)
library(limma)
library(annotate)
library(hgu133plus2.db)

setwd("C:/Users/alant/Documents/R_test_IC/raw_data_dir_GSE28829")
list.files("GSE28829/CEL")
 [1] "GSM714070_ADV1.CEL.gz"  "GSM714071_ADV2.CEL.gz" 
 [3] "GSM714072_ADV3.CEL.gz"  "GSM714073_ADV4.CEL.gz" 
 [5] "GSM714074_ADV5.CEL.gz"  "GSM714075_ADV6.CEL.gz" 
 [7] "GSM714076_ADV7.CEL.gz"  "GSM714077_ADV8.CEL.gz" 
 [9] "GSM714078_ADV9.CEL.gz"  "GSM714079_ADV10.CEL.gz"
[11] "GSM714080_ADV11.CEL.gz" "GSM714081_ADV12.CEL.gz"
[13] "GSM714082_ADV13.CEL.gz" "GSM714083_ADV14.CEL.gz"
[15] "GSM714084_ADV15.CEL.gz" "GSM714085_ADV16.CEL.gz"
[17] "GSM714086_EAR1.CEL.gz"  "GSM714087_EAR2.CEL.gz" 
[19] "GSM714088_EAR3.CEL.gz"  "GSM714089_EAR4.CEL.gz" 
[21] "GSM714090_EAR5.CEL.gz"  "GSM714091_EAR6.CEL.gz" 
[23] "GSM714092_EAR7.CEL.gz"  "GSM714093_EAR8.CEL.gz" 
[25] "GSM714094_EAR9.CEL.gz"  "GSM714095_EAR10.CEL.gz"
[27] "GSM714096_EAR11.CEL.gz" "GSM714097_EAR12.CEL.gz"
[29] "GSM714098_EAR13.CEL.gz"
> celfiles <- list.files("GSE28829/CEL", full = TRUE)
> cf <- read.celfiles(celfiles)
Loading required package: pd.2.0
Attempting to obtain 'pd.2.0' from BioConductor website.
Checking to see if your internet connection works...
'getOption("repos")' replaces Bioconductor standard
repositories, see 'help("repositories", package =
"BiocManager")' for details.
Replacement repositories:
    CRAN: https://cran.rstudio.com/
Package 'pd.2.0' was not found in the BioConductor repository.
The 'pdInfoBuilder' package can often be used in situations like this.
Error in read.celfiles(celfiles) : 
  The annotation package, pd.2.0, could not be loaded.
oligo • 369 views
Guido Hooiveld ★ 3.9k
@guido-hooiveld-2020
Wageningen University, Wageningen, the …

I can reproduce your error:

> library(oligo)
> 
> filenames = list.celfiles()
> affy.data <- read.celfiles( filenames = filenames)
Loading required package: pd.2.0
Attempting to obtain 'pd.2.0' from BioConductor website.
Checking to see if your internet connection works...
Package 'pd.2.0' was not found in the BioConductor repository.
The 'pdInfoBuilder' package can often be used in situations like this.
Error in read.celfiles(filenames = filenames) : 
  The annotation package, pd.2.0, could not be loaded.
> 

The error occurs because the contents of the CEL files do not follow the specification. As a result, oligo infers that these are pd.2.0 arrays and tries to load that package, which is wrong (arrays of that type do not even exist)! If you check the GEO submission (and the publication), you will see that these arrays were actually run on the platform with ID GPL570, which corresponds to Affymetrix Human Genome U133 Plus 2.0 Arrays, abbreviated HG-U133_Plus_2. The corresponding PdInfo package is pd.hg.u133.plus.2. Note that oligo uses so-called PdInfo (probe design info) packages as annotation files.

Having a more detailed look: if you check the help page of the oligo function read.celfiles (type ?read.celfiles), you will notice that under the hood oligo uses functions from affyio to load the CEL files, and that the corresponding annotation package is loaded automatically based on the header of the CEL file (i.e. the cdfName field). In other words, the error points to something going wrong when the CEL file header is read. Let's inspect this manually:

> affyio::read.celfile.header("GSM714070_ADV1.CEL", info="full")
$cdfName
[1] "2.0"

$`CEL dimensions`
Cols Rows 
1164 1164 

$GridCornerUL
[1] 0 0

$GridCornerUR
[1] 0 0

$GridCornerLR
[1] 0 0

$GridCornerLL
[1] 0 0

$DatHeader
[1] " \024 \024 HG-U133 Plus 2.0.1sq \024 \024 \024 \024 \024 \024 \024 \024 \024 "

$Algorithm
[1] "Unknown"

$AlgorithmParameters
[1] "P1:"

$ScanDate
character(0)

>

Note that the cdfName is (just) "2.0". This is weird, because it should be the full name of the chip type used... (Also note that $DatHeader contains the term HG-U133 Plus 2.0.1sq, which points to Human Genome U133 Plus 2.0 Arrays.)

For comparison, here is the output for one of the (very old) files we generated in our lab:

> affyio::read.celfile.header("A125_01_CTR.CEL", info="full")
$cdfName
[1] "HG-U133_Plus_2"

$`CEL dimensions`
Cols Rows 
1164 1164 

$GridCornerUL
[1] 213 206

$GridCornerUR
[1] 8404  215

$GridCornerLR
[1] 8400 8395

$GridCornerLL
[1]  208 8386

$DatHeader
[1] "[12..47665]  A125_01_ctr:CLS=8609 RWS=8609 XIN=1  YIN=1  VE=30        2.0 06/25/08 11:41:44 50209050  M10   \024  \024 HG-U133_Plus_2.1sq \024  \024  \024  \024  \024 570 \024 25347.941406 \024 3.500000 \024 1.5600 \024 6"

$Algorithm
[1] "Percentile"

$AlgorithmParameters
[1] "Percentile:75;CellMargin:2;OutlierHigh:1.500;OutlierLow:1.004;AlgVersion:6.0;FixedCellSize:TRUE;FullFeatureWidth:7;FullFeatureHeight:7;IgnoreOutliersInShiftRows:FALSE;FeatureExtraction:TRUE;PoolWidthExtenstion:2;PoolHeightExtension:2;UseSubgrids:FALSE;RandomizePixels:FALSE;ErrorBasis:StdvMean;StdMult:1.00000"

$ScanDate
[1] "06/25/08 11:41:44"

>

Note that cdfName is "HG-U133_Plus_2". That is a very well-known chip type!
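To make the link between the header value and the package name concrete, here is a minimal sketch in base R. It assumes the package name is built by prefixing "pd.", lower-casing the chip name, and replacing underscores and hyphens with dots; that rule is inferred from the two names seen in this thread, and pd_name_from_chip is a hypothetical helper, not an oligo function.

```r
# Hypothetical helper (not part of oligo): approximates how a chip name
# taken from the CEL header maps to a PdInfo package name.
# Assumed rule: prefix "pd.", lower case, "_" and "-" replaced by ".".
pd_name_from_chip <- function(chip) {
  gsub("[_-]", ".", paste0("pd.", tolower(chip)))
}

pd_name_from_chip("HG-U133_Plus_2")  # "pd.hg.u133.plus.2"
pd_name_from_chip("2.0")             # "pd.2.0"
```

Under that assumption, a header that only says "2.0" leads straight to the non-existent pd.2.0 package named in the error message.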

Do the same, but now using an affxparser function:

> affxparser::readCelHeader("GSM714070_ADV1.CEL")
$filename
[1] "./GSM714070_ADV1.CEL"

$version
[1] 4

$cols
[1] 1164

$rows
[1] 1164

$total
[1] 1354896

$algorithm
[1] "Unknown"

$parameters
[1] "P1:1;CellMargin:0"

$chiptype
[1] "HG-U133 Plus 2"

$header
[1] "Cols=1164\nRows=1164\nTotalX=1164\nTotalY=1164\nOffsetX=0\nOffsetY=0\nGridCornerUL=0 0\nGridCornerUR=0 0\nGridCornerLR=0 0\nGridCornerLL=0 0\nAxis-invertX=0\nAxisInvertY=0\nswapXY=0\nDatHeader= \024 \024 HG-U133 Plus 2.1sq \024 \024 \024 \024 \024 \024 \024 \024 \024 \nAlgorithm=Unknown\nAlgorithmParameters=P1:1;CellMargin:0\n"

$datheader
[1] " \024 \024 HG-U133 Plus 2.1sq \024 \024 \024 \024 \024 \024 \024 \024 \024 "

$librarypackage
[1] ""

$cellmargin
[1] 0

$noutliers
[1] 146487

$nmasked
[1] 0

> 

Note that chiptype is "HG-U133 Plus 2".

The same, using my old file:

> affxparser::readCelHeader("A125_01_CTR.CEL")
$filename
[1] "./A125_01_CTR.CEL"

$version
[1] 4

$cols
[1] 1164

$rows
[1] 1164

$total
[1] 1354896

$algorithm
[1] "Percentile"

$parameters
[1] "Percentile:75;CellMargin:2;OutlierHigh:1.500;OutlierLow:1.004;AlgVersion:6.0;FixedCellSize:TRUE;FullFeatureWidth:7;FullFeatureHeight:7;IgnoreOutliersInShiftRows:FALSE;FeatureExtraction:TRUE;PoolWidthExtenstion:2;PoolHeightExtension:2;UseSubgrids:FALSE;RandomizePixels:FALSE;ErrorBasis:StdvMean;StdMult:1.000000"

$chiptype
[1] "HG-U133_Plus_2"

$header
[1] "Cols=1164\nRows=1164\nTotalX=1164\nTotalY=1164\nOffsetX=0\nOffsetY=0\nGridCornerUL=213 206\nGridCornerUR=8404 215\nGridCornerLR=8400 8395\nGridCornerLL=208 8386\nAxis-invertX=0\nAxisInvertY=0\nswapXY=0\nDatHeader=[12..47665]  A125_01_ctr:CLS=8609 RWS=8609 XIN=1  YIN=1  VE=30        2.0 06/25/08 11:41:44 50209050  M10   \024  \024 HG-U133_Plus_2.1sq \024  \024  \024  \024  \024 570 \024 25347.941406 \024 3.500000 \024 1.5600 \024 6\nAlgorithm=Percentile\nAlgorithmParameters=Percentile:75;CellMargin:2;OutlierHigh:1.500;OutlierLow:1.004;AlgVersion:6.0;FixedCellSize:TRUE;FullFeatureWidth:7;FullFeatureHeight:7;IgnoreOutliersInShiftRows:FALSE;FeatureExtraction:TRUE;PoolWidthExtenstion:2;PoolHeightExtension:2;UseSubgrids:FALSE;RandomizePixels:FALSE;ErrorBasis:StdvMean;StdMult:1.000000\n"

$datheader
[1] "[12..47665]  A125_01_ctr:CLS=8609 RWS=8609 XIN=1  YIN=1  VE=30        2.0 06/25/08 11:41:44 50209050  M10   \024  \024 HG-U133_Plus_2.1sq \024  \024  \024  \024  \024 570 \024 25347.941406 \024 3.500000 \024 1.5600 \024 6"

$librarypackage
[1] ""

$cellmargin
[1] 2

$noutliers
[1] 67

$nmasked
[1] 0

> 

... thus:

Somehow the CEL files available at GEO lack some important information, and as a consequence oligo attempts to download a wrong, non-existent package. This behavior can be overcome by manually specifying the chip type. See the post below.

Why this info is missing in the CEL files, and whether these CEL files may have additional issues, is another question...
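If you want to check a whole directory before calling read.celfiles, something along these lines could work. It is only a sketch: cel_cdf_names and unexpected_cdf are hypothetical helper names, and the expected chip name is assumed to be HG-U133_Plus_2. The only external call is affyio::read.celfile.header, as used above.

```r
# Hypothetical helper: read the cdfName field from each CEL header.
# Needs the affyio package, but only when the function is actually called.
cel_cdf_names <- function(files) {
  vapply(files,
         function(f) affyio::read.celfile.header(f)$cdfName,
         character(1))
}

# Hypothetical helper: names of the files whose header chip name
# differs from the expected one.
unexpected_cdf <- function(cdf_names, expected = "HG-U133_Plus_2") {
  names(cdf_names)[cdf_names != expected]
}

# Example using the two header values shown above (no files needed):
cdf <- c("GSM714070_ADV1.CEL" = "2.0",
         "A125_01_CTR.CEL"    = "HG-U133_Plus_2")
unexpected_cdf(cdf)  # "GSM714070_ADV1.CEL"
```

On real data you would call unexpected_cdf(cel_cdf_names(filenames)); any file it returns would need the chip type to be set by hand.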


... to complete my previous post:

Thus, manually setting the chiptype/PdInfo package will work:

> affy.data <- read.celfiles( filenames = filenames, pkgname = "pd.hg.u133.plus.2")
Platform design info loaded.
Reading in : GSM714070_ADV1.CEL
Reading in : GSM714071_ADV2.CEL
Reading in : GSM714072_ADV3.CEL
Reading in : GSM714073_ADV4.CEL
Reading in : GSM714074_ADV5.CEL
Reading in : GSM714075_ADV6.CEL
Reading in : GSM714076_ADV7.CEL
Reading in : GSM714077_ADV8.CEL
Reading in : GSM714078_ADV9.CEL
Reading in : GSM714079_ADV10.CEL
Reading in : GSM714080_ADV11.CEL
Reading in : GSM714081_ADV12.CEL
Reading in : GSM714082_ADV13.CEL
Reading in : GSM714083_ADV14.CEL
Reading in : GSM714084_ADV15.CEL
Reading in : GSM714085_ADV16.CEL
Reading in : GSM714086_EAR1.CEL
Reading in : GSM714087_EAR2.CEL
Reading in : GSM714088_EAR3.CEL
Reading in : GSM714089_EAR4.CEL
Reading in : GSM714090_EAR5.CEL
Reading in : GSM714091_EAR6.CEL
Reading in : GSM714092_EAR7.CEL
Reading in : GSM714093_EAR8.CEL
Reading in : GSM714094_EAR9.CEL
Reading in : GSM714095_EAR10.CEL
Reading in : GSM714096_EAR11.CEL
Reading in : GSM714097_EAR12.CEL
Reading in : GSM714098_EAR13.CEL
> affy.data
ExpressionFeatureSet (storageMode: lockedEnvironment)
assayData: 1354896 features, 29 samples 
  element names: exprs 
protocolData
  rowNames: GSM714070_ADV1.CEL GSM714071_ADV2.CEL ...
    GSM714098_EAR13.CEL (29 total)
  varLabels: exprs dates
  varMetadata: labelDescription channel
phenoData
  rowNames: GSM714070_ADV1.CEL GSM714071_ADV2.CEL ...
    GSM714098_EAR13.CEL (29 total)
  varLabels: index
  varMetadata: labelDescription channel
featureData: none
experimentData: use 'experimentData(object)'
Annotation: pd.hg.u133.plus.2 
> norm.data <- rma(affy.data)
Background correcting
Normalizing
Calculating Expression
> head(exprs(norm.data))
          GSM714070_ADV1.CEL GSM714071_ADV2.CEL GSM714072_ADV3.CEL
1007_s_at           7.476734           7.560470           7.545101
1053_at             5.231710           5.571284           5.502338
117_at              6.407551           6.456310           5.912837
121_at              8.086023           8.056792           7.715602
1255_g_at           3.334447           3.174585           3.156232
1294_at             7.727043           7.573688           7.457024
          GSM714073_ADV4.CEL GSM714074_ADV5.CEL GSM714075_ADV6.CEL
1007_s_at           7.137021           7.313099           7.320076
1053_at             5.406858           5.961533           5.719497
117_at              6.138684           6.431826           6.274274
121_at              7.423653           7.747922           7.833796
1255_g_at           3.277893           3.197367           3.195213
1294_at             7.626756           7.616204           7.706446
          GSM714076_ADV7.CEL GSM714077_ADV8.CEL GSM714078_ADV9.CEL
1007_s_at           6.757667           7.379789           7.545558
1053_at             5.464369           5.687824           5.465359
117_at              5.657739           5.846036           5.710486
121_at              7.633695           7.562925           7.985645
1255_g_at           3.268658           3.442809           3.269492
1294_at             7.274741           7.540043           7.495845
          GSM714079_ADV10.CEL GSM714080_ADV11.CEL GSM714081_ADV12.CEL
1007_s_at            7.491856            7.387317            7.309027
1053_at              5.275807            5.400409            5.496070
117_at               6.053606            5.972611            5.691696
121_at               7.753887            7.783865            7.638980
1255_g_at            3.127979            3.247063            3.233435
1294_at              7.537099            7.628460            7.412809
          GSM714082_ADV13.CEL GSM714083_ADV14.CEL GSM714084_ADV15.CEL
1007_s_at            6.981679            6.765899            6.411382
1053_at              5.661703            5.259893            5.360490
117_at               5.863908            5.927943            6.347463
121_at               7.460363            7.619349            7.376733
1255_g_at            3.355288            3.431904            3.150406
1294_at              7.673328            7.282095            7.324794
          GSM714085_ADV16.CEL GSM714086_EAR1.CEL GSM714087_EAR2.CEL
1007_s_at            6.471320           6.916683           7.272092
1053_at              5.423323           5.215497           5.387831
117_at               6.163279           5.913345           5.957130
121_at               7.463625           7.496556           7.562990
1255_g_at            3.299306           3.244415           3.290230
1294_at              6.984796           7.271448           7.249620
          GSM714088_EAR3.CEL GSM714089_EAR4.CEL GSM714090_EAR5.CEL
1007_s_at           7.290633           7.515731           7.427269
1053_at             6.111040           5.325759           5.662993
117_at              5.634361           5.845570           5.860925
121_at              7.380778           7.671676           7.660248
1255_g_at           3.294858           3.263839           3.269480
1294_at             7.162763           7.585158           7.439052
          GSM714091_EAR6.CEL GSM714092_EAR7.CEL GSM714093_EAR8.CEL
1007_s_at           7.046685           7.362381           7.537203
1053_at             5.803273           5.406180           5.713884
117_at              6.435217           5.536453           6.086993
121_at              7.200921           8.062172           7.596732
1255_g_at           3.368173           3.337614           3.249723
1294_at             7.486536           8.113033           7.569026
          GSM714094_EAR9.CEL GSM714095_EAR10.CEL GSM714096_EAR11.CEL
1007_s_at           7.467394            7.281994            7.340640
1053_at             5.389508            5.171387            5.768445
117_at              5.458502            5.751894            5.863148
121_at              7.435648            7.382841            7.493919
1255_g_at           3.268982            3.363065            3.181215
1294_at             7.275367            7.250188            7.376961
          GSM714097_EAR12.CEL GSM714098_EAR13.CEL
1007_s_at            7.311836            7.066598
1053_at              5.735633            5.617395
117_at               4.882628            5.738776
121_at               7.492598            7.106533
1255_g_at            3.228107            3.360925
1294_at              7.086117            7.022475
>
> norm.data
ExpressionSet (storageMode: lockedEnvironment)
assayData: 54675 features, 29 samples 
  element names: exprs 
protocolData
  rowNames: GSM714070_ADV1.CEL GSM714071_ADV2.CEL ...
    GSM714098_EAR13.CEL (29 total)
  varLabels: exprs dates
  varMetadata: labelDescription channel
phenoData
  rowNames: GSM714070_ADV1.CEL GSM714071_ADV2.CEL ...
    GSM714098_EAR13.CEL (29 total)
  varLabels: index
  varMetadata: labelDescription channel
featureData: none
experimentData: use 'experimentData(object)'
Annotation: pd.hg.u133.plus.2 
>
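If you process several GEO series with the same script, you could record the annotation package per platform and always pass it explicitly. The sketch below assumes a small lookup table; only the GPL570 entry comes from this thread, and gpl_to_pd, pd_for_platform and read_cel_for_platform are hypothetical names. It relies on the pkgname argument of read.celfiles shown above.

```r
# Assumed lookup from GEO platform ID to PdInfo package.
# Only the GPL570 entry is taken from this thread.
gpl_to_pd <- c(GPL570 = "pd.hg.u133.plus.2")

# Hypothetical helper: return the PdInfo package recorded for a platform,
# failing loudly if the platform is not in the table.
pd_for_platform <- function(gpl, lookup = gpl_to_pd) {
  if (!gpl %in% names(lookup)) {
    stop("No PdInfo package recorded for platform ", gpl)
  }
  unname(lookup[gpl])
}

# Hypothetical wrapper: read CEL files with an explicit annotation package
# instead of trusting the CEL header. Needs oligo and the PdInfo package.
read_cel_for_platform <- function(files, gpl) {
  pkg <- pd_for_platform(gpl)
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("Install the annotation package first, e.g. BiocManager::install(\"",
         pkg, "\")")
  }
  oligo::read.celfiles(filenames = files, pkgname = pkg)
}

pd_for_platform("GPL570")  # "pd.hg.u133.plus.2"
```

For this data set the call would then be read_cel_for_platform(filenames, "GPL570"), which is equivalent to the read.celfiles call above.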