Question

AnnotationData package for Affymetrix HT MG-430 PM Array Plate

0

olrema • 0

@olrema-24382

Last seen 3.5 years ago

Hi!

I'm trying to run an expression analysis on data from an Affymetrix HT MG-430 PM Array Plate (Mus musculus). R asks for the "pd.ht.mg.430.pm" package, but it is not available in Bioconductor. Could I use the "pd.ht.mg.430a" package instead without causing errors later in the analysis? I tried it, and R accepts "pd.ht.mg.430a" without complaint, but I am worried that the results I get are not valid.

Hope you can help me.

RNAseq Affymetrix Mouse • 1.4k views

ADD COMMENT • link updated 3.5 years ago by Guido Hooiveld ★ 3.9k • written 3.5 years ago by olrema • 0

score 2 · Answer 1 · 2020-10-16

2

Guido Hooiveld ★ 3.9k

@guido-hooiveld-2020

Last seen 9 hours ago

Wageningen University, Wageningen, the …

No, you should indeed not do this!

From the product description: Each microarray on the GeneChip HT MG-430 PM Array Plate contains the same number of probe sets as the industry-standard GeneChip Mouse Genome 430 2.0 Array. This enables researchers to take a whole genome approach to expression profiling and smoothly scale up to process large numbers of samples. One critical design change was introduced with the GeneChip HT MG-430 PM Array Plate: Only Perfect Match (PM) probes from the cartridge design were retained while Mismatch (MM) probes were removed.

Thus, the GeneChip HT MG-430 PM array is basically a plate version of the earlier GeneChip Mouse Genome 430 2.0 cartridge array.

The package pd.ht.mg.430a is for the HT MG-430 Array Plate A, which basically is a subset of the HT MG-430 PM Array Plate you would like to analyze. See here, especially Figure 2.
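
A quick way to confirm which array your CEL files come from is to read the chip type stored in the CEL header. The sketch below assumes the affyio package is installed. `chip_type_of()` and `pd_name_from_chip()` are hypothetical helper names, and the name mapping only approximates the convention visible in this thread (HT_MG-430_PM becomes pd.ht.mg.430.pm); it is not oligo's own code.

```r
# Hypothetical helper: read the chip type recorded in a CEL file header.
# Assumes the affyio package is installed.
chip_type_of <- function(cel_file) {
  affyio::read.celfile.header(cel_file)$cdfName
}

# Hypothetical helper: approximate the pd package naming convention
# (lower case, '_' and '-' replaced by '.', prefixed with 'pd.').
pd_name_from_chip <- function(chip_type) {
  paste0("pd.", tolower(gsub("[_-]", ".", chip_type)))
}

# Example usage (the file name is a placeholder):
# chip <- chip_type_of("my_sample.cel")
# pd_name_from_chip(chip)
```

If the chip type reported for your files maps to pd.ht.mg.430.pm rather than pd.ht.mg.430a, that is a sign the two packages describe different arrays.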

But how, then, can you analyze your data? As far as I know, this is only possible with the package affy (here) in combination with a CDF package (htmg430pmcdf, here); in other words, the 'old' way rather than through oligo. oligo cannot be used because a so-called PlatformDesign (pd) package cannot be built, since the pgf and clf files required for this are not available. This was discussed here years ago (see the post/thread here), and apparently it is still the case today.
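
For illustration, the affy route could look roughly like the sketch below. It assumes that affy and htmg430pmcdf are installed (for example via BiocManager::install()) and that the CEL files are uncompressed and sit in one directory. `rma_ht_mg430pm()` is a hypothetical wrapper name, not a function from either package.

```r
# Minimal sketch: RMA normalization through affy.
# ReadAffy() looks up the CDF package that matches the chip type in the
# CEL header, so htmg430pmcdf needs to be installed for these arrays.
rma_ht_mg430pm <- function(cel_dir) {
  cel_files <- list.files(cel_dir, pattern = "\\.cel$",
                          ignore.case = TRUE, full.names = TRUE)
  if (length(cel_files) == 0) {
    stop("no CEL files found in ", cel_dir)
  }
  raw <- affy::ReadAffy(filenames = cel_files)  # AffyBatch object
  affy::rma(raw)                                # ExpressionSet, log2 scale
}

# Example usage (the directory name is a placeholder):
# eset <- rma_ht_mg430pm("path/to/cel_files")
# head(Biobase::exprs(eset))
```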

ADD COMMENT • link 3.5 years ago Guido Hooiveld ★ 3.9k

1

Hi Guido,

It is possible these days to use pdInfoBuilder and oligo for all Affy arrays, even those with only CDFs (see for example the pd.ht.mg.430a package).

Figuring out how to do such things is a non-trivial endeavor however.

> showMethods(makePdInfoPackage)
Function: makePdInfoPackage (package pdInfoBuilder)
object="AffyClariomSPDInfoPkgSeed"
object="AffyExpressionPDInfoPkgSeed"
object="AffyHTAPDInfoPkgSeed"
object="AffyMiRNAPDInfoPkgSeed"
object="AffySNPCNVPDInfoPkgSeed"
object="AffySNPCNVPDInfoPkgSeed2"
object="AffySNPPDInfoPkgSeed"
object="AffySNPPDInfoPkgSeed2"
object="AffySTPDInfoPkgSeed"
object="AffyTilingPDInfoPkgSeed"
object="GenericPDInfoPkgSeed"
object="NgsExpressionPDInfoPkgSeed"
object="NgsTilingPDInfoPkgSeed"

## Uh, let's try 'AffyExpression'

> selectMethod(makePdInfoPackage, "AffyExpressionPDInfoPkgSeed")
Method Definition:

function (object, destDir = ".", batch_size = 10000, quiet = FALSE, 
    unlink = FALSE) 
{
    msgBar()
    cat("Building annotation package for Affymetrix Expression array\n")
    cat("CDF...............: ", basename(object@cdfFile), "\n")
    cat("CEL...............: ", basename(object@celFile), "\n")
    cat("Sequence TAB-Delim: ", basename(object@tabSeqFile), 
        "\n")
    msgBar()
    chip <- chipName(object)

<snip>

## that looks like what we want.

So essentially what one would need is the CDF file, a CEL file, and the tab-delimited sequence file. We can get those using AffyCompatible and GEOquery.

> library(AffyCompatible)
>  rsrc <- NetAffxResource("jmacdon@med.umich.edu", "passwordgoeshere")
> grep("mg-430", names(rsrc), ignore.case = TRUE, value = TRUE)
[1] "HT_MG-430A"   "HT_MG-430B"   "HT_MG-430_PM"
> annos <- rsrc[["HT_MG-430_PM"]]
> sapply(affxAnnotation(annos), force)[c(4,11)]
[[1]]
affxType: CDF 
affxDescription: CDF Library File 
affxFile: AffxFile(1)

[[2]]
affxType: Probe Tabular 
affxDescription: Probe Sequences, tabular format 
affxFile: AffxFile(1)
## this will download the data to a temp dir. Move to your working dir
> readAnnotation(rsrc, annotation = affxAnnotation(annos)[[4]], content = FALSE)
> readAnnotation(rsrc, annotation = affxAnnotation(annos)[[11]], content = FALSE)
## pdInfoBuilder can't read zipped files, so unzip them first
> unzip("HT_MG-430_PM.probe_tab.zip")
> unzip("HT_MG-430_PM.cdf.zip")
## get a random CEL file (getGEOSuppFiles() is from GEOquery)
> library(GEOquery)
> getGEOSuppFiles("GSE151727")
## and gunzip one
> gunzip( "GSM4589134_SAMR1_+_Water_1.cel.gz")
## make a seed file
> seed <- new("AffyExpressionPDInfoPkgSeed", cdfFile = "HT_MG-430_PM.cdf", celFile = "GSM4589134_SAMR1_+_Water_1.cel", tabSeqFile = "HT_MG-430_PM.probe_tab")
## and build
> makePdInfoPackage(seed)
================================================================================
Building annotation package for Affymetrix Expression array
CDF...............:  HT_MG-430_PM.cdf 
CEL...............:  GSM4589134_SAMR1_+_Water_1.cel 
Sequence TAB-Delim:  HT_MG-430_PM.probe_tab 
================================================================================
Parsing file: HT_MG-430_PM.cdf... OK
Parsing file: GSM4589134_SAMR1_+_Water_1.cel... OK
Parsing file: HT_MG-430_PM.probe_tab... OK
Getting information for featureSet table... OK
Getting information for pm/mm feature tables... OK
Combining probe information with sequence information... OK
Getting PM probes and sequences... OK
Done parsing.
Creating package in ./pd.ht.mg.430.pm 
Inserting 45141 rows into table featureSet... OK
Inserting 513598 rows into table pmfeature... OK
Inserting 180 rows into table mmfeature... OK
Counting rows in featureSet
Counting rows in mmfeature
Counting rows in pmfeature
Creating index idx_pmfsetid on pmfeature... OK
Creating index idx_pmfid on pmfeature... OK
Creating index idx_fsfsetid on featureSet... OK
Saving DataFrame object for PM.
Saving DataFrame object for MM.
Done.
There were 11 warnings (use warnings() to see them)
## install
>  install.packages("pd.ht.mg.430.pm/", repos = NULL, type = "source")
Installing package into 'C:/Users/jmacdon/AppData/Roaming/R/win-library/4.0'
(as 'lib' is unspecified)
* installing *source* package 'pd.ht.mg.430.pm' ...
** using staged installation
** R
** data
** inst
** byte-compile and prepare package for lazy loading
<snip>
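
Once the package is installed, the data can be read and normalized with oligo in the usual way. The following is a sketch only: it assumes oligo and the pd.ht.mg.430.pm package built above are installed and that the CEL files are uncompressed. `rma_with_oligo()` is a hypothetical wrapper name.

```r
# Minimal sketch: RMA through oligo using the locally built pd package.
rma_with_oligo <- function(cel_dir, pkgname = "pd.ht.mg.430.pm") {
  cel_files <- list.files(cel_dir, pattern = "\\.cel$",
                          ignore.case = TRUE, full.names = TRUE)
  if (length(cel_files) == 0) {
    stop("no CEL files found in ", cel_dir)
  }
  # pkgname is normally inferred from the CEL header; it is passed
  # explicitly here only to make the dependency visible.
  raw <- oligo::read.celfiles(filenames = cel_files, pkgname = pkgname)
  oligo::rma(raw)
}

# Example usage (the directory name is a placeholder):
# eset <- rma_with_oligo("path/to/cel_files")
```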

ADD REPLY • link 3.5 years ago James W. MacDonald 65k

0

Hi James, Very good to know; I would not have been able to find this out by myself... Thanks!