fRMA: Applying mouse4302frmavecs on HT_MG-430_PM Arrays
2
0
Entering edit mode
Matthias Z. ▴ 20
@matthias-z-8768
Last seen 6.8 years ago
University Medical Center of Münster, G…

Hello everybody,

I was wondering, if it is possible/advised to use the fRMA algorithm to also analyse HT_MG-430_PM arrays.

The Affymetrix website states that: "The GeneChip HT MG-430 PM Array Plate enables high-throughput expression profiling of 16, 24, or 96 samples at a time using the same industry-standard content as the GeneChip Mouse Genome 430 2.0 Array".

Hence I hope, that I can still apply the fRMA algorithm also to this type of array. However if I read in the AffyBatch in a regular manner and just provide the input.vecs to fRMA, I end up with this type of error:

#library("mouse4302frmavecs")
#data(mouse4302frmavecs)

myEset <- frma(myAffyBatch,input.vecs = mouse4302frmavecs)

Error in frmaAffyBatch(object, background, normalize, summarize, input.vecs,  : Mismatch between pmindex(object) and names of input.vecs and unable to create unique mapping.

It turns out, that the probe names do not match:

names(pmindex(myAffyBatch))[1:3]
[1] "1415670_PM_at"   "1415671_PM_at"   "1415672_PM_at"   
names(mouse4302frmavecs[["probesetSD"]])[1:3]
[1] "1415670_at" "1415671_at" "1415672_at"

The initial approach, which I considered was a little gsub manimulation of the names, but I have doubts to mess with the probe assignments too much and end up with wrongly normalized expression calls, since also the length of the vectors does not match (difference of 40 probes).

Since the ReadAffy function offers a possibility to modify the cdfname, I rather tested this (unsuccessful) approach: 

myAffyBatch <- ReadAffy(celfile.path="./data/CELfiles/", cdfname="mouse4302")
myEset <- frma(myAffyBatch,input.vecs = mouse4302frmavecs)
Error in exprs(object)[index, , drop = FALSE] : subscript out of bounds
In addition: Warning messages:
1: replacing previous import by ‘utils::tail’ when loading ‘mouse4302cdf’ 
2: replacing previous import by ‘utils::head’ when loading ‘mouse4302cdf’ 

So I would kindly ask for help, if and how I can reasonably analyse HT_MG-430_PM Arrays with fRMA.

Thanks a lot

Matthias


> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252   
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                   
[5] LC_TIME=German_Germany.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] htmg430pmcdf_2.18.0      mouse4302cdf_2.18.0      mouse4302frmavecs_1.5.0 
 [4] mouse430a2frmavecs_1.3.0 frma_1.22.0              limma_3.26.5            
 [7] affy_1.48.0              Biobase_2.30.0           BiocGenerics_0.16.1     
[10] BiocInstaller_1.20.1    

loaded via a namespace (and not attached):
 [1] AnnotationDbi_1.32.3       affxparser_1.42.0          XVector_0.10.0            
 [4] splines_3.2.3              GenomicRanges_1.22.3       zlibbioc_1.16.0           
 [7] MASS_7.3-45                IRanges_2.4.6              bit_1.1-12                
[10] foreach_1.4.3              GenomeInfoDb_1.6.1         tools_3.2.3               
[13] SummarizedExperiment_1.0.2 ff_2.2-13                  DBI_0.3.1                 
[16] iterators_1.0.8            oligoClasses_1.32.0        preprocessCore_1.32.0     
[19] oligo_1.34.2               affyio_1.40.0              S4Vectors_0.8.7           
[22] codetools_0.2-14           RSQLite_1.0.0              Biostrings_2.38.3         
[25] stats4_3.2.3              
frma microarray affiymetrix • 1.8k views
ADD COMMENT
2
Entering edit mode
@matthew-mccall-4459
Last seen 4.9 years ago
United States

Matthias,

I haven't tested this. A while back, we tested the behavior of identical probes between HGU133a and HGU133plus2 platforms and saw substantially different behavior. However, the mouse4302 and HT_MG-430_PM arrays are possibly more similar, but without some extensive testing I'm not sure. If you're interested in helping with such testing, please let me know and we can discuss how to proceed. 

Alternatively, if you have a large enough data set (and depending on your analysis plans), you could create your own frma vectors using the frmaTools package. 

Best,

Matt

ADD COMMENT
0
Entering edit mode
Matthias Z. ▴ 20
@matthias-z-8768
Last seen 6.8 years ago
University Medical Center of Münster, G…

Hello Matt,

Thanks a lot for your answer. Sadly our dataset is very small (two conditions and four biological replicates each), so the creation of a real frma vector package is probably out of the question - at least without third-party data from public repositories. 

So I would like to accept your kind offer to provide some more details about the required testing to be able to apply the mouse4302 vectors to our arrays.  Any suggestions how to achieve this successfully on a technical level are also highly appreciated:

library("mouse4302frmavecs")
data(mouse4302frmavecs)

mouse4302frmavecs_manipulated <- mouse4302frmavecs # copy

names(mouse4302frmavecs_manipulated[["probesetSD"]]) <- gsub("(^[0-9]+)_","\\1_PM_",names(mouse4302frmavecs_manipulated[["probesetSD"]]),perl=TRUE)

names(mouse4302frmavecs_manipulated[["medianSE"]]) <- gsub("(^[0-9]+)_","\\1_PM_",names(mouse4302frmavecs_manipulated[["medianSE"]]),perl=TRUE)

myEset <- frma(myAffyBatch,input.vecs = mouse4302frmavecs_manipulated)

Error in frmaAffyBatch(object, background, normalize, summarize, input.vecs,  : 
  Mismatch between pmindex(object) and names of input.vecs and unable to create unique mapping.

Unfortunately also after the replacement, there is a mismatch of 40 probes causing fRMA to quit with an error

names(pmindex(myAffyBatch))[!names(pmindex(myAffyBatch)) %in% names(mouse4302frmavecs_manipulated[["probesetSD"]])]
 [1] "AFFX-NonspecificGC10_at" "AFFX-NonspecificGC11_at" "AFFX-NonspecificGC12_at"
 [4] "AFFX-NonspecificGC13_at" "AFFX-NonspecificGC14_at" "AFFX-NonspecificGC15_at"
 [7] "AFFX-NonspecificGC16_at" "AFFX-NonspecificGC17_at" "AFFX-NonspecificGC18_at"
[10] "AFFX-NonspecificGC19_at" "AFFX-NonspecificGC20_at" "AFFX-NonspecificGC21_at"
[13] "AFFX-NonspecificGC22_at" "AFFX-NonspecificGC23_at" "AFFX-NonspecificGC24_at"
[16] "AFFX-NonspecificGC25_at" "AFFX-NonspecificGC3_at"  "AFFX-NonspecificGC4_at" 
[19] "AFFX-NonspecificGC5_at"  "AFFX-NonspecificGC6_at"  "AFFX-NonspecificGC7_at" 
[22] "AFFX-NonspecificGC8_at"  "AFFX-NonspecificGC9_at"  "AFFX-r2-TagA_at"        
[25] "AFFX-r2-TagB_at"         "AFFX-r2-TagC_at"         "AFFX-r2-TagD_at"        
[28] "AFFX-r2-TagE_at"         "AFFX-r2-TagF_at"         "AFFX-r2-TagG_at"        
[31] "AFFX-r2-TagH_at"         "AFFX-r2-TagIN-3_at"      "AFFX-r2-TagIN-5_at"     
[34] "AFFX-r2-TagIN-M_at"      "AFFX-r2-TagJ-3_at"       "AFFX-r2-TagJ-5_at"      
[37] "AFFX-r2-TagO-3_at"       "AFFX-r2-TagO-5_at"       "AFFX-r2-TagQ-3_at"      
[40] "AFFX-r2-TagQ-5_at"      

Thanks a lot for your help. Matthias

ADD COMMENT
0
Entering edit mode

Matthias,

I would start by creating a map from the htmg430pm CDF to the mouse4302 CDF. To do this you will likely need to look at the specific probe sequences using the htmg430pmprobe and mouse4302probe packages. These have the actual sequence of each probe. This is the safe way to (1) check that the probes are actually identical between platforms, and (2) create a mapping between probe ids. 

Here is a function I wrote a while back to map probes between platforms:

makeMaps <- function(chipname1, chipname2){

  require(paste(chipname1,"probe",sep=""), character.only=TRUE)
  require(paste(chipname2,"probe",sep=""), character.only=TRUE)

  probe1 <- get(paste(chipname1,"probe", sep=""))
  seq1 <- probe1$sequence
  index1 <- xy2indices(probe1$x, probe1$y, cdf=paste(chipname1,"cdf",sep=""))

  probe2 <- get(paste(chipname2,"probe", sep=""))
  seq2 <- probe2$sequence
  index2 <- xy2indices(probe2$x, probe2$y, cdf=paste(chipname2,"cdf",sep=""))

  Index <- match(seq1,seq2)
  Index1 <- which(!is.na(Index))
  Index2 <- Index[Index1]

  chipmap <- cbind(index1[Index1], index2[Index2])
  colnames(chipmap) <- c(chipname1, chipname2)
  return(chipmap)
}

And here is a function that uses the makeMaps function to convert between platforms:

convertPlatform <- function(object, new.platform){
  if(class(object) != "AffyBatch") stop("object must be of class AffyBatch.")
  if(length(new.platform) != 1 | !is.character(new.platform)){
    stop("new.platform must be a character vector of length 1.")
  }

  cdfname <- cleancdfname(cdfName(object))
  old.platform <- gsub("cdf","",cdfname)
  map <- makeMaps(new.platform, old.platform)

  tmp <- new("AffyBatch", cdfName=new.platform)
  pns <- probeNames(tmp)
  index <- unlist(pmindex(tmp))
  mIndex <- match(index,map[,1])
  if(any(is.na(mIndex))) stop("new.platform is not a subset of the original platform")
  pmIndex <- map[mIndex,2]

  require(paste(new.platform,"cdf",sep=""), character.only=TRUE)
  env <- get(paste(new.platform,"dim",sep=""))
  nc <- env$NCOL
  nr <- env$NROW
  exprs2 <- matrix(nrow=nc*nr, ncol=length(object))
  exprs2[index,] <- exprs(object)[pmIndex,]
  
  new("AffyBatch", exprs=exprs2, cdfName=new.platform, nrow=nr, ncol=nc)
}

Those functions may need to be modified to suit your specific situation, but hopefully they provide a starting point. 

It also appears that the 40 probes that are specific to the htmg430pm array are control probes (begin with AFFX). It's probably safe to remove those. 

Not to dissuade you from exploring this further, but is there a specific reason you are not using RMA? With a small data set like the one you describe, there isn't a huge benefit to using fRMA. And the results from fRMA and RMA will likely be very similar. 

Best, Matt

 

ADD REPLY
0
Entering edit mode

Hello Matt,

Thanks a lot for your help. Your convertPlatform function was exactly, what I was looking for and solved my technical problems. In the meantime I realized, that you also included it as part of frmatools package - sorry for bothering you with that issue.

Frankly speaking, the differences between fRMA and RMA are a closed book to me. I have some NGS experience, but none with arrays. We are interested single array results and the absolute expression of a particular gene set and hence planned to apply the Barcode algorithm to the data. To me it seemed that the fRMA analysis is a prerequisite to correctly apply Barcode. In the NGS field, there are lots of aligners available, but I nevertheless usually stick to the one used by the original authors of the pipeline, even though I might have other personal preferences.

Thanks again for your help

Matthias

ADD REPLY
0
Entering edit mode

You are correct -- to use the barcode algorithm, you need to preprocess your data with frma. I didn't realize this was the eventual goal.

 

 

ADD REPLY

Login before adding your answer.

Traffic: 668 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6