Weird error with raw data using function boxplot after importing Affymetrix Human ST 2.0 CEL files with oligo
1
1
Entering edit mode
svlachavas ▴ 830
@svlachavas-7225
Last seen 13 months ago
Germany/Heidelberg/German Cancer Resear…

Dear Community,

based on the code presented on a previous post, regarding the import and analysis of Affymetrix Human ST 2.0 microarrays (Preprocessing of Human Gene 2.0 ST microarrays with oligo R package and annotation options), i tried to create some exploratory diagnostic plots regarding the success of normalization, for a needed report:

library(oligo)
library(affycoretools)
library(hugene20sttranscriptcluster.db)

library(limma)

librarypd.hugene.2.0.st)

setwd(mydir)

pdat <- read.table("pdat.project.txt",header=TRUE,stringsAsFactors = FALSE) # phenotype info

celfiles = list.celfiles()

affyRaw <- read.celfiles(celfiles)

identical(colnames(affyRaw),rownames(pdat)) # need to be identical for incorporate phenotype info

[1] TRUE

pd <- AnnotatedDataFrame(data= pdat)
phenoData(affyRaw) <- pd
celfiles.rma <- rma(affy.cels, target="probeset")

But then when i tried:

par(mfrow=c(1,2))
boxplot(affyRaw, col = cols, target="core",las = 2, main = "Pre-Normalization");

Error in `$<-.data.frame`(`*tmp*`, "channel", value = integer(0)) : 
  replacement has 0 rows, data has 2


boxplot(celfiles.rma, col = cols, las = 2, main = "Post-Normalization")

Any ideas or suggestions about this weird error ?

affyRaw
GeneFeatureSet (storageMode: lockedEnvironment)
assayData: 2598544 features, 6 samples 
  element names: exprs 
protocolData
  rowNames: Eogh1_DN41.CEL Eogh1_DN42.CEL ... Eogh1_MN243.CEL (6 total)
  varLabels: exprs dates
  varMetadata: labelDescription channel
phenoData
  rowNames: Eogh1_DN41.CEL Eogh1_DN42.CEL ... Eogh1_MN243.CEL (6 total)
  varLabels: Condition Batch
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation: pd.hugene.2.0.st 

 sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=Greek_Greece.1253  LC_CTYPE=Greek_Greece.1253   
[3] LC_MONETARY=Greek_Greece.1253 LC_NUMERIC=C                 
[5] LC_TIME=Greek_Greece.1253    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] RColorBrewer_1.1-2                   pd.hugene.2.0.st_3.14.1             
 [3] DBI_0.5-1                            RSQLite_1.1-2                       
 [5] limma_3.28.21                        hugene20sttranscriptcluster.db_8.4.0
 [7] org.Hs.eg.db_3.3.0                   AnnotationDbi_1.34.4                
 [9] affycoretools_1.44.3                 oligo_1.36.1                        
[11] Biostrings_2.40.2                    XVector_0.12.1                      
[13] IRanges_2.6.1                        S4Vectors_0.10.3                    
[15] Biobase_2.32.0                       oligoClasses_1.34.0                 
[17] BiocGenerics_0.18.0                 

loaded via a namespace (and not attached):
  [1] colorspace_1.3-2              hwriter_1.3.2                
  [3] biovizBase_1.20.0             htmlTable_1.9                
  [5] GenomicRanges_1.24.3          base64enc_0.1-3              
  [7] dichromat_2.0-0               affyio_1.42.0                
  [9] interactiveDisplayBase_1.10.3 codetools_0.2-15             
 [11] splines_3.3.1                 R.methodsS3_1.7.1            
 [13] ggbio_1.20.2                  geneplotter_1.50.0           
 [15] knitr_1.15.1                  Formula_1.2-1                
 [17] Rsamtools_1.24.0              annotate_1.50.1              
 [19] cluster_2.0.5                 GO.db_3.3.0                  
 [21] R.oo_1.21.0                   graph_1.50.0                 
 [23] shiny_1.0.0                   httr_1.2.1                   
 [25] GOstats_2.38.1                backports_1.0.5              
 [27] assertthat_0.1                Matrix_1.2-8                 
 [29] lazyeval_0.2.0                acepack_1.4.1                
 [31] htmltools_0.3.5               tools_3.3.1                  
 [33] gtable_0.2.0                  affy_1.50.0                  
 [35] Category_2.38.0               reshape2_1.4.2               
 [37] affxparser_1.44.0             Rcpp_0.12.9                  
 [39] gdata_2.17.0                  preprocessCore_1.34.0        
 [41] rtracklayer_1.32.2            iterators_1.0.8              
 [43] stringr_1.1.0                 mime_0.5                     
 [45] ensembldb_1.4.7               gtools_3.5.0                 
 [47] XML_3.98-1.5                  AnnotationHub_2.4.2          
 [49] edgeR_3.14.0                  zlibbioc_1.18.0              
 [51] scales_0.4.1                  BSgenome_1.40.1              
 [53] VariantAnnotation_1.18.7      BiocInstaller_1.22.3         
 [55] SummarizedExperiment_1.2.3    RBGL_1.48.1                  
 [57] memoise_1.0.0                 gridExtra_2.2.1              
 [59] ggplot2_2.2.1                 biomaRt_2.28.0               
 [61] rpart_4.1-10                  reshape_0.8.6                
 [63] latticeExtra_0.6-28           stringi_1.1.2                
 [65] gcrma_2.44.0                  genefilter_1.54.2            
 [67] foreach_1.4.3                 checkmate_1.8.2              
 [69] caTools_1.17.1                GenomicFeatures_1.24.5       
 [71] BiocParallel_1.6.6            GenomeInfoDb_1.8.7           
 [73] ReportingTools_2.12.2         bitops_1.0-6                 
 [75] lattice_0.20-34               GenomicAlignments_1.8.4      
 [77] htmlwidgets_0.8               bit_1.1-12                   
 [79] GSEABase_1.34.1               AnnotationForge_1.14.2       
 [81] GGally_1.3.0                  plyr_1.8.4                   
 [83] magrittr_1.5                  DESeq2_1.12.4                
 [85] R6_2.2.0                      gplots_3.0.1                 
 [87] Hmisc_4.0-2                   foreign_0.8-67               
 [89] survival_2.40-1               RCurl_1.95-4.8               
 [91] nnet_7.3-12                   tibble_1.2                   
 [93] KernSmooth_2.23-15            OrganismDbi_1.14.1           
 [95] PFAM.db_3.3.0                 locfit_1.5-9.1               
 [97] grid_3.3.1                    data.table_1.10.4            
 [99] digest_0.6.12                 xtable_1.8-2                 
[101] ff_2.2-13                     httpuv_1.3.3                 
[103] R.utils_2.5.0                 munsell_0.4.3 

 

microarray affymetrix Human ST arrays oligo hugene20sttranscriptcluster.db • 1.8k views
ADD COMMENT
2
Entering edit mode
@james-w-macdonald-5106
Last seen 20 hours ago
United States

Adding in your own phenoData object is usually a recipe for disaster, unless you ensure that you have done things correctly. I would bet if you do

validObject(rawData)

you will get a FALSE returned, due to some problems with the phenoData object you put in. And since boxplot uses information from the phenoData slot, that is likely where the problem is.

ADD COMMENT
0
Entering edit mode

Dear James, 

thank you for your answer and suggestions-regarding your comment about phenoData object, i procedeed as above, because of other errors initially appeared--but i have checked consistently that the rownames of the pdat object are identical and in the same order with the row names of the affy data object by the function identical above. In detail, when i initially used:

affyRaw <- read.celfiles(celfiles, phenoData = pdat)
Platform design info loaded.
Reading in : Eogh1_DN41.CEL
Reading in : Eogh1_DN42.CEL
Reading in : Eogh1_DN43.CEL
Reading in : Eogh1_MN241.CEL
Reading in : Eogh1_MN242.CEL
Reading in : Eogh1_MN243.CEL
Error in environmentIsLocked(object) : not an environment

Then, when i moved as above, adding after the phenodata info:

validObject(affyRaw)
Error in validObject(affyRaw) : invalid class "GeneFeatureSet" object: 
  'NChannelSet' varMetadata must have a 'channel' column

And finally, why the error appears in the raw data and not after normalization with boxplot ?

Thus, what is your opinion based on the above errors ?

(*Just to mention, this is not a public dataset and is based on a collaboration, that's why the phenodata data frame is added externally (if you meant so above))

 

ADD REPLY
3
Entering edit mode

That's exactly what I was talking about. It's non-trivial to add a phenoData object to any of the eSet-derived objects, because there are expectations for the format that you might not know about. In this example, you are trying to put a phenoData object into something that derives from an NChannelSet. And as you see in  your error, an NChannelSet's phenoData has to contain a varMetadata that contains a 'channel' column.

So there are two ways you can approach this issue. You can educate yourself as to the ins and outs of eSet derived classes, which sounds pretty fun now that I mention it, or you can just let oligo generate the correct phenoData object, and then modify what it made to suit. As an example, using some random data I happen to have floating around here:

> dat <- read.celfiles(list.celfiles())
Loading required package: pd.ragene.2.0.st
Loading required package: RSQLite
Loading required package: DBI
Platform design info loaded.
Reading in : G10.CEL
Reading in : G11.CEL
<snip>

So now I have a GeneFeatureSet that is officially up to snuff, and we can now modify the information in the phenoData slot. Let's see what's in the pData slot of the phenoData object:

> pData(dat)
          index
G10.CEL       1
G11.CEL       2
G12.CEL       3
G13.CEL       4

That's pretty boring. Let's say we want to add extra stuff like the treatment and type of animal.

> z <- pData(dat)
> otherdat <- data.frame(treatment = rep(c("treated","control"), each = 18), type = rep(c("KO","WT"), each = 9, times = 2))
> z <- data.frame(z, otherdat)
> head(z)
        index treatment type
G10.CEL     1   treated   KO
G11.CEL     2   treated   KO
G12.CEL     3   treated   KO
G13.CEL     4   treated   KO
G14.CEL     5   treated   KO
G15.CEL     6   treated   KO

## Note here that the row.names of z still match the file names!

> pData(dat) <- z
> validObject(dat)
[1] TRUE

And now I have a GeneFeatureSet that has the phenotypic information I care about, while remaining a valid object!

ADD REPLY
0
Entering edit mode

Dear James, 

thank you again for your answer and comprehensive example !! Although i have read extensively about eSet containers, especially with the Affy package, but it seems that i have encoutered this problem with the oligo R package for the first time-also regarding the possible choises you suggest, because i have only 6 samples, i will try your second approach from scratch for a quick way-and then perhaps check more extensively the above problem and the specific format-NChannelSet

ADD REPLY

Login before adding your answer.

Traffic: 603 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6