Newbie Question - adding metadata to an analysis
Bryan H • 0
@009a9ffd
Canada

I am trying to reanalyze some old microarray data from my lab, using this tutorial: https://bioconductor.org/packages/release/workflows/vignettes/maEndToEnd/inst/doc/MA-Workflow.html

The original data was collected on an Affymetrix human genome 2.0 microarray. I have 6 .CEL files - 3 controls, 3 patients. While I am able to load the files and perform most of the analyses in the tutorial, the .CEL files lack meaningful metadata. As a result, I cannot identify specific samples or groups in the resulting analyses, nor can I get some of the data to plot properly.

Everything goes fine until step 5 of the tutorial. Here, because my files lack metadata, the command:

head(Biobase::pData(raw_data))

Gives the following output instead of a list of metadata columns I can select for additional analysis:

             index
ctrlMac1.CEL     1
ctrlMac2.CEL     2
ctrMac3.CEL      3
PtMac1.CEL       4
PtMac2.CEL       5
PtMac3.CEL       6

I have searched the documentation, but cannot figure out how to add metainformation that can then be extracted/used using the pData command. How do I go about annotating my data?

Thank you,

Bryan

Output of sessionInfo():

R version 4.4.2 (2024-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26100)

Matrix products: default


locale:
[1] LC_COLLATE=English_Canada.utf8  LC_CTYPE=English_Canada.utf8   
[3] LC_MONETARY=English_Canada.utf8 LC_NUMERIC=C                   
[5] LC_TIME=English_Canada.utf8    

time zone: America/Toronto
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] openxlsx_4.2.7.1                     genefilter_1.88.0                   
 [3] matrixStats_1.4.1                    stringr_1.5.1                       
 [5] tidyr_1.3.1                          dplyr_1.1.4                         
 [7] enrichplot_1.26.5                    pheatmap_1.0.12                     
 [9] RColorBrewer_1.1-3                   geneplotter_1.84.0                  
[11] annotate_1.84.0                      XML_3.99-0.18                       
[13] lattice_0.22-6                       clusterProfiler_4.14.4              
[15] ReactomePA_1.50.0                    topGO_2.58.0                        
[17] SparseM_1.84-2                       GO.db_3.20.0                        
[19] graph_1.84.0                         arrayQualityMetrics_3.62.0          
[21] hugene10sttranscriptcluster.db_8.8.0 pd.hugene.1.0.st.v1_3.14.1          
[23] ArrayExpress_1.66.0                  ggplot2_3.5.1                       
[25] gplots_3.2.0                         hugene20sttranscriptcluster.db_8.8.0
[27] org.Hs.eg.db_3.20.0                  AnnotationDbi_1.68.0                
[29] BiocManager_1.30.25                  pd.hugene.2.0.st_3.14.1             
[31] DBI_1.2.3                            RSQLite_2.3.9                       
[33] limma_3.62.1                         affy_1.84.0                         
[35] oligo_1.70.0                         Biostrings_2.74.1                   
[37] GenomeInfoDb_1.42.1                  XVector_0.46.0                      
[39] IRanges_2.40.1                       S4Vectors_0.44.0                    
[41] Biobase_2.66.0                       oligoClasses_1.68.0                 
[43] BiocGenerics_0.52.0                 

loaded via a namespace (and not attached):
  [1] fs_1.6.5                    bitops_1.0-9               
  [3] httr_1.4.7                  tools_4.4.2                
  [5] gcrma_2.78.0                backports_1.5.0            
  [7] R6_2.5.1                    lazyeval_0.2.2             
  [9] withr_3.0.2                 graphite_1.52.0            
 [11] gridExtra_2.3               base64_2.0.2               
 [13] preprocessCore_1.68.0       cli_3.6.3                  
 [15] labeling_0.4.3              askpass_1.2.1              
 [17] systemfonts_1.1.0           yulab.utils_0.1.8          
 [19] gson_0.1.0                  foreign_0.8-87             
 [21] illuminaio_0.48.0           DOSE_4.0.0                 
 [23] svglite_2.1.3               R.utils_2.12.3             
 [25] affyPLM_1.82.0              BeadDataPackR_1.58.0       
 [27] rstudioapi_0.17.1           generics_0.1.3             
 [29] gridGraphics_0.5-1          hwriter_1.3.2.1            
 [31] gtools_3.9.5                zip_2.3.1                  
 [33] Matrix_1.7-1                interp_1.1-6               
 [35] abind_1.4-8                 R.methodsS3_1.8.2          
 [37] lifecycle_1.0.4             SummarizedExperiment_1.36.0
 [39] beadarray_2.56.0            qvalue_2.38.0              
 [41] SparseArray_1.6.0           grid_4.4.2                 
 [43] blob_1.2.4                  affxparser_1.78.0          
 [45] crayon_1.5.3                ggtangle_0.0.6             
 [47] cowplot_1.1.3               KEGGREST_1.46.0            
 [49] pillar_1.10.0               knitr_1.49                 
 [51] fgsea_1.32.2                GenomicRanges_1.58.0       
 [53] codetools_0.2-20            fastmatch_1.1-6            
 [55] glue_1.8.0                  ggfun_0.1.8                
 [57] data.table_1.16.4           vctrs_0.6.5                
 [59] png_0.1-8                   treeio_1.30.0              
 [61] gtable_0.3.6                cachem_1.1.0               
 [63] xfun_0.49                   S4Arrays_1.6.0             
 [65] tidygraph_1.3.1             survival_3.8-3             
 [67] iterators_1.0.14            statmod_1.5.0              
 [69] nlme_3.1-166                ggtree_3.14.0              
 [71] bit64_4.5.2                 affyio_1.76.0              
 [73] KernSmooth_2.23-26          rpart_4.1.23               
 [75] colorspace_2.1-1            Hmisc_5.2-1                
 [77] nnet_7.3-20                 tidyselect_1.2.1           
 [79] bit_4.5.0.1                 compiler_4.4.2             
 [81] htmlTable_2.4.3             DelayedArray_0.32.0        
 [83] checkmate_2.3.2             scales_1.3.0               
 [85] caTools_1.18.3              hexbin_1.28.5              
 [87] rappdirs_0.3.3              digest_0.6.37              
 [89] rmarkdown_2.29              htmltools_0.5.8.1          
 [91] pkgconfig_2.0.3             jpeg_0.1-10                
 [93] base64enc_0.1-3             MatrixGenerics_1.18.0      
 [95] fastmap_1.2.0               rlang_1.1.4                
 [97] htmlwidgets_1.6.4           UCSC.utils_1.2.0           
 [99] farver_2.1.2                jsonlite_1.8.9             
[101] BiocParallel_1.40.0         GOSemSim_2.32.0            
[103] R.oo_1.27.0                 magrittr_2.0.3             
[105] Formula_1.2-5               GenomeInfoDbData_1.2.13    
[107] ggplotify_0.1.2             patchwork_1.3.0            
[109] munsell_0.5.1               Rcpp_1.0.13-1              
[111] ape_5.8-1                   viridis_0.6.5              
[113] vsn_3.74.0                  stringi_1.8.4              
[115] ggraph_2.2.1                zlibbioc_1.52.0            
[117] MASS_7.3-63                 plyr_1.8.9                 
[119] parallel_4.4.2              ggrepel_0.9.6              
[121] deldir_2.0-4                graphlayouts_1.2.1         
[123] splines_4.4.2               igraph_2.1.2               
[125] reshape2_1.4.4              evaluate_1.0.1             
[127] latticeExtra_0.6-30         foreach_1.5.2              
[129] tweenr_2.0.3                openssl_2.3.0              
[131] purrr_1.0.2                 polyclip_1.10-7            
[133] ggforce_0.4.2               xtable_1.8-4               
[135] ff_4.5.0                    reactome.db_1.89.0         
[137] tidytree_0.4.6              viridisLite_0.4.2          
[139] tibble_3.2.1                aplot_0.2.4                
[141] memoise_2.0.1               setRNG_2024.2-1            
[143] cluster_2.1.8               gridSVG_1.7-5   
@james-w-macdonald-5106
United States

See Section 4. But note that the phenoData slot does not need to contain this information. You can simply keep a separate 'targets' data.frame whose rows match the samples in your ExpressionSet, and use it to generate the design matrix.
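
A minimal sketch of the 'targets' approach, using only base R. The file names are copied from the pData output in the question; the column names FileName and Group, and the assumption that the first three files are controls and the last three are patients, are illustrative rather than anything taken from the original data.

```r
# Hypothetical 'targets' table: one row per .CEL file, in the same order
# as the samples in the loaded object.
targets <- data.frame(
  FileName = c("ctrlMac1.CEL", "ctrlMac2.CEL", "ctrMac3.CEL",
               "PtMac1.CEL", "PtMac2.CEL", "PtMac3.CEL"),
  Group = factor(rep(c("Control", "Patient"), each = 3),
                 levels = c("Control", "Patient"))
)
rownames(targets) <- targets$FileName

# With the real object loaded, it is worth checking the order first, e.g.:
# stopifnot(identical(rownames(targets), Biobase::sampleNames(raw_data)))

# Design matrix with Control as the reference level.
design <- model.matrix(~ Group, data = targets)
design
```

The design matrix built this way can be passed to limma without the group information ever being stored in the phenoData slot, provided the row order matches the sample order.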


Thanks for the documentation. I have most of it working, except for the most critical step - incorporating the new metadata with the dataset.

At this point I am successfully:

  1. Importing the .CEL files into the variable raw_data
  2. Generating the pData, metadata, phenoData, and experimentData variables as per the documentation, with my relevant information included.

However, when I try to merge the raw_data (i.e. the microarray data) with this metadata, using the command:

annotated_data <- ExpressionSet(assayData=raw_data, phenoData=phenoData, experimentData=experimentData, annotation="pd.hugene.2.0.st")

I get the error:

Error: unable to find an inherited method for function 'ExpressionSet' for signature 'assayData = "GeneFeatureSet"'

You don't need to generate an ExpressionSet, as you already have a GeneFeatureSet, which extends eSet and is a better fit for what you are doing.

> getClass("GeneFeatureSet")
Class "GeneFeatureSet" [package "oligoClasses"]

Slots:

Name:        manufacturer
Class:          character

Name:       intensityFile
Class:          character

Name:           assayData
Class:          AssayData

Name:           phenoData
Class: AnnotatedDataFrame

Name:         featureData
Class: AnnotatedDataFrame

Name:      experimentData
Class:              MIAxE

Name:          annotation
Class:          character

Name:        protocolData
Class: AnnotatedDataFrame

Name:   .__classVersion__
Class:           Versions

Extends: 
Class "FeatureSet", directly
Class "NChannelSet", by class "FeatureSet", distance 2
Class "eSet", by class "FeatureSet", distance 3
Class "VersionedBiobase", by class "FeatureSet", distance 4
Class "Versioned", by class "FeatureSet", distance 5

I pointed you to the help for ExpressionSet because it shows how to generate a phenoData object. But maybe that is a bit too pedantic. Since you already have a GeneFeatureSet, it's easier to just pull out the existing pData data.frame, add your columns, and put it back in using pData<-:

> library(oligo)
## just get an example eSet-derived object (pData works the same way on a GeneFeatureSet)
> example(rma)
<things happen>
## pull out the phenoData
> z <- pData(summarized)
> z
                index
9868701_532.xys     1
9868901_532.xys     2
9869001_532.xys     3
9870301_532.xys     4
9870401_532.xys     5
9870601_532.xys     6
## add stuff
> z$Treatment <- factor(rep(c("Control","Treated"), each = 3))
> z
                index Treatment
9868701_532.xys     1   Control
9868901_532.xys     2   Control
9869001_532.xys     3   Control
9870301_532.xys     4   Treated
9870401_532.xys     5   Treated
9870601_532.xys     6   Treated

## put it back in
> pData(summarized) <- z
> pData(summarized)
                index Treatment
9868701_532.xys     1   Control
9868901_532.xys     2   Control
9869001_532.xys     3   Control
9870301_532.xys     4   Treated
9870401_532.xys     5   Treated
9870601_532.xys     6   Treated
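
Applied to the six .CEL files from the question, the same pattern might look like the sketch below. The data.frame here is only a stand-in built from the pData output shown in the question, so the block runs without the arrays; the Group column name and the rule that file names starting with "ctr" are controls are assumptions made for illustration.

```r
# Stand-in for the table returned by Biobase::pData(raw_data); with the real
# object you would start from: pd <- Biobase::pData(raw_data)
pd <- data.frame(
  index = 1:6,
  row.names = c("ctrlMac1.CEL", "ctrlMac2.CEL", "ctrMac3.CEL",
                "PtMac1.CEL", "PtMac2.CEL", "PtMac3.CEL")
)

# Derive the group from the file-name prefix (assumption: names starting
# with "ctr" are controls, everything else is a patient sample).
pd$Group <- factor(
  ifelse(grepl("^ctr", rownames(pd), ignore.case = TRUE), "Control", "Patient"),
  levels = c("Control", "Patient")
)
pd

# With the real object, write the table back into the GeneFeatureSet:
# Biobase::pData(raw_data) <- pd
```

Deriving the group from the file names avoids typing the labels by hand, but it is worth printing the table and checking each row before putting it back.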

Makes sense now, thanks for the help!
