Question

Newbie Question - adding metadata to an analysis

Bryan H • 0

I am trying to reanalyze some old microarray data from my lab, using this tutorial: https://bioconductor.org/packages/release/workflows/vignettes/maEndToEnd/inst/doc/MA-Workflow.html

The original data was collected on an Affymetrix human genome 2.0 microarray. I have 6 .CEL files - 3 controls, 3 patients. While I am able to load the files and perform most of the analyses in the tutorial, the .CEL files lack meaningful metadata. As a result, I cannot identify specific samples or groups in the resulting analyses, nor can I get some of the data to plot properly.

Everything goes fine until step 5 of the tutorial. Here, because my files lack metadata, the command:

head(Biobase::pData(raw_data))

Gives the following output instead of a list of metadata columns I can select for additional analysis:

             index
ctrlMac1.CEL     1
ctrlMac2.CEL     2
ctrMac3.CEL      3
PtMac1.CEL       4
PtMac2.CEL       5
PtMac3.CEL       6

I have searched the documentation, but cannot figure out how to add metadata that can then be extracted and used via pData(). How do I go about annotating my data?

Thank you,

Bryan

Output of sessionInfo():

R version 4.4.2 (2024-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26100)

Matrix products: default


locale:
[1] LC_COLLATE=English_Canada.utf8  LC_CTYPE=English_Canada.utf8   
[3] LC_MONETARY=English_Canada.utf8 LC_NUMERIC=C                   
[5] LC_TIME=English_Canada.utf8    

time zone: America/Toronto
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] openxlsx_4.2.7.1                     genefilter_1.88.0                   
 [3] matrixStats_1.4.1                    stringr_1.5.1                       
 [5] tidyr_1.3.1                          dplyr_1.1.4                         
 [7] enrichplot_1.26.5                    pheatmap_1.0.12                     
 [9] RColorBrewer_1.1-3                   geneplotter_1.84.0                  
[11] annotate_1.84.0                      XML_3.99-0.18                       
[13] lattice_0.22-6                       clusterProfiler_4.14.4              
[15] ReactomePA_1.50.0                    topGO_2.58.0                        
[17] SparseM_1.84-2                       GO.db_3.20.0                        
[19] graph_1.84.0                         arrayQualityMetrics_3.62.0          
[21] hugene10sttranscriptcluster.db_8.8.0 pd.hugene.1.0.st.v1_3.14.1          
[23] ArrayExpress_1.66.0                  ggplot2_3.5.1                       
[25] gplots_3.2.0                         hugene20sttranscriptcluster.db_8.8.0
[27] org.Hs.eg.db_3.20.0                  AnnotationDbi_1.68.0                
[29] BiocManager_1.30.25                  pd.hugene.2.0.st_3.14.1             
[31] DBI_1.2.3                            RSQLite_2.3.9                       
[33] limma_3.62.1                         affy_1.84.0                         
[35] oligo_1.70.0                         Biostrings_2.74.1                   
[37] GenomeInfoDb_1.42.1                  XVector_0.46.0                      
[39] IRanges_2.40.1                       S4Vectors_0.44.0                    
[41] Biobase_2.66.0                       oligoClasses_1.68.0                 
[43] BiocGenerics_0.52.0                 

loaded via a namespace (and not attached):
  [1] fs_1.6.5                    bitops_1.0-9               
  [3] httr_1.4.7                  tools_4.4.2                
  [5] gcrma_2.78.0                backports_1.5.0            
  [7] R6_2.5.1                    lazyeval_0.2.2             
  [9] withr_3.0.2                 graphite_1.52.0            
 [11] gridExtra_2.3               base64_2.0.2               
 [13] preprocessCore_1.68.0       cli_3.6.3                  
 [15] labeling_0.4.3              askpass_1.2.1              
 [17] systemfonts_1.1.0           yulab.utils_0.1.8          
 [19] gson_0.1.0                  foreign_0.8-87             
 [21] illuminaio_0.48.0           DOSE_4.0.0                 
 [23] svglite_2.1.3               R.utils_2.12.3             
 [25] affyPLM_1.82.0              BeadDataPackR_1.58.0       
 [27] rstudioapi_0.17.1           generics_0.1.3             
 [29] gridGraphics_0.5-1          hwriter_1.3.2.1            
 [31] gtools_3.9.5                zip_2.3.1                  
 [33] Matrix_1.7-1                interp_1.1-6               
 [35] abind_1.4-8                 R.methodsS3_1.8.2          
 [37] lifecycle_1.0.4             SummarizedExperiment_1.36.0
 [39] beadarray_2.56.0            qvalue_2.38.0              
 [41] SparseArray_1.6.0           grid_4.4.2                 
 [43] blob_1.2.4                  affxparser_1.78.0          
 [45] crayon_1.5.3                ggtangle_0.0.6             
 [47] cowplot_1.1.3               KEGGREST_1.46.0            
 [49] pillar_1.10.0               knitr_1.49                 
 [51] fgsea_1.32.2                GenomicRanges_1.58.0       
 [53] codetools_0.2-20            fastmatch_1.1-6            
 [55] glue_1.8.0                  ggfun_0.1.8                
 [57] data.table_1.16.4           vctrs_0.6.5                
 [59] png_0.1-8                   treeio_1.30.0              
 [61] gtable_0.3.6                cachem_1.1.0               
 [63] xfun_0.49                   S4Arrays_1.6.0             
 [65] tidygraph_1.3.1             survival_3.8-3             
 [67] iterators_1.0.14            statmod_1.5.0              
 [69] nlme_3.1-166                ggtree_3.14.0              
 [71] bit64_4.5.2                 affyio_1.76.0              
 [73] KernSmooth_2.23-26          rpart_4.1.23               
 [75] colorspace_2.1-1            Hmisc_5.2-1                
 [77] nnet_7.3-20                 tidyselect_1.2.1           
 [79] bit_4.5.0.1                 compiler_4.4.2             
 [81] htmlTable_2.4.3             DelayedArray_0.32.0        
 [83] checkmate_2.3.2             scales_1.3.0               
 [85] caTools_1.18.3              hexbin_1.28.5              
 [87] rappdirs_0.3.3              digest_0.6.37              
 [89] rmarkdown_2.29              htmltools_0.5.8.1          
 [91] pkgconfig_2.0.3             jpeg_0.1-10                
 [93] base64enc_0.1-3             MatrixGenerics_1.18.0      
 [95] fastmap_1.2.0               rlang_1.1.4                
 [97] htmlwidgets_1.6.4           UCSC.utils_1.2.0           
 [99] farver_2.1.2                jsonlite_1.8.9             
[101] BiocParallel_1.40.0         GOSemSim_2.32.0            
[103] R.oo_1.27.0                 magrittr_2.0.3             
[105] Formula_1.2-5               GenomeInfoDbData_1.2.13    
[107] ggplotify_0.1.2             patchwork_1.3.0            
[109] munsell_0.5.1               Rcpp_1.0.13-1              
[111] ape_5.8-1                   viridis_0.6.5              
[113] vsn_3.74.0                  stringi_1.8.4              
[115] ggraph_2.2.1                zlibbioc_1.52.0            
[117] MASS_7.3-63                 plyr_1.8.9                 
[119] parallel_4.4.2              ggrepel_0.9.6              
[121] deldir_2.0-4                graphlayouts_1.2.1         
[123] splines_4.4.2               igraph_2.1.2               
[125] reshape2_1.4.4              evaluate_1.0.1             
[127] latticeExtra_0.6-30         foreach_1.5.2              
[129] tweenr_2.0.3                openssl_2.3.0              
[131] purrr_1.0.2                 polyclip_1.10-7            
[133] ggforce_0.4.2               xtable_1.8-4               
[135] ff_4.5.0                    reactome.db_1.89.0         
[137] tidytree_0.4.6              viridisLite_0.4.2          
[139] tibble_3.2.1                aplot_0.2.4                
[141] memoise_2.0.1               setRNG_2024.2-1            
[143] cluster_2.1.8               gridSVG_1.7-5

Biobase • 647 views

score 0 · Answer 1 · 2025-01-06

James W. MacDonald 68k

See Section 4. Note, however, that the phenoData slot does not need to contain the metadata. You can simply keep a separate 'targets' data.frame, with one row per array in the same order as your ExpressionSet, and use that to generate the design matrix.
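As a rough illustration of the 'targets' approach, here is a minimal sketch in base R. The file names are copied from the pData output in the question; the column names FileName and Group, the level labels, and the no-intercept design are assumptions made for the example, not something the tutorial prescribes.

```r
# File names as printed by pData(raw_data) in the question
cel_files <- c("ctrlMac1.CEL", "ctrlMac2.CEL", "ctrMac3.CEL",
               "PtMac1.CEL", "PtMac2.CEL", "PtMac3.CEL")

# Hypothetical 'targets' table: one row per array, same order as the arrays
targets <- data.frame(
  FileName = cel_files,
  Group = factor(rep(c("Control", "Patient"), each = 3),
                 levels = c("Control", "Patient")),
  row.names = cel_files
)

# Design matrix with one column per group (no intercept)
design <- model.matrix(~ 0 + Group, data = targets)
colnames(design) <- levels(targets$Group)
design

# With the real data loaded, it is worth confirming that the rows line up:
# stopifnot(identical(rownames(targets), Biobase::sampleNames(raw_data)))
```

The resulting design can then be passed to the model-fitting step in place of one built from pData(raw_data), provided the row order matches the arrays.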

Thanks for the documentation. I have most of it working, except for the most critical step - incorporating the new metadata into the dataset.

At this point I can successfully import the .CEL files into the variable raw_data, and I can generate the pData, metadata, phenoData, and experimentData variables as per the documentation, with my own information included.

However, when I try to merge raw_data (i.e. the microarray data) with this metadata, using the command:

annotated_data <- ExpressionSet(assayData=raw_data, phenoData=phenoData, experimentData=experimentData, annotation="pd.hugene.2.0.st")

I get the error:

Error: unable to find an inherited method for function 'ExpressionSet' for signature 'assayData = "GeneFeatureSet"'

Reply by Bryan H, 13 months ago

You don't need to generate an ExpressionSet, as you already have a GeneFeatureSet, which extends eSet and is a better fit for what you are doing.

> getClass("GeneFeatureSet")
Class "GeneFeatureSet" [package "oligoClasses"]

Slots:

Name:        manufacturer
Class:          character

Name:       intensityFile
Class:          character

Name:           assayData
Class:          AssayData

Name:           phenoData
Class: AnnotatedDataFrame

Name:         featureData
Class: AnnotatedDataFrame

Name:      experimentData
Class:              MIAxE

Name:          annotation
Class:          character

Name:        protocolData
Class: AnnotatedDataFrame

Name:   .__classVersion__
Class:           Versions

Extends: 
Class "FeatureSet", directly
Class "NChannelSet", by class "FeatureSet", distance 2
Class "eSet", by class "FeatureSet", distance 3
Class "VersionedBiobase", by class "FeatureSet", distance 4
Class "Versioned", by class "FeatureSet", distance 5

I pointed you to the help for ExpressionSet because it shows how to generate a phenoData object. But maybe that is a bit too pedantic. Since you already have a GeneFeatureSet, it's easier to just pull out the existing phenotype data.frame with pData(), add your columns, and put it back in using the pData<- replacement function.

> library(oligo)
## just get an example object (here the output of rma(); pData works the same way on a GeneFeatureSet)
> example(rma)
<things happen>
## pull out the phenoData
> z <- pData(summarized)
> z
                index
9868701_532.xys     1
9868901_532.xys     2
9869001_532.xys     3
9870301_532.xys     4
9870401_532.xys     5
9870601_532.xys     6
## add stuff
> z$Treatment <- factor(rep(c("Control","Treated"), each = 3))
> z
                index Treatment
9868701_532.xys     1   Control
9868901_532.xys     2   Control
9869001_532.xys     3   Control
9870301_532.xys     4   Treated
9870401_532.xys     5   Treated
9870601_532.xys     6   Treated

## put it back in
> pData(summarized) <- z
> pData(summarized)
                index Treatment
9868701_532.xys     1   Control
9868901_532.xys     2   Control
9869001_532.xys     3   Control
9870301_532.xys     4   Treated
9870401_532.xys     5   Treated
9870601_532.xys     6   Treated
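Applied to the six files from the question, the "add stuff" step might look like the sketch below. It uses a plain data.frame as a stand-in for pData(raw_data) so that it runs without the CEL files. The column name Group and the rule that control file names start with "ctr" are assumptions based on the file names shown in the question.

```r
# Stand-in for z <- pData(raw_data); row names copied from the question
z <- data.frame(
  index = 1:6,
  row.names = c("ctrlMac1.CEL", "ctrlMac2.CEL", "ctrMac3.CEL",
                "PtMac1.CEL", "PtMac2.CEL", "PtMac3.CEL")
)

# Derive the group from each file name instead of relying on row order
# (assumption: every control file name starts with "ctr")
is_control <- grepl("^ctr", rownames(z), ignore.case = TRUE)
z$Group <- factor(ifelse(is_control, "Control", "Patient"),
                  levels = c("Control", "Patient"))

# Quick sanity check on the group sizes
table(z$Group)

# With the real object (needs oligo/Biobase and the CEL files):
# z <- Biobase::pData(raw_data)
# ... add the Group column as above ...
# Biobase::pData(raw_data) <- z
```

Deriving the labels from the file names avoids mislabelling samples if the files happen to load in a different order than expected.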