Question

Microarray - Illumina Metadata

0

Entering edit mode

RMP87 • 0

@rafael-24221

Last seen 3.5 years ago

Brazil

Hello!

I'm using a GEO dataset (GSE140830) — HumanHT-12 v4 Expression BeadChip (nuID) platform — in a study, but I am having problems regarding the metadata and so how to remove outliers verified in the study. I know that it is possible to relate the info observed in the study with the data present in the metadata using affy (for example: from the series matrix file):


Data <- ReadAffy(filenames=row.names(targets), celfile.path=datadir, phenoData = targets)

In this case, the row.names carries the sample ID, but can I do something like this with Illumina data since the expression info is in table format and the table has different codes representing the samples (for example, GSM4187774 in phenoData and 5872617031_A in raw_data file)?

I believe it would be easier to rename based on what's in the expression data. But would it be possible to load this data and establish a relationship between the files in such a way as to make it easier to create groups, remove samples, and design the statistical test?

Any suggestion?

Thank you very much!

limma GEO MicroarrayData affy • 1.6k views

ADD COMMENT • link updated 3.6 years ago by James W. MacDonald 68k • written 3.6 years ago by RMP87 • 0

score 0 · Answer 1 · 2021-09-27

It's not exactly clear to me what you are after, but I'll take a stab at it. First off you probably want to read in using the GEOquery package. You could alternately download the raw data and process yourself, but it's probably not worth it, buy ymmv.

> library(GEOquery)
## need to set this first because the data are large
> Sys.setenv("VROOM_CONNECTION_SIZE" = 1000000L)
> z <- getGEO("GSE140830")[[1]]
 > head(colnames(z))
 [1] "GSM4187774" "GSM4187775" "GSM4187776" "GSM4187777" "GSM4187778"
 [6] "GSM4187779"
 > names(pData(z))
  [1] "title"                   "geo_accession"
  [3] "status"                  "submission_date"
  [5] "last_update_date"        "type"
  [7] "channel_count"           "source_name_ch1"
  [9] "organism_ch1"            "characteristics_ch1"
 [11] "characteristics_ch1.1"   "characteristics_ch1.2"
 [13] "characteristics_ch1.3"   "characteristics_ch1.4"
 [15] "molecule_ch1"            "extract_protocol_ch1"
 [17] "label_ch1"               "label_protocol_ch1"
 [19] "taxid_ch1"               "hyb_protocol"
 [21] "scan_protocol"           "description"
 [23] "data_processing"         "data_processing.1"
 [25] "platform_id"             "contact_name"
 [27] "contact_laboratory"      "contact_institute"
 [29] "contact_address"         "contact_city"
 [31] "contact_state"           "contact_zip/postal_code"
 [33] "contact_country"         "supplementary_file"
 [35] "data_row_count"          "age_at_draw:ch1"
 [37] "batch:ch1"               "connectivity_zscore:ch1"
 [39] "diagnosis:ch1"           "Sex:ch1"
!> pData(z)[1:5,1:10]
                                                title geo_accession
 GSM4187774 Whole blood, Control, 5872617031_A  [ftd]    GSM4187774
 GSM4187775  Whole blood, Control, 5872617031_C [ftd]    GSM4187775
 GSM4187776  Whole blood, Control, 5872617031_E [ftd]    GSM4187776
 GSM4187777    Whole blood, bvFTD, 5872617031_F [ftd]    GSM4187777
 GSM4187778  Whole blood, Control, 5872641011_B [ftd]    GSM4187778
                           status submission_date last_update_date type
 GSM4187774 Public on Nov 13 2020     Nov 22 2019      Nov 13 2020  RNA
 GSM4187775 Public on Nov 13 2020     Nov 22 2019      Nov 13 2020  RNA
 GSM4187776 Public on Nov 13 2020     Nov 22 2019      Nov 13 2020  RNA
 GSM4187777 Public on Nov 13 2020     Nov 22 2019      Nov 13 2020  RNA
 GSM4187778 Public on Nov 13 2020     Nov 22 2019      Nov 13 2020  RNA
            channel_count source_name_ch1 organism_ch1 characteristics_ch1
 GSM4187774             1     Whole blood Homo sapiens  diagnosis: Control
 GSM4187775             1     Whole blood Homo sapiens  diagnosis: Control
 GSM4187776             1     Whole blood Homo sapiens  diagnosis: Control
 GSM4187777             1     Whole blood Homo sapiens    diagnosis: bvFTD
 GSM4187778             1     Whole blood Homo sapiens  diagnosis: Control

> all.equal(colnames(z), row.names(pData(z)))
 [1] TRUE

The samples are all aligned with the phenodata, automatically!