I'm having trouble with the minfi package, specifically the read.metharray.sheet function.
The missMethyl package vignette loads the sample sheet from the minfiData package like so:
library(minfi)
library(minfiData)
baseDir <- system.file("extdata", package = "minfiData")
targets <- read.metharray.sheet(baseDir)
I wanted to try the missMethyl package on public data, and the minfi vignette shows how to load public data from NCBI:
library(GEOquery)
getGEOSuppFiles("GSE68777") # Get this accession/experiment
untar("GSE68777/GSE68777_RAW.tar", exdir = "GSE68777/idat") # untar the files
head(list.files("GSE68777/idat", pattern = "idat")) # look at the .idat files
idatFiles <- list.files("GSE68777/idat", pattern = "idat.gz$", full.names = TRUE)
sapply(idatFiles, gunzip, overwrite = TRUE) # decompress the .gz .idat files
library(here)
my_path <- here("GSE68777/idat")
head(list.files(my_path))
[1] "GPL13534_450K_Manifest_header_Descriptions.xlsx.gz"
[2] "GPL13534_HumanMethylation450_15017482_v.1.1.bpm.txt.gz"
[3] "GPL13534_HumanMethylation450_15017482_v.1.1.csv.gz"
[4] "GPL13534_HumanMethylation450_15017482_v.1.2.bpm.gz"
[5] "GSM1681154_5958091019_R03C02_Grn.idat"
[6] "GSM1681154_5958091019_R03C02_Red.idat"
If I unzip the csv and try read.metharray.sheet(), I get an error (code not shown) because it's not a sample sheet! For example, the minfiData sample sheet looks like this:
[Header]
Investigator Name | MrNoName
Project Name | DNA Methylation
Experiment Name | Test
Date | ########
[Data]
Sample_Name | Sample_Well | Sample_Plate | Sample_Group | Pool_ID | Sentrix_ID | Sentrix_Position | person | age | sex | status |
GroupA_3 | H5 | | GroupA | | 5.72E+09 | R02C02 | id3 | 83 | M | normal |
GroupA_2 | D5 | | GroupA | | 5.72E+09 | R04C01 | id2 | 58 | F | normal |
GroupB_3 | C6 | | GroupB | | 5.72E+09 | R05C02 | id3 | 83 | M | cancer |
GroupB_1 | F7 | | GroupB | | 5.72E+09 | R04C02 | id1 | 75 | F | cancer |
GroupA_1 | G7 | | GroupA | | 5.72E+09 | R05C02 | id1 | 75 | F | normal |
GroupB_2 | H7 | | GroupB | | 5.72E+09 | R06C02 | id2 | 58 | F | cancer |
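To make that layout concrete, here is a minimal base-R sketch of reading such a file by hand: find the [Data] marker, then parse everything after it as CSV. This is an illustration only, not minfi's own implementation; the temporary file and the `targets_demo` name are made up for the example, and the two rows are copied from the excerpt above with Sentrix_ID kept exactly as displayed there.

```r
# Write a small Illumina-style sample sheet to a temporary file.
# Rows mirror the minfiData excerpt (Sentrix_ID kept as displayed).
sheet_lines <- c(
  "[Header],,,,,,,,,,",
  "Investigator Name,MrNoName,,,,,,,,,",
  "Project Name,DNA Methylation,,,,,,,,,",
  "[Data],,,,,,,,,,",
  "Sample_Name,Sample_Well,Sample_Plate,Sample_Group,Pool_ID,Sentrix_ID,Sentrix_Position,person,age,sex,status",
  "GroupA_3,H5,,GroupA,,5.72E+09,R02C02,id3,83,M,normal",
  "GroupB_3,C6,,GroupB,,5.72E+09,R05C02,id3,83,M,cancer"
)
sheet_file <- tempfile(fileext = ".csv")
writeLines(sheet_lines, sheet_file)

# Locate the [Data] marker; the column header sits on the next line.
raw_lines <- readLines(sheet_file)
data_line <- grep("^\\[Data\\]", raw_lines, ignore.case = TRUE)

# Skip everything up to and including the marker, then parse as CSV.
# Reading all columns as character keeps IDs from being turned into numbers.
targets_demo <- read.csv(sheet_file, skip = data_line,
                         stringsAsFactors = FALSE, colClasses = "character")
print(targets_demo[, c("Sample_Name", "Sentrix_ID", "Sentrix_Position")])
```

A file without a [Data] section followed by Sentrix_ID and Sentrix_Position columns would not fit this layout.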
But GPL13534_HumanMethylation450_15017482_v.1.1.csv looks like a manifest file:
Illumina, Inc.
[Heading]
Descriptor File Name | BS0010894-AQP_content.bpm
Assay Format | Infinium 2
Date Manufactured | ########
Loci Count | 485553
[Assay]
IlmnID | Name | AddressA_ID | AlleleA_ProbeSeq | AddressB_ID | AlleleB_ProbeSeq | Infinium_Design_Type | Next_Base | Color_Channel | Forward_Sequence | Genome_Build | CHR | MAPINFO | SourceSeq | Chromosome_36 | Coordinate_36 | Strand | Probe_SNPs | Probe_SNPs_10 | Random_Loci | Methyl27_Loci | UCSC_RefGene_Name | UCSC_RefGene_Accession | UCSC_RefGene_Group | UCSC_CpG_Islands_Name | Relation_to_UCSC_CpG_Island | Phantom | DMR | Enhancer | HMM_Island | Regulatory_Feature_Name |
cg00035864 | cg00035864 | 31729416 | AAAACACTAACAATCTTATCCACATAAACCCTTAAATTTATCTCAAATTC | | | II | | | AATCCAAAGATGATGGAGGAGTGCCCGCTCATGATGTGAAGTACCTGCTCAGCTGGAAAC[CG]AATTTGAGATAAATTCAAGGGTCTATGTGGACAAGACTGCTAGTGTCTCTCTCTGGATTG | 37 | Y | 8553009 | AGACACTAGCAGTCTTGTCCACATAGACCCTTGAATTTATCTCAAATTCG | Y | 8613009 | F | | | | | TTTY18 | NR_001550 | TSS1500 | | | | | | | |
cg00050873 | cg00050873 | 32735311 | ACAAAAAAACAACACACAACTATAATAATTTTTAAAATAAATAAACCCCA | 31717405 | ACGAAAAAACAACGCACAACTATAATAATTTTTAAAATAAATAAACCCCG | I | A | Red | TATCTCTGTCTGGCGAGGAGGCAACGCACAACTGTGGTGGTTTTTGGAGTGGGTGGACCC[CG]GCCAAGACGGCCTGGGCTGACCAGAGACGGGAGGCAGAAAAAGTGGGCAGGTGGTTGCAG | 37 | Y | 9363356 | CGGGGTCCACCCACTCCAAAAACCACCACAGTTGTGCGTTGCCTCCTCGC | Y | 9973356 | R | | | | | TSPY4;FAM197Y2 | NM_001164471;NR_001553 | Body;TSS1500 | chrY:9363680-9363943 | N_Shore | | | | Y:9973136-9976273 | |
cg00061679 | cg00061679 | 28780415 | AAAACATTAAAAAACTAATTCACTACTATTTAATTACTTTATTTTCCATC | | | II | | | TCAACAAATGAGAGACATTGAAGAACTAATTCACTACTATTTGGTTACTTTATTTTCCAT[CG]AAGAAAACCTCTTTTTAAAAACTAACACATAAATAAAATGAACGAAGAACAAACTAAACG | 37 | Y | 25314171 | CGATGGAAAATAAAGTAACCAAATAGTAGTGAATTAGTTCTTCAATGTCT | Y | 23723559 | R | | | | | DAZ1;DAZ4;DAZ4 | NM_004081;NM_020420;NM_001005375 | Body;Body;Body | | | | | | | |
cg00063477 | cg00063477 | 16712347 | TATTCTTCCACACAAAATACTAAACRTATATTTACAAAAATACTTCCATC | | | II | | | CTCCTGTACTTGTTCATTAAATAATGATTCCTTGGATATACCAAGTCTGGATAGCGGATT[CG]ATGGAAGCATTTTTGTAAATATACGTTCAGTATTTTGTGTGGAAGAACACAATCTAGCTG | 37 | Y | 22741795 | CGATGGAAGCATTTTTGTAAATATACGTTCAGTATTTTGTGTGGAAGAAC | Y | 21151183 | F | rs9341313 | rs13447379 | | | EIF1AY | NM_004681 | Body | chrY:22737825-22738052 | S_Shelf | | | | | |
None of the other files that came with GSE68777 look like a sample sheet.
If you search for "MethylationEPIC" on NCBI GEO (an array that works with the missMethyl package), you will see that the majority of datasets do not have csv or txt files, and the three others I tried [GSE86829, GSE103502, GSE103505], although they had text files, did not have sample sheets. So how does one get this information?
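One possible workaround, sketched here under assumptions rather than offered as the established method: the IDAT names in the GSE68777 listing follow the pattern GSM_SentrixID_SentrixPosition_Grn|Red.idat, so a targets-like data frame can be assembled from the file names alone. The helper name `targets_from_idats` is hypothetical, the naming pattern is assumed to hold for every file, and phenotype columns (group, age, sex and so on) are not recoverable this way.

```r
# Hypothetical helper: derive a targets-like data frame from IDAT file paths.
# Assumes names of the form <GSM>_<SentrixID>_<R..C..>_<Grn|Red>.idat
targets_from_idats <- function(idat_paths) {
  # Drop the channel suffix so each array appears only once.
  basenames <- unique(sub("_(Grn|Red)\\.idat(\\.gz)?$", "", idat_paths))
  # Split the file-name part (without directory) on underscores.
  parts <- strsplit(basename(basenames), "_", fixed = TRUE)
  data.frame(
    Sample_Name      = vapply(parts, `[`, character(1), 1),
    Sentrix_ID       = vapply(parts, `[`, character(1), 2),
    Sentrix_Position = vapply(parts, `[`, character(1), 3),
    Basename         = basenames,
    stringsAsFactors = FALSE
  )
}

# File names taken from the directory listing earlier in this post.
example_paths <- c(
  "GSE68777/idat/GSM1681154_5958091019_R03C02_Grn.idat",
  "GSE68777/idat/GSM1681154_5958091019_R03C02_Red.idat"
)
targets <- targets_from_idats(example_paths)
print(targets)
```

With minfi loaded, a data frame that has a Basename column like this can be passed as read.metharray.exp(targets = targets), and read.metharray.exp(base = "GSE68777/idat") with no targets reads every IDAT pair in the directory. The phenotype information would still need to come from the GEO series record, for example via pData() on the object returned by GEOquery's getGEO("GSE68777").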