Question: Minfi read.metharray.sheet error
12 weeks ago, moldach (United States) wrote:

I'm having trouble with the minfi package, specifically the read.metharray.sheet function.

The missMethyl package vignette loads the sample sheet from the minfiData package like so:

library(minfi)
library(minfiData)
baseDir <- system.file("extdata", package = "minfiData")
targets <- read.metharray.sheet(baseDir)

I wanted to try the missMethyl package on public data and the minfi vignette shows how to load public data from NCBI:

library(GEOquery)
getGEOSuppFiles("GSE68777") # Get this accession/experiment
untar("GSE68777/GSE68777_RAW.tar", exdir = "GSE68777/idat") # untar the files
head(list.files("GSE68777/idat", pattern = "idat")) # look at the .idat files
idatFiles <- list.files("GSE68777/idat", pattern = "\\.idat\\.gz$", full.names = TRUE)
sapply(idatFiles, gunzip, overwrite = TRUE) # decompress the .gz .idat files

library(here)
my_path <- here("GSE68777/idat")
head(list.files(my_path))
[1] "GPL13534_450K_Manifest_header_Descriptions.xlsx.gz"    
[2] "GPL13534_HumanMethylation450_15017482_v.1.1.bpm.txt.gz"
[3] "GPL13534_HumanMethylation450_15017482_v.1.1.csv.gz"    
[4] "GPL13534_HumanMethylation450_15017482_v.1.2.bpm.gz"    
[5] "GSM1681154_5958091019_R03C02_Grn.idat"                 
[6] "GSM1681154_5958091019_R03C02_Red.idat" 

If I unzip the csv and call read.metharray.sheet() on it, I get an error (code not shown) because the file is not a sample sheet. For comparison, the minfiData sample sheet looks like this:

[Header]
Investigator Name,MrNoName
Project Name,DNA Methylation
Experiment Name,Test
Date,########

[Data]
Sample_Name,Sample_Well,Sample_Plate,Sample_Group,Pool_ID,Sentrix_ID,Sentrix_Position,person,age,sex,status
GroupA_3,H5,,GroupA,,5.72E+09,R02C02,id3,83,M,normal
GroupA_2,D5,,GroupA,,5.72E+09,R04C01,id2,58,F,normal
GroupB_3,C6,,GroupB,,5.72E+09,R05C02,id3,83,M,cancer
GroupB_1,F7,,GroupB,,5.72E+09,R04C02,id1,75,F,cancer
GroupA_1,G7,,GroupA,,5.72E+09,R05C02,id1,75,F,normal
GroupB_2,H7,,GroupB,,5.72E+09,R06C02,id2,58,F,cancer

But GPL13534_HumanMethylation450_15017482_v.1.1.csv looks like a manifest file:

Illumina  Inc.                                                          
[Heading]                                                          
Descriptor File Name BS0010894-AQP_content.bpm                                                      
Assay Format Infinium 2                                                        
Date Manufactured ########                                                          
Loci Count  485553                                                          
[Assay]                                                            
IlmnID Name AddressA_ID AlleleA_ProbeSeq AddressB_ID AlleleB_ProbeSeq Infinium_Design_Type Next_Base Color_Channel Forward_Sequence Genome_Build CHR MAPINFO SourceSeq Chromosome_36 Coordinate_36 Strand Probe_SNPs Probe_SNPs_10 Random_Loci Methyl27_Loci UCSC_RefGene_Name UCSC_RefGene_Accession UCSC_RefGene_Group UCSC_CpG_Islands_Name Relation_to_UCSC_CpG_Island Phantom DMR Enhancer HMM_Island Regulatory_Feature_Name
cg00035864 cg00035864 31729416 AAAACACTAACAATCTTATCCACATAAACCCTTAAATTTATCTCAAATTC II     AATCCAAAGATGATGGAGGAGTGCCCGCTCATGATGTGAAGTACCTGCTCAGCTGGAAAC[CG]AATTTGAGATAAATTCAAGGGTCTATGTGGACAAGACTGCTAGTGTCTCTCTCTGGATTG 37 Y 8553009 AGACACTAGCAGTCTTGTCCACATAGACCCTTGAATTTATCTCAAATTCG Y 8613009 F         TTTY18 NR_001550 TSS1500              
cg00050873 cg00050873 32735311 ACAAAAAAACAACACACAACTATAATAATTTTTAAAATAAATAAACCCCA 31717405 ACGAAAAAACAACGCACAACTATAATAATTTTTAAAATAAATAAACCCCG I A Red TATCTCTGTCTGGCGAGGAGGCAACGCACAACTGTGGTGGTTTTTGGAGTGGGTGGACCC[CG]GCCAAGACGGCCTGGGCTGACCAGAGACGGGAGGCAGAAAAAGTGGGCAGGTGGTTGCAG 37 Y 9363356 CGGGGTCCACCCACTCCAAAAACCACCACAGTTGTGCGTTGCCTCCTCGC Y 9973356 R         TSPY4;FAM197Y2 NM_001164471;NR_001553 Body;TSS1500 chrY:9363680-9363943 N_Shore       Y:9973136-9976273
cg00061679 cg00061679 28780415 AAAACATTAAAAAACTAATTCACTACTATTTAATTACTTTATTTTCCATC II     TCAACAAATGAGAGACATTGAAGAACTAATTCACTACTATTTGGTTACTTTATTTTCCAT[CG]AAGAAAACCTCTTTTTAAAAACTAACACATAAATAAAATGAACGAAGAACAAACTAAACG 37 Y 25314171 CGATGGAAAATAAAGTAACCAAATAGTAGTGAATTAGTTCTTCAATGTCT Y 23723559 R         DAZ1;DAZ4;DAZ4 NM_004081;NM_020420;NM_001005375 Body;Body;Body            
cg00063477 cg00063477 16712347 TATTCTTCCACACAAAATACTAAACRTATATTTACAAAAATACTTCCATC II     CTCCTGTACTTGTTCATTAAATAATGATTCCTTGGATATACCAAGTCTGGATAGCGGATT[CG]ATGGAAGCATTTTTGTAAATATACGTTCAGTATTTTGTGTGGAAGAACACAATCTAGCTG 37 Y 22741795 CGATGGAAGCATTTTTGTAAATATACGTTCAGTATTTTGTGTGGAAGAAC Y 21151183 F rs9341313 rs13447379   EIF1AY NM_004681 Body chrY:22737825-22738052 S_Shelf          

 

None of the other files that came with GSE68777 look like a sample sheet.

If you search NCBI GEO for "MethylationEPIC" (an array that works with the missMethyl package), you will see that the majority of datasets do not include any csv or txt files. The three others I tried [GSE86829, GSE103502, GSE103505] did have text files, but none of them was a sample sheet. So how does one get this information?

12 weeks ago, James W. MacDonald (United States) wrote:

This question doesn't really have anything to do with minfi. There is a function in minfi that will use a file output by the Illumina software to read in your data, and it works, so long as you have that file. So there's no error!

The issue at hand is that people submit data to GEO, following the instructions from the curators, and for array-based methylation data, it appears that the curators don't require that the sample sheet be included. That's not an issue with any Bioconductor packages, so is really beyond the scope of this site.
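Even without the Illumina sample sheet, a GEO series usually still carries per-sample annotation in its series matrix, and that can stand in for the phenotype columns a sample sheet would supply. A rough sketch of pulling it and lining it up with the IDAT files by GSM accession follows. The helper `attach_pheno` is hypothetical (it is not part of minfi or GEOquery), and it assumes the IDAT names begin with the GSM accession, as they do in the listing in the question.

```r
# Hypothetical helper (not part of any package): align a GEO annotation table
# with per-array basenames, matching on the GSM accession that prefixes each
# file name. Assumes `pheno` has a `geo_accession` column.
attach_pheno <- function(basenames, pheno) {
  # "path/GSM1681154_5958091019_R03C02" -> "GSM1681154"
  gsm <- sub("_.*$", "", basename(basenames))
  idx <- match(gsm, pheno$geo_accession)
  cbind(
    data.frame(Basename = basenames, stringsAsFactors = FALSE),
    pheno[idx, , drop = FALSE]
  )
}

# Network access is needed here, so only run this part interactively.
if (interactive() && requireNamespace("GEOquery", quietly = TRUE)) {
  geoMat <- GEOquery::getGEO("GSE68777")   # downloads the series matrix
  pheno  <- Biobase::pData(geoMat[[1]])    # one row per GSM sample
  grn    <- list.files("GSE68777/idat", pattern = "_Grn\\.idat$",
                       full.names = TRUE)
  targets <- attach_pheno(sub("_Grn\\.idat$", "", grn), pheno)
}
```

Which columns of the GEO annotation are actually useful varies by submitter, so expect to inspect and rename them by hand.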

As to your question, there are three obvious answers. You could contact the person who submitted the data and ask them for the file (their email is necessarily part of the submission). You could try to generate a minimal file that will work. Or you could just read the data in by hand.
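For the second option, here is a minimal sketch of generating a sheet from the IDAT file names alone. It assumes GEO-style names of the form GSM_slide_position_channel.idat, as in the listing in the question, and that read.metharray.sheet only needs the table under a [Data] line with Sentrix_ID and Sentrix_Position columns in order to locate the files. The helpers `build_targets` and `write_sample_sheet` and the file name SampleSheet.csv are made up for illustration.

```r
# Hypothetical helper: one row per array, parsed from names such as
# "GSM1681154_5958091019_R03C02_Grn.idat".
build_targets <- function(idat_files) {
  # Drop the channel suffix (and an optional .gz) so Grn/Red collapse to one stem
  stems <- unique(sub("_(Grn|Red)\\.idat(\\.gz)?$", "", basename(idat_files)))
  parts <- strsplit(stems, "_", fixed = TRUE)
  data.frame(
    Sample_Name      = vapply(parts, function(p) p[1], character(1)),
    Sentrix_ID       = vapply(parts, function(p) p[2], character(1)),
    Sentrix_Position = vapply(parts, function(p) p[3], character(1)),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: write the table as CSV under a [Data] line.
write_sample_sheet <- function(targets, file) {
  rows <- do.call(paste, c(unname(as.list(targets)), sep = ","))
  writeLines(c("[Data]", paste(names(targets), collapse = ","), rows), file)
  invisible(file)
}

idat_dir <- "GSE68777/idat"
if (requireNamespace("minfi", quietly = TRUE) && dir.exists(idat_dir)) {
  idat_files <- list.files(idat_dir, pattern = "\\.idat$")
  write_sample_sheet(build_targets(idat_files),
                     file.path(idat_dir, "SampleSheet.csv"))
  # Restrict the pattern so the unzipped manifest csv is not picked up too
  targets <- minfi::read.metharray.sheet(idat_dir,
                                         pattern = "SampleSheet\\.csv$")
  rgSet <- minfi::read.metharray.exp(targets = targets)
}
```

For the third option, read.metharray.exp() can also be pointed straight at the directory, e.g. read.metharray.exp("GSE68777/idat"), in which case no sheet is needed but you get no sample annotation either.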
