Unable to read in the IDAT files using read.metharray.exp()
shweta • 0
@shweta-24797

Hello,

I would like to analyse methylation data and have data from both 450K and EPIC arrays. I created a CSV file containing the Sentrix ID, the Sentrix position, and a Basename column with the path to the IDAT files. All of the files, including the CSV, are in the same folder.

library(methylationArrayAnalysis)
library(knitr)
library(limma)
library(minfi)
library(IlluminaHumanMethylation450kanno.ilmn12.hg19)
library(IlluminaHumanMethylation450kmanifest)
library(RColorBrewer)
library(missMethyl)
library(minfiData)
library(Gviz)
library(DMRcate)
library(stringr)
library(IlluminaHumanMethylationEPICanno.ilm10b4.hg19)
library(conumee)
dataDirectory <- "C:/Users/35389/Desktop/Medullos/All_combined"
target <- read.metharray.sheet(dataDirectory, pattern = "sample_sheet_2.csv")
target
target$Basename
g_files <- paste0(target$Basename, "_Grn.idat")
all(file.exists(g_files))

output of the above code

# read in the IDAT files listed in the sample sheet
rgset <- read.metharray.exp(targets = target, recursive = TRUE, verbose = TRUE, extended = TRUE)

However, when I read in the files using read.metharray.exp(), I get the following error:

Timing stopped at: 0.14 0.05 0.22
Error in readIDAT(xx) : Cannot read IDAT file. File format error. Unknown magic:

Any help will be greatly appreciated! Thanks in advance :)

MethylationArrayData methy Bioconductor minfi • 3.3k views
@james-w-macdonald-5106

The error you see is because one or more of your files are problematic. The first thing readIDAT does is check whether the files are IDAT files, and you have at least one that seems not to be. You can figure this out for yourself; here's an example using some data I have in hand.

> targets <- read.metharray.sheet("../data/image_data/")
> grn <- paste0(targets$Basename, "_Grn.idat")
> red <- paste0(targets$Basename, "_Red.idat")
> testGrn <- sapply(grn, readChar, nchars = 4)
> testRed <- sapply(red, readChar, nchars = 4)
> testGrn[testGrn != "IDAT"]
named character(0)
> testRed[testRed != "IDAT"]
named character(0)
> head(testGrn)
../data/image_data/203219730010/203219730010_R01C01_Grn.idat 
                                                      "IDAT" 
../data/image_data/203219730010/203219730010_R02C01_Grn.idat 
                                                      "IDAT" 
../data/image_data/203219730010/203219730010_R03C01_Grn.idat 
                                                      "IDAT" 
../data/image_data/203219730010/203219730010_R04C01_Grn.idat 
                                                      "IDAT" 
../data/image_data/203219730010/203219730010_R05C01_Grn.idat 
                                                      "IDAT" 
../data/image_data/203219730010/203219730010_R06C01_Grn.idat 
                                                      "IDAT" 
> head(testRed)
../data/image_data/203219730010/203219730010_R01C01_Red.idat 
                                                      "IDAT" 
../data/image_data/203219730010/203219730010_R02C01_Red.idat 
                                                      "IDAT" 
../data/image_data/203219730010/203219730010_R03C01_Red.idat 
                                                      "IDAT" 
../data/image_data/203219730010/203219730010_R04C01_Red.idat 
                                                      "IDAT" 
../data/image_data/203219730010/203219730010_R05C01_Red.idat 
                                                      "IDAT" 
../data/image_data/203219730010/203219730010_R06C01_Red.idat 
                                                      "IDAT"

Presumably your data will return one or more problematic files, which you can then either exclude or investigate further.
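The same check can be wrapped in a small function that also flags missing files. This is only a sketch in base R: the name find_bad_idats and its arguments are made up for illustration and are not part of minfi or illuminaio. It compares the first four bytes of each file to the "IDAT" string, which is the same comparison as the readChar test above, and it does not handle gzipped IDAT files.

```r
# Hypothetical helper (not part of minfi): list IDAT files that are missing
# or whose first four bytes are not the "IDAT" magic string.
find_bad_idats <- function(basenames, suffixes = c("_Grn.idat", "_Red.idat")) {
  # Build every basename/suffix combination, e.g. <basename>_Grn.idat
  files <- as.vector(outer(basenames, suffixes, paste0))
  status <- vapply(files, function(f) {
    if (!file.exists(f)) return("missing")
    # Compare raw bytes so embedded NULs or empty files cannot cause an error
    bytes <- readBin(f, what = "raw", n = 4L)
    if (identical(bytes, charToRaw("IDAT"))) "ok" else "bad magic"
  }, character(1))
  res <- data.frame(file = files, status = unname(status),
                    stringsAsFactors = FALSE)
  # Keep only the problematic files
  res[res$status != "ok", , drop = FALSE]
}
```

Calling find_bad_idats(targets$Basename) returns a data frame with one row per problematic file, which can then be used to drop the corresponding rows from the sample sheet.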


Thank you so much, this helped me spot the incorrect files. I read in my EPIC files and 450K files separately. The files were also of different sizes, so I had to force them to be read:

# read in the EPIC IDAT files listed in the sample sheet
rgset <- read.metharray.exp(targets = target_EPIC, recursive = TRUE, verbose = TRUE, extended = TRUE, force = TRUE)
head(rgset)

Output

As I understand it, the arrays are merged on the basis of the probes in the smallest file, but that leaves very few probes quantified. I wonder what the reason for this could be, and whether you have any suggestions for getting around it. Thanks a lot again! :)
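One way to see which samples have differently sized IDAT files is to group the basenames by file size. The sketch below uses only base R; the function name idat_size_groups is hypothetical, and it rests on the assumption that files of the same size come from the same array type or version, which is not guaranteed.

```r
# Hypothetical helper: group sample basenames by the size (in bytes) of their
# green-channel IDAT file. Basenames whose file is missing are dropped.
idat_size_groups <- function(basenames, suffix = "_Grn.idat") {
  sizes <- file.size(paste0(basenames, suffix))
  # split() names each group by file size and ignores NA sizes
  split(basenames, sizes)
}
```

Running lengths(idat_size_groups(target_EPIC$Basename)) would show how many samples fall into each size group, which may help identify a small group of files that is forcing the merge down to few probes.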


Please use the ADD COMMENT button rather than ADD ANSWER, unless you are actually answering your own question.

You should read the data in separately and then use combineArrays. I have never done that sort of thing, so it's up to you to figure out if you should completely process the data to a GenomicRatioSet and then combine, or combine first and then process.
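A minimal sketch of the "combine first, then process" route might look like the following. The wrapper read_and_combine and the objects targets_450k and targets_epic are hypothetical: they stand for the sample sheet split into one data frame per array type. The sketch assumes minfi is installed and that each sample sheet reads cleanly on its own. Whether combining before or after preprocessing is the better choice is left open, as above.

```r
# Hypothetical wrapper around minfi: read each array type on its own,
# then combine the two RGChannelSets onto a common array type.
read_and_combine <- function(targets_450k, targets_epic,
                             outType = "IlluminaHumanMethylation450k") {
  # Each call only sees IDAT files from a single array type
  rg_450k <- minfi::read.metharray.exp(targets = targets_450k)
  rg_epic <- minfi::read.metharray.exp(targets = targets_epic)
  # combineArrays() keeps the probes shared by both arrays; outType sets
  # which array type the combined object is annotated as
  minfi::combineArrays(rg_450k, rg_epic, outType = outType)
}
```

The result of read_and_combine(targets_450k, targets_epic) can then be passed to the usual preprocessing functions. If the EPIC files themselves differ in size, force = TRUE may still be needed in the second read.metharray.exp() call.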
