Question

Minfi::read.metharray.sheet, problem with pattern defined in non-local csv file

Diogenesis • 0

I need to read in a number of idat files and quickly get their locations. The minfi function read.metharray.sheet does exactly what I need, but only under a very limiting condition.

For example, suppose I have a variable CsvFile = "/path_1/csvFile.csv" pointing to a csv file that contains columns named "Sentrix_ID" and "Sentrix_Position". Further suppose that all of my idats are _also_ saved under /path_1/various_subdirectories/*.idat. Under these conditions, I can run this:

This_works_great  <- read.metharray.sheet( base    = dirname(  CsvFile ),   # <--- this is /path_1/
                                           pattern = basename( CsvFile ) )  # <--- only possible if located in /path_1/

This works, but only if the csv file happens to be stored in the same location as all of my idats (/path_1/ in the above example). This seems like a strange and very limiting assumption. I'd like to use a csv file from one location to define the patterns to be matched when searching an entirely different location, like the following:

This_does_not_work <- read.metharray.sheet( base    = "/path_1/to/be/searched/",
                                            pattern = "/path_2/somewhere_else/CsvFile.csv" )

Surely there's some way to use a csv file that's non-local to the search directory, but the documentation for the pattern argument just says "see list.files". The list.files documentation does mention a default path = ".", but I don't see any obvious way to change that.

Does anyone know how to use a csv file in one location to define the search in a different location?

PatternLogic read.metharray.sheet list.files minfi • 689 views

updated 14 months ago by James W. MacDonald • written 14 months ago by Diogenesis

score 0 · Answer 1 · 2023-02-22

The normal output from processing Illumina methylation arrays is a directory that contains the csv file as well as the IDAT files, and read.metharray.sheet is a simple wrapper meant to take advantage of that. There are infinitely many other ways that people can spread the files around their hard drive, and it is not possible to make a simple wrapper that accommodates all of those permutations.
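To illustrate why the second call in the question finds nothing, here is a minimal base R sketch. It assumes, as the help page's pointer to list.files suggests, that pattern is handed on as a regular expression to be matched against file names found under base. The temporary directories and file names are hypothetical stand-ins for /path_1 and /path_2.

```r
# Hypothetical layout in a temporary directory, standing in for /path_1 and /path_2.
root     <- tempfile("demo_")
idat_dir <- file.path(root, "path_1")
csv_dir  <- file.path(root, "path_2")
dir.create(idat_dir, recursive = TRUE)
dir.create(csv_dir,  recursive = TRUE)
file.create(file.path(idat_dir, "200000000001_R01C01_Grn.idat"))
csv_file <- file.path(csv_dir, "CsvFile.csv")
file.create(csv_file)

# pattern is a regular expression tested against the names of files found
# under the searched directory, so a path pointing elsewhere matches nothing.
found_elsewhere <- list.files(idat_dir,
                              pattern   = "/path_2/somewhere_else/CsvFile.csv",
                              recursive = TRUE)

# Searching the directory that actually holds the sheet does find it.
found_local <- list.files(dirname(csv_file),
                          pattern    = basename(csv_file),
                          full.names = TRUE)
```

In other words, pattern only filters what is found under the searched directory; it cannot redirect the search to a file stored somewhere else.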

Ideally you want to have a targets data.frame that aligns your IDAT files with the phenotypes. So if I were you, I would read in the csv file yourself, add a Basename column that accurately identifies the subdirectory and file prefix of each sample's IDAT files, and then use read.metharray.exp. That should be relatively trivial to do.
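A rough sketch of that approach follows. It rests on several assumptions: the csv is a plain table with a header row holding Sentrix_ID and Sentrix_Position (no Illumina [Header]/[Data] preamble), the IDAT files are named <Sentrix_ID>_<Sentrix_Position>_Grn.idat and _Red.idat, and Basename should hold the full path minus that suffix. build_targets is a hypothetical helper, not part of minfi, and the IDs in the demo are made up.

```r
# Hypothetical helper: read a sample sheet from anywhere and attach a Basename
# column pointing at each sample's IDAT files found under idat_root.
build_targets <- function(csv_file, idat_root) {
  # Read the Sentrix columns as text so long numeric IDs are not reformatted.
  targets <- read.csv(csv_file, stringsAsFactors = FALSE,
                      colClasses = c(Sentrix_ID = "character",
                                     Sentrix_Position = "character"))
  prefix <- paste(targets$Sentrix_ID, targets$Sentrix_Position, sep = "_")

  # Locate every green-channel IDAT below idat_root, in any subdirectory.
  suffix <- "_Grn\\.idat(\\.gz)?$"
  idats  <- list.files(idat_root, pattern = suffix,
                       recursive = TRUE, full.names = TRUE)

  # Match each sample to its file; samples without a file get NA.
  hit <- match(prefix, sub(suffix, "", basename(idats)))
  targets$Basename <- sub(suffix, "", idats[hit])
  targets
}

# Demo with empty placeholder files in a temporary directory.
root <- tempfile("demo_")
dir.create(file.path(root, "idats", "batch_a"), recursive = TRUE)
dir.create(file.path(root, "sheets"), recursive = TRUE)
file.create(file.path(root, "idats", "batch_a",
                      c("200000000001_R01C01_Grn.idat",
                        "200000000001_R01C01_Red.idat")))
csv_file <- file.path(root, "sheets", "CsvFile.csv")
write.csv(data.frame(Sample_Name      = "s1",
                     Sentrix_ID       = "200000000001",
                     Sentrix_Position = "R01C01"),
          csv_file, row.names = FALSE)

targets <- build_targets(csv_file, file.path(root, "idats"))

# With real IDAT files and minfi installed, the next step would be:
# rgSet <- minfi::read.metharray.exp(targets = targets)
```

Checking targets$Basename for NA values before calling read.metharray.exp is a cheap way to catch samples whose IDAT files were not found.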