Minfi error in reading Basename column
parap • 0
@parap-8717
Last seen 9.1 years ago
United States

Hello all, 

I am new to methylation data analysis and am having a problem importing data with the minfi package.

I read the sample sheet with targets <- read.450k.sheet(baseDir, pattern = "csv$"). The sheet has the basic columns (Sample_name, Slide, etc.), but the result contains a Basename column that is not in my sample sheet, and some of its entries are wrong.

The sample sheets for my other datasets did not have a "Basename" column either, and they ran fine. For this dataset, however, the Basename column comes back with a missing path:

                           Basename  Array       Slide
1 E:/10003885002/10003885002_R01C01 R01C01 10003885002
2                      character(0) R02C01 10003885002

Because of these character(0) entries, creating the RgSet also fails with an error:

The following specified files do not exist:character(0)_Grn.idat

Can anyone tell me why I am getting character(0) under Basename?

Can I manually add a Basename column with all the paths to my sample sheet?

Do I need to add a Basename column to the sample sheet, with path information for each sample? I have not been able to find any information on this anywhere.

Please help!

 

methylation minfi • 10.0k views
ADD COMMENT
@james-w-macdonald-5106
Last seen 39 minutes ago
United States

When you post a question, please use the 'Question' type, rather than 'Tutorial'. As the name might suggest, a Tutorial post is intended to provide a tutorial, rather than ask a question.

The Basename column is generated programmatically, by looking at information in your SampleSheet.csv and then inferring the file name for the corresponding Grn.idat file. In your case, the expectation is that there will be a file

E:/10003885002/10003885002_R02C01_Grn.idat

and when it isn't found, you get a character(0) returned. You are therefore missing at least one IDAT file, and you need to figure out why the raw data files are missing.
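One way to see which files are missing is to build the expected file names yourself and compare them with what is on disk. This is only a minimal base-R sketch: find_missing_idats is a hypothetical helper, not part of minfi, it assumes the sheet is already in a data frame with Slide and Array columns, and it compares exact file names rather than reproducing minfi's own matching.

```r
# Hypothetical helper (NOT part of minfi): list the Grn/Red IDAT files that
# the sample sheet implies but that are absent anywhere under base_dir.
find_missing_idats <- function(targets, base_dir) {
  present <- basename(list.files(base_dir, recursive = TRUE))
  expected <- c(
    sprintf("%s_%s_Grn.idat", targets$Slide, targets$Array),
    sprintf("%s_%s_Red.idat", targets$Slide, targets$Array)
  )
  setdiff(expected, present)
}

# Toy demonstration in a temporary directory (made-up layout)
base_dir <- file.path(tempdir(), "idat_demo")
dir.create(file.path(base_dir, "10003885002"),
           recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(
  base_dir, "10003885002",
  c("10003885002_R01C01_Grn.idat", "10003885002_R01C01_Red.idat")
)))

targets <- data.frame(Slide = "10003885002",
                      Array = c("R01C01", "R02C01"),
                      stringsAsFactors = FALSE)

# Only the R02C01 pair is reported, because only R01C01 exists on disk
missing_idats <- find_missing_idats(targets, base_dir)
print(missing_idats)
```

Any name this reports is a file the Basename lookup has no chance of finding.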

ADD COMMENT

Just to add my two cents in case someone else comes across this error. I kept getting a similar error, and when I looked I realized that the file names were rendered incorrectly because I had originally used Excel to make my sample sheet. Excel treats the barcode as a number and automatically switches it to scientific notation. It is a barcode, not a number, so make sure to change the cell format to a number with no decimal places! It worked perfectly after I saved the sheet to CSV again.
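A quick way to spot this from R is to read every column as text and flag any slide barcode that is not made of digits only. This is a sketch under assumptions: the toy sheet below is made up, has no [Header] section, and uses the Sentrix_ID and Sentrix_Position column names.

```r
# Write a toy sheet the way Excel might save it, with one barcode mangled
sheet <- tempfile(fileext = ".csv")
writeLines(c("Sample_Name,Sentrix_ID,Sentrix_Position",
             "s1,10003885002,R01C01",
             "s2,1.00039E+10,R02C01"), sheet)

# Read every column as character so R does not reinterpret the barcodes
df <- read.csv(sheet, colClasses = "character")

# A slide barcode should consist of digits only
bad <- !grepl("^[0-9]+$", df$Sentrix_ID)
print(df[bad, ])
```

Rows flagged here need to be fixed in the sheet itself, since the original digits are no longer recoverable from the rounded value.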

ADD REPLY

Thanks for the prompt response, James!

Sure, I will choose the right category when posting next time. Thanks for the correction.

Thanks for pointing out the error. I checked the data, and it seems I received an incomplete dataset, so a wrong Basename column was generated. I removed a few samples and it runs fine now.

ADD REPLY

Hi James,

I am encountering the same problem, but all the IDAT files are available, and for some reason it is not recognizing the paired IDAT files. Any thoughts?

thanks!

Cristina

ADD REPLY
@ankitachatterjee88-8592
Last seen 9.1 years ago
United States

Thanks, James, for your reply. I am also stuck at this point. I ran 4 chips in 2 sets. When I read the chip data for the first set, everything works fine, but when I try to read data from all four chips, the Basename column shows "character(0)".

As you suggested, I checked all the IDAT files individually, and not a single one is missing. How should I proceed from here? Should I prepare a targets file in .txt format and read it into R?

ADD COMMENT

Hi ankita, I am running into the same error, and I am pretty sure I have all the IDAT files. Were you able to solve this by bypassing Excel?

Thanks!

ADD REPLY

I resolved this issue by rechecking that the IDAT files match the sample sheet, since the Basename column is populated automatically from the IDAT files and the sample sheet. I would suggest creating a new sample sheet and testing it by reading in a smaller set of samples. That eventually worked for me.

ADD REPLY
@shicheng-guo-7973
Last seen 3.4 years ago
United States

I eventually found the reason. Please see the following function, which ChAMP and some other packages use to read SampleSheet.csv and build the Basename column.

 read.metharray.sheet()

function (base, pattern = "csv$", ignore.case = TRUE, recursive = TRUE,
    verbose = TRUE)
{
    readSheet <- function(file) {
        dataheader <- grep("^\\[DATA\\]", readLines(file), ignore.case = TRUE)
        if (length(dataheader) == 0)
            dataheader <- 0
        df <- read.csv(file, stringsAsFactor = FALSE, skip = dataheader)
        if (length(nam <- grep("Sentrix_Position", names(df),
            ignore.case = TRUE, value = TRUE)) == 1) {
            df$Array <- as.character(df[, nam])
            df[, nam] <- NULL
        }
        if (length(nam <- grep("Array[\\._]ID", names(df), ignore.case = TRUE,
            value = TRUE)) == 1) {
            df$Array <- as.character(df[, nam])
            df[, nam] <- NULL
        }
        if (!"Array" %in% names(df))
            warning(sprintf("Could not infer array name for file: %s",
                file))
        if (length(nam <- grep("Sentrix_ID", names(df), ignore.case = TRUE,
            value = TRUE)) == 1) {
            df$Slide <- as.character(df[, nam])
            df[, nam] <- NULL
        }
        if (length(nam <- grep("Slide[\\._]ID", names(df), ignore.case = TRUE,
            value = TRUE)) == 1) {
            df$Slide <- as.character(df[, nam])
            df[, nam] <- NULL
        }
        if (!"Slide" %in% names(df))
            warning(sprintf("Could not infer slide name for file: %s",
                file))
        else df[, "Slide"] <- as.character(df[, "Slide"])
        if (length(nam <- grep("Plate[\\._]ID", names(df), ignore.case = TRUE,
            value = TRUE)) == 1) {
            df$Plate <- as.character(df[, nam])
            df[, nam] <- NULL
        }
        for (nam in c("Pool_ID", "Sample_Plate", "Sample_Well")) {
            if (nam %in% names(df)) {
                df[[nam]] <- as.character(df[[nam]])
            }
        }
        if (!is.null(df$Array)) {
            patterns <- sprintf("%s_%s_Grn.idat", df$Slide, df$Array)
            allfiles <- list.files(dirname(file), recursive = recursive,
                full.names = TRUE)
            basenames <- sapply(patterns, function(xx) grep(xx,
                allfiles, value = TRUE))
            names(basenames) <- NULL
            basenames <- sub("_Grn\\.idat", "", basenames, ignore.case = TRUE)
            df$Basename <- basenames
        }
        df
    }
    if (!all(file.exists(base)))
        stop("'base' does not exists")
    info <- file.info(base)
    if (!all(info$isdir) && !all(!info$isdir))
        stop("'base needs to be either directories or files")
    if (all(info$isdir)) {
        csvfiles <- list.files(base, recursive = recursive, pattern = pattern,
            ignore.case = ignore.case, full.names = TRUE)
        if (verbose) {
            message("[read.metharray.sheet] Found the following CSV files:\n")
            print(csvfiles)
        }
    }
    else csvfiles <- list.files(base, full.names = TRUE)
    dfs <- lapply(csvfiles, readSheet)
    namesUnion <- Reduce(union, lapply(dfs, names))
    df <- do.call(rbind, lapply(dfs, function(df) {
        newnames <- setdiff(namesUnion, names(df))
        newdf <- matrix(NA, ncol = length(newnames), nrow = nrow(df),
            dimnames = list(NULL, newnames))
        cbind(df, as.data.frame(newdf))
    }))
    df
}
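The part that produces the literal text character(0) can be reproduced with base R alone. The sketch below copies the four Basename lines from the function above and runs them on a made-up file list (the E:/ paths are taken from the original question and are not read from disk).

```r
# Made-up file list: the R01C01 pair exists, the R02C01 pair does not
allfiles <- c("E:/10003885002/10003885002_R01C01_Grn.idat",
              "E:/10003885002/10003885002_R01C01_Red.idat")
patterns <- sprintf("%s_%s_Grn.idat", "10003885002", c("R01C01", "R02C01"))

# Same steps as in read.metharray.sheet()
basenames <- sapply(patterns, function(xx) grep(xx, allfiles, value = TRUE))
names(basenames) <- NULL

# The matches have lengths 1 and 0, so sapply() cannot simplify them
# to a vector and returns a list instead
was_list <- is.list(basenames)

# sub() coerces the list with as.character(), which turns the empty
# element into the string "character(0)"
basenames <- sub("_Grn\\.idat", "", basenames, ignore.case = TRUE)
print(basenames)
# [1] "E:/10003885002/10003885002_R01C01" "character(0)"
```

So a character(0) entry means that grep() found no file name matching Slide_Array_Grn.idat under the directory that holds the CSV, whether because the file is absent, the sheet values are malformed, or the CSV sits somewhere else.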

 

ADD COMMENT
@splittinginfinity-11669
Last seen 3.6 years ago
Canada

read.450k() throws the character(0)_Grn.idat error because it cannot find a file specified in the sample sheet.

One of the most common causes is the formatting of the sample sheet. Look for trailing spaces or illegal characters in your CSV file.
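Stray spaces are hard to see in a spreadsheet but easy to detect in R. A minimal sketch, assuming a made-up sheet with no [Header] section; the column names are only examples.

```r
# Toy sheet with a trailing space in one cell and a leading space in another
sheet <- tempfile(fileext = ".csv")
writeLines(c("Sample_Name,Sentrix_ID,Sentrix_Position",
             "s1,10003885002 ,R01C01",
             "s2,10003885002, R02C01"), sheet)

# Read as text; read.csv() keeps surrounding whitespace by default
df <- read.csv(sheet, colClasses = "character")

# Flag columns in which at least one cell changes when trimmed
has_ws <- sapply(df, function(col) any(col != trimws(col)))
print(has_ws)

# Cleaned copy with leading/trailing whitespace removed from every cell
clean <- as.data.frame(lapply(df, trimws), stringsAsFactors = FALSE)
```

Passing strip.white = TRUE to read.csv() has a similar effect when you read the file yourself, but the sheet on disk still needs fixing if another function reads it for you.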

 

ADD COMMENT

I am not sure how others solved the character(0) problem with Basename, but I have been stuck on it for some time now. I checked the CSV file and confirmed that all the IDAT files are present, and I do not see a problem with either. Could you please help me fix this? A snippet of my code:

library(minfi)

baseDir <-"/home/idats"

targets=read.metharray.sheet(baseDir)

print(targets)

Output with last few columns:

 

  sex status   Array  Slide     Basename
1   M Normal 7420085 R06C02 character(0)
2   M Cancer 7420085 R06C02 character(0)
3   M Normal 7420117 R06C02 character(0)
4   M Cancer 7420117 R02C02 character(0)
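One thing that may be worth checking in output like this: according to the read.metharray.sheet() source posted earlier in the thread, the file pattern is built as Slide_Array_Grn.idat. In the table as printed, the numeric barcodes sit under Array and the RxxCxx positions under Slide, so if the columns really are as shown, the pattern would come out reversed. A base-R sketch of that check; the data frame is retyped from the output above, and looks_like_position is a hypothetical helper that assumes the usual RxxCxx naming.

```r
# Retyped from the printed output (an assumption about the real data)
targets <- data.frame(
  Array = c("7420085", "7420085", "7420117", "7420117"),
  Slide = c("R06C02", "R06C02", "R06C02", "R02C02"),
  stringsAsFactors = FALSE
)

# Pattern as built inside read.metharray.sheet()
before <- sprintf("%s_%s_Grn.idat", targets$Slide, targets$Array)[1]
print(before)  # "R06C02_7420085_Grn.idat"

# Hypothetical helper: does a value look like an array position?
looks_like_position <- function(x) grepl("^R[0-9]{2}C[0-9]{2}$", x)

# Swap the two columns if positions ended up under Slide
swapped <- all(looks_like_position(targets$Slide)) &&
  !any(looks_like_position(targets$Array))
if (swapped) {
  targets[c("Slide", "Array")] <- targets[c("Array", "Slide")]
}

after <- sprintf("%s_%s_Grn.idat", targets$Slide, targets$Array)[1]
print(after)   # "7420085_R06C02_Grn.idat"
```

If the swap is real, the lasting fix is in the sheet itself: the barcode belongs in the Sentrix_ID (Slide) column and the position in Sentrix_Position (Array).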


ADD REPLY

I moved the CSV file into the same folder as the IDAT files, made that folder my working directory, and badabing! It worked. So just move the CSV file.

Also, I made my CSV file in Excel and saved it as CSV. If you open it in a text editor, you can press Return after the last item in the last column of the last row to add a final newline, which fixes the "end of line" error.
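Moving the CSV helps for a reason that is visible in the read.metharray.sheet() source posted earlier in the thread: the IDAT files are searched with list.files(dirname(file), recursive = recursive), so only the folder holding the CSV and its subfolders are scanned. A base-R sketch of that behaviour; the folder layout is made up and find_grn is a hypothetical stand-in for the lookup step.

```r
# Made-up layout: IDATs in one folder, an alternative CSV location beside it
root <- file.path(tempdir(), "csv_location_demo")
dir.create(file.path(root, "idats"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "sheets"), showWarnings = FALSE)
invisible(file.create(file.path(root, "idats", "7420085_R06C02_Grn.idat")))

# Hypothetical stand-in for the lookup: search only below the CSV's folder
find_grn <- function(csv_path, pattern) {
  allfiles <- list.files(dirname(csv_path), recursive = TRUE,
                         full.names = TRUE)
  grep(pattern, allfiles, value = TRUE)
}

pattern <- "7420085_R06C02_Grn.idat"

# CSV in a sibling folder: the IDAT folder is never scanned
in_sibling <- find_grn(file.path(root, "sheets", "SampleSheet.csv"), pattern)

# CSV next to the IDATs: the file is found
in_same <- find_grn(file.path(root, "idats", "SampleSheet.csv"), pattern)

print(length(in_sibling))  # 0
print(length(in_same))     # 1
```

Placing the CSV in a parent folder of the IDATs should work as well, as long as recursive = TRUE is left at its default.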

ADD REPLY
