Can't read CEL files using the oligo package
scipio04 • 0
@a6b1f305

Hello, I am getting errors when trying to read CEL files.

I am sure that I am in the right directory: I unzipped a zip file containing 386 .CEL files there.

Here is the error message:

  • Is 5513234437813052023349_Pmarg70K_A01_FIE9191A1537_S_B16.1.CEL really a CEL file? tried reading as text, gzipped text, binary, gzipped binary, command console and gzipped command console formats
  • Error in read.celfile.header(x)

I don't understand the issue here. Thanks for your help!

setwd()  # directory path was left out of the original post
library(oligo)
celfiles <- list.files(pattern = ".CEL")
data <- read.celfiles(celfiles)
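
A side note on the listing step: the `pattern` argument of `list.files` is a regular expression, so an unanchored ".CEL" matches any character followed by CEL anywhere in a file name. The sketch below uses a temporary directory with made-up file names (all hypothetical) to show the difference between the loose pattern and an anchored one. It is only an illustration of the listing step, not a fix for the read error itself.

```r
# Toy directory with made-up file names (hypothetical, for illustration only)
d <- tempfile("celdemo")
dir.create(d)
file.create(file.path(d, c("sample_A01.CEL", "sample_A02.cel",
                           "EXCEL_report.txt", "QC_rpt.pdf")))

# Unanchored: "." matches any character, so "EXCEL_report.txt" is picked up
# and the lower-case ".cel" file is missed
loose <- list.files(d, pattern = ".CEL")

# Anchored: a literal dot, then CEL at the end of the name, ignoring case
strict <- list.files(d, pattern = "\\.CEL$", ignore.case = TRUE)

print(loose)
print(strict)
```

Adding `full.names = TRUE` and `recursive = TRUE` would also cover CEL files that sit in subdirectories of the unzipped archive.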

Hi, it seems that you already received an answer here: cel files with affy library

What do you want to do with this Affymetrix SNP 6.0 data? My PhD was based on this array back in 2010-12, and I know that it has probes that target both SNP and CN (copy number) variants. In which are you interested? The Affymetrix SNP 6.0 is not for gene expression analysis.

Please see my answer and another previous answer on Biostars:


I want to get a matrix so that I can do population genomic analysis afterwards.

It's about SNPs.

The answer I received was about the right package to use, but once I corrected that, I am still getting the same errors.


No problem. Can you take a look at the second answer that I posted above? - the user was also using oligo but then moved to crlmm. It seems like there is a way to make genotype calls with this package, as per also this publication (see code toward end): Using the R Package crlmm for Genotyping and Copy Number Estimation

From my first answer that I posted, probably Birdsuite (https://www.broadinstitute.org/birdsuite) is the best option to use, for now, in order to derive genotypes.


OK, thank you for your response. I'm going to take a look at the crlmm package and the Birdsuite pipeline.

@james-w-macdonald-5106

The error message indicates that one or more of the celfiles are problematic. These days most celfiles are binary, and if you have one or more that are corrupted (could be just from unzipping), the oligo package (well, actually the affxparser package) won't be able to read it. The problem here is that you don't know if it's just one of your files or all of them. So you could randomly do something like

library(affyio)  # read.celfile.header() is exported by affyio
read.celfile.header(celfiles[1])

To check them one by one. But that's super boring, and maybe it's just one or two, in which case doing that for 386 files is super duper boring. You can instead use Martin Maechler's tryCatch.W.E to catch errors without actually erroring out, to iterate through your celfiles and see which one(s) are problematic.

tryCatch.W.E <- function(expr)
{
    W <- NULL
    w.handler <- function(w) { # warning handler
        W <<- w
        invokeRestart("muffleWarning")
    }
    list(value = withCallingHandlers(tryCatch(expr, error = function(e) e),
                                     warning = w.handler),
         warning = W)
}

z <- lapply(celfiles, function(x) tryCatch.W.E(affyio::read.celfile.header(x)))

celfiles[sapply(z, function(x) !is.null(x$warning))]

Which will provide a list of the borked celfiles.
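
One caveat when filtering the result: tryCatch.W.E stores a caught error in the `value` element and only warnings in `warning`, so a filter on `warning` alone can come back empty even when every read fails. Below is a minimal sketch of checking both. `fake_reader` and the file names are hypothetical stand-ins, so the example runs without any CEL files.

```r
# Same helper as above (Martin Maechler's tryCatch.W.E)
tryCatch.W.E <- function(expr) {
    W <- NULL
    w.handler <- function(w) {
        W <<- w
        invokeRestart("muffleWarning")
    }
    list(value = withCallingHandlers(tryCatch(expr, error = function(e) e),
                                     warning = w.handler),
         warning = W)
}

# Hypothetical stand-in for a CEL header reader
fake_reader <- function(path) {
    if (grepl("bad", path)) stop("Is ", path, " really a CEL file?")
    if (grepl("odd", path)) warning("unexpected header field in ", path)
    "header"
}

files <- c("good_1.CEL", "bad_2.CEL", "odd_3.CEL")
z <- lapply(files, function(x) tryCatch.W.E(fake_reader(x)))

# Warnings and errors end up in different list elements
warned <- vapply(z, function(x) !is.null(x$warning), logical(1))
failed <- vapply(z, function(x) inherits(x$value, "error"), logical(1))

files[warned]  # files that produced a warning but still returned a value
files[failed]  # files that could not be read at all
```

With a real reader in place of `fake_reader`, `conditionMessage(z[[i]]$value)` gives the error text for a failed file.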


Thank you for your comment. Indeed, when I try "read.celfile" on random files it works, but when I try it on all of them, it doesn't.

I'm going to try the tryCatch.W.E solution.


tryCatch.W.E returns character(0), so it seems that nothing is wrong.

But it's weird: sometimes I can read one CEL file on its own and store it in a vector,

but when I try to use some other functions from the oligo or crlmm packages, they display the same error.


Huh. Weird. You wouldn't think something like this would be random - either you can or you cannot read in a file. Where did you get the files?


The files come from https://mydata.ramaciotti.unsw.edu.au/s/96s5HDbb2z83Zn8

(Ramaciotti Centre for Genomics in Sydney).

The read.celfiles function can work for some files,

but when I use some functions from the oligo/crlmm R packages, both errors quoted above come back.

It looks like something is wrong with the file headers, but as I don't know anything about CEL files or binary files, I can't work it out.
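
For anyone who wants to look at the files without knowing the format, one rough first check is to read the first few bytes and compare them with known signatures. The sketch below is only that: the gzip and PDF signatures are standard, but the CEL signatures (text files starting with "[CEL]", Command Console files starting with bytes 59 and 1, older binary files starting with a little-endian 64) are assumptions based on how these layouts are commonly described, not something taken from this thread. `guess_cel_format` and the temporary files are hypothetical.

```r
# Guess the container format of a file from its first bytes.
# The CEL signatures are assumptions (see the note above), not a specification.
guess_cel_format <- function(path) {
    b <- readBin(path, what = "raw", n = 5)
    if (length(b) < 5) return("too short")
    if (identical(b[1:2], as.raw(c(0x1f, 0x8b)))) return("gzip-compressed")
    if (identical(b[1:4], charToRaw("%PDF"))) return("PDF, not a CEL file")
    if (identical(b, charToRaw("[CEL]"))) return("text CEL")
    if (identical(b[1:2], as.raw(c(59, 1)))) return("Command Console CEL")
    if (identical(b[1:4], as.raw(c(64, 0, 0, 0)))) return("binary CEL")
    "unrecognised"
}

# Toy files written to a temporary location so the sketch runs on its own
f_pdf <- tempfile(fileext = ".pdf")
writeBin(charToRaw("%PDF-1.4\n"), f_pdf)

f_cc <- tempfile(fileext = ".CEL")
writeBin(as.raw(c(59, 1, 0, 0, 0, 1)), f_cc)

f_txt <- tempfile(fileext = ".CEL")
writeBin(charToRaw("[CEL]\nVersion=3\n"), f_txt)

guess_cel_format(f_pdf)
guess_cel_format(f_cc)
guess_cel_format(f_txt)
```

Running `table(vapply(celfiles, guess_cel_format, character(1)))` over a real file list would show at a glance whether all files share one format.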


I encountered one of the problematic ones, James, but affyio could read it. It seems to be an Axiom array.

obj <- affyio::read.celfile('5513234437813052023349_Pmarg70K_A01_FIE9191A1537_S_B16.1.CEL')
str(obj)
List of 6
 $ HEADER      :List of 9
  ..$ cdfName            : chr "Axiom_Pmarg70k"
  ..$ CEL dimensions     : int [1:2] 389 389
  ..$ GridCornerUL       : int [1:2] 0 0
  ..$ GridCornerUR       : int [1:2] 388 0
  ..$ GridCornerLR       : int [1:2] 388 388
  ..$ GridCornerLL       : int [1:2] 0 388
  ..$ DatHeader          : chr ""
  ..$ Algorithm          : chr "HT Image Calibration Cell Generation"
  ..$ AlgorithmParameters: chr "Percentile:75;CellMargin:4;OutlierHigh:1.500000;OutlierLow:1.004000;AlgVersion:;FixedCellSize:TRUE;FullFeatureW"| __truncated__


affy::ReadAffy('5513234437813052023349_Pmarg70K_A01_FIE9191A1537_S_B16.1.CEL')

Error in read.celfile.header(as.character(filenames[[1]])) : 
  Is 5513234437813052023349_Pmarg70K_A01_FIE9191A1537_S_B16.1.CEL really a CEL file? tried reading as text, gzipped text, binary, gzipped binary, command console and gzipped command console formats

oligo::read.celfiles('5513234437813052023349_Pmarg70K_A01_FIE9191A1537_S_B16.1.CEL')

Error in read.celfile.header(x) : 
  Is 5513234437813052023349_Pmarg70K_A01_FIE9191A1537_S_B16.1.CEL really a CEL file? tried reading as text, gzipped text, binary, gzipped binary, command console and gzipped command console formats

Some functions work and others don't, which is curious.

I tried with other CEL files and it works.

The problem obviously comes from my data files.


Meanwhile, I am exploring the R packages with a CEL file that works.

I can't find the 'mapping250knspCrlmm' package, either on Bioconductor or on CRAN.

Do you have any idea how to get it? Many functions don't work without it.


I downloaded all the files, and here are the results

> getwd()
[1] "E:/FIE9191_PMARG70K_2021_RESULTS/FIE9191_PMARG70K_2021_RESULTS"
> dirs <- dir()
> fls <- lapply(dirs, dir, full.names = TRUE)
> fls2 <- do.call(c, fls)
> huh <- lapply(fls2, function(x) tryCatch.W.E(read.celfile(x)$HEADER$cdfName))
## somehow this doesn't do all the files?
> huhhuh <- lapply(fls2[1159:1930], function(x) tryCatch.W.E(read.celfile(x)$HEADER$cdfName))
> huhall <- c(huh, huhhuh)
> badfls <- fls2[sapply(huhall, function(x) is(x$value, "simpleError"))]
> badfls
 [1] "FIE9191_Pmarg70K_3348_P13-16_RESULTS/FIE9191_Pmarg70K_3348_Plates13-16_BP_Workflow_QC_rpt.pdf"      
 [2] "FIE9191_Pmarg70K_3348_P13-16_RESULTS/FIE9191_Pmarg70K_3348_Plates13-16_BP_Workflow_QC_table_rpt.txt"
 [3] "FIE9191_Pmarg70k_3349_P17-20_RESULTS/FIE9191_Pmarg70K_3349_Plates17-16_0_Workflow_QC_table_rpt.txt" 
 [4] "FIE9191_Pmarg70k_3349_P17-20_RESULTS/FIE9191_Pmarg70K_3349_Plates17-20_BP_Workflow_QC_rpt.pdf"      
 [5] "FIE9191_Pmarg70k_3350_P1-4_RESULTS/FIE9191_Pmarg70K_3350_Plates1-4_BP_Workflow_QC_rpt.pdf"          
 [6] "FIE9191_Pmarg70k_3350_P1-4_RESULTS/FIE9191_Pmarg70K_3350_Plates1-4_BP_Workflow_QC_table_rpt.txt"    
 [7] "FIE9191_Pmarg70k_3351_P5-8_RESULTS/FIE9191_Pmarg70K_3351_Plates5-8_BP_Workflow_QC_rpt.pdf"          
 [8] "FIE9191_Pmarg70k_3351_P5-8_RESULTS/FIE9191_Pmarg70K_3351_Plates5-8_BP_Workflow_QC_table_rpt.txt"    
 [9] "FIE9191_Pmarg70k_3352_P9-12_RESULTS/FIE9191_Pmarg70K_3352_Plates9-12_BP_Workflow_QC_rpt.pdf"        
[10] "FIE9191_Pmarg70k_3352_P9-12_RESULTS/FIE9191_Pmarg70K_3352_Plates9-12_BP_Workflow_QC_table_rpt.txt" 
> table(sapply(huhall, function(x) if(!is(x$value, "simpleError")) return(x$value) else return(NA)))

Axiom_Pmarg70k 
          1920

Apparently you have 1920 Axiom Pyrus 70K SNP arrays, and 10 pdf or txt files. And for some reason read.celfile.header won't read them

> what <- lapply(fls2, function(x) tryCatch.W.E(read.celfile.header(x)))
> sum(sapply(what, function(x) is(x$value, "simpleError")))
[1] 1930

Without fixing that problem, you won't be able to use oligo or crlmm to analyze these data. In addition, the affxparser package can't read these things at all, and that package is based on Affy's Calvin software, so if anything should be able to read them, it's affxparser.

Long story short, there is like a 0% likelihood that this will be fixed in Bioconductor. The person who wrote affyio hasn't been involved for maybe 15 years now, and while the two main authors of affxparser are still around, I sort of doubt that getting it to read some Axiom files is near the top of their TODO list. I would recommend using the Affy software to get the genotype calls, and then you can use R for further analysis if you want.


Thank you for your response.

My purpose is to get a matrix from these files. Do you know how to do that?

By Affy software, you mean the affy package, right?


No, by Affy software I mean software provided by Affymetrix. I would imagine the Axiom Analysis Suite is what you want, but it's been years since I've used their software, and it's completely off-topic for this site, so I am afraid you are on your own for that. But perhaps you can get help from Fisher.
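
Once genotype calls have been exported as text by whatever software produces them, getting a matrix in R is mostly a matter of reading a delimited file. Here is a minimal sketch, assuming a tab-delimited export with one row per probeset, one column per sample, leading comment lines that start with #, and -1 used for a no-call. That layout, the probeset IDs and the sample names below are all assumptions for illustration, so check them against the real export before relying on this.

```r
# Toy stand-in for an exported call table (layout assumed, not from the thread)
calls_txt <- "#%comment line written by the exporting software
probeset_id\tsample_A01.CEL\tsample_A02.CEL\tsample_A03.CEL
AX-0001\t0\t1\t2
AX-0002\t2\t2\t-1
AX-0003\t1\t0\t0"

# Skip comment lines, use the first column as row names, keep sample names as-is
calls <- read.delim(text = calls_txt, comment.char = "#",
                    row.names = 1, check.names = FALSE)

# Probesets in rows, samples in columns
geno <- as.matrix(calls)

# Assumed coding: -1 marks a no-call, so treat it as missing
geno[geno == -1] <- NA

print(geno)
```

For a real file, `text = calls_txt` would be replaced by the path to the export; `t(geno)` gives samples in rows if a downstream tool expects that orientation.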


I am trying the Axiom Analysis Suite, and it looks easy to use for genotype calling.

But the thing is, I'm not sure whether there is a library available for my data type (70K arrays).
