Search
Question: beadarray: readIllumina with sample sheet broken due to Illumina switching to 12 digit chip identifiers
0
gravatar for S. K.
6 months ago by
S. K.0
Germany, Heidelberg
S. K.0 wrote:

Hi,

I am still trying to get beadarray's readIllumina() working with recently acquired data from HT-12 v4 beadchips and using a sample sheet. No matter what I was trying in that sample sheet, I always got:

Error in analyseDirectory(dir = x, sectionNames = as.character(dirs[[x]]),  : 
  Directory does not exist.

So I started to debug the readIllumina() function and noticed that it is truncating the chip's identifier, in turn looking for the wrong directories.

Googling I found this document: https://support.illumina.com/bulletins/2016/10/important-information-about-using-digit-beadchip-barcodes-in-genomestudio-sample-sheets.html

Apparently Illumina is using longer identifiers since last October which breaks beadarray's handling of samplesheets and in turn prevents reading data!

I can provide sample files for assistance in fixing this issue, please get in touch and I'll send them along.

Thank you very much & greetings from Heidelberg

Simon

R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

Matrix products: default
BLAS: /usr/local/lib/R/lib/libRblas.so
LAPACK: /usr/local/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] beadarray_2.24.0    ggplot2_2.2.1       Biobase_2.34.0      BiocGenerics_0.20.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.10         AnnotationDbi_1.36.2 XVector_0.14.1       magrittr_1.5        
 [5] GenomicRanges_1.26.4 zlibbioc_1.20.0      IRanges_2.8.2        munsell_0.4.3       
 [9] colorspace_1.3-2     stringr_1.2.0        GenomeInfoDb_1.10.3  plyr_1.8.4          
[13] tools_3.4.0          grid_3.4.0           base64_2.0           gtable_0.2.0        
[17] DBI_0.6-1            lazyeval_0.2.0       openssl_0.9.6        digest_0.6.12       
[21] tibble_1.3.0         reshape2_1.4.2       S4Vectors_0.12.2     bitops_1.0-6        
[25] RCurl_1.95-4.8       memoise_1.1.0        RSQLite_1.1-2        limma_3.30.13       
[29] stringi_1.1.5        BeadDataPackR_1.26.0 compiler_3.4.0       scales_0.4.1        
[33] stats4_3.4.0         illuminaio_0.16.0 
ADD COMMENTlink modified 6 months ago by Mark Dunning1.0k • written 6 months ago by S. K.0
0
gravatar for S. K.
6 months ago by
S. K.0
Germany, Heidelberg
S. K.0 wrote:

I looked at readIllumina.R in more detail and it seems that the condition for the if statement in line 82 always evaluates to FALSE due to misplaced parentheses? Currently it's written as:

  if(all(nchar(allSections))>10){    

      chips <- unique(as.character(substr(allSections,1,nchar(allSections)-2)))
      
      dirs <- lapply(chips, function(x) sectionNames[which(substr(sectionNames,1,nchar(sectionNames)-2) == x)])
      
      }
      else {
        chips <- unique(as.character(strtrim(allSections, 10)))
        
        dirs <- lapply(chips, function(x) sectionNames[which(strtrim(sectionNames, 10) == x)])
        
      }

but I guess that should rather be if(all(nchar(allSections)>10)) !

I will test this change tomorrow and will report back. :-)

 

ADD COMMENTlink written 6 months ago by S. K.0
0
gravatar for Mark Dunning
6 months ago by
Mark Dunning1.0k
Cancer Research Uk Cambridge Institute, Cambridge (until October 2017)
Mark Dunning1.0k wrote:

Hi Simon,

So sorry for taking so long to get around to this. Did fixing the misplaced parentheses work for you?

I'm afraid I don't have access to data where the chip IDs are 12 digits. If you could share some with me I would be able to test the code.

Regards,

Mark

ADD COMMENTlink written 6 months ago by Mark Dunning1.0k

Hi Mark,

thanks for your reply!

I managed to get it working - at least as long as I don't specify useImages in readIllumina(). 

I had to change summarize() as well, the attached diffs work for my dataset (see below), though I can't say anything about backward compatibility with older, shorter chip IDs.

One thing that is still not working is using the TIFFs directly.

 

Diff for summarize:

*** projects/debug_beadarray/beadarray/R/beadLevelData_summarize.R    2017-04-24 22:36:22.000000000 +0200
--- projects/debug_beadarray/SimonsCustomBeadarray/R/beadLevelData_summarize.R    2017-05-02 11:29:45.702057748 +0200
***************
*** 55,63 ****
  
              ##newNames = paste(unique(strtrim(arraynms, 10)), sList, sep="_")
              if(any(dupList))
!                             newNames = strtrim(arraynms[-dupList],12)
                          else
!                             newNames = strtrim(arraynms,12)
          
          }
      }
--- 55,63 ----
  
              ##newNames = paste(unique(strtrim(arraynms, 10)), sList, sep="_")
              if(any(dupList))
!                             newNames = strtrim(arraynms[-dupList],14)
                          else
!                             newNames = strtrim(arraynms,14)
  
          }
      }
***************
*** 77,85 ****
              dupList = which(duplicated(sampleFac))
  
                          if(any(dupList))
!                             newNames = strtrim(arraynms[-dupList],12)
                          else
!                             newNames = strtrim(arraynms,12)
          }
      }
  
--- 77,85 ----
              dupList = which(duplicated(sampleFac))
  
                          if(any(dupList))
!                             newNames = strtrim(arraynms[-dupList],14)
                          else
!                             newNames = strtrim(arraynms,14)
          }
      }
  
***************
*** 441,447 ****
  
          
  
!     sampInfo <- sampInfo[sapply(newNames, function(x) grep(strtrim(x, 12), expIDs)),]    
  
      p = new("AnnotatedDataFrame", data.frame(sampInfo,row.names=newNames))
  
--- 441,447 ----
  
  
  
!     sampInfo <- sampInfo[sapply(newNames, function(x) grep(strtrim(x, 14), expIDs)),]
  
      p = new("AnnotatedDataFrame", data.frame(sampInfo,row.names=newNames))

 

Diff for readIllumina:

*** projects/debug_beadarray/beadarray/R/readIllumina.R    2017-04-24 22:36:22.000000000 +0200
--- projects/debug_beadarray/SimonsCustomBeadarray/R/readIllumina.R    2017-04-28 11:44:40.276905882 +0200
***************
*** 79,85 ****
        ##Some section names were specified. Make sure that they can be read from directories in the sample sheet. Also, there may be some directories with no Illumina data in
    #    else{
  
!       if(all(nchar(allSections))>10){    
  
        chips <- unique(as.character(substr(allSections,1,nchar(allSections)-2)))
        
--- 79,85 ----
        ##Some section names were specified. Make sure that they can be read from directories in the sample sheet. Also, there may be some directories with no Illumina data in
    #    else{
  
!       if(all(nchar(allSections)>10)){
  
        chips <- unique(as.character(substr(allSections,1,nchar(allSections)-2)))
  

 

 

ADD REPLYlink modified 6 months ago • written 6 months ago by S. K.0

Additionally, this is the structure of the zipped folders I receive from our core facility:

2017-03-16 18:43:25 ....A         1340   1024119382  200647550006/200647550006_A_Green.xml
2017-03-16 18:43:14 ....A         1316               200647550006/200647550006_A_Red.xml

2017-03-16 18:43:52 ....A       889664               200647550006/200647550006_A_beadTypeFile.txt
2017-03-16 18:43:55 ....A     32722761               200647550006/200647550006_A_perBeadFile.txt

           
2017-03-16 18:43:14 ....A         9651               200647550006/200647550006_qc.txt
2017-03-16 18:43:55 ....A         1066               200647550006/Metrics.txt
2017-03-16 18:43:19 ....A        77444               200647550006/Effective.cfg
2017-03-16 18:43:47 ....A      2579177               200647550006/200647550006_A_Grn.idat

             
2017-03-16 18:43:24 ....A      5609820               200647550006/200647550006_A-Swath1_Grn.locs
2017-03-16 18:43:18 ....A      5609820               200647550006/200647550006_A-Swath2_Grn.locs

2017-03-16 18:43:24 ....A        68254               200647550006/200647550006.sdf
2017-03-16 18:43:21 ....A    124780676               200647550006/200647550006_A-Swath1_Grn.tif
2017-03-16 18:43:28 ....A    124780676               200647550006/200647550006_A-Swath2_Grn.tif
2017-03-16 18:43:14 ....A      6922372               200647550006/200647550006_A_Focus_scan#1_swath#1_point#1_try#1.tif
2017-03-16 18:43:46 ....A      6922372               
2017-03-16 18:43:59 D....            0            0  200647550006

 

(This is just for one chip due to space constraints here in the commenting system, chips range from A to L, of course)

I think it's interesting that there are XMLs for a red channel which is AFAIR not present on Illumina's HT12 chips. I had to remove them from the directory, otherwise beadarray gets confused and tries to read 2-channel data.

 

ADD REPLYlink written 6 months ago by S. K.0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 113 users visited in the last hour