Question

beadarray: readIllumina with sample sheet broken due to Illumina switching to 12 digit chip identifiers

0

Entering edit mode

S. K. • 0

@s-k-12879

Last seen 8.5 years ago

Germany, Heidelberg

Hi,

I am still trying to get beadarray's readIllumina() working with recently acquired data from HT-12 v4 beadchips and using a sample sheet. No matter what I was trying in that sample sheet, I always got:

Error in analyseDirectory(dir = x, sectionNames = as.character(dirs[[x]]),  : 
  Directory does not exist.

So I started to debug the readIllumina() function and noticed that it is truncating the chip's identifier, in turn looking for the wrong directories.

Googling I found this document: https://support.illumina.com/bulletins/2016/10/important-information-about-using-digit-beadchip-barcodes-in-genomestudio-sample-sheets.html

Apparently Illumina is using longer identifiers since last October which breaks beadarray's handling of samplesheets and in turn prevents reading data!

I can provide sample files for assistance in fixing this issue, please get in touch and I'll send them along.

Thank you very much & greetings from Heidelberg

Simon

R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

Matrix products: default
BLAS: /usr/local/lib/R/lib/libRblas.so
LAPACK: /usr/local/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] beadarray_2.24.0    ggplot2_2.2.1       Biobase_2.34.0      BiocGenerics_0.20.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.10         AnnotationDbi_1.36.2 XVector_0.14.1       magrittr_1.5        
 [5] GenomicRanges_1.26.4 zlibbioc_1.20.0      IRanges_2.8.2        munsell_0.4.3       
 [9] colorspace_1.3-2     stringr_1.2.0        GenomeInfoDb_1.10.3  plyr_1.8.4          
[13] tools_3.4.0          grid_3.4.0           base64_2.0           gtable_0.2.0        
[17] DBI_0.6-1            lazyeval_0.2.0       openssl_0.9.6        digest_0.6.12       
[21] tibble_1.3.0         reshape2_1.4.2       S4Vectors_0.12.2     bitops_1.0-6        
[25] RCurl_1.95-4.8       memoise_1.1.0        RSQLite_1.1-2        limma_3.30.13       
[29] stringi_1.1.5        BeadDataPackR_1.26.0 compiler_3.4.0       scales_0.4.1        
[33] stats4_3.4.0         illuminaio_0.16.0

beadarray bug illumina human ht-12 v4 illumina beadchip • 2.4k views

ADD COMMENT • link updated 8.7 years ago by Mark Dunning ★ 1.1k • written 8.7 years ago by S. K. • 0

score 0 · Answer 1 · 2017-04-26

I looked at readIllumina.R in more detail and it seems that the condition for the if statement in line 82 always evaluates to FALSE due to misplaced parentheses? Currently it's written as:

  if(all(nchar(allSections))>10){    

      chips <- unique(as.character(substr(allSections,1,nchar(allSections)-2)))
      
      dirs <- lapply(chips, function(x) sectionNames[which(substr(sectionNames,1,nchar(sectionNames)-2) == x)])
      
      }
      else {
        chips <- unique(as.character(strtrim(allSections, 10)))
        
        dirs <- lapply(chips, function(x) sectionNames[which(strtrim(sectionNames, 10) == x)])
        
      }

but I guess that should rather be if(all(nchar(allSections)>10)) !

I will test this change tomorrow and will report back. :-)

score 0 · Answer 2 · 2017-05-10

0

Entering edit mode

Mark Dunning ★ 1.1k

@mark-dunning-3319

Last seen 10 months ago

Sheffield, Uk

Hi Simon,

So sorry for taking so long to get around to this. Did fixing the misplaced parentheses work for you?

I'm afraid I don't have access to data where the chip IDs are 12 digits. If you could share some with me I would be able to test the code.

Regards,

Mark

ADD COMMENT • link 8.7 years ago Mark Dunning ★ 1.1k

0

Entering edit mode

Hi Mark,

thanks for your reply!

I managed to get it working - at least as long as I don't specify useImages in readIllumina().

I had to change summarize() as well, the attached diffs work for my dataset (see below), though I can't say anything about backward compatibility with older, shorter chip IDs.

One thing that is still not working is using the TIFFs directly.

Diff for summarize:

*** projects/debug_beadarray/beadarray/R/beadLevelData_summarize.R    2017-04-24 22:36:22.000000000 +0200
--- projects/debug_beadarray/SimonsCustomBeadarray/R/beadLevelData_summarize.R    2017-05-02 11:29:45.702057748 +0200
***************
*** 55,63 ****
  
              ##newNames = paste(unique(strtrim(arraynms, 10)), sList, sep="_")
              if(any(dupList))
!                             newNames = strtrim(arraynms[-dupList],12)
                          else
!                             newNames = strtrim(arraynms,12)
          
          }
      }
--- 55,63 ----
  
              ##newNames = paste(unique(strtrim(arraynms, 10)), sList, sep="_")
              if(any(dupList))
!                             newNames = strtrim(arraynms[-dupList],14)
                          else
!                             newNames = strtrim(arraynms,14)
  
          }
      }
***************
*** 77,85 ****
              dupList = which(duplicated(sampleFac))
  
                          if(any(dupList))
!                             newNames = strtrim(arraynms[-dupList],12)
                          else
!                             newNames = strtrim(arraynms,12)
          }
      }
  
--- 77,85 ----
              dupList = which(duplicated(sampleFac))
  
                          if(any(dupList))
!                             newNames = strtrim(arraynms[-dupList],14)
                          else
!                             newNames = strtrim(arraynms,14)
          }
      }
  
***************
*** 441,447 ****
  
          
  
!     sampInfo <- sampInfo[sapply(newNames, function(x) grep(strtrim(x, 12), expIDs)),]    
  
      p = new("AnnotatedDataFrame", data.frame(sampInfo,row.names=newNames))
  
--- 441,447 ----
  
  
  
!     sampInfo <- sampInfo[sapply(newNames, function(x) grep(strtrim(x, 14), expIDs)),]
  
      p = new("AnnotatedDataFrame", data.frame(sampInfo,row.names=newNames))

Diff for readIllumina:

*** projects/debug_beadarray/beadarray/R/readIllumina.R    2017-04-24 22:36:22.000000000 +0200
--- projects/debug_beadarray/SimonsCustomBeadarray/R/readIllumina.R    2017-04-28 11:44:40.276905882 +0200
***************
*** 79,85 ****
        ##Some section names were specified. Make sure that they can be read from directories in the sample sheet. Also, there may be some directories with no Illumina data in
    #    else{
  
!       if(all(nchar(allSections))>10){    
  
        chips <- unique(as.character(substr(allSections,1,nchar(allSections)-2)))
        
--- 79,85 ----
        ##Some section names were specified. Make sure that they can be read from directories in the sample sheet. Also, there may be some directories with no Illumina data in
    #    else{
  
!       if(all(nchar(allSections)>10)){
  
        chips <- unique(as.character(substr(allSections,1,nchar(allSections)-2)))

ADD REPLY • link 8.7 years ago S. K. • 0

0

Entering edit mode

Additionally, this is the structure of the zipped folders I receive from our core facility:

2017-03-16 18:43:25 ....A         1340   1024119382  200647550006/200647550006_A_Green.xml
2017-03-16 18:43:14 ....A         1316               200647550006/200647550006_A_Red.xml

2017-03-16 18:43:52 ....A       889664               200647550006/200647550006_A_beadTypeFile.txt
2017-03-16 18:43:55 ....A     32722761               200647550006/200647550006_A_perBeadFile.txt

           
2017-03-16 18:43:14 ....A         9651               200647550006/200647550006_qc.txt
2017-03-16 18:43:55 ....A         1066               200647550006/Metrics.txt
2017-03-16 18:43:19 ....A        77444               200647550006/Effective.cfg
2017-03-16 18:43:47 ....A      2579177               200647550006/200647550006_A_Grn.idat

             
2017-03-16 18:43:24 ....A      5609820               200647550006/200647550006_A-Swath1_Grn.locs
2017-03-16 18:43:18 ....A      5609820               200647550006/200647550006_A-Swath2_Grn.locs

2017-03-16 18:43:24 ....A        68254               200647550006/200647550006.sdf
2017-03-16 18:43:21 ....A    124780676               200647550006/200647550006_A-Swath1_Grn.tif
2017-03-16 18:43:28 ....A    124780676               200647550006/200647550006_A-Swath2_Grn.tif
2017-03-16 18:43:14 ....A      6922372               200647550006/200647550006_A_Focus_scan#1_swath#1_point#1_try#1.tif
2017-03-16 18:43:46 ....A      6922372               
2017-03-16 18:43:59 D....            0            0  200647550006

(This is just for one chip due to space constraints here in the commenting system, chips range from A to L, of course)

I think it's interesting that there are XMLs for a red channel which is AFAIR not present on Illumina's HT12 chips. I had to remove them from the directory, otherwise beadarray gets confused and tries to read 2-channel data.

ADD REPLY • link 8.7 years ago S. K. • 0