Question

Cydar normalizeBatch specifying batches

Florent ▴ 10

@florent-20523

Last seen 5.9 years ago

BCRT, Berlin

Dear Users,

We have a CyTOF experiment containing 91 samples distributed in 10 batches, with one anchor for normalization in each of these batches.

However, I can't figure out how to make it work: I would like to tell the function which files are in which batch, and which ones are the anchors, so that I can use the quantile method, which seems the most appropriate for my needs.

I'm using ncdfFlowSet to import my files in batch.x:

> batch.x
An ncdfFlowSet with 91 samples.
NCDF file : a.nc 
An object of class 'AnnotatedDataFrame'
  rowNames: EC01_Batch02.fcs EC02_Batch02.fcs ... YC23_Batch11.fcs (91 total)
  varLabels: name
  varMetadata: labelDescription

  column names:
    Time, Event_length, Y89Di, Pd102Di, Pd104Di, Pd105Di, Pd106Di, Pd108Di, Pd110Di, Ce140Di, Pr141Di, Nd142Di, Nd143Di, Nd144Di, Nd145Di, Nd146Di, Sm147Di, Nd148Di, Sm149Di, Nd150Di, Eu151Di, Sm152Di, Eu153Di, Sm154Di, Gd155Di, Gd156Di, Gd158Di, Tb159Di, Gd160Di, Dy161Di, Dy162Di, Dy163Di, Dy164Di, Ho165Di, Er166Di, Er167Di, Er168Di, Tm169Di, Er170Di, Yb171Di, Yb172Di, Yb173Di, Yb174Di, Lu175Di, Yb176Di, BCKG190Di, Ir191Di, Ir193Di, Pt195Di, Bi209Di, Center, Offset, Width, Residual

My batch.comp is a list of factors, where each factor contains the names of all files from the same batch, and one extra factor groups all the reference samples:

> batch.comp
$Batch02
[1] EC01_Batch02  EC02_Batch02  FHE01_Batch02 FHE02_Batch02 FHY01_Batch02 YC01_Batch02  YC02_Batch02 
7 Levels: EC01_Batch02 EC02_Batch02 FHE01_Batch02 FHE02_Batch02 FHY01_Batch02 ... YC02_Batch02

$Batch03
[1] EC03_Batch03  EC04_Batch03  FHE03_Batch03 FHY02_Batch03 FHY03_Batch03 YC03_Batch03 
Levels: EC03_Batch03 EC04_Batch03 FHE03_Batch03 FHY02_Batch03 FHY03_Batch03 YC03_Batch03

$Batch04
[1] EC05_Batch04 EC06_Batch04 YC05_Batch04 YC06_Batch04
Levels: EC05_Batch04 EC06_Batch04 YC05_Batch04 YC06_Batch04

$Batch05
 [1] EC07_Batch05  EC08_Batch05  EC09_Batch05  FHE04_Batch05 FHE08_Batch05 FHE09_Batch05 FHE10_Batch05 FHY07_Batch05 YC07_Batch05  YC08_Batch05 
[11] YC09_Batch05 
11 Levels: EC07_Batch05 EC08_Batch05 EC09_Batch05 FHE04_Batch05 FHE08_Batch05 ... YC09_Batch05

$Batch06
 [1] EC10_Batch06  EC11_Batch06  EC12_Batch06  EC13_Batch06  FHE07_Batch06 FHE11_Batch06 FHY08_Batch06 FHY09_Batch06 FHY10_Batch06 YC04_Batch06 
[11] YC10_Batch06 
11 Levels: EC10_Batch06 EC11_Batch06 EC12_Batch06 EC13_Batch06 FHE07_Batch06 ... YC10_Batch06

$Batch07
 [1] EC14_Batch07  EC15_Batch07  EC16_Batch07  FHE12_Batch07 FHE14_Batch07 FHE15_Batch07 FHY12_Batch07 FHY13_Batch07 FHY14_Batch07 FHY15_Batch07
[11] YC11_Batch07  YC12_Batch07 
12 Levels: EC14_Batch07 EC15_Batch07 EC16_Batch07 FHE12_Batch07 FHE14_Batch07 ... YC12_Batch07

$Batch08
 [1] EC17_Batch08  EC18_Batch08  EC19_Batch08  FHE16_Batch08 FHE20_Batch08 FHY16_Batch08 FHY17_Batch08 YC13_Batch08  YC14_Batch08  YC15_Batch08 
10 Levels: EC17_Batch08 EC18_Batch08 EC19_Batch08 FHE16_Batch08 FHE20_Batch08 ... YC15_Batch08

$Batch09
 [1] EC20_Batch09  EC21_Batch09  EC22_Batch09  FHE18_Batch09 FHE21_Batch09 FHE23_Batch09 FHY18_Batch09 FHY19_Batch09 YC16_Batch09  YC17_Batch09 
[11] YC18_Batch09 
11 Levels: EC20_Batch09 EC21_Batch09 EC22_Batch09 FHE18_Batch09 FHE21_Batch09 ... YC18_Batch09

$Batch11
 [1] EC25_Batch11  EC26_Batch11  EC27_Batch11  FHE13_Batch11 FHE24_Batch11 FHE25_Batch11 FHE26_Batch11 FHY22_Batch11 YC22_Batch11  YC23_Batch11 
10 Levels: EC25_Batch11 EC26_Batch11 EC27_Batch11 FHE13_Batch11 FHE24_Batch11 ... YC23_Batch11

$References
[1] Reference_Batch02 Reference_Batch03 Reference_Batch04 Reference_Batch05 Reference_Batch06 Reference_Batch07 Reference_Batch08
[8] Reference_Batch09 Reference_Batch11
9 Levels: Reference_Batch02 Reference_Batch03 Reference_Batch04 Reference_Batch05 ... Reference_Batch11

However, the normalizeBatch function returns a length error.

> normalizeBatch(batch.x, batch.comp, mode="quantile", p=0.05, 
+     target=batch.comp$References, markers=CD14)
Error in normalizeBatch(batch.x, batch.comp, mode = "quantile", p = 0.05,  : 
  length of 'batch.x' and 'batch.comp' must be identical

The length of batch.x is 91 and the length of batch.comp is 10 (the number of batches plus one for the references).

I read the documentation but I can't figure out how the function works. I don't have any background in informatics, so I guess that I'm just missing the obvious. Any help would be appreciated!

Best, Florent

> sessionInfo() 
R version 3.5.2 Patched (2019-01-10 r75982)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252    LC_MONETARY=French_France.1252 LC_NUMERIC=C                  
[5] LC_TIME=French_France.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] CytoDx_1.2.1                cydar_1.6.1                 SingleCellExperiment_1.4.1  SummarizedExperiment_1.12.0 DelayedArray_0.8.0         
 [6] matrixStats_0.54.0          Biobase_2.42.0              GenomicRanges_1.34.0        GenomeInfoDb_1.18.2         IRanges_2.16.0             
[11] S4Vectors_0.20.1            BiocGenerics_0.28.0         BiocParallel_1.16.6         tidyr_0.8.3                 ggplot2_3.1.0              
[16] FlowSOM_1.14.1              igraph_1.2.4                ncdfFlow_2.28.1             BH_1.69.0-1                 RcppArmadillo_0.9.300.2.0  
[21] flowCore_1.48.1            

loaded via a namespace (and not attached):
 [1] viridis_0.5.1               foreach_1.4.4               viridisLite_0.3.0           ConsensusClusterPlus_1.46.0 shiny_1.2.0                
 [6] assertthat_0.2.1            latticeExtra_0.6-28         GenomeInfoDbData_1.2.0      yaml_2.2.0                  robustbase_0.93-4          
[11] pillar_1.3.1                lattice_0.20-38             glue_1.3.1                  digest_0.6.18               RColorBrewer_1.1-2         
[16] promises_1.0.1              XVector_0.22.0              colorspace_1.4-1            htmltools_0.3.6             httpuv_1.5.0               
[21] Matrix_1.2-17               plyr_1.8.4                  pcaPP_1.9-73                XML_3.98-1.19               pkgconfig_2.0.2            
[26] tsne_0.1-3                  zlibbioc_1.28.0             purrr_0.3.2                 xtable_1.8-3                corpcor_1.6.9              
[31] mvtnorm_1.0-10              scales_1.0.0                later_0.8.0                 tibble_2.1.1                withr_2.1.2                
[36] flowViz_1.46.1              hexbin_1.27.2               lazyeval_0.2.2              magrittr_1.5                crayon_1.3.4               
[41] IDPmisc_1.1.19              mime_0.6                    doParallel_1.0.14           MASS_7.3-51.3               graph_1.60.0               
[46] tools_3.5.2                 rpart.plot_3.0.6            glmnet_2.0-16               munsell_0.5.0               cluster_2.0.7-1            
[51] compiler_3.5.2              rlang_0.3.3                 grid_3.5.2                  RCurl_1.95-4.12             iterators_1.0.10           
[56] BiocNeighbors_1.0.0         bitops_1.0-6                codetools_0.2-16            gtable_0.3.0                rrcov_1.4-7                
[61] R6_2.4.0                    gridExtra_2.3               knitr_1.22                  dplyr_0.8.0.1               KernSmooth_2.23-15         
[66] Rcpp_1.0.1                  rpart_4.1-13                DEoptimR_1.0-8              tidyselect_0.2.5            xfun_0.6

Cydar normalization normalizeBatch • 1.9k views

ADD COMMENT • link updated 5.9 years ago by Aaron Lun ★ 28k • written 5.9 years ago by Florent ▴ 10

score 1 · Answer 1 · 2019-04-15

Aaron Lun ★ 28k

@alun

Last seen 7 hours ago

The city by the bay

Both your batch.x and batch.comp are incorrect.

batch.x needs to be a list of ncdfFlowSet objects. Each element of the list should be one ncdfFlowSet containing all samples for one batch. You're currently supplying a list that contains one ncdfFlowSet object containing samples for all batches. (Alternatively, you can replace each ncdfFlowSet with a list of matrices, with one matrix per sample in that batch, effectively making batch.x a list of lists of matrices.)

batch.comp needs to be a list of the same length as batch.x. Each entry of batch.comp corresponds to the entry at the same position of batch.x. Each entry of batch.comp should contain some identifier specifying the group to which each sample in that batch belongs. This is usually a character vector of length equal to the number of samples in that batch. Each value in that vector indicates the assigned group of a sample in the batch.
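As a rough sketch of the per-batch layout described above, the file names can be grouped by batch in base R. The file names below are hypothetical and only follow the naming scheme shown in the question; `all.samples` in the final comment is also a hypothetical name for the single ncdfFlowSet holding every sample, and subsetting it by sample name is an assumption rather than something confirmed in this thread.

```r
# Hypothetical file names following the poster's scheme (<sample>_BatchNN.fcs).
files <- c("EC01_Batch02.fcs", "Reference_Batch02.fcs", "YC01_Batch02.fcs",
           "EC03_Batch03.fcs", "Reference_Batch03.fcs", "YC03_Batch03.fcs")

# Extract the batch label ("Batch02", "Batch03", ...) from each file name.
batch.id <- sub("^.*_(Batch[0-9]+)\\.fcs$", "\\1", files)

# One character vector of file names per batch.
files.by.batch <- split(files, batch.id)
files.by.batch

# Assumption: a single ncdfFlowSet 'all.samples' could then be split by name:
# batch.x <- lapply(files.by.batch, function(f) all.samples[f])
```

The point of the sketch is only the shape of the result: one list element per batch, each holding that batch's samples, reference included.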

Perhaps a demonstration would be more effective. Let's say we have 3 batches:

Batch 1 contains 9 samples (1 reference, 4 control, 4 treatment A)
Batch 2 contains 7 samples (1 reference, 3 control, 3 treatment B)
Batch 3 contains 10 samples (1 reference, 3 control, 3 treatment A, 3 treatment B).

Your batch.comp might then look like this:

batch.comp <- list(
    c("ref", "con", "con", "con", "con", "A", "A", "A", "A"), # batch 1
    c("ref", "con", "con", "con", "B", "B", "B"), # batch 2
    c("ref", "con", "con", "con", "A", "A", "A", "B", "B", "B") # batch 3
)

Obviously, the order of the above elements matters; the ordering of group identities within each element of batch.comp should be determined by the ordering of samples within each element of batch.x. This is analogous to specifying a design matrix in limma or edgeR; you have to get your sample identities right.
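One way to keep the ordering consistent is to derive the group labels from the sample names themselves, so that each label lands in the same position as its sample. This is a minimal sketch with hypothetical sample names modelled on the question; in practice the names would presumably come from something like `sampleNames()` on each element of batch.x, which is an assumption here.

```r
# Hypothetical sample names for one batch, in the order they appear in batch.x[[1]].
samples <- c("EC01_Batch02.fcs", "EC02_Batch02.fcs", "FHE01_Batch02.fcs",
             "Reference_Batch02.fcs", "YC01_Batch02.fcs")

# Strip the trailing digits, the batch suffix and the extension,
# leaving only the group label for each sample.
groups <- sub("[0-9]*_Batch[0-9]+\\.fcs$", "", samples)
groups
```

Because `groups` is computed element by element from `samples`, the two vectors cannot drift out of order.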

ADD COMMENT • link 5.9 years ago Aaron Lun ★ 28k

Thank you for your answer, it seems to work now. However, I still have a question: for the quantile mode, I need to put my reference samples into one group in batch.comp. (I therefore removed the reference samples from the batches, which could be a mistake? Otherwise I would have them twice.)

> batch.comp
$Batch02.fcs
[1] EC  EC  FHE FHE FHY YC  YC 
Levels: EC FHE FHY YC

$Batch03.fcs
[1] EC  EC  FHE FHY FHY YC 
Levels: EC FHE FHY YC

$Batch04.fcs
[1] EC EC YC YC
Levels: EC YC

$Batch05.fcs
 [1] EC  EC  EC  FHE FHE FHE FHE FHY YC  YC  YC 
Levels: EC FHE FHY YC

$Batch06.fcs
 [1] EC  EC  EC  EC  FHE FHE FHY FHY FHY YC  YC 
Levels: EC FHE FHY YC

$Batch07.fcs
 [1] EC  EC  EC  FHE FHE FHE FHY FHY FHY FHY YC  YC 
Levels: EC FHE FHY YC

$Batch08.fcs
 [1] EC  EC  EC  FHE FHE FHY FHY YC  YC  YC 
Levels: EC FHE FHY YC

$Batch09.fcs
 [1] EC  EC  EC  FHE FHE FHE FHY FHY YC  YC  YC 
Levels: EC FHE FHY YC

$Batch11.fcs
 [1] EC  EC  EC  FHE FHE FHE FHE FHY YC  YC 
Levels: EC FHE FHY YC

$References
[1] Reference_Batch Reference_Batch Reference_Batch Reference_Batch Reference_Batch Reference_Batch Reference_Batch Reference_Batch
[9] Reference_Batch
Levels: Reference_Batch

To have the same length, I did the same for batch.x (I removed the reference samples from the batches and put them in a separate group):

> batch.x
...

$Batch09.fcs
An ncdfFlowSet with 11 samples.
NCDF file : a.nc 
An object of class 'AnnotatedDataFrame'
  rowNames: EC20_Batch09.fcs EC21_Batch09.fcs ... YC18_Batch09.fcs (11 total)
  varLabels: name
  varMetadata: labelDescription

  column names:
    Time, Event_length, Y89Di, Pd102Di, Pd104Di, Pd105Di, Pd106Di, Pd108Di, Pd110Di, Ce140Di, Pr141Di, Nd142Di, Nd143Di, Nd144Di, Nd145Di, Nd146Di, Sm147Di, Nd148Di, Sm149Di, Nd150Di, Eu151Di, Sm152Di, Eu153Di, Sm154Di, Gd155Di, Gd156Di, Gd158Di, Tb159Di, Gd160Di, Dy161Di, Dy162Di, Dy163Di, Dy164Di, Ho165Di, Er166Di, Er167Di, Er168Di, Tm169Di, Er170Di, Yb171Di, Yb172Di, Yb173Di, Yb174Di, Lu175Di, Yb176Di, BCKG190Di, Ir191Di, Ir193Di, Pt195Di, Bi209Di, Center, Offset, Width, Residual


$Batch11.fcs
An ncdfFlowSet with 10 samples.
NCDF file : a.nc 
An object of class 'AnnotatedDataFrame'
  rowNames: EC25_Batch11.fcs EC26_Batch11.fcs ... YC23_Batch11.fcs (10 total)
  varLabels: name
  varMetadata: labelDescription

  column names:
    Time, Event_length, Y89Di, Pd102Di, Pd104Di, Pd105Di, Pd106Di, Pd108Di, Pd110Di, Ce140Di, Pr141Di, Nd142Di, Nd143Di, Nd144Di, Nd145Di, Nd146Di, Sm147Di, Nd148Di, Sm149Di, Nd150Di, Eu151Di, Sm152Di, Eu153Di, Sm154Di, Gd155Di, Gd156Di, Gd158Di, Tb159Di, Gd160Di, Dy161Di, Dy162Di, Dy163Di, Dy164Di, Ho165Di, Er166Di, Er167Di, Er168Di, Tm169Di, Er170Di, Yb171Di, Yb172Di, Yb173Di, Yb174Di, Lu175Di, Yb176Di, BCKG190Di, Ir191Di, Ir193Di, Pt195Di, Bi209Di, Center, Offset, Width, Residual


$References
An ncdfFlowSet with 9 samples.
NCDF file : a.nc 
An object of class 'AnnotatedDataFrame'
  rowNames: Reference_Batch02.fcs Reference_Batch03.fcs ... Reference_Batch11.fcs (9 total)
  varLabels: name
  varMetadata: labelDescription

  column names:
    Time, Event_length, Y89Di, Pd102Di, Pd104Di, Pd105Di, Pd106Di, Pd108Di, Pd110Di, Ce140Di, Pr141Di, Nd142Di, Nd143Di, Nd144Di, Nd145Di, Nd146Di, Sm147Di, Nd148Di, Sm149Di, Nd150Di, Eu151Di, Sm152Di, Eu153Di, Sm154Di, Gd155Di, Gd156Di, Gd158Di, Tb159Di, Gd160Di, Dy161Di, Dy162Di, Dy163Di, Dy164Di, Ho165Di, Er166Di, Er167Di, Er168Di, Tm169Di, Er170Di, Yb171Di, Yb172Di, Yb173Di, Yb174Di, Lu175Di, Yb176Di, BCKG190Di, Ir191Di, Ir193Di, Pt195Di, Bi209Di, Center, Offset, Width, Residual

However, the normalizeBatch function returns this error:

Error in .computeCellWeights(batch.out, batch.comp) : no level of 'batch.comp' is common to all batches
3.
stop("no level of 'batch.comp' is common to all batches")
2.
.computeCellWeights(batch.out, batch.comp)
1.
normalizeBatch(batch.x, batch.comp, mode = "quantile", p = 0.05, target = 10, markers = "Eu153Di")

target=10 corresponds to the position of my reference samples in the list.

Or should I keep the reference samples in their corresponding batches and find a way to tell the function their names (reference_batch)?

Best, Florent

ADD REPLY • link 5.9 years ago Florent ▴ 10

I therefore removed the reference samples from the batches, which could be a mistake?

Yes, that is a mistake. You should have a reference sample in each batch, that's the reason for its existence. Each factor in batch.comp should specify which sample is the reference in that batch; see my previous example.
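The "no level of 'batch.comp' is common to all batches" error can be reproduced in spirit with a few lines of base R. This is only an illustration of the idea on toy data, not the actual internals of normalizeBatch; the list names `separate` and `nested` are made up for the example.

```r
# Toy version of the layout that failed: references pulled into their own element.
separate <- list(
    Batch02 = c("EC", "EC", "FHE", "YC"),
    Batch03 = c("EC", "FHY", "YC"),
    References = c("Reference_Batch", "Reference_Batch")
)
# Groups present in every element of the list: none in this layout.
common.separate <- Reduce(intersect, lapply(separate, unique))

# Same toy data with the reference kept inside each batch.
nested <- list(
    Batch02 = c("EC", "EC", "FHE", "Reference", "YC"),
    Batch03 = c("EC", "FHY", "Reference", "YC")
)
# Now several groups are shared by all batches, including the reference.
common.nested <- Reduce(intersect, lapply(nested, unique))
```

With the references split off, the "References" element shares no label with any real batch, so nothing is common to all elements.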

ADD REPLY • link 5.9 years ago Aaron Lun ★ 28k

Dear Aaron,

I did as you said and it is indeed working fine, thanks! But I wonder how the function uses the reference samples then (especially because it still runs when I remove these files). In your documentation, it says: "In such cases, users should set all control samples to the same “group” in batch.comp, while all other samples should be set to batch-specific groups (and are thus ignored during the calculation of the transformation functions)." Sorry to bother you again.

ADD REPLY • link 5.9 years ago Florent ▴ 10

Woah. Stop. Why are you removing the reference samples?

Show me exactly what you're doing. What does your batch.comp look like? You should have a "reference" level in the factor for each batch.

ADD REPLY • link 5.9 years ago Aaron Lun ★ 28k

Hi, I was only trying to understand how the function uses the "reference" level, but it's a bit beyond my skills. My real batch.comp has a reference level:

> batch.comp
$Batch02.fcs
[1] EC        EC        FHE       FHE       FHY       Reference YC        YC       
Levels: EC FHE FHY Reference YC

$Batch03.fcs
[1] EC        EC        FHE       FHY       FHY       Reference YC       
Levels: EC FHE FHY Reference YC

$Batch04.fcs
[1] EC        EC        Reference YC        YC       
Levels: EC Reference YC

$Batch05.fcs
 [1] EC        EC        EC        FHE       FHE       FHE       FHE       FHY       Reference YC        YC        YC       
Levels: EC FHE FHY Reference YC

$Batch06.fcs
 [1] EC        EC        EC        EC        FHE       FHE       FHY       FHY       FHY       Reference YC        YC       
Levels: EC FHE FHY Reference YC

$Batch07.fcs
 [1] EC        EC        EC        FHE       FHE       FHE       FHY       FHY       FHY       FHY       Reference YC        YC       
Levels: EC FHE FHY Reference YC

$Batch08.fcs
 [1] EC        EC        EC        FHE       FHE       FHY       FHY       Reference YC        YC        YC       
Levels: EC FHE FHY Reference YC

$Batch09.fcs
 [1] EC        EC        EC        FHE       FHE       FHE       FHY       FHY       Reference YC        YC        YC       
Levels: EC FHE FHY Reference YC

$Batch11.fcs
 [1] EC        EC        EC        FHE       FHE       FHE       FHE       FHY       Reference YC        YC       
Levels: EC FHE FHY Reference YC

ADD REPLY • link 5.9 years ago Florent ▴ 10

You can see that all of your batches have the EC and YC groups, in addition to the Reference group. All three of these groups will be used for batch normalization of the intensities, under the assumption that the intensity distribution should be the same across batches. This is why normalizeBatch still works when you remove the Reference group, as the remaining EC and YC groups are used for normalization.

Whether or not this is desirable depends on how reproducible the EC and YC groups are across batches. In real settings, replicates will be subject to biological variability that makes it difficult to assume that the EC and YC samples should be the same across batches if they come from different patients/animals, etc. In such cases, you want to force the algorithm to only use the Reference group, which I presume is literally the same sample that has been run across multiple batches. This can be done by setting everything else to a batch-specific value:

for (i in names(batch.comp)) {
    current <- as.character(batch.comp[[i]])
    current[current!="Reference"] <- i
    batch.comp[[i]] <- current
}

This ensures that the only group in common across all batches is the Reference group.
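To see the effect of the loop above, it can be run on a small toy batch.comp. The two batches and their contents here are invented for illustration; only the loop body is taken from the reply.

```r
# Toy batch.comp with two batches, each containing a Reference sample.
batch.comp <- list(
    Batch02.fcs = factor(c("EC", "FHE", "Reference", "YC")),
    Batch04.fcs = factor(c("EC", "Reference", "YC"))
)

# Relabel every non-reference sample with its own batch name.
for (i in names(batch.comp)) {
    current <- as.character(batch.comp[[i]])
    current[current != "Reference"] <- i
    batch.comp[[i]] <- current
}

batch.comp$Batch04.fcs

# Labels shared by all batches after relabelling.
common <- Reduce(intersect, batch.comp)
common
```

After relabelling, EC and YC no longer match across batches, so "Reference" is the only label left in `common`.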

ADD REPLY • link 5.9 years ago Aaron Lun ★ 28k

Okay, I understand now! Thank you so much for your help!

Best, Florent

ADD REPLY • link 5.9 years ago Florent ▴ 10