Dear Users,
We have a CyTOF experiment containing 91 samples distributed in 10 batches, with one anchor for normalization in each of these batches.
However, I can't figure out how to make it work: I would like to tell normalizeBatch which files are in which batch and which ones are the anchors, so that I can use the quantile method, which seems the most appropriate for my needs.
I'm using ncdfFlowSet to import my files in batch.x:
> batch.x
An ncdfFlowSet with 91 samples.
NCDF file : a.nc
An object of class 'AnnotatedDataFrame'
rowNames: EC01_Batch02.fcs EC02_Batch02.fcs ... YC23_Batch11.fcs (91 total)
varLabels: name
varMetadata: labelDescription
column names:
Time, Event_length, Y89Di, Pd102Di, Pd104Di, Pd105Di, Pd106Di, Pd108Di, Pd110Di, Ce140Di, Pr141Di, Nd142Di, Nd143Di, Nd144Di, Nd145Di, Nd146Di, Sm147Di, Nd148Di, Sm149Di, Nd150Di, Eu151Di, Sm152Di, Eu153Di, Sm154Di, Gd155Di, Gd156Di, Gd158Di, Tb159Di, Gd160Di, Dy161Di, Dy162Di, Dy163Di, Dy164Di, Ho165Di, Er166Di, Er167Di, Er168Di, Tm169Di, Er170Di, Yb171Di, Yb172Di, Yb173Di, Yb174Di, Lu175Di, Yb176Di, BCKG190Di, Ir191Di, Ir193Di, Pt195Di, Bi209Di, Center, Offset, Width, Residual
My batch.comp is a list of factors, where each factor contains the names of all files from the same batch, and one extra factor groups all the reference samples:
> batch.comp
$Batch02
[1] EC01_Batch02 EC02_Batch02 FHE01_Batch02 FHE02_Batch02 FHY01_Batch02 YC01_Batch02 YC02_Batch02
7 Levels: EC01_Batch02 EC02_Batch02 FHE01_Batch02 FHE02_Batch02 FHY01_Batch02 ... YC02_Batch02
$Batch03
[1] EC03_Batch03 EC04_Batch03 FHE03_Batch03 FHY02_Batch03 FHY03_Batch03 YC03_Batch03
Levels: EC03_Batch03 EC04_Batch03 FHE03_Batch03 FHY02_Batch03 FHY03_Batch03 YC03_Batch03
$Batch04
[1] EC05_Batch04 EC06_Batch04 YC05_Batch04 YC06_Batch04
Levels: EC05_Batch04 EC06_Batch04 YC05_Batch04 YC06_Batch04
$Batch05
[1] EC07_Batch05 EC08_Batch05 EC09_Batch05 FHE04_Batch05 FHE08_Batch05 FHE09_Batch05 FHE10_Batch05 FHY07_Batch05 YC07_Batch05 YC08_Batch05
[11] YC09_Batch05
11 Levels: EC07_Batch05 EC08_Batch05 EC09_Batch05 FHE04_Batch05 FHE08_Batch05 ... YC09_Batch05
$Batch06
[1] EC10_Batch06 EC11_Batch06 EC12_Batch06 EC13_Batch06 FHE07_Batch06 FHE11_Batch06 FHY08_Batch06 FHY09_Batch06 FHY10_Batch06 YC04_Batch06
[11] YC10_Batch06
11 Levels: EC10_Batch06 EC11_Batch06 EC12_Batch06 EC13_Batch06 FHE07_Batch06 ... YC10_Batch06
$Batch07
[1] EC14_Batch07 EC15_Batch07 EC16_Batch07 FHE12_Batch07 FHE14_Batch07 FHE15_Batch07 FHY12_Batch07 FHY13_Batch07 FHY14_Batch07 FHY15_Batch07
[11] YC11_Batch07 YC12_Batch07
12 Levels: EC14_Batch07 EC15_Batch07 EC16_Batch07 FHE12_Batch07 FHE14_Batch07 ... YC12_Batch07
$Batch08
[1] EC17_Batch08 EC18_Batch08 EC19_Batch08 FHE16_Batch08 FHE20_Batch08 FHY16_Batch08 FHY17_Batch08 YC13_Batch08 YC14_Batch08 YC15_Batch08
10 Levels: EC17_Batch08 EC18_Batch08 EC19_Batch08 FHE16_Batch08 FHE20_Batch08 ... YC15_Batch08
$Batch09
[1] EC20_Batch09 EC21_Batch09 EC22_Batch09 FHE18_Batch09 FHE21_Batch09 FHE23_Batch09 FHY18_Batch09 FHY19_Batch09 YC16_Batch09 YC17_Batch09
[11] YC18_Batch09
11 Levels: EC20_Batch09 EC21_Batch09 EC22_Batch09 FHE18_Batch09 FHE21_Batch09 ... YC18_Batch09
$Batch11
[1] EC25_Batch11 EC26_Batch11 EC27_Batch11 FHE13_Batch11 FHE24_Batch11 FHE25_Batch11 FHE26_Batch11 FHY22_Batch11 YC22_Batch11 YC23_Batch11
10 Levels: EC25_Batch11 EC26_Batch11 EC27_Batch11 FHE13_Batch11 FHE24_Batch11 ... YC23_Batch11
$References
[1] Reference_Batch02 Reference_Batch03 Reference_Batch04 Reference_Batch05 Reference_Batch06 Reference_Batch07 Reference_Batch08
[8] Reference_Batch09 Reference_Batch11
9 Levels: Reference_Batch02 Reference_Batch03 Reference_Batch04 Reference_Batch05 ... Reference_Batch11
However, the normalizeBatch function returns a length error:
> normalizeBatch(batch.x, batch.comp, mode="quantile", p=0.05,
+ target=batch.comp$References, markers=CD14)
Error in normalizeBatch(batch.x, batch.comp, mode = "quantile", p = 0.05, :
length of 'batch.x' and 'batch.comp' must be identical
The length of batch.x is 91 and the length of batch.comp is 10 (the number of batches plus one for the references).
I read the documentation but I can't figure out how the function works. I don't have any background in informatics, so I guess that I'm just missing the obvious. Any help would be appreciated!
Best, Florent
> sessionInfo()
R version 3.5.2 Patched (2019-01-10 r75982)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252 LC_MONETARY=French_France.1252 LC_NUMERIC=C
[5] LC_TIME=French_France.1252
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] CytoDx_1.2.1 cydar_1.6.1 SingleCellExperiment_1.4.1 SummarizedExperiment_1.12.0 DelayedArray_0.8.0
[6] matrixStats_0.54.0 Biobase_2.42.0 GenomicRanges_1.34.0 GenomeInfoDb_1.18.2 IRanges_2.16.0
[11] S4Vectors_0.20.1 BiocGenerics_0.28.0 BiocParallel_1.16.6 tidyr_0.8.3 ggplot2_3.1.0
[16] FlowSOM_1.14.1 igraph_1.2.4 ncdfFlow_2.28.1 BH_1.69.0-1 RcppArmadillo_0.9.300.2.0
[21] flowCore_1.48.1
loaded via a namespace (and not attached):
[1] viridis_0.5.1 foreach_1.4.4 viridisLite_0.3.0 ConsensusClusterPlus_1.46.0 shiny_1.2.0
[6] assertthat_0.2.1 latticeExtra_0.6-28 GenomeInfoDbData_1.2.0 yaml_2.2.0 robustbase_0.93-4
[11] pillar_1.3.1 lattice_0.20-38 glue_1.3.1 digest_0.6.18 RColorBrewer_1.1-2
[16] promises_1.0.1 XVector_0.22.0 colorspace_1.4-1 htmltools_0.3.6 httpuv_1.5.0
[21] Matrix_1.2-17 plyr_1.8.4 pcaPP_1.9-73 XML_3.98-1.19 pkgconfig_2.0.2
[26] tsne_0.1-3 zlibbioc_1.28.0 purrr_0.3.2 xtable_1.8-3 corpcor_1.6.9
[31] mvtnorm_1.0-10 scales_1.0.0 later_0.8.0 tibble_2.1.1 withr_2.1.2
[36] flowViz_1.46.1 hexbin_1.27.2 lazyeval_0.2.2 magrittr_1.5 crayon_1.3.4
[41] IDPmisc_1.1.19 mime_0.6 doParallel_1.0.14 MASS_7.3-51.3 graph_1.60.0
[46] tools_3.5.2 rpart.plot_3.0.6 glmnet_2.0-16 munsell_0.5.0 cluster_2.0.7-1
[51] compiler_3.5.2 rlang_0.3.3 grid_3.5.2 RCurl_1.95-4.12 iterators_1.0.10
[56] BiocNeighbors_1.0.0 bitops_1.0-6 codetools_0.2-16 gtable_0.3.0 rrcov_1.4-7
[61] R6_2.4.0 gridExtra_2.3 knitr_1.22 dplyr_0.8.0.1 KernSmooth_2.23-15
[66] Rcpp_1.0.1 rpart_4.1-13 DEoptimR_1.0-8 tidyselect_0.2.5 xfun_0.6
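The error above suggests that batch.x and batch.comp are both expected to be lists with one element per batch, rather than one 91-sample object and a list of 10 factors. The following is only a minimal sketch of how the two could be lined up, using base R and a handful of made-up file names that follow the naming pattern in this post. The names sample.names, names.by.batch and all.samples are hypothetical, and the commented-out subsetting of the ncdfFlowSet by sample name is an assumption, not something confirmed in this thread.

```r
# Hypothetical sample names following the <Group><nn>_<Batch>.fcs pattern above.
sample.names <- c("EC01_Batch02.fcs", "YC01_Batch02.fcs", "Reference_Batch02.fcs",
                  "EC03_Batch03.fcs", "YC03_Batch03.fcs", "Reference_Batch03.fcs")

# Batch label: the text between the underscore and ".fcs".
batch.id <- sub("^.*_(Batch[0-9]+)\\.fcs$", "\\1", sample.names)

# Group label: the leading letters of the file name (EC, YC, Reference, ...).
group.id <- sub("^([A-Za-z]+).*$", "\\1", sample.names)

# One element per batch, in the same batch order for both lists.
names.by.batch <- split(sample.names, batch.id)
batch.comp <- lapply(split(group.id, batch.id), factor)

# Assumption: with the real data, the per-batch list of samples could be
# built the same way from the full 91-sample ncdfFlowSet (here 'all.samples'):
# batch.x <- lapply(names.by.batch, function(nm) all.samples[nm])

# Both lists now have one entry per batch, and each factor has one
# entry per sample in that batch.
stopifnot(length(names.by.batch) == length(batch.comp))
stopifnot(all(lengths(names.by.batch) == lengths(batch.comp)))
```

In this layout each factor describes the groups inside one batch, so the reference sample stays in its own batch and is identified by its group label instead of being collected into a separate list element.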
Thank you for your answer, it seems to work now. However, I still have a question. For the quantile mode, I need to make one group with my reference samples in batch.comp (I therefore removed the reference samples from the batches, which could be a mistake? But otherwise I would have them twice).
To keep the lengths identical, I did the same for batch.x (I removed the reference samples from the batches and put them in a separate group).
However, the normalizeBatch function is returning this error:
target=10 corresponds to the position of my reference samples in the list.
Or should I keep the reference samples in their corresponding batches and find a way to tell the function their names (reference_batch)?
Best, Florent
Yes, that is a mistake. You should have a reference sample in each batch; that's the reason for its existence. Each factor in batch.comp should specify which sample is the reference in that batch; see my previous example.
Dear Aaron,
I did as you said and it is indeed working fine, thanks! But I wonder how the function can use the reference samples then (especially because it runs even when I remove these files). Your documentation says: "In such cases, users should set all control samples to the same “group” in batch.comp, while all other samples should be set to batch-specific groups (and are thus ignored during the calculation of the transformation functions)." Sorry to bother you again.
Woah. Stop. Why are you removing the reference samples?
Show me exactly what you're doing. What does your batch.comp look like? You should have a "reference" level in the factor for each batch.
Hi, I was only trying to understand how the function uses the "reference" level, but it's a bit beyond my skills. My real batch.comp has a reference level:
You can see that all of your batches have the EC and YC groups, in addition to the Reference group. All three of these groups will be used for batch normalization of the intensities, under the assumption that the intensity distribution should be the same across batches. This is why normalizeBatch still works when you remove the Reference group, as the remaining EC and YC groups are used for normalization.
Whether or not this is desirable depends on how reproducible the EC and YC groups are across batches. In real settings, replicates will be subject to biological variability that makes it difficult to assume that the EC and YC samples should be the same across batches if they come from different patients/animals, etc. In such cases, you want to force the algorithm to only use the Reference group, which I presume is literally the same sample that has been run across multiple batches. This can be done by setting everything else to a batch-specific value:
This ensures that the only group in common across all batches is the Reference group.
Okay, I understand now! Thank you so much for your help!
Best, Florent
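The code for the batch-specific relabelling described in the last answer did not survive in this copy of the thread. The following is a rough sketch of the idea in base R, not the code originally posted. The two small factors are made up for illustration, ref.only is a hypothetical name, and the commented-out normalizeBatch call assumes that batch.x is a list with one element per batch in the same order.

```r
# Hypothetical group labels for two batches, mirroring the layout in this thread.
batch.comp <- list(
    Batch02 = factor(c("EC", "EC", "YC", "Reference")),
    Batch03 = factor(c("EC", "YC", "YC", "Reference"))
)

# Groups present in every batch before relabelling: EC, YC and Reference
# would all contribute to the normalization.
shared.before <- Reduce(intersect, lapply(batch.comp, levels))

# Give every non-reference sample a label that is unique to its batch,
# so that it no longer matches any group in the other batches.
ref.only <- Map(function(groups, batch) {
    groups <- as.character(groups)
    is.ref <- groups == "Reference"
    groups[!is.ref] <- paste0(groups[!is.ref], "_", batch)
    factor(groups)
}, batch.comp, names(batch.comp))

# After relabelling, only the Reference group is common to all batches.
shared.after <- Reduce(intersect, lapply(ref.only, levels))
print(shared.before)
print(shared.after)

# Assumption: the relabelled list would then replace batch.comp in the call, e.g.
# normalizeBatch(batch.x, ref.only, mode = "quantile", p = 0.05)
```

Comparing shared.before and shared.after is a quick way to check which groups the batches still have in common before running the normalization.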