Hi,
I'm analyzing label-free proteomics data with DEP, but I have run into an error about duplicate row names.
Here's what I have done. First, I generated unique identifiers as described in the DEP tutorial:
> head(data)
# A tibble: 6 x 26
Protein.IDs Gene.names LNP34_i2_1 LNP34_i2_2 LNP34_i2_3 LNP34_n_1 LNP34_n_2
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 P08226;A0A1B0~ Apoe 2.93e10 6.03e 9 2.79e10 3.60e10 2.63e10
2 P07724 Alb 6.36e 9 1.20e10 5.84e 9 6.28e 9 3.17e 9
3 A8DUK4;E9Q223~ Hbb-bs 2.20e 9 2.70e 9 2.85e 9 5.09e 9 2.13e 9
4 Q91VB8;P01942~ Hba-a1 1.51e 9 2.84e 9 2.00e 9 2.86e 9 2.12e 9
5 A0A075B5P6;A0~ Ighm 1.82e 9 1.28e 9 1.77e 9 4.15e 8 2.44e 9
6 Q921I1;F7BAE9~ Tf 5.06e 8 1.59e 9 5.44e 8 4.20e 8 2.53e 8
# ... with 19 more variables: LNP34_n_3 <dbl>, LNP35_i2_1 <dbl>,
# LNP35_i2_2 <dbl>, LNP35_i2_3 <dbl>, LNP35_n_1 <dbl>, LNP35_n_2 <dbl>,
# LNP35_n_3 <dbl>, LNP36_i2_1 <dbl>, LNP36_i2_2 <dbl>, LNP36_i2_3 <dbl>,
# LNP36_n_1 <dbl>, LNP36_n_2 <dbl>, LNP36_n_3 <dbl>, LNP37_i2_1 <dbl>,
# LNP37_i2_2 <dbl>, LNP37_i2_3 <dbl>, LNP37_n_1 <dbl>, LNP37_n_2 <dbl>,
# LNP37_n_3 <dbl>
> data$Gene.names %>% duplicated() %>% any()
[1] TRUE
> data %>% group_by(Gene.names) %>% summarize(frequency = n()) %>% arrange(desc(frequency)) %>% filter(frequency > 1)
# A tibble: 6 x 2
Gene.names frequency
<chr> <int>
1 _ 19
2 H2-K1 2
3 Itih4 2
4 Kng1 2
5 Sptb 2
6 Tpm3 2
> data_unique <- make_unique(data, "Gene.names", "Protein.IDs", delim = ";")
> data_unique$name %>% duplicated() %>% any()
[1] FALSE
...and now I would like to generate a SummarizedExperiment using my own experimental design. However, I get an error about duplicate row names and non-unique values:
> experimental_design
label condition LNPcondition replicate
1 LNP34_i2_1 LNP34 LNP34_i2 1
2 LNP34_i2_2 LNP34 LNP34_i2 2
3 LNP34_i2_3 LNP34 LNP34_i2 3
4 LNP34_n_1 LNP34 LNP34_n 1
5 LNP34_n_2 LNP34 LNP34_n 2
6 LNP34_n_3 LNP34 LNP34_n 3
7 LNP35_i2_1 LNP35 LNP35_i2 1
8 LNP35_i2_2 LNP35 LNP35_i2 2
9 LNP35_i2_3 LNP35 LNP35_i2 3
10 LNP35_n_1 LNP35 LNP35_n 1
11 LNP35_n_2 LNP35 LNP35_n 2
12 LNP35_n_3 LNP35 LNP35_n 3
13 LNP36_i2_1 LNP36 LNP36_i2 1
14 LNP36_i2_2 LNP36 LNP36_i2 2
15 LNP36_i2_3 LNP36 LNP36_i2 3
16 LNP36_n_1 LNP36 LNP36_n 1
17 LNP36_n_2 LNP36 LNP36_n 2
18 LNP36_n_3 LNP36 LNP36_n 3
19 LNP37_i2_1 LNP37 LNP37_i2 1
20 LNP37_i2_2 LNP37 LNP37_i2 2
21 LNP37_i2_3 LNP37 LNP37_i2 3
22 LNP37_n_1 LNP37 LNP37_n 1
23 LNP37_n_2 LNP37 LNP37_n 2
24 LNP37_n_3 LNP37 LNP37_n 3
> LFQ_columns <- grep("^LNP", colnames(data_unique))
> data_se <- make_se(data_unique, LFQ_columns, experimental_design)
Error in `.rowNamesDF<-`(x, value = value) :
duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘LNP34_1’, ‘LNP34_2’, ‘LNP34_3’, ‘LNP35_1’, ‘LNP35_2’, ‘LNP35_3’, ‘LNP36_1’, ‘LNP36_2’, ‘LNP36_3’, ‘LNP37_1’, ‘LNP37_2’, ‘LNP37_3’
I'm not sure where the error comes from (my guess is the "condition" column of experimental_design). However, when I check, no duplicates are found:
> any(duplicated(rownames(experimental_design)))
[1] FALSE
> any(duplicated(rownames(data_unique)))
[1] FALSE
> any(duplicated(rownames(LFQ_columns)))
[1] FALSE
> any(duplicated(colnames(experimental_design)))
[1] FALSE
> any(duplicated(colnames(data_unique)))
[1] FALSE
> any(duplicated(colnames(LFQ_columns)))
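A check that targets what is actually duplicated may help here. Two hedged observations. First, rownames() and colnames() of a plain integer vector such as LFQ_columns return NULL, so those two checks cannot find anything. Second, the IDs named in the warning (LNP34_1, LNP34_2, ...) look like the condition and replicate columns joined by an underscore. The sketch below assumes that this is how make_se() builds its sample IDs, and it rebuilds the design shown above so that it runs on its own.

```r
# Rebuild the 24-sample design shown above:
# 4 LNPs x 2 treatments (i2, n) x 3 replicates.
lnp <- rep(c("LNP34", "LNP35", "LNP36", "LNP37"), each = 6)
treatment <- rep(rep(c("i2", "n"), each = 3), times = 4)
replicate <- rep(1:3, times = 8)

experimental_design <- data.frame(
  label = paste(lnp, treatment, replicate, sep = "_"),
  condition = lnp,
  LNPcondition = paste(lnp, treatment, sep = "_"),
  replicate = replicate,
  stringsAsFactors = FALSE
)

# The labels themselves are unique ...
any(duplicated(experimental_design$label))  # FALSE

# ... but "condition_replicate" IDs (assumed to be what make_se()
# uses as colData row names) are not.
sample_ids <- paste(experimental_design$condition,
                    experimental_design$replicate, sep = "_")
any(duplicated(sample_ids))  # TRUE

# The duplicated IDs are the 12 listed in the warning message.
unique(sample_ids[duplicated(sample_ids)])

# An integer index vector, like the one grep() returns, has no
# row or column names, so checks on them always come back empty.
lfq_columns_example <- 3:26
rownames(lfq_columns_example)  # NULL
```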
Can someone please help me with this? My knowledge of R is limited, so details are highly appreciated.
Session info is below. Thanks!
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.1252
[2] LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_1.0.5 DEP_1.10.0 readxl_1.3.1 readr_1.4.0
loaded via a namespace (and not attached):
[1] ProtGenerics_1.20.0 bitops_1.0-6
[3] matrixStats_0.58.0 doParallel_1.0.16
[5] RColorBrewer_1.1-2 GenomeInfoDb_1.24.2
[7] MSnbase_2.14.2 tools_4.0.2
[9] DT_0.17 utf8_1.2.1
[11] R6_2.5.0 affyio_1.58.0
[13] tmvtnorm_1.4-10 BiocGenerics_0.34.0
[15] colorspace_2.0-0 GetoptLong_1.0.5
[17] tidyselect_1.1.0 compiler_4.0.2
[19] preprocessCore_1.50.0 cli_2.4.0
[21] Biobase_2.48.0 DelayedArray_0.14.1
[23] sandwich_3.0-0 scales_1.1.1
[25] mvtnorm_1.1-1 affy_1.66.0
[27] digest_0.6.27 XVector_0.28.0
[29] htmltools_0.5.1.1 pkgconfig_2.0.3
[31] fastmap_1.1.0 limma_3.44.3
[33] htmlwidgets_1.5.3 rlang_0.4.10
[35] GlobalOptions_0.1.2 rstudioapi_0.13
[37] impute_1.62.0 shiny_1.6.0
[39] shape_1.4.5 generics_0.1.0
[41] zoo_1.8-9 mzID_1.26.0
[43] BiocParallel_1.22.0 RCurl_1.98-1.3
[45] magrittr_2.0.1 GenomeInfoDbData_1.2.3
[47] MALDIquant_1.19.3 Matrix_1.2-18
[49] Rcpp_1.0.6 munsell_0.5.0
[51] S4Vectors_0.26.1 fansi_0.4.2
[53] imputeLCMD_2.0 lifecycle_1.0.0
[55] vsn_3.56.0 MASS_7.3-51.6
[57] SummarizedExperiment_1.18.2 zlibbioc_1.34.0
[59] plyr_1.8.6 grid_4.0.2
[61] promises_1.2.0.1 parallel_4.0.2
[63] shinydashboard_0.7.1 crayon_1.4.1
[65] lattice_0.20-41 circlize_0.4.12
[67] hms_1.0.0 mzR_2.22.0
[69] ComplexHeatmap_2.4.3 pillar_1.5.1
[71] GenomicRanges_1.40.0 rjson_0.2.20
[73] codetools_0.2-16 stats4_4.0.2
[75] XML_3.99-0.6 glue_1.4.2
[77] pcaMethods_1.80.0 BiocManager_1.30.12
[79] httpuv_1.5.5 png_0.1-7
[81] vctrs_0.3.7 foreach_1.5.1
[83] cellranger_1.1.0 tidyr_1.1.3
[85] gtable_0.3.0 purrr_0.3.4
[87] norm_1.0-9.5 clue_0.3-58
[89] assertthat_0.2.1 ggplot2_3.3.3
[91] xfun_0.22 mime_0.10
[93] xtable_1.8-4 later_1.1.0.1
[95] ncdf4_1.17 tibble_3.1.0
[97] iterators_1.0.13 gmm_1.6-6
[99] tinytex_0.31 IRanges_2.22.2
[101] cluster_2.1.0 ellipsis_0.3.1
Did you get an answer? I've used DEP several times, but for some reason the error only shows up now.
I encountered the same problem. It seems to come from the make_se() function: it does not use the 'label' variable to bind colData to the assay. Instead, it creates a new ID column from the combinations of the 'condition' and 'replicate' variables. This results in non-unique IDs when there is more than one condition variable. My solution was to create a single condition variable whose levels are all combinations of the conditions. Your "LNPcondition" column already seems to be such a variable: just rename "LNPcondition" to "condition" (renaming or dropping the original "condition" column so the names don't clash) and it should work. I would have preferred to keep the condition columns separate, though.
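A minimal sketch of that fix, assuming (as described above) that make_se() builds sample IDs from 'condition' and 'replicate'. It moves the LNP-only grouping into a new column called LNP (a hypothetical name) and promotes LNPcondition to condition. Keeping the extra column might also preserve the separate grouping, assuming make_se() carries additional design columns into colData(); that is worth checking on the resulting object. The design is rebuilt so the block runs on its own.

```r
# Rebuild the 24-sample design from the question:
# every combination of LNP, treatment and replicate.
combos <- expand.grid(replicate = 1:3, treatment = c("i2", "n"),
                      lnp = c("LNP34", "LNP35", "LNP36", "LNP37"),
                      stringsAsFactors = FALSE)
experimental_design <- data.frame(
  label = paste(combos$lnp, combos$treatment, combos$replicate, sep = "_"),
  condition = combos$lnp,
  LNPcondition = paste(combos$lnp, combos$treatment, sep = "_"),
  replicate = combos$replicate,
  stringsAsFactors = FALSE
)

# Fix: keep the LNP-only grouping under a new (hypothetical) name "LNP",
# then use the combined LNP x treatment group as DEP's "condition".
experimental_design_fixed <- experimental_design
names(experimental_design_fixed)[names(experimental_design_fixed) == "condition"] <- "LNP"
names(experimental_design_fixed)[names(experimental_design_fixed) == "LNPcondition"] <- "condition"

# "condition_replicate" IDs are now unique (e.g. "LNP34_i2_1").
new_ids <- paste(experimental_design_fixed$condition,
                 experimental_design_fixed$replicate, sep = "_")
any(duplicated(new_ids))  # FALSE

# With the objects from the question, the call would then be:
# data_se <- make_se(data_unique, LFQ_columns, experimental_design_fixed)
```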