Hi,
I'm analyzing label-free proteomics data with DEP, but have run into an error about duplicate row names.
Here's what I have done so far. I generated unique identifiers as described in the DEP tutorial:
> head(data)
# A tibble: 6 x 26
Protein.IDs Gene.names LNP34_i2_1 LNP34_i2_2 LNP34_i2_3 LNP34_n_1 LNP34_n_2
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 P08226;A0A1B0~ Apoe 2.93e10 6.03e 9 2.79e10 3.60e10 2.63e10
2 P07724 Alb 6.36e 9 1.20e10 5.84e 9 6.28e 9 3.17e 9
3 A8DUK4;E9Q223~ Hbb-bs 2.20e 9 2.70e 9 2.85e 9 5.09e 9 2.13e 9
4 Q91VB8;P01942~ Hba-a1 1.51e 9 2.84e 9 2.00e 9 2.86e 9 2.12e 9
5 A0A075B5P6;A0~ Ighm 1.82e 9 1.28e 9 1.77e 9 4.15e 8 2.44e 9
6 Q921I1;F7BAE9~ Tf 5.06e 8 1.59e 9 5.44e 8 4.20e 8 2.53e 8
# ... with 19 more variables: LNP34_n_3 <dbl>, LNP35_i2_1 <dbl>,
# LNP35_i2_2 <dbl>, LNP35_i2_3 <dbl>, LNP35_n_1 <dbl>, LNP35_n_2 <dbl>,
# LNP35_n_3 <dbl>, LNP36_i2_1 <dbl>, LNP36_i2_2 <dbl>, LNP36_i2_3 <dbl>,
# LNP36_n_1 <dbl>, LNP36_n_2 <dbl>, LNP36_n_3 <dbl>, LNP37_i2_1 <dbl>,
# LNP37_i2_2 <dbl>, LNP37_i2_3 <dbl>, LNP37_n_1 <dbl>, LNP37_n_2 <dbl>,
# LNP37_n_3 <dbl>
> data$Gene.names %>% duplicated() %>% any()
[1] TRUE
> data %>% group_by(Gene.names) %>% summarize(frequency = n()) %>% arrange(desc(frequency)) %>% filter(frequency > 1)
# A tibble: 6 x 2
Gene.names frequency
<chr> <int>
1 _ 19
2 H2-K1 2
3 Itih4 2
4 Kng1 2
5 Sptb 2
6 Tpm3 2
> data_unique <- make_unique(data, "Gene.names", "Protein.IDs", delim = ";")
> data_unique$name %>% duplicated() %>% any()
[1] FALSE
I would now like to generate a SummarizedExperiment using my own experimental design. However, I get an error about duplicate row names, along with a warning about non-unique values:
> experimental_design
label condition LNPcondition replicate
1 LNP34_i2_1 LNP34 LNP34_i2 1
2 LNP34_i2_2 LNP34 LNP34_i2 2
3 LNP34_i2_3 LNP34 LNP34_i2 3
4 LNP34_n_1 LNP34 LNP34_n 1
5 LNP34_n_2 LNP34 LNP34_n 2
6 LNP34_n_3 LNP34 LNP34_n 3
7 LNP35_i2_1 LNP35 LNP35_i2 1
8 LNP35_i2_2 LNP35 LNP35_i2 2
9 LNP35_i2_3 LNP35 LNP35_i2 3
10 LNP35_n_1 LNP35 LNP35_n 1
11 LNP35_n_2 LNP35 LNP35_n 2
12 LNP35_n_3 LNP35 LNP35_n 3
13 LNP36_i2_1 LNP36 LNP36_i2 1
14 LNP36_i2_2 LNP36 LNP36_i2 2
15 LNP36_i2_3 LNP36 LNP36_i2 3
16 LNP36_n_1 LNP36 LNP36_n 1
17 LNP36_n_2 LNP36 LNP36_n 2
18 LNP36_n_3 LNP36 LNP36_n 3
19 LNP37_i2_1 LNP37 LNP37_i2 1
20 LNP37_i2_2 LNP37 LNP37_i2 2
21 LNP37_i2_3 LNP37 LNP37_i2 3
22 LNP37_n_1 LNP37 LNP37_n 1
23 LNP37_n_2 LNP37 LNP37_n 2
24 LNP37_n_3 LNP37 LNP37_n 3
> LFQ_columns <- grep("^LNP", colnames(data_unique))
> data_se <- make_se(data_unique, LFQ_columns, experimental_design)
Error in `.rowNamesDF<-`(x, value = value) :
duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘LNP34_1’, ‘LNP34_2’, ‘LNP34_3’, ‘LNP35_1’, ‘LNP35_2’, ‘LNP35_3’, ‘LNP36_1’, ‘LNP36_2’, ‘LNP36_3’, ‘LNP37_1’, ‘LNP37_2’, ‘LNP37_3’
I'm not sure where the error comes from (my guess is the "condition" column of experimental_design). However, when I check, I find no duplicates:
> any(duplicated(rownames(experimental_design)))
[1] FALSE
> any(duplicated(rownames(data_unique)))
[1] FALSE
> any(duplicated(rownames(LFQ_columns)))
[1] FALSE
> any(duplicated(colnames(experimental_design)))
[1] FALSE
> any(duplicated(colnames(data_unique)))
[1] FALSE
> any(duplicated(colnames(LFQ_columns)))
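One check not covered above is whether the combination of condition and replicate is unique. The sketch below assumes that make_se() builds its sample IDs by joining `condition` and `replicate` with an underscore. That is only inferred from the shape of the values in the warning ('LNP34_1', 'LNP34_2', ...), not confirmed from the DEP source. The `design` data frame is a hypothetical stand-in rebuilt to match the experimental_design printed above, and `cols` is a made-up example of the indices grep() returns.

```r
# Hypothetical stand-in that reproduces the experimental_design table above
lnp       <- rep(c("LNP34", "LNP35", "LNP36", "LNP37"), each = 6)
treatment <- rep(rep(c("i2", "n"), each = 3), times = 4)
replicate <- rep(1:3, times = 8)

design <- data.frame(
  label        = paste(lnp, treatment, replicate, sep = "_"),
  condition    = lnp,
  LNPcondition = paste(lnp, treatment, sep = "_"),
  replicate    = replicate,
  stringsAsFactors = FALSE
)

# Assumed ID scheme: condition and replicate joined by "_"
ids_condition <- paste(design$condition, design$replicate, sep = "_")
any(duplicated(ids_condition))                    # TRUE
unique(ids_condition[duplicated(ids_condition)])  # the 12 values named in the warning

# The same scheme applied to the finer-grained LNPcondition column
ids_lnp <- paste(design$LNPcondition, design$replicate, sep = "_")
any(duplicated(ids_lnp))                          # FALSE

# grep() returns a plain integer vector, which has no row or column names,
# so duplicate checks on rownames()/colnames() of it cannot find anything
cols <- c(3L, 4L, 5L)  # hypothetical column indices
is.null(rownames(cols))                           # TRUE
```

If the assumption about the ID scheme holds, each value of `condition` is shared by six samples with only three distinct replicate numbers, which would explain the twelve non-unique values. The checks on rownames(LFQ_columns) and colnames(LFQ_columns) return FALSE regardless of the data, because an integer vector has no dimnames.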
Can someone please help me with this? I have limited knowledge of R, so detailed explanations would be highly appreciated.
Session info below. Thanks!
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.1252
[2] LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_1.0.5 DEP_1.10.0 readxl_1.3.1 readr_1.4.0
loaded via a namespace (and not attached):
[1] ProtGenerics_1.20.0 bitops_1.0-6
[3] matrixStats_0.58.0 doParallel_1.0.16
[5] RColorBrewer_1.1-2 GenomeInfoDb_1.24.2
[7] MSnbase_2.14.2 tools_4.0.2
[9] DT_0.17 utf8_1.2.1
[11] R6_2.5.0 affyio_1.58.0
[13] tmvtnorm_1.4-10 BiocGenerics_0.34.0
[15] colorspace_2.0-0 GetoptLong_1.0.5
[17] tidyselect_1.1.0 compiler_4.0.2
[19] preprocessCore_1.50.0 cli_2.4.0
[21] Biobase_2.48.0 DelayedArray_0.14.1
[23] sandwich_3.0-0 scales_1.1.1
[25] mvtnorm_1.1-1 affy_1.66.0
[27] digest_0.6.27 XVector_0.28.0
[29] htmltools_0.5.1.1 pkgconfig_2.0.3
[31] fastmap_1.1.0 limma_3.44.3
[33] htmlwidgets_1.5.3 rlang_0.4.10
[35] GlobalOptions_0.1.2 rstudioapi_0.13
[37] impute_1.62.0 shiny_1.6.0
[39] shape_1.4.5 generics_0.1.0
[41] zoo_1.8-9 mzID_1.26.0
[43] BiocParallel_1.22.0 RCurl_1.98-1.3
[45] magrittr_2.0.1 GenomeInfoDbData_1.2.3
[47] MALDIquant_1.19.3 Matrix_1.2-18
[49] Rcpp_1.0.6 munsell_0.5.0
[51] S4Vectors_0.26.1 fansi_0.4.2
[53] imputeLCMD_2.0 lifecycle_1.0.0
[55] vsn_3.56.0 MASS_7.3-51.6
[57] SummarizedExperiment_1.18.2 zlibbioc_1.34.0
[59] plyr_1.8.6 grid_4.0.2
[61] promises_1.2.0.1 parallel_4.0.2
[63] shinydashboard_0.7.1 crayon_1.4.1
[65] lattice_0.20-41 circlize_0.4.12
[67] hms_1.0.0 mzR_2.22.0
[69] ComplexHeatmap_2.4.3 pillar_1.5.1
[71] GenomicRanges_1.40.0 rjson_0.2.20
[73] codetools_0.2-16 stats4_4.0.2
[75] XML_3.99-0.6 glue_1.4.2
[77] pcaMethods_1.80.0 BiocManager_1.30.12
[79] httpuv_1.5.5 png_0.1-7
[81] vctrs_0.3.7 foreach_1.5.1
[83] cellranger_1.1.0 tidyr_1.1.3
[85] gtable_0.3.0 purrr_0.3.4
[87] norm_1.0-9.5 clue_0.3-58
[89] assertthat_0.2.1 ggplot2_3.3.3
[91] xfun_0.22 mime_0.10
[93] xtable_1.8-4 later_1.1.0.1
[95] ncdf4_1.17 tibble_3.1.0
[97] iterators_1.0.13 gmm_1.6-6
[99] tinytex_0.31 IRanges_2.22.2
[101] cluster_2.1.0 ellipsis_0.3.1
Did you get an answer? I have used DEP several times, but for some reason this error has only started showing up now.