Duplicate row.names when building SummarizedExperiment with make_se in DEP
hele7 ▴ 20

Hi,

I'm analyzing label-free proteomics data with DEP, but I have run into an error about duplicate row names.

Here is what I have done so far. I generated unique identifiers as described in the DEP tutorial:

> head(data)
# A tibble: 6 x 26
  Protein.IDs    Gene.names LNP34_i2_1 LNP34_i2_2 LNP34_i2_3 LNP34_n_1 LNP34_n_2
  <chr>          <chr>           <dbl>      <dbl>      <dbl>     <dbl>     <dbl>
1 P08226;A0A1B0~ Apoe          2.93e10    6.03e 9    2.79e10   3.60e10   2.63e10
2 P07724         Alb           6.36e 9    1.20e10    5.84e 9   6.28e 9   3.17e 9
3 A8DUK4;E9Q223~ Hbb-bs        2.20e 9    2.70e 9    2.85e 9   5.09e 9   2.13e 9
4 Q91VB8;P01942~ Hba-a1        1.51e 9    2.84e 9    2.00e 9   2.86e 9   2.12e 9
5 A0A075B5P6;A0~ Ighm          1.82e 9    1.28e 9    1.77e 9   4.15e 8   2.44e 9
6 Q921I1;F7BAE9~ Tf            5.06e 8    1.59e 9    5.44e 8   4.20e 8   2.53e 8
# ... with 19 more variables: LNP34_n_3 <dbl>, LNP35_i2_1 <dbl>,
#   LNP35_i2_2 <dbl>, LNP35_i2_3 <dbl>, LNP35_n_1 <dbl>, LNP35_n_2 <dbl>,
#   LNP35_n_3 <dbl>, LNP36_i2_1 <dbl>, LNP36_i2_2 <dbl>, LNP36_i2_3 <dbl>,
#   LNP36_n_1 <dbl>, LNP36_n_2 <dbl>, LNP36_n_3 <dbl>, LNP37_i2_1 <dbl>,
#   LNP37_i2_2 <dbl>, LNP37_i2_3 <dbl>, LNP37_n_1 <dbl>, LNP37_n_2 <dbl>,
#   LNP37_n_3 <dbl>

> data$Gene.names %>% duplicated() %>% any()
[1] TRUE

> data %>% group_by(Gene.names) %>% summarize(frequency = n()) %>% arrange(desc(frequency)) %>% filter(frequency > 1)
# A tibble: 6 x 2
  Gene.names frequency
  <chr>          <int>
1 _                 19
2 H2-K1              2
3 Itih4              2
4 Kng1               2
5 Sptb               2
6 Tpm3               2

> data_unique <- make_unique(data, "Gene.names", "Protein.IDs", delim = ";")

> data_unique$name %>% duplicated() %>% any()
[1] FALSE

Next, I would like to build a SummarizedExperiment using my own experimental design, but make_se fails with an error about duplicate row names and a warning about non-unique values:

> experimental_design
        label condition LNPcondition replicate
1  LNP34_i2_1     LNP34     LNP34_i2         1
2  LNP34_i2_2     LNP34     LNP34_i2         2
3  LNP34_i2_3     LNP34     LNP34_i2         3
4   LNP34_n_1     LNP34      LNP34_n         1
5   LNP34_n_2     LNP34      LNP34_n         2
6   LNP34_n_3     LNP34      LNP34_n         3
7  LNP35_i2_1     LNP35     LNP35_i2         1
8  LNP35_i2_2     LNP35     LNP35_i2         2
9  LNP35_i2_3     LNP35     LNP35_i2         3
10  LNP35_n_1     LNP35      LNP35_n         1
11  LNP35_n_2     LNP35      LNP35_n         2
12  LNP35_n_3     LNP35      LNP35_n         3
13 LNP36_i2_1     LNP36     LNP36_i2         1
14 LNP36_i2_2     LNP36     LNP36_i2         2
15 LNP36_i2_3     LNP36     LNP36_i2         3
16  LNP36_n_1     LNP36      LNP36_n         1
17  LNP36_n_2     LNP36      LNP36_n         2
18  LNP36_n_3     LNP36      LNP36_n         3
19 LNP37_i2_1     LNP37     LNP37_i2         1
20 LNP37_i2_2     LNP37     LNP37_i2         2
21 LNP37_i2_3     LNP37     LNP37_i2         3
22  LNP37_n_1     LNP37      LNP37_n         1
23  LNP37_n_2     LNP37      LNP37_n         2
24  LNP37_n_3     LNP37      LNP37_n         3

> LFQ_columns <- grep("^LNP", colnames(data_unique))
> data_se <- make_se(data_unique, LFQ_columns, experimental_design)
Error in `.rowNamesDF<-`(x, value = value) : 
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘LNP34_1’, ‘LNP34_2’, ‘LNP34_3’, ‘LNP35_1’, ‘LNP35_2’, ‘LNP35_3’, ‘LNP36_1’, ‘LNP36_2’, ‘LNP36_3’, ‘LNP37_1’, ‘LNP37_2’, ‘LNP37_3’

I'm not sure where the error comes from. My guess is the "condition" column of experimental_design, but when I check the row and column names of my objects, I find no duplicates:

> any(duplicated(rownames(experimental_design)))
[1] FALSE
> any(duplicated(rownames(data_unique)))
[1] FALSE
> any(duplicated(rownames(LFQ_columns)))
[1] FALSE
> any(duplicated(colnames(experimental_design)))
[1] FALSE
> any(duplicated(colnames(data_unique)))
[1] FALSE
> any(duplicated(colnames(LFQ_columns)))
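
One possible explanation, sketched under an assumption: the names in the warning ('LNP34_1', 'LNP34_2', ...) look like the condition and replicate columns joined with an underscore. If make_se builds its sample IDs that way (an assumption based only on the warning text, not checked against the DEP source), then the i2 and n samples of each LNP collapse onto the same ID. The base R snippet below rebuilds the design shown above and tests that idea; `ids`, `ids_fine`, and `design_fixed` are illustrative names, not part of DEP.

```r
# Rebuild the experimental design shown above (24 samples)
lnp   <- rep(paste0("LNP", 34:37), each = 6)
treat <- rep(rep(c("i2", "n"), each = 3), times = 4)
repl  <- rep(1:3, times = 8)

experimental_design <- data.frame(
  label        = paste0(lnp, "_", treat, "_", repl),
  condition    = lnp,
  LNPcondition = paste0(lnp, "_", treat),
  replicate    = repl,
  stringsAsFactors = FALSE
)

# ASSUMPTION: sample IDs are formed as <condition>_<replicate>
ids <- paste(experimental_design$condition,
             experimental_design$replicate, sep = "_")
any(duplicated(ids))          # are any IDs repeated?
unique(ids[duplicated(ids)])  # which IDs are repeated

# The same check using LNPcondition in place of condition
ids_fine <- paste(experimental_design$LNPcondition,
                  experimental_design$replicate, sep = "_")
any(duplicated(ids_fine))

# Candidate design to try: use LNPcondition as the condition column
design_fixed <- experimental_design[, c("label", "LNPcondition", "replicate")]
colnames(design_fixed)[2] <- "condition"

# Untested with DEP here; requires data_unique and LFQ_columns from above:
# data_se <- make_se(data_unique, LFQ_columns, design_fixed)
```

If the assumption holds, every condition and replicate pair has to be unique across samples, which is not the case when "condition" only holds the LNP number. Note also that LFQ_columns is a plain integer vector returned by grep, so rownames(LFQ_columns) and colnames(LFQ_columns) are NULL and those two checks cannot find duplicates.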

Can someone please help me with this? I have limited knowledge of R, so a detailed explanation would be highly appreciated.

My sessionInfo() output is below. Thanks!

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252 
[2] LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_1.0.5  DEP_1.10.0   readxl_1.3.1 readr_1.4.0 

loaded via a namespace (and not attached):
  [1] ProtGenerics_1.20.0         bitops_1.0-6               
  [3] matrixStats_0.58.0          doParallel_1.0.16          
  [5] RColorBrewer_1.1-2          GenomeInfoDb_1.24.2        
  [7] MSnbase_2.14.2              tools_4.0.2                
  [9] DT_0.17                     utf8_1.2.1                 
 [11] R6_2.5.0                    affyio_1.58.0              
 [13] tmvtnorm_1.4-10             BiocGenerics_0.34.0        
 [15] colorspace_2.0-0            GetoptLong_1.0.5           
 [17] tidyselect_1.1.0            compiler_4.0.2             
 [19] preprocessCore_1.50.0       cli_2.4.0                  
 [21] Biobase_2.48.0              DelayedArray_0.14.1        
 [23] sandwich_3.0-0              scales_1.1.1               
 [25] mvtnorm_1.1-1               affy_1.66.0                
 [27] digest_0.6.27               XVector_0.28.0             
 [29] htmltools_0.5.1.1           pkgconfig_2.0.3            
 [31] fastmap_1.1.0               limma_3.44.3               
 [33] htmlwidgets_1.5.3           rlang_0.4.10               
 [35] GlobalOptions_0.1.2         rstudioapi_0.13            
 [37] impute_1.62.0               shiny_1.6.0                
 [39] shape_1.4.5                 generics_0.1.0             
 [41] zoo_1.8-9                   mzID_1.26.0                
 [43] BiocParallel_1.22.0         RCurl_1.98-1.3             
 [45] magrittr_2.0.1              GenomeInfoDbData_1.2.3     
 [47] MALDIquant_1.19.3           Matrix_1.2-18              
 [49] Rcpp_1.0.6                  munsell_0.5.0              
 [51] S4Vectors_0.26.1            fansi_0.4.2                
 [53] imputeLCMD_2.0              lifecycle_1.0.0            
 [55] vsn_3.56.0                  MASS_7.3-51.6              
 [57] SummarizedExperiment_1.18.2 zlibbioc_1.34.0            
 [59] plyr_1.8.6                  grid_4.0.2                 
 [61] promises_1.2.0.1            parallel_4.0.2             
 [63] shinydashboard_0.7.1        crayon_1.4.1               
 [65] lattice_0.20-41             circlize_0.4.12            
 [67] hms_1.0.0                   mzR_2.22.0                 
 [69] ComplexHeatmap_2.4.3        pillar_1.5.1               
 [71] GenomicRanges_1.40.0        rjson_0.2.20               
 [73] codetools_0.2-16            stats4_4.0.2               
 [75] XML_3.99-0.6                glue_1.4.2                 
 [77] pcaMethods_1.80.0           BiocManager_1.30.12        
 [79] httpuv_1.5.5                png_0.1-7                  
 [81] vctrs_0.3.7                 foreach_1.5.1              
 [83] cellranger_1.1.0            tidyr_1.1.3                
 [85] gtable_0.3.0                purrr_0.3.4                
 [87] norm_1.0-9.5                clue_0.3-58                
 [89] assertthat_0.2.1            ggplot2_3.3.3              
 [91] xfun_0.22                   mime_0.10                  
 [93] xtable_1.8-4                later_1.1.0.1              
 [95] ncdf4_1.17                  tibble_3.1.0               
 [97] iterators_1.0.13            gmm_1.6-6                  
 [99] tinytex_0.31                IRanges_2.22.2             
[101] cluster_2.1.0               ellipsis_0.3.1