Question

DESeqDataSetFromMatrix Function Error


Kevin • 0

@99e50c5d

Last seen 2.1 years ago

United States

Hello,

I am trying to use the Bioconductor DESeq2 package, but I keep running into an error. The DESeqDataSetFromMatrix function fails with "Error in .rowNamesDF<-(x, value = value) : duplicate 'row.names' are not allowed." I would appreciate any advice on how to fix my code.


# Loading libraries
library( "DESeq2" )
library(ggplot2)

# Read in the files I downloaded locally
library("readxl")
metadata_original <- read_excel("C:/Users/kevin/Downloads/MAYO_TCX_METADATA.xlsx")
TCX_original <- read_excel("C:/Users/kevin/Downloads/MAYO_TCX_Pipeline.xlsx")

# Here is how the metadata looks
individualID    individualIdSource  species sex race    ethnicity   yearsEducation  ageDeath    causeDeath  mannerDeath apoeGenotype    pmi pH  brainWeight diagnosis   diagnosisCriteria   CERAD   Braak   thal
11492   MayoBrainBank   Human   male    White   NA  NA  73  NA  NA  33  1   NA  NA  progressive supranuclear palsy  NA  NA  3   0
6810    MayoBrainBank   Human   male    White   NA  NA  74  NA  NA  33  1   NA  NA  progressive supranuclear palsy  NA  NA  2   2
1046    MayoBrainBank   Human   female  White   NA  NA  72  NA  NA  33  2   NA  NA  Alzheimer Disease   NA  NA  6   5
1924    MayoBrainBank   Human   female  White   NA  NA  90+ NA  NA  33  2   NA  NA  control NA  NA  2   NA

# Here is how some of the TCX data looks
ensembl_gene_id 11492   6810    1046    1924    1926    6913    892
ENSG00000227232 95  128 150 52  102 151 143
ENSG00000279457 242 407 204 367 409 510 196
ENSG00000228463 207 100 184 1   49  40  61

# column_to_rownames() from tibble moves one column into the row names.
# Some tutorials I read convert the data this way and some do not,
# so I am not sure whether it is necessary. For example, it takes the
# entire gene ID column from the TCX file and makes it the row names.
library(tibble)
metadata <- data.frame(column_to_rownames(metadata_original, var = "individualID"))
TCX <- data.frame(column_to_rownames(TCX_original, var = "ensembl_gene_id"))

# For both TCX and the metadata, this just turns one column into the row names

dds <- DESeqDataSetFromMatrix(countData=TCX, 
                              colData=metadata, 
                              design=~sex, tidy = TRUE)
# The error I get is:
Error in `.rowNamesDF<-`(x, value = value) : 
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘0’, ‘1’, ‘10’, ‘100’, ‘1000’, ‘10007’, ‘1001’, ‘1002’, ‘1003’, ‘1004’, ‘1005’, ‘10055’, ‘1007’, ‘10078’, ‘1008’, ‘1009’, ‘101’, ‘1010’, ‘1011’, ‘1012’, ‘1013’, ‘10135’, ‘1014’, ‘1016’, ‘10165’, ‘1017’, ‘10175’, ‘1018’, ‘1019’, ‘102’, ‘1020’, ‘1021’, ‘1022’, ‘1023’, ‘1024’, ‘1025’, ‘10262’, ‘10269’, ‘1027’, ‘1028’, ‘10289’, ‘1029’, ‘103’, ‘1030’, ‘1032’, ‘1034’, ‘1035’, ‘1036’, ‘1037’, ‘1038’, ‘1039’, ‘104’, ‘1040’, ‘1041’, ‘1042’, ‘1043’, ‘1044’, ‘1045’, ‘1046’, ‘10461’, ‘10466’, ‘1048’, ‘1049’, ‘105’, ‘1050’, ‘1051’, ‘1053’, ‘1054’, ‘1055’, ‘1056’, ‘1057’, ‘10585’, ‘1059’, ‘106’, ‘1061’, ‘1062’, ‘10636’, ‘1064’, ‘1065’, ‘1066’, ‘10660’, ‘1067’, ‘1068’, ‘10681’, ‘1069’, ‘107’, ‘1070’, ‘1071’, ‘1072’, ‘1073’, ‘1074’, ‘1075’, ‘1077’, ‘1078’, ‘10782’, ‘1079’, ‘108’, ‘1080’, ‘1081’, ‘10826’, ‘1083’, ‘1084’, ‘1085’, ‘10865’, ‘1087’, ‘1088’, ‘1089’, ‘109’, ‘1090’, ‘1091’, ‘1092’, ‘10926’, ‘1093’, ‘1094’, ‘1097’, ‘1099’, ‘11’, ‘110’, ‘1100’, ‘11004 [... truncated] 

> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    
system code page: 936

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] tibble_3.1.6                readxl_1.4.0                ggplot2_3.3.5              
 [4] DESeq2_1.34.0               SummarizedExperiment_1.24.0 Biobase_2.54.0             
 [7] MatrixGenerics_1.6.0        matrixStats_0.61.0          GenomicRanges_1.46.1       
[10] GenomeInfoDb_1.30.0         IRanges_2.28.0              S4Vectors_0.32.3           
[13] BiocGenerics_0.40.0        

loaded via a namespace (and not attached):
 [1] locfit_1.5-9.5         Rcpp_1.0.8.3           lattice_0.20-45        png_0.1-7             
 [5] Biostrings_2.62.0      assertthat_0.2.1       utf8_1.2.2             cellranger_1.1.0      
 [9] R6_2.5.1               RSQLite_2.2.11         httr_1.4.2             pillar_1.7.0          
[13] zlibbioc_1.40.0        rlang_1.0.2            rstudioapi_0.13        annotate_1.72.0       
[17] blob_1.2.2             Matrix_1.4-1           splines_4.1.2          BiocParallel_1.28.3   
[21] geneplotter_1.72.0     RCurl_1.98-1.6         bit_4.0.4              munsell_0.5.0         
[25] DelayedArray_0.20.0    compiler_4.1.2         pkgconfig_2.0.3        tidyselect_1.1.2      
[29] KEGGREST_1.34.0        GenomeInfoDbData_1.2.7 XML_3.99-0.9           fansi_1.0.3           
[33] withr_2.5.0            crayon_1.5.1           dplyr_1.0.8            bitops_1.0-7          
[37] grid_4.1.2             xtable_1.8-4           gtable_0.3.0           lifecycle_1.0.1       
[41] DBI_1.1.2              magrittr_2.0.2         scales_1.1.1           cli_3.2.0             
[45] cachem_1.0.6           XVector_0.34.0         genefilter_1.76.0      ellipsis_0.3.2        
[49] vctrs_0.3.8            generics_0.1.2         RColorBrewer_1.1-2     tools_4.1.2           
[53] bit64_4.0.5            glue_1.6.2             purrr_0.3.4            parallel_4.1.2        
[57] fastmap_1.1.0          survival_3.3-1         AnnotationDbi_1.56.2   colorspace_2.0-3      
[61] memoise_2.0.1

DESeq2 Bioconductor • 1.1k views

ADD COMMENT • link updated 2.1 years ago by Michael Love 41k • written 2.1 years ago by Kevin • 0


Is length(unique(rownames(TCX))) == length(rownames(TCX)) TRUE?

ADD REPLY • link 2.1 years ago Basti ▴ 780


Thank you so much for the response. Yes, I just checked: length(unique(rownames(TCX))) and length(rownames(TCX)) both return 17011, so they are in fact equal. I'm not sure why I still keep getting the error "non-unique values when setting 'row.names'".

ADD REPLY • link 2.1 years ago Kevin • 0

score 0 · Answer 1 · 2022-03-29


Michael Love 41k

@mikelove

Last seen 1 day ago

United States

Here is your clue:

non-unique values when setting 'row.names': ‘0’, ‘1’, ‘10’, ‘100’, ‘1000’, ‘10007’,...

Check rownames of the count matrix and colData, and deal with duplicated(rownames(...)) first before providing to DESeq2.
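
One possible reading of this clue, sketched in base R: values such as '0', '1', '10' look like counts rather than Ensembl IDs, which suggests the row names are being taken from a column of counts. The DESeq2 documentation describes tidy as indicating that the first column of countData holds the row names, so with tidy = TRUE and the gene IDs already moved into the row names, the first sample column would be used instead. The object names counts and first_col and the count values below are made up for illustration; they are not the poster's data.

```r
# Toy stand-in for the count table (values invented so that the
# first sample column contains a repeated count).
counts <- data.frame(
  `11492` = c(95, 242, 95),
  `6810`  = c(128, 407, 100),
  row.names = c("ENSG00000227232", "ENSG00000279457", "ENSG00000228463"),
  check.names = FALSE
)

# The row names themselves are unique: anyDuplicated() returns 0.
anyDuplicated(rownames(counts))

# The first column is what would become the row names if it were
# treated as an ID column. Counts repeat, so it contains duplicates.
first_col <- as.character(counts[[1]])
first_col[duplicated(first_col)]
```

If the second check returns values while the first returns 0, the duplicates come from the first data column and not from the gene IDs.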

ADD COMMENT • link 2.1 years ago Michael Love 41k


Thank you so much for the response. I double-checked duplicated(rownames(TCX)) and it returned a vector of all FALSE, so I should not have any duplicated row names. However, I still keep getting the error "non-unique values when setting 'row.names': ‘0’, ‘1’, ‘10’, ‘100’, ‘1000’, ‘10007’,...". Is my data formatted incorrectly?

ADD REPLY • link 2.1 years ago Kevin • 0


Make TCX into a matrix with rownames first, and set tidy=FALSE; this may help you debug.
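
A minimal sketch of this suggestion, assuming the gene IDs are already in the row names of the count matrix and the sample IDs are in the row names of the column data. The counts and sex values are the first few shown in the question; the names counts and coldata are placeholders, and the real tables would be converted with as.matrix() rather than typed in by hand.

```r
library(DESeq2)

# Count matrix: genes in rows, samples in columns, IDs in the dimnames.
counts <- matrix(
  c(95L, 242L, 207L,    # sample 11492
    128L, 407L, 100L,   # sample 6810
    150L, 204L, 184L,   # sample 1046
    52L, 367L, 1L),     # sample 1924
  nrow = 3,
  dimnames = list(
    c("ENSG00000227232", "ENSG00000279457", "ENSG00000228463"),
    c("11492", "6810", "1046", "1924")
  )
)

# Sample table: one row per sample, sample IDs as row names.
coldata <- data.frame(
  sex = factor(c("male", "male", "female", "female")),
  row.names = c("11492", "6810", "1046", "1924")
)

# Columns of the counts must line up with rows of the sample table.
stopifnot(identical(colnames(counts), rownames(coldata)))

# tidy = FALSE: the row names are used as they are, and no data
# column is consumed as an ID column.
dds <- DESeqDataSetFromMatrix(
  countData = counts,
  colData   = coldata,
  design    = ~ sex,
  tidy      = FALSE
)
dim(dds)
```

If the real column names were altered on import (for example, data.frame() with its default check.names = TRUE prefixes names such as 11492 with X), the identical() check will flag the mismatch before DESeq2 is called.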

ADD REPLY • link 2.1 years ago Michael Love 41k