Greetings,
This is my first time using bioinformatics data and manipulating S4 objects in R, so apologies if this is very elementary, but I seem to be having trouble following the tutorial cBioPortal provides:
https://cbioportal.github.io/2020-cbioportal-r-workshop/Example2.pdf
(Please note that my R session is in Spanish, so the quotes shown as ’ ’ in the tutorial appear as " " for me.)
When I reach
upsetSamples(LUAD_MAE)
on page 6, I get the following error:
Error in `.rowNamesDF<-`(x, value = value) :
duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘TCGA-50-5066’, ‘TCGA-50-5946’
Now after doing
LUAD_MAE@colData@rownames == "TCGA-50-5066"
LUAD_MAE@colData@rownames == "TCGA-50-5946"
I confirm that each check returns TRUE twice, in two consecutive rows. So there seem to be duplicated subjects here, perhaps added to the data after the tutorial was uploaded. I figured this was no big deal: I will be careful with overlapping data in the future, but for now I just want to get a feel for the functions offered. My plan was to rename the duplicates to something made up, so that the rest of the object and its columns stay intact, just assigned to a name different from the original.
However, trying
"TCGA-50-5166" <- LUAD_MAE@colData@rownames == "TCGA-50-5066"
and
"TCGA-50-5266" <- LUAD_MAE@colData@rownames == "TCGA-50-5946"
to overwrite them with made-up row names doesn't seem to work. The same error returns when I run upsetSamples, and the same check with LUAD_MAE@colData@rownames == "TCGA-50-5166" returns all FALSE, indicating that I didn't even overwrite anything. I guess this is where I found out that editing S4 objects isn't as trivial as editing a normal data frame.
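For reference, here is a minimal base R sketch of what an assignment with a quoted string on the left-hand side does. It uses a toy character vector (`ids`, an assumption standing in for the row names), not the tutorial object, and the new ID is made up, so it only illustrates the assignment semantics and is not a confirmed fix for LUAD_MAE.

```r
# Toy character vector standing in for the row names (not the tutorial object)
ids <- c("TCGA-50-5066", "TCGA-50-5066", "TCGA-50-5946")

# A quoted string on the left of `<-` is treated as a variable name, so this
# creates a new variable called `TCGA-50-5166` that holds a logical vector.
"TCGA-50-5166" <- ids == "TCGA-50-5066"
new_var_is_logical <- is.logical(`TCGA-50-5166`)

# The original vector is untouched by that assignment.
unchanged <- identical(ids, c("TCGA-50-5066", "TCGA-50-5066", "TCGA-50-5946"))

# Replacing a value means indexing the vector itself on the left-hand side.
# Only the repeated occurrence is renamed; renaming every match to the same
# new ID would leave the duplicate in place.
renamed <- ids
renamed[duplicated(renamed)] <- "TCGA-50-5166"  # made-up ID for illustration
no_duplicates_left <- anyDuplicated(renamed) == 0L
```

In this sketch the comparison `ids == "TCGA-50-5066"` only produces TRUE/FALSE values, which would explain why the later check for the new name finds nothing.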
I tried to search before asking for help here. slotOp seems like it would do what I want, if I can just get the corrected row names as a character vector to put back into LUAD_MAE. Unfortunately, https://stat.ethz.ch/R-manual/R-devel/library/base/html/slotOp.html has a broken link at 'see base for more details', the Wayback Machine doesn't have it archived, and the simplest attempt, slotOP( LUAD_MAE@coldata@rownames<-LUAD_MAErownamescorrected), doesn't work. I also couldn't find easily googleable examples of its use.
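Since examples of slot assignment are hard to find, here is a minimal sketch on a toy S4 class. The class `ToyTable` and its single `rownames` slot are hypothetical and only mimic the shape of the problem. Whether editing slots directly is advisable for a MultiAssayExperiment, compared with using the package's own accessor functions, is not something this sketch settles.

```r
library(methods)

# Hypothetical toy class; it only mimics an object with a `rownames` slot.
setClass("ToyTable", slots = c(rownames = "character"))
toy <- new("ToyTable", rownames = c("a", "a", "b"))

# `@` is an operator, not a function called slotOp(), so a slot is replaced
# with ordinary assignment syntax.
toy@rownames <- make.unique(toy@rownames)

# slot() is the functional equivalent and takes the slot name as a string.
slot(toy, "rownames")[3] <- "c"

# Slot names are case-sensitive: `Rownames` is not `rownames`.
wrong_case <- tryCatch(toy@Rownames, error = function(e) "no such slot")
```

The case-sensitivity point may matter here, because the attempt above writes `coldata` while the earlier checks use `colData`.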
Please advise why the source data appears to differ now from what it was at the time of the tutorial, and clarify whether this is just a mistake on my part in handling and editing S4 objects that has an obvious solution I'm missing.
The traceback() output is
7: stop("duplicate 'row.names' are not allowed")
6: `.rowNamesDF<-`(x, value = value)
5: `row.names<-.data.frame`(`*tmp*`, value = value)
4: `row.names<-`(`*tmp*`, value = value)
3: `rownames<-`(`*tmp*`, value = rownames(colData(mae)))
2: `rownames<-`(`*tmp*`, value = rownames(colData(mae)))
1: upsetSamples(LUAD_MAE)
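The traceback ends in `row.names<-.data.frame`, so the failure appears to come from assigning duplicated row names to a plain data frame. A minimal sketch that reproduces this in base R follows. The data frame `df` and its `value` column are toy stand-ins, and `make.unique()` is shown only as one generic way to deduplicate names, not as a verified fix for the tutorial data.

```r
# Toy data frame and IDs (assumptions for illustration, not the tutorial data)
df <- data.frame(value = 1:4)
ids <- c("TCGA-50-5066", "TCGA-50-5066", "TCGA-50-5946", "TCGA-50-5946")

# Assigning duplicated row names to a data.frame raises an error.
result <- tryCatch(
  suppressWarnings({
    rownames(df) <- ids
    "assigned"
  }),
  error = function(e) "failed"
)

# List which IDs occur more than once.
dup_ids <- unique(ids[duplicated(ids)])

# make.unique() appends a numeric suffix to repeated values, after which the
# assignment is accepted.
rownames(df) <- make.unique(ids)
```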
As for my session info
```
sessionInfo()
R version 4.0.5 (2021-03-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)
Matrix products: default
locale:
[1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252
[3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C
[5] LC_TIME=Spanish_Spain.1252
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] UpSetR_1.4.0 ggplot2_3.3.3 stringr_1.4.0
[4] httr_1.4.2 cBioPortalData_2.2.8 MultiAssayExperiment_1.16.0
[7] SummarizedExperiment_1.20.0 Biobase_2.50.0 GenomicRanges_1.42.0
[10] GenomeInfoDb_1.26.7 IRanges_2.24.1 S4Vectors_0.28.1
[13] BiocGenerics_0.36.0 MatrixGenerics_1.2.1 matrixStats_0.58.0
[16] AnVIL_1.2.0 dplyr_1.0.5
loaded via a namespace (and not attached):
[1] bitops_1.0-6 bit64_4.0.5 progress_1.2.2
[4] rprojroot_2.0.2 GenomicDataCommons_1.14.0 tools_4.0.5
[7] utf8_1.2.1 R6_2.5.0 colorspace_2.0-0
[10] DBI_1.1.1 withr_2.4.1 processx_3.5.1
[13] gridExtra_2.3 tidyselect_1.1.0 prettyunits_1.1.1
[16] TCGAutils_1.10.0 bit_4.0.4 curl_4.3
[19] compiler_4.0.5 cli_2.4.0 rvest_1.0.0
[22] formatR_1.9 xml2_1.3.2 DelayedArray_0.16.3
[25] rtracklayer_1.49.5 scales_1.1.1 readr_1.4.0
[28] callr_3.6.0 askpass_1.1 rappdirs_0.3.3
[31] rapiclient_0.1.3 RCircos_1.2.1 digest_0.6.27
[34] Rsamtools_2.6.0 XVector_0.30.0 pkgconfig_2.0.3
[37] dbplyr_2.1.1 fastmap_1.1.0 limma_3.46.0
[40] rlang_0.4.10 rstudioapi_0.13 RSQLite_2.2.5
[43] generics_0.1.0 jsonlite_1.7.2 BiocParallel_1.24.1
[46] RCurl_1.98-1.3 magrittr_2.0.1 GenomeInfoDbData_1.2.4
[49] futile.logger_1.4.3 Matrix_1.3-2 munsell_0.5.0
[52] Rcpp_1.0.6 fansi_0.4.2 lifecycle_1.0.0
[55] stringi_1.5.3 yaml_2.2.1 RaggedExperiment_1.14.1
[58] RJSONIO_1.3-1.4 zlibbioc_1.36.0 pkgbuild_1.2.0
[61] plyr_1.8.6 BiocFileCache_1.14.0 grid_4.0.5
[64] blob_1.2.1 crayon_1.4.1 lattice_0.20-41
[67] Biostrings_2.58.0 splines_4.0.5 GenomicFeatures_1.42.3
[70] hms_1.0.0 ps_1.6.0 pillar_1.6.0
[73] biomaRt_2.46.3 futile.options_1.0.1 XML_3.99-0.6
[76] glue_1.4.2 remotes_2.3.0 lambda.r_1.2.4
[79] data.table_1.14.0 BiocManager_1.30.12 vctrs_0.3.7
[82] gtable_0.3.0 tidyr_1.1.3 openssl_1.4.3
[85] purrr_0.3.4 assertthat_0.2.1 cachem_1.0.4
[88] xfun_0.22 survival_3.2-10 tibble_3.1.0
[91] RTCGAToolbox_2.20.0 GenomicAlignments_1.26.0 tinytex_0.31
[94] AnnotationDbi_1.52.0 memoise_2.0.0 ellipsis_0.3.1
```
Lastly, some other issues have turned up that I suspect are unrelated, but I mention them just in case. I seem to lack 'removeCache', which a previous tutorial said was good to apply to imported data. The regular install.packages("Remove Cache") for CRAN doesn't work, nor does BiocManager::install("removeCache"). Googling led me to a package called cacheflow that seemed like a likely source, but after remotes::install_github("alekrutkowski/cacheflow") I am still told that there is no such library. Along the way I also discovered that my haven is 2.3.1 instead of the 2.4.0 that BiocManager seems to expect, and my rtracklayer is 1.49.5 instead of 1.50.0, but installing them (and RSQLite along the way) doesn't seem to work. I don't mind if this part is ignored if it's indeed unrelated.
Thanks in advance
Sorry about that - I should have checked more in depth.