Iterating over a column within colData@listData
@11703f46
Last seen 16 months ago
Australia

Hi all, in this SummarizedExperiment I can't figure out how to iterate over entries at the 'deepest' level (I'm not sure of the correct terminology, but I mean the level that needs the most brackets to reach). Here treatments is a list of dataframes (121 elements long), and the 7th column within each dataframe ("treatment_type") is what I want to print or save from each dataframe. My crude attempt at a for loop,

for(i in dds@colData@listData[["treatments"]][[1:121]][["treatment_type"]]){
  print(i)
}

gets a "recursive indexing failed at level 3" error. So now I'm thinking that if I have to save each element of the "treatments" list separately first (each element is its own dataframe), it quickly becomes manually demanding, which is the opposite of what scripting is supposed to be about. I haven't found anything close to this when searching through the forum.

> sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C               LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8     LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8   
 [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] org.Hs.eg.db_3.15.0            AnnotationDbi_1.58.0           EnhancedVolcano_1.14.0         ggrepel_0.9.1                  ggplot2_3.3.6                 
 [6] BiocParallel_1.30.3            DESeq2_1.36.0                  pipette_0.8.0                  tidySummarizedExperiment_1.6.1 tidybulk_1.8.0                
[11] dplyr_1.0.9                    SummarizedExperiment_1.26.1    Biobase_2.56.0                 GenomicRanges_1.48.0           GenomeInfoDb_1.32.2           
[16] IRanges_2.30.0                 S4Vectors_0.34.0               BiocGenerics_0.42.0            MatrixGenerics_1.8.0           matrixStats_0.62.0            

loaded via a namespace (and not attached):
  [1] colorspace_2.0-3       ellipsis_0.3.2         XVector_0.36.0         rstudioapi_0.13        farver_2.1.0           bit64_4.0.5            mvtnorm_1.1-3         
  [8] RSpectra_0.16-1        fansi_1.0.3            apeglm_1.18.0          codetools_0.2-18       splines_4.2.1          cachem_1.0.6           goalie_0.6.0          
 [15] geneplotter_1.74.0     jsonlite_1.8.0         umap_0.2.8.0           annotate_1.74.0        ashr_2.2-54            png_0.1-7              readr_2.1.2           
 [22] compiler_4.2.1         httr_1.4.3             assertthat_0.2.1       Matrix_1.4-1           fastmap_1.1.0          lazyeval_0.2.2         cli_3.3.0             
 [29] htmltools_0.5.2        tools_4.2.1            coda_0.19-4            gtable_0.3.0           glue_1.6.2             GenomeInfoDbData_1.2.8 Rcpp_1.0.8.3          
 [36] bbmle_1.0.25           vctrs_0.4.1            Biostrings_2.64.0      AcidGenerics_0.6.0     preprocessCore_1.58.0  stringr_1.4.0          ps_1.7.1              
 [43] syntactic_0.5.2        lifecycle_1.0.1        irlba_2.3.5            XML_3.99-0.10          AcidCLI_0.2.0          zlibbioc_1.42.0        MASS_7.3-57           
 [50] scales_1.2.0           hms_1.1.1              parallel_4.2.1         RColorBrewer_1.1-3     memoise_2.0.1          reticulate_1.25        emdbook_1.3.12        
 [57] bdsmatrix_1.3-6        stringi_1.7.6          RSQLite_2.2.14         SQUAREM_2021.1         genefilter_1.78.0      BiocIO_1.6.0           truncnorm_1.0-8       
 [64] rlang_1.0.2            pkgconfig_2.0.3        bitops_1.0-7           lattice_0.20-45        invgamma_1.1           purrr_0.3.4            htmlwidgets_1.5.4     
 [71] labeling_0.4.2         bit_4.0.4              processx_3.6.1         tidyselect_1.1.2       plyr_1.8.7             magrittr_2.0.3         R6_2.5.1              
 [78] generics_0.1.2         DelayedArray_0.22.0    DBI_1.1.3              pillar_1.7.0           withr_2.5.0            survival_3.3-1         KEGGREST_1.36.2       
 [85] RCurl_1.98-1.7         mixsqp_0.3-43          tibble_3.1.7           crayon_1.5.1           utf8_1.2.2             plotly_4.10.0          tzdb_0.3.0            
 [92] locfit_1.5-9.5         grid_4.2.1             data.table_1.14.2      blob_1.2.3             digest_0.6.29          xtable_1.8-4           AcidBase_0.5.0        
 [99] numDeriv_2016.8-1.1    tidyr_1.2.0            openssl_2.0.2          munsell_0.5.0          viridisLite_0.4.0      askpass_1.1
DESeq2 iterating SummarizedExperiment • 1.6k views
ATpoint ★ 4.6k
@atpoint-13662
Last seen 1 day ago
Germany

In general it is bad practice to manually access slots in an object via @. There are dedicated getter and setter functions for this. Here that is colData(dds) to get the colData DataFrame. Can you give a reproducible example of what you want to do? I cannot say I get the question. In all likelihood, a loop is not what you need.


Thank you for taking the time out of your day to try and help ATpoint.

The reason I have been attempting to use @ to access this particular element of colData is that colData(dds$treatments) returns the error: unable to find an inherited method for function 'colData' for signature '"list"'.

An attempt at being more reproducible: the data structure is a RangedSummarizedExperiment with 'treatments' as a column of colData. 'treatments' is a list of dataframes, and I want to access and save the element at [2,6] within every dataframe stored in 'treatments'. This is a TCGA dataset, fwiw.


colData(dds) is what you want.

When you use @ you are bypassing the Bioconductor API so your code is more likely to break in the future.
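For later readers, a minimal sketch of why the original loop failed and one way around it. A vector inside [[ ]] is applied recursively, so x[[1:121]] means x[[1]][[2]][[3]]... rather than "elements 1 to 121", which is consistent with the "recursive indexing failed" error. The toy list below is an assumption standing in for colData(dds)$treatments (only the column name treatment_type comes from the thread), so that the example runs without the TCGA data.

```r
# Toy stand-in (made-up values) for: treatments <- colData(dds)$treatments
treatments <- list(
  data.frame(treatment_id = c("a1", "a2"),
             treatment_type = c("Radiation", "Pharmaceutical"),
             stringsAsFactors = FALSE),
  data.frame(treatment_id = c("b1", "b2"),
             treatment_type = c("Pharmaceutical", "Radiation"),
             stringsAsFactors = FALSE),
  data.frame(treatment_id = "c1",
             treatment_type = "Radiation",
             stringsAsFactors = FALSE)
)

# A vector inside [[ ]] indexes recursively, one level per entry:
nested <- list(list(10, list(20, 30, 40)))
deep <- nested[[1:3]]  # same as nested[[1]][[2]][[3]]

# To visit every data frame, apply a function to each list element instead.
treatment_types <- lapply(treatments, function(df) df[["treatment_type"]])

# Flatten to a single character vector if one long vector is more convenient.
all_types <- unlist(treatment_types)
print(all_types)
```

lapply() keeps one result per sample, which preserves the link back to the rows of colData; unlist() loses that link, so it is probably only useful for a quick overview.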

@11703f46
Last seen 16 months ago
Australia

Thanks for your help, I have cleaned up some of my code by removing @, that was very handy to learn (side note, I think I got into the habit of using @ as the console displays @ based accessing after using View() on your data, oops).

Adding this in case it is useful to any beginners in the future. If you have a column of colData that is a list of dataframes and you want to access some element within each dataframe, a starting point for you might be as follows:

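# Pull the list column out of colData with the accessor
my_list <- colData(dds)$list_of_dataframes

# seq_along() is safer than 1:length() because it handles an empty list
for (i in seq_along(my_list)) {
  # Single brackets with a column name return a one-column dataframe
  print(my_list[[i]]["name_of_column_of_interest_within_dataframe"])
}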
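If the goal is to save a single cell from every dataframe (for example the [2,6] element mentioned earlier in the thread) rather than print it, a sketch along these lines might help. The helper get_cell and the toy data are hypothetical and not part of any package; the toy dataframes have only two columns, so the example reads row 2, column 2, and the indices would need adjusting for real data.

```r
# Hypothetical helper: return the cell at (row, col) as a string, or NA
# when the data frame is too small to contain that cell.
get_cell <- function(df, row, col) {
  if (nrow(df) < row || ncol(df) < col) {
    return(NA_character_)
  }
  as.character(df[[col]][row])
}

# Toy stand-in (made-up values) for: my_list <- colData(dds)$list_of_dataframes
my_list <- list(
  data.frame(id = c("s1", "s1"), type = c("Radiation", "Pharmaceutical"),
             stringsAsFactors = FALSE),
  data.frame(id = c("s2", "s2"), type = c("Pharmaceutical", NA),
             stringsAsFactors = FALSE),
  data.frame(id = "s3", type = "Radiation", stringsAsFactors = FALSE)
)

# vapply() returns exactly one string per data frame, so the result lines up
# with the samples and could be stored as an ordinary colData column.
cells <- vapply(my_list, get_cell, character(1), row = 2, col = 2)
print(cells)
```

Returning NA for dataframes that lack the requested cell keeps the result the same length as the list, which matters here because many of the dataframes are reported to be mostly NA.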

What's the use case for having a column in your colData that itself is a list of data.frames? That seems like a needlessly complex nesting of data, and may not be correctly manipulated if you subset the SummarizedExperiment object.


I can't speak to the motivations for that decision, as it is just how the LGG SummarizedExperiment is arranged when downloaded from The Cancer Genome Atlas using TCGAbiolinks. The majority of the dataframes are also filled with NAs. If interested, you can see the dataset for yourself by running the following code:

library(TCGAbiolinks)
query_TCGA = GDCquery(
   project = "TCGA-LGG",
   data.category = "Transcriptome Profiling",
   experimental.strategy = "RNA-Seq",
   workflow.type = "HTSeq - Counts")

GDCdownload(query = query_TCGA)
data <- GDCprepare(query = query_TCGA)

str(head(colData(data)$treatments))
head(colData(data)$treatments)