Question

Iterating over a column within colData@listData

BioInfoBeginnerrr • 0

Hi all, in this SummarizedExperiment I can't figure out how to iterate over entries at the 'deepest' level (I'm not sure of the correct terminology, but I mean the level that requires the most brackets to reach). Here treatments is a list of data frames (121 elements long), and the 7th column within each data frame ("treatment_type") is what I want to print/save. My crude attempt at a for loop,

for(i in dds@colData@listData[["treatments"]][[1:121]][["treatment_type"]]){
  print(i)
}

gets a "recursive indexing failed at level 3" error. So now I'm thinking that if I have to save each element of the "treatments" list separately first (where each element is its own data frame), it quickly becomes manually demanding, which is the opposite of what scripting is supposed to be about. I haven't found anything close when searching the forum.
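
A note for later readers on why this fails: in base R, `[[` given an index vector longer than one does not select several elements. It indexes recursively, so `x[[c(1, 2)]]` means `x[[1]][[2]]`. The sketch below uses a toy list of four data frames (column names and values are invented and only stand in for the real "treatments" column) to show that behaviour and the error it leads to.

```r
# Toy stand-in for the "treatments" list: four small data frames.
# Column names and values are invented for illustration only.
make_df <- function(type) {
  data.frame(treatment_id = c("a", "b"),
             treatment_type = c(type, "Radiation Therapy, NOS"))
}
treatments <- lapply(c("Chemo", "Chemo", "None", "Chemo"), make_df)

# `[[` with a length-2 index descends two levels:
# data frame 1, then column 2 of that data frame.
nested <- treatments[[c(1, 2)]]
same_as_chained <- identical(nested, treatments[[1]][[2]])

# A longer index keeps descending. By level 3 it has reached a plain
# character vector, which cannot be indexed recursively, so R stops.
msg <- tryCatch(treatments[[1:4]],
                error = function(e) conditionMessage(e))

print(same_as_chained)
print(msg)
```

So `[[1:121]]` is read as a path 121 levels deep rather than "elements 1 to 121", which is why the loop never starts.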

> sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C               LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8     LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8   
 [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] org.Hs.eg.db_3.15.0            AnnotationDbi_1.58.0           EnhancedVolcano_1.14.0         ggrepel_0.9.1                  ggplot2_3.3.6                 
 [6] BiocParallel_1.30.3            DESeq2_1.36.0                  pipette_0.8.0                  tidySummarizedExperiment_1.6.1 tidybulk_1.8.0                
[11] dplyr_1.0.9                    SummarizedExperiment_1.26.1    Biobase_2.56.0                 GenomicRanges_1.48.0           GenomeInfoDb_1.32.2           
[16] IRanges_2.30.0                 S4Vectors_0.34.0               BiocGenerics_0.42.0            MatrixGenerics_1.8.0           matrixStats_0.62.0            

loaded via a namespace (and not attached):
  [1] colorspace_2.0-3       ellipsis_0.3.2         XVector_0.36.0         rstudioapi_0.13        farver_2.1.0           bit64_4.0.5            mvtnorm_1.1-3         
  [8] RSpectra_0.16-1        fansi_1.0.3            apeglm_1.18.0          codetools_0.2-18       splines_4.2.1          cachem_1.0.6           goalie_0.6.0          
 [15] geneplotter_1.74.0     jsonlite_1.8.0         umap_0.2.8.0           annotate_1.74.0        ashr_2.2-54            png_0.1-7              readr_2.1.2           
 [22] compiler_4.2.1         httr_1.4.3             assertthat_0.2.1       Matrix_1.4-1           fastmap_1.1.0          lazyeval_0.2.2         cli_3.3.0             
 [29] htmltools_0.5.2        tools_4.2.1            coda_0.19-4            gtable_0.3.0           glue_1.6.2             GenomeInfoDbData_1.2.8 Rcpp_1.0.8.3          
 [36] bbmle_1.0.25           vctrs_0.4.1            Biostrings_2.64.0      AcidGenerics_0.6.0     preprocessCore_1.58.0  stringr_1.4.0          ps_1.7.1              
 [43] syntactic_0.5.2        lifecycle_1.0.1        irlba_2.3.5            XML_3.99-0.10          AcidCLI_0.2.0          zlibbioc_1.42.0        MASS_7.3-57           
 [50] scales_1.2.0           hms_1.1.1              parallel_4.2.1         RColorBrewer_1.1-3     memoise_2.0.1          reticulate_1.25        emdbook_1.3.12        
 [57] bdsmatrix_1.3-6        stringi_1.7.6          RSQLite_2.2.14         SQUAREM_2021.1         genefilter_1.78.0      BiocIO_1.6.0           truncnorm_1.0-8       
 [64] rlang_1.0.2            pkgconfig_2.0.3        bitops_1.0-7           lattice_0.20-45        invgamma_1.1           purrr_0.3.4            htmlwidgets_1.5.4     
 [71] labeling_0.4.2         bit_4.0.4              processx_3.6.1         tidyselect_1.1.2       plyr_1.8.7             magrittr_2.0.3         R6_2.5.1              
 [78] generics_0.1.2         DelayedArray_0.22.0    DBI_1.1.3              pillar_1.7.0           withr_2.5.0            survival_3.3-1         KEGGREST_1.36.2       
 [85] RCurl_1.98-1.7         mixsqp_0.3-43          tibble_3.1.7           crayon_1.5.1           utf8_1.2.2             plotly_4.10.0          tzdb_0.3.0            
 [92] locfit_1.5-9.5         grid_4.2.1             data.table_1.14.2      blob_1.2.3             digest_0.6.29          xtable_1.8-4           AcidBase_0.5.0        
 [99] numDeriv_2016.8-1.1    tidyr_1.2.0            openssl_2.0.2          munsell_0.5.0          viridisLite_0.4.0      askpass_1.1

DESeq2 iterating SummarizedExperiment • 1.4k views

ADD COMMENT • link 2.0 years ago BioInfoBeginnerrr • 0

score 0 · Answer 1 · 2022-07-10

ATpoint ★ 4.2k

In general it is bad practice to manually access slots in an object via @. There are dedicated getter and setter functions for this. Here that is colData(dds) to get the colData DataFrame. Can you give a reproducible example of what you want to do? I cannot say I get the question. In all likelihood, a loop is not what you need.

ADD COMMENT • link 2.0 years ago ATpoint ★ 4.2k

Thank you for taking the time out of your day to try and help, ATpoint.

The reason I have been attempting to use @ to access this particular element of colData is that colData(dds$treatments) returns the error "unable to find an inherited method for function 'colData' for signature '"list"'".

An attempt at being more reproducible: the data structure is a RangedSummarizedExperiment with 'treatments' as a column of colData. 'treatments' is a list of data frames, and I want to access and save the element at [2,6] within every data frame stored in 'treatments'. This is a TCGA dataset, fwiw.

ADD REPLY • link 2.0 years ago BioInfoBeginnerrr • 0
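
One way to pull the element at row 2, column 6 out of every data frame is to apply a small function over the list rather than indexing the list itself. This is only a sketch on invented data: `pick_cell` and `toy_df` are hypothetical helpers, the toy data frames have made-up columns `c1` to `c7`, and the guard for short or missing data frames is an assumption about what the real TCGA column might contain.

```r
# Hypothetical helper: return df[row, col] as character, or NA when the
# entry is not a data frame or is too small to contain that cell.
pick_cell <- function(df, row, col) {
  if (!is.data.frame(df) || nrow(df) < row || ncol(df) < col) {
    return(NA_character_)
  }
  as.character(df[[col]][row])
}

# Invented stand-in for colData(dds)$treatments: 7 columns per data frame.
toy_df <- function(n_rows, label) {
  as.data.frame(matrix(paste0(label, "_", seq_len(n_rows * 7)),
                       nrow = n_rows, ncol = 7,
                       dimnames = list(NULL, paste0("c", 1:7))))
}
# Two data frames with 2 rows, and one with a single row (no row 2).
treatments <- list(toy_df(2, "s1"), toy_df(2, "s2"), toy_df(1, "s3"))

# One value per sample, in the same order as the list.
cells <- vapply(treatments, pick_cell, character(1), row = 2, col = 6)
print(cells)
```

`vapply()` is used instead of a `for` loop so the result comes back as a single character vector with exactly one entry per data frame.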

colData(dds) is what you want.

When you use @ you are bypassing the Bioconductor API so your code is more likely to break in the future.

ADD REPLY • link 2.0 years ago Michael Love 42k
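
To make the accessor advice concrete, here is a small sketch on a made-up SummarizedExperiment (not the TCGA object). It assumes the SummarizedExperiment package is installed; the sample names, counts and contents of the list column are invented. It also suggests why `colData(dds$treatments)` errors: `dds$treatments` has already extracted the column as a plain list, and `colData()` has no method for a list.

```r
library(SummarizedExperiment)

# Made-up 2-gene x 3-sample count matrix.
counts <- matrix(1:6, nrow = 2,
                 dimnames = list(c("g1", "g2"), c("s1", "s2", "s3")))

# colData with a list column; I() keeps the list as a single column.
cd <- DataFrame(
  treatments = I(list(data.frame(treatment_type = "A"),
                      data.frame(treatment_type = "B"),
                      data.frame(treatment_type = NA_character_))),
  row.names = colnames(counts)
)
se <- SummarizedExperiment(assays = list(counts = counts), colData = cd)

# Accessor first, then the column: no @ needed.
via_accessor <- colData(se)$treatments
# `$` on the object itself is a shortcut for the same column.
via_shortcut <- se$treatments

# Passing the already-extracted list to colData() is the call that fails.
wrong <- tryCatch(colData(se$treatments), error = function(e) "error")
print(wrong)
```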

score 0 · Answer 2 · 2022-07-14

BioInfoBeginnerrr • 0

Thanks for your help. I have cleaned up some of my code by removing @, which was very handy to learn (side note: I think I got into the habit of using @ because the console displays @-based access paths after using View() on your data, oops).

Adding this in case it is useful to any beginners in the future. If you have a column of colData that is a list of dataframes and you want to access some element within each dataframe, a starting point for you might be as follows:

my_list <- colData(dds)$list_of_dataframes

for (i in seq_along(my_list)) {
  # print the column of interest from the i-th data frame
  print(my_list[[i]]["name_of_column_of_interest_within_dataframe"])
}

ADD COMMENT • link 2.0 years ago BioInfoBeginnerrr • 0
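
Building on the loop above, here is a sketch of collecting the values instead of printing them. It uses invented data and a hypothetical column name `treatment_type`; `collapse_column` is a made-up helper, and collapsing multiple rows into one string per sample with `paste()` is just one possible choice, not something this thread prescribes.

```r
# Invented stand-in for my_list (colData(dds)$list_of_dataframes).
my_list <- list(
  data.frame(treatment_type = c("Pharmaceutical Therapy, NOS",
                                "Radiation Therapy, NOS")),
  data.frame(treatment_type = c(NA, NA)),
  data.frame(treatment_type = "Radiation Therapy, NOS")
)

# One string per data frame: drop NAs, join what is left with ";",
# and return NA when nothing remains.
collapse_column <- function(df, column) {
  values <- df[[column]]
  values <- as.character(values[!is.na(values)])
  if (length(values) == 0) NA_character_ else paste(values, collapse = ";")
}

per_sample <- vapply(my_list, collapse_column, character(1),
                     column = "treatment_type")
print(per_sample)
```

Because `per_sample` has one entry per sample, it could then be stored as an ordinary column, for example with `colData(dds)$treatment_type <- per_sample`.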

What's the use case for having a column in your colData that itself is a list of data.frames? That seems like a needlessly complex nesting of data, and may not be correctly manipulated if you subset the SummarizedExperiment object.

ADD REPLY • link 2.0 years ago James W. MacDonald 66k

I can't speak to the motivations for that decision, as it is just how the LGG SummarizedExperiment is arranged when downloaded from The Cancer Genome Atlas using TCGAbiolinks. The majority of the data frames are also filled with NAs. If interested, you can see the dataset for yourself by running the following code:

library(TCGAbiolinks)
query_TCGA = GDCquery(
   project = "TCGA-LGG",
   data.category = "Transcriptome Profiling",
   experimental.strategy = "RNA-Seq",
   workflow.type = "HTSeq - Counts")

GDCdownload(query = query_TCGA)
data <- GDCprepare(query = query_TCGA)

str(head(colData(data)$treatments))
head(colData(data)$treatments)

ADD REPLY • link 2.0 years ago BioInfoBeginnerrr • 0