I saved 'pcaData' as a data frame for future use. I ran the following to get the vector 'percentVar', but it is empty. Do I need the data in a different format to extract the percent variance?
> pcaData <- plotPCA(rld, intgroup="Groups", returnData=TRUE)
> percentVar <- round(100 * attr(pcaData, "percentVar"))
> percentVar
numeric(0)
sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Arch Linux
Matrix products: default
BLAS: /usr/lib/libblas.so.3.11.0
LAPACK: /usr/lib/liblapack.so.3.11.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: America/Chicago
tzcode source: system (glibc)
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] lubridate_1.9.2 forcats_1.0.0
[3] stringr_1.5.0 dplyr_1.1.2
[5] purrr_1.0.2 readr_2.1.4
[7] tidyr_1.3.0 tibble_3.2.1
[9] ggplot2_3.4.3 tidyverse_2.0.0
[11] DESeq2_1.40.2 SummarizedExperiment_1.30.2
[13] Biobase_2.60.0 MatrixGenerics_1.12.3
[15] matrixStats_1.0.0 GenomicRanges_1.52.0
[17] GenomeInfoDb_1.36.1 IRanges_2.34.1
[19] S4Vectors_0.38.1 BiocGenerics_0.46.0
loaded via a namespace (and not attached):
[1] gtable_0.3.3 lattice_0.21-8 tzdb_0.4.0
[4] vctrs_0.6.3 tools_4.3.1 bitops_1.0-7
[7] generics_0.1.3 parallel_4.3.1 fansi_1.0.4
[10] pkgconfig_2.0.3 Matrix_1.5-4.1 lifecycle_1.0.3
[13] GenomeInfoDbData_1.2.10 farver_2.1.1 compiler_4.3.1
[16] munsell_0.5.0 codetools_0.2-19 RCurl_1.98-1.12
[19] pillar_1.9.0 crayon_1.5.2 BiocParallel_1.34.2
[22] DelayedArray_0.26.7 abind_1.4-5 tidyselect_1.2.0
[25] locfit_1.5-9.8 stringi_1.7.12 labeling_0.4.2
[28] grid_4.3.1 colorspace_2.1-0 cli_3.6.1
[31] magrittr_2.0.3 S4Arrays_1.0.5 utf8_1.2.3
[34] withr_2.5.0 scales_1.2.1 bit64_4.0.5
[37] timechange_0.2.0 XVector_0.40.0 bit_4.0.5
[40] hms_1.1.3 rlang_1.1.1 Rcpp_1.0.11
[43] glue_1.6.2 vroom_1.6.3 R6_2.5.1
[46] zlibbioc_1.46.0
This should have been a comment. I have moved it.
Please accept my apologies if I am a little naive. From your explanation, I assume that 'returnData=TRUE' should give me a data frame with 'PC1', 'PC2' and 'percentVar' as numeric columns and 'condition' and 'sample names' as character columns. However, 'returnData=TRUE' provided everything except 'percentVar'. While I can go back and redo the analysis with DESeq2, I wonder whether some code could extract 'percentVar' from the 'PC1' and 'PC2' data that I already have in the data frame. After all, I am able to plot the PC1 and PC2 data using ggplot2, but without the percent variation.
That assumption is wrong. The data frame returned by returnData = TRUE does not contain percentVar as a column. It is there in the object, but it is hidden, and you get it with the attr function.
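To illustrate what "hidden" means here, a minimal base R sketch. The data frame below is a hand-built stand-in for what plotPCA returns with returnData = TRUE, and every value in it (including the percentVar proportions) is made up for illustration rather than taken from DESeq2:

```r
# Hand-built stand-in for the data frame from plotPCA(..., returnData = TRUE).
# All values are hypothetical; nothing here is computed by DESeq2.
pca_demo <- data.frame(
  PC1 = c(-4.2, -3.9, 4.0, 4.1),
  PC2 = c(1.1, -1.0, 0.8, -0.9),
  Groups = c("A", "A", "B", "B"),
  name = paste0("sample", 1:4)
)

# Attach the variance proportions as an attribute, not as a column.
attr(pca_demo, "percentVar") <- c(0.62, 0.21)

# The attribute does not show up among the columns ...
colnames(pca_demo)

# ... but attr() retrieves it, exactly as in the question's code.
pct <- round(100 * attr(pca_demo, "percentVar"))
pct
```

Printing the data frame or calling str() on its columns will not reveal the attribute; attributes(pca_demo) lists everything attached to the object.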
It is good to know that percentVar is there but hidden, and that attr() should extract it. In my case, attr() is not extracting percentVar.
No need for any apologies. When you use my example code, do you get the percentVar via the attr function? I am just wondering whether this 'glitch' you encounter is reproducible on your machine, because on mine it is not.
The example code you provided is returning 'percentVar'. So, what do you think is wrong with my data frame that I saved while I was analyzing my own dataset? I had written it out with 'write.csv'. I will appreciate your help.
If the made-on-the-spot data works, and yours does not, then something is wrong with your input data.
Yes, I agree that there is something wrong with my input data but I need a little help to find out what is causing this problem. I ran the following code to make-on-the-spot data. Then I wrote a '.csv' file with write_csv(). When I imported the input data back to R, the 'attr' function returned "NULL". I don't know what causes this behavior and I would greatly appreciate any help on a better way to save the data frame and import it back to R without losing the attribute "percentVar".
If you write the data out and then read it back in, the attributes of the original object are lost.
For example
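A minimal sketch of that loss, using a small made-up data frame and temporary files (the object names and values are illustrative assumptions, not the data from this thread). It also shows why the question's code printed numeric(0), and that saveRDS/readRDS keep the attribute if a file on disk is really wanted:

```r
# Small made-up data frame with a custom attribute, as a stand-in for pcaData.
df <- data.frame(PC1 = c(-1.5, 1.5), PC2 = c(0.5, -0.5))
attr(df, "percentVar") <- c(0.62, 0.21)

# Round trip through CSV: only the table of values is written,
# so the custom attribute is gone after reading it back.
csv_file <- tempfile(fileext = ".csv")
write.csv(df, csv_file, row.names = FALSE)
from_csv <- read.csv(csv_file)
attr(from_csv, "percentVar")               # NULL
round(100 * attr(from_csv, "percentVar"))  # numeric(0)

# Round trip through RDS: the whole R object is serialized,
# attributes included.
rds_file <- tempfile(fileext = ".rds")
saveRDS(df, rds_file)
from_rds <- readRDS(rds_file)
attr(from_rds, "percentVar")
```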
Which brings us to the 'original' ethos of R, which is that the code is real and the output is not. In other words, there is little benefit to saving things as CSV files or even .RDS or .RData files unless you have long-running code that takes forever, in which case you should cache that as part of your .Rmd file (you are using .Rmd files, no?), like this.
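A rough sketch of what caching a long-running step could look like inside an .Rmd chunk, not necessarily the code the answer had in mind. The file name, the slow_step function, and its contents are hypothetical placeholders:

```r
# Hypothetical cache location and stand-in computation;
# replace slow_step() with the real long-running code.
cache_file <- file.path(tempdir(), "slow_result.rds")

slow_step <- function() {
  Sys.sleep(0.1)  # pretend this takes a long time
  data.frame(x = 1:3, y = (1:3)^2)
}

if (file.exists(cache_file)) {
  # Reuse the stored result instead of recomputing it.
  result <- readRDS(cache_file)
} else {
  # First run: compute once and store the full R object.
  result <- slow_step()
  saveRDS(result, cache_file)
}
```

Note that this pattern never refreshes the stored result on its own; the cache file has to be deleted to force a recomputation. knitr's cache=TRUE chunk option is another way to get a similar effect.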
And then you can just render your .Rmd file as needed, and you will get all the stuff you need without having to drop all this excess cruft in your working directory.
If you do caching, then I encourage approaches that actually invalidate the cache when something changes upstream, like xfun::cache_rds, or at least put in an option to rerun cached chunks if needed. With the above code you could be using legacy objects even if upstream changes completely. +1 for Rmd
This is great stuff and thank you for sharing your knowledge.
This clears everything. Thank you for your time and help.