Question

DESeq2 - how many surrogate variables from svaseq, use of lfcshrink

0

Entering edit mode

ja569116 • 0

@08ccc285

Last seen 6 hours ago

United States

Hi,

I have a microRNA dataset, and I have samples from 2 suborders, 5 families, and 7 species. I would like to find differentially expressed microRNAs to then conduct functional enrichment with like gene ontology, or KEGG pathway. I first checked the data for patterns using PCA and clustering, and I first used the normalized counts norm_cetaceans <- counts(res_27_cets, normalized = TRUE)

But later I found that it is suggested to use either VST or rlog transformation so I used VST for the exploratory analysis.

Are exploratory analysis with VST equivalent with the normalized counts from DESeq()?

Also, I checked the VST dataset, and some values are negative, is this okay, expected?

head(counts_VST_19, c(6L, 6L))

135208-DM-Ttru 135209-ET-Ttru 137808-DM-Ttru 137809-ET-Ttru 145459-Ddel 145462-Ddel
hsa-miR-324-3p -3.658031 4.448456 -3.658031 -3.658031 2.9010645 -3.658031
hsa-miR-212-3p -3.658031 -3.658031 -3.658031 -3.658031 -3.6580309 -3.658031
hsa-miR-660-5p 7.108520 6.469499 7.325406 7.607373 8.0828895 7.469345
hsa-miR-670-3p -3.658031 -3.658031 -3.658031 -3.658031 -3.6580309 -3.658031
hsa-miR-876-3p -3.658031 -3.658031 -3.658031 -3.658031 -3.6580309 -3.658031
hsa-miR-382-3p -3.658031 -3.658031 -3.658031 -3.658031 -0.7916052 -3.658031

Also, in the exploratory analysis section, it uses the code: vsd <- vst(dds, blind = FALSE)

Shouldn't one use blind=T?

In the website it says : "In the above function calls, we specified blind = FALSE, which means that differences between cell lines and treatment (the variables in the design) will not contribute to the expected variance-mean trend of the experiment", while on the R ?help states: blind logical, whether to blind the transformation to the experimental design. blind=TRUE should be used for comparing samples in a manner unbiased by prior information on samples, for example to perform sample QA (quality assurance).

In the RNA-seq workflow it mentions using svaseq to identify Removing hidden batch effects. After the exploratory analysis, some samples showed ill-behavior, not clustering with its species, far away in the dendrogram/PCA (3D). After removing those, I did PCA and clustering, and the dataset looked better. I am undecided whether it is beneficial or recommended to check for hidden batch effects. For instance, after removing the weird samples, many microRNAs were removed due to all-zeros. Here are the plots for SVA - using 3 surrogate variables suggested by . Plot of surrogate 1-3 ~ ind/sps Does the data show any pattern to suggest using 3 surrogate variables?

I also wanted to mention that I have 4 species, but I also want to analyse and find DE-microRNAs at the family (3 groups) and suborder level (2). When I add the design ~ Suborder + Family, DESEq mentions that the groups are linear combinations of others. Is this error normal, so that I have to use separate designs?

Finally, as I said, I want to get the DE-mircroRNAs, and do an enrichment analysis but I am confused about how to proceed. I read that one should use the padjust values and rank the microRNAs (genes) by lfcShrink, but my confusing is that lfcShrink also adjusts padjust values. It seems weird that I used the padjust values from results but then use the log2FoldChange from lfcShrink - which is a different "matrix" with different padjust values.

Sorry for the questions, I want to make sure that I am proceeding the right way.

Thanks

sessionInfo( )

R version 4.3.2 (2023-10-31 ucrt) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 11 x64 (build 26100)

Matrix products: default

locale: 1 LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8 LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8

time zone: America/New_York tzcode source: internal

attached base packages: 1 stats4 stats graphics grDevices utils datasets methods base

other attached packages: 1 sva_3.50.0 BiocParallel_1.36.0 mgcv_1.9-3
[4] nlme_3.1-168 SWATH2stats_1.32.1 IHW_1.30.0
[7] genefilter_1.84.0 ggdendro_0.2.0 gridExtra_2.3
[10] tsne_0.1-3.1 glmpca_0.2.0 RColorBrewer_1.1-3
[13] PoiClaClu_1.0.2.1 umap_0.2.10.0 plotly_4.11.0
[16] lubridate_1.9.4 forcats_1.0.1 stringr_1.6.0
[19] dplyr_1.1.4 purrr_1.0.4 readr_2.1.6
[22] tidyr_1.3.1 tibble_3.2.1 tidyverse_2.0.0
[25] factoextra_1.0.7 FactoMineR_2.12 pheatmap_1.0.13
[28] gplots_3.2.0 scales_1.4.0 ggpubr_0.6.2
[31] ggplot2_4.0.0 DESeq2_1.42.1 SummarizedExperiment_1.32.0 [34] Biobase_2.62.0 MatrixGenerics_1.14.0 matrixStats_1.5.0
[37] GenomicRanges_1.54.1 GenomeInfoDb_1.38.8 IRanges_2.36.0
[40] S4Vectors_0.40.2 BiocGenerics_0.48.1

loaded via 1 splines_4.3.2 later_1.4.4 filelock_1.0.3 bitops_1.0-9
XML_3.99-0.20 lifecycle_1.0.4 rstatix_0.7.3
processx_3.8.6 lattice_0.22-7 MASS_7.3-60
crosstalk_1.2.2 backports_1.5.0 magrittr_2.0.4
yaml_2.3.10 remotes_2.5.0 httpuv_1.6.16
askpass_1.2.1 sessioninfo_1.2.3 pkgbuild_1.4.8
DBI_1.2.3 cowplot_1.2.0 multcomp_1.4-29
pkgload_1.4.1 zlibbioc_1.48.2 RCurl_1.98-1.17
rappdirs_0.3.3 sandwich_3.1-1 GenomeInfoDbData_1.2.11 RSpectra_0.16-2 annotate_1.80.0 codetools_0.2-20
xml2_1.5.1 DT_0.34.0 tidyselect_1.2.1
BiocFileCache_2.10.2 jsonlite_2.0.0 ellipsis_0.3.2
survival_3.8-3 emmeans_2.0.1 bbmle_1.0.25.1
tools_4.3.2 Rcpp_1.1.0 glue_1.8.0
usethis_3.2.1 withr_3.0.2 numDeriv_2016.8-1.1
fastmap_1.2.0 openssl_2.3.4 callr_3.7.6
digest_0.6.37 timechange_0.3.0 R6_2.6.1
estimability_1.5.1 colorspace_2.1-2 gtools_3.9.5
RSQLite_2.4.5 dichromat_2.0-0.1 utf8_1.2.6
ggsci_4.2.0 data.table_1.17.8 prettyunits_1.2.0
htmlwidgets_1.6.4 S4Arrays_1.2.1 scatterplot3d_0.3-44
gtable_0.3.6 blob_1.2.4 S7_0.2.0
htmltools_0.5.8.1 carData_3.0-5 profvis_0.4.0
leaps_3.2 png_0.1-8 rstudioapi_0.17.1
tzdb_0.5.0 coda_0.19-4.1 curl_7.0.0
zoo_1.8-14 cachem_1.1.0 KernSmooth_2.23-26
miniUI_0.1.2 AnnotationDbi_1.64.1 desc_1.4.3
pillar_1.11.1 grid_4.3.2 vctrs_0.6.5
urlchecker_1.0.1 promises_1.5.0 car_3.1-3
xtable_1.8-4 cluster_2.1.8.1 mvtnorm_1.3-3
locfit_1.5-9.12 compiler_4.3.2 rlang_1.1.5
ggsignif_0.6.4 fdrtool_1.2.18 labeling_0.4.3
ps_1.9.1 plyr_1.8.9 fs_1.6.6
viridisLite_0.4.2 Biostrings_2.70.3 lazyeval_0.2.2
Matrix_1.6-4 hms_1.1.4 bit64_4.6.0-1
KEGGREST_1.42.0 shiny_1.12.1 broom_1.0.11

```

sva lfcshrink DESeq2 • 22 views

ADD COMMENT • link updated 1 hour ago by aqeel • 0 • written 7 hours ago by ja569116 • 0

score 0 · Answer 1 · 2025-12-20

0

Entering edit mode

aqeel • 0

@12cc4cde

Last seen 1 hour ago

Pakistan

Discover the pinnacle of Brazilian Jiu Jitsu gear at Dragon GI.

ADD COMMENT • link 1 hour ago aqeel • 0