Hello Bioconductor comunity I'm working on an RNA-seq project where we have the following experimental setup:
View(annotation_matrix) Sample_name condition individual batch guard_cell_1 guard_cell 1 smart-seq guard_cell_2 guard_cell 2 smart-seq guard_cell_3 guard_cell 3 smart-seq meso_1 meso 1 smart-seq meso_2 meso 2 smart-seq meso_3 meso 3 smart-seq leaves_1 leaves 1 smart-seq leaves_2 leaves 2 smart-seq totalptt_1 totalptt 1 smart-seq totalptt_2 totalptt 2 smart-seq totalptt_3 totalptt 3 smart-seq P1F1Ext F1Ext 4 True-seq P2F1Ext F1Ext 5 True-seq P3F1Ext F1Ext 6 True-seq P1F1Int F1Int 10 True-seq P2F1Int F1Int 11 True-seq P3F1Int F1Int 12 True-seq P1F4Ext F4Ext 7 True-seq P2F4Ext F4Ext 8 True-seq P3F4Ext F4Ext 9 True-seq P1F4Int F4Int 13 True-seq P2F4Int F4Int 14 True-seq P3F4Int F4Int 15 True-seq
My group is interested in the DEG (particularly the upregulated genes) in: a) guard_cells (compared with meso) b) F1Ext (compared with F1Int) c) F4Ext (compared with F4Int)
For this goal I made 3 DESeq objects, each using 6 samples (guard_cell + meso, F1Ext + F1Int, F4Ext + F4Int) and performed each comparison individually. In addition, I also filtered the gene expression matrix of F1 and F4 to only include genes expressed in guard_cells. Below is the code I used to analyze the first comparison (the second and third were created in a similar way, with the exception of the inclusion of "individual" in the experimental design)
featureCounts_matrix <- read.table(file = "_featureCounts/quant_featureCounts.tsv", row.names = 1, header =T) levels <- c("meso","guard_cell") meso_guard_expression_matrix <- featureCounts_matrix[,c(6:11)] meso_guard_annotation <- annotation_matrix[c(1:6),] dd <- DESeqDataSetFromMatrix(countData = meso_guard_expression_matrix, colData = meso_guard_annotation, design = ~ individual + condition) colData(dd)$condition <- factor(colData(dd)$condition, levels = levels) #filter 0 counts rows dd <- dd[rowSums(counts(dd)) > 0, ] #perform analysis dds <- DESeq(dd) #Get DGE res <- results(dds, alpha = 0.05) #shrink the log2fold change using apeglm package (default) resLFC <- lfcShrink(dds, res = res, coef="condition_guard_vs_meso", type="apeglm") #filter the results padj.cutoff <- 0.05 lfc.cutoff <- 1 sig <- subset(data.frame(resLFC), padj < padj.cutoff & abs(log2FoldChange) > lfc.cutoff) sig <- sig[order(sig$padj),]
Regarding my approach I have two questions:
1) Should I load all the data of my experiment in a single DESeq2 object and get my DEG using
res(dds, contrast = (factor, numerator, denominator)) for each of the comparisons I want to make? And in that case, should I include batch in the design since I do not have samples from the same tissue in different batches (in addition I also performed a HC which suggested the batch effects do not play a large role).
2) Is it correct to perform log-fold change shrinkage and then filter the results based on a log-fold change thresholdd? The tutorials only mentioned plotting and ranking (the latter is also relevant to me).
Thanks in advance
sessionInfo( ) R version 4.0.2 (2020-06-22) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19041) Matrix products: default locale:  LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252  LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C  LC_TIME=English_United Kingdom.1252 attached base packages:  parallel stats4 stats graphics grDevices utils datasets methods base other attached packages:  apeglm_1.10.0 RColorBrewer_1.1-2 EnhancedVolcano_1.6.0  ggrepel_0.9.1 ggplot2_3.3.3 pheatmap_1.0.12  DESeq2_1.28.1 SummarizedExperiment_1.18.2 DelayedArray_0.14.1  matrixStats_0.56.0 Biobase_2.48.0 GenomicRanges_1.40.0  GenomeInfoDb_1.24.2 IRanges_2.22.2 S4Vectors_0.26.1  BiocGenerics_0.34.0 loaded via a namespace (and not attached):  bdsmatrix_1.3-4 Rcpp_1.0.5 locfit_1.5-9.4 mvtnorm_1.1-1  lattice_0.20-41 digest_0.6.25 utf8_1.1.4 plyr_1.8.6  R6_2.5.0 emdbook_1.3.12 coda_0.19-4 RSQLite_2.2.0  pillar_1.5.1 zlibbioc_1.34.0 rlang_0.4.10 rstudioapi_0.13  annotate_1.66.0 blob_1.2.1 Matrix_1.2-18 bbmle_188.8.131.52  labeling_0.4.2 splines_4.0.2 BiocParallel_1.22.0 geneplotter_1.66.0  RCurl_1.98-1.2 bit_4.0.4 munsell_0.5.0 numDeriv_2016.8-1.1  compiler_4.0.2 pkgconfig_2.0.3 tidyselect_1.1.0 tibble_3.0.3  GenomeInfoDbData_1.2.3 XML_3.99-0.5 fansi_0.4.2 crayon_1.4.1  dplyr_1.0.2 withr_2.4.1 MASS_7.3-51.6 bitops_1.0-6  grid_4.0.2 xtable_1.8-4 gtable_0.3.0 lifecycle_1.0.0  DBI_1.1.1 magrittr_2.0.1 scales_1.1.1 cachem_1.0.4  farver_2.1.0 XVector_0.28.0 genefilter_1.70.0 ellipsis_0.3.1  vctrs_0.3.4 generics_0.1.0 tools_4.0.2 bit64_4.0.5  glue_1.4.2 purrr_0.3.4 fastmap_1.1.0 survival_3.2-3  AnnotationDbi_1.50.3 colorspace_1.4-1 memoise_2.0.0