Questions regarding DESeq2 experimental setup and lfcShrink
1
0
Entering edit mode
zroger499 • 0
@zroger499-23414
Last seen 9 months ago
Portugal

Hello Bioconductor comunity I'm working on an RNA-seq project where we have the following experimental setup:

View(annotation_matrix)

Sample_name condition   individual  batch
guard_cell_1    guard_cell  1   smart-seq
guard_cell_2    guard_cell  2   smart-seq
guard_cell_3    guard_cell  3   smart-seq
meso_1  meso    1   smart-seq
meso_2  meso    2   smart-seq
meso_3  meso    3   smart-seq
leaves_1    leaves  1   smart-seq
leaves_2    leaves  2   smart-seq
totalptt_1  totalptt    1   smart-seq
totalptt_2  totalptt    2   smart-seq
totalptt_3  totalptt    3   smart-seq
P1F1Ext F1Ext   4   True-seq
P2F1Ext F1Ext   5   True-seq
P3F1Ext F1Ext   6   True-seq
P1F1Int F1Int   10  True-seq
P2F1Int F1Int   11  True-seq
P3F1Int F1Int   12  True-seq
P1F4Ext F4Ext   7   True-seq
P2F4Ext F4Ext   8   True-seq
P3F4Ext F4Ext   9   True-seq
P1F4Int F4Int   13  True-seq
P2F4Int F4Int   14  True-seq
P3F4Int F4Int   15  True-seq

My group is interested in the DEG (particularly the upregulated genes) in: a) guard_cells (compared with meso) b) F1Ext (compared with F1Int) c) F4Ext (compared with F4Int)

For this goal I made 3 DESeq objects, each using 6 samples (guard_cell + meso, F1Ext + F1Int, F4Ext + F4Int) and performed each comparison individually. In addition, I also filtered the gene expression matrix of F1 and F4 to only include genes expressed in guard_cells. Below is the code I used to analyze the first comparison (the second and third were created in a similar way, with the exception of the inclusion of "individual" in the experimental design)

featureCounts_matrix <- read.table(file = "_featureCounts/quant_featureCounts.tsv", row.names = 1, header =T)
levels <- c("meso","guard_cell")
meso_guard_expression_matrix <- featureCounts_matrix[,c(6:11)]
meso_guard_annotation <- annotation_matrix[c(1:6),]
dd <- DESeqDataSetFromMatrix(countData = meso_guard_expression_matrix, 
                             colData = meso_guard_annotation, 
                             design = ~ individual + condition)

colData(dd)$condition <- factor(colData(dd)$condition, levels = levels)

#filter 0 counts rows
dd <- dd[rowSums(counts(dd)) > 0, ]

#perform analysis
dds <- DESeq(dd)

#Get DGE
res <- results(dds, alpha = 0.05)
#shrink the log2fold change using apeglm package (default)
resLFC <- lfcShrink(dds, res = res, coef="condition_guard_vs_meso", type="apeglm")

#filter the results 
padj.cutoff <- 0.05
lfc.cutoff <- 1 

sig <- subset(data.frame(resLFC), padj < padj.cutoff & abs(log2FoldChange) > lfc.cutoff)
sig <- sig[order(sig$padj),]

Regarding my approach I have two questions:

1) Should I load all the data of my experiment in a single DESeq2 object and get my DEG using res(dds, contrast = (factor, numerator, denominator)) for each of the comparisons I want to make? And in that case, should I include batch in the design since I do not have samples from the same tissue in different batches (in addition I also performed a HC which suggested the batch effects do not play a large role).

2) Is it correct to perform log-fold change shrinkage and then filter the results based on a log-fold change thresholdd? The tutorials only mentioned plotting and ranking (the latter is also relevant to me).

Thanks in advance

sessionInfo( )


R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] apeglm_1.10.0               RColorBrewer_1.1-2          EnhancedVolcano_1.6.0      
 [4] ggrepel_0.9.1               ggplot2_3.3.3               pheatmap_1.0.12            
 [7] DESeq2_1.28.1               SummarizedExperiment_1.18.2 DelayedArray_0.14.1        
[10] matrixStats_0.56.0          Biobase_2.48.0              GenomicRanges_1.40.0       
[13] GenomeInfoDb_1.24.2         IRanges_2.22.2              S4Vectors_0.26.1           
[16] BiocGenerics_0.34.0        

loaded via a namespace (and not attached):
 [1] bdsmatrix_1.3-4        Rcpp_1.0.5             locfit_1.5-9.4         mvtnorm_1.1-1         
 [5] lattice_0.20-41        digest_0.6.25          utf8_1.1.4             plyr_1.8.6            
 [9] R6_2.5.0               emdbook_1.3.12         coda_0.19-4            RSQLite_2.2.0         
[13] pillar_1.5.1           zlibbioc_1.34.0        rlang_0.4.10           rstudioapi_0.13       
[17] annotate_1.66.0        blob_1.2.1             Matrix_1.2-18          bbmle_1.0.23.1        
[21] labeling_0.4.2         splines_4.0.2          BiocParallel_1.22.0    geneplotter_1.66.0    
[25] RCurl_1.98-1.2         bit_4.0.4              munsell_0.5.0          numDeriv_2016.8-1.1   
[29] compiler_4.0.2         pkgconfig_2.0.3        tidyselect_1.1.0       tibble_3.0.3          
[33] GenomeInfoDbData_1.2.3 XML_3.99-0.5           fansi_0.4.2            crayon_1.4.1          
[37] dplyr_1.0.2            withr_2.4.1            MASS_7.3-51.6          bitops_1.0-6          
[41] grid_4.0.2             xtable_1.8-4           gtable_0.3.0           lifecycle_1.0.0       
[45] DBI_1.1.1              magrittr_2.0.1         scales_1.1.1           cachem_1.0.4          
[49] farver_2.1.0           XVector_0.28.0         genefilter_1.70.0      ellipsis_0.3.1        
[53] vctrs_0.3.4            generics_0.1.0         tools_4.0.2            bit64_4.0.5           
[57] glue_1.4.2             purrr_0.3.4            fastmap_1.1.0          survival_3.2-3        
[61] AnnotationDbi_1.50.3   colorspace_1.4-1       memoise_2.0.0
DESeq2 RNA-seq • 996 views
ADD COMMENT
1
Entering edit mode
@mikelove
Last seen 1 day ago
United States

Q1 is a FAQ in the vignette. Choice of design I have to leave up to the user, it's worth consulting with a statistician on this.

Q2 we have two ways to introduce an lfcThreshold: in results() it is incorporated into the null hypothesis testing. In lfcShrink it will generate s-values corresponding to aggregate probabilities derived from the posterior distribution per gene (see Zhu et al reference for details).

ADD COMMENT
0
Entering edit mode

I've taken a look at Zhu et al. 2018, but my knowledge in Bayesian statistics is very narrow. If I use the second approach, the lfcThreshold doesn't need to be included as well (and then filter the results with s-value < 0.05)?

resLFC <- lfcShrink(dds, coef="condition_guard_vs_meso", type="apeglm", svalue = T, lfcThreshold = 1)

Notice that in the above command the res argument was removed.

ADD REPLY
0
Entering edit mode

That looks good to me: it will give you genes where you have good evidence of |LFC| > 1.

ADD REPLY

Login before adding your answer.

Traffic: 598 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6