Entering edit mode
alice.checcucci
•
0
@alicecheccucci-20528
Last seen 5.0 years ago
Hi,
I am using DESeq2 for DEG analysis in my RNA-Seq experiment. I have 2 bacterial strains: one is the wild type and the other is a mutant (it lacks one plasmid). When I look for differentially expressed genes, plotting the results with MAplot I obtain a strange plot, which seems not to have the "classical" symmetrical shape (like the one that is reported in the tutorial).
Here is the codes:
txi.tx <- tximport(files, type = "salmon", txOut = TRUE)
library("DESeq2")
samples <- data.frame(species = c("2011","2011","2011","delta","delta","delta")) # sample metadata
ddsTxi <- DESeqDataSetFromTximport(txi.tx, colData = samples, design = ~species)
res <- t(apply(counts(ddsTxi),1, function(x)by(x, samples$species, sum)))
keep <- res[,2] > 0
ddsTxi <- ddsTxi[keep,]
keep <- rowSums(counts(ddsTxi)) >= 10
ddsTxi <- ddsTxi[keep,]
ddsTxi$species<- factor(ddsTxi$species, levels = c("2011","delta"))
ddsTxi <- DESeq(ddsTxi, fitType = "local")
res <- lfcShrink(ddsTxi, coef = "species_delta_vs_2011", type = "normal")
plotMA(res, ylim=c(-5,5))
http://i67.tinypic.com/9jhqph.png
sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)
Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.0
LAPACK: /usr/lib/lapack/liblapack.so.3.0
locale:
[1] LC_CTYPE=it_IT.UTF-8 LC_NUMERIC=C LC_TIME=it_IT.UTF-8 LC_COLLATE=it_IT.UTF-8
[5] LC_MONETARY=it_IT.UTF-8 LC_MESSAGES=it_IT.UTF-8 LC_PAPER=it_IT.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=it_IT.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] DESeq2_1.18.1 SummarizedExperiment_1.8.1 DelayedArray_0.4.1 matrixStats_0.54.0
[5] Biobase_2.38.0 GenomicRanges_1.30.3 GenomeInfoDb_1.14.0 Biostrings_2.46.0
[9] XVector_0.18.0 IRanges_2.12.0 S4Vectors_0.16.0 BiocGenerics_0.24.0
[13] tximport_1.6.0 BiocInstaller_1.28.0 ggplot2_3.1.0 phyloseq_1.22.3
loaded via a namespace (and not attached):
[1] jsonlite_1.6 bit64_0.9-7 splines_3.4.3 foreach_1.4.4 Formula_1.2-3
[6] latticeExtra_0.6-28 blob_1.1.0 GenomeInfoDbData_1.0.0 pillar_1.3.0 RSQLite_2.0
[11] backports_1.1.2 lattice_0.20-35 digest_0.6.18 RColorBrewer_1.1-2 checkmate_1.9.1
[16] colorspace_1.3-2 htmltools_0.3.6 Matrix_1.2-12 plyr_1.8.4 pkgconfig_2.0.1
[21] XML_3.98-1.19 genefilter_1.60.0 zlibbioc_1.24.0 xtable_1.8-3 scales_1.0.0
[26] BiocParallel_1.12.0 htmlTable_1.13.1 tibble_1.4.2 annotate_1.56.2 mgcv_1.8-23
[31] withr_2.1.2 nnet_7.3-12 lazyeval_0.2.1 survival_2.41-3 magrittr_1.5
[36] crayon_1.3.4 memoise_1.1.0 nlme_3.1-131 MASS_7.3-48 foreign_0.8-69
[41] vegan_2.5-3 tools_3.4.3 data.table_1.11.8 stringr_1.3.1 munsell_0.5.0
[46] locfit_1.5-9.1 cluster_2.0.6 AnnotationDbi_1.40.0 ade4_1.7-13 compiler_3.4.3
[51] rlang_0.3.0.1 rhdf5_2.22.0 grid_3.4.3 RCurl_1.95-4.11 biomformat_1.6.0
[56] iterators_1.0.10 rstudioapi_0.8 htmlwidgets_1.3 igraph_1.2.2 bitops_1.0-6
[61] base64enc_0.1-3 multtest_2.34.0 gtable_0.2.0 codetools_0.2-15 DBI_0.7
[66] reshape2_1.4.3 gridExtra_2.3 knitr_1.22 bit_1.1-12 Hmisc_4.2-0
[71] permute_0.9-4 ape_5.2 stringi_1.2.4 Rcpp_1.0.0 geneplotter_1.56.0
[76] rpart_4.1-12 acepack_1.4.1 xfun_0.6
Thank you in advance! Alice
The samples are 6 (3 for the wild type and 3 for the mutant). The sequencing depth is even across samples.
I'm not sure what's going on. It looks like there are global changes to gene expression distribution, and I don't trust the in silico normalization approaches here. Can you show the boxplots for log normalized counts?
Thank you.. here the boxplots
These aren't what I was looking for, but this does indicate a problem. There seem to be massive differences across groups for the smallest count genes. Is the bioinformatic pipeline for these samples identical? I might suggest you collaborate with a bioinformatician who can figure out why you have systematic differences for many low count genes, which are not present for high count genes. In the meantime, I wouldn't trust putting this data through any standard DE tool.
Sorry if these are not what you were asking for. Can I do some histograms or boxplots to help us to understand what was going on? Unfurtunately, at the moment there is not a proper bioinformatician who can help me to figure out the problem occurred. And yes, the bioinformatic pipeline for these samples was completely identical.
I unfortunately don't have much time right now to look over the counts, but what I would do is to see what low count genes have these very large differences and figure out if they are plausibly biological or there is some technical issue.
As far as finding someone locally who may help, if you are located at a university, you can often google "[university name] bioinformatics", and this tends to pull up at least one lab or group. You can try to email one of the PIs or postdocs, and maybe they could sit down with you for 30 minutes to take a look at your data.
Ok, thank you