Question

RNAseq analysis DeSeq2 strange MAplot

0

Entering edit mode

alice.checcucci • 0

@alicecheccucci-20528

Last seen 5.0 years ago

Hi,

I am using DESeq2 for DEG analysis in my RNA-Seq experiment. I have 2 bacterial strains: one is the wild type and the other is a mutant (it lacks one plasmid). When I look for differentially expressed genes, plotting the results with MAplot I obtain a strange plot, which seems not to have the "classical" symmetrical shape (like the one that is reported in the tutorial).

Here is the codes:

txi.tx <- tximport(files, type = "salmon", txOut = TRUE)

library("DESeq2")
samples <- data.frame(species = c("2011","2011","2011","delta","delta","delta")) # sample metadata 
ddsTxi <- DESeqDataSetFromTximport(txi.tx, colData = samples, design = ~species) 

res <- t(apply(counts(ddsTxi),1, function(x)by(x, samples$species, sum)))
keep <- res[,2] > 0
ddsTxi <- ddsTxi[keep,]

keep <- rowSums(counts(ddsTxi)) >= 10
ddsTxi <- ddsTxi[keep,]

ddsTxi$species<- factor(ddsTxi$species, levels = c("2011","delta"))

ddsTxi <- DESeq(ddsTxi, fitType = "local")
res <- lfcShrink(ddsTxi, coef = "species_delta_vs_2011",  type = "normal")

plotMA(res, ylim=c(-5,5))

http://i67.tinypic.com/9jhqph.png MAplot

sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.0
LAPACK: /usr/lib/lapack/liblapack.so.3.0

locale:
 [1] LC_CTYPE=it_IT.UTF-8       LC_NUMERIC=C               LC_TIME=it_IT.UTF-8        LC_COLLATE=it_IT.UTF-8    
 [5] LC_MONETARY=it_IT.UTF-8    LC_MESSAGES=it_IT.UTF-8    LC_PAPER=it_IT.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=it_IT.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DESeq2_1.18.1              SummarizedExperiment_1.8.1 DelayedArray_0.4.1         matrixStats_0.54.0        
 [5] Biobase_2.38.0             GenomicRanges_1.30.3       GenomeInfoDb_1.14.0        Biostrings_2.46.0         
 [9] XVector_0.18.0             IRanges_2.12.0             S4Vectors_0.16.0           BiocGenerics_0.24.0       
[13] tximport_1.6.0             BiocInstaller_1.28.0       ggplot2_3.1.0              phyloseq_1.22.3           

loaded via a namespace (and not attached):
 [1] jsonlite_1.6           bit64_0.9-7            splines_3.4.3          foreach_1.4.4          Formula_1.2-3         
 [6] latticeExtra_0.6-28    blob_1.1.0             GenomeInfoDbData_1.0.0 pillar_1.3.0           RSQLite_2.0           
[11] backports_1.1.2        lattice_0.20-35        digest_0.6.18          RColorBrewer_1.1-2     checkmate_1.9.1       
[16] colorspace_1.3-2       htmltools_0.3.6        Matrix_1.2-12          plyr_1.8.4             pkgconfig_2.0.1       
[21] XML_3.98-1.19          genefilter_1.60.0      zlibbioc_1.24.0        xtable_1.8-3           scales_1.0.0          
[26] BiocParallel_1.12.0    htmlTable_1.13.1       tibble_1.4.2           annotate_1.56.2        mgcv_1.8-23           
[31] withr_2.1.2            nnet_7.3-12            lazyeval_0.2.1         survival_2.41-3        magrittr_1.5          
[36] crayon_1.3.4           memoise_1.1.0          nlme_3.1-131           MASS_7.3-48            foreign_0.8-69        
[41] vegan_2.5-3            tools_3.4.3            data.table_1.11.8      stringr_1.3.1          munsell_0.5.0         
[46] locfit_1.5-9.1         cluster_2.0.6          AnnotationDbi_1.40.0   ade4_1.7-13            compiler_3.4.3        
[51] rlang_0.3.0.1          rhdf5_2.22.0           grid_3.4.3             RCurl_1.95-4.11        biomformat_1.6.0      
[56] iterators_1.0.10       rstudioapi_0.8         htmlwidgets_1.3        igraph_1.2.2           bitops_1.0-6          
[61] base64enc_0.1-3        multtest_2.34.0        gtable_0.2.0           codetools_0.2-15       DBI_0.7               
[66] reshape2_1.4.3         gridExtra_2.3          knitr_1.22             bit_1.1-12             Hmisc_4.2-0           
[71] permute_0.9-4          ape_5.2                stringi_1.2.4          Rcpp_1.0.0             geneplotter_1.56.0    
[76] rpart_4.1-12           acepack_1.4.1          xfun_0.6

Thank you in advance! Alice

deseq2 • 1.0k views

ADD COMMENT • link updated 5.0 years ago by Michael Love 41k • written 5.0 years ago by alice.checcucci • 0

score 0 · Answer 1 · 2019-04-15

0

Entering edit mode

Michael Love 41k

@mikelove

Last seen 1 day ago

United States

Can you describe your samples? How many are there? Is the sequencing depth even across samples?

ADD COMMENT • link 5.0 years ago Michael Love 41k

0

Entering edit mode

The samples are 6 (3 for the wild type and 3 for the mutant). The sequencing depth is even across samples.

> colSums(counts(ddsTxi))
[1] 5030644 3461589 5517470 5246512 1681959 4651938
> cbind(samples, colSums(counts(ddsTxi)))
  species colSums(counts(ddsTxi))
1    2011                 5030644
2    2011                 3461589
3    2011                 5517470
4   delta                 5246512
5   delta                 1681959
6   delta                 4651938

ADD REPLY • link 5.0 years ago alice.checcucci • 0

0

Entering edit mode

I'm not sure what's going on. It looks like there are global changes to gene expression distribution, and I don't trust the in silico normalization approaches here. Can you show the boxplots for log normalized counts?

ADD REPLY • link 5.0 years ago Michael Love 41k

0

Entering edit mode

Thank you.. here the boxplots enter image description here

 vsd <- vst(ddsTxi, blind=FALSE)
    rld <- rlog(ddsTxi, blind=FALSE)
    head(assay(vsd), 3)

    ntd <- normTransform(ddsTxi)
    source("https://bioconductor.org/biocLite.R")
    biocLite("vsn")
    library("vsn")
    meanSdPlot(assay(ntd))
    meanSdPlot(assay(vsd))
    meanSdPlot(assay(rld))

ADD REPLY • link 5.0 years ago alice.checcucci • 0

0

Entering edit mode

These aren't what I was looking for, but this does indicate a problem. There seem to be massive differences across groups for the smallest count genes. Is the bioinformatic pipeline for these samples identical? I might suggest you collaborate with a bioinformatician who can figure out why you have systematic differences for many low count genes, which are not present for high count genes. In the meantime, I wouldn't trust putting this data through any standard DE tool.

ADD REPLY • link 5.0 years ago Michael Love 41k

0

Entering edit mode

Sorry if these are not what you were asking for. Can I do some histograms or boxplots to help us to understand what was going on? Unfurtunately, at the moment there is not a proper bioinformatician who can help me to figure out the problem occurred. And yes, the bioinformatic pipeline for these samples was completely identical.

ADD REPLY • link 5.0 years ago alice.checcucci • 0

0

Entering edit mode

I unfortunately don't have much time right now to look over the counts, but what I would do is to see what low count genes have these very large differences and figure out if they are plausibly biological or there is some technical issue.

As far as finding someone locally who may help, if you are located at a university, you can often google "[university name] bioinformatics", and this tends to pull up at least one lab or group. You can try to email one of the PIs or postdocs, and maybe they could sit down with you for 30 minutes to take a look at your data.