Question

What does an asymmetrical MA plot indicate and how do I know if it means I've made an error?

0

Entering edit mode

chaco001 • 0

@chaco001-22993

Last seen 17 months ago

United States

Hello,

Thanks for this great community, and for the great tool DESeq2! I am new to RNASeq, and don't know whether this MA plot is acceptable. It worries me that it is very asymmetrical, but I lack to expertise to know what that means in terms of problems with my pipeline.

The MA plots are seen at this link: the one on bottom is without shrinking, the one on the top is with apeglm shrinking.

https://imgur.com/a/duBguik

Even after shrinking, it is clear that low-count genes are the ones more likely to be upregulated in my treatment, and high-count genes are more likely to be down-regulated. I can hand-wave a biological explanation for this (a treatment is likely to turn up genes that are off, and turn down genes that are on), but I might be deluding myself.

Here are some more details about my experiment and pipeline.

I am interested in asking whether the expression of a bacterium's genes change when grown in co-culture with another species. Therefore, I grew six flasks of bacteria: three monocultures (the control), and three co-cultures (the treatment). RNA was extracted and reads were obtained with Illumina. Note that the number of reads was the same in controls and treatments, meaning that read depth, on average, should be lower in treatments (because of the two species present).

Since there are two genomes in the treatment case, I aligned the reads in the control and in the treatment to different references: the monoculture-only reference for the control, and a concatenated reference containing both genomes for the treatment. I did the alignment with STAR.

I then got gene counts using htseq-count.

Since I'm focused on just one species, I edited the htseq-count files so they only contained genes of the species that was present in both treatment and control (this is also necessary for DESeq2 which reasonably expects the gene lists to match).

At this point I followed almost exactly the tutorial:

dds <- DESeqDataSetFromHTSeqCount(sampleTable = sampletable,
                                       directory = directory,
                                       design= ~ condition)

dds = DESeq(dds)
res = results(dds, alpha = 0.05)

resLFC = lfcShrink(dds,  coef = "condition_coculture_vs_monoculture", type = "apeglm") # vignette suggests apeglm
plotMA(res, ylim = c(-3,3))
plotMA(resLFC, ylim=c(-3,3))

My question is largely as framed above: are these MA plots reasonable, or does their shape suggest I went wrong somewhere? If it does suggest a problem, are there any hints the shape gives as to where my problem resides?

And more broadly, given my experimental design (monoculture vs. coculture), is the pipeline I've stated the best option? I've had colleagues wonder about using a more metagenomics-like pipeline (e.g. spades), but since I've got just two species and good references for them, I chose the STAR approach.

Happy to hear any and all suggestions, thank you!

Session Info:

> sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils    
[7] datasets  methods   base     

other attached packages:
 [1] ggplot2_3.2.1               pasilla_1.14.0             
 [3] DESeq2_1.26.0               SummarizedExperiment_1.16.1
 [5] DelayedArray_0.12.2         BiocParallel_1.20.1        
 [7] matrixStats_0.55.0          Biobase_2.46.0             
 [9] GenomicRanges_1.38.0        GenomeInfoDb_1.22.0        
[11] IRanges_2.20.2              S4Vectors_0.24.3           
[13] BiocGenerics_0.32.0         dplyr_0.8.4                

loaded via a namespace (and not attached):
 [1] bit64_0.9-7            splines_3.6.2         
 [3] Formula_1.2-3          assertthat_0.2.1      
 [5] BiocManager_1.30.10    latticeExtra_0.6-29   
 [7] blob_1.2.1             GenomeInfoDbData_1.2.2
 [9] numDeriv_2016.8-1.1    pillar_1.4.3          
[11] RSQLite_2.2.0          backports_1.1.5       
[13] lattice_0.20-40        glue_1.3.1            
[15] bbmle_1.0.23.1         digest_0.6.25         
[17] RColorBrewer_1.1-2     XVector_0.26.0        
[19] checkmate_2.0.0        colorspace_1.4-1      
[21] plyr_1.8.5             htmltools_0.4.0       
[23] Matrix_1.2-18          XML_3.99-0.3          
[25] pkgconfig_2.0.3        emdbook_1.3.12        
[27] genefilter_1.68.0      zlibbioc_1.32.0       
[29] mvtnorm_1.1-0          purrr_0.3.3           
[31] xtable_1.8-4           scales_1.1.0          
[33] apeglm_1.8.0           jpeg_0.1-8.1          
[35] htmlTable_1.13.3       tibble_2.1.3          
[37] annotate_1.64.0        farver_2.0.3          
[39] withr_2.1.2            nnet_7.3-13           
[41] lazyeval_0.2.2         survival_3.1-8        
[43] magrittr_1.5           crayon_1.3.4          
[45] memoise_1.1.0          MASS_7.3-51.5         
[47] foreign_0.8-75         tools_3.6.2           
[49] data.table_1.12.8      lifecycle_0.1.0       
[51] stringr_1.4.0          locfit_1.5-9.1        
[53] munsell_0.5.0          cluster_2.1.0         
[55] AnnotationDbi_1.48.0   compiler_3.6.2        
[57] rlang_0.4.4            grid_3.6.2            
[59] RCurl_1.98-1.1         rstudioapi_0.11       
[61] htmlwidgets_1.5.1      labeling_0.3          
[63] bitops_1.0-6           base64enc_0.1-3       
[65] gtable_0.3.0           DBI_1.1.0             
[67] R6_2.4.1               gridExtra_2.3         
[69] bdsmatrix_1.3-4        knitr_1.28            
[71] bit_1.1-15.2           Hmisc_4.3-1           
[73] stringi_1.4.6          Rcpp_1.0.3            
[75] geneplotter_1.64.0     vctrs_0.2.3           
[77] rpart_4.1-15           acepack_1.4.1         
[79] png_0.1-7              coda_0.19-3           
[81] tidyselect_1.0.0       xfun_0.12

deseq2 plotMA rnaseq community analysis asymmetry • 955 views

ADD COMMENT • link updated 4.4 years ago by Michael Love 42k • written 4.4 years ago by chaco001 • 0

score 0 · Answer 1 · 2020-02-27

0

Entering edit mode

Michael Love 42k

@mikelove

Last seen 8 hours ago

United States

low-count genes are the ones more likely to be upregulated in my treatment, and high-count genes are more likely to be down-regulated

This is my first impression as well.

Can you look into the biological annotation of the various groups of genes? Maybe you can validate with literature or additional experiments?

ADD COMMENT • link 4.4 years ago Michael Love 42k