Question: DESeq2: Is it possible to have incomplete pairs of samples?
0
11 days ago by
microbe0
microbe0 wrote:

To the developers of DESeq2,

First of all, thank you for creating such a great tool!

I am currently trying to analyze 16S rRNA gene amplicon sequencing data of samples which were taken from the oral microbiota of patients at two timepoints: before and after treatment.

My colData is as below:

  Group  ParticipantID
1 BEFORE Participant_ONE
2 BEFORE Participant_TWO
3 BEFORE Participant_THREE
4 BEFORE Participant_FOUR
5 AFTER  Participant_ONE
6 AFTER  Participant_TWO
7 AFTER  Participant_THREE


My planned analysis is as below

dds <- DESeqDataSetFromMatrix(count, colData, design = ~ ParticipantID + Group)
dds <- DESeq(dds, fitType = "local")
res <- results(dds, contrast = c("Group", "BEFORE", "AFTER"))


I noticed that in a previous post you mentioned that "samples without pair doesn't help you estimate the treatment effect".

However, because the sample size of my study is quite small, I would like to keep the non-paired sample (sample belonging to Participant_FOUR) in the analysis instead of removing them completely.

If you could please give some idea on whether this makes sense or not in the current analysis, that would be great.

P.S. Here is the sessionInfo() if needed:

R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux

Matrix products: default

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8       LC_NAME=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] biom_0.3.12                DESeq2_1.16.1
[3] SummarizedExperiment_1.6.5 DelayedArray_0.2.7
[5] matrixStats_0.54.0         Biobase_2.36.2
[7] GenomicRanges_1.28.6       GenomeInfoDb_1.12.3
[9] IRanges_2.10.5             S4Vectors_0.14.7
[11] BiocGenerics_0.22.1        optparse_1.6.0

loaded via a namespace (and not attached):
[1] bit64_0.9-7             splines_3.4.0           Formula_1.2-3
[4] assertthat_0.2.1        latticeExtra_0.6-28     blob_1.1.1
[7] GenomeInfoDbData_0.99.0 pillar_1.4.1            RSQLite_2.1.1
[10] backports_1.1.2         lattice_0.20-35         glue_1.3.1
[13] digest_0.6.15           RColorBrewer_1.1-2      XVector_0.16.0
[16] checkmate_1.9.3         colorspace_1.3-2        htmltools_0.3.6
[19] Matrix_1.2-17           plyr_1.8.4              XML_3.98-1.11
[22] pkgconfig_2.0.2         genefilter_1.58.1       zlibbioc_1.22.0
[25] purrr_0.3.2             xtable_1.8-2            scales_1.0.0
[28] getopt_1.20.2           BiocParallel_1.10.1     htmlTable_1.13.1
[31] tibble_2.1.1            annotate_1.54.0         ggplot2_3.1.1
[34] nnet_7.3-12             lazyeval_0.2.1          RJSONIO_1.3-0
[37] survival_2.42-3         magrittr_1.5            crayon_1.3.4
[40] memoise_1.1.0           foreign_0.8-70          tools_3.4.0
[43] data.table_1.12.2       stringr_1.4.0           locfit_1.5-9.1
[46] munsell_0.5.0           cluster_2.0.7-1         AnnotationDbi_1.38.2
[49] compiler_3.4.0          rlang_0.3.4             grid_3.4.0
[52] RCurl_1.95-4.10         rstudioapi_0.10         htmlwidgets_1.3
[55] bitops_1.0-6            base64enc_0.1-3         gtable_0.2.0
[58] DBI_1.0.0               R6_2.4.0                gridExtra_2.3
[61] knitr_1.23              dplyr_0.8.1             bit_1.1-14
[64] Hmisc_4.1-1             stringi_1.4.3           Rcpp_1.0.1
[67] geneplotter_1.54.0      rpart_4.1-13            acepack_1.4.1
[70] tidyselect_0.2.5        xfun_0.7

deseq2 • 106 views
modified 11 days ago by James W. MacDonald52k • written 11 days ago by microbe0
Answer: DESeq2: Is it possible to have incomplete pairs of samples?
0
11 days ago by
United States
James W. MacDonald52k wrote:

It's not clear what you are asking. You know that having unpaired samples doesn't help, and yet you want to keep them in the analysis. While I understand your desire, that alone doesn't affect the underlying issue. You can the unpaired samples in, or remove them at your discretion, but it won't really change anything.

Also, that's a really old version of R and Bioconductor. You should update.

Dear James,

Thank you for your reply and sorry for my question being unclear.

What I should have asked was "if I have an unpaired sample, would the other paired samples still be tested for the effect of treatment (the second factor, i.e. "Group") and be controlled for the effect of participant (the first factor, i.e. "ParticipantID")?"

Also, if the answer is "NO", would it still be acceptable to analyze my sample with DESeq2 using a single factor design such as "design = ~ Group"?

Sincerely, Ken

If you want to include the unpaired sample, why not use duplicateCorrelation as suggested in the link?

Dear Michael,

Please correct me if I missed something, but duplicateCorrelation() is a function in limma and I need to work with DESeq2 for the time being for various reasons.

If it is not acceptable to keep the unpaired sample in my dataset with "design ~ ParticipantID + Group", would it be appropriate to use "design ~ Group" instead?

Sincerely, Ken

In DESeq2 the unpaired sample won’t be used to estimate treatment unless you drop the pairing variable from the design.

Dear Michael,

I now understand the point I was missing! Thanks for making it clear to me the reason why keeping unpaired samples would not make any difference in the current design.

Because my sample size is quite limited, I am thinking of dropping the pairing variable (i.e. Group), so that all of my samples can be tested for the effect of treatment, but would it be wrong to do so?

Many thanks for your kind help.

Sincerely, Ken