Question: DESeq2: Is it possible to have incomplete pairs of samples?
gravatar for microbe
11 days ago by
microbe0 wrote:

To the developers of DESeq2,

First of all, thank you for creating such a great tool!

I am currently trying to analyze 16S rRNA gene amplicon sequencing data of samples which were taken from the oral microbiota of patients at two timepoints: before and after treatment.

My colData is as below:

  Group  ParticipantID
1 BEFORE Participant_ONE
2 BEFORE Participant_TWO
3 BEFORE Participant_THREE
4 BEFORE Participant_FOUR
5 AFTER  Participant_ONE
6 AFTER  Participant_TWO
7 AFTER  Participant_THREE

My planned analysis is as below

dds <- DESeqDataSetFromMatrix(count, colData, design = ~ ParticipantID + Group)
dds <- DESeq(dds, fitType = "local")
res <- results(dds, contrast = c("Group", "BEFORE", "AFTER"))

I noticed that in a previous post you mentioned that "samples without pair doesn't help you estimate the treatment effect".

However, because the sample size of my study is quite small, I would like to keep the non-paired sample (sample belonging to Participant_FOUR) in the analysis instead of removing them completely.

If you could please give some idea on whether this makes sense or not in the current analysis, that would be great.

P.S. Here is the sessionInfo() if needed:

R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux

Matrix products: default
BLAS/LAPACK: /usr/appli/freeware/OpenBLAS/0.3.6.gcc.THREAD/lib/

 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] biom_0.3.12                DESeq2_1.16.1
 [3] SummarizedExperiment_1.6.5 DelayedArray_0.2.7
 [5] matrixStats_0.54.0         Biobase_2.36.2
 [7] GenomicRanges_1.28.6       GenomeInfoDb_1.12.3
 [9] IRanges_2.10.5             S4Vectors_0.14.7
[11] BiocGenerics_0.22.1        optparse_1.6.0

loaded via a namespace (and not attached):
 [1] bit64_0.9-7             splines_3.4.0           Formula_1.2-3
 [4] assertthat_0.2.1        latticeExtra_0.6-28     blob_1.1.1
 [7] GenomeInfoDbData_0.99.0 pillar_1.4.1            RSQLite_2.1.1
[10] backports_1.1.2         lattice_0.20-35         glue_1.3.1
[13] digest_0.6.15           RColorBrewer_1.1-2      XVector_0.16.0
[16] checkmate_1.9.3         colorspace_1.3-2        htmltools_0.3.6
[19] Matrix_1.2-17           plyr_1.8.4              XML_3.98-1.11
[22] pkgconfig_2.0.2         genefilter_1.58.1       zlibbioc_1.22.0
[25] purrr_0.3.2             xtable_1.8-2            scales_1.0.0
[28] getopt_1.20.2           BiocParallel_1.10.1     htmlTable_1.13.1
[31] tibble_2.1.1            annotate_1.54.0         ggplot2_3.1.1
[34] nnet_7.3-12             lazyeval_0.2.1          RJSONIO_1.3-0
[37] survival_2.42-3         magrittr_1.5            crayon_1.3.4
[40] memoise_1.1.0           foreign_0.8-70          tools_3.4.0
[43] data.table_1.12.2       stringr_1.4.0           locfit_1.5-9.1
[46] munsell_0.5.0           cluster_2.0.7-1         AnnotationDbi_1.38.2
[49] compiler_3.4.0          rlang_0.3.4             grid_3.4.0
[52] RCurl_1.95-4.10         rstudioapi_0.10         htmlwidgets_1.3
[55] bitops_1.0-6            base64enc_0.1-3         gtable_0.2.0
[58] DBI_1.0.0               R6_2.4.0                gridExtra_2.3
[61] knitr_1.23              dplyr_0.8.1             bit_1.1-14
[64] Hmisc_4.1-1             stringi_1.4.3           Rcpp_1.0.1
[67] geneplotter_1.54.0      rpart_4.1-13            acepack_1.4.1
[70] tidyselect_0.2.5        xfun_0.7
deseq2 • 106 views
ADD COMMENTlink modified 11 days ago by James W. MacDonald52k • written 11 days ago by microbe0
Answer: DESeq2: Is it possible to have incomplete pairs of samples?
gravatar for James W. MacDonald
11 days ago by
United States
James W. MacDonald52k wrote:

It's not clear what you are asking. You know that having unpaired samples doesn't help, and yet you want to keep them in the analysis. While I understand your desire, that alone doesn't affect the underlying issue. You can the unpaired samples in, or remove them at your discretion, but it won't really change anything.

Also, that's a really old version of R and Bioconductor. You should update.

ADD COMMENTlink written 11 days ago by James W. MacDonald52k

Dear James,

Thank you for your reply and sorry for my question being unclear.

What I should have asked was "if I have an unpaired sample, would the other paired samples still be tested for the effect of treatment (the second factor, i.e. "Group") and be controlled for the effect of participant (the first factor, i.e. "ParticipantID")?"

Also, if the answer is "NO", would it still be acceptable to analyze my sample with DESeq2 using a single factor design such as "design = ~ Group"?

Many thanks in advance for your patience.

Sincerely, Ken

ADD REPLYlink written 11 days ago by microbe0

If you want to include the unpaired sample, why not use duplicateCorrelation as suggested in the link?

ADD REPLYlink written 11 days ago by Michael Love26k

Dear Michael,

Thank you for your reply!

Please correct me if I missed something, but duplicateCorrelation() is a function in limma and I need to work with DESeq2 for the time being for various reasons.

If it is not acceptable to keep the unpaired sample in my dataset with "design ~ ParticipantID + Group", would it be appropriate to use "design ~ Group" instead?

Much appreciation for your patience with me in advance.

Sincerely, Ken

ADD REPLYlink written 10 days ago by microbe0

In DESeq2 the unpaired sample won’t be used to estimate treatment unless you drop the pairing variable from the design.

ADD REPLYlink written 10 days ago by Michael Love26k

Dear Michael,

I now understand the point I was missing! Thanks for making it clear to me the reason why keeping unpaired samples would not make any difference in the current design.

Please allow me ask one last question:

Because my sample size is quite limited, I am thinking of dropping the pairing variable (i.e. Group), so that all of my samples can be tested for the effect of treatment, but would it be wrong to do so?

Many thanks for your kind help.

Sincerely, Ken

ADD REPLYlink written 10 days ago by microbe0

I'd say the optimal method would be to use duplicateCorrelation in limma, which figures out the degree to which the samples are correlated according to pair by looking across all genes. You could also diagnose this by eye using plotPCA. If the samples are highly correlated and close in the PCA plot, relative to the difference due to treatment, then you're better off dropping the unpaired sample in DESeq2.

ADD REPLYlink written 10 days ago by Michael Love26k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 146 users visited in the last hour