Question

DESeq2: Is it possible to have incomplete pairs of samples?

0

Entering edit mode

microbe • 0

@microbe-22477

Last seen 4.9 years ago

To the developers of DESeq2,

First of all, thank you for creating such a great tool!

I am currently trying to analyze 16S rRNA gene amplicon sequencing data of samples which were taken from the oral microbiota of patients at two timepoints: before and after treatment.

My colData is as below:

  Group  ParticipantID
1 BEFORE Participant_ONE
2 BEFORE Participant_TWO
3 BEFORE Participant_THREE
4 BEFORE Participant_FOUR
5 AFTER  Participant_ONE
6 AFTER  Participant_TWO
7 AFTER  Participant_THREE

My planned analysis is as below

dds <- DESeqDataSetFromMatrix(count, colData, design = ~ ParticipantID + Group)
dds <- DESeq(dds, fitType = "local")
res <- results(dds, contrast = c("Group", "BEFORE", "AFTER"))

I noticed that in a previous post you mentioned that "samples without pair doesn't help you estimate the treatment effect".

However, because the sample size of my study is quite small, I would like to keep the non-paired sample (sample belonging to Participant_FOUR) in the analysis instead of removing them completely.

If you could please give some idea on whether this makes sense or not in the current analysis, that would be great.

P.S. Here is the sessionInfo() if needed:

R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux

Matrix products: default
BLAS/LAPACK: /usr/appli/freeware/OpenBLAS/0.3.6.gcc.THREAD/lib/libopenblas_haswellp-r0.3.6.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] biom_0.3.12                DESeq2_1.16.1
 [3] SummarizedExperiment_1.6.5 DelayedArray_0.2.7
 [5] matrixStats_0.54.0         Biobase_2.36.2
 [7] GenomicRanges_1.28.6       GenomeInfoDb_1.12.3
 [9] IRanges_2.10.5             S4Vectors_0.14.7
[11] BiocGenerics_0.22.1        optparse_1.6.0

loaded via a namespace (and not attached):
 [1] bit64_0.9-7             splines_3.4.0           Formula_1.2-3
 [4] assertthat_0.2.1        latticeExtra_0.6-28     blob_1.1.1
 [7] GenomeInfoDbData_0.99.0 pillar_1.4.1            RSQLite_2.1.1
[10] backports_1.1.2         lattice_0.20-35         glue_1.3.1
[13] digest_0.6.15           RColorBrewer_1.1-2      XVector_0.16.0
[16] checkmate_1.9.3         colorspace_1.3-2        htmltools_0.3.6
[19] Matrix_1.2-17           plyr_1.8.4              XML_3.98-1.11
[22] pkgconfig_2.0.2         genefilter_1.58.1       zlibbioc_1.22.0
[25] purrr_0.3.2             xtable_1.8-2            scales_1.0.0
[28] getopt_1.20.2           BiocParallel_1.10.1     htmlTable_1.13.1
[31] tibble_2.1.1            annotate_1.54.0         ggplot2_3.1.1
[34] nnet_7.3-12             lazyeval_0.2.1          RJSONIO_1.3-0
[37] survival_2.42-3         magrittr_1.5            crayon_1.3.4
[40] memoise_1.1.0           foreign_0.8-70          tools_3.4.0
[43] data.table_1.12.2       stringr_1.4.0           locfit_1.5-9.1
[46] munsell_0.5.0           cluster_2.0.7-1         AnnotationDbi_1.38.2
[49] compiler_3.4.0          rlang_0.3.4             grid_3.4.0
[52] RCurl_1.95-4.10         rstudioapi_0.10         htmlwidgets_1.3
[55] bitops_1.0-6            base64enc_0.1-3         gtable_0.2.0
[58] DBI_1.0.0               R6_2.4.0                gridExtra_2.3
[61] knitr_1.23              dplyr_0.8.1             bit_1.1-14
[64] Hmisc_4.1-1             stringi_1.4.3           Rcpp_1.0.1
[67] geneplotter_1.54.0      rpart_4.1-13            acepack_1.4.1
[70] tidyselect_0.2.5        xfun_0.7

deseq2 • 562 views

ADD COMMENT • link updated 4.9 years ago by James W. MacDonald 67k • written 4.9 years ago by microbe • 0

score 0 · Answer 1 · 2019-12-02

0

Entering edit mode

James W. MacDonald 67k

@james-w-macdonald-5106

Last seen 9 hours ago

United States

It's not clear what you are asking. You know that having unpaired samples doesn't help, and yet you want to keep them in the analysis. While I understand your desire, that alone doesn't affect the underlying issue. You can the unpaired samples in, or remove them at your discretion, but it won't really change anything.

Also, that's a really old version of R and Bioconductor. You should update.

ADD COMMENT • link 4.9 years ago James W. MacDonald 67k

0

Entering edit mode

Dear James,

Thank you for your reply and sorry for my question being unclear.

What I should have asked was "if I have an unpaired sample, would the other paired samples still be tested for the effect of treatment (the second factor, i.e. "Group") and be controlled for the effect of participant (the first factor, i.e. "ParticipantID")?"

Also, if the answer is "NO", would it still be acceptable to analyze my sample with DESeq2 using a single factor design such as "design = ~ Group"?

Many thanks in advance for your patience.

Sincerely, Ken

ADD REPLY • link 4.9 years ago microbe • 0

0

Entering edit mode

If you want to include the unpaired sample, why not use duplicateCorrelation as suggested in the link?

ADD REPLY • link 4.9 years ago Michael Love 42k

0

Entering edit mode

Dear Michael,

Thank you for your reply!

Please correct me if I missed something, but duplicateCorrelation() is a function in limma and I need to work with DESeq2 for the time being for various reasons.

If it is not acceptable to keep the unpaired sample in my dataset with "design ~ ParticipantID + Group", would it be appropriate to use "design ~ Group" instead?

Much appreciation for your patience with me in advance.

Sincerely, Ken

ADD REPLY • link 4.9 years ago microbe • 0

0

Entering edit mode

In DESeq2 the unpaired sample won’t be used to estimate treatment unless you drop the pairing variable from the design.

ADD REPLY • link 4.9 years ago Michael Love 42k

0

Entering edit mode

Dear Michael,

I now understand the point I was missing! Thanks for making it clear to me the reason why keeping unpaired samples would not make any difference in the current design.

Please allow me ask one last question:

Because my sample size is quite limited, I am thinking of dropping the pairing variable (i.e. Group), so that all of my samples can be tested for the effect of treatment, but would it be wrong to do so?

Many thanks for your kind help.

Sincerely, Ken

ADD REPLY • link 4.9 years ago microbe • 0

0

Entering edit mode

I'd say the optimal method would be to use duplicateCorrelation in limma, which figures out the degree to which the samples are correlated according to pair by looking across all genes. You could also diagnose this by eye using plotPCA. If the samples are highly correlated and close in the PCA plot, relative to the difference due to treatment, then you're better off dropping the unpaired sample in DESeq2.

ADD REPLY • link 4.9 years ago Michael Love 42k