Search
Question: About the results and contrast (DESeq2)
0
14 months ago by
The Rockefeller University. New York, USA.
David Requena20 wrote:

Hello! I want to make a differential expression analysis of some RNAseq data using DESeq2.

Essentially, I have 3 types of tissue: normal, primary tumor and metastasis. These samples were sequenced in 4 groups and I found an evident batch effect (clusters by group processed, indicated in the variable "lib_prep") and also clusters by patient.

This is my data: (sample id, patient, library preparation and sample type)

 Patient ID sample_type lib_prep nested_patient 1 P1N1 normal A A 1 P1P1 primary A A 1 P1M1 metastasis A A 1 P1M2 metastasis A A 1 P1M3 metastasis A A 1 P1M4 metastasis A A 2 P2N1 normal A B 2 P2P1 primary A B 2 P2M1 metastasis A B 2 P2M2 metastasis A B 3 P3N1 normal A C 3 P3P1 primary A C 3 P3M1 metastasis A C 3 P3M2 metastasis A C 4 P4N1 normal B A 4 P4P1 primary B A 4 P4M1 metastasis B A 4 P4M2 metastasis B A 4 P4M3 metastasis B A 5 P5N1 normal B B 5 P5P1 primary B B 6 P6N1 normal C A 6 P6P1 primary C A 6 P6M1 metastasis C A 6 P6M2 metastasis C A 7 P7N1 normal C B 7 P7P1 primary C B 7 P7M1 metastasis C B 8 P8N1 normal C C 8 P8N2 normal C C 8 P8P1 primary C C 8 P8P2 primary C C 8 P8M1 metastasis C C 9 P9N1 normal C D 9 P9P1 primary C D 9 P9M1 metastasis C D 9 P9M2 metastasis C D 10 P10N1 normal D A 10 P10M1 metastasis D A 11 P11N1 normal D B 11 P11M1 metastasis D B 11 P11M2 metastasis D B 12 P12N1 normal D C 12 P12M1 metastasis D C 12 P12M2 metastasis D C

I'm interested in the genes differentially expressed between the metastasis and primary tumor samples, using as base level the normal samples. As patient is collinear with lib_prep, I created a new variable "nested_patient".

So, I made the DESeq object:

ds.deseq <- DESeqDataSetFromMatrix(
countData = countstable.num,
colData = sampledata,
design = ~ sample_type)

colData(ds.deseq)$sample_type <- relevel(colData(ds.deseq)$sample_type, "normal")

I also provided the corresponding model matrix:

mm <- model.matrix(~ lib_prep + lib_prep:nested_patient + sample_type, colData(ds.deseq))
# To remove the columns full of zeros:
mm <- mm[ , colSums(mm) > 0]


And I ran the program as follows:

ds.deseq <- DESeq(ds.deseq, full=mm,
modelMatrixType="standard", betaPrior=FALSE,
parallel = TRUE, BPPARAM=BatchJobsParam(workers = 64))

As result, I obtained:

> resultsNames(ds.deseq)
[1] "Intercept"                 "lib_prepB"                 "lib_prepC"                 "lib_prepD"
[5] "sample_typemetastasis"     "sample_typeprimary"        "lib_prepA.nested_patientB" "lib_prepB.nested_patientB"
[9] "lib_prepC.nested_patientB" "lib_prepD.nested_patientB" "lib_prepA.nested_patientC" "lib_prepC.nested_patientC"
[13] "lib_prepC.nested_patientD" "lib_prepC.nested_patientE"

But I don't know which of these is the comparison that I want. How to select the contrast appropriately to have only the genes differentially expressed in metastasis vs primary tumor, but different than the normal samples, and controlling by library preparation and patient. Could you help me, please?

David

Samples summary:

 ID Library Normal Primary Metastasis 1 A 1 1 4 2 A 1 1 2 3 A 1 1 2 4 B 1 1 3 5 B 1 1 - 6 C 1 1 2 7 C 1 1 1 8 C 2 2 1 9 C 1 1 2 10 D 1 - 1 11 D 1 - 2 12 D 1 - 2

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 17.04

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] checkmate_1.8.3            pheatmap_1.0.8             tsne_0.1-3                 ggrepel_0.6.5
[5] ggplot2_2.2.1              gplots_3.0.1               RColorBrewer_1.1-2         BiocParallel_1.8.2
[9] BatchJobs_1.6              BBmisc_1.11                matrixStats_0.52.2         biomaRt_2.30.0
[13] knitr_1.17                 DESeq2_1.14.1              SummarizedExperiment_1.4.0 Biobase_2.34.0
[17] GenomicRanges_1.26.4       GenomeInfoDb_1.10.3        IRanges_2.8.2              S4Vectors_0.12.2
[21] BiocGenerics_0.20.0

modified 13 months ago • written 14 months ago by David Requena20
2
14 months ago by
Gavin Kelly560
United Kingdom / London / Francis Crick Institute
Gavin Kelly560 wrote:

For the "differentially expressed in metastasis vs primary tumor" you can get that directly by

results(ds.deseq, contrast=list(c("sample_typemetastasis"),  c("sample_typeprimary")))   (approach A)

because the sample_typemetastasis, and sample_typeprimary are relative to normal (that being the baseline level, as it is first alphabetically), so constructing the contrast this way will compare (meta/normal) / (primary/normal), giving meta/primary fold-change (the normals cancelling out).  You can also specify it using the alternative but equivalent

results(ds.deseq, contrast=c("sample_type", "metastasis", "primary"))      (approach B)

Because you've included terms for the factors you want to control for, the coefficients we're using above already adjust for these.

I'm a bit confused about the "but different than the normal samples" requirement.   The differential test above doesn't need to take into account any 'normal controls'.  But if you're really insistent on using the normals in some way, you could look at results(ds.deseq, name="sample_typemetastasis") as those are the met_vs_normal differential genes; similarly for the primaries - and then you could decide to include the meta_vs_primary genes based on there differential pattern in these two auxilliary lists.  But you're doubling up on your statistical errors, so I really wouldn't recommend it.  Instead, I'd recommend labeling your differential genes (from approach A above) by the fold-changes from the 'results(name=...)' approach, so you'd have a list of met_vs_primary significant genes, annotated by the magnitude and direction of log-fold-changes of the cancerous samples when compared to normal.

0
13 months ago by
The Rockefeller University. New York, USA.
David Requena20 wrote:

I get an error with the approach B: "Error in results(ds.deseq, contrast = c("sample_type", "metastasis", "primary"), : only list- and numeric-type contrasts are supported for user-supplied model matrices". So, I used this:

res <- results(ds.deseq, contrast=list("sample_typemetastasis", "sample_typeprimary"),
altHypothesis="greaterAbs", alpha = 0.01, pAdjustMethod = "BH")

Which is equivalent to the approach A.

And, about the comparison: could the normal samples be introducing noise? should be better to exclude them? Because, my main goal is to find differences between metastasis and primary tumors. Or is more convenient to include them? Would this result in a better fit of the coefficients in the model?

David

1

Having the normal samples in the deseq object, even if you're not using them explicitly in the contrast, adds a few extra degrees of freedom better to estimate the variance of the residuals.  So it then comes down to how much you believe the within-normals variability is consistent with the within-tumour variability.  My experience is that they are commensurate generally, sometimes they are bit less variable (if the Tolstoy maxim holds, that all normal samples are the same, but all tumour samples are tumorous in their own way), and so I generally keep them (look at a few positive controls' behaviour to get a feel)  but occasionally if there's varying contamination of normals with tumour I'll then definitely subset the deseq object to remove them from the analysis entirely.

Thanks a lot!

David