Identifying cell line-independent DEG in a time series (DESeq2)
1
0
Entering edit mode
@jhnwphillips-12017
Last seen 4.7 years ago

Hello there.

I would like to identify the cell line-independent differentially expressed genes in a time series experiment, i.e. those genes that respond similarly to the stimulus regardless of cell line background. This is the opposite scenario from the one described in the DESeq2 documentation for the fission dataset and I am unsure how to proceed. Please help!

I have three cell lines (X9342, X10742, X10743), a five point time course (0h, 4h, 8h, 16h, 24h), and three biological replicates (T1, T2, T4) for each (total 45 samples). The code is as follows:

#Display coldata
> coldata
                 cell_line time_point
X9342_0h_T1          X9342         t0
X10743_0h_T2        X10743         t0
X10743_0h_T4        X10743         t0
...
X10742_24h_T1       X10742        t24
X10742_24h_T2       X10742        t24
X10742_24h_T4       X10742        t24


#Run DESeq with the interaction term
> dds <- DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design=~cell_line + time_point + cell_line:time_point)
dds <- DESeq(dds)

#Define a contrast and write an output
> res <- results(dds, contrast=c("time_point", "t0", "t24"))
> res <- res[order(res$padj),]
resdata <- merge(as.data.frame(res), as.data.frame(counts(dds, normalized=T)), by="row.names", sort=F)
write.table(resdata, file="filename.txt", sep="\t", quote=F, row.names=F)

By inspection of the DEG output, this seems a bit off as some of the genes given low p-adj with large LFC don't show similar behavior when I look at the listed values in the accompanying matrix.

I suspect I have to do something similar as the fission example just with different terms. Please advise.

#fission example from DESeq2 workflow
ddsTC <- DESeq(ddsTC, test="LRT", reduced = ~ strain + minute)
resTC <- results(ddsTC)
resTC$symbol <- mcols(ddsTC)$symbol
head(resTC[order(resTC$padj),], 4)
#sessionInfo just in case
> sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.6

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] DESeq2_1.14.1              SummarizedExperiment_1.4.0 Biobase_2.34.0             GenomicRanges_1.26.4      
[5] GenomeInfoDb_1.10.3        IRanges_2.8.2              S4Vectors_0.12.2           BiocGenerics_0.20.0       

loaded via a namespace (and not attached):
 [1] genefilter_1.56.0    locfit_1.5-9.1       splines_3.3.3        lattice_0.20-38      colorspace_1.4-1    
 [6] htmltools_0.3.6      base64enc_0.1-3      survival_2.44-1.1    XML_3.98-1.20        foreign_0.8-71      
[11] DBI_1.0.0            BiocParallel_1.8.1   RColorBrewer_1.1-2   plyr_1.8.4           stringr_1.4.0       
[16] zlibbioc_1.20.0      munsell_0.5.0        gtable_0.3.0         htmlwidgets_1.3      memoise_1.1.0       
[21] latticeExtra_0.6-28  knitr_1.23           geneplotter_1.52.0   AnnotationDbi_1.36.2 htmlTable_1.13.1    
[26] Rcpp_1.0.2           acepack_1.4.1        xtable_1.8-4         scales_1.0.0         backports_1.1.4     
[31] checkmate_1.9.4      Hmisc_4.1-1          annotate_1.52.1      XVector_0.14.1       gridExtra_2.3       
[36] ggplot2_3.2.0        digest_0.6.20        stringi_1.4.3        grid_3.3.3           tools_3.3.3         
[41] bitops_1.0-6         magrittr_1.5         lazyeval_0.2.2       RCurl_1.95-4.12      tibble_2.1.3        
[46] RSQLite_2.1.2        Formula_1.2-3        cluster_2.0.6        Matrix_1.2-12        data.table_1.12.2   
[51] assertthat_0.2.1     rpart_4.1-15         nnet_7.3-12         
deseq2 • 779 views
ADD COMMENT
1
Entering edit mode
@mikelove
Last seen 1 day ago
United States

It seems like you could use a design of ~cell + time and perform and LRT with a reduced design of ~cell giving you the genes where there is a consistent time profile across the cell lines.

ADD COMMENT
0
Entering edit mode

This appears to have worked! Thanks so much. Code follows (please double check) in case anyone else has this issue.

#Corrected design
dds <- DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design=~cell_line + time_point)
dds <- DESeq(dds)

#Perform LRT to identify genes with consistent profile
ddsTC <- DESeq(dds, test="LRT", reduced = ~cell_line)
resTC <- results(ddsTC, contrast = c("time_point", "t0", "t24"))
resTC$symbol <- mcols(ddsTC)$symbol
head(resTC[order(resTC$padj),], 4)

#Output

#log2 fold change (MLE): time_point t0 vs t24 
#LRT p-value: '~ cell_line + time_point' vs '~ cell_line' 
#DataFrame with 4 rows and 6 columns
#       baseMean log2FoldChange     lfcSE      stat        pvalue          padj
#      <numeric>      <numeric> <numeric> <numeric>     <numeric>     <numeric>
#YPEL3 1282.4245      -3.710903 0.1829484  718.9134 2.797581e-154 4.802328e-150
#PDK2   716.8922      -1.900368 0.1099170  684.5640 7.664157e-147 6.578146e-143
#WDR45 1154.2460      -1.985449 0.1052600  558.1900 1.729407e-119 9.895665e-116
#CAPS   302.4570      -2.839824 0.1523674  526.9082 1.013159e-112 4.347973e-109
ADD REPLY
1
Entering edit mode

That’s fine, except the first DESeq() is not necessary, should be removed.

ADD REPLY
0
Entering edit mode

Thanks again, Michael. Posting correction here for posterity:

#Corrected design
dds <- DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design=~cell_line + time_point)

#Perform LRT to identify genes with consistent profile
ddsTC <- DESeq(dds, test="LRT", reduced = ~cell_line)
resTC <- results(ddsTC, contrast = c("time_point", "t0", "t24"))
resTC$symbol <- mcols(ddsTC)$symbol
ADD REPLY

Login before adding your answer.

Traffic: 676 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6