Search
Question: DESeq2 Time course analysis design - only time points WITHOUT treatments
0
gravatar for maithex
12 weeks ago by
maithex0
maithex0 wrote:

Hello,

I am doing RNA-seq analysis of differential gene expression from mare's endometrial biopsies. I do not have any treatments though, and this is the reason I am having a hard time trying to figure out the design I should use as all examples I found so far had treatments + time points.

I have the first time point of 0h which is my control (endometrial biopsy from a mare) then 24h, 48h and 72h. All I want to do is to understand what happens over the time points, without any treatment. I have 8 animals and for each animal I have the 4 timepoints. My metadata has "sample_id" column with my bam files,"timepoint" column and "horse_id" column.

My coding looks like this:

txdb <- makeTxDbFromGFF("Equus_caballus.EquCab2.89.gtf", format="gtf")
ebg <- exonsBy(txdb, by="gene")
se <- summarizeOverlaps(features=ebg, reads=bamfiles,
                        mode="Union",
                        singleEnd=FALSE,
                        ignore.strand=TRUE,
                        fragments=TRUE )
dds <- DESeqDataSet(se, design = ~ timepoint)
dds <- DESeq(dds)
res_0hand72h_LFC1 <- results(dds, lfcThreshold = 1, contrast=c("timepoint", "control_0h", "timepoint_72h"))
res_0hand24h_LFC1 <- results(dds, lfcThreshold = 1, contrast=c("timepoint", "control_0h", "timepoint_24h"))
res_24hand72h_LFC1 <- results(dds, lfcThreshold = 1, contrast=c("timepoint", "timepoint_24h", "timepoint_72h"))
res_48hand72h_LFC1 <- results(dds, lfcThreshold = 1, contrast=c("timepoint", "timepoint_48h", "timepoint_72h"))
res_24hand48h_LFC1 <- results(dds, lfcThreshold = 1, contrast=c("timepoint", "timepoint_24h", "timepoint_48h"))
table(res_0hand72h_LFC1$padj < 0.1)
table(res_0hand24h_LFC1$padj < 0.1)
table(res_24hand72h_LFC1$padj < 0.1)
table(res_48hand72h_LFC1$padj < 0.1)
table(res_24hand48h_LFC1$padj < 0.1)

Running the above I got results comparing two time points at a time (false rate discovery set for 0.1% with a >2 fold change). Results look ok to me. E.g. When comparing control (0h) to 72h time point a total of 22542 genes were expressed, 824 of which were shown to be statistically significantly differentially up expressed and 785 shown to be statistically significantly differentially down expressed, both at a false discovery rate (FDR) of 0.1% with a fold change greater or equal to 2.

BUT I wanted to have one result comparing all time points, but I am not sure if that's doable/reasonable?

first doubt about my design:

I am not sure whether I need to add my horse_id to the design, maybe
dds <- DESeqDataSet(se, design = ~ treatment + horse_id)
or
dds <- DESeqDataSet(se, design = ~ treatment + horse_id + treatment:horse_id) 
I do not know whether or not to use the interaction between treatment and horse_id.

second doubt about the time series analysis:

Should I use nbimLRT() function to test the significance of multiple coefficients at once?

I tried:

ddsLRT <- DESeqDataSet(se, design = ~ treatment*horse_id)
design(ddsLRT) <- formula(~ treatment*horse_id)
ddsLRT <- estimateSizeFactors(ddsLRT)
ddsLRT <- estimateDispersions(ddsLRT)
ddsLRT <- nbinomLRT(ddsLRT, reduced = formula(~ treatment))
resLRT <- results(ddsLRT)
table(resLRT$padj < 0.1)

and the result was FALSE 22542 genes, meaning that none of the expressed genes were statistically significantly expressed, which is a results different from the pairwise analysis of time points separatedely. 

I hope I made myself clear and I really look forward to receiving any help/thoughts on this.

Many thanks,

Maithe Barros

ADD COMMENTlink modified 12 weeks ago by Michael Love15k • written 12 weeks ago by maithex0
1
gravatar for Michael Love
12 weeks ago by
Michael Love15k
United States
Michael Love15k wrote:

I'd recommend a LRT with a full design of ~horse + time and a reduced design of ~horse.

A couple notes here: I like to use short variable names without "_", this will reduce errors potentially (e.g. R doesn't allow "_" in column names, in case you ever want to store the results as a data.frame or the like).

The design above controls for differences between horse at time=0, and then tests to see if there are any differences at any time point. So you get a single p-value (although there are multiple coefficients fitted, one for each time point other than t=0). See the LRT section of the vignette and ?results which has a specific section on the LRT.

ADD COMMENTlink written 12 weeks ago by Michael Love15k

Thanks a lot for your reply, Michael. I have renamed my variables.I ran:

ddsLRT <- DESeqDataSet(se, design = ~ horse + timepoint)
ddsLRT <- estimateSizeFactors(ddsLRT)
ddsLRT <- estimateDispersions(ddsLRT)
ddsLRT <- nbinomLRT(ddsLRT, reduced = ~ horse)
resLRT <- results(ddsLRT)
table(resLRT$padj < 0.1)

and my result was 

FALSE  TRUE 
 6812 11837

log2 fold change (MLE): timepoint time72h vs control0h 
LRT p-value: '~ horse + timepoint' vs '~ horse' 
DataFrame with 26991 rows and 6 columns
                     baseMean log2FoldChange     lfcSE       stat       pvalue         padj
                    <numeric>      <numeric> <numeric>  <numeric>    <numeric>    <numeric>
ENSECAG00000000001 222.153134     -0.2119773 0.1507054   5.857054 0.1187757742  0.174482033
ENSECAG00000000002 227.264520      0.2389474 0.1314716   4.081881 0.2527571757  0.336834970
ENSECAG00000000003  43.521031      0.5022090 0.2916049   8.196995 0.0421111066  0.069049381
ENSECAG00000000004 123.625756     -0.2415447 0.1482973  16.996629 0.0007078716  0.001628157
ENSECAG00000000005   8.405233     -0.7764470 0.4533185   4.009984 0.2603880344  0.345326159
...                       ...            ...       ...        ...          ...          ...
ENSECAG00000027698   137.2158      -1.056968  3.548143  0.2138203 9.753280e-01 9.856341e-01
ENSECAG00000027699  5487.5252       3.261664  0.400470 61.2228495 3.220709e-13 2.356336e-12

but when I try  resultsNames(ddsLRTthe output is character(0). Therefore I think there is something wrog.

I also tried:

ddsTC <- DESeqDataSet(se, design = ~ horse + timepoint)
ddsTC <- DESeq(ddsTC, test="LRT", full= ~ horse + timepoint,  reduced = ~ horse)
resTC <- results (ddsTC)

and the result was the same: 
FALSE  TRUE 
 6812 11837 

log2 fold change (MLE): timepoint time72h vs control0h 
LRT p-value: '~ horse + timepoint' vs '~ horse' 
DataFrame with 26991 rows and 6 columns
                     baseMean log2FoldChange     lfcSE       stat       pvalue         padj
                    <numeric>      <numeric> <numeric>  <numeric>    <numeric>    <numeric>
ENSECAG00000000001 222.153134     -0.2119773 0.1507054   5.857054 0.1187757742  0.174482033
ENSECAG00000000002 227.264520      0.2389474 0.1314716   4.081881 0.2527571757  0.336834970
ENSECAG00000000003  43.521031      0.5022090 0.2916049   8.196995 0.0421111066  0.069049381
ENSECAG00000000004 123.625756     -0.2415447 0.1482973  16.996629 0.0007078716  0.001628157
ENSECAG00000000005   8.405233     -0.7764470 0.4533185   4.009984 0.2603880344  0.345326159
...                       ...            ...       ...        ...          ...          ...
ENSECAG00000027698   137.2158      -1.056968  3.548143  0.2138203 9.753280e-01 9.856341e-01
ENSECAG00000027699  5487.5252       3.261664  0.400470 61.2228495 3.220709e-13 2.356336e-12

but again with resultsNames(resTC) the output is character(0).

I did read the LRT section and ?results but I still do not understand why I do not have any info from resultsNames, to then generate results tables for individual effects, which must be individual elements of resultsNames. Is this indeed incorrect or am I missing something? 

ADD REPLYlink written 12 weeks ago by maithex0

You need to copy paste exactly the code you ran if you obtained an unexpected output. resultsNames() should be run on a DESeqDataSet (dds) not on a DESeqResults object (res). See ?resultsNames for the usage and example code.

ADD REPLYlink written 12 weeks ago by Michael Love15k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 238 users visited in the last hour