I performed a Nascent RNAseq experiment, which yielded TotalCounts and NascentCounts (Tc) for each gene. I have two conditions (Control vs Tx), each with 3 biological replicates. I analyzed the data with DESeq2 using posts about ChIP-seq and Ribo-seq as a guide.
My results surprised me: the top genes with significant LFC are the result of changes in TotalCounts, while NascentCounts stayed the same. I would like to look at genes that show significant LFC due to changes in NascentCounts, while TotalCounts stay similar.
Is there a way for me to subset this data to accomplish this?
Alternatively, I could compare TotalCounts in an independent, one-factor, DESeq dataset. I could then extract the list of genes that showed no significant changes, and use this list to subset my two-factor dataset.
Thanks for any help.
P.S. I've simplified sample names and output results in the code below; there m
dds <- DESeqDataSetFromMatrix(countData = countDataBoth,
colData = LUT,
design = ~ Assay + Treatment + Assay:Treatment)
results[order(results$padj, -results$log2FoldChange),]
| Gene | log2FoldChange | padj |
|---------|----------------|-----------|
| YJR145C | -1.4184356 | 2.97E-109 |
| YMR116C | -1.5101331 | 4.91E-101 |
| YOL120C | -1.3099833 | 5.05E-96 |
| YNL209W | -1.3878594 | 6.14E-84 |
| YGL103W | -1.500131 | 1.71E-81 |
#If we look at the countData for these genes...
| Gene | Control1_Total | Control2_Total | Control3_Total | Tx1_Total | Tx2_Total | Tx3_Total | Control1_Tc | Control2_Tc | Control3_Tc | Tx1_Tc | Tx2_Tc | Tx3_Tc |
|---------|----------------|----------------|----------------|-----------|-----------|-----------|-------------|-------------|-------------|--------|--------|--------|
| YJR145C | 15143 | 12567 | 13793 | 7243 | 7225 | 6210 | 1723 | 1382 | 1435 | 1682 | 1759 | 1320 |
| YMR116C | 19678 | 17564 | 18366 | 7738 | 7754 | 6832 | 2594 | 2041 | 2165 | 2188 | 2200 | 1720 |
| YOL120C | 24257 | 19979 | 21604 | 12131 | 11822 | 10011 | 2693 | 2167 | 2326 | 2606 | 2631 | 2004 |
| YNL209W | 17890 | 15172 | 16693 | 9371 | 10003 | 7861 | 2576 | 1997 | 2191 | 2698 | 2854 | 2065 |
| YGL103W | 38008 | 32378 | 35210 | 20185 | 21474 | 16466 | 2862 | 2353 | 2501 | 3393 | 3558 | 2515 |
Hi Michael,
To set the reference level, you use relevel correct? I already do this for my Treatment group - is it possible to relevel on Assay at well?
How would you recommend I define "near 0" for the subset step? Also, do you have an easy way to extract the list of genes? My code is generating a TRUE/FALSE logical vector that lacks the gene names, and I can't figure out how to get
results@rownames
instead.Thanks!
You need to relevel before running
DESeq()
.You could use altHypothesis="lessAbs" (see vignette) for near 0.
results() tables are as long as the
dds
so you can do logical indexing:This works because res and res2 are the same number of rows.
Hi Michael. I ran in to some issues; tried to break it up with relevant code.
I constructed my dds using a reduced model, which requires test="LRT". However, it looks like altHypothesis=lessAbs requires test="Wald'. Perhaps I do not need to be using a reduced model?
Or, if I do need to use reduced model, should I construct two separate DEseq datasets? For instance:
However, this data set still appears to contain genes where there are large changes in TotalReads between Treatment groups, even though the p-values are now 1. Did this comparison generate the genes that I should ignore? Was expecting it to generate the opposite (genes of interest: genes where TotalReads do not change between Treatment group).
Yes, you need to use Wald test if you want to use
lfcThreshold
andaltHypothesis
. So justDESeq(dds)
is sufficient: noreduced
, ortest
, etc.test="Wald"
is the default so you don't need to specify.Just remember, you asked to find genes where there are not large changes. You set the alternative hypothesis that the LFC is less than in absolute value from 0.1. So the null hypothesis is that the LFC are larger than that. Large p-value --> large changes, small p-value --> LFC close to 0. You would then take the genes with
padj
< X for whatever your desired FDR bound is. These are the genes that are not changing by more than 0.1.I think I was too stringent with my lfcThreshold, which was returning zero genes with a padj < 0.05. I increased lfcThreshold to 0.5 and now have >2000 genes with padj < 0.05. Is there an appropriate statistical test that can be run to determine the ideal lfcThreshold for a given DESeqResults?
No the threshold should be biologically motivated. You can think about 2^X and think about what fold change is biological relevant (or irrelevant).