Hi, I am performing differential expression analysis (to identify DEGs) using DESeq2. I have no replicates, and I understand that without replicates this analysis is essentially useless. However, I have no other option. I have written an R script for the no-replicate case. Could you please review it and give me your suggestions?
It would be a great help in completing my work.
```r
# Load required libraries
library(DESeq2)

# Read in the count data
count_data <- read.csv("count.csv", header = TRUE, row.names = 1)
head(count_data)

# Filter low count genes (keep genes with >= 10 reads in at least one sample)
keep <- rowSums(count_data >= 10) > 0
count_data_filtered <- count_data[keep, ]

# Create metadata
metaData <- data.frame("Condition" = c("COL", "COB"))
metaData$Condition = factor(metaData$Condition, levels = c("COL", "COB"))
rownames(metaData) = colnames(count_data)

# Check metadata
head(metaData)
all(rownames(metaData) == colnames(count_data))

# Create DESeqDataSet object
dds = DESeqDataSetFromMatrix(countData = count_data_filtered, colData = metaData, design = ~Condition)

# Estimate size factors
dds <- estimateSizeFactors(dds)

# Set a fixed dispersion value
dispersions(dds) <- 0.1

# Run the Wald test without dispersion estimation
dds <- nbinomWaldTest(dds)

# Get results
res <- results(dds)

# Order results by adjusted p-value
resOrdered <- res[order(res$padj), ]

# Summary of differential expression
summary(res)
```
The above-mentioned code produced the following results:
```
out of 19980 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 2170, 11%
LFC < 0 (down)     : 1033, 5.2%
outliers [1]       : 0, 0%
low counts [2]     : 0, 0%
(mean count < 4)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
```
This has been asked many times before, for example: DESeq2/Differential expression analysis without replicates
Please understand that you likely won't get code reviews for analyses that are not officially supported. Just do as in the linked thread: get vst-transformed counts, compute fold changes, and rank genes by them. Maybe additionally filter for genes with decent counts in at least one condition, to protect against false positives with large fold changes caused by small counts and lots of noise (even though vst should already deal with that to some extent).
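A minimal sketch of that vst-and-rank approach, assuming two samples in the COL/COB layout from the question. The simulated `count_data`/`metaData` at the top are hypothetical stand-ins so the snippet runs on its own; replace them with your real objects. The threshold of 10 reads is likely also an arbitrary choice, not a recommended value.

```r
library(DESeq2)

# Hypothetical stand-in data (2000 genes x 2 samples); replace with your own.
set.seed(1)
mu <- rlnorm(2000, meanlog = 5, sdlog = 1.5)
count_data <- matrix(rnbinom(4000, mu = rep(mu, 2), size = rep(1 / (0.05 + 2 / mu), 2)),
                     ncol = 2, dimnames = list(paste0("gene", 1:2000), c("COL", "COB")))
metaData <- data.frame(Condition = factor(c("COL", "COB"), levels = c("COL", "COB")),
                       row.names = colnames(count_data))

dds <- DESeqDataSetFromMatrix(countData = count_data, colData = metaData,
                              design = ~ Condition)

# blind = TRUE fits the dispersion trend with design ~1, i.e. both samples are
# treated as replicates of one group, which is what lets vst() run here.
vsd <- vst(dds, blind = TRUE)
vst_mat <- assay(vsd)

# Fold change COB vs COL on the VST scale (roughly log2 for higher counts).
lfc <- vst_mat[, vsd$Condition == "COB"] - vst_mat[, vsd$Condition == "COL"]

# Keep genes with a decent raw count in at least one sample (hypothetical cutoff).
max_count <- apply(counts(dds), 1, max)
ranked <- data.frame(gene = rownames(vst_mat), lfc = lfc, max_count = max_count)
ranked <- ranked[max_count >= 10, ]

# Rank by absolute fold change, largest first.
ranked <- ranked[order(abs(ranked$lfc), decreasing = TRUE), ]
head(ranked, 20)
```

This gives a ranking, not p-values: with one sample per condition there is nothing to estimate the noise from, so the list is best read as "candidates to validate".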
Hi,
Thank you for your reply.
I am currently a PhD scholar at a government college, working without any funding. Unfortunately, this has limited my ability to include replicates. With my personal savings and money borrowed from others, I could only afford RNA sequencing for two samples (each a pool of 10 samples). I fully acknowledge the challenges of analyzing data without replicates, and I am aware that this can affect the robustness of my findings. I hope that in the future, with better funding, such as during a postdoc, I will be able to conduct more comprehensive studies with replicates.
I have consulted others who have attempted RNA-seq analysis with single-sample comparisons, but I have not received any responses from them. I put together the code above by researching online platforms. Still, as I mentioned, I have doubts about its reliability.
Thanks again for your feedback and guidance.
I understand your situation is difficult, and investing your own money in your science is honorable. Unfortunately, none of this changes the fact that the results will be unreliable, one way or the other.
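One way to see this with the code from the question: the number of "significant" genes is driven largely by the fixed dispersion of 0.1, which is chosen rather than estimated. A minimal sketch that reruns the same Wald test over a grid of assumed dispersions; the simulated `count_data`/`metaData` and the grid values are hypothetical stand-ins, so swap in your own data.

```r
library(DESeq2)

# Hypothetical stand-in data (2000 genes x 2 samples); replace with your own.
set.seed(1)
mu <- rlnorm(2000, meanlog = 5, sdlog = 1.5)
count_data <- matrix(rnbinom(4000, mu = rep(mu, 2), size = rep(1 / (0.05 + 2 / mu), 2)),
                     ncol = 2, dimnames = list(paste0("gene", 1:2000), c("COL", "COB")))
metaData <- data.frame(Condition = factor(c("COL", "COB"), levels = c("COL", "COB")),
                       row.names = colnames(count_data))

dds <- DESeqDataSetFromMatrix(countData = count_data, colData = metaData,
                              design = ~ Condition)
dds <- estimateSizeFactors(dds)

# Same procedure as the question, repeated for several fixed dispersions.
disp_grid <- c(0.01, 0.05, 0.1, 0.2, 0.4)
n_sig <- sapply(disp_grid, function(d) {
  dispersions(dds) <- d            # one fixed value for every gene
  fit <- nbinomWaldTest(dds)
  sum(results(fit)$padj < 0.1, na.rm = TRUE)
})

data.frame(dispersion = disp_grid, n_padj_below_0.1 = n_sig)
```

You would expect the count to shrink as the assumed dispersion grows, which is the point: the "DEG" list reflects the value you picked as much as the biology.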
If you want a sort of "formal" analysis, check the edgeR User's Guide section on what to do without replicates, since it has example code similar to the code in your original question. Not that this makes it more reliable, but at least you can follow documented code and get some results. Again, the results will be unreliable, but if you have no other choice, at least you can present this as a "citable" analysis, in the sense that you can say you got it from a reputable tool.
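A minimal sketch in the spirit of that User's Guide example, which fixes the biological coefficient of variation (BCV, the square root of the dispersion) instead of estimating it. The simulated data are a hypothetical stand-in, and `bcv = 0.4` is only one of the typical values the guide mentions (roughly 0.4 for well-controlled human data, 0.1 for genetically identical model organisms, 0.01 for technical replicates); pick it for your own system. For comparison, the question's dispersion of 0.1 corresponds to a BCV of about 0.32.

```r
library(edgeR)

# Hypothetical stand-in data (2000 genes x 2 samples); replace with your own.
set.seed(1)
mu <- rlnorm(2000, meanlog = 5, sdlog = 1.5)
count_data <- matrix(rnbinom(4000, mu = rep(mu, 2), size = rep(1 / (0.05 + 2 / mu), 2)),
                     ncol = 2, dimnames = list(paste0("gene", 1:2000), c("COL", "COB")))

# Group levels set so logFC is COB relative to COL.
y <- DGEList(counts = count_data,
             group = factor(c("COL", "COB"), levels = c("COL", "COB")))
y <- calcNormFactors(y)   # TMM normalization

# Assumed BCV; dispersion = BCV^2 is passed directly instead of being estimated.
bcv <- 0.4
et <- exactTest(y, dispersion = bcv^2)

topTags(et, n = 20)
```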