Dear all,
I have a question about long-read transcript quantification and differential expression using DESeq2. First, under the scenario of long-read RNA-seq, we may obtain lots of individual-specific transcripts, which were not shared across all the samples, and it may further cause the mis-quantification of transcripts, for example, in the same gene, but we obtained a seemingly correct but actually erroneous expression level for transcripts with high sequence similarity. Therefore, I quantified all the transcripts at the sample level to avoid the abovementioned question and assigned non-existent transcripts in each sample to 0.
Second, I loaded all the transcript abundance from lr-kallisto into R using tximport, and was trying to conduct DESeq2 analysis,
# !!! this is for gene expression
## DESeq2; ~ 2mins
## 1.
dds = DESeqDataSetFromTximport(txi, colData=as.data.frame(files), design=~1) ## `1` indicated just normalization, not for differential analysis
register(MulticoreParam(16))
## 2.
dds = suppressWarnings(DESeq(dds, parallel=TRUE))
normed_cts = counts(dds, normalized=TRUE)
logged_normed_cts = log2(normed_cts + 1) ##
However, it reported that:
Error in DESeqDataSetFromTximport(txi, colData = as.data.frame(files), : all(lengths > 0) is not TRUE
Traceback:
1. DESeqDataSetFromTximport(txi, colData = as.data.frame(files),
. design = ~1)
2. stopifnot(all(lengths > 0))
I was wondering how to solve this issue. Any suggestion would be appreciated!
Hongpu,
Best regards.
The preprocessing part is not within the scope of Bioconductor, and for tximport one would need the command line. Consider recent benchmark studies for example https://www.nature.com/articles/s41592-023-02026-3 to guide your analysis pipeline and DE analysis choices.