Hi,
I am writing to inquire about the input for DESeq from Salmon using tximport/tximeta.
I ran the nf-core/rnaseq pipeline for RNA data analysis. Unfortunately, I mistakenly deleted the quant.sf and transcriptome BAM files, and I only have the final outputs: gene_lengths, gene_tpm, gene_counts, gene_counts_length_scaled.tsv, and gene_counts_scaled.tsv.
From my observations of the nf-core pipeline, the process is as follows: quant.sf -> tximport -> summarizeToGene (tximeta).
In the code, the nf-core team defines the pattern for file names based on the quantification type and imports the transcript-level quantifications using tximport. They also create a SummarizedExperiment object and process gene-level data using summarizeToGene if a mapping (tx2gene) is available.
The nf-core team uses gene_counts_length_scaled.tsv as the input for the following code:
dds <- DESeqDataSetFromMatrix(countData=round(counts), colData=coldata, design=~1)
dds <- estimateSizeFactors(dds)
Since the gene_counts_length_scaled.tsv file has already been normalized based on library size and gene length, the nf-core team visualizes the heatmap and PCA without rerunning dds <- DESeq(dds).
In my case, I need to compare two groups based on a specific list of genes, so I need to run dds <- DESeq(dds) manually. DESeq requires raw counts, which leads me to believe that my input should be gene_counts.tsv.
Is my solution correct?
Best regards,
Nhu
Thank you to the author for the quick reply. If I use gene_counts_length_scaled.tsv, which has already been normalized for library size and average transcript length, what additional normalization will dds <- DESeq(dds) apply? Will this affect my final results? I think that pre-normalized counts (like length-scaled counts) can lead to over-normalization when DESeq2 applies its own methods on top of existing adjustments, potentially distorting expression values.
I want to observe which genes are upregulated and downregulated, so I need the log2FoldChange and padj. Below is the nf-core code.
No, only for average transcript length, not for depth or library composition. The output is still raw counts, so you can use DESeq2 without any modifications.
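To make the "depth or library composition" point concrete: DESeq2's default estimateSizeFactors uses a median-of-ratios approach to account for sequencing depth and composition. The base R sketch below only illustrates that idea on a toy matrix; it is not DESeq2's implementation, and the function name size_factors and the toy values are made up for illustration.

```r
# Illustrative median-of-ratios size factors in base R (not DESeq2's own code).
# `counts` is a genes x samples matrix of non-negative counts.
size_factors <- function(counts) {
  log_counts <- log(counts)
  # Log of each gene's geometric mean across samples
  log_geo_means <- rowMeans(log_counts)
  # Genes with a zero in any sample have a geometric mean of zero; skip them
  keep <- is.finite(log_geo_means)
  # Per sample: median over genes of (count / geometric mean), on the log scale
  apply(log_counts[keep, , drop = FALSE], 2, function(col) {
    exp(median(col - log_geo_means[keep]))
  })
}

# Toy example: sample b is sample a sequenced twice as deeply
toy <- cbind(a = c(10, 20, 30, 40), b = c(20, 40, 60, 80))
size_factors(toy)
```

In the toy matrix every count in b is twice the count in a, so the size factor for b comes out twice as large as the one for a. Dividing each column by its size factor puts the two samples on a common scale, which is the depth correction that length scaling alone does not provide.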
Agree.
Because of the way DESeq2 handles the average transcript length correction, it cannot "over-normalize".
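Following the replies above, a two-group comparison on the length-scaled counts could look roughly like the sketch below. It assumes the TSV has gene_id and gene_name columns followed by one column per sample, and that a sample sheet with sample and condition columns exists. The function name run_deseq2, the file samples.csv, the condition levels and the gene IDs in the usage comment are all hypothetical placeholders, not part of the nf-core pipeline.

```r
library(DESeq2)

# Sketch: DESeq2 on gene-level length-scaled counts, then subset to genes of interest.
# counts_file  : TSV with gene_id, gene_name, then one column per sample (assumed layout)
# samples_file : CSV with columns `sample` and `condition` (hypothetical sample sheet)
run_deseq2 <- function(counts_file, samples_file, genes, numerator, denominator) {
  tab <- read.delim(counts_file, check.names = FALSE)
  counts <- as.matrix(tab[, !(colnames(tab) %in% c("gene_id", "gene_name"))])
  rownames(counts) <- tab$gene_id

  # Put the sample sheet in the same order as the count columns
  coldata <- read.csv(samples_file, row.names = "sample")
  coldata <- coldata[colnames(counts), , drop = FALSE]
  coldata$condition <- factor(coldata$condition)

  # Round to integers, as in the nf-core snippet, but with a real design
  dds <- DESeqDataSetFromMatrix(countData = round(counts),
                                colData = coldata,
                                design = ~ condition)
  dds <- DESeq(dds)

  # log2 fold change of numerator over denominator
  res <- results(dds, contrast = c("condition", numerator, denominator))
  as.data.frame(res[rownames(res) %in% genes, c("log2FoldChange", "padj")])
}

# Hypothetical usage:
# run_deseq2("gene_counts_length_scaled.tsv", "samples.csv",
#            genes = c("GENE_A", "GENE_B"),
#            numerator = "treated", denominator = "control")
```

The model is fitted on all genes and the table is subset only at the end, so the returned padj values are the ones computed across the full results table.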
Thank you both. Special thanks to Prof. Michael Love. It's rare to find tool authors who respond to users so quickly and supportively. Many thanks for your tool and your contribution.