Hi,
I ran the nf-core/rnaseq pipeline with the default star_salmon aligner on a strand-specific dataset. I have a question about the gene counts obtained from Salmon quantification. I am interested only in gene-level counts for downstream analysis, not isoforms. It seems the nf-core/rnaseq pipeline is designed to import "counts_gene_length_scaled" (reference 1, reference 2) via tximport > DESeq2 > size_factors > vst. The pipeline generates a number of files, and I would like to know which of the files listed below is best to use in an edgeR DGEList. Probably "salmon.merged.gene_counts.rds"?
Before using this pipeline, I used to start from the raw gene counts from featureCounts and then use them in edgeR.
salmon.merged.gene_counts_length_scaled.tsv
salmon.merged.gene_counts.rds
salmon.merged.gene_counts_scaled.rds
salmon.merged.gene_counts_scaled.tsv
salmon.merged.gene_counts.tsv
salmon.merged.gene_tpm.tsv
salmon.merged.transcript_counts.rds
salmon.merged.transcript_counts.tsv
salmon.merged.transcript_tpm.tsv
salmon_tx2gene.tsv
Thank you,
Toufiq
Thank you, Gordon Smyth.
My collaborator asked me to test this pipeline with the edgeR package. We are interested in gene-level analysis only. It seems like salmon.merged.gene_counts.tsv could be a starting point in edgeR.
Does it output Salmon files (directories with quant.sf in them)? That would be the easiest. The files you list above are processed and not ideal. The whole point of tximport is to take Salmon output files and prepare count matrices with effective gene length offsets. The gene length offsets account for changes in transcript length as well as biases such as sample-specific variation based on amplification or fragmentation.
It does output the Salmon files, and it is documented here:
https://nf-co.re/rnaseq/output#pseudo-alignment-and-quantification
The first bullet point is the easiest, and is a commonly used route for getting Salmon quantification into R/Bioconductor for use with downstream count-based tools.
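For readers who still have the quant.sf files, the route described above might look roughly like the sketch below. It follows the edgeR recipe from the tximport vignette: import with tximport, then turn the average transcript lengths into an offset matrix. The function name salmon_to_dgelist and the files and tx2gene arguments are hypothetical names chosen for illustration, and the paths in the usage comment are placeholders, not the pipeline's guaranteed layout.

```r
# Sketch only: import Salmon quant.sf files and build an edgeR DGEList
# carrying gene-length offsets (adapted from the tximport vignette).
# Requires the Bioconductor packages tximport and edgeR when called.
#
# files   : named character vector of quant.sf paths, names = sample IDs
#           (hypothetical argument name)
# tx2gene : data frame whose first two columns are transcript ID and gene ID
salmon_to_dgelist <- function(files, tx2gene) {
  txi <- tximport::tximport(files, type = "salmon", tx2gene = tx2gene)

  cts <- txi$counts        # estimated gene-level counts
  norm_mat <- txi$length   # average transcript length per gene and sample

  # Scale the lengths so each gene has geometric mean 1 across samples.
  norm_mat <- norm_mat / exp(rowMeans(log(norm_mat)))
  norm_cts <- cts / norm_mat

  # Effective library sizes computed from the length-corrected counts.
  eff_lib <- edgeR::calcNormFactors(norm_cts) * colSums(norm_cts)

  # Combine library sizes with the length factors, on the log scale.
  norm_mat <- log(sweep(norm_mat, 2, eff_lib, "*"))

  y <- edgeR::DGEList(cts)
  edgeR::scaleOffset(y, norm_mat)  # store the offsets in the DGEList
}

# Usage (not run; sample names and paths are placeholders):
# files <- c(sampleA = "salmon/sampleA/quant.sf",
#            sampleB = "salmon/sampleB/quant.sf")
# tx2gene <- read.delim("salmon_tx2gene.tsv", header = FALSE)  # assumes no header row
# y <- salmon_to_dgelist(files, tx2gene)
```

With the offsets stored in the DGEList, the counts stay on their original scale and the length correction enters the model through the offset. Filtering and dispersion estimation would follow as usual.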
Alternatively, if you don't have access to the quant.sf files, you would load salmon.merged.gene_counts_length_scaled.tsv and use that as the count matrix input to edgeR.
Michael Love, thank you. Yes, the pipeline generates quant.sf files too; however, those were deleted and only the files listed above were provided. As a workaround, I will use the salmon.merged.gene_counts_length_scaled.tsv file as the input count matrix in R.
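A minimal sketch of that workaround, in case it helps others landing here. It assumes the merged table has a gene_id column, a gene_name column, and then one column per sample; please check the header of your own file, since this layout is an assumption and may differ between pipeline versions. The helper names read_gene_counts and make_dgelist and the group factor are hypothetical, introduced only for illustration.

```r
# Sketch only: read the merged gene-level table into a numeric matrix.
# Assumption: columns are gene_id, gene_name, then one column per sample.
read_gene_counts <- function(path, id_col = "gene_id", drop_cols = "gene_name") {
  tab <- read.delim(path, check.names = FALSE, stringsAsFactors = FALSE)
  sample_cols <- setdiff(colnames(tab), c(id_col, drop_cols))
  counts <- as.matrix(tab[, sample_cols, drop = FALSE])
  storage.mode(counts) <- "numeric"
  rownames(counts) <- tab[[id_col]]
  counts
}

# Build an edgeR DGEList from that matrix (requires edgeR when called).
# group: hypothetical factor of conditions, one entry per sample column.
make_dgelist <- function(counts, group) {
  y <- edgeR::DGEList(counts = counts, group = group)
  keep <- edgeR::filterByExpr(y)          # drop genes with too few reads
  y <- y[keep, , keep.lib.sizes = FALSE]  # recompute library sizes
  edgeR::calcNormFactors(y)               # TMM normalization factors
}

# Usage (not run; the group labels are placeholders):
# counts <- read_gene_counts("salmon.merged.gene_counts_length_scaled.tsv")
# y <- make_dgelist(counts, group = factor(c("ctrl", "ctrl", "trt", "trt")))
```

Because the length-scaled counts already fold in the gene-length correction, no offset matrix is attached here, which is the difference from the quant.sf route.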