I ran the nf-core/rnaseq pipeline with the default star_salmon aligner on a strand-specific dataset, and I have a question about the gene counts produced by Salmon quantification. I am only interested in gene-level counts for downstream analysis, not isoforms. It seems the nf-core rnaseq pipeline is designed to import "counts_gene_length_scaled" (reference 1, reference 2) via tximport > DESeq2 > size_factors > vst. The pipeline generates a number of files, and I would like to know which of the files listed below is best to use in an edgeR DGEList. Probably "salmon.merged.gene_counts.rds"?
Before using this pipeline, I would start from the raw gene counts from featureCounts and use them in edgeR.
salmon.merged.gene_counts_length_scaled.tsv
salmon.merged.gene_counts.rds
salmon.merged.gene_counts_scaled.rds
salmon.merged.gene_counts_scaled.tsv
salmon.merged.gene_counts.tsv
salmon.merged.gene_tpm.tsv
salmon.merged.transcript_counts.rds
salmon.merged.transcript_counts.tsv
salmon.merged.transcript_tpm.tsv
salmon_tx2gene.tsv
Thank you Gordon Smyth
My collaborator asked me to test this pipeline with the edgeR package. We are interested in gene-level analysis only. It seems like salmon.merged.gene_counts.tsv could be a starting point in
Does it output Salmon files (directories with
That would be the easiest. The files you list above are already processed and are not ideal. The whole point of tximport is to take Salmon output files and prepare count matrices with effective gene length offsets. The gene length offsets account for changes in transcript length as well as biases such as sample-specific variation due to amplification or fragmentation.
It does output the Salmon files, and it is documented here:
The first bullet point is the easiest, and is a commonly used pipeline for getting Salmon quantification into R/Bioconductor for use with downstream count-based tools.
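As a rough illustration of what "counts with effective gene length offsets" means for edgeR, here is a minimal sketch that follows the approach described in the tximport vignette. The helper name dgelist_with_length_offsets and the simulated matrices are made up for this example; with real data, cts and len would be txi$counts and txi$length from tximport(files, type = "salmon", tx2gene = tx2gene), where files points to the quant.sf files.

```r
library(edgeR)

# Hypothetical helper: build a DGEList whose offsets combine
# average transcript length and effective library size.
# cts, len: genes x samples matrices (as in txi$counts, txi$length).
dgelist_with_length_offsets <- function(cts, len) {
  # Scale lengths so each gene's geometric mean across samples is 1
  normMat <- len / exp(rowMeans(log(len)))
  normCts <- cts / normMat
  # Effective library sizes computed from the length-corrected counts
  eff.lib <- calcNormFactors(normCts) * colSums(normCts)
  # Combine length factors and effective library sizes, on the log scale
  normMat <- log(sweep(normMat, 2, eff.lib, "*"))
  y <- DGEList(cts)
  scaleOffset(y, normMat)
}

# Toy data standing in for tximport output (simulated, not real counts)
set.seed(1)
nm <- list(paste0("gene", 1:100), paste0("s", 1:4))
cts <- matrix(rnbinom(400, mu = 100, size = 5), nrow = 100, dimnames = nm)
len <- matrix(runif(400, 500, 3000), nrow = 100, dimnames = nm)

y <- dgelist_with_length_offsets(cts, len)
```

The resulting y carries an offset matrix, so downstream edgeR functions such as estimateDisp and glmQLFit use it in place of plain library sizes.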
Alternatively, if you don't have access to the quant.sf files, you would load salmon.merged.gene_counts_length_scaled.tsv and use that as the count matrix input to edgeR.
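A minimal sketch of that fallback is below. It assumes the merged table has two annotation columns named gene_id and gene_name followed by one column per sample; check the header of your own file, since that layout is an assumption here. The toy table written to a temporary file only stands in for salmon.merged.gene_counts_length_scaled.tsv so that the example runs on its own. Because length-scaled counts already account for transcript length, they are passed as counts without an extra length offset.

```r
library(edgeR)

# Toy stand-in for salmon.merged.gene_counts_length_scaled.tsv
# (simulated values; the column layout is an assumption)
set.seed(1)
toy <- data.frame(
  gene_id   = paste0("gene", 1:50),
  gene_name = paste0("name", 1:50),
  sampleA   = round(rgamma(50, shape = 2, scale = 50), 3),
  sampleB   = round(rgamma(50, shape = 2, scale = 50), 3),
  sampleC   = round(rgamma(50, shape = 2, scale = 50), 3)
)
tsv <- tempfile(fileext = ".tsv")
write.table(toy, tsv, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the table; with real data, replace tsv with the path to your file
tab <- read.delim(tsv, check.names = FALSE)
annot_cols <- c("gene_id", "gene_name")

# Everything that is not annotation is treated as a sample column
cts <- as.matrix(tab[, !(colnames(tab) %in% annot_cols)])
rownames(cts) <- tab$gene_id

# Non-integer counts are accepted by DGEList
y <- DGEList(counts = cts, genes = tab[, annot_cols])
y <- calcNormFactors(y)
```

From here the usual edgeR steps (filterByExpr with your design, estimateDisp, model fitting) apply as they would for a featureCounts matrix.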
Michael Love, thank you. Yes, the pipeline generates quant.sf files too; however, those were deleted and only the files listed above were provided. As a workaround, I will use the salmon.merged.gene_counts_length_scaled.tsv file as the count matrix input in