I would like to continue a topic that was first raised in this Biostars post. Essentially, in an attempt to help the OP of that topic, I brought up the question of downstream analysis of data from full-length RNA-seq protocols such as Smart-Seq2 when one uses
Salmon (in quasi-mapping mode) and
The main question some of us were discussing is: what is the best protocol for using tximport with tools such as Seurat under these analytical conditions? (Perhaps the tximport vignette could benefit from a small section on this, as it already has for DESeq2, edgeR, etc.)
A few points I brought up were:
- When I searched for tximport together with single-cell, Seurat, etc., alevin usually comes up. However, it is important to realize that much of the 10x Genomics technology is 3'-tagged RNA-seq and thus does not have the length bias that would be present in a Smart-Seq protocol (so passing txi$counts as raw counts when importing data into Seurat makes perfect sense). Of course, with Smart-Seq data we wouldn't use alevin at all, but just salmon, in the same way as for bulk RNA-seq.
- Thus, my understanding is that the correct steps for a Smart-Seq/full-length protocol would be to 1) import the data with countsFromAbundance="lengthScaledTPM", which would result in counts normalized for sequencing depth and length, stored in txi$counts, which can then 2) be passed on to Seurat's counts. NOTE: I originally had in mind that one would likely want to do this with txOut=FALSE to get gene-level data, as I am not sure single-cell algorithms are sensitive enough for transcript-level analysis, DE, etc., but perhaps this would be a good place to get confirmation. 3) In Seurat, if one imports with countsFromAbundance="lengthScaledTPM", one should likely follow the advice given by the Seurat team for starting from TPMs (this is from their GitHub issue #668; I don't think the last answer is from a Seurat team member, but satijalab approved it with a reaction): skip the Seurat::NormalizeData() step, but transform the data to log scale (which is stored in object@data) prior to ScaleData. Note also that the log scale in Seurat is natural log.
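To make the steps above concrete, here is a minimal R sketch of what I have in mind. It is not a validated recommendation. The names quant_dirs, tx2gene, import_smartseq and seurat_from_txi are hypothetical, as is the per-cell directory layout. I am assuming Seurat v3/v4-style accessors (GetAssayData/SetAssayData with a slot argument; newer releases prefer layer), and the use of log1p (natural log with a pseudocount of 1) is my assumption about what "transform to log scale" should mean here.

```r
# Sketch only: assumes one Salmon output directory per cell and a
# transcript-to-gene table; none of these inputs come from the original post.
library(tximport)
library(Seurat)

# quant_dirs: named character vector of Salmon output directories (hypothetical)
# tx2gene:    two-column data frame, transcript ID then gene ID (hypothetical)
import_smartseq <- function(quant_dirs, tx2gene) {
  files <- file.path(quant_dirs, "quant.sf")
  names(files) <- names(quant_dirs)
  # Step 1: gene-level import with length-scaled TPM-derived counts
  tximport(files,
           type = "salmon",
           tx2gene = tx2gene,
           txOut = FALSE,
           countsFromAbundance = "lengthScaledTPM")
}

seurat_from_txi <- function(txi) {
  # Step 2: pass txi$counts as the counts matrix
  seu <- CreateSeuratObject(counts = txi$counts)
  # Step 3: skip NormalizeData(); store natural-log values in the data slot
  counts <- GetAssayData(seu, slot = "counts")
  seu <- SetAssayData(seu, slot = "data", new.data = log1p(counts))
  seu
}

# Intended usage (not run here, since it needs real Salmon output):
# txi <- import_smartseq(quant_dirs, tx2gene)
# seu <- seurat_from_txi(txi)
# seu <- FindVariableFeatures(seu)
# seu <- ScaleData(seu)
```

Whether log1p of the lengthScaledTPM counts is really the right input for ScaleData is exactly what I would like to have confirmed.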
I believe that captures the main point from the follow-up discussion. Thanks for any advice (special thanks to Michael Love who suggested we post a question here).