Hello,
I would like to continue a topic that was first started on this Biostars post. Essentially, in an attempt to help the OP from that topic, I brought up the point of downstream data analysis for full-length RNA-seq protocols such as Smart-Seq2 when one uses Salmon (in quasi-mapping mode) and tximport.
The main question some of us were discussing is: what is the best protocol for using tximport with tools such as Seurat under these analytical conditions? (Perhaps the tximport vignette could also benefit from a small section on this, like the ones it has for DESeq2, edgeR, etc.)
A few points I brought up were:
- When I searched for tximport together with single-cell, Seurat, etc., alevin usually comes up. However, it's important to realize that much of the 10x Genomics tech is 3'-tagged RNA-seq, and thus does not have the length biases that would be present in a Smart-Seq protocol (so passing the raw txi$counts as raw counts when importing data into Seurat makes perfect sense). Of course, with Smart-Seq data we wouldn't even use alevin, but instead just salmon, in the same way as it is done with bulk RNA-seq.
- Thus, my understanding is that the correct steps for a Smart-Seq/full-length protocol would be:
  1) Import the data with the tximport setting countsFromAbundance="lengthScaledTPM", which would then result in counts that are normalized for sequencing depth and length. These would be stored in txi$counts.
  2) Pass txi$counts on to Seurat's CreateSeuratObject as counts. NOTE: I originally had in mind that one would likely want to do this with txOut=FALSE to have gene-level data, as I am not quite sure single-cell algorithms are sensitive enough for transcript-level analysis, DE, etc. But perhaps this would be a good place to get confirmation of that.
  3) In Seurat, if one imports txi$counts generated with countsFromAbundance="lengthScaledTPM", then one should likely follow the advice given by the Seurat team for starting with TPMs (this info is from their GitHub issue #668; I don't think the last answer is from a Seurat team member, but it was approved by the satijalab in the reaction). That advice is to skip the Seurat::NormalizeData() step, but to transform the data to log scale (which is stored in object@metadata) prior to ScaleData. Also note that log scale in Seurat is natural log.
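Steps 1) and 2) above could be sketched roughly as follows. This is only an illustration of the proposed workflow, not a confirmed recommendation: the wrapper name import_smartseq and its arguments quant_files and tx2gene are hypothetical, and the call to CreateSeuratObject assumes a Seurat version (v3 or later) that takes a counts argument.

```r
# Sketch only: import_smartseq, quant_files and tx2gene are hypothetical names.
# quant_files: named character vector of paths to Salmon quant.sf files,
#              one per cell.
# tx2gene:     two-column data.frame mapping transcript IDs to gene IDs.
import_smartseq <- function(quant_files, tx2gene, project = "smartseq") {
  # Step 1: gene-level import with length-scaled counts.
  txi <- tximport::tximport(
    quant_files,
    type = "salmon",
    tx2gene = tx2gene,
    txOut = FALSE,
    countsFromAbundance = "lengthScaledTPM"
  )
  # Step 2: hand the resulting count matrix to Seurat.
  Seurat::CreateSeuratObject(counts = txi$counts, project = project)
}
```

Whether step 3) (skipping NormalizeData) is appropriate for these counts is exactly the open question here, so the sketch stops at object creation.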
I believe that captures the main point from the follow-up discussion. Thanks for any advice (special thanks to Michael Love who suggested we post a question here).

Hi Michael, makes sense, thanks! On the last point in particular, I appreciate the clarification (I got confused with the TPM input), but it makes sense when I think about how we utilize it in bulk RNA-seq with DESeq2 (e.g., even though we import with DESeqDataSetFromTximport, DESeq2 still performs library sequencing depth normalization). Thanks again!
Sure!
You’ll notice the column sum of the counts matrices is always the same: the number of mapped reads.
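A small base-R sketch can make this concrete. It assumes that length-scaled counts are obtained by multiplying TPMs by the average transcript length across samples and then rescaling each column to the original count total; the numbers are made-up toy values, and the variable names are only for illustration.

```r
# Toy inputs (made-up numbers): 3 transcripts x 2 cells.
counts <- matrix(c(100, 50, 850,
                   400, 100, 1500),
                 nrow = 3,
                 dimnames = list(c("tx1", "tx2", "tx3"), c("cellA", "cellB")))
eff_len <- matrix(c(500, 1000, 2000,
                    600, 1000, 1800),
                  nrow = 3, dimnames = dimnames(counts))

# TPM from counts and effective lengths.
rate <- counts / eff_len
tpm <- t(t(rate) / colSums(rate)) * 1e6

# Length-scaled counts: TPM times the average length across samples,
# then each column rescaled to the original column total.
length_scaled <- tpm * rowMeans(eff_len)
scaled_counts <- t(t(length_scaled) *
                     (colSums(counts) / colSums(length_scaled)))

print(colSums(counts))         # per-cell totals of the original counts
print(colSums(scaled_counts))  # same per-cell totals after length scaling
```

Because each cell keeps its own total, the scaled counts still differ in depth between cells, which is consistent with depth normalization still being applied downstream.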