Question

RSEM - tximport; where is effective_length information used? Implications for scRNA?

0

Kelen ▴ 10

@kelen-24047

Last seen 2.6 years ago

United Kingdom

Hi!

With bulk RNA-seq I am used to using STAR-RSEM (gene-level)-tximport-DESeq2 as a standard workflow; it is well documented, and via tximport the effective_length from RSEM is also 'taken into account'. Now I am working with scRNA-seq and I am confused about how to import the RSEM counts for analysis. Looking at the source code of tximport, I don't see effective_length being used for anything other than being included as a variable in the resulting object.

I would like to know when, where and how the effective_length information from RSEM is incorporated. Does this benefit scRNA-seq, and what is the advised way of using RSEM count estimates in single-cell RNA-seq?

Data: full-length; Smart-seq2

I am very grateful for any input on this, thanks!

RSEM tximport scRNAseq DESeq2 • 2.1k views

ADD COMMENT • link 2.7 years ago Kelen ▴ 10

score 0 · Answer 1 · 2022-06-24

0

ATpoint ★ 4.7k

@atpoint-13662

Last seen 15 hours ago

Germany

What kind of scRNA-seq is this? Full-length or end-tagged?

ADD COMMENT • link 2.7 years ago ATpoint ★ 4.7k

1

ATpoint is correct -- it's not enough to say single-cell. In the tximport vignette we have a note about transcript length correction for 3' tagged scRNA-seq, which is worth a read:

<https://bioconductor.org/packages/release/bioc/vignettes/tximport/inst/doc/tximport.html>

ADD REPLY • link 2.7 years ago Michael Love 43k

0

Hi!

Thank you both for the comments. It is indeed full-length (I have updated the question).

ADD REPLY • link 2.7 years ago Kelen ▴ 10

0

It depends on the downstream tool, but the default pipelines in the tximport and tximeta vignettes incorporate the effective length as a statistical offset (in DESeq2 or edgeR, for example). What tool were you planning on using downstream of import? Are you planning gene- or isoform-level analysis?

ADD REPLY • link 2.7 years ago Michael Love 43k

0

Thank you. Gene-level analysis. I don't yet have a specific downstream pipeline in mind - I am trying to understand what limitations I might run into, or what tools I can/can't use. My intention with this question was just to get a better understanding of processes that happen between RSEM-tximport-DESeq2 in the context of scRNA; am I correct in interpreting that tximport itself doesn't use the length information, but this is used within DESeq2's normalization?

More specific context, which prompted me to look into it: I am following the OSCA books for analysis guidance (tximported RSEM counts; Smart-seq2; no spike-ins), and my normalization results with scran keep coming back wonky. After normalization I end up with a plate-wise batch effect on the t-SNE for plates that contain the same cell clones (essentially "replicates": I have a number of clones, and cells from each clone were sorted onto two plates), which is not present in the t-SNE of the non-normalized counts. So I thought perhaps there are upstream steps I am not accounting for, e.g. a length offset during normalization, as with bulk data through DESeq2.

ADD REPLY • link 2.7 years ago Kelen ▴ 10

1

tximport imports the length information, then if you see the pipelines in the vignette, the information is passed off to the different packages for it to be included in the offset in the GLM. Likewise for tximeta and its vignette.
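One way to read "included in the offset in the GLM" is sketched below in generic notation. This is not the exact parameterization of DESeq2 or edgeR, and all symbols are assumptions introduced here for illustration. For gene $i$ in sample $j$, with count $K_{ij}$ and expected value $\mu_{ij}$:

$$
\log \mu_{ij} = x_j^\top \beta_i + o_{ij}, \qquad o_{ij} = \log s_j + \log \frac{L_{ij}}{\tilde{L}_i}
$$

Here $x_j$ is the design row for sample $j$, $\beta_i$ are the coefficients being tested, $s_j$ is a library-size factor, $L_{ij}$ is the effective length of gene $i$ in sample $j$, and $\tilde{L}_i$ is the geometric mean of $L_{ij}$ across samples. The offset $o_{ij}$ is a fixed term rather than an estimated one, so per-sample length differences are accounted for before $\beta_i$ is fit.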

I don't think the effective length will be the cause of the issue you observe. Typically the offset doesn't do very much, because there is no drastic and systematic isoform switching in most experiments. The offset comes into play when there is in fact dramatic isoform switching, so that gene-level differential expression analysis can still be performed in a robust manner.
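To make the isoform-switching point concrete, here is a minimal toy sketch in plain Python (not R, and not the tximport code itself). It assumes the gene-level effective length is an abundance-weighted average of transcript lengths and that the length part of the offset is the log length relative to its geometric mean across samples. The function names, the isoform lengths and the abundances are all hypothetical.

```python
import math

def gene_effective_length(tx_lengths, tx_abundance):
    """Abundance-weighted average of transcript effective lengths for one sample."""
    total = sum(tx_abundance)
    return sum(l * a / total for l, a in zip(tx_lengths, tx_abundance))

def length_offsets(gene_lengths):
    """Log gene length per sample, centered on the mean log length across samples."""
    logs = [math.log(x) for x in gene_lengths]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]

# Hypothetical gene with a short and a long isoform (effective lengths in bp).
tx_lengths = [500.0, 2000.0]

# Per-sample isoform abundances (e.g. TPM), one [short, long] pair per sample.
# Case 1: isoform proportions are the same in every sample (only depth differs).
stable = [[10.0, 30.0], [20.0, 60.0], [10.0, 30.0], [20.0, 60.0]]
# Case 2: samples 3 and 4 switch from the long to the short isoform.
switching = [[10.0, 30.0], [20.0, 60.0], [30.0, 10.0], [60.0, 20.0]]

stable_lengths = [gene_effective_length(tx_lengths, s) for s in stable]
switching_lengths = [gene_effective_length(tx_lengths, s) for s in switching]

offsets_stable = length_offsets(stable_lengths)
offsets_switching = length_offsets(switching_lengths)

print("stable lengths:   ", stable_lengths)
print("stable offsets:   ", offsets_stable)
print("switching lengths:", switching_lengths)
print("switching offsets:", offsets_switching)
```

In the stable case the gene length is identical in every sample, so the length offsets are all zero and the correction changes nothing. In the switching case the gene is effectively shorter in samples 3 and 4 (875 vs 1625 bp), giving offsets of roughly +0.31 and -0.31 on the natural-log scale, which is the situation where the offset protects gene-level testing.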

ADD REPLY • link 2.7 years ago Michael Love 43k

0

Perfect, thanks!

ADD REPLY • link 2.7 years ago Kelen ▴ 10

0

Thanks ATpoint! The data is full-length, but regardless of this, the bits that confuse me (highlighted questions) are quite general I think.

ADD REPLY • link 2.7 years ago Kelen ▴ 10