Question

tximport from RSEM expected counts: is it possible to import from a table rather than a file?

0

Ahdee ▴ 50

@ahdee-8938

Last seen 18 months ago

United States

Hi, based on this tutorial, https://bioconductor.org/packages/release/bioc/vignettes/tximport/inst/doc/tximport.html#rsem it looks like it's possible to import expected counts from RSEM. However, the per-sample files are not always available. For example, the GTEx set from UCSC Xena only provides a single table of expected counts, with genes as rows and samples as columns. Is there a way to use tximport with this table instead?

Thanks!

tximport limma rnaseq • 2.1k views

ADD COMMENT • link updated 3.1 years ago by Aubrey • 0 • written 3.9 years ago by Ahdee ▴ 50

score 1 · Answer 1 · 2020-05-15

1

Michael Love 41k

@mikelove

Last seen 3 hours ago

United States

No, it's designed only for per-sample files. Why not just use read.delim()? What do you need from tximport?
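As a rough illustration of the read.delim() route, the sketch below reads a combined genes-by-samples table into an integer matrix. The file contents, gene IDs, and sample names are invented for the example, and it assumes the table holds untransformed expected counts.

```r
# Toy stand-in for a combined expected-count table
# (genes in rows, samples in columns). All IDs and values are invented.
tsv <- tempfile(fileext = ".tsv")
writeLines(c(
  "gene_id\tsampleA\tsampleB",
  "ENSG0001\t10.4\t20.0",
  "ENSG0002\t0.0\t3.2"
), tsv)

# Read the table, using the first column as row names and
# leaving the sample names untouched.
tab <- read.delim(tsv, row.names = 1, check.names = FALSE)

# Convert to a matrix and round, because RSEM expected counts are
# generally not whole numbers.
counts <- round(as.matrix(tab))
storage.mode(counts) <- "integer"
```

The resulting `counts` matrix has one row per gene and one column per sample, which is the shape a count-based tool expects.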

ADD COMMENT • link 3.9 years ago Michael Love 41k

0

Yes, thanks; I usually just do that. I was just wondering whether tximport does something differently.

ADD REPLY • link 3.7 years ago Ahdee ▴ 50

0

So am I correct in hearing that tximport doesn't do anything with transcript lengths at this stage?

Per Kevin Blighe, the log2(x+1) transformation still needs to be reversed before running DESeq2 on, e.g., https://dev.xenabrowser.net/datapages/?dataset=gtex_Kallisto_est_counts&host=https%3A%2F%2Ftoil.xenahubs.net&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443
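A minimal sketch of reversing that transformation, assuming the values are stored as log2(x + 1). The matrix below is invented toy data, not taken from the Xena dataset.

```r
# Toy matrix of log2(count + 1) values; names and numbers are invented.
log_counts <- matrix(
  log2(c(0, 7, 15, 1023) + 1),
  nrow = 2,
  dimnames = list(c("tx1", "tx2"), c("sampleA", "sampleB"))
)

# Invert the transform: if y = log2(x + 1), then x = 2^y - 1.
est_counts <- 2^log_counts - 1

# Clamp any tiny negative values caused by floating-point error,
# then round to whole numbers.
est_counts <- round(pmax(est_counts, 0))
```

This only recovers the estimated counts; it does not recover transcript lengths or abundances, which are not part of such a table.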

ADD REPLY • link 3.1 years ago Aubrey • 0

0

I don't know what you're asking. In the above thread, we decided tximport cannot be used if the sample data is not available. What case are you referring to?

If you have access to aggregated, scaled, transformed data only, I wouldn't recommend tximport or DESeq2, as these are explicitly designed for when per-sample count data is available.

ADD REPLY • link 3.1 years ago Michael Love 41k

0

Ah, sorry. That was a bit unclear. I -think- that the Xena dataset that I linked is actually per-sample data, just aggregated into a single file.

As I understand it, that link is the result of 'cbind'-ing the first column (estimated counts) from Kallisto's h5dump.

Question 1: Assuming that the values are actually per-sample Kallisto "estimated counts" fields, would read.delim and DESeq2 be appropriate?

Question 2: Is there some feature (normalization, etc.?) in tximport to be gained by splitting this merged data.frame into one-file-per-sample TSVs in the format "ENST + est_counts + feature length" and using tximport, rather than just using read.delim?

ADD REPLY • link 3.1 years ago Aubrey • 0

0

1) If you want to collapse to gene-level and perform testing, you should account for gene length, either through an offset or scaledTPM. So if you have counts but not length or abundance, I wouldn't recommend this as input to DESeq2 (or, at least, it's not the tximport-recommended input).

2) Yes, read the tximport paper.
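To make the length point concrete, here is a rough base-R sketch of the idea behind scaledTPM-style counts (in tximport this corresponds to the `countsFromAbundance = "scaledTPM"` option). It is not tximport's own code, it skips the transcript-to-gene summarization step, and the gene names, counts, and lengths are all invented. The point is that both steps need per-sample lengths, which a counts-only table does not provide.

```r
# Toy per-sample inputs; all names and values are invented.
genes   <- c("geneA", "geneB", "geneC")
samples <- c("s1", "s2")
counts <- matrix(c(100, 200, 700,
                   300, 300, 400),
                 nrow = 3, dimnames = list(genes, samples))
# Effective lengths may differ between samples, e.g. when isoform usage changes.
len <- matrix(c(1000, 2000, 3500,
                1500, 2000, 2000),
              nrow = 3, dimnames = list(genes, samples))

# Abundance in TPM: length-normalized counts rescaled to sum to 1e6 per sample.
rate <- counts / len
tpm  <- t(t(rate) / colSums(rate)) * 1e6

# scaledTPM-style counts: abundances rescaled so that each column sums to
# that sample's original total count.
scaled_tpm <- t(t(tpm) * (colSums(counts) / colSums(tpm)))
```

In this sketch `scaled_tpm` keeps each sample's library size while removing the dependence on per-sample length that raw counts carry.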

ADD REPLY • link 3.1 years ago Michael Love 41k

0

Okay, seems like it should work. I'll go and do it correctly. I did skim the paper, but obviously I should read the entirety.

ADD REPLY • link 3.1 years ago Aubrey • 0