Hi, based on this tutorial, https://bioconductor.org/packages/release/bioc/vignettes/tximport/inst/doc/tximport.html#rsem, it looks like it's possible to import expected counts from RSEM. However, the per-sample files are not always available; for example, the GTEx set from UCSC Xena only provides a table of expected counts, with genes as rows and samples as columns. Is there a way to use tximport with this table instead?
Thanks!
Yes, thanks; I usually just do that. I was just wondering whether tximport does something differently.
So am I correct in hearing that tximport doesn't do anything with transcript lengths at this stage?
Per Kevin Blighe, the log2(x+1) transformation still needs to be reversed before running DESeq2 on, e.g., https://dev.xenabrowser.net/datapages/?dataset=gtex_Kallisto_est_counts&host=https%3A%2F%2Ftoil.xenahubs.net&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443
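A minimal sketch of reversing that transformation, assuming the table stores y = log2(x + 1) (worth confirming against the dataset's stated unit). The function names are made up for illustration, and Python is used only to show the arithmetic; the same one-liner applies in R.

```python
import math


def undo_log2_plus_one(value):
    """Invert y = log2(x + 1), i.e. return x = 2**y - 1."""
    return 2.0 ** value - 1.0


def to_integer_counts(log_row):
    """Back-transform one row of log2(x + 1) values to non-negative integers.

    Rounding is an assumption made here because DESeq2 expects integer
    counts; max(0, ...) guards against tiny negatives from float error.
    """
    return [max(0, round(undo_log2_plus_one(v))) for v in log_row]


# Hypothetical values for one gene across three samples.
log_values = [0.0, 1.0, math.log2(101.0)]
counts = to_integer_counts(log_values)
```

Whether rounding estimated counts is acceptable depends on the downstream tool; it is shown here only because a plain count matrix is what DESeq2 takes as input.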
I don't know what you're asking. In the above thread, we decided tximport cannot be used if the sample data is not available. What case are you referring to?
If you have access to aggregated, scaled, transformed data only, I wouldn't recommend tximport or DESeq2, as these are explicitly designed for when per-sample count data is available.
Ah, sorry. That was a bit unclear. I -think- that the Xena dataset that I linked is actually per-sample data, just aggregated into a single file.
As I understand it, that link is the result of 'cbind'-ing the first column (estimated counts) from Kallisto's h5dump.
Question 1: Assuming that the values are actually per-sample Kallisto "estimated counts" fields, would read.delim and DESeq2 be appropriate?
Question 2: Is there some feature (normalization, etc.) in tximport to be gained by splitting this merged data.frame into one-file-per-sample TSVs in the format "ENST + est_counts + feature length" and using tximport, over just using read.delim?
1) If you want to collapse to the gene level and perform testing, you should account for gene length, either through an offset or with scaledTPM. So if you have counts but no length or abundance, I wouldn't recommend this as input to DESeq2 (or at least, it's not the tximport-recommended input).
2) Yes, read the tximport paper.
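To illustrate point 1, here is a rough sketch of the scaledTPM idea as I understand it from tximport's countsFromAbundance = "scaledTPM" option: abundances (TPM) are rescaled so that each sample's column sums to that sample's original total count. The function name and toy numbers are hypothetical, and this is not tximport's actual code.

```python
def scaled_tpm(abundance, counts):
    """Rescale abundances so each sample column sums to its original count total.

    Both arguments are lists of rows (features) by columns (samples).
    """
    n_samples = len(counts[0])
    # Per-sample library size from the original estimated counts.
    count_sums = [sum(row[j] for row in counts) for j in range(n_samples)]
    # Per-sample total abundance (about 1e6 for TPM, but not assumed here).
    abund_sums = [sum(row[j] for row in abundance) for j in range(n_samples)]
    return [
        [row[j] * count_sums[j] / abund_sums[j] for j in range(n_samples)]
        for row in abundance
    ]


# Toy data: two features, two samples.
abundance = [[600000.0, 250000.0], [400000.0, 750000.0]]
counts = [[30.0, 10.0], [70.0, 90.0]]
scaled = scaled_tpm(abundance, counts)
```

The point of the sketch is that this calculation needs the abundance matrix, which is exactly what is missing when only a merged table of counts is available.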
Okay, seems like it should work. I'll go and do it correctly. I did skim the paper, but obviously I should read the entirety.