Question: tximport question on the values contained in "counts" and "abundance" matrices
5 months ago, HKS0 wrote:

I want to ask about the matrices that are generated with txOut = FALSE (the default), because I want the values from the StringTie quantification summarized to the gene level (as they are more robust). I am using the StringTie output files t_data.ctab as input files for tximport. tximport returns a list with the matrices "abundance", "counts", and "length", in which the transcript-level information is summarized to the gene level. My questions:

1. Does txi$counts give raw counts? If yes, where do they come from? I cannot find raw counts in the t_data.ctab file.
2. Does the matrix txi$abundance contain TPM or FPKM values? I used the default countsFromAbundance = "no". I cannot find TPM values in the t_data.ctab files, but there are FPKM values in them. If they are TPM values, where do they come from? If they are FPKM values, how do I get TPM values (not the scaled or length-scaled ones)?

Please let me know, as it is not clear to me from the paper. Looking forward to your reply. I also tried the gene.tsv files, which contain FPKM and TPM values, but I got this error:

> tx2gene <- tmp[, c("Gene ID", "Gene Name")]
> txi <- tximport(files1, type = "stringtie", tx2gene = tx2gene)
1 Warning: 59043 parsing failures.
row      col               expected  actual                                                                        file
1 Strand   an integer             +       '/scratch/neocircle-samples-20190118/S006493/l.r.m.c.lib.g/k2.a/t/gene.tsv'
1 Coverage no trailing characters .714223 '/scratch/neocircle-samples-20190118/S006493/l.r.m.c.lib.g/k2.a/t/gene.tsv'
1 FPKM     no trailing characters .219789 '/scratch/neocircle-samples-20190118/S006493/l.r.m.c.lib.g/k2.a/t/gene.tsv'
2 Strand   an integer             +       '/scratch/neocircle-samples-20190118/S006493/l.r.m.c.lib.g/k2.a/t/gene.tsv'
2 Coverage no trailing characters .669177 '/scratch/neocircle-samples-20190118/S006493/l.r.m.c.lib.g/k2.a/t/gene.tsv'
... ........ ...................... ....... ...........................................................................
See problems(...) for more details.

Error in tximport(files1, type = "stringtie", tx2gene = tx2gene) :
all(c(lengthCol, abundanceCol) %in% names(raw)) is not TRUE
Unnamed col_types should have the same length as col_names. Using smaller of the two.


It seems to me that tximport only works with t_data.ctab files.
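A likely reason for the error, assuming (from the tximport documentation as we understand it) that `type = "stringtie"` reads the columns `t_name`, `length`, `FPKM` and `cov` by default: the gene-level file has no `length` column, which is exactly what the `all(c(lengthCol, abundanceCol) %in% names(raw))` check reports. The Python sketch below only compares column names. The full gene.tsv header is an assumption based on StringTie's gene abundance output; only "Gene ID", "Gene Name", "Strand", "Coverage" and "FPKM" are actually visible in this thread.

```python
# Minimal sketch: compare a file header against the columns that tximport's
# StringTie importer is ASSUMED to need by default (t_name, length, FPKM, cov).
ASSUMED_STRINGTIE_COLS = {
    "tx_id": "t_name",
    "length": "length",
    "abundance": "FPKM",
    "counts": "cov",
}


def missing_columns(header):
    """Return the assumed required columns that are absent from `header`."""
    present = set(header)
    return [col for col in ASSUMED_STRINGTIE_COLS.values() if col not in present]


# Header of t_data.ctab, as shown later in this thread.
t_data_header = ["t_id", "chr", "strand", "start", "end", "t_name",
                 "num_exons", "length", "gene_id", "gene_name", "cov", "FPKM"]

# ASSUMED header of the StringTie gene abundance file (gene.tsv in the post).
gene_tsv_header = ["Gene ID", "Gene Name", "Reference", "Strand",
                   "Start", "End", "Coverage", "FPKM", "TPM"]

print(missing_columns(t_data_header))
print(missing_columns(gene_tsv_header))
```

Under these assumptions the transcript file has every column the importer looks for, while the gene file lacks `t_name`, `length` and `cov`.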

Tag: tximport. Modified 5 months ago by James W. MacDonald; written 5 months ago by HKS0.

I have a question:


So how would you calculate counts at the gene and transcript level for this gene, A2M? The rows below are from a t_data.ctab file. Please let me know. Looking forward to hearing from you. Thanks

t_id    chr strand  start   end t_name  num_exons   length  gene_id gene_name   cov FPKM
15869   chr12   -   9067712 9116157 ENST00000318602.11  36  4844    ENSG00000175899.14  A2M 336.726562  134.45993
15870   chr12   -   9110314 9116229 ENST00000404455.2   6   623 ENSG00000175899.14  A2M 0.041283    0.016485


Will it be 336.726562 * (9116157 - 9067712) / 4844? That result does not match the tximport output in txi$counts.
(Comment written 5 months ago by HKS0)

The transcript length is 4844. The start and end are genomic coordinates, so the span between them includes introns. The read length is a parameter in StringTie, and we have a special tximport argument for StringTie so you can set it.
(Reply written 5 months ago by Michael Love)

Yes, I used the StringTie argument in tximport. As you mentioned, counts are calculated as cov * average transcript length / read length, so for this gene will it be 336.726562 * 4844 / ? The value in the output is 21748.3891418267. I just want to understand how it is calculated at a fundamental level.
(Reply written 5 months ago by HKS0)

What do you get for 336 * 4844 / read length, where you fill in the read length with the value you provided (or the default value, which you can look up in ?tximport, if you did not provide tximport with a read length)?
(Reply written 5 months ago by Michael Love)

Yes, got it. The default value for readLength in tximport is 75. Thank you.
(Reply written 5 months ago by HKS0)

Answer: tximport question on the values contained in "counts" and "abundance" matrices
5 months ago, Michael Love (United States) wrote:

txi$counts gives our best estimate of the original counts for StringTie, which is cov * average transcript length / read length (as suggested by the StringTie authors).
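The arithmetic in this thread can be checked directly. The Python sketch below assumes tximport's default readLength of 75, computes each transcript's count as cov * length / readLength from the two A2M rows of t_data.ctab shown above, and sums the transcript counts to get the gene-level value. The summing step is our reading of how the gene-level count is formed; it does reproduce the 21748.3891418267 reported in the thread.

```python
# Sketch: reproduce the gene-level StringTie count for A2M from t_data.ctab.
READ_LENGTH = 75  # tximport's default readLength, as noted in the thread

# (t_name, length, cov) for the two A2M transcripts in the table above
a2m_transcripts = [
    ("ENST00000318602.11", 4844, 336.726562),
    ("ENST00000404455.2", 623, 0.041283),
]


def transcript_count(length, cov, read_length=READ_LENGTH):
    """Estimated reads = average per-base coverage * transcript length / read length."""
    return cov * length / read_length


# Per-transcript counts, then the gene-level count as their sum.
tx_counts = {name: transcript_count(length, cov) for name, length, cov in a2m_transcripts}
gene_count = sum(tx_counts.values())

for name, count in tx_counts.items():
    print(f"{name}: {count:.4f}")
print(f"A2M gene-level count: {gene_count:.10f}")
```

Almost all of the gene-level value comes from the first transcript (about 21748.05); the second transcript adds about 0.34.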

txi$abundance gives back the FPKM column from StringTie. For every input type, abundance is whatever the quantification software estimates. You can easily generate TPM from FPKM: divide each column by its sum, then multiply by 1e6.
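A minimal Python sketch of that FPKM-to-TPM conversion, using made-up numbers for three genes in two samples; it illustrates only the arithmetic (per-column normalization), not any tximport function.

```python
# Sketch: convert an FPKM matrix (rows = genes or transcripts, columns = samples)
# to TPM by dividing each column by its sum and multiplying by 1e6.
def fpkm_to_tpm(fpkm):
    """fpkm: list of rows, each a list of per-sample FPKM values.

    Every sample (column) must have a non-zero FPKM sum.
    """
    n_samples = len(fpkm[0])
    col_sums = [sum(row[j] for row in fpkm) for j in range(n_samples)]
    if any(s == 0 for s in col_sums):
        raise ValueError("each sample must have a non-zero FPKM sum")
    return [[row[j] / col_sums[j] * 1e6 for j in range(n_samples)] for row in fpkm]


# Hypothetical FPKM values (illustration only): 3 genes x 2 samples.
fpkm = [
    [134.46, 120.0],
    [10.0, 30.0],
    [5.54, 50.0],
]
tpm = fpkm_to_tpm(fpkm)
for row in tpm:
    print([round(v, 1) for v in row])
```

Because the scaling is done per column, each sample's TPM values sum to 1e6, which is what makes them comparable as proportions across samples.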