Question: tximport not working on tsv files
4 months ago by HKS0

Hi, I am using tximport to build a matrix of TPM values. I have TSV files for many samples, each of which looks like this:

Gene ID             Gene Name  Reference  Strand  Start   End     Coverage  FPKM      TPM
ENSG00000187961.13  KLHL17     chr1       +       960587  965715  4,714223  2,219789  5,030543
ENSG00000187583.10  PLEKHN1    chr1       +       966497  975865  3,669177  2,59782   5,887248
ENSG00000187642.9   PERM1      chr1       -       975204  982093  1,092457  0,444985  1,008436

I have performed these commands:

all(file.exists(samples))
[1] TRUE
tmp <- read_tsv(samples[1])
Parsed with column specification:
cols(
  `Gene ID` = col_character(),
  `Gene Name` = col_character(),
  Reference = col_character(),
  Strand = col_character(),
  Start = col_double(),
  End = col_double(),
  Coverage = col_double(),
  FPKM = col_double(),
  TPM = col_double()
)
tx2gene <- tmp[, c("Gene ID", "Gene Name")]
txi <- tximport(samples, type = "kallisto", tx2gene = tx2gene, ignoreTxVersion = TRUE)
Note: importing abundance.h5 is typically faster than abundance.tsv
reading in files with read_tsv
1 Warning: 78724 parsing failures.
row col expected actual file
... ......... .......... ......... .........................................................................
See problems(...) for more details.

Error in tximport(samples, type = "kallisto", tx2gene = tx2gene, ignoreTxVersion = TRUE) :
  all(c(abundanceCol, countsCol, lengthCol) %in% names(raw)) is not TRUE
In addition: Warning message:
Unnamed col_types should have the same length as col_names. Using smaller of the two.

After running this command, I noticed that the TSV files have different numbers of rows: some files contain 19682 genes and some contain 19684.

Is there a way to ignore this difference, or is there another solution to this problem? Thanks


In total, I have only 9 columns in my TSV files: Gene ID, Gene Name, Reference, Strand, Start, End, Coverage, FPKM and TPM. I do not have countsCol and lengthCol; instead I have Coverage. I want to extract the last two columns of my TSV files, i.e. FPKM and TPM, to form a matrix.

Answer: tximport not working on tsv files
4 months ago by Michael Love (United States)

Those are not the original abundance files; they have been modified. For example, the effective lengths, which are needed to run tximport, are missing.

Yes, these files were generated in a different way. Is there a way to extract the last column(s) and form a matrix?

You can loop over the files with base R to extract values from each file, and store those in a matrix.
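A minimal sketch of such a loop, in case it helps. It assumes the files are tab-separated with the header shown in the question, that numbers use a decimal comma (as in the excerpt), and that `samples` is a character vector of file paths, optionally named. The helper names `read_tpm` and `build_tpm_matrix` are made up for illustration. Because the files contain different numbers of genes, values are matched by gene ID rather than by row position, and genes missing from a file are left as NA.

```r
# Read one file and return its TPM values as a vector named by gene ID.
# Assumption: tab-separated, decimal comma, columns "Gene ID" and "TPM".
read_tpm <- function(path) {
  tab <- read.delim(path, dec = ",", check.names = FALSE,
                    stringsAsFactors = FALSE)
  setNames(tab[["TPM"]], tab[["Gene ID"]])
}

# Combine all files into a genes x samples matrix.
build_tpm_matrix <- function(samples) {
  per_sample <- lapply(samples, read_tpm)
  # Use the union of gene IDs, since files may list different genes
  genes <- sort(unique(unlist(lapply(per_sample, names))))
  mat <- matrix(NA_real_, nrow = length(genes), ncol = length(samples),
                dimnames = list(genes, names(samples)))
  for (j in seq_along(per_sample)) {
    # Fill column j by gene ID; if an ID is duplicated, the last value wins
    mat[names(per_sample[[j]]), j] <- per_sample[[j]]
  }
  mat
}
```

Calling `build_tpm_matrix(samples)` returns the TPM matrix; replacing "TPM" with "FPKM" in `read_tpm` gives the FPKM matrix in the same way.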

The tximport method involves the effective lengths, so you can't use tximport without that information.

OK, so if I have .ctab files from StringTie, can I use countsFromAbundance = "scaledTPM" to generate a matrix of TPM values rather than FPKM values? In the same manner, can I use countsFromAbundance = "no" to generate raw counts? I want to use raw (un-normalized) counts with DESeq2 afterwards, and I also want to use TPM values for feature extraction, so I want to generate two types of matrices: TPM and raw counts.

Can you obtain the output of the quantification method, rather than these post-processed files?

That will just make everything easier.

I don't know what the values of these columns mean so I can't give you any reasonable advice. I don't know what "coverage" means. I don't see effective transcript lengths or expected counts. You can't use scaledTPM without expected counts and TPM. You can't use the un-normalized counts approach without effective transcript lengths, expected counts and TPM.
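To illustrate why scaledTPM needs both quantities, here is a rough sketch of the idea as I understand it from the tximport documentation: the TPM values in each sample are rescaled so that the column sums to that sample's total of expected counts. The matrices below are made-up toy numbers and `scaled_tpm` is a hypothetical helper, not tximport's own code.

```r
# Toy matrices (made-up numbers): rows are genes, columns are samples.
tpm <- matrix(c(10, 30, 60, 20, 20, 60), nrow = 3,
              dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))
counts <- matrix(c(100, 300, 600, 50, 50, 150), nrow = 3,
                 dimnames = dimnames(tpm))

# Keep each sample's TPM proportions, but rescale every column so that it
# sums to that sample's total expected counts (the library size).
scaled_tpm <- function(tpm, counts) {
  sweep(tpm, 2, colSums(counts) / colSums(tpm), `*`)
}

scaled <- scaled_tpm(tpm, counts)
```

Without the expected counts there is no library size to scale to, and without the TPM there is nothing to scale, which is why files that carry only Coverage, FPKM and TPM are not enough.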

If you want to have a correct analysis, just provide the expected input to the software you use, so you avoid guesswork that could lead to errors. The expected input is the default files that are output by the methods that are listed in type.

No, I also have t_data.ctab files for each sample from the StringTie output, and I am able to form a matrix from these files with tximport. But I want two types of matrix: raw counts and TPM values. Can tximport give those? I do not want FPKM values.

Why are you using type="kallisto" instead of type="stringtie"? I'm really confused here.
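A rough sketch of what the StringTie import might look like, assuming `files` is a named character vector of paths to the per-sample t_data.ctab files and that those files have the usual `t_name` and `gene_id` columns. `import_stringtie` is a hypothetical wrapper; it needs the Bioconductor package tximport and real ctab files to run.

```r
# Sketch only: `files` is a hypothetical named vector of t_data.ctab paths.
import_stringtie <- function(files) {
  # Build the transcript-to-gene map from the first ctab file
  # (assumes columns t_name and gene_id are present)
  first <- read.delim(files[1], stringsAsFactors = FALSE)
  tx2gene <- first[, c("t_name", "gene_id")]
  # Import with the StringTie-specific reader rather than type = "kallisto"
  txi <- tximport::tximport(files, type = "stringtie", tx2gene = tx2gene)
  # txi$abundance is the abundance matrix, txi$counts the estimated counts
  list(abundance = txi$abundance, counts = txi$counts, txi = txi)
}
```

The returned `txi` list is the object that DESeq2's DESeqDataSetFromTximport accepts. Which scale `txi$abundance` is on for StringTie input is best checked against the tximport documentation rather than assumed.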