tximport not working on tsv files
HKS • 0
@hks-19681
Last seen 4.6 years ago

Hi, I am using tximport to make a matrix of all TPM values. I have TSV files for many samples, which look like this:

Gene ID             Gene Name  Reference  Strand  Start   End     Coverage  FPKM      TPM
ENSG00000187961.13  KLHL17     chr1       +       960587  965715  4,714223  2,219789  5,030543
ENSG00000187583.10  PLEKHN1    chr1       +       966497  975865  3,669177  2,59782   5,887248
ENSG00000187642.9   PERM1      chr1       -       975204  982093  1,092457  0,444985  1,008436

I have performed these commands:

all(file.exists(samples))
[1] TRUE
tmp <- read_tsv(samples[1])
Parsed with column specification:
cols(
  `Gene ID` = col_character(),
  `Gene Name` = col_character(),
  Reference = col_character(),
  Strand = col_character(),
  Start = col_double(),
  End = col_double(),
  Coverage = col_double(),
  FPKM = col_double(),
  TPM = col_double()
)
tx2gene <- tmp[, c("Gene ID", "Gene Name")]
txi <- tximport(samples, type = "kallisto", tx2gene = tx2gene, ignoreTxVersion = TRUE)
Note: importing abundance.h5 is typically faster than abundance.tsv
reading in files with read_tsv
1 Warning: 78724 parsing failures.
row col expected actual file
... ......... .......... ......... .........................................................................
See problems(...) for more details.

Error in tximport(samples, type = "kallisto", tx2gene = tx2gene, ignoreTxVersion = TRUE) :
  all(c(abundanceCol, countsCol, lengthCol) %in% names(raw)) is not TRUE
In addition: Warning message:
Unnamed col_types should have the same length as col_names. Using smaller of the two.

After running this command, I noticed that the TSV files have different numbers of rows: some files contain 19682 genes, others contain 19684 genes.

Is there a way to ignore this check, or is there another solution to this problem? Thanks

tximport • 1.9k views

In total, I have only 9 columns in my TSV files: Gene ID, Gene Name, Reference, Strand, Start, End, Coverage, FPKM, TPM. I do not have countsCol and lengthCol; instead I have Coverage. I want to extract the last two columns of my TSV files, FPKM and TPM, to form a matrix.

@mikelove
Last seen 2 hours ago
United States

Those are not original abundance files, but have been modified. For example the effective lengths are missing, which are needed to run tximport.


Yes, these files were generated in a different way. Is there a way to extract the last column(s) and form a matrix?


You can loop over the files with base R to extract values from each file, and store those in a matrix.
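A minimal base R sketch of that loop is shown here, under some assumptions: the files are tab-delimited with the header shown in the question, decimals are written with commas (as in the example rows), and genes missing from a file should appear as NA. The helper name `build_matrix`, the sample names, and the second demo file's value are made up for illustration.

```r
# Hypothetical helper: read one column (default "TPM") from each file and
# combine the values into a genes-by-samples matrix.
build_matrix <- function(files, value_col = "TPM", id_col = "Gene ID", dec = ",") {
  per_sample <- lapply(files, function(f) {
    # check.names = FALSE keeps headers such as "Gene ID" unchanged;
    # dec = "," parses values like 5,030543 as numbers.
    tab <- read.delim(f, dec = dec, check.names = FALSE, stringsAsFactors = FALSE)
    # Keep the first row per gene ID so the names are unique.
    tab <- tab[!duplicated(tab[[id_col]]), ]
    setNames(tab[[value_col]], tab[[id_col]])
  })
  # Files may list different genes, so index rows by the union of all IDs;
  # genes absent from a file stay NA.
  genes <- sort(unique(unlist(lapply(per_sample, names))))
  mat <- matrix(NA_real_, nrow = length(genes), ncol = length(files),
                dimnames = list(genes, names(files)))
  for (i in seq_along(per_sample)) {
    v <- per_sample[[i]]
    mat[names(v), i] <- v
  }
  mat
}

# Demo with two small temporary files (sampleB's value is invented).
write_demo <- function(lines) {
  f <- tempfile(fileext = ".tsv")
  writeLines(lines, f)
  f
}
header <- "Gene ID\tGene Name\tTPM"
files <- c(
  sampleA = write_demo(c(header,
                         "ENSG00000187961.13\tKLHL17\t5,030543",
                         "ENSG00000187583.10\tPLEKHN1\t5,887248")),
  sampleB = write_demo(c(header,
                         "ENSG00000187961.13\tKLHL17\t4,5"))
)
tpm <- build_matrix(files)
print(tpm)
```

Calling `build_matrix(files, value_col = "FPKM")` on the real files would give the FPKM matrix in the same way. Matching rows by gene ID rather than by position is what lets files with 19682 and 19684 rows be combined.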

The tximport method involves the effective lengths, so you can't use tximport without that information.


OK, so if I have .ctab files from StringTie, can I use countsFromAbundance = "scaledTPM" to generate a matrix of TPM values rather than FPKM values? In the same way, can I use countsFromAbundance = "no" to generate raw counts? I want to use raw (un-normalized) counts with DESeq2 and TPM values for feature extraction, so I need two matrices: TPM and raw counts.


Can you obtain the output of the quantification method, rather than these post-processed files?

That will just make everything easier.

I don't know what the values of these columns mean so I can't give you any reasonable advice. I don't know what "coverage" means. I don't see effective transcript lengths or expected counts. You can't use scaledTPM without expected counts and TPM. You can't use the un-normalized counts approach without effective transcript lengths, expected counts and TPM.

If you want to have a correct analysis, just provide the expected input to the software you use, so you avoid guesswork that could lead to errors. The expected input is the default files that are output by the methods that are listed in type.


No, I also have t_data.ctab files for each sample from the StringTie output, and I am able to build a matrix from these files with tximport. But I want two types of values, raw counts and TPM. Can tximport give those? I do not want FPKM values.


Why are you using type="kallisto" instead of type="stringtie"? I'm really confused here.
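For reference, a rough sketch of what an import with type = "stringtie" could look like for t_data.ctab files. The wrapper name `import_stringtie` is hypothetical, and the sketch assumes the files are unmodified StringTie output containing the t_name and gene_name columns. Nothing is read until real file paths are passed in.

```r
# Sketch only: the calls are wrapped in a function so nothing runs until
# real StringTie t_data.ctab paths are supplied.
import_stringtie <- function(files) {
  if (!requireNamespace("tximport", quietly = TRUE)) {
    stop("the tximport package is required")
  }
  # t_data.ctab is tab-delimited; t_name and gene_name hold the transcript
  # and gene identifiers used to build the transcript-to-gene table.
  first <- read.delim(files[1], stringsAsFactors = FALSE)
  tx2gene <- first[, c("t_name", "gene_name")]
  tximport::tximport(files, type = "stringtie", tx2gene = tx2gene)
}
```

With real paths, `txi <- import_stringtie(files)` should return a list whose `counts` and `abundance` elements are gene-level matrices. It is worth checking in the tximport documentation which unit the abundance values carry for StringTie input before treating them as TPM.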
