tximport error with rsem files
apolitics • 0

Dear community, I have run into a problem while trying to import my data with tximport.

From what I know about the data, it was produced with RSEM, but the columns are a little different from what I have read about in the forum. First of all, I have 100 .gene.fpkm.csv files and another 100 .transcript.fpkm.csv files, which from the start look rather different from the files used in the tximport and tximportData vignettes.

What do I mean?

  1. Files in tximportData look like .genes.results.gz and .isoforms.results.gz
  2. My files are tab-delimited, even though they have a .csv extension
  3. They have no "effective_length" column, only "length"; no "TPM" column, only "FPKM"; and the "gene_id" column is numeric. (Question 1: does the "gene_id" column have to be of character type? Question 2: is it a problem that I don't have the "effective_length" column?)
  4. Header looks like this: gene_id transcript_id(s) length expected_count FPKM SymbolID Cellular Component Molecular Function Biological Process Kegg Orthology Nr Description Desc,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
  5. The str() output of any of my files is very convoluted, not clean like that of the tximportData files. I can post it if necessary.
  6. Do these files need to be cleaned further? If yes, how?
  7. I know that the tximport pipeline is recommended, but is it a problem if I extract the "expected_count" columns from the files by another method and use them with DESeqDataSetFromMatrix? (Someone else told me that the expected_count column is not the same as raw counts, so the result would not be the same.)

library(readr)
library(tximportData)
library(tximport)


dir <- "data/CountData"
list.files(dir)

samples <- read.csv(file.path("data", "colData.csv"), header = TRUE)
samples

files <- file.path(dir, paste0(samples$Sample, "_PBMC_24hLPS.gene.fpkm.csv"))
files

names(files) <- paste0(samples$Sample)

txi.rsem <- tximport(files, type = "none", txIn = FALSE, txOut = FALSE,
                     geneIdCol = "gene_id", abundanceCol = "FPKM",
                     lengthCol = "length", countsCol = "expected_count")
output:
reading in files with read_tsv
1  --- >"silent!!!"


txi.rsem <- tximport(files, type = "rsem", txIn = FALSE, txOut = FALSE)

output:
reading in files with read_tsv
1 Error in computeRsemGeneLevel(files, importer, geneIdCol, abundanceCol,  : 
  all(c(geneIdCol, abundanceCol, lengthCol) %in% names(raw)) is not TRUE
In addition: Warning message:
One or more parsing issues, call `problems()` on your data frame for details, e.g.:
  dat <- vroom(...)
  problems(dat)
rsem tximport • 329 views
ATpoint ★ 4.0k

They have no "effective_length" column, only "length"; no "TPM" column, only "FPKM"; and the "gene_id" column is numeric. (Question 1: does the "gene_id" column have to be of character type? Question 2: is it a problem that I don't have the "effective_length" column?)

tximport expects input data exactly as produced by the tools it supports. You seem to have a custom format here, with additional columns and some columns renamed.
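
To see which columns the importer cannot find, one option is to compare the header of a single file against the column names it looks for. The following is only a sketch in base R: `header_columns()` and `missing_columns()` are hypothetical helpers (not part of tximport), and the required names assume the usual RSEM gene-level columns `gene_id`, `TPM`, and `effective_length`.

```r
# Hypothetical helper: return the column names found on the first line
# of a tab-delimited file, without parsing the rest of the file.
header_columns <- function(path) {
  first_line <- readLines(path, n = 1, warn = FALSE)
  strsplit(first_line, "\t", fixed = TRUE)[[1]]
}

# Hypothetical helper: which of the required column names are absent?
missing_columns <- function(path, required) {
  setdiff(required, header_columns(path))
}

# Assumed column names for standard RSEM gene-level output.
rsem_gene_columns <- c("gene_id", "TPM", "effective_length")

# Usage, with `files` as defined in the question:
# missing_columns(files[1], rsem_gene_columns)
```

Any name returned here is a column that would have to exist, under that exact name, before the `type = "rsem"` import can succeed.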

Do these files need to be cleaned further? If yes, how?

I don't think it is the idea of the support site to provide code for custom data cleaning. Use RSEM output if you want to use tximport.

I know that the tximport pipeline is recommended, but is it a problem if I extract the "expected_count" columns from the files by another method and use them with DESeqDataSetFromMatrix? (Someone else told me that the expected_count column is not the same as raw counts, so the result would not be the same.)

Yes, you can do that: rounded gene-level expected counts from RSEM were previously the recommended input for DESeq2 if you don't have anything else, such as tximport output.

See for example: Applying DESeq on RSEM output
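
As a rough sketch of that fallback, the expected counts could be collected into an integer matrix in base R and then handed to DESeqDataSetFromMatrix. The helper names `read_expected_counts()` and `build_count_matrix()` are made up for illustration, the column names are taken from the header posted in the question, and the `condition` column in the commented design formula is an assumption about what colData.csv contains.

```r
# Hypothetical helper: read one tab-delimited file and return the
# expected counts as a numeric vector named by gene ID (as character).
read_expected_counts <- function(path, id_col = "gene_id",
                                 count_col = "expected_count") {
  tab <- read.delim(path, header = TRUE, sep = "\t", quote = "",
                    comment.char = "", check.names = FALSE,
                    stringsAsFactors = FALSE, fill = TRUE)
  stopifnot(all(c(id_col, count_col) %in% names(tab)))
  setNames(as.numeric(tab[[count_col]]), as.character(tab[[id_col]]))
}

# Hypothetical helper: combine a named vector of file paths into a
# genes x samples matrix of rounded integer counts.
build_count_matrix <- function(files) {
  per_sample <- lapply(files, read_expected_counts)
  ids <- names(per_sample[[1]])
  stopifnot(anyDuplicated(ids) == 0)
  # Every file must list exactly the same genes as the first one.
  same_genes <- vapply(per_sample, function(x) {
    length(x) == length(ids) && setequal(names(x), ids)
  }, logical(1))
  stopifnot(all(same_genes))
  # Reorder each sample to the gene order of the first file.
  counts <- do.call(cbind, lapply(per_sample, function(x) x[ids]))
  rownames(counts) <- ids
  counts <- round(counts)  # DESeq2 expects integer counts
  storage.mode(counts) <- "integer"
  counts
}

# Usage, with `files` and `samples` as defined in the question
# (`condition` is an assumed column of colData.csv):
# counts <- build_count_matrix(files)
# dds <- DESeq2::DESeqDataSetFromMatrix(countData = counts,
#                                       colData = samples,
#                                       design = ~ condition)
```

Converting the numeric gene IDs to character makes them usable as row names. Unlike tximport, this route does not carry gene length information into DESeq2.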
