Parsing error creating tx2gene object for tximport
knholm • 0

I am doing DE analysis using DESeq2, and found a new error when I re-ran my commands from tximport to make the raw count file again.

I am getting a new parsing error that results in a failure to select columns from the loaded feature_table to create the tx2gene object.

Based on the warning, it looks like the chromosome column was guessed to be a double, so parsing fails when it reaches genes on the "X" and "Y" chromosomes. Is the character value in that column what causes the error?

This did not happen before, and my code has not changed, but I would like to sort it out so I can be sure my results are reproducible.

Below is the code:

>feat_table <- read_tsv('GCF_000001405.39_GRCh38.p13_feature_table.txt')
Parsed with column specification:
cols(
  .default = col_character(),
  chromosome = col_double(),
  start = col_double(),
  end = col_double(),
  `non-redundant_refseq` = col_logical(),
  GeneID = col_double(),
  locus_tag = col_logical(),
  feature_interval_length = col_double(),
  product_length = col_double()
)
See spec(...) for full column specifications.
|=================================================================================| 100%   62 MB
Warning: 12784 parsing failures.
   row        col expected actual                                            file
317601 chromosome a double      X 'GCF_000001405.39_GRCh38.p13_feature_table.txt'
317602 chromosome a double      X 'GCF_000001405.39_GRCh38.p13_feature_table.txt'
317603 chromosome a double      X 'GCF_000001405.39_GRCh38.p13_feature_table.txt'
317604 chromosome a double      X 'GCF_000001405.39_GRCh38.p13_feature_table.txt'
317605 chromosome a double      X 'GCF_000001405.39_GRCh38.p13_feature_table.txt'
...... .......... ........ ...... ...............................................
See problems(...) for more details.

> feat_table <- dplyr::select(feat_table, feat_table$product_accession, feat_table$symbol, feat_table$GeneID)
Error: Can't subset columns that don't exist.
x The columns NA, etc. don't exist.
Run `rlang::last_error()` to see where the error occurred.

The remainder of my code, which creates the tx2gene object for use with tximport and my quant.sf (salmon) quantification files, is below:

table(quant$Name %in% feat_table$product_accession)
write.table(feat_table, "humantx2gene.tsv", quote = F, row.names = F, sep = "\t")


tx2gene <- read_tsv("humantx2gene.tsv")
colnames(tx2gene)
unique(x = tx2gene$symbol)
tx2gene <- dplyr::select(tx2gene, "product_accession", "symbol")

#load quant.sf files by directory location in the files$files column of samples doc
fx <- tximport(files = files$files, type = "salmon", tx2gene = tx2gene) #, ignoreTxVersion=TRUE) 
fxcounts <- fx$counts

# change column names to sample names for provenance
colnames(fxcounts) <- files$sample

# write to file
write.csv(fxcounts, "temp/4.13.20_3primeTagSeq_Salmon_nonribo_counts_attempttorecreate_01.csv", quote = F, row.names = T)
@mikelove

I don't have suggestions for parsing the file. However you manage to read in a table that matches transcripts to genes, that will work.
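
For anyone hitting the same messages, here is a minimal sketch of one way to read such a table, not a confirmed fix for this exact file. It assumes two things that the thread does not verify: that the parsing warnings come from readr guessing chromosome as a double, and that the select() error comes from passing feat_table$column vectors where column names are expected. The toy file and its rows are made up for illustration; only the column names are taken from the post.

```r
library(readr)
library(dplyr)

# Hypothetical miniature stand-in for the feature table (values are invented).
toy <- tempfile(fileext = ".txt")
writeLines(c(
  "chromosome\tproduct_accession\tsymbol\tGeneID",
  "1\tNM_000001.1\tGENEA\t101",
  "X\tNM_000002.1\tGENEB\t102",
  "Y\tNA\tGENEC\t103"
), toy)

# Declare column types up front so "X" and "Y" are kept as character
# instead of being reported as failures to parse a double.
feat_table <- read_tsv(
  toy,
  col_types = cols(.default = col_character(), GeneID = col_double())
)

# Select by bare column names rather than feat_table$column, then drop
# rows without a transcript accession (e.g. gene-level rows).
tx2gene <- feat_table %>%
  select(product_accession, symbol) %>%
  filter(!is.na(product_accession))
```

With the real file, the same col_types argument would go into the original read_tsv() call. The resulting two-column table (transcript ID first, gene ID second) is the layout that the tx2gene argument of tximport expects.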

OK I'll continue looking, thank you!
