Question

error reading ballgown object

l.j.leach • 0

@ljleach-14914

Last seen 7.4 years ago

University of Birmingham

Please could someone advise on how to solve the following error when trying to read in a ballgown object.

Tue Jan 30 13:15:04 2018
Tue Jan 30 13:15:04 2018: Reading linking tables
Tue Jan 30 13:15:05 2018: Reading intron data files
Tue Jan 30 13:15:08 2018: Merging intron data
Tue Jan 30 13:15:09 2018: Reading exon data files
Tue Jan 30 13:15:17 2018: Merging exon data
Tue Jan 30 13:15:18 2018: Reading transcript data files
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 70810 did not have 12 elements
In addition: Warning messages:
1: package ‘dplyr’ was built under R version 3.3.3
2: package ‘devtools’ was built under R version 3.3.3

The traceback, shown below, was not helpful to me:
> traceback()
8: scan(file = file, what = what, sep = sep, quote = quote, dec = dec,
       nmax = nrows, skip = 0, na.strings = na.strings, quiet = TRUE,
       fill = fill, strip.white = strip.white, blank.lines.skip = blank.lines.skip,
       multi.line = FALSE, comment.char = comment.char, allowEscapes = allowEscapes,
       flush = flush, encoding = encoding, skipNul = skipNul)
7: read.table(file, header = TRUE, sep = "\t", colClasses = cc,
       quote = "")
6: .readTranscript(f, meas)
5: ballgown(samples = samples, pData = pheno_data) at rnaseq_ballgownLL.R#16
4: eval(expr, envir, enclos)
3: eval(ei, envir)
2: withVisible(eval(ei, envir))
1: source("rnaseq_ballgownLL.R")

The phenotype data file was read into R correctly:

> pheno_data
  ids population
1  HJ     China1
2  HL     China2
3  LS     China3
4  V7     China4
I have checked the gtf files for each sample and cannot find a problem with the format. I was also able to run the tutorial data and read in the bg object correctly, so the problem must relate to my input files somehow.

The code used was as follows:

library(ballgown)
library(RSkittleBrewer)
library(genefilter)
library(dplyr)
library(devtools)

inputdir <- "C:/Users/leachlj/Documents/Data/Potato/RNA_seq_Round2/BallGownAnalysis/"
pheno_data_file <- paste0(inputdir, "round2_rnaseq.txt")

## Read phenotype sample data
pheno_data <- read.csv(pheno_data_file)

## Read in expression data to create a ballgown object
## (sample directory names are resolved relative to the current working directory)
samples <- c("HJ", "HL", "LS", "V7")
bg_pot <- ballgown(samples = samples, pData = pheno_data)
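
Before calling ballgown, it can help to confirm that each sample directory holds the five .ctab files that ballgown reads (e_data.ctab, i_data.ctab, t_data.ctab, e2t.ctab and i2t.ctab). The following is a minimal sketch using base R only. The helper name missing_ctab and the temporary stand-in folders are assumptions made for illustration; they are not part of ballgown or of the original script.

```r
# The five files ballgown expects in every sample directory.
ctab_files <- c("e_data.ctab", "i_data.ctab", "t_data.ctab",
                "e2t.ctab", "i2t.ctab")

# Hypothetical helper: for each sample directory, list the expected
# files that are absent.
missing_ctab <- function(sample_dirs) {
  out <- lapply(sample_dirs, function(d) {
    ctab_files[!file.exists(file.path(d, ctab_files))]
  })
  names(out) <- basename(sample_dirs)
  out
}

# Stand-in directories for illustration only:
# "HJ" is complete, "HL" lacks t_data.ctab.
root <- tempfile("ballgown_demo_")
for (s in c("HJ", "HL")) dir.create(file.path(root, s), recursive = TRUE)
file.create(file.path(root, "HJ", ctab_files))
file.create(file.path(root, "HL", setdiff(ctab_files, "t_data.ctab")))

report <- missing_ctab(file.path(root, c("HJ", "HL")))
print(report)
```

With real data, the same helper could be pointed at the actual sample folders, for example missing_ctab(c("HJ", "HL", "LS", "V7")) from the directory that contains them.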

Many thanks for any advice you can give.

Lindsey

R ballgown • 2.1k views

ADD COMMENT • link 7.4 years ago l.j.leach • 0

score 2 · Accepted Answer · 2018-01-31

Alyssa Frazee ▴ 210

@alyssa-frazee-6710

Last seen 4.6 years ago

San Francisco, CA, USA

How was the ballgown input data created? Do you have .ctab files, and did you use Cufflinks, StringTie, or something else?

This looks like an issue with the formatting of one of the .ctab files used as ballgown input -- it's expecting TSV, but finding something it can't parse (possibly quotes, too many tabs, etc.). If we can figure out that issue, we can either try to make ballgown handle it (since this is a bug if the file is valid TSV) or make sure the input programs are all producing valid TSVs.
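
One way to look for this kind of formatting problem is to count the tab-separated fields on every line with base R's count.fields and compare each count with the header's. The sketch below is only illustrative: the helper name find_bad_lines and the tiny temporary file are assumptions made for the example, not part of ballgown. It uses the settings visible in the traceback (sep = "\t", quote = "") together with read.table's default comment.char = "#", and then repeats the count with comment handling switched off.

```r
# Hypothetical helper: report lines whose tab-separated field count
# differs from that of the header line.
find_bad_lines <- function(path, comment.char = "#") {
  n <- count.fields(path, sep = "\t", quote = "", comment.char = comment.char)
  which(is.na(n) | n != n[1])
}

# Tiny stand-in for a t_data.ctab file (invented rows, three columns only).
tmp <- tempfile(fileext = ".ctab")
writeLines(c("t_id\tgene_name\tFPKM",
             "1\tgeneA\t1.5",
             "2\tB5 #5 (example)\t0.07"),
           tmp)

bad_default  <- find_bad_lines(tmp)                     # '#' starts a comment
bad_disabled <- find_bad_lines(tmp, comment.char = "")  # '#' is ordinary text
print(bad_default)
print(bad_disabled)
```

A line that is flagged only when comment.char is "#" points to a `#` inside a field rather than to a wrong number of tabs. Note that count.fields skips blank lines by default, so reported positions can shift if the file contains any.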

ADD COMMENT • link 7.4 years ago Alyssa Frazee ▴ 210

Dear Alyssa,

Thank you so much for your rapid reply. The input data were created with StringTie, using the same commands as in the tutorial procedure from the Pertea paper (2016). This worked well for me on the tutorial data, which could be read by ballgown.

I do have all the .ctab files in addition to the gtf for each sample (potato RNAseq), and will try to check their format now and get back to you.

ADD REPLY • link 7.4 years ago l.j.leach • 0

I have figured it out now thanks to your response :-) thank you so much for helping to think it through!

I checked the t_data.ctab files produced by stringtie and they are tab separated with the correct number of fields.

However, the entry at line 70810 was:

70810 ST4.03ch12 - 4082665 4086448 PGSC0003DMT400000957 5 897 MSTRG.24856 B5 #5 (cytochrome b5 family protein #5) 0.528714 0.065659

I somehow thought that maybe ballgown did not like the # as part of the gene name so I replaced it with "num" as follows:

original gene name: B5 #5 (cytochrome b5 family protein #5)

replacement gene name: B5 num5 (cytochrome b5 family protein num5)

With these changes, the ballgown object is read in with no problem :-)
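
A likely explanation for why the `#` matters: read.table treats `#` as the start of a comment by default (comment.char = "#"), and the read.table call in the traceback sets quote = "" but does not change comment.char. Everything after the first `#` on a line is therefore dropped, which leaves that row with fewer fields than the header. The sketch below reproduces this on a tiny made-up file (three columns instead of twelve) and then applies the same workaround of replacing `#` with "num". The file contents and variable names are illustrative assumptions, not real ballgown data.

```r
# Tiny made-up table with a '#' inside the gene name.
lines <- c("t_id\tgene_name\tFPKM",
           "1\tB5 #5 (cytochrome b5 family protein #5)\t0.065659")
raw_file <- tempfile(fileext = ".ctab")
writeLines(lines, raw_file)

# Same style of call as in the traceback: comment.char is left at its
# default "#", so the row is cut off at the first '#'.
res <- tryCatch(
  read.table(raw_file, header = TRUE, sep = "\t", quote = ""),
  error = function(e) conditionMessage(e)
)
print(res)

# Workaround from this post: replace '#' with "num" and read again.
fixed_file <- tempfile(fileext = ".ctab")
writeLines(gsub("#", "num", readLines(raw_file), fixed = TRUE), fixed_file)
fixed <- read.table(fixed_file, header = TRUE, sep = "\t", quote = "",
                    stringsAsFactors = FALSE)
print(fixed$gene_name)
```

Writing the cleaned lines to a new file, rather than overwriting the StringTie output, keeps the original available for comparison.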

ADD REPLY • link 7.4 years ago l.j.leach • 0