Could someone please advise on how to resolve the following error, which occurs when I try to read in a ballgown object?
Tue Jan 30 13:15:04 2018
Tue Jan 30 13:15:04 2018: Reading linking tables
Tue Jan 30 13:15:05 2018: Reading intron data files
Tue Jan 30 13:15:08 2018: Merging intron data
Tue Jan 30 13:15:09 2018: Reading exon data files
Tue Jan 30 13:15:17 2018: Merging exon data
Tue Jan 30 13:15:18 2018: Reading transcript data files
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 70810 did not have 12 elements
In addition: Warning messages:
1: package ‘dplyr’ was built under R version 3.3.3
2: package ‘devtools’ was built under R version 3.3.3
The traceback, shown below, did not help me pinpoint the problem:
> traceback()
8: scan(file = file, what = what, sep = sep, quote = quote, dec = dec,
nmax = nrows, skip = 0, na.strings = na.strings, quiet = TRUE,
fill = fill, strip.white = strip.white, blank.lines.skip = blank.lines.skip,
multi.line = FALSE, comment.char = comment.char, allowEscapes = allowEscapes,
flush = flush, encoding = encoding, skipNul = skipNul)
7: read.table(file, header = TRUE, sep = "\t", colClasses = cc,
quote = "")
6: .readTranscript(f, meas)
5: ballgown(samples = samples, pData = pheno_data) at rnaseq_ballgownLL.R#16
4: eval(expr, envir, enclos)
3: eval(ei, envir)
2: withVisible(eval(ei, envir))
1: source("rnaseq_ballgownLL.R")
The phenotype data file was read into R correctly as
> pheno_data
ids population
1 HJ China1
2 HL China2
3 LS China3
4 V7 China4
I have checked the gtf files for each sample and cannot find a problem with the format.
Also, I was able to run the tutorial data and read in the bg object correctly, so the problem must relate to my input files somehow.
The code used was as follows:
library(ballgown)
library(RSkittleBrewer)
library(genefilter)
library(dplyr)
library(devtools)
inputdir <- "C:/Users/leachlj/Documents/Data/Potato/RNA_seq_Round2/BallGownAnalysis/"
pheno_data_file <- paste0(inputdir, "round2_rnaseq.txt")
## Read phenotype sample data
pheno_data <- read.csv(pheno_data_file)
## Read in expression data to create a BallGown object
samples<-c("HJ","HL","LS","V7")
bg_pot <- ballgown(samples=samples, pData=pheno_data)
Many thanks for any advice you can give.
Lindsey
Dear Alyssa,
Thank you so much for your rapid reply. The input data was created with stringtie, using the same commands as in the tutorial procedure from the Pertea paper (2016). That procedure worked well for me on the tutorial data, and its output could be read by ballgown.
I do have all the .ctab files in addition to the gtf for each sample (potato RNAseq), and will try to check their format now and get back to you.
I have figured it out now, thanks to your response :-) Thank you so much for helping me think it through!
I checked the t_data.ctab files produced by stringtie and they are tab separated with the correct number of fields.
However, at line 70810 it was:
70810 ST4.03ch12 - 4082665 4086448 PGSC0003DMT400000957 5 897 MSTRG.24856 B5 #5 (cytochrome b5 family protein #5) 0.528714 0.065659
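A possible way to check the field count the way R itself sees it is `count.fields`. The traceback shows `read.table` being called with `sep = "\t"` and `quote = ""` but without `comment.char`, so R's default comment character `#` would apply and anything after the first `#` on a line would be ignored. The sketch below is only an illustration: it assumes the record above has tabs between its 12 fields and writes it to a temporary file rather than touching the real data.

```r
# The record from line 70810, rebuilt with tabs between the 12 fields
# (the tab positions are an assumption based on the t_data.ctab layout)
rec <- paste(c("70810", "ST4.03ch12", "-", "4082665", "4086448",
               "PGSC0003DMT400000957", "5", "897", "MSTRG.24856",
               "B5 #5 (cytochrome b5 family protein #5)",
               "0.528714", "0.065659"), collapse = "\t")

tmp <- tempfile(fileext = ".ctab")
writeLines(rec, tmp)

# R's default: '#' starts a comment, so the rest of the line is dropped
n_default <- count.fields(tmp, sep = "\t", quote = "")

# Comment handling switched off: every tab-separated field is counted
n_nocomment <- count.fields(tmp, sep = "\t", quote = "", comment.char = "")

print(c(default = n_default, no_comment = n_nocomment))
```

Run on a real `t_data.ctab`, something like `which(count.fields(path, sep = "\t", quote = "") != 12)` should list the lines that R would consider malformed, where `path` is a placeholder for the file location.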
I suspected that ballgown did not like the # in the gene name, so I replaced it with "num" as follows:
original gene name: B5 #5 (cytochrome b5 family protein #5)
replacement gene name: B5 num5 (cytochrome b5 family protein num5)
With these changes, the ballgown object is read in without any problem :-)
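In case it helps anyone else, the replacement could be scripted rather than done by hand. This is a minimal sketch, not part of ballgown: `sanitize_ctab` is a hypothetical helper name, the "num" replacement is just the one I chose above, and the demo runs on a small temporary file with made-up columns.

```r
# Hypothetical helper: replace '#' in a .ctab file so that R's default
# comment handling does not cut the line short
sanitize_ctab <- function(path, replacement = "num") {
  lines <- readLines(path)
  hits <- grep("#", lines, fixed = TRUE)
  if (length(hits) > 0) {
    # Keep a copy of the original file before overwriting it
    file.copy(path, paste0(path, ".bak"), overwrite = TRUE)
    lines[hits] <- gsub("#", replacement, lines[hits], fixed = TRUE)
    writeLines(lines, path)
  }
  invisible(hits)  # line numbers in the file (the header is line 1)
}

# Small self-contained demo on a temporary file
demo <- tempfile(fileext = ".ctab")
writeLines(c("t_id\tgene_name\tFPKM",
             "1\tB5 #5 (cytochrome b5 family protein #5)\t0.065659"), demo)
changed <- sanitize_ctab(demo)
fixed <- readLines(demo)
print(fixed[changed])
```

For the real data, a loop such as `for (s in samples) sanitize_ctab(file.path(s, "t_data.ctab"))` might do the job, assuming each sample directory holds its own `t_data.ctab`.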