Question

error in frameCounting in riboSeqR

alper.celik • 0

@alpercelik-9702

Hi,

I am working on a ribosome profiling data set and wanted to give riboSeqR a shot. My alignment files are formatted as specified in the vignette, and readRibodata worked without problems.

I wanted to use a GFF file for annotations (converted into a GRanges object), as shown below:

library(GenomicRanges)

# Read the tab-separated GFF file and name its nine columns
gff <- read.table("C:/Users/Alper Celik/Documents/analysis files/new-set/clean.gff.txt", header=FALSE, sep="\t", as.is=TRUE)
colnames(gff) <- c("chrom", "source", "type", "start", "end", "score", "strand", "phase", "name")

# Convert to a GRanges object, keeping the remaining columns as metadata
gff_gr <- makeGRangesFromDataFrame(gff, keep.extra.columns=TRUE, ignore.strand=FALSE, seqnames.field="chrom", start.field="start", end.field="end", strand.field="strand", starts.in.df.are.0based=FALSE)

When I tried the frameCounting function I kept getting an error. I have tried subsets of this GFF, such as only the "gene" or only the "CDS" features, but I always get the same error (see below).

Here is a snapshot of a subset of the GFF as a GRanges object:

GRanges object with 6600 ranges and 3 metadata columns:
       seqnames             ranges strand   |      source        type        name
          <Rle>          <IRanges>  <Rle>   | <character> <character> <character>
   [1]    chrVI         [ 53, 535]      +   |         SGD        gene     YFL068W
   [2]     chrV        [264, 4097]      -   |         SGD        gene     YEL077C
   [3]    chrII        [280, 2658]      -   |         SGD        gene     YBL113C
   [4]   chrXVI        [280, 6007]      -   |         SGD        gene     YPL283C
   [5]     chrI         [335, 649]      +   |         SGD        gene     YAL069W
   ...      ...                ...    ... ...         ...         ...         ...
[6596]    chrIV [1523249, 1523611]      +   |         SGD        gene     YDR542W
[6597]    chrIV [1524634, 1524933]      -   |         SGD        gene     YDR543C
[6598]    chrIV [1525095, 1525523]      -   |         SGD        gene     YDR544C
[6599]    chrIV [1526321, 1531711]      +   |         SGD        gene     YDR545W
[6600]    chrIV [1530863, 1531342]      -   |         SGD        gene   YDR545C-A

And this is the error I get no matter how I sort the GFF file (by chromosome and then start location, or by start location alone; it makes no difference):

Calling frames...Error in findInterval(spl27.f[[ii]], splfr0e[[ii]]) :
'vec' must be sorted non-decreasingly and not contain NAs
In addition: Warning messages:
1: In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels, :
duplicated levels in factors are deprecated
2: In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels, :
duplicated levels in factors are deprecated
3: In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels, :
duplicated levels in factors are deprecated
4: In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels, :
duplicated levels in factors are deprecated
5: In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels, :
duplicated levels in factors are deprecated

BTW, there are no duplicated values (at least in the $name column).

Thanks in advance,

Alper
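
For readers who hit the same message: the error text comes from base R's findInterval(), which requires its second argument to be sorted and free of NAs. The following is a minimal base-R sketch with made-up numbers (it does not reproduce riboSeqR's internals). It shows that a single NA is enough to trigger the error even when the other values are in order, which may be why re-sorting the GFF has no effect.

```r
# Toy illustration of the base-R check behind the error message.
# The vectors below are made-up values, not taken from riboSeqR.
breaks_ok <- c(1, 10, 20)
breaks_na <- c(1, 10, NA)   # non-missing values are in order, but an NA is present

# Works: the break points are sorted and complete.
idx <- findInterval(c(5, 15), breaks_ok)

# Fails: the NA makes the sortedness check undecidable, so findInterval() stops.
msg <- tryCatch(
  {
    findInterval(c(5, 15), breaks_na)
    "no error"
  },
  error = function(e) conditionMessage(e)
)

print(idx)
print(msg)
```

If the annotation lacks a value that the function expects, NAs of this kind could arise internally; that is an assumption about this case rather than something confirmed from the package source.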

riboseqr ribosome profiling

updated 8.1 years ago by Thomas J Hardcastle • written 8.2 years ago by alper.celik

score 1 · Answer 1 · 2016-03-09

Sorry, the first query got lost in my inbox. frameCounting expects the supplied GRanges object to contain the frame of each coding sequence, relative to the first base of the RNA sequence. This can be calculated by taking the start of the coding sequence modulo 3; e.g.

> values(gff_gr)$frame <- start(gff_gr) %% 3
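
As a quick check of the arithmetic, here is a small base-R sketch. The vector cds_start is hypothetical: made-up 1-based CDS start positions within their transcripts, used only to show what %% 3 returns.

```r
# Hypothetical 1-based CDS start positions within their transcripts.
cds_start <- c(1, 2, 3, 100, 101)

# Frame as suggested above: start position modulo 3.
frame <- cds_start %% 3

print(data.frame(cds_start, frame))
```

Every value falls in 0, 1 or 2, and starts that are a multiple of three bases apart (for example 1 and 100) share the same frame.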

However, I should also point out that riboSeqR generally assumes that you have aligned to the transcriptome, not the genome, and that the GRanges object defines the locations of coding sequences within the transcriptome, as in the vignette example. If no transcriptome is available for your organism, I suggest using Cufflinks/TopHat to construct one from your sequencing data, then using the findCDS function to define potential coding sequences based on start/stop codon presence. Using genomic alignments and coordinates may lead to undefined behaviour.
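
The two points above (a frame value for every coding sequence, and annotation coordinates that refer to the same references as the alignments) can be checked before calling frameCounting. Below is a minimal base-R sketch under stated assumptions: check_cds_annotation is a hypothetical helper, not part of riboSeqR; it takes a data frame such as the one returned by as.data.frame(gff_gr); and the transcript names in aligned_refs are invented for illustration.

```r
# Hypothetical helper (not part of riboSeqR): sanity-check a CDS annotation.
# `cds` is a data frame, e.g. as.data.frame(gff_gr);
# `aligned_refs` holds the names of the references the reads were aligned to.
check_cds_annotation <- function(cds, aligned_refs) {
  problems <- character(0)

  # A frame value of 0, 1 or 2 is needed for every coding sequence.
  if (!"frame" %in% names(cds)) {
    problems <- c(problems, "no 'frame' column")
  } else if (anyNA(cds$frame) || !all(cds$frame %in% 0:2)) {
    problems <- c(problems, "'frame' must be 0, 1 or 2 with no NAs")
  }

  # Annotation sequence names should match the alignment references.
  unknown <- setdiff(unique(as.character(cds$seqnames)), aligned_refs)
  if (length(unknown) > 0) {
    problems <- c(problems, paste("seqnames not in alignment:",
                                  paste(unknown, collapse = ", ")))
  }
  problems
}

# Toy example: genomic seqnames checked against transcript-level references
# (the reference names here are assumptions made for the example).
cds <- data.frame(seqnames = c("chrVI", "chrV"),
                  start = c(53, 264), end = c(535, 4097))
aligned_refs <- c("YFL068W", "YEL077C")
problems <- check_cds_annotation(cds, aligned_refs)
print(problems)
```

An empty result means neither problem was detected; it does not guarantee that frameCounting will succeed.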

Best wishes,

Tom Hardcastle

score 0 · Answer 2 · 2016-03-04

mauricio_a_reynoso • 0

@mauricio_a_reynoso-9647

Hi Alper,

Did you solve the error? Can you share the solution?

Thanks,

Mauricio

written 8.1 years ago by mauricio_a_reynoso