Question: error in frameCounting in riboSeqR
2.8 years ago, alper.celik wrote:

Hi, 

I am working on a ribosome profiling data set and wanted to give riboSeqR a shot. My alignment files are formatted as specified in the vignette, and readRibodata worked without problems.

I wanted to use a GFF file for the annotations (converted into a GRanges object), as below:

library(GenomicRanges)

# read the tab-delimited GFF (no header line)
gff <- read.table("C:/Users/Alper Celik/Documents/analysis files/new-set/clean.gff.txt", header=FALSE, sep="\t", as.is=TRUE)

colnames(gff) <- c("chrom", "source", "type", "start", "end", "score", "strand", "phase", "name")

# convert to GRanges, keeping the remaining columns as metadata
gff_gr <- makeGRangesFromDataFrame(gff, keep.extra.columns=TRUE, ignore.strand=FALSE, seqnames.field="chrom", start.field="start", end.field="end", strand.field="strand", starts.in.df.are.0based=FALSE)

When I tried the frameCounting function I kept getting an error. I have also tried subsets of this GFF, such as only the "gene" or only the "CDS" features, but I always get the same error (see below).

Here is a snapshot of a subset of the GFF as a GRanges object:

GRanges object with 6600 ranges and 3 metadata columns:
         seqnames             ranges strand   |      source        type        name
            <Rle>          <IRanges>  <Rle>   | <character> <character> <character>
     [1]    chrVI        [ 53,  535]      +   |         SGD        gene     YFL068W
     [2]     chrV        [264, 4097]      -   |         SGD        gene     YEL077C
     [3]    chrII        [280, 2658]      -   |         SGD        gene     YBL113C
     [4]   chrXVI        [280, 6007]      -   |         SGD        gene     YPL283C
     [5]     chrI        [335,  649]      +   |         SGD        gene     YAL069W
     ...      ...                ...    ... ...         ...         ...         ...
  [6596]    chrIV [1523249, 1523611]      +   |         SGD        gene     YDR542W
  [6597]    chrIV [1524634, 1524933]      -   |         SGD        gene     YDR543C
  [6598]    chrIV [1525095, 1525523]      -   |         SGD        gene     YDR544C
  [6599]    chrIV [1526321, 1531711]      +   |         SGD        gene     YDR545W
  [6600]    chrIV [1530863, 1531342]      -   |         SGD        gene   YDR545C-A

And this is the error I get no matter how I sort the GFF file (by chromosome and then start location, or by start location alone; it makes no difference):

Calling frames...Error in findInterval(spl27.f[[ii]], splfr0e[[ii]]) : 
  'vec' must be sorted non-decreasingly and not contain NAs
In addition: Warning messages:
1: In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels,  :
  duplicated levels in factors are deprecated
2: In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels,  :
  duplicated levels in factors are deprecated
3: In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels,  :
  duplicated levels in factors are deprecated
4: In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels,  :
  duplicated levels in factors are deprecated
5: In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels,  :
  duplicated levels in factors are deprecated

BTW, there are no duplicated values (at least not in the $name column).
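
For reference, the findInterval message itself can be reproduced in base R, independently of riboSeqR. The sketch below only shows what the message means: the second argument ('vec') is rejected when it is unsorted or contains an NA. Which of the two applies inside frameCounting is not established here; the vectors and the helper name try_breaks are made up for illustration.

```r
# Base R only: findInterval() rejects a 'vec' that is unsorted or contains NA.
x <- c(5, 15, 25)

sorted_breaks   <- c(0, 10, 20)
unsorted_breaks <- c(10, 0, 20)
na_breaks       <- c(0, NA, 20)

# Works: each x value is assigned to the interval it falls into.
ok <- findInterval(x, sorted_breaks)

# Hypothetical helper: return the error message instead of stopping.
try_breaks <- function(breaks) {
  tryCatch({
    findInterval(x, breaks)
    "no error"
  }, error = function(e) conditionMessage(e))
}

msg_unsorted <- try_breaks(unsorted_breaks)
msg_na       <- try_breaks(na_breaks)

print(ok)
print(msg_unsorted)
print(msg_na)
```

So sorting the annotation alone cannot help if the values being searched are NA, for example because an expected column is absent.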

 

thanks in advance

Alper

 

2.8 years ago, Thomas J Hardcastle (United Kingdom) wrote:

Sorry, the first query got lost in my inbox. frameCounting expects the GRanges object you supply to contain a 'frame' column giving the frame of each coding sequence relative to the first base of the RNA sequence. This can be calculated as the start of the coding sequence modulo 3, e.g.

> values(gff_gr)$frame <- start(gff_gr) %% 3
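
As a small worked illustration of the arithmetic only, the sketch below applies the same modulo-3 rule to a plain data frame. The start and end values are copied from the snapshot in the question (they are genomic coordinates, used here purely as numbers), and the sanity check at the end is an added assumption about what a usable frame column looks like, not something riboSeqR is known to require in this form.

```r
# Toy stand-in for a few rows of the GRanges shown in the question.
cds <- data.frame(
  name  = c("YFL068W", "YEL077C", "YBL113C", "YAL069W"),
  start = c(53L, 264L, 280L, 335L),
  end   = c(535L, 4097L, 2658L, 649L),
  stringsAsFactors = FALSE
)

# Same rule as values(gff_gr)$frame <- start(gff_gr) %% 3
cds$frame <- cds$start %% 3L

# Sanity check: every entry has a frame of 0, 1 or 2 and none is NA.
stopifnot(!anyNA(cds$frame), all(cds$frame %in% 0:2))

print(cds)
```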

However, I should also point out that riboSeqR generally assumes that you have aligned to the transcriptome, not the genome, and that the GRanges object defines the locations of coding sequences within the transcriptome, as in the vignette example. If no transcriptome is available for your organism, I suggest using cufflinks/tophat to construct one from your sequencing data and then using the findCDS function to define potential coding sequences based on the presence of start and stop codons. Using genomic alignments and coordinates may lead to undefined behaviour.
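
A rough sketch of that transcriptome-based route, modelled on the package vignette, might look like the following. The example file names under extdata and the replicate labels are written from memory of the vignette and should be treated as assumptions to check against your installed version; the block does nothing if riboSeqR is not installed.

```r
# Sketch only: transcriptome FASTA -> candidate CDS -> frame counts.
if (requireNamespace("riboSeqR", quietly = TRUE)) {
  library(riboSeqR)

  # Example data shipped with the package (file names assumed from the vignette).
  datadir <- system.file("extdata", package = "riboSeqR")
  fasta   <- file.path(datadir, "rsem_chlamy236_deNovo.transcripts.fa")

  # Candidate coding sequences defined on transcript coordinates;
  # the result is a GRanges that already carries the frame information.
  cds <- findCDS(fastaFile = fasta,
                 startCodon = "ATG",
                 stopCodon = c("TAG", "TAA", "TGA"))

  # Ribosome footprint alignments against the same transcriptome.
  ribofiles <- file.path(
    datadir,
    paste0("chlamy236_plus_deNovo_plusOnly_Index", c(17, 3, 5, 7))
  )
  riboDat <- readRibodata(ribofiles, replicates = c("WT", "WT", "M", "M"))

  # Count footprints per coding sequence, read length and frame.
  fCs <- frameCounting(riboDat, cds)
}
```

Because findCDS and the alignments share the same transcript names and coordinates, no separate genome annotation is needed in this route.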

Best wishes,

Tom Hardcastle

2.8 years ago, mauricio_a_reynoso wrote:

Hi Alper,

Did you solve the error? Can you share the solution?

Thanks,

Mauricio 

 
