Error in frameCounting in riboSeqR
@alpercelik-9702

Hi, 

I am working on a ribosome profiling data set and wanted to give riboSeqR a try. My alignment files are formatted as specified in the vignette, and readRibodata ran without problems.

I wanted to use a GFF file for annotation, converted into a GRanges object as shown below:

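library(GenomicRanges)  # provides makeGRangesFromDataFrame

# Read the tab-separated GFF, which has no header row
gff <- read.table("C:/Users/Alper Celik/Documents/analysis files/new-set/clean.gff.txt", header=FALSE, sep="\t", as.is=TRUE)

colnames(gff) <- c("chrom", "source", "type", "start", "end", "score", "strand", "phase", "name")

# Convert to GRanges, keeping the remaining columns as metadata
gff_gr <- makeGRangesFromDataFrame(gff, keep.extra.columns=TRUE, ignore.strand=FALSE, seqnames.field="chrom", start.field="start", end.field="end", strand.field="strand", starts.in.df.are.0based=FALSE)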

When I run the frameCounting function I keep getting an error. I have tried subsets of this GFF, such as only the "gene" or only the "CDS" features, but I always get the same error (see below).

Here is a snapshot of a subset of the GFF as a GRanges object:

GRanges object with 6600 ranges and 3 metadata columns:
         seqnames             ranges strand   |      source        type        name
            <Rle>          <IRanges>  <Rle>   | <character> <character> <character>
     [1]    chrVI        [ 53,  535]      +   |         SGD        gene     YFL068W
     [2]     chrV        [264, 4097]      -   |         SGD        gene     YEL077C
     [3]    chrII        [280, 2658]      -   |         SGD        gene     YBL113C
     [4]   chrXVI        [280, 6007]      -   |         SGD        gene     YPL283C
     [5]     chrI        [335,  649]      +   |         SGD        gene     YAL069W
     ...      ...                ...    ... ...         ...         ...         ...
  [6596]    chrIV [1523249, 1523611]      +   |         SGD        gene     YDR542W
  [6597]    chrIV [1524634, 1524933]      -   |         SGD        gene     YDR543C
  [6598]    chrIV [1525095, 1525523]      -   |         SGD        gene     YDR544C
  [6599]    chrIV [1526321, 1531711]      +   |         SGD        gene     YDR545W
  [6600]    chrIV [1530863, 1531342]      -   |         SGD        gene   YDR545C-A

And this is the error I get no matter how I sort the GFF file (by chromosome and then start location, or by start location alone):

Calling frames...Error in findInterval(spl27.f[[ii]], splfr0e[[ii]]) : 
  'vec' must be sorted non-decreasingly and not contain NAs
In addition: Warning messages:
1: In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels,  :
  duplicated levels in factors are deprecated
2: In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels,  :
  duplicated levels in factors are deprecated
3: In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels,  :
  duplicated levels in factors are deprecated
4: In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels,  :
  duplicated levels in factors are deprecated
5: In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels,  :
  duplicated levels in factors are deprecated

BTW, there are no duplicated values (at least in the $name column).
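
For reference, the error message itself comes from base R's findInterval, which stops with this text whenever its second argument (the break points) is unsorted or contains NA. The sketch below only demonstrates that base R behaviour on made-up numbers; it does not reproduce what frameCounting does internally, and try_breaks is a hypothetical helper name.

```r
# Sorted, complete break points: findInterval works as expected.
ok <- findInterval(c(5, 15, 25), c(0, 10, 20))
print(ok)

# Hypothetical helper: returns the error message, or NA if no error occurred.
try_breaks <- function(vec) {
  tryCatch({
    findInterval(c(5, 15, 25), vec)
    NA_character_
  }, error = function(e) conditionMessage(e))
}

# Unsorted break points trigger the error ...
msg_unsorted <- try_breaks(c(20, 0, 10))
# ... and so do break points containing NA, even if otherwise sorted.
msg_na <- try_breaks(c(0, NA, 20))

print(msg_unsorted)
print(msg_na)
```

So sorting the GFF alone may not help if the values that end up as break points are missing rather than merely out of order.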

 

Thanks in advance,

Alper

 

@thomas-j-hardcastle-3860

Sorry, the first query got lost in my inbox. frameCounting expects the GRanges object you supply to contain the frame of each coding sequence, relative to the first base of the RNA sequence. This can be calculated by taking the start of the coding sequence modulo 3; e.g.

> values(gff_gr)$frame <- start(gff_gr) %% 3
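
As a small self-contained illustration of that one-liner, the sketch below builds a toy GRanges from the first three rows shown in the question, adds the frame column, and checks that it contains no NAs. It assumes the GenomicRanges package is installed; the object name gr is made up for the example.

```r
library(GenomicRanges)

# Toy object using the first three rows of the GRanges printed in the question.
gr <- GRanges(
  seqnames = c("chrVI", "chrV", "chrII"),
  ranges   = IRanges(start = c(53, 264, 280), end = c(535, 4097, 2658)),
  strand   = c("+", "-", "-"),
  name     = c("YFL068W", "YEL077C", "YBL113C")
)

# Frame as suggested above: start position modulo 3.
mcols(gr)$frame <- start(gr) %% 3

# Sanity checks: every range has a frame, and each frame is 0, 1 or 2.
stopifnot(!anyNA(mcols(gr)$frame))
stopifnot(all(mcols(gr)$frame %in% 0:2))

print(gr)
```

Note that these are still genomic coordinates, so this only shows the mechanics of adding the column.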

However, I should also point out that riboSeqR generally works on the assumption that you have aligned to the transcriptome, not the genome, and that the GRanges object defines the location of coding sequences within the transcriptome, as in the vignette example. If no transcriptome is available for your organism, I suggest using Cufflinks/TopHat to construct one from your sequencing data and then using the findCDS function to define potential coding sequences based on start/stop codon presence. Using genomic alignments and coordinates may lead to undefined behaviour.
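
To make the distinction concrete, here is a rough sketch of what transcriptome-relative coordinates look like: the seqnames are transcript identifiers rather than chromosomes, and start/end are positions within each transcript. The transcript names and positions are invented for illustration and are not taken from the vignette; the sketch assumes GenomicRanges is installed.

```r
library(GenomicRanges)

# Hypothetical transcripts: IDs and CDS positions are invented for illustration.
# Strand is left at its default here.
cds <- GRanges(
  seqnames = c("transcript_A", "transcript_B"),
  ranges   = IRanges(start = c(101, 58), end = c(400, 357))
)

# Frame relative to the first base of each transcript, as described above.
mcols(cds)$frame <- start(cds) %% 3

# A CDS running from start codon through stop codon spans whole codons,
# so its width should be a multiple of 3.
stopifnot(all(width(cds) %% 3 == 0))

print(cds)
```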

Best wishes,

Tom Hardcastle

@mauricio_a_reynoso-9647

Hi Alper,

Did you solve the error? Can you share the solution?

Thanks,

Mauricio 

 
