Question: Error in frameCounting using riboSeqR
0
gravatar for yingyingguo0228
2.1 years ago by
yingyingguo02280 wrote:

hi

I am using riboSeqR to analyze ribosome profling data. However, after framecounting, all the frame counts are 0.

Here is what I have done from the beginning. I removed the linker sequence and timmed the reads first, then align all the reads to ncRNA.fq and collect unaligned reads. Then I used bowtie to align all the norrna reads to human transcriptome. After alignment, I used riboSeqR.

Below is my R console.  Anyone can tell me what is going on? I will really appreciate. 

 

> library("riboSeqR")

> humanFasta <- paste("/Users/yingying/Desktop/Johnlab/Exps/Exp1/humancdna.fa")

> fastaCDS <- findCDS(fastaFile=humanFasta,startCodon=c("ATG"),stopCodon=c("TAG","TAA","TGA"))

Read 5174699 items

> ribofiles <- paste("/Users/yingying/Desktop/Johnlab/Exps/Exp1/A549Con.bowtie.out")

> riboDat <- readRibodata ( ribofiles, replicates=c("con"))

Reading ribosomal files....done!

> fCs <- frameCounting(riboDat, fastaCDS, lengths=24:33)

Calling frames..........done!

> fS <- readingFrame(rC=fCs)

> fS

         26 27 28 29 30

          0  0  0  0  0

          0  0  0  0  0

          0  0  0  0  0

frame.ML  0  0  0  0  0

> fastaCDS

GRanges object with 2865412 ranges and 6 metadata columns:

                      seqnames       ranges strand |     frame  startCodon   stopCodon     context      minus3       plus1

                         <Rle>    <IRanges>  <Rle> | <numeric> <character> <character> <character> <character> <character>

        [1] ENST00000603326.1      [ 6, 19]      * |         2         ATG         TAC     ACTATGG           A           G

        [2] ENST00000390583.1      [ 9, 31]      * |         2         ATG         AAC     ACTATGG           A           G

        [3] ENST00000390575.1      [13, 20]      * |         0         ATG         TAC     GCTATGG           G           G

        [4] ENST00000390571.1      [ 9, 31]      * |         2         ATG         TAC     ACTATGA           A           A

        [5] ENST00000390588.1      [13, 20]      * |         0         ATG         TAC     GCTATGG           G           G

        ...                ...          ...    ... .       ...         ...         ...         ...         ...         ...

  [2865408] ENST00000638565.1  [ 872,  892]      * |         1         ATG         TGA     CAGATGC           C           C

  [2865409] ENST00000638565.1  [1102, 1113]      * |         0         ATG         TAA     AAAATGC           A           C

  [2865410] ENST00000638565.1  [1162, 1185]      * |         0         ATG         TGA     CTGATGA           C           A

  [2865411] ENST00000638565.1  [1294, 1331]      * |         0         ATG         TGA     CAAATGC           C           C

  [2865412] ENST00000638565.1  [1328, 1331]      * |         1         ATG         TGA     CCCATGA           C           A

  -------

  seqinfo: 180869 sequences from an unspecified genome; no seqlengths

> readRibodata ( ribofiles, replicates=c("con"))

Reading ribosomal files....done!

An object of class "riboData"

Slot "riboGR":

$A549Con.bowtie.out

GRanges object with 621370 ranges and 0 metadata columns:

                     seqnames     ranges strand

                        <Rle>  <IRanges>  <Rle>

       [1] ENST00000265620.11 [152, 181]      +

       [2]  ENST00000592420.1 [129, 157]      +

       [3]  ENST00000620941.1 [447, 475]      +

       [4]  ENST00000616743.1 [395, 423]      +

       [5]  ENST00000573216.5 [ 68,  96]      +

       ...                ...        ...    ...

  [621366]  ENST00000392730.2 [433, 462]      +

  [621367]  ENST00000571730.1 [ 19,  48]      +

  [621368]  ENST00000351989.7 [ 43,  71]      +

  [621369]  ENST00000368984.7 [170, 198]      +

  [621370]  ENST00000377698.3 [750, 777]      +

  -------

  seqinfo: 63130 sequences from an unspecified genome; no seqlengths

Slot "rnaGR":

list()

Slot "replicates":

[1] con

Levels: con

Thanks a lot!

Yingying

R riboseqr riboseq • 623 views
ADD COMMENTlink modified 2.1 years ago • written 2.1 years ago by yingyingguo02280
Answer: Error in frameCounting using riboSeqR
0
gravatar for Thomas J Hardcastle
2.1 years ago by
United Kingdom
Thomas J Hardcastle180 wrote:

Hi Yingying,

This almost always happens because of a mismatch between the headers in the fasta file and the gene names in the bowtie file - which in turn, almost always happens because bowtie drops everything after the first space in the header file, whereas riboSeqR uses the full header. I suggest you take a look at the fasta file and see if you can update it so that it uses the same names as the bowtie output.

Best wishes,

Tom

ADD COMMENTlink written 2.1 years ago by Thomas J Hardcastle180
Answer: Error in frameCounting using riboSeqR
0
gravatar for yingyingguo0228
2.1 years ago by
yingyingguo02280 wrote:

Hi Tom,

Thank you for your reply!

At the begining, I found people having the same problems as me. They fixed their problem by making header name of their fastq file and bowtie output the same. so I checked mine and noticed that the header of my fastaq file is different from bowtie output.  So I used command

"sed 's/cdna.*$//' Homo_sapiens.GRCh38.cdna.all.fa > humancdna.fa"  to remove all the information after the first space in header of fastq. 

Here is the old header of fastq file. 

>ENST00000448914.1 cdna chromosome:GRCh38:14:22449113:22449125:1 gene:ENSG00000228985.1 gene_biotype:TR_D_gene transcript_biotype:TR_D_gene gene_symbol:TRDD3 description:T-cell receptor delta diversity 3 [Source:HGNC Symbol;Acc:HGNC:12256]​

after sed, the header is 

>ENST00000448914.1​

This is the same as my bowtie output.

However, it still gives me 0 for framecounting. 

Did I do something wrong?  

I really appreciate for your help 

Best,

Yingying

ADD COMMENTlink written 2.1 years ago by yingyingguo02280
Answer: Error in frameCounting using riboSeqR
0
gravatar for yingyingguo0228
2.1 years ago by
yingyingguo02280 wrote:

Hi Tom,

I just tried again. It turns out that there is a blank space after each header of fastq file. So I should use command

"sed 's/ cdna.*$//' Homo_sapiens.GRCh38.cdna.all.fa > humancdna.fa"​ instead of 

"sed 's/cdna.*$//' Homo_sapiens.GRCh38.cdna.all.fa > humancdna.fa"​. 

I did not realize that this blank space would cause such a big difference. 

Thank you so much for your suggestion! I am new to bioinformatic and this is the first time I post question here. I did not expect that you reply so fast. Thanks agian!

Best,

Yingying

ADD COMMENTlink written 2.1 years ago by yingyingguo02280
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 383 users visited in the last hour