Question

Error in frameCounting using riboSeqR

0

Entering edit mode

yingyingguo0228 • 0

@yingyingguo0228-13829

Last seen 6.7 years ago

hi

I am using riboSeqR to analyze ribosome profling data. However, after framecounting, all the frame counts are 0.

Here is what I have done from the beginning. I removed the linker sequence and timmed the reads first, then align all the reads to ncRNA.fq and collect unaligned reads. Then I used bowtie to align all the norrna reads to human transcriptome. After alignment, I used riboSeqR.

Below is my R console. Anyone can tell me what is going on? I will really appreciate.

> library("riboSeqR")

> humanFasta <- paste("/Users/yingying/Desktop/Johnlab/Exps/Exp1/humancdna.fa")

> fastaCDS <- findCDS(fastaFile=humanFasta,startCodon=c("ATG"),stopCodon=c("TAG","TAA","TGA"))

Read 5174699 items

> ribofiles <- paste("/Users/yingying/Desktop/Johnlab/Exps/Exp1/A549Con.bowtie.out")

> riboDat <- readRibodata ( ribofiles, replicates=c("con"))

Reading ribosomal files....done!

> fCs <- frameCounting(riboDat, fastaCDS, lengths=24:33)

Calling frames..........done!

> fS <- readingFrame(rC=fCs)

> fS

26 27 28 29 30

0 0 0 0 0

frame.ML 0 0 0 0 0

> fastaCDS

GRanges object with 2865412 ranges and 6 metadata columns:

seqnames ranges strand | frame startCodon stopCodon context minus3 plus1

[1] ENST00000603326.1 [ 6, 19] * | 2 ATG TAC ACTATGG A G

[2] ENST00000390583.1 [ 9, 31] * | 2 ATG AAC ACTATGG A G

[3] ENST00000390575.1 [13, 20] * | 0 ATG TAC GCTATGG G G

[4] ENST00000390571.1 [ 9, 31] * | 2 ATG TAC ACTATGA A A

[5] ENST00000390588.1 [13, 20] * | 0 ATG TAC GCTATGG G G

... ... ... ... . ... ... ... ... ... ...

[2865408] ENST00000638565.1 [ 872, 892] * | 1 ATG TGA CAGATGC C C

[2865409] ENST00000638565.1 [1102, 1113] * | 0 ATG TAA AAAATGC A C

[2865410] ENST00000638565.1 [1162, 1185] * | 0 ATG TGA CTGATGA C A

[2865411] ENST00000638565.1 [1294, 1331] * | 0 ATG TGA CAAATGC C C

[2865412] ENST00000638565.1 [1328, 1331] * | 1 ATG TGA CCCATGA C A

-------

seqinfo: 180869 sequences from an unspecified genome; no seqlengths

> readRibodata ( ribofiles, replicates=c("con"))

Reading ribosomal files....done!

An object of class "riboData"

Slot "riboGR":

$A549Con.bowtie.out

GRanges object with 621370 ranges and 0 metadata columns:

seqnames ranges strand

[1] ENST00000265620.11 [152, 181] +

[2] ENST00000592420.1 [129, 157] +

[3] ENST00000620941.1 [447, 475] +

[4] ENST00000616743.1 [395, 423] +

[5] ENST00000573216.5 [ 68, 96] +

... ... ... ...

[621366] ENST00000392730.2 [433, 462] +

[621367] ENST00000571730.1 [ 19, 48] +

[621368] ENST00000351989.7 [ 43, 71] +

[621369] ENST00000368984.7 [170, 198] +

[621370] ENST00000377698.3 [750, 777] +

-------

seqinfo: 63130 sequences from an unspecified genome; no seqlengths

Slot "rnaGR":

list()

Slot "replicates":

[1] con

Levels: con

Thanks a lot!

Yingying

riboseq R riboseqr • 1.4k views

ADD COMMENT • link 6.7 years ago yingyingguo0228 • 0

score 0 · Answer 1 · 2017-08-25

Hi Yingying,

This almost always happens because of a mismatch between the headers in the fasta file and the gene names in the bowtie file - which in turn, almost always happens because bowtie drops everything after the first space in the header file, whereas riboSeqR uses the full header. I suggest you take a look at the fasta file and see if you can update it so that it uses the same names as the bowtie output.

Best wishes,

Tom

score 0 · Answer 2 · 2017-08-25

Hi Tom,

Thank you for your reply!

At the begining, I found people having the same problems as me. They fixed their problem by making header name of their fastq file and bowtie output the same. so I checked mine and noticed that the header of my fastaq file is different from bowtie output. So I used command

"sed 's/cdna.*$//' Homo_sapiens.GRCh38.cdna.all.fa > humancdna.fa" to remove all the information after the first space in header of fastq.

Here is the old header of fastq file.

>ENST00000448914.1 cdna chromosome:GRCh38:14:22449113:22449125:1 gene:ENSG00000228985.1 gene_biotype:TR_D_gene transcript_biotype:TR_D_gene gene_symbol:TRDD3 description:T-cell receptor delta diversity 3 [Source:HGNC Symbol;Acc:HGNC:12256]

after sed, the header is

>ENST00000448914.1

This is the same as my bowtie output.

However, it still gives me 0 for framecounting.

Did I do something wrong?

I really appreciate for your help

Best,

Yingying

score 0 · Answer 3 · 2017-08-25

Hi Tom,

I just tried again. It turns out that there is a blank space after each header of fastq file. So I should use command

"sed 's/ cdna.*$//' Homo_sapiens.GRCh38.cdna.all.fa > humancdna.fa" instead of

"sed 's/cdna.*$//' Homo_sapiens.GRCh38.cdna.all.fa > humancdna.fa".

I did not realize that this blank space would cause such a big difference.

Thank you so much for your suggestion! I am new to bioinformatic and this is the first time I post question here. I did not expect that you reply so fast. Thanks agian!

Best，

Yingying