Boxplot function not showing all read lengths on x-axis
Raito92 ▴ 60
@raito92-20399
Last seen 2.5 years ago
Italy

Good morning everyone. I'm using the boxplot function as suggested in the RnaSeqEdgeRQL to plot a quality summary of my input FASTQ files, which contain my RNA-seq reads (see the paragraph 'Accuracy of base-calling'). As the example there shows, the x-axis gives the base positions of the analysed reads, so with reads up to 100 bp in length the scale ranges from 1 to 100.

In my case, my reads are from Roche454 pyrosequencing, so the reads are longer (up to 400 bp, 150-200 on average).

I used the following code:

library(Rsubread)  # provides qualityScores()
QS <- qualityScores("pathofmyfile/FD.fastq")
boxplot(QS, ylab="Quality score", xlab="Base position", main="pathofmyfile/FD.fastq", cex=0.25, col="red")

To get the resulting plot:

enter image description here

But my reads are longer than 150 bp... why aren't all the positions shown? Am I missing something? Could it depend on the previous qualityScores function? I get this warning. I don't think it's related since it involves the y-axis, but I'm reporting it anyway.

qualityScores Rsubread 1.32.4

Scan the input file...
Totally 105191 reads were scanned; the sampling interval is 10.
Now extract read quality information...
Warning: the Phred score offset (33) seems wrong : 0.

Completed successfully. Quality scores for 10000 reads (equally spaced in the file) are returned.
However, the Phred score offset (33) seemed to be wrong. The quality scores can be meaningless.
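
One way to look into the offset warning is to inspect which ASCII codes actually occur in the quality lines. The following is only a base-R sketch: `quality_code_range` is a hypothetical helper (not part of Rsubread), it assumes plain 4-line FASTQ records, and the temporary file holds toy data rather than real reads.

```r
# Hypothetical helper: range of ASCII codes found in the quality strings
# of the first n_records records (assumes 4 lines per FASTQ record).
quality_code_range <- function(fastq_path, n_records = 1000) {
  lines <- readLines(fastq_path, n = 4 * n_records)
  qual <- lines[seq(4, length(lines), by = 4)]  # 4th line = quality string
  codes <- unlist(lapply(qual, utf8ToInt))
  range(codes)
}

# Toy file for illustration only.
tmp <- tempfile(fileext = ".fastq")
writeLines(c("@r1", "ACGT", "+", "!I5?",
             "@r2", "ACGTAC", "+", "IIII#5"), tmp)

rng <- quality_code_range(tmp)
print(rng)        # lowest and highest ASCII code
print(rng - 33)   # the same range as scores, assuming a Phred+33 offset
```

If the lowest code is at or just above 33, an offset of 33 is plausible; if every code is 64 or higher, the file may use an offset of 64 instead.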

Any idea why my x-axis isn't complete? Thanks in advance!

boxplot reads fastq sequencing rsubread • 1.8k views

By default, qualityScores extracts quality scores from 10000 reads in a fastq file. You can have qualityScores extract quality scores from all reads in your fastq file by setting nreads=105191, and then see whether more base positions are shown in your boxplot.

QS <- qualityScores("pathofmyfile/FD.fastq", nreads=105191)

Hello, I had already tried what you suggested, and oddly enough the x-axis includes a few more bases, but it is still far from covering the full length range. In addition, the end of the graph looks slightly different.

I'm attaching a pastebin with the first lines of my input FD.fastq file: as you can see, some reads are far longer than 161 bp. You can find it here.
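
To put numbers on the read lengths, they can be tabulated directly from the file. This is a minimal base-R sketch, assuming plain 4-line FASTQ records; `read_lengths` is a hypothetical helper and the temporary file is toy data.

```r
# Hypothetical helper: length of every read in a 4-line-per-record FASTQ file.
read_lengths <- function(fastq_path) {
  lines <- readLines(fastq_path)
  nchar(lines[seq(2, length(lines), by = 4)])  # 2nd line = sequence
}

# Toy file for illustration only.
tmp <- tempfile(fileext = ".fastq")
writeLines(c("@r1", "ACGT", "+", "IIII",
             "@r2", "ACGTACGT", "+", "IIIIIIII"), tmp)

len <- read_lengths(tmp)
print(summary(len))  # distribution of read lengths
print(max(len))      # longest read in the file
```

Comparing `max(len)` with `ncol(QS)` shows whether the score matrix covers the longest read.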

For comparison:

The plot I get with the parameter nreads=105191

enter image description here

The previous one (I have just rerun the script)

enter image description here

Wei Shi ★ 3.6k
@wei-shi-2183
Last seen 1 day ago
Australia/Melbourne

The problem is caused by the truncation of columns when qualityScores reads in the Phred score file generated by the C code. The number of columns in the returned score matrix is incorrectly determined by the length of the first read in the fastq file, so longer reads are truncated. We will release a fixed version soon.
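
Until a fixed version is installed, one possible workaround is to build the score matrix directly from the file, sizing it by the longest read instead of the first one. The sketch below is base R only and is not the Rsubread implementation: `fastq_quality_matrix` is a hypothetical helper, it assumes plain 4-line FASTQ records and a Phred+33 offset, and it reads the whole file into memory.

```r
# Hypothetical helper: reads x positions matrix of Phred scores,
# padded with NA so that reads shorter than the longest one still fit.
fastq_quality_matrix <- function(fastq_path, offset = 33) {
  lines <- readLines(fastq_path)
  qual <- lines[seq(4, length(lines), by = 4)]  # 4th line = quality string
  scores <- lapply(qual, function(q) utf8ToInt(q) - offset)
  max_len <- max(lengths(scores))               # longest read, not the first
  mat <- t(vapply(scores,
                  function(s) c(s, rep(NA_real_, max_len - length(s))),
                  numeric(max_len)))
  colnames(mat) <- seq_len(max_len)
  mat
}

# Toy file for illustration only: the first read is shorter than the second.
tmp <- tempfile(fileext = ".fastq")
writeLines(c("@r1", "ACGT", "+", "!I5?",
             "@r2", "ACGTAC", "+", "IIII#5"), tmp)

mat <- fastq_quality_matrix(tmp)
print(dim(mat))  # number of reads x longest read length
print(mat)
```

The resulting matrix can be passed to boxplot in the same way as QS. Positions beyond the end of a short read are NA and are left out of the boxes.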
