Good morning everyone, I'm using the boxplot function as suggested in RnaSeqEdgeRQL to show a summary graph of the quality of my input .FASTQ files, which contain my RNA-seq reads (see the paragraph 'Accuracy of base-calling'). As you can see in the provided example, the x-axis shows the base positions of the analysed reads. So, in the suggested example, with reads up to 100 bp in length, the scale ranges from 1 to 100.
In my case, my reads are from Roche454 pyrosequencing, so the reads are longer (up to 400 bp, 150-200 on average).
I used the following code:
QS <- qualityScores("pathofmyfile/FD.fastq")
boxplot(QS, ylab="Quality score", xlab="Base position", main="pathofmyfile/FD.fastq", cex=0.25, col="red")
To get the resulting plot:
But my reads are longer than 150 bp... why aren't all the positions shown? Am I missing something? Could it depend on the preceding qualityScores call? I also get the warning shown in the output below. I don't think it's related, since it concerns the y-axis, but I'm reporting it anyway.
qualityScores Rsubread 1.32.4
Scan the input file...
Totally 105191 reads were scanned; the sampling interval is 10.
Now extract read quality information...
Warning: the Phred score offset (33) seems wrong : 0.
Completed successfully. Quality scores for 10000 reads (equally spaced in the file) are returned.
However, the Phred score offset (33) seemed to be wrong. The quality scores can be meaningless.
Any idea why my x-axis isn't complete? Thanks in advance!
By default, qualityScores extracts quality scores from 10000 reads in a FASTQ file. You can have qualityScores extract quality scores from all reads in your FASTQ file by setting nreads=105191, to see whether more base positions are shown in your boxplot.
Hello, I had already tried what you suggested, and oddly enough the x-axis includes a few more bases, but it's still far from covering the full length range. In addition, the end part of the graph looks slightly different.
I'm attaching a pastebin with the first lines of my input FD.fastq file: as you can see, some reads are far longer than 161 bp. You can find it here.
For comparison:
The plot I get with the parameter nreads=105191
The previous one (I have just rerun the script)
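To check independently how many reads actually extend past the last position shown on the x-axis, the read lengths can be tabulated directly from the file. The following is only a minimal base-R sketch: the helper name fastq_read_lengths is hypothetical (it is not part of Rsubread), and it assumes an uncompressed FASTQ file in which every record spans exactly four lines.

```r
# Hypothetical helper (not an Rsubread function): read lengths from a FASTQ file.
# Assumes an uncompressed file with exactly 4 lines per record:
# header, sequence, "+", quality string.
fastq_read_lengths <- function(path) {
  lines <- readLines(path)
  stopifnot(length(lines) >= 4)
  # Sequence lines are lines 2, 6, 10, ... of the file
  seq_lines <- lines[seq(2, length(lines), by = 4)]
  nchar(seq_lines)
}

# Usage sketch; the path is the placeholder used earlier in this thread
# len <- fastq_read_lengths("pathofmyfile/FD.fastq")
# summary(len)        # distribution of read lengths
# mean(len > 161)     # fraction of reads longer than 161 bp
```

If a substantial fraction of reads is longer than the last position in the boxplot, the truncation comes from the plotting or extraction step rather than from the data.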