When trimming a DNAStringSet containing qualites using DNAStringSet(x, start=, end=)
, the
quality information is lost:
library(Biostrings)
library(ShortRead) # For example data only
fl <- system.file(package="ShortRead", "extdata", "E-MTAB-1147","ERR127302_1_subset.fastq.gz")
fq <- readDNAStringSet(fl, format="fastq", with.qualities=TRUE)[1:4]
print(class(elementMetadata(fq)))
# [1] "DFrame"
# attr(,"package")
# [1] "S4Vectors"
fq.trimmed <- DNAStringSet(fq, start=30, end=50)
print(class(elementMetadata(fq.trimmed)))
# [1] "NULL"
The quality information is present after reading the data with readDNAStringSet()
, after trimming
the elementMetadata
slot is empty.
It would be helpfull, if either a warning is issued that the quality information is discarded, or - even better -
if the quality information is trimmed as well.
The environment is as follows:
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] ShortRead_1.52.0 GenomicAlignments_1.30.0 SummarizedExperiment_1.24.0
[4] Biobase_2.54.0 MatrixGenerics_1.6.0 matrixStats_0.61.0
[7] Rsamtools_2.10.0 GenomicRanges_1.46.1 BiocParallel_1.28.2
[10] Biostrings_2.62.0 GenomeInfoDb_1.30.0 XVector_0.34.0
[13] IRanges_2.28.0 S4Vectors_0.32.3 BiocGenerics_0.40.0
loaded via a namespace (and not attached):
[1] zlibbioc_1.40.0 lattice_0.20-45 jpeg_0.1-9 hwriter_1.3.2
[5] tools_4.1.2 parallel_4.1.2 grid_4.1.2 png_0.1-7
[9] latticeExtra_0.6-29 crayon_1.4.2 Matrix_1.3-4 GenomeInfoDbData_1.2.7
[13] RColorBrewer_1.1-2 bitops_1.0-7 RCurl_1.98-1.5 DelayedArray_0.20.0
[17] compiler_4.1.2
Thank you for your answer.
I am not sure why the creation of a new
DNAStringSet
object from an existing one should make any difference. IMHO, expected behaviour ofDNAStringSet(x, start=, end=)
would be to modify the quality strings in the same way as the sequences and not to discard them silently.BTW:
narrow()
does create a newDNAStringSet
object as well, but keeps the qualities (albeit in an incorrect form):As a workaround, I implemented a function to do that and additionally decided to post the issue on the Bioconductor support site so that others are aware of it.
All in all your answer leaves me with the feeling of not being welcome and somehow intimidated to post further questions/comments.
I am sorry if I made you feel intimidated. I was actually asking a question! You used a constructor function, which by definition makes a new thing. If you are making a new thing, should there be a warning that the new thing is different from the old thing you made it from? I mean, it's a new thing and all. That's different from using a transformation function like
narrow
orsubseq
, which are intended to transform an existing thing.But maybe I approach things differently? When using a function, the first thing I do is read the help page to see what the developer says it should do. I sort of assume that's what most people do, and I made the possibly unwarranted assumption that you had read the help page and I was confused that you expected a constructor function (which doesn't even have an argument for the
mcols
slot) to do or say anything about themcols
slot of the object you started with.Way back in the day I posted something on the old Bioconductor listserv about the ellipsis argument and made a statement about how (I thought) it works. Gordon Smyth replied with a question 'Why do you think that?', which I took to be an insult to my intelligence. But then I thought about it for a while and decided that he was actually asking me why I thought that, and if I told him my thought process he was offering to explain where I was going wrong. You should take my question in the same spirit.
As a side note, that's not really how you use
tracemem
. Instead it's something like this:Where you can see that narrowing the object does make copies, but that using the constructor does all that, plus makes an entirely new thing.
Actually
DNAStringSet(x, start=, end=)
callsnarrow()
internally so we should fixnarrow()
.@gerhard-thallinger-1552: Can you please open an issue here with a minimal reproducible example and
sessionInfo()
? Thanks!Best,
H.
Done. You can find them here and here.
Best,
Gerhard
Thanks. See discussion and proposed resolution in issue #61.