Entering edit mode
Frocha
▴
20
@frocha-12039
Last seen 6.6 years ago
I have two fastq files which contain paired-end reads. How can I check whether the reads in the two fastq files are in the same order (i.e., the n-th read in file 1 is paired with the n-th read in file 2)?
Unwinding, I guess
all(id(fq1) == id(fq2) & sread(fq1) == sread(fq2))
?I think the problem is that identical() returns TRUE when used on an 'external pointer', and that is where the XStringSet stores the data. I thought that some safeguard for this had been introduced into R or Biostrings, so I'm either mis-remembering or it's a regression.
I guess they're fastq files so same approach but with
ShortRead::readFastq()
, if the files are large then iterate usingFastqStreamer()
and check identity of each chunk.Thanks to the answers. Now I use the id() function to get two BstringSet objects (each contain the id of reads in a file) and use identical() function to check if they are identical. The results is TRUE, even the two ids of the two paired reads differ by 1 character (1 for the first read and 2 for the second read). What is the explanation for this?
trailing character and readFastq? use something like
subseq(id(rfq), 1, width(id(rfq)) - 1L)
and then test for identity. In general? MaybeBiostrings::stringDist()
.Thanks! Now I use the id() function to get two BstringSet objects (each contain the id of reads in a file) and use identical() function to check if they are identical. The results is TRUE, even the two ids of the two paired reads differ by 1 character (1 for the first read and 2 for the second read, this character is in the middle of the id). What is the explanation for this?
sessionInfo(), especially version of Biostrings? If it's old, you could try as.character() on each of the ids.
the version of Biostrings is 2.42.0.
Do you have a simple reproducible example, e.g., a single ID that illustrates the problem?
Here it is:
> head(subseq(id(fq.1), 1, width(id(fq.1))),1)
A BStringSet instance of length 1
width seq
[1] 45 SRR372747.1.1 B2GA003:1:2:1046:1028 length=36
> head(subseq(id(fq.2), 1, width(id(fq.2))),1)
A BStringSet instance of length 1
width seq
[1] 45 SRR372747.1.2 B2GA003:1:2:1046:1028 length=36
> identical(subseq(id(fq.1), 1, width(id(fq.1))),subseq(id(fq.2), 1, width(id(fq.2))))
[1] TRUE