I tried using the processAmplicons function from edgeR on a fastq file where the hairpin sequence is at the start of each read and the barcode is towards the end. While 100% of the reads match a barcode, I'm getting 0% hairpin matches, even though the hairpin sequences are present in the fastq file. Since the updated version of this function now handles both fixed and variable read structures, I can no longer specify the hairpin and barcode start positions.
Any help with why the hairpin sequences aren't being recognized would be appreciated.
library(edgeR)
processAmplicons(readfile="SRR10312928_GSM4131258_DMSO.D8.R1_Mus_musculus.fastq",
                 barcodefile="GSE139118_Pool1revcompbarcodes.txt",
                 hairpinfile="GSE139118_Pool1revcompshRNA.txt",
                 verbose=TRUE)
 -- Number of Barcodes : 20
 -- Number of Hairpins : 1911
Processing reads in SRR10312928_GSM4131258_DMSO.D8.R1_Mus_musculus.fastq.
 -- Processing 10 million reads
 -- Processing 20 million reads
 -- Processing 30 million reads
 -- Processing 40 million reads
Number of reads in file SRR10312928_GSM4131258_DMSO.D8.R1_Mus_musculus.fastq : 31232748

The input run parameters are:
 -- Barcode in forward read: length 4
 -- Hairpin in forward read: length 22
 -- Mismatch in barcode/hairpin sequences not allowed.

Total number of read is 31232748
There are 31232748 reads (100.0000 percent) with barcode matches
There are 1 reads (0.0000 percent) with hairpin matches
There are 1 reads (0.0000 percent) with both barcode and hairpin matches

Warning message:
In edgeR::DGEList(counts = hairpinReadsSummary, genes = hairpins) :
  library size of zero detected
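
For anyone wanting to reproduce the observation that the hairpins are present in the reads, here is a minimal base R sketch of how their presence and start position could be checked. The helper name `hairpin_positions` and the toy sequences are made up for illustration and are not part of edgeR. The commented lines at the end assume the hairpin file is tab-delimited with a `Sequences` column, which may not hold for every file.

```r
# Hypothetical helper (not part of edgeR): for each read, return the 1-based
# start position of the earliest exact hairpin match, or NA if none is found.
hairpin_positions <- function(reads, hairpins) {
  vapply(reads, function(read) {
    # Start position of each hairpin in this read (-1 when it is absent)
    hits <- vapply(hairpins,
                   function(h) as.integer(regexpr(h, read, fixed = TRUE)[1]),
                   integer(1))
    hits <- hits[hits > 0]
    if (length(hits) == 0) NA_integer_ else min(hits)
  }, integer(1), USE.NAMES = FALSE)
}

# Toy data, invented for illustration only
reads    <- c("AAACCCGGGTTTACGT", "TTTGGGCCCAAATGCA", "GGGGGGGGGGGGGGGG")
hairpins <- c("AAACCCGGG", "TTTGGGCCC")

positions <- hairpin_positions(reads, hairpins)
print(table(positions, useNA = "ifany"))

# With the real files (assumed layout: 4 lines per fastq record, and a
# tab-delimited hairpin file with a "Sequences" column):
# fq       <- readLines("SRR10312928_GSM4131258_DMSO.D8.R1_Mus_musculus.fastq", n = 4000)
# reads    <- fq[seq(2, length(fq), by = 4)]
# hairpins <- read.delim("GSE139118_Pool1revcompshRNA.txt")$Sequences
```

If nearly all sampled reads report the same start position, that would suggest the sequences are present at a fixed offset and that the issue lies in how the function searches for them rather than in the hairpin file itself.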