edgeR processAmplicons not recognizing hairpin sequences
@f516261e
United Kingdom

I tried using the processAmplicons function from edgeR on a fastq file in which the hairpin sequence is at the start of each read and the barcode is towards the end. Although 100% of reads match a barcode, I'm getting 0% matches for the hairpins, even though the hairpin sequences are present in the fastq file. Because the updated version of this function now handles both fixed and variable read structures, I can no longer specify the hairpin and barcode start positions.

Any help with why the hairpin sequences aren't being recognized would be appreciated.

code:

library(edgeR)
processAmplicons(readfile="SRR10312928_GSM4131258_DMSO.D8.R1_Mus_musculus.fastq",
                 barcodefile="GSE139118_Pool1revcompbarcodes.txt",
                 hairpinfile="GSE139118_Pool1revcompshRNA.txt",
                 verbose=TRUE)


output:

-- Number of Barcodes : 20
-- Number of Hairpins : 1911
Number of reads in file SRR10312928_GSM4131258_DMSO.D8.R1_Mus_musculus.fastq : 31232748

The input run parameters are:
-- Barcode in forward read: length 4
-- Hairpin in forward read: length 22
-- Mismatch in barcode/hairpin sequences not allowed.

Total number of read is 31232748
There are 31232748 reads (100.0000 percent) with barcode matches
There are 1 reads (0.0000 percent) with hairpin matches
There are 1 reads (0.0000 percent) with both barcode and hairpin matches
Warning message:
In edgeR::DGEList(counts = hairpinReadsSummary, genes = hairpins) :
library size of zero detected

voogd.o ▴ 10
@2203edc4
Australia

Thanks for the question.

It looks like the issue was due to processAmplicons assuming that all hairpin sequences appear after the barcode sequences in each read. So, for your situation, with hairpin sequences at the start before the barcodes, processAmplicons failed to search the area of the read which contained the hairpin.
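
As a rough illustration of that behaviour (a toy sketch in base R, not edgeR's actual implementation), the snippet below compares a search restricted to the part of the read after the barcode with a search over the whole read. The sequences and the helper names find_start and find_after_barcode are made up for the example.

```r
# Return the 1-based start of `pattern` in `read`, or NA if it is absent.
find_start <- function(pattern, read) {
  pos <- regexpr(pattern, read, fixed = TRUE)
  if (pos == -1L) NA_integer_ else as.integer(pos)
}

# Look for the hairpin only in the part of the read that follows the barcode.
find_after_barcode <- function(hairpin, barcode, read) {
  bc <- find_start(barcode, read)
  if (is.na(bc)) return(NA_integer_)
  offset <- bc + nchar(barcode) - 1L          # last base of the barcode
  hp <- find_start(hairpin, substring(read, offset + 1L))
  if (is.na(hp)) NA_integer_ else hp + offset
}

# Hypothetical read layout: 22-base hairpin, filler, then a 4-base barcode.
hairpin <- "ACGTTGCAACGTTGCAACGTTG"
barcode <- "TCCA"
read    <- paste0(hairpin, "GGGGGG", barcode)

find_after_barcode(hairpin, barcode, read)  # NA: the hairpin lies before the barcode
find_start(hairpin, read)                   # 1: found when the whole read is searched
```

Running the same kind of check on a handful of real reads is a quick way to confirm where the hairpins sit relative to the barcodes.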

I've made some changes to processAmplicons and introduced a new argument, hairpinBeforeBarcode, which forces processAmplicons to search for a hairpin sequence both before and after the barcode sequence. Calling the function again as follows should return a much higher proportion of hairpin matches:

processAmplicons(readfile="SRR10312928_GSM4131258_DMSO.D8.R1_Mus_musculus.fastq",
                 barcodefile="GSE139118_Pool1revcompbarcodes.txt",
                 hairpinfile="GSE139118_Pool1revcompshRNA.txt",
                 verbose=TRUE, hairpinBeforeBarcode=TRUE)

These changes are in edgeR release version 3.38.2 and development version 3.39.4.


Thanks for updating the function with this argument. I have tried calling the processAmplicons function again with this version of edgeR as suggested above and I'm getting the following error:

Error in tryCatch({ :
condition handlers must be specified with a condition class


Having looked at the code, I would guess there's an issue with the error function (line 233) after using tryCatch on line 154.
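
For reference, base R raises this message whenever a handler is passed to tryCatch without a condition-class name. The sketch below reproduces the message in isolation; it is only an illustration of that general mechanism, not a copy of the edgeR code, and the variable names are made up.

```r
# A handler supplied without a name (such as `error =`) is rejected by
# tryCatch before the expression is even evaluated.
bad <- try(tryCatch(stop("boom"), function(e) "handled"), silent = TRUE)
msg_bad <- conditionMessage(attr(bad, "condition"))
print(msg_bad)

# Naming the handler `error` registers it for error conditions.
good <- tryCatch(stop("boom"), error = function(e) "handled")
print(good)
```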


My fault. I introduced a code error when I committed Oliver Voogd's changes to the public package. Now fixed in edgeR 3.38.3 and 3.39.5.
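
A small sketch for checking whether an installed copy includes the fix, using the version numbers quoted in this thread. The helper name has_hairpin_fix is hypothetical, and the branch logic assumes 3.38.x is the release line and 3.39.x the development line.

```r
# TRUE when the given edgeR version is at or beyond the fixed versions
# mentioned above (3.38.3 on the release branch, 3.39.5 on the devel branch).
has_hairpin_fix <- function(v) {
  v <- as.package_version(v)
  if (v >= "3.39.0") v >= "3.39.5" else v >= "3.38.3"
}

has_hairpin_fix("3.38.2")
has_hairpin_fix("3.38.3")

# Usage with the installed package (requires edgeR to be installed):
# has_hairpin_fix(packageVersion("edgeR"))
```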

@gordon-smyth
WEHI, Melbourne, Australia

Thanks for the heads-up. We are revising the processAmplicons function so that you will be able to specify the hairpin and barcode start positions. The new version will be available from Bioconductor in a few days.