edgeR processAmplicons not recognizing hairpin sequences
@f516261e
United Kingdom

I tried using the processAmplicons function from edgeR on a fastq file in which the hairpin sequence is at the start of each read and the barcode is towards the end. The barcodes match in 100% of reads, but the hairpins match in 0%, even though the hairpin sequences are present in the fastq file. Because the updated version of this function now handles both fixed and variable read structures, I can no longer specify the hairpin and barcode start positions.

Any help with why the hairpin sequences aren't being recognized would be appreciated.

code:

library(edgeR)
processAmplicons(readfile="SRR10312928_GSM4131258_DMSO.D8.R1_Mus_musculus.fastq",
                   barcodefile="GSE139118_Pool1revcompbarcodes.txt", 
                   hairpinfile="GSE139118_Pool1revcompshRNA.txt",
                   verbose=TRUE)

output:

 -- Number of Barcodes : 20
 -- Number of Hairpins : 1911
Processing reads in SRR10312928_GSM4131258_DMSO.D8.R1_Mus_musculus.fastq.
 -- Processing 10 million reads
 -- Processing 20 million reads
 -- Processing 30 million reads
 -- Processing 40 million reads
Number of reads in file SRR10312928_GSM4131258_DMSO.D8.R1_Mus_musculus.fastq : 31232748

The input run parameters are: 
 -- Barcode in forward read: length 4
 -- Hairpin in forward read: length 22
 -- Mismatch in barcode/hairpin sequences not allowed. 

Total number of read is 31232748 
There are 31232748 reads (100.0000 percent) with barcode matches
There are 1 reads (0.0000 percent) with hairpin matches
There are 1 reads (0.0000 percent) with both barcode and hairpin matches
Warning message:
In edgeR::DGEList(counts = hairpinReadsSummary, genes = hairpins) :
  library size of zero detected
voogd.o ▴ 10
@2203edc4
Australia

Thanks for the question.

It looks like the issue was due to processAmplicons assuming that all hairpin sequences appear after the barcode sequences in each read. So, for your situation, with hairpin sequences at the start before the barcodes, processAmplicons failed to search the area of the read which contained the hairpin.
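
To confirm the layout in your own data before rerunning, a quick base-R check along these lines may help. It is only a sketch: `locate_in_read` is a hypothetical helper, not part of edgeR, and the read, hairpin and barcode below are toy sequences rather than values from your files.

```r
# Hypothetical helper (not an edgeR function): 1-based start positions of a
# known hairpin and barcode within one read, or -1 if a sequence is absent.
locate_in_read <- function(read, hairpin, barcode) {
  c(hairpin = as.integer(regexpr(hairpin, read, fixed = TRUE)),
    barcode = as.integer(regexpr(barcode, read, fixed = TRUE)))
}

# Toy sequences, assumed for illustration: a 22-nt hairpin, a 4-nt spacer,
# then a 4-nt barcode.
hairpin <- "ACGTACGTACGTACGTACGTAC"
barcode <- "GGCA"
read <- paste0(hairpin, "TTTT", barcode)

pos <- locate_in_read(read, hairpin, barcode)

# TRUE when the hairpin is present and starts before the barcode.
hairpin_first <- pos[["hairpin"]] > 0 && pos[["hairpin"]] < pos[["barcode"]]
```

Note that `regexpr` reports only the first match, so with a barcode as short as 4 nt a chance match earlier in a real read is possible; treat the result as a rough check on a handful of reads rather than a definitive answer.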

I've made some changes to processAmplicons and introduced a new argument, hairpinBeforeBarcode, which forces processAmplicons to search both before and after the barcode sequence for a hairpin sequence. Call the function again as:

processAmplicons(readfile="SRR10312928_GSM4131258_DMSO.D8.R1_Mus_musculus.fastq",
               barcodefile="GSE139118_Pool1revcompbarcodes.txt", 
               hairpinfile="GSE139118_Pool1revcompshRNA.txt",
               verbose=TRUE, hairpinBeforeBarcode=TRUE)

This should return a much higher proportion of hairpin matches.

These changes are in edgeR release version 3.38.2 and development version 3.39.4.

Hopefully this solves your issue!


Thanks for updating the function with this argument. I have tried calling the processAmplicons function again with this version of edgeR as suggested above and I'm getting the following error:

Error in tryCatch({ : 
  condition handlers must be specified with a condition class

Having looked at the code, I would guess there's an issue with the error function (line 233) after using tryCatch on line 154.
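
For context, as far as I know base R raises this message when a handler is passed to tryCatch without a condition-class name such as `error=`. The sketch below reproduces that with toy code; `risky` is a made-up function and nothing here is taken from the edgeR source, so it only illustrates the kind of mistake that can produce the message.

```r
# Made-up function that always signals an error.
risky <- function() stop("something failed")

# Handler named by condition class: the form tryCatch expects.
ok <- tryCatch(risky(), error = function(e) conditionMessage(e))

# Unnamed handler: tryCatch rejects the call itself. try() is used here only
# to capture that failure so the script keeps running.
bad <- try(tryCatch(risky(), function(e) conditionMessage(e)), silent = TRUE)

failed <- inherits(bad, "try-error")
```

Here `ok` holds the message from `risky()`, while `bad` is a try-error object because the second call never gets as far as handling anything.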


My fault. I introduced a code error when I committed Oliver Voogd's changes to the public package. Now fixed in edgeR 3.38.3 and 3.39.5.
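
If it helps, one way to check whether an installed edgeR version is new enough is sketched below. `has_working_fix` is a hypothetical helper, and treating 3.40.0 and later as fixed is an assumption based on the usual Bioconductor version numbering, not something verified here.

```r
# Hypothetical helper: TRUE if a version string is at or past the fixed
# versions named in this thread (release 3.38.3, development 3.39.5).
has_working_fix <- function(installed) {
  v <- package_version(installed)
  if (v >= "3.40.0") return(TRUE)           # assumption: later releases keep the fix
  if (v >= "3.39.0") return(v >= "3.39.5")  # development series
  v >= "3.38.3"                             # release series
}

# With edgeR installed, this could be called as:
# has_working_fix(as.character(packageVersion("edgeR")))
```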

@gordon-smyth
WEHI, Melbourne, Australia

Thanks for the heads-up. We are revising the processAmplicons function so that you will be able to specify the hairpin and barcode start positions. The new version will be available from Bioconductor in a few days.
