How to count barcode sequences in a fastq file
@yuragrabovska-9835
United Kingdom

Hi,

I have a number of single-end FASTQ files containing reads from a barcoding experiment. I have a large list of barcodes (~120,000), and I want to count the number of exact barcode matches in the FASTQ files.

I have been looking at the ShortRead package, but I'm not sure it's the right tool, because I can't figure out how to use it for this.

Can anyone suggest a way to get counts of exact matches in R?
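Because only exact matches matter, this can be treated as a hash lookup instead of an alignment. All ~120,000 barcodes are loaded into an awk associative array, and a fixed-length window of each read is checked against that array. The sketch below is written in shell rather than R. It makes some hypothetical assumptions: every barcode has the same length, the barcode starts at a fixed position in the read (position 1 by default), and the FASTQ uses standard four-line records. The function name `count_barcodes` is illustrative.

```bash
#!/usr/bin/env bash
# Count exact barcode matches in a single-end FASTQ file (sketch).
# Hypothetical assumptions: one barcode per line in the barcode file,
# all barcodes the same length, barcode at a fixed 1-based offset in the
# read, and standard (unwrapped) four-line FASTQ records.
# Usage: count_barcodes barcodes.txt reads.fastq.gz [start]
count_barcodes() {
  local barcodes=$1 fastq=$2 start=${3:-1}
  # gzip -cdf decompresses .gz input and passes plain text through unchanged.
  gzip -cdf "$fastq" | awk -v start="$start" '
    # First input (the barcode list): store each barcode with a zero count.
    NR == FNR { if ($1 != "") { count[$1] = 0; len = length($1) }; next }
    # Second input (FASTQ on stdin): line 2 of every record is the sequence.
    FNR % 4 == 2 {
      bc = substr($0, start, len)
      if (bc in count) count[bc]++
    }
    # Print barcode<TAB>count for every barcode, including zero counts.
    END { for (bc in count) print bc "\t" count[bc] }
  ' "$barcodes" - | sort
}
```

For example, `count_barcodes barcodes.txt test_R1.fastq.gz > counts.tsv`. Each read costs one hash lookup, so the run time grows with the number of reads rather than with reads × barcodes. Running grep once per barcode, by contrast, would scan the file 120,000 times.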

fastq barcoding

What do you mean by "barcoding experiment"? Something like sequencing reads that contain a barcode, as in a CRISPRi screen? If so, it probably comes down to making a FASTA file with all barcode sequences and running an end-to-end alignment with something like bowtie2. Set the penalty parameters to a high value such as 10000, so that only perfect end-to-end matches align and everything else goes unmapped. End-to-end mode requires the whole read to align, so the reads first need to be trimmed down to the barcode region. In R, the Rsubread package can probably do this directly, but running bowtie2 is basically a one-liner in bash. You could then use something like featureCounts to count reads per barcode.
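The bowtie2 route could be sketched as follows. Only the two text-processing steps are implemented as functions: turning a plain barcode list into a FASTA reference, and tallying mapped reads per barcode from the SAM output. The second step is a lightweight stand-in for featureCounts, not featureCounts itself. The bowtie2 commands appear only as comments. They use `--score-min C,0,0` in place of the large-penalty approach described above: in end-to-end mode the best possible score is 0, so this setting keeps only perfect matches. File names such as `barcodes.fa` and `reads_trimmed.fastq.gz` are hypothetical, and `--norc` assumes the reads are in the same orientation as the barcode list.

```bash
#!/usr/bin/env bash
# Sketch of the file handling around a bowtie2 exact-match alignment.
# The alignment itself (not run here) could look roughly like:
#   bowtie2-build barcodes.fa barcodes_idx
#   bowtie2 --end-to-end --norc --score-min C,0,0 -x barcodes_idx \
#           -U reads_trimmed.fastq.gz -S aligned.sam
# In end-to-end mode the best score is 0, so a minimum score of 0
# accepts only perfect matches.

# Turn a plain barcode list (one per line, blank lines skipped) into a
# FASTA reference, using each barcode as its own sequence name.
barcodes_to_fasta() {
  awk 'NF { printf ">%s\n%s\n", $1, $1 }' "$1"
}

# Count aligned reads per reference (barcode) from SAM text on stdin.
# Header lines start with "@"; FLAG bit 0x4 marks an unmapped read.
count_sam_hits() {
  awk -F '\t' '!/^@/ && int($2 / 4) % 2 == 0 { n[$3]++ }
               END { for (ref in n) print ref "\t" n[ref] }' | sort
}
```

For example, `barcodes_to_fasta barcodes.txt > barcodes.fa` before building the index, and `count_sam_hits < aligned.sam > counts.tsv` after the alignment.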


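Hi, you can do this in bash with a one-liner.

Assuming the barcode is GTGAAA, the first command counts matches in the first 4000 lines (the first 1000 reads). Remove the head step to count the entire file. Note that the pattern 0:GTGAAA matches the index sequence that Illumina writes at the end of the read header (e.g. 1:N:0:GTGAAA), not a barcode inside the read sequence.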

Count within the first 4000 lines (1000 reads):

gunzip -c test_R1.fastq.gz | head -n 4000 | grep 0:GTGAAA | wc -l

Count over the entire file (this takes at least a few minutes for a typical RNA-seq file):

gunzip -c test_R1.fastq.gz | grep 0:GTGAAA | wc -l

A
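The grep pipelines above search every line of the file. The sketch below restricts each match to one line of every four-line FASTQ record. `count_in_header` looks for the index at the end of an Illumina header line, which is what `0:GTGAAA` matches. `count_in_sequence` counts reads whose sequence contains the barcode anywhere. Both function names are hypothetical, and both assume unwrapped four-line records.

```bash
#!/usr/bin/env bash
# Count records whose header ends in ":<barcode>" (Illumina index field,
# e.g. "@... 1:N:0:GTGAAA"). Usage: count_in_header reads.fastq.gz GTGAAA
count_in_header() {
  # gzip -cdf handles both .gz and plain FASTQ input.
  gzip -cdf "$1" | awk -v bc="$2" '
    FNR % 4 == 1 && $NF ~ (":" bc "$") { n++ }
    END { print n + 0 }'
}

# Count reads whose sequence (line 2 of each record) contains the barcode.
# Usage: count_in_sequence reads.fastq.gz GTGAAA
count_in_sequence() {
  gzip -cdf "$1" | awk -v bc="$2" '
    FNR % 4 == 2 && index($0, bc) { n++ }
    END { print n + 0 }'
}
```

For example, `count_in_sequence test_R1.fastq.gz GTGAAA`. For a plain whole-file count, `grep -c` gives the same number as `grep ... | wc -l`, because both count matching lines.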


You may be able to do it with umi_tools extract: https://umi-tools.readthedocs.io/en/latest/reference/extract.html

This question is more suited to a general forum like Biostars or Bioinformatics Stack Exchange.
