Using edgeR to analyze CRISPR/Cas9 screening data
I am having trouble understanding the workflow described in the paper. I have a data set of four fastq files from a CRISPR/Cas9 screening experiment as well as a fasta file of the sgRNAs used in the analysis (for an example see below). The experiment uses a single-indexing strategy with two control samples and two treated samples.

I am not sure how to use the workflow described in the paper and the R vignette to analyze my data. I have read both the paper and the vignette, but I still don't understand how to adapt my data to the example given there.

I know I need to trim the fastq files so that they contain only the sgRNA part of the reads.

What I don't understand are the two text files for the processAmplicons() function. Where do I get Samples4.txt and sgRNAs4.txt? What is the third column in the second file? Is it already the counts?

I would appreciate any suggestions on how to proceed.

thanks

Assa

The fastq files with the complete reads (sgRNA is highlighted)

ctrl1
@M01100:33:000000000-A9U6C:1:1101:12325:1758 1:N:0:1
CTTCTTTCTTGTGGAAAGGACGAAACACCGGTGGGCTGCAAATCCAAGAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTT
+
1>A>11BDFF1DB1B111A1B000AACCE?0A///BE//011BBGA110AFFHHFFBBGHFF1BFFGFFFBBGGD2@FFGB11FGFAGGG?GH/?222BB@
@M01100:33:000000000-A9U6C:1:1101:17009:1766 1:N:0:1
GCCCCTTCTTGTGGAAAGGACGAAACACCGAGGGATGTTATCTCCTCCGGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCACCTT
+
1111>>FFFF1F11111111B000AAEFE?00///B/BD2FEGGBABE/AEEGG2F11EG1B111D1@F110FF@221B11110B10FE@/FG/?B220BB
@M01100:33:000000000-A9U6C:1:1101:13526:1767 1:N:0:1
CCTTAGTCTTGTGGAAAGGACGAAACACCGCGCGCGCGGCGCCCACAGTTTAGAGCTAGAAATAGCAAGTTAAAATAGGCTAGTCCGTTATCAACTTGAA
+
---------------
ctrl2
CCAGGTTCTTGTGGAAAGGACGAAACACCGTCCCGAAGGCTCCTCACCGGTTTTAGAGTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTG
+
>>111BDFFFFFB1111111B00AEABFFAGCEE/////ABECGF1AF/?EEGGBF1EF2F2EGGBFHF@FHHFHHGGHHHHGHFHHHHGHHGHHHDHHHH
ATGATATCTTGTGGAAAGGACGAAACACCGGCGTCGAGGAAGCGTAACTGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTT
+
1>11>3DFFFFFFCBDBCABFA00AAFFEE0A/EA?/EECFFEEGCCEGHHHHHGHBGGHFFFFGGGHHHHHHHHHHHHHHHHHHHHHHHGHHGHHHHHHH
GTCGCCTCTTGTGGAAAGGACGAAACACCGGAGAGCATGGCAGTACACGGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTT
+
AAAA3AAAFFFFGCFFFFF4BA2A2ACGFE222222FGHHHHGHFHEAAEFGGGGHFHFGFHHHHGGFHHHHHHHHHHGHHHHHGHHHHHGHHGHHHHHHH

The FASTA file with the sgRNAs used in the library

>ENSG00000139083_GCCTGCTCAGTGTAGCATTA
gcctgctcagtgtagcatta
>ENSG00000139083_GGGAACATGAAGTGGCGTCG
gggaacatgaagtggcgtcg
>ENSG00000139083_GTGAGTGTTCGTGACCCGAG
gtgagtgttcgtgacccgag
>ENSG00000139083_GAGGAAGCGTAACTCGGCAC
gaggaagcgtaactcggcac
I had the same confusion when reading the edgeR user guide. I found the following helpful for deciding what type of input edgeR expects.

http://bioinf.wehi.edu.au/shRNAseq/

http://bioinf.wehi.edu.au/shRNAseq/pooledScreenAnalysis.pdf
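
As far as I understand the processAmplicons() documentation, both input files are tab-delimited tables with at least the columns ID and Sequences. In the sgRNA file, any further column (such as a gene name) is annotation and not counts; the counts are what processAmplicons() produces. Below is a minimal Python sketch of how the sgRNA table could be built from a FASTA file like the one in the question. It assumes that every header has the form GENE_SEQUENCE and that each sequence sits on a single line. The function names and the Gene column name are hypothetical choices for illustration, not something edgeR prescribes.

```python
import csv
import io


def fasta_to_sgrna_rows(fasta_text):
    """Return (ID, Sequences, Gene) tuples from FASTA text.

    Assumes headers look like '>GENE_SEQUENCE' and that each
    sequence occupies exactly one line after its header.
    """
    rows = []
    header = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
        elif header is not None:
            gene = header.split("_", 1)[0]  # text before the first underscore
            rows.append((header, line.upper(), gene))
            header = None
    return rows


def write_table(rows, columns, handle):
    """Write a header line and the rows as a tab-delimited table."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)


# Two records copied from the FASTA excerpt in the question.
example_fasta = """>ENSG00000139083_GCCTGCTCAGTGTAGCATTA
gcctgctcagtgtagcatta
>ENSG00000139083_GGGAACATGAAGTGGCGTCG
gggaacatgaagtggcgtcg
"""

sgrna_rows = fasta_to_sgrna_rows(example_fasta)
buffer = io.StringIO()
write_table(sgrna_rows, ["ID", "Sequences", "Gene"], buffer)
print(buffer.getvalue())
```

To write a real file, pass an open file handle instead of the StringIO buffer. The samples file has the same ID and Sequences layout, with the sample index sequences in the second column; since the four fastq files in the question are already separated by sample, how best to fill it in depends on the sequencing setup.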