I am having trouble understanding the workflow described in the paper. I have a data set of four FASTQ files from a CRISPR/Cas9 screening experiment, as well as a FASTA file of the sgRNAs used in the analysis (see the examples below). The experiment uses a single-indexing strategy with two control samples and two treated samples.
I have read both the paper and the R vignette, but I still don't understand how to adapt my data to the examples given there.
I know that I need to trim the FASTQ files so that they contain only the sgRNA part of the reads.
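To make the trimming step concrete, here is a minimal Python sketch of what I mean. It rests on assumptions that are mine, not the paper's: that the sgRNA begins directly after the vector flank `GAAACACC`, and that it is exactly 20 nt long. The helper name `extract_guide` is hypothetical.

```python
from typing import Optional

# Assumed vector sequence directly upstream of the sgRNA (my guess from the reads).
FLANK = "GAAACACC"
# Assumed sgRNA length, taken from the 20-nt entries in the library FASTA file.
GUIDE_LEN = 20


def extract_guide(read: str, flank: str = FLANK, guide_len: int = GUIDE_LEN) -> Optional[str]:
    """Return the guide_len bases that follow the first occurrence of flank.

    Returns None when the flank is absent or the read ends too early.
    """
    start = read.find(flank)
    if start == -1:
        return None
    begin = start + len(flank)
    guide = read[begin:begin + guide_len]
    return guide if len(guide) == guide_len else None


# First ctrl1 read from the example data in this post.
read = (
    "CTTCTTTCTTGTGGAAAGGACGAAACACCGGTGGGCTGCAAATCCAAGAGTTTTAGAGCTAGAAATAGC"
    "AAGTTAAAATAAGGCTAGTCCGTTATCAACTT"
)
print(extract_guide(read))  # GGTGGGCTGCAAATCCAAGA
```

Reads that carry an insertion or deletion inside the sgRNA would still yield 20 bases here, but those bases would run into the scaffold and would simply fail to match any library entry.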
What I don't understand are the two text files required by the `processAmplicons()` function. Where do I get `Samples4.txt` and `sgRNAs4.txt`? What is the third column in the second file? Does it already contain the counts?
I would appreciate any suggestions on how to proceed.
thanks
Assa
The FASTQ files with the complete reads (sgRNA is highlighted):
ctrl1

    @M01100:33:000000000-A9U6C:1:1101:12325:1758 1:N:0:1
    CTTCTTTCTTGTGGAAAGGACGAAACACCGGTGGGCTGCAAATCCAAGAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTT
    +
    1>A>11BDFF1DB1B111A1B000AACCE?0A///BE//011BBGA110AFFHHFFBBGHFF1BFFGFFFBBGGD2@FFGB11FGFAGGG?GH/?222BB@
    @M01100:33:000000000-A9U6C:1:1101:17009:1766 1:N:0:1
    GCCCCTTCTTGTGGAAAGGACGAAACACCGAGGGATGTTATCTCCTCCGGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCACCTT
    +
    1111>>FFFF1F11111111B000AAEFE?00///B/BD2FEGGBABE/AEEGG2F11EG1B111D1@F110FF@221B11110B10FE@/FG/?B220BB
    @M01100:33:000000000-A9U6C:1:1101:13526:1767 1:N:0:1
    CCTTAGTCTTGTGGAAAGGACGAAACACCGCGCGCGCGGCGCCCACAGTTTAGAGCTAGAAATAGCAAGTTAAAATAGGCTAGTCCGTTATCAACTTGAA
    +
    11AA11BDFF1F11ADB111BA00EEFHE?00/AA/E/A>/>>@/?/>FGH211BEFF111BBEG11>FGFB22>BBGFFFFFGDE?GG?F>22<BB111

ctrl2

    @M01100:32:000000000-AAD7V:1:1101:13689:1787 1:N:0:1
    CCAGGTTCTTGTGGAAAGGACGAAACACCGTCCCGAAGGCTCCTCACCGGTTTTAGAGTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTG
    +
    >>111BDFFFFFB1111111B00AEABFFAGCEE/////ABECGF1AF/?EEGGBF1EF2F2EGGBFHF@FHHFHHGGHHHHGHFHHHHGHHGHHHDHHHH
    @M01100:32:000000000-AAD7V:1:1101:14753:1826 1:N:0:1
    ATGATATCTTGTGGAAAGGACGAAACACCGGCGTCGAGGAAGCGTAACTGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTT
    +
    1>11>3DFFFFFFCBDBCABFA00AAFFEE0A/EA?/EECFFEEGCCEGHHHHHGHBGGHFFFFGGGHHHHHHHHHHHHHHHHHHHHHHHGHHGHHHHHHH
    @M01100:32:000000000-AAD7V:1:1101:14960:1844 1:N:0:1
    GTCGCCTCTTGTGGAAAGGACGAAACACCGGAGAGCATGGCAGTACACGGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTT
    +
    AAAA3AAAFFFFGCFFFFF4BA2A2ACGFE222222FGHHHHGHFHEAAEFGGGGHFHFGFHHHHGGFHHHHHHHHHHGHHHHHGHHHHHGHHGHHHHHHH
The FASTA file with the sgRNAs used in the library:

    >ENSG00000139083_GCCTGCTCAGTGTAGCATTA
    gcctgctcagtgtagcatta
    >ENSG00000139083_GGGAACATGAAGTGGCGTCG
    gggaacatgaagtggcgtcg
    >ENSG00000139083_GTGAGTGTTCGTGACCCGAG
    gtgagtgttcgtgacccgag
    >ENSG00000139083_GAGGAAGCGTAACTCGGCAC
    gaggaagcgtaactcggcac
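
My current guess is that the sgRNA text file is just a tab-separated table built from this FASTA file. The Python sketch below shows that conversion under assumptions I have not confirmed: that `processAmplicons()` expects columns named `ID` and `Sequences`, and that the third column is a gene label rather than counts. The column names and the helper `fasta_to_rows` are my own guesses and should be checked against `?processAmplicons`.

```python
import io

# The four library entries shown above, inlined so the sketch is self-contained.
FASTA_TEXT = """\
>ENSG00000139083_GCCTGCTCAGTGTAGCATTA
gcctgctcagtgtagcatta
>ENSG00000139083_GGGAACATGAAGTGGCGTCG
gggaacatgaagtggcgtcg
>ENSG00000139083_GTGAGTGTTCGTGACCCGAG
gtgagtgttcgtgacccgag
>ENSG00000139083_GAGGAAGCGTAACTCGGCAC
gaggaagcgtaactcggcac
"""


def fasta_to_rows(handle):
    """Turn single-line FASTA records into (id, sequence, gene) tuples.

    Assumes one sequence line per record and that the gene is the part
    of the header before the first underscore.
    """
    rows = []
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
        elif name is not None:
            gene = name.split("_")[0]
            rows.append((name, line.upper(), gene))
            name = None
    return rows


rows = fasta_to_rows(io.StringIO(FASTA_TEXT))

# Assumed header; the real column names need checking against the edgeR docs.
lines = ["ID\tSequences\tGene"] + ["\t".join(row) for row in rows]
table = "\n".join(lines) + "\n"
print(table, end="")
```

Writing `table` to a file would give a candidate for the sgRNA input, if the assumptions about the expected columns hold.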