Apologies if this has been asked before. I've found several similar questions, but nothing quite like what I want.
I am analysing some data from a 3'UTR-RNA-seq protocol. In a 150bp NextSeq read, what often happens is that you get something like:
The 12bpAdapter is fixed, but the lengths of the PolyA and the 3' Adapter are not.
Having got my data as a ShortReadQ object, I can then use
narrow(fq, start=13) to get rid of the 12bp adapter, but I'm not sure how to get rid of everything from the PolyA onwards. Intuitively I want to do something like
grep('AAAAAAAA', string) to get the position of the start of the polyA and then use
narrow(fq, end=pos) again where necessary.
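To make that idea concrete, here is a minimal sketch in base R, working on plain character vectors so it runs without any Bioconductor packages. The helper names polyA_start and trim_read, the 8-A threshold and the toy reads are all assumptions made up for illustration; none of them come from ShortRead or Biostrings. Note that the search starts at the first base, so a run of A's inside the 12bp adapter would also be picked up.

```r
# Hypothetical helper: position of the first run of at least `min_run` A's
# in each read, or NA when a read contains no such run.
polyA_start <- function(reads, min_run = 8L) {
  pos <- as.integer(regexpr(strrep("A", min_run), reads, fixed = TRUE))
  pos[pos == -1L] <- NA_integer_   # regexpr() reports "no match" as -1
  pos
}

# Hypothetical helper: drop the fixed 5' adapter and everything from the
# polyA onwards. Reads without a polyA run keep their full 3' end.
trim_read <- function(reads, adapter_len = 12L, min_run = 8L) {
  pos <- polyA_start(reads, min_run)
  end <- ifelse(is.na(pos), nchar(reads), pos - 1L)
  substr(reads, adapter_len + 1L, end)
}

# Toy reads (invented): 12 nt adapter, insert, polyA, 3' adapter
reads <- c(
  paste0("NNNNNNNNNNNN", "GATTACAGGC", strrep("A", 15), "CTGTCTC"),
  paste0("NNNNNNNNNNNN", "CCGTTAGACA")   # no polyA run
)
trimmed <- trim_read(reads)
print(trimmed)
```

With a ShortReadQ object, the same end positions could presumably be computed from as.character(sread(fq)) and passed on as narrow(fq, start=13, end=...), so that bases and qualities are trimmed together. Whether narrow accepts one end value per read in that setting is an assumption I have not checked.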
The trimTailw function is nice for filtering on quality scores, and a windowed
alphabetScore function would be another way of doing it. There is also
trimLRPatterns but this will only find a pattern at the end of the sequence, not in the middle (I think). Finally, using
vmatchPattern('AAAAAAAA',sread(fq)) I can sort of work out the position of the polyA but this seems a bit clumsy.
Any guidance to point me in the right direction would be appreciated. I'm sure someone must have done this before! I would rather do this in R, as I'm going to use Rsubread for the alignment step.