Please help me find the right Bioconductor packages for my Master's thesis.
SF1 and QKI are involved in splicing regulation, alternative splicing, and circRNA biogenesis. QKI is a tumor suppressor that inhibits tumorigenicity and metastasis. SF1 binds the pre-mRNA as part of the early spliceosome complex and influences alternative splicing of the downstream exon; QKI competes for the same site on the pre-mRNA, for example in NUMB.
My paper used previously published PAR-CLIP data of QKI binding sites in HEK293 (human embryonic kidney) cells. The PAR-CLIP sites were compared against these regions: human exons, human circRNAs, human 5’ UTRs, human 3’ UTRs, and human genes from a GFF3 file. I care whether each QKI site is upstream of, downstream of, or inside the region, and I broke that down into the distances D1, D2, D5, and D6. bedtools closest determined whether QKI was inside, upstream of, or downstream of the region, and an awk script found the midpoint of each QKI site and measured D1, D2, D5, and D6.
      | <----------------|-----------------> |
QKI             D1      QKI       D2            QKI
D1 is middle of QKI to start of region
D2 is middle of QKI to end of region
D5 is start of region to middle of QKI
D6 is end of region to middle of QKI
Future work will fix an edge case in the bedtools closest plus awk pipeline: when a QKI site overlaps the region but its midpoint lies outside the region, it is currently reported as D1 and D2 when it should be reported as D5 or D6.
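To make the distance definitions and the edge case concrete, here is a minimal Python sketch that classifies each QKI site by its midpoint rather than by overlap. It is not a Bioconductor or bedtools call. The function name `midpoint_distances` is hypothetical, and the sketch assumes plus-strand features with inclusive integer coordinates, so BED-style half-open intervals and minus-strand features would need adjusting.

```python
from typing import Dict


def midpoint_distances(qki_start: int, qki_end: int,
                       region_start: int, region_end: int) -> Dict[str, int]:
    """Return the distances that apply to one QKI site and one region.

    Assumptions (not from the original pipeline): plus strand only,
    inclusive integer coordinates, midpoint rounded down.
    """
    mid = (qki_start + qki_end) // 2
    if mid < region_start:
        # Midpoint is upstream of the region, even if the site overlaps it.
        return {"D5": region_start - mid}
    if mid > region_end:
        # Midpoint is downstream of the region.
        return {"D6": mid - region_end}
    # Midpoint is inside the region.
    return {"D1": mid - region_start, "D2": region_end - mid}


# A site at 80-110 overlaps a region at 100-200, but its midpoint (95) is outside.
print(midpoint_distances(80, 110, 100, 200))
print(midpoint_distances(90, 120, 100, 200))
print(midpoint_distances(210, 230, 100, 200))
```

Because the decision is made on the midpoint alone, an overlapping site whose midpoint falls outside the region is reported with D5 or D6 instead of D1 and D2.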
It would be even better if I could search the PAR-CLIP data for the QKI RNA recognition element ACUAAY (or maybe ACUAAC-N(1–20)-UAAC) and use the motif position instead of the midpoint of the QKI site.
SF1 recognizes the mammalian branch point sequence (BPS) YNCURAY (Liu et al., 2001).
QKI has an ACUAAC-N(1–20)-UAAC motif determined by SELEX (Conn et al., 2015). QKI binds as a dimer, which is why the motif has two parts.
The RNA recognition element of QKI (ACUAAY) was determined by SELEX (Hafner et al., 2010). ACUAAY is the motif that was used in the bedtools closest and awk scripts.
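The three motifs above use IUPAC codes (Y = C or U, R = A or G, N = any base), so they can be turned into regular expressions. The following is a rough Python sketch of that idea, not a Bioconductor function. The names `iupac_to_regex` and `find_sites` are hypothetical, the lookup table covers only the codes used in this post, and the sequences are assumed to be RNA written with U.

```python
import re
from typing import List, Pattern, Tuple

# Subset of the IUPAC nucleotide codes, enough for the motifs in this post.
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U",
         "R": "[AG]", "Y": "[CU]", "N": "[ACGU]"}


def iupac_to_regex(motif: str) -> str:
    """Translate an IUPAC motif such as ACUAAY into a regex string."""
    return "".join(IUPAC[base] for base in motif)


QKI_RRE = re.compile(iupac_to_regex("ACUAAY"))
SF1_BPS = re.compile(iupac_to_regex("YNCURAY"))
# Two-part QKI motif: ACUAAC, a spacer of 1 to 20 bases, then UAAC.
QKI_TWO_PART = re.compile(
    iupac_to_regex("ACUAAC") + "[ACGU]{1,20}" + iupac_to_regex("UAAC"))


def find_sites(pattern: Pattern[str], rna: str) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) offsets of matches.

    finditer reports non-overlapping matches only, so overlapping
    hits are skipped in this sketch.
    """
    return [m.span() for m in pattern.finditer(rna)]


rna = "GGACUAACGGGUAACUU"
print(find_sites(QKI_RRE, rna))
print(find_sites(QKI_TWO_PART, rna))
print(find_sites(SF1_BPS, "UACUAAC"))
```

The offsets are relative to the start of the sequence that was searched, so they still have to be added to the genomic start of the PAR-CLIP site.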
I think ChIPpeakAnno might be able to do something similar and make a pie chart (see Figure 2: Pie chart of common peaks among features), but that pie chart does not seem to include the upstream and downstream distances (D5 and D6), which I need.
I have never used Bioconductor (only bedtools), so I cannot go through all of the packages to find the ones I need. Please help.
I think that I can use awk scripts to find ACUAAY inside the PAR-CLIP sites, trim each site down to just the ACUAAY, and then rerun everything to get D1, D2, D5, and D6, but that is the hard way.
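As an illustration of the trimming step described above, here is a small Python sketch that converts motif hits inside one PAR-CLIP site into genomic intervals. It assumes the sequence of each site has already been extracted (for example with bedtools getfasta), that the sequence is on the plus strand, and that coordinates are 0-based and half-open as in BED. The function name `trim_to_motifs` is hypothetical.

```python
import re
from typing import List, Tuple

# ACUAAY written as DNA (T instead of U). The lookahead lets the search
# report overlapping hits such as the two in ACTAACTAAC.
QKI_DNA = re.compile(r"(?=(ACTAA[CT]))")


def trim_to_motifs(chrom: str, site_start: int,
                   seq: str) -> List[Tuple[str, int, int]]:
    """Return one (chrom, start, end) interval per ACUAAY hit in the site.

    Assumes plus-strand sequence; a minus-strand site would need the
    reverse complement and flipped offsets. Sites without a hit return [].
    """
    seq = seq.upper().replace("U", "T")
    return [(chrom, site_start + m.start(1), site_start + m.end(1))
            for m in QKI_DNA.finditer(seq)]


print(trim_to_motifs("chr1", 1000, "ggACTAACtaac"))
print(trim_to_motifs("chr1", 5000, "GGGGCCCC"))
```

The returned intervals could be written out as a BED file and passed through the same bedtools closest step, with the motif standing in for the whole QKI site.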
Also, please let me know if Bioconductor has any visualization tools that might help. Right now I am using histograms in RStudio to visualize D1, D2, D5, and D6.