I'd like to search the plant science literature (full text) and return only articles in which the word "three" appears four or more times in the full-text Methods section. Is there any way to do this at all (in any language, or with an online app or helper tool)? I presume I would need to search PubMed Central, although another online database would be OK, or perhaps I would need to download a database first. I would prefer an R solution, although another language would be fine as well, perhaps using JSON and regex. Might this be possible using R Biotea, or with a SPARQL query with regular expressions against an RDF database? ChatGPT suggested first downloading data from Europe PMC and then querying it, but wouldn't that mean downloading terabytes of data first? That might be possible, but it doesn't seem efficient; does anyone have experience of this? At the moment I'm just asking whether it's possible at all, and which direction is best or easiest. I've looked at scite, which accepts JSON and regex, but it apparently only searches citations rather than full-text Methods sections.
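To make the counting step concrete, here is a minimal sketch in Python (not R), under several assumptions that are not from the post. It assumes each article is available as JATS XML, the format PubMed Central and Europe PMC use for open-access full text, and that the Methods section is a top-level `<sec>` under `<body>` whose `sec-type` attribute or title mentions "methods". The function names are hypothetical, and real articles label their sections less consistently than the inline sample does. Fetching is left out. If Europe PMC's section-level search (a query along the lines of `METHODS:three`) works as I understand it, it could narrow the candidates first so that only those articles need downloading, but that should be checked against the Europe PMC documentation.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: a Methods section can be recognised by its title.
METHODS_TITLE = re.compile(r"\bmethods?\b", re.IGNORECASE)


def methods_text(jats_xml: str) -> str:
    """Return the text of top-level <sec> elements that look like Methods."""
    root = ET.fromstring(jats_xml)
    chunks = []
    # Only top-level body sections are inspected, so a nested subsection
    # such as "Statistical methods" is not counted a second time.
    for sec in root.findall("./body/sec"):
        sec_type = sec.get("sec-type", "").lower()
        title = sec.findtext("title", default="")
        if "methods" in sec_type or METHODS_TITLE.search(title):
            chunks.append(" ".join(sec.itertext()))
    return " ".join(chunks)


def count_in_methods(jats_xml: str, word: str = "three") -> int:
    """Count whole-word, case-insensitive occurrences of `word` in Methods."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return len(pattern.findall(methods_text(jats_xml)))


def has_enough(jats_xml: str, word: str = "three", minimum: int = 4) -> bool:
    """True if `word` occurs at least `minimum` times in the Methods section."""
    return count_in_methods(jats_xml, word) >= minimum


# Tiny hand-written stand-in for a real article (illustrative only).
SAMPLE = """<article><body>
<sec sec-type="intro"><title>Introduction</title>
<p>Three species were known.</p></sec>
<sec sec-type="materials|methods"><title>Materials and Methods</title>
<p>Three plants per pot; three pots per tray.</p>
<sec><title>Statistical methods</title>
<p>We ran three replicates, three times.</p></sec>
</sec>
</body></article>"""

if __name__ == "__main__":
    print(count_in_methods(SAMPLE), has_enough(SAMPLE))
```

The word-boundary pattern means "threefold" or "threes" would not be counted, and the "Three" in the Introduction is ignored because only the Methods section is searched. The same logic should translate to R with `xml2` and `stringr`.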
This is off-topic, as it is entirely unrelated to Bioconductor. You also seem to be cross-posting this to more than one community; please at least leave a link to the other posts so that people do not duplicate effort in case it has already been solved elsewhere.
Apologies, you may be correct (assuming there is no answer within Bioconductor). I'm having trouble finding a suitable venue for this question, as I have no idea which bioinformatics tools are relevant to it. As soon as I have a suitable link I will post it here.
Link, as promised: https://datascience.stackexchange.com/questions/129579/what-informatics-tools-will-allow-complex-search-in-order-to-select-full-text-ar