64 Commits (96718d29d3e3e38d905048111d2fe6ef8d59d8a1)

Author SHA1 Message Date
Peter Rauscher 96718d29d3
Reference Ron's article in README 2023-04-07 14:01:20 +00:00
Peter Rauscher e3ef97cd95
OAP-68: Combine config.env and database.ini (#52)
* OAP-64: Filter multi-line stopwords and ignore substring matches using regex

* added test case

* Combined api/config.env and database.ini into one root .env file. Needs documentation update!

* Adding default postgres credentials in .env for container tests, needs to be overwritten on DO deployment.

* documentation changes & removed unused scripts

---------

Co-authored-by: Max Zaremba <max.zaremba@gmail.com>
2023-04-07 09:37:43 -04:00
Justin O'Boyle 7e4963df77
New setup instructions (#35) 2023-04-06 14:31:18 -04:00
Peter Rauscher a5624cb718
Filter multi-line stopwords (#51)
* OAP-64: Filter multi-line stopwords and ignore substring matches using regex

* added test case

---------

Co-authored-by: Max Zaremba <max.zaremba@gmail.com>
2023-04-06 14:28:35 +00:00
Celina Peralta 7f92b17dc2
OAP-61: flatten suggestions database (#46)
* OAP-61: flatten suggestions database

* Remove limit on items to refresh

* Delete run.sh
2023-04-03 17:02:16 -04:00
Celina Peralta b80d745b30
add schedule package to daemon (#48) 2023-03-31 15:33:41 -04:00
Eric Hellman 966c336b68
improve documentation (#43)
* fix permissions

* add links to sub-READMEs
2023-03-27 12:53:06 -04:00
Justin O'Boyle 376545450d
Basic testing (#45)
* finished changes to stopwords and languages

* final changes to stopwords

* basic testing

* add tests

* Remove formatter for now

* fix merge

* cd

* touch __init__

* Relative path issue?

* run tests before app

* Move tests to inside docker

* exit when any command fails

---------

Co-authored-by: Max Zaremba <max.zaremba@gmail.com>
2023-03-22 14:52:38 -04:00
Celina Peralta 884872cf60
celinanperalta/OAP-58, OAP-59: Update suggestion task, remove unnecessary collections from harvest (#42)
* Each thread inserts into DB using one synchronized conn

* Fix formatting for get_empty query

* OAP-59: Filter out unnecessary collections from harvest

* Add endpoints table check

* Fix typo in get_empty description
2023-03-22 13:15:53 -04:00
Celina Peralta 15f801a19a
Bug fixes for prod harvest (#40)
* fix tqdm counter and add request headers

* fix typo in generate suggestions, add limit to seed.py
2023-03-07 12:41:36 -05:00
Celina Peralta ebaaa7cab3
fix tqdm counter and add request headers (#39) 2023-03-06 15:37:33 -05:00
Celina Peralta 888e73e6e2
OAP-56 and logging (#38)
* incremental progress, logging, synchronization

* update refresh/harvest period

* add logger StreamHandler, add Dockerfile nltk download, tweak seed parameters

* update readme and add scripts for manual commands
2023-03-06 14:44:45 -05:00
Justin O'Boyle f4b9ed39ab
Embed script deployment fixes (#37)
* Don't EncodeURIComponent

* Fix CORS

* Explicitly define CORS header
2023-03-03 09:13:53 -05:00
Justin O'Boyle 83938f73b5
Correct CORS headers and first pass at embedded API item (#36)
* Ignore cross origin

* Add script

* Dynamic host
2023-03-03 08:58:02 -05:00
Celina Peralta e772dc2b87
OAP-54: Full harvest for DB, add threshold (#34)
* Fix harvest synchronization, add threshold parameter

* Move daemon env vars to docker-compose.yml
2023-02-23 19:23:23 -05:00
Celina Peralta f7c33c07e9
OAP-53 Fix engine Dockerfile, build psycopg2 from source not binary, write daemon (#32)
* add libpq5 to build

* remove psycopg2-binary

* add punkt as resource

* Use multiprocessing.cpu_count for max suggestion workers
2023-02-10 07:45:44 -05:00
Peter Rauscher 32bb124706
Fixed minor error with handle validation (#33) 2023-02-09 16:46:22 +00:00
Celina Peralta 27b9a77f78
OAP-48, OAP-50 (#30) 2022-12-13 08:25:40 -05:00
Justin O'Boyle 7ac8bd7af8
Make docker not run on localhost (#31) 2022-12-13 08:25:25 -05:00
Justin O'Boyle 535715932d
Setup docker (#26)
* basic config

* Add github action

* Fix makefile for linux and variable python (#27)

* fix makefile

* remove out

* add to gitignore

* Fix dockerfile

* stash changes

* Make makefile dynamic (#28)

* Remove broken docker packages for now

* Add web

* Make Black Formatter happy?

Co-authored-by: Max Zaremba <max.zaremba@gmail.com>
2022-12-13 07:46:08 -05:00
Justin O'Boyle 924d2b2539
Make makefile dynamic (#28) 2022-12-05 20:57:25 -05:00
Max Zaremba 63619d3aa9
Fix makefile for linux and variable python (#27)
* fix makefile

* remove out

* add to gitignore
2022-12-04 21:54:40 -05:00
Celina Peralta 9bfdc51e5c
Tweak seed task params (#25)
* No get text

* remove pytest

* add OAP-39 work.

* add weekly item endpoint

* get weekly items

* refresh + generate suggestions

* thread seed tasks

* fix generate_suggestions concurrency

* fix typos

* draft concurrent data ingest

* seed, refresh tasks

* change seed config params
2022-11-29 01:19:16 -05:00
Celina Peralta 922ff68a17
celinanperalta/OAP 23 (#22)
* No get text

* remove pytest

* add OAP-39 work.

* add weekly item endpoint

* get weekly items

* refresh + generate suggestions

* thread seed tasks

* fix generate_suggestions concurrency

* fix typos

* draft concurrent data ingest

* seed, refresh tasks
2022-11-28 22:17:43 -05:00
Justin O'Boyle a8563e48be
Fix build job (#24)
* Fix type

* fix formatting
2022-11-28 19:36:41 -05:00
j-sofia cdf4659146
OAP-37: Read stopwords from txt (#23)
* read stopwords from txt

read stopwords from txt and README change

* leftover code removed

* formatting

* formatting again

* formatting last try
2022-11-16 17:22:34 -05:00
Justin O'Boyle 9435a69032
OAP-40 Align API more closely with ngram generation, fix environment (#21)
* First commit

* Update gitignore

* Update schema

* Remove todo
2022-11-14 15:20:43 -05:00
Celina Peralta 1aa611231b
OAP-36, OAP-39: No get_text() in OapenItem, register adapters for DB objects in Python (#20)
* No get text

* remove pytest

* add OAP-39 work.
2022-11-09 19:36:20 -05:00
Peter Rauscher 013fef0f0d
OAP-38: Add /ngrams endpoint to API (#19)
* Added data function to query ngrams within the api

* removed cleaning and seeding tasks from API level, handled at engine level

* Added /:handle/ngrams endpoint to API routes

* Reflected change from uuid to handle in log messages within API

* Just some API readme changes

* Added regex to routes to mitigate url decoding, plus added validation function for handle

Co-authored-by: j-sofia <joey.sofia1@gmail.com>
Co-authored-by: Peter Rauscher <peterrauscher@protonmail.com>
2022-11-09 18:45:48 -05:00
Celina Peralta 4333d4fcc3
[Draft] OAP-32 Ngram Caching (#18)
* start caching ngrams

* fix build warnings

* add timestamp

* resolve comments

* pull out mogrify

* remove pytest from hook for now
2022-11-02 23:07:56 -04:00
j-sofia ccbdda287e
OAP-17: PostgreSQL integration into API with pg-promise, data function to read from DB, dotenv to read DB credentials from environment variables (#9)
* local db connection and data functions

added pg-promise package to interface with PostgreSQL, added data functions, changed api to port 3001, updated README.md

* pr review changes

* dotenv

* Update README.md with api dependencies

* Update README.md

* PR changes

* typo
2022-10-26 03:07:10 +00:00
Celina Peralta cf9569a358
celinanperalta/OAP 33 (#16)
* make db use handle not uuid

* remove lib

* remove lib
2022-10-24 19:35:43 -04:00
Celina Peralta fd7f30ca31
celinanperalta/OAP 31 (#17)
* Get items by handle

* Get items by handle

* remove lib

* update clean/seed tasks

* refactor ngrams

* fix typo in oapen.py

* why is this not being ignored
2022-10-24 19:30:57 -04:00
Celina Peralta 06468a650c Merge branch 'celinanperalta/OAP-31' into main 2022-10-24 19:20:50 -04:00
Celina Peralta 5a974ff0f1 merge main 2022-10-24 19:19:31 -04:00
Celina Peralta 162eb86497 fix typo in oapen.py 2022-10-23 19:53:48 -04:00
Celina Peralta ee45695fb6 Merge branch 'main' of https://github.com/EbookFoundation/oapen-suggestion-service into main 2022-10-23 19:52:52 -04:00
Celina Peralta 1520f08b05
Fix pre-commit hook + linting jobs for OAPEN engine (#14)
* sync upstream

* isort, black, flake8 precommit hook

* Ignore bin

* reset bin

* reset bin

* try to fix black

* remove bin!

* update gh action
2022-10-23 19:51:47 -04:00
Celina Peralta d2668491ab refactor ngrams 2022-10-18 13:58:04 -04:00
Celina Peralta 0b97c79bde Merge remote-tracking branch 'upstream/main' into celinanperalta/OAP-31 2022-10-18 13:40:36 -04:00
Celina Peralta 6589faa2e3 update clean/seed tasks 2022-10-18 13:39:16 -04:00
Celina Peralta 2bba7eaf98 remove lib 2022-10-18 13:24:35 -04:00
Celina Peralta ed617affd5 Get items by handle 2022-10-18 13:21:55 -04:00
Celina Peralta fded0c1344 Get items by handle 2022-10-18 13:21:35 -04:00
Max Zaremba 033fc1e56e
OAP 26 (#12)
We are disregarding the linting job failure, as it may be an environment issue. It will be fixed in subsequent PRs.
2022-10-18 11:31:49 -04:00
Celina Peralta 417c55ed33 Merge branch 'main' of https://github.com/EbookFoundation/oapen-suggestion-service into main 2022-10-18 09:42:41 -04:00
Justin O'Boyle 09ec61b7d7
OAP-35 Connect `api/` and `web/`, fix querying between them, add running documentation & make dev environment easier to use (#13) 2022-10-18 08:10:11 -04:00
Justin O'Boyle 962f2d0972
OAP-21 Add project details and dependency maintenance info to README (#10) 2022-10-11 15:54:45 -04:00
Celina Peralta 3392f79665
Merge branch 'EbookFoundation:main' into main 2022-10-11 14:11:41 -04:00
Celina Peralta 2e0f398055
OAP-14: DB connection and seeding (#8)
Add seeding for database, fix pre-commit hooks, and add Makefile
2022-10-11 14:08:35 -04:00