Eric Hellman
d90c83c84c
try reducing the number of ingest workers to 1 ( #68 )
...
I think we've been overloading the OAPEN API
2023-05-10 09:42:19 -04:00
Peter Rauscher
40cac8ba55
DB SSL, API format changes, new endpoints, and unit testing with Actions
...
* Run unit tests with GitHub Actions on each push
* Change job timeout to 10 minutes
* Fix for sslmode in API connection string
* Select lists of suggestions or ngrams with /api or /api/ngrams respectively, JSONify ngrams response
* Added better documentation of API endpoints
* Switch from connection string to connection object in API
2023-04-18 19:48:09 -04:00
Peter Rauscher
2e41d8e36b
Change database build & rebuild behavior, removes RUN_CLEAN option ( #59 )
...
* Removed RUN_CLEAN and added scripts for manually rebuilding database, instructions in documentation
* Add cleaning commands as "docker compose run" options, update documentation to reflect changes
2023-04-18 11:09:47 -04:00
Celina Peralta
7e33ac4677
celinanperalta/OAP 65 ( #54 )
...
Add additional parameters to OAPEN DB and refactor engine types
2023-04-14 10:39:01 +02:00
Peter Rauscher
fec1e984c2
Require SSL for Postgres connections (API and mining engine) ( #56 )
2023-04-13 18:09:59 +00:00
Peter Rauscher
e3ef97cd95
OAP-68: Combine config.env and database.ini ( #52 )
...
* OAP-64: Filter multi-line stopwords and ignore substring matches using regex
* added test case
* Combined api/config.env and database.ini into one root .env file. Needs documentation update!
* Adding default postgres credentials in .env for container tests, needs to be overwritten on DO deployment.
* documentation changes & removed unused scripts
---------
Co-authored-by: Max Zaremba <max.zaremba@gmail.com>
2023-04-07 09:37:43 -04:00
Peter Rauscher
a5624cb718
Filter multi-line stopwords ( #51 )
...
* OAP-64: Filter multi-line stopwords and ignore substring matches using regex
* added test case
---------
Co-authored-by: Max Zaremba <max.zaremba@gmail.com>
2023-04-06 14:28:35 +00:00
Celina Peralta
7f92b17dc2
OAP-61: flatten suggestions database ( #46 )
...
* OAP-61: flatten suggestions database
* Remove limit on items to refresh
* Delete run.sh
2023-04-03 17:02:16 -04:00
Celina Peralta
b80d745b30
add schedule package to daemon ( #48 )
2023-03-31 15:33:41 -04:00
Justin O'Boyle
376545450d
Basic testing ( #45 )
...
* finished changes to stopwords and languages
* final changes to stopwords
* basic testing
* add tests
* Remove formatter for now
* fix merge
* cd
* touch __init__
* Relative path issue?
* run tests before app
* Move tests to inside docker
* exit when any command fails
---------
Co-authored-by: Max Zaremba <max.zaremba@gmail.com>
2023-03-22 14:52:38 -04:00
Celina Peralta
884872cf60
celinanperalta/OAP-58, OAP-59: Update suggestion task, remove unnecessary collections from harvest ( #42 )
...
* Each thread inserts into DB using one synchronized conn
* Fix formatting for get_empty query
* OAP-59: Filter out unnecessary collections from harvest
* Add endpoints table check
* Fix typo in get_empty description
2023-03-22 13:15:53 -04:00
Celina Peralta
15f801a19a
Bug fixes for prod harvest ( #40 )
...
* fix tqdm counter and add request headers
* fix typo in generate suggestions, add limit to seed.py
2023-03-07 12:41:36 -05:00
Celina Peralta
ebaaa7cab3
fix tqdm counter and add request headers ( #39 )
2023-03-06 15:37:33 -05:00
Celina Peralta
888e73e6e2
OAP-56 and logging ( #38 )
...
* incremental progress, logging, synchronization
* update refresh/harvest period
* add logger StreamHandler, add Dockerfile nltk download, tweak seed parameters
* update readme and add scripts for manual commands
2023-03-06 14:44:45 -05:00
Celina Peralta
e772dc2b87
OAP-54: Full harvest for DB, add threshold ( #34 )
...
* Fix harvest synchronization, add threshold parameter
* Move daemon env vars to docker-compose.yml
2023-02-23 19:23:23 -05:00
Celina Peralta
f7c33c07e9
OAP-53 Fix engine Dockerfile, build psycopg2 from source not binary, write daemon ( #32 )
...
* add libpq5 to build
* remove psycopg2-binary
* add punkt as resource
* Use multiprocessing.cpu_count for max suggestion workers
2023-02-10 07:45:44 -05:00
Celina Peralta
27b9a77f78
OAP-48, OAP-50 ( #30 )
2022-12-13 08:25:40 -05:00
Justin O'Boyle
535715932d
Setup docker ( #26 )
...
* basic config
* Add github action
* Fix makefile for linux and variable python (#27)
* fix makefile
* remove out
* add to gitignore
* Fix dockerfile
* stash changes
* Make makefile dynamic (#28)
* Remove broken docker packages for now
* Add web
* Make Black Formatter happy?
Co-authored-by: Max Zaremba <max.zaremba@gmail.com>
2022-12-13 07:46:08 -05:00
Justin O'Boyle
924d2b2539
Make makefile dynamic ( #28 )
2022-12-05 20:57:25 -05:00
Max Zaremba
63619d3aa9
Fix makefile for linux and variable python ( #27 )
...
* fix makefile
* remove out
* add to gitignore
2022-12-04 21:54:40 -05:00
Celina Peralta
9bfdc51e5c
Tweak seed task params ( #25 )
...
* No get text
* remove pytest
* add OAP-39 work.
* add weekly item endpoint
* get weekly items
* refresh + generate suggestions
* thread seed tasks
* fix generate_suggestions concurrency
* fix typos
* draft concurrent data ingest
* seed, refresh tasks
* change seed config params
2022-11-29 01:19:16 -05:00
Celina Peralta
922ff68a17
celinanperalta/OAP 23 ( #22 )
...
* No get text
* remove pytest
* add OAP-39 work.
* add weekly item endpoint
* get weekly items
* refresh + generate suggestions
* thread seed tasks
* fix generate_suggestions concurrency
* fix typos
* draft concurrent data ingest
* seed, refresh tasks
2022-11-28 22:17:43 -05:00
Justin O'Boyle
a8563e48be
Fix build job ( #24 )
...
* Fix type
* fix formatting
2022-11-28 19:36:41 -05:00
j-sofia
cdf4659146
OAP-37: Read stopwords from txt ( #23 )
...
* read stopwords from txt
read stopwords from txt and README change
* leftover code removed
* formatting
* formatting again
* formatting last try
2022-11-16 17:22:34 -05:00
Justin O'Boyle
9435a69032
OAP-40 Align API more closely with ngram generation, fix environment ( #21 )
...
* First commit
* Update gitignore
* Update schema
* Remove todo
2022-11-14 15:20:43 -05:00
Celina Peralta
1aa611231b
OAP-36, OAP-39: No get_text() in OapenItem, register adapters for DB objects in Python ( #20 )
...
* No get text
* remove pytest
* add OAP-39 work.
2022-11-09 19:36:20 -05:00
Celina Peralta
4333d4fcc3
[Draft] OAP-32 Ngram Caching ( #18 )
...
* start caching ngrams
* fix build warnings
* add timestamp
* resolve comments
* pull out mogrify
* remove pytest from hook for now
2022-11-02 23:07:56 -04:00
Celina Peralta
cf9569a358
celinanperalta/OAP 33 ( #16 )
...
* make db use handle not uuid
* remove lib
* remove lib
2022-10-24 19:35:43 -04:00
Celina Peralta
fd7f30ca31
celinanperalta/OAP 31 ( #17 )
...
* Get items by handle
* Get items by handle
* remove lib
* update clean/seed tasks
* refactor ngrams
* fix typo in oapen.py
* why is this not being ignored
2022-10-24 19:30:57 -04:00
Celina Peralta
06468a650c
Merge branch 'celinanperalta/OAP-31' into main
2022-10-24 19:20:50 -04:00
Celina Peralta
5a974ff0f1
merge main
2022-10-24 19:19:31 -04:00
Celina Peralta
162eb86497
fix typo in oapen.py
2022-10-23 19:53:48 -04:00
Celina Peralta
ee45695fb6
Merge branch 'main' of https://github.com/EbookFoundation/oapen-suggestion-service into main
2022-10-23 19:52:52 -04:00
Celina Peralta
1520f08b05
Fix pre-commit hook + linting jobs for OAPEN engine ( #14 )
...
* sync upstream
* isort, black, flake8 precommit hook
* Ignore bin
* reset bin
* reset bin
* try to fix black
* remove bin!
* update gh action
2022-10-23 19:51:47 -04:00
Celina Peralta
d2668491ab
refactor ngrams
2022-10-18 13:58:04 -04:00
Celina Peralta
0b97c79bde
Merge remote-tracking branch 'upstream/main' into celinanperalta/OAP-31
2022-10-18 13:40:36 -04:00
Celina Peralta
6589faa2e3
update clean/seed tasks
2022-10-18 13:39:16 -04:00
Celina Peralta
2bba7eaf98
remove lib
2022-10-18 13:24:35 -04:00
Celina Peralta
ed617affd5
Get items by handle
2022-10-18 13:21:55 -04:00
Celina Peralta
fded0c1344
Get items by handle
2022-10-18 13:21:35 -04:00
Max Zaremba
033fc1e56e
OAP 26 ( #12 )
...
We are disregarding the linting job failure, as this may be an environment issue. It will be fixed in subsequent PRs.
2022-10-18 11:31:49 -04:00
Celina Peralta
3392f79665
Merge branch 'EbookFoundation:main' into main
2022-10-11 14:11:41 -04:00
Celina Peralta
2e0f398055
OAP-14: DB connection and seeding ( #8 )
...
Add seeding for database, fix pre-commit hooks, and add Makefile
2022-10-11 14:08:35 -04:00
Celina Peralta
f0956854ea
Merge https://github.com/EbookFoundation/oapen-suggestion-service into main
2022-10-04 08:24:05 -04:00
Celina Peralta
0f247eac8c
OAP-15, OAP-22: Data ingest + text preprocessing ( #6 )
...
* sync upstream
* db skeleton
* update readme
* basic api calls
* api calls
* data ingest + text preprocessing
* update gitignore
* remove lib changes
* lint
* remove unused imports
* gitignore updates
* update python job
* ignore flake8 warnings
2022-10-04 08:22:55 -04:00
Celina Peralta
b55f906143
Merge branch 'EbookFoundation:main' into main
2022-09-30 17:13:31 -04:00
Celina Peralta
e005787bbe
OAP-22: Set up python build job in GH actions ( #4 )
...
* sync upstream
* Add linting and testing, update to python 3.10
* push engine workflow
* fix workflow version
* fix workflow version 2
* change setup-python to v3
* workflow: cd oapen-engine
* workflow
* workflow
* workflow
* add lock file
* remove unnecessary cd
* remove unnecessary cd
* remove isort
* update workflow
* job
* job
* job
* job
* job
* job
* isort why
* isort why
* isort why
* isort why
* isort why
* isort why
* isort why
* job
* job
* job
* job
* job
* job
* job
* job
* job
* job
* job
* lint
* job
* hooks
* remove lib
2022-09-30 15:49:04 -04:00
Celina Peralta
196f62a8b0
Merge branch 'EbookFoundation:main' into main
2022-09-27 19:21:30 -04:00
Celina Peralta
e139ce9adb
celinanperalta/oap 13 ( #3 )
...
* sync upstream
* Add pgsql connection, update pipfile
* add config files
* update gitignore
* remove cached
* remove psycopg2-binary dependency
* add sklearn and pandas to pipfile
* add readme.md
2022-09-27 18:46:58 -04:00
Celina Peralta
857a0e9691
sync upstream
2022-09-27 15:19:46 -04:00