Commit Graph

42 Commits (better-pass-related)

Author SHA1 Message Date
Justin O'Boyle 376545450d
Basic testing (#45)
* finished changes to stopwords and languages

* final changes to stopwords

* basic testing

* add tests

* Remove formatter for now

* fix merge

* cd

* touch __init__

* Relative path issue?

* run tests before app

* Move tests to inside docker

* exit when any command fails

---------

Co-authored-by: Max Zaremba <max.zaremba@gmail.com>
2023-03-22 14:52:38 -04:00
Celina Peralta 884872cf60
celinanperalta/OAP-58, OAP-59: Update suggestion task, remove unnecessary collections from harvest (#42)
* Each thread inserts into DB using one synchronized conn

* Fix formatting for get_empty query

* OAP-59: Filter out unnecessary collections from harvest

* Add endpoints table check

* Fix typo in get_empty description
2023-03-22 13:15:53 -04:00
Celina Peralta 15f801a19a
Bug fixes for prod harvest (#40)
* fix tqdm counter and add request headers

* fix typo in generate suggestions, add limit to seed.py
2023-03-07 12:41:36 -05:00
Celina Peralta ebaaa7cab3
fix tqdm counter and add request headers (#39) 2023-03-06 15:37:33 -05:00
Celina Peralta 888e73e6e2
OAP-56 and logging (#38)
* incremental progress, logging, synchronization

* update refresh/harvest period

* add logger StreamHandler, add Dockerfile nltk download, tweak seed parameters

* update readme and add scripts for manual commands
2023-03-06 14:44:45 -05:00
Celina Peralta e772dc2b87
OAP-54: Full harvest for DB, add threshold (#34)
* Fix harvest synchronization, add threshold parameter

* Move daemon env vars to docker-compose.yml
2023-02-23 19:23:23 -05:00
Celina Peralta f7c33c07e9
OAP-53 Fix engine Dockerfile, build psycopg2 from source not binary, write daemon (#32)
* add libpq5 to build

* remove psycopg2-binary

* add punkt as resource

* Use multiprocessing.cpu_count for max suggestion workers
2023-02-10 07:45:44 -05:00
Celina Peralta 27b9a77f78
OAP-48, OAP-50 (#30) 2022-12-13 08:25:40 -05:00
Justin O'Boyle 535715932d
Setup docker (#26)
* basic config

* Add github action

* Fix makefile for linux and variable python (#27)

* fix makefile

* remove out

* add to gitignore

* Fix dockerfile

* stash changes

* Make makefile dynamic (#28)

* Remove broken docker packages for now

* Add web

* Make Black Formatter happy?

Co-authored-by: Max Zaremba <max.zaremba@gmail.com>
2022-12-13 07:46:08 -05:00
Justin O'Boyle 924d2b2539
Make makefile dynamic (#28) 2022-12-05 20:57:25 -05:00
Max Zaremba 63619d3aa9
Fix makefile for linux and variable python (#27)
* fix makefile

* remove out

* add to gitignore
2022-12-04 21:54:40 -05:00
Celina Peralta 9bfdc51e5c
Tweak seed task params (#25)
* No get text

* remove pytest

* add OAP-39 work.

* add weekly item endpoint

* get weekly items

* refresh + generate suggestions

* thread seed tasks

* fix generate_suggestions concurrency

* fix typos

* draft concurrent data ingest

* seed, refresh tasks

* change seed config params
2022-11-29 01:19:16 -05:00
Celina Peralta 922ff68a17
celinanperalta/OAP 23 (#22)
* No get text

* remove pytest

* add OAP-39 work.

* add weekly item endpoint

* get weekly items

* refresh + generate suggestions

* thread seed tasks

* fix generate_suggestions concurrency

* fix typos

* draft concurrent data ingest

* seed, refresh tasks
2022-11-28 22:17:43 -05:00
Justin O'Boyle a8563e48be
Fix build job (#24)
* Fix type

* fix formatting
2022-11-28 19:36:41 -05:00
j-sofia cdf4659146
OAP-37: Read stopwords from txt (#23)
* read stopwords from txt

read stopwords from txt and README change

* leftover code removed

* formatting

* formatting again

* formatting last try
2022-11-16 17:22:34 -05:00
Justin O'Boyle 9435a69032
OAP-40 Align API more closely with ngram generation, fix environment (#21)
* First commit

* Update gitignore

* Update schema

* Remove todo
2022-11-14 15:20:43 -05:00
Celina Peralta 1aa611231b
OAP-36, OAP-39: No get_text() in OapenItem, register adapters for DB objects in Python (#20)
* No get text

* remove pytest

* add OAP-39 work.
2022-11-09 19:36:20 -05:00
Celina Peralta 4333d4fcc3
[Draft] OAP-32 Ngram Caching (#18)
* start caching ngrams

* fix build warnings

* add timestamp

* resolve comments

* pull out mogrify

* remove pytest from hook for now
2022-11-02 23:07:56 -04:00
Celina Peralta cf9569a358
celinanperalta/OAP 33 (#16)
* make db use handle not uuid

* remove lib

* remove lib
2022-10-24 19:35:43 -04:00
Celina Peralta fd7f30ca31
celinanperalta/OAP 31 (#17)
* Get items by handle

* Get items by handle

* remove lib

* update clean/seed tasks

* refactor ngrams

* fix typo in oapen.py

* why is this not being ignored
2022-10-24 19:30:57 -04:00
Celina Peralta 06468a650c Merge branch 'celinanperalta/OAP-31' into main 2022-10-24 19:20:50 -04:00
Celina Peralta 5a974ff0f1 merge main 2022-10-24 19:19:31 -04:00
Celina Peralta 162eb86497 fix typo in oapen.py 2022-10-23 19:53:48 -04:00
Celina Peralta ee45695fb6 Merge branch 'main' of https://github.com/EbookFoundation/oapen-suggestion-service into main 2022-10-23 19:52:52 -04:00
Celina Peralta 1520f08b05
Fix pre-commit hook + linting jobs for OAPEN engine (#14)
* sync upstream

* isort, black, flake8 precommit hook

* Ignore bin

* reset bin

* reset bin

* try to fix black

* remove bin!

* update gh action
2022-10-23 19:51:47 -04:00
Celina Peralta d2668491ab refactor ngrams 2022-10-18 13:58:04 -04:00
Celina Peralta 0b97c79bde Merge remote-tracking branch 'upstream/main' into celinanperalta/OAP-31 2022-10-18 13:40:36 -04:00
Celina Peralta 6589faa2e3 update clean/seed tasks 2022-10-18 13:39:16 -04:00
Celina Peralta 2bba7eaf98 remove lib 2022-10-18 13:24:35 -04:00
Celina Peralta ed617affd5 Get items by handle 2022-10-18 13:21:55 -04:00
Celina Peralta fded0c1344 Get items by handle 2022-10-18 13:21:35 -04:00
Max Zaremba 033fc1e56e
OAP 26 (#12)
We are disregarding the linting job failure, as it may be an environment issue. It will be fixed in subsequent PRs.
2022-10-18 11:31:49 -04:00
Celina Peralta 3392f79665
Merge branch 'EbookFoundation:main' into main 2022-10-11 14:11:41 -04:00
Celina Peralta 2e0f398055
OAP-14: DB connection and seeding (#8)
Add seeding for database, fix pre-commit hooks, and add Makefile
2022-10-11 14:08:35 -04:00
Celina Peralta f0956854ea Merge https://github.com/EbookFoundation/oapen-suggestion-service into main 2022-10-04 08:24:05 -04:00
Celina Peralta 0f247eac8c
OAP-15, OAP-22: Data ingest + text preprocessing (#6)
* sync upstream

* db skeleton

* update readme

* basic api calls

* api calls

* data ingest + text preprocessing

* update gitignore

* remove lib changes

* lint

* remove unused imports

* gitignore updates

* update python job

* ignore flake8 warnings
2022-10-04 08:22:55 -04:00
Celina Peralta b55f906143
Merge branch 'EbookFoundation:main' into main 2022-09-30 17:13:31 -04:00
Celina Peralta e005787bbe
OAP-22: Set up python build job in GH actions (#4)
* sync upstream

* Add linting and testing, update to python 3.10

* push engine workflow

* fix workflow version

* fix workflow version 2

* change setup-python to v3

* workflow: cd oapen-engine

* workflow

* workflow

* workflow

* add lock file

* remove unnecessary cd

* remove unnecessary cd

* remove isort

* update workflow

* job

* job

* job

* job

* job

* job

* isort why

* isort why

* isort why

* isort why

* isort why

* isort why

* isort why

* job

* job

* job

* job

* job

* job

* job

* job

* job

* job

* job

* lint

* job

* hooks

* remove lib
2022-09-30 15:49:04 -04:00
Celina Peralta 196f62a8b0
Merge branch 'EbookFoundation:main' into main 2022-09-27 19:21:30 -04:00
Celina Peralta e139ce9adb
celinanperalta/oap 13 (#3)
* sync upstream

* Add pgsql connection, update pipfile

* add config files

* update gitignore

* remove cached

* remove psycopg2-binary dependency

* add sklearn and pandas to pipfile

* add readme.md
2022-09-27 18:46:58 -04:00
Celina Peralta 857a0e9691 sync upstream 2022-09-27 15:19:46 -04:00
Celina Peralta ad263b11fc
Create mining engine boilerplate (#2)
* add oapen engine folder

* add pipenv
2022-09-27 15:07:50 -04:00