78 Commits (add-new-publisher-stopwords)

Author SHA1 Message Date
Celina Peralta 379a574e6a add new publishers to stopwords 2023-04-26 17:45:03 -04:00
Celina Peralta 3622261f98
Fix self-signed cert error (#64)
* add certificate to droplet, update readme

* update test container job to add crt
2023-04-20 13:54:59 -04:00
Peter Rauscher 72b2b87fe4 Documentation updates 2023-04-19 10:24:20 -04:00
Peter Rauscher 40cac8ba55
DB SSL, API format changes, new endpoints, and unit testing with Actions
* Run unit tests with Github Actions on each push

* Change job timeout to 10 minutes

* Fix for sslmode in API connection string

* Select lists of suggestions or ngrams with /api or /api/ngrams respectively, JSONify ngrams response

* Added better documentation of API endpoints

* Switch from connection string to connection object in API
2023-04-18 19:48:09 -04:00
Peter Rauscher 2e41d8e36b
Change database build & rebuild behavior, removes RUN_CLEAN option (#59)
* Removed RUN_CLEAN and added scripts for manually rebuilding database, instructions in documentation

* Add cleaning commands as "docker compose run" options, update documentation to reflect changes
2023-04-18 11:09:47 -04:00
Celina Peralta f284809a08
add docs for viewing logs (#63) 2023-04-18 11:04:21 -04:00
Peter Rauscher 415e50510c
restart service on system failure/reboot and respect port number settings (#61) 2023-04-16 23:55:57 +00:00
Peter Rauscher 10a6ddb31c
Documentation (#58)
* Fixed TOC links

---------

Co-authored-by: eric <eric@hellman.net>
2023-04-16 18:15:55 +00:00
Eric Hellman 684beb0cc2
WIP Documentation (#55)
* fix permissions

* add links to sub-READMEs

* Update .gitignore

* add DO setup info

* Don't clean automatically and move .env to .env.template to avoid overwriting on pulls

* Copy .env.template to .env to satisfy testing reqs

* add log info

* Move deployment instructions to README and improve documentation overall

---------

Co-authored-by: Peter Rauscher <peterrauscher@protonmail.com>
2023-04-15 05:24:48 +00:00
Eric Hellman 19f95765ce
need to be able to git pull without messing up .env (#57) 2023-04-14 18:53:36 +00:00
Celina Peralta 7e33ac4677
celinanperalta/OAP 65 (#54)
Add additional parameters to OAPEN DB and refactor engine types
2023-04-14 10:39:01 +02:00
Peter Rauscher 3f05e94e67
Deployment instructions (#53)
* OAP-64: Filter multi-line stopwords and ignore substring matches using regex

* added test case

* Combined api/config.env and database.ini into one root .env file. Needs documentation update!

* Adding default postgres credentials in .env for container tests, needs to be overwritten on DO deployment.

* documentation changes & removed unused scripts

* Documentation changes

* Added deployment instructions and moved .env to example.env

* renaming example.env to .env for test-containers.yml workflow to pass

---------

Co-authored-by: Max Zaremba <max.zaremba@gmail.com>
2023-04-13 19:32:23 +00:00
Peter Rauscher fec1e984c2
Require SSL for Postgres connections (API and mining engine) (#56) 2023-04-13 18:09:59 +00:00
Peter Rauscher 4a73bdba94
Update docker-compose.yml 2023-04-08 00:45:37 +00:00
Peter Rauscher 96718d29d3
Reference Ron's article in README 2023-04-07 14:01:20 +00:00
Peter Rauscher e3ef97cd95
OAP-68: Combine config.env and database.ini (#52)
* OAP-64: Filter multi-line stopwords and ignore substring matches using regex

* added test case

* Combined api/config.env and database.ini into one root .env file. Needs documentation update!

* Adding default postgres credentials in .env for container tests, needs to be overwritten on DO deployment.

* documentation changes & removed unused scripts

---------

Co-authored-by: Max Zaremba <max.zaremba@gmail.com>
2023-04-07 09:37:43 -04:00
Justin O'Boyle 7e4963df77
New setup instructions (#35) 2023-04-06 14:31:18 -04:00
Peter Rauscher a5624cb718
Filter multi-line stopwords (#51)
* OAP-64: Filter multi-line stopwords and ignore substring matches using regex

* added test case

---------

Co-authored-by: Max Zaremba <max.zaremba@gmail.com>
2023-04-06 14:28:35 +00:00
Celina Peralta 7f92b17dc2
OAP-61: flatten suggestions database (#46)
* OAP-61: flatten suggestions database

* Remove limit on items to refresh

* Delete run.sh
2023-04-03 17:02:16 -04:00
Celina Peralta b80d745b30
add schedule package to daemon (#48) 2023-03-31 15:33:41 -04:00
Eric Hellman 966c336b68
improve documentation (#43)
* fix permissions

* add links to sub-READMEs
2023-03-27 12:53:06 -04:00
Justin O'Boyle 376545450d
Basic testing (#45)
* finished changes to stopwords and languages

* final changes to stopwords

* basic testing

* add tests

* Remove formatter for now

* fix merge

* cd

* touch __init__

* Relative path issue?

* run tests before app

* Move tests to inside docker

* exit when any command fails

---------

Co-authored-by: Max Zaremba <max.zaremba@gmail.com>
2023-03-22 14:52:38 -04:00
Celina Peralta 884872cf60
celinanperalta/OAP-58, OAP-59: Update suggestion task, remove unnecessary collections from harvest (#42)
* Each thread inserts into DB using one synchronized conn

* Fix formatting for get_empty query

* OAP-59: Filter out unnecessary collections from harvest

* Add endpoints table check

* Fix typo in get_empty description
2023-03-22 13:15:53 -04:00
Celina Peralta 15f801a19a
Bug fixes for prod harvest (#40)
* fix tqdm counter and add request headers

* fix typo in generate suggestions, add limit to seed.py
2023-03-07 12:41:36 -05:00
Celina Peralta ebaaa7cab3
fix tqdm counter and add request headers (#39) 2023-03-06 15:37:33 -05:00
Celina Peralta 888e73e6e2
OAP-56 and logging (#38)
* incremental progress, logging, synchronization

* update refresh/harvest period

* add logger StreamHandler, add Dockerfile nltk download, tweak seed parameters

* update readme and add scripts for manual commands
2023-03-06 14:44:45 -05:00
Justin O'Boyle f4b9ed39ab
Embed script deployment fixes (#37)
* Don't EncodeURIComponent

* Fix CORS

* Explicitly define CORS header
2023-03-03 09:13:53 -05:00
Justin O'Boyle 83938f73b5
Correct CORS headers and first pass at embedded API item (#36)
* Ignore cross origin

* Add script

* Dynamic host
2023-03-03 08:58:02 -05:00
Celina Peralta e772dc2b87
OAP-54: Full harvest for DB, add threshold (#34)
* Fix harvest synchronization, add threshold parameter

* Move daemon env vars to docker-compose.yml
2023-02-23 19:23:23 -05:00
Celina Peralta f7c33c07e9
OAP-53 Fix engine Dockerfile, build psycopg2 from source not binary, write daemon (#32)
* add libpq5 to build

* remove psycopg2-binary

* add punkt as resource

* Use multiprocessing.cpu_count for max suggestion workers
2023-02-10 07:45:44 -05:00
Peter Rauscher 32bb124706
Fixed minor error with handle validation (#33) 2023-02-09 16:46:22 +00:00
Celina Peralta 27b9a77f78
OAP-48, OAP-50 (#30) 2022-12-13 08:25:40 -05:00
Justin O'Boyle 7ac8bd7af8
Make docker not run on localhost (#31) 2022-12-13 08:25:25 -05:00
Justin O'Boyle 535715932d
Setup docker (#26)
* basic config

* Add github action

* Fix makefile for linux and variable python (#27)

* fix makefile

* remove out

* add to gitignore

* Fix dockerfile

* stash changes

* Make makefile dynamic (#28)

* Remove broken docker packages for now

* Add web

* Make Black Formatter happy?

Co-authored-by: Max Zaremba <max.zaremba@gmail.com>
2022-12-13 07:46:08 -05:00
Justin O'Boyle 924d2b2539
Make makefile dynamic (#28) 2022-12-05 20:57:25 -05:00
Max Zaremba 63619d3aa9
Fix makefile for linux and variable python (#27)
* fix makefile

* remove out

* add to gitignore
2022-12-04 21:54:40 -05:00
Celina Peralta 9bfdc51e5c
Tweak seed task params (#25)
* No get text

* remove pytest

* add OAP-39 work.

* add weekly item endpoint

* get weekly items

* refresh + generate suggestions

* thread seed tasks

* fix generate_suggestions concurrency

* fix typos

* draft concurrent data ingest

* seed, refresh tasks

* change seed config params
2022-11-29 01:19:16 -05:00
Celina Peralta 922ff68a17
celinanperalta/OAP 23 (#22)
* No get text

* remove pytest

* add OAP-39 work.

* add weekly item endpoint

* get weekly items

* refresh + generate suggestions

* thread seed tasks

* fix generate_suggestions concurrency

* fix typos

* draft concurrent data ingest

* seed, refresh tasks
2022-11-28 22:17:43 -05:00
Justin O'Boyle a8563e48be
Fix build job (#24)
* Fix type

* fix formatting
2022-11-28 19:36:41 -05:00
j-sofia cdf4659146
OAP-37: Read stopwords from txt (#23)
* read stopwords from txt

read stopwords from txt and README change

* leftover code removed

* formatting

* formatting again

* formatting last try
2022-11-16 17:22:34 -05:00
Justin O'Boyle 9435a69032
OAP-40 Align API more closely with ngram generation, fix environment (#21)
* First commit

* Update gitignore

* Update schema

* Remove todo
2022-11-14 15:20:43 -05:00
Celina Peralta 1aa611231b
OAP-36, OAP-39: No get_text() in OapenItem, register adapters for DB objects in Python (#20)
* No get text

* remove pytest

* add OAP-39 work.
2022-11-09 19:36:20 -05:00
Peter Rauscher 013fef0f0d
OAP-38: Add /ngrams endpoint to API (#19)
* Added data function to query ngrams within the api

* removed cleaning and seeding tasks from API level, handled at engine level

* Added /:handle/ngrams endpoint to API routes

* Reflected change from uuid to handle in log messages within API

* Just some API readme changes

* Added regex to routes to mitigate url decoding, plus added validation function for handle

Co-authored-by: j-sofia <joey.sofia1@gmail.com>
Co-authored-by: Peter Rauscher <peterrauscher@protonmail.com>
2022-11-09 18:45:48 -05:00
Celina Peralta 4333d4fcc3
[Draft] OAP-32 Ngram Caching (#18)
* start caching ngrams

* fix build warnings

* add timestamp

* resolve comments

* pull out mogrify

* remove pytest from hook for now
2022-11-02 23:07:56 -04:00
j-sofia ccbdda287e
OAP-17: PostgreSQL integration into API with pg-promise, data function to read from DB, dotenv to read DB credentials from environment variables (#9)
* local db connection and data functions

added pg-promise package to interface with PostgreSQL, added data functions, changed api to port 3001, updated README.md

* pr review changes

* dotenv

* Update README.md with api dependencies

* Update README.md

* PR changes

* typo
2022-10-26 03:07:10 +00:00
Celina Peralta cf9569a358
celinanperalta/OAP 33 (#16)
* make db use handle not uuid

* remove lib

* remove lib
2022-10-24 19:35:43 -04:00
Celina Peralta fd7f30ca31
celinanperalta/OAP 31 (#17)
* Get items by handle

* Get items by handle

* remove lib

* update clean/seed tasks

* refactor ngrams

* fix typo in oapen.py

* why is this not being ignored
2022-10-24 19:30:57 -04:00
Celina Peralta 06468a650c Merge branch 'celinanperalta/OAP-31' into main 2022-10-24 19:20:50 -04:00
Celina Peralta 5a974ff0f1 merge main 2022-10-24 19:19:31 -04:00
Celina Peralta 162eb86497 fix typo in oapen.py 2022-10-23 19:53:48 -04:00