OAPEN Suggestion Service
Description
The OAPEN Suggestion Service uses natural-language processing to suggest books based on their content similarities. To protect user privacy, we utilize text analysis rather than usage data to provide recommendations. This service is built on the proof-of-concept and paper by Ronald Snijder from the OAPEN Foundation, and you can read the paper here.
Installation (Server)
DigitalOcean Droplet
- Log in to your DigitalOcean account.
- Create a new Droplet.
- Under "Choose an image" select "Marketplace" and search for "Docker". Select "Docker 20.10.21 on Ubuntu 22.04".
- Choose any size, but the cheapest option will work fine.
- If you do not have an SSH key, generate one with:
      ssh-keygen -t rsa -b 4096
  Then copy the public key to your clipboard. If you already have a key on your computer, you can use that.
- Under "Choose Authentication Method" choose "SSH Key" and click "New SSH Key", and in the popup window paste the public key you copied to your clipboard. Make sure it is selected.
- Give the Droplet a name and click "Create".
- Open the firewall ports
DigitalOcean Managed Database
- From the DigitalOcean dashboard, click "Databases" > "Create Database".
- Ideally, select the same region & datacenter as the Droplet you just created, so they can be part of the same VPC network.
- Choose "PostgreSQL v15".
- Select any sizing plan, but the cheapest one will suffice.
- Give the database a name, and click "Create Database Cluster".
- Once the database has been created (this can take a few minutes), find the "Connection details" section on the new database's page; you will need these details later.
Setup Users & Install Requirements
- Log in to the droplet over SSH:
      ssh root@<your-droplet-ip>
- Create a new user oapen and set a password, adding them to the sudo and docker groups, then log in as the new user:
      useradd -m -G sudo,docker oapen
      passwd oapen
      su -l -s /bin/bash oapen
- Install the docker compose command:
      sudo apt-get update
      sudo apt-get install docker-compose-plugin
- Change the SSH configuration file to disallow root login:
      sudo sed -i 's/PermitRootLogin yes/PermitRootLogin no/' /etc/ssh/sshd_config
- Allow SSH login as the non-root user with the same SSH keys you uploaded to DigitalOcean:
      mkdir -p ~/.ssh
      sudo cp /root/.ssh/authorized_keys ~/.ssh/
      sudo chown -R oapen:oapen ~/.ssh
      sudo chmod 700 ~/.ssh
      sudo chmod 600 ~/.ssh/authorized_keys
      sudo systemctl restart ssh
- Create a swapfile to avoid issues with high memory usage:
      sudo fallocate -l 1G /swapfile
      sudo chmod 600 /swapfile
      sudo mkswap /swapfile
      sudo swapon /swapfile
      echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
  Feel free to replace 1G in the first command with 4G. Although the service should never use this much memory, extra swap never hurts if you have the disk space to spare. More on swap here.
- Restart the droplet to persist all of the changes. From now on, log in to the droplet with:
      ssh oapen@<your-droplet-ip>
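The `sed` command in the SSH step only rewrites a line that reads exactly `PermitRootLogin yes`; if the directive is commented out or set to another value, the file is left unchanged. As a minimal sketch for checking what the file actually contains, assuming a helper named `root_login_setting` (a hypothetical name, not part of this project):

```shell
# Hypothetical helper: print the value of the first uncommented
# PermitRootLogin directive in an sshd config file.
# Prints "unset" when no active directive is found.
# Assumption: directives pulled in through Include lines are NOT followed.
root_login_setting() {
  config_file="${1:-/etc/ssh/sshd_config}"
  awk '
    tolower($1) == "permitrootlogin" { print $2; found = 1; exit }
    END { if (!found) print "unset" }
  ' "$config_file"
}
```

Running `root_login_setting` on the droplet should print `no` after the change. Because this sketch reads a single file, `sudo sshd -T | grep -i permitrootlogin` is the more reliable check of the effective setting.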
Clone & Configure the Project
- Clone the repository and cd into the directory it creates:
      git clone https://github.com/EbookFoundation/oapen-suggestion-service.git
      cd oapen-suggestion-service
  You can clone this anywhere, but the home directory is easiest.
- Copy the .env.template file to .env:
      cp .env.template .env
- Using a text editor like vim or nano, configure all of the options in .env:
      API_PORT=<Port to serve API on>
      POSTGRES_HOST=<Hostname of postgres server>
      POSTGRES_PORT=<Port postgres is running on>
      POSTGRES_DB_NAME=<Name of the postgres database>
      POSTGRES_USERNAME=<Username of the postgres user>
      POSTGRES_PASSWORD=<Password of the postgres user>
      POSTGRES_SSLMODE=<'require' when using a managed database>
  Postgres credentials can be found in the "Connection details" section of the managed database.
- Open the docker-compose.yml file and find the line:
      - RUN_CLEAN=1
  This is set to 1 by default, which causes the database to be COMPLETELY deleted and the types recreated each time the server restarts. It is important to have this set to 1 only on the first run of the application, or after making changes that affect the structure of the database. As soon as you have run the application for the first time (see Running), change the line to:
      - RUN_CLEAN=0
  to prevent this behavior.
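Since leaving `RUN_CLEAN=1` in place wipes the database on every restart, it can help to script the switch rather than edit the file by hand. The following is a minimal sketch, assuming a helper named `disable_run_clean` (hypothetical, not part of the repository) and that the setting appears literally as `RUN_CLEAN=1` in the compose file:

```shell
# Hypothetical helper: switch RUN_CLEAN=1 to RUN_CLEAN=0 in a compose file.
# Writes through a temporary file so it works with both GNU and BSD sed.
disable_run_clean() {
  compose_file="${1:-docker-compose.yml}"
  tmp_file="${compose_file}.tmp"
  sed 's/RUN_CLEAN=1/RUN_CLEAN=0/' "$compose_file" > "$tmp_file" &&
    mv "$tmp_file" "$compose_file"
}
```

After the first successful run, calling `disable_run_clean` from the repository directory and then `grep RUN_CLEAN docker-compose.yml` should show the line set to `0`.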
SSL Certificate
TODO: add documentation
Running
You can start the services by running the following command in the directory where you cloned the repo:
docker compose up -d
The API will be running on https://<your-ip>:<API_PORT>.
NOTE: The -d flag runs the services in the background, so you can safely exit the session and the services will continue to run.
You can stop the services with:
docker compose down
You can view the logs with:
docker compose logs -f
You can dump them to a file with:
docker compose logs > some_file.txt
To view the logs for just one service component, for example the mining engine, use:
docker logs -f oapen-suggestion-service-oapen-engine-1
Endpoints
The API provides access to the following endpoints:
http://localhost:3001/api/{handle}
http://localhost:3001/api/{handle}/?threshold={integer}
http://localhost:3001/api/{handle}/ngrams
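As a rough illustration of how the first two URL patterns are assembled, here is a small sketch. The helper name `suggestion_url` is hypothetical, the default base URL is taken from the examples above, and `example-handle` stands in for a real book handle:

```shell
# Hypothetical helper: build a suggestions URL for a book handle.
# Usage: suggestion_url <handle> [threshold] [base_url]
suggestion_url() {
  handle="$1"
  threshold="${2:-}"
  base_url="${3:-http://localhost:3001}"
  if [ -n "$threshold" ]; then
    # Pattern: /api/{handle}/?threshold={integer}
    printf '%s/api/%s/?threshold=%s\n' "$base_url" "$handle" "$threshold"
  else
    # Pattern: /api/{handle}
    printf '%s/api/%s\n' "$base_url" "$handle"
  fi
}
```

With the services running, a request could then be made with `curl "$(suggestion_url example-handle 5)"`, replacing the placeholder with a handle from the OAPEN catalog.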
Service Components
This project is a monorepo, with multiple services that work in tandem to provide suggestions:
Suggestion Engine
This engine is written in Python, and generates the recommendation data for users. Our suggestion service is centered around the trigram semantic inferencing algorithm. This script should be run as a job on a cron schedule to periodically ingest new texts added to the OAPEN catalog through their API. It populates the database with pre-processed lists of suggestions for each entry in the catalog.
You can find the code for the suggestion engine in oapen-engine/, and read more about it in oapen-engine/README.md.
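To make the term "trigram" concrete, the sketch below extracts overlapping three-word sequences from a piece of text. It is only an illustration and not the engine's implementation (which is written in Python and lives in oapen-engine/); the helper name `word_trigrams`, the lowercasing, and the alphabetic-only tokenization are all assumptions made for this example:

```shell
# Hypothetical illustration: print every sequence of three consecutive
# words read from standard input, one trigram per line.
# Assumptions: words are runs of alphabetic characters, compared in lowercase.
word_trigrams() {
  tr -cs '[:alpha:]' '\n' |
    tr '[:upper:]' '[:lower:]' |
    awk '
      NF { words[++count] = $0 }
      END {
        for (i = 1; i + 2 <= count; i++)
          print words[i], words[i + 1], words[i + 2]
      }
    '
}
```

For example, `printf 'Open access books are free' | word_trigrams` prints three lines: `open access books`, `access books are`, and `books are free`.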
API
This API server serves book recommendations from the database over HTTP in a standard RESTful architecture.
You can find the code for the API in api/, and read more about it in api/README.md.
Embed Script
The embed script is a drop-in snippet of HTML, CSS, and JavaScript that can be added to the library.oapen.org site, and adds book recommendation functionality to the sidebar of each book page.
You can find the code for the embed script in embed-script/, and read more about it in embed-script/README.md.
Web Demo
This is a web-app demo that can be used to query the API engine and see suggested books. This does not have to be maintained if the API is used on another site, but is useful for development and a tech demo.
You can find the code for the web demo in web/.
Configuration info for the web demo is in web/README.md.
Base dependencies:
- NodeJS 14.x+
- NPM package manager
Automatically-installed dependencies:
- next -- Framework for production-driven web apps
  - Maintained by Vercel and the open source community.
- react -- Frontend design framework
  - Maintained by Meta.
  - Largest frontend web UI library.
  - (Alternative considered: AngularJS -- however, it was recently deprecated by Google)
- pg -- basic PostgreSQL driver
  - Maintained on npm.
- typescript -- Types for JavaScript
  - Maintained by Microsoft and the open source community.
Updates
TODO: add documentation
Local Installation (No Server)
- Install Docker
  This project uses Docker. Instructions for installing Docker are here. Note that if you do not install Docker with Docker Desktop (which is recommended), you will have to install Docker Compose separately; instructions for that are here.
- Install PostgreSQL
  You can find instructions for installing PostgreSQL on your machine here.
  Or you can create a PostgreSQL server with Docker:
      docker run -d --name postgres -p 5432:5432 -e POSTGRES_PASSWORD=postgrespw postgres
  The username and database name will both be postgres and the password will be postgrespw. You can connect via the hostname host.docker.internal over port 5432.
- Clone and configure the project
  - Clone the repo and go into its directory:
        git clone https://github.com/EbookFoundation/oapen-suggestion-service.git
        cd oapen-suggestion-service
  - Copy the .env.template file to .env:
        cp .env.template .env
  - Using a text editor like vim or nano, configure all of the options in .env:
        API_PORT=<Port to serve API on>
        POSTGRES_HOST=<Hostname of postgres server>
        POSTGRES_PORT=<Port postgres is running on>
        POSTGRES_DB_NAME=<Name of the postgres database>
        POSTGRES_USERNAME=<Username of the postgres user>
        POSTGRES_PASSWORD=<Password of the postgres user>
        POSTGRES_SSLMODE=<'allow' for a local installation>
  - Open the docker-compose.yml file and find the line:
        - RUN_CLEAN=1
    This is set to 1 by default, which causes the database to be COMPLETELY deleted and the types recreated each time the server restarts. It is important to have this set to 1 only on the first run of the application, or after making changes that affect the structure of the database. As soon as you have run the application for the first time (see Running), change the line to:
        - RUN_CLEAN=0
    to prevent this behavior.
- See Running
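For the Docker-based PostgreSQL option, the `.env` values follow directly from the container settings described in the "Install PostgreSQL" step. The sketch below writes such a file; the helper name `write_local_env` is hypothetical, and `API_PORT=3001` is an assumption borrowed from the example URLs in the Endpoints section:

```shell
# Hypothetical helper: write a .env file for the local Docker-based
# PostgreSQL setup. Overwrites the target file if it already exists.
# Assumption: API_PORT=3001, matching the Endpoints examples.
write_local_env() {
  target="${1:-.env}"
  cat > "$target" <<'EOF'
API_PORT=3001
POSTGRES_HOST=host.docker.internal
POSTGRES_PORT=5432
POSTGRES_DB_NAME=postgres
POSTGRES_USERNAME=postgres
POSTGRES_PASSWORD=postgrespw
POSTGRES_SSLMODE=allow
EOF
}
```

Running `write_local_env` from the repository directory replaces any existing `.env`, so it is best used only on a fresh clone. The password matches the `POSTGRES_PASSWORD` passed to `docker run` and should be changed in both places for anything beyond local testing.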