DB SSL, API format changes, new endpoints, and unit testing with Actions

* Run unit tests with GitHub Actions on each push

* Change job timeout to 10 minutes

* Fix for sslmode in API connection string

* Select lists of suggestions or ngrams with /api or /api/ngrams respectively, JSONify ngrams response

* Added better documentation of API endpoints

* Switch from connection string to connection object in API
peterrauscher/oap-66
Peter Rauscher 2023-04-18 19:48:09 -04:00 committed by GitHub
parent 2e41d8e36b
commit 40cac8ba55
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
9 changed files with 279 additions and 54 deletions

@ -5,7 +5,7 @@ jobs:
name: Run tests
runs-on: ubuntu-latest
steps:
-  - uses: actions/checkout@master
+  - uses: actions/checkout@v3
- uses: actions/setup-node@v1
with:
node-version: '14.x'

@ -4,16 +4,21 @@ on: push
jobs:
docker:
-  timeout-minutes: 4
+  timeout-minutes: 10
runs-on: ubuntu-latest
steps:
- name: Checkout
-    uses: actions/checkout@v1
-  - run: cp .env.template .env
+    uses: actions/checkout@v3
+  - name: Create .env file
+    run: |
+      cp .env.template .env
+      sed -i 's/POSTGRES_SSLMODE=require/POSTGRES_SSLMODE=allow/' .env
- name: Create PostgreSQL container
run: docker run -d --name postgres -p 5432:5432 -e POSTGRES_PASSWORD=postgrespw postgres
- name: Start containers
run: docker-compose -f "docker-compose.yml" up -d --build
- name: Unit tests for oapen-engine
run: docker-compose run --entrypoint "./scripts/tests.sh" oapen-engine
- name: Stop containers
if: always()
run: docker-compose -f "docker-compose.yml" down

132
README.md

@ -6,22 +6,26 @@ The OAPEN Suggestion Service uses natural-language processing to suggest books b
## Table of Contents
- * [Installation (Server)](#installation-server)
-   + [DigitalOcean Droplet](#digitalocean-droplet)
-   + [DigitalOcean Managed Database](#digitalocean-managed-database)
-   + [Setup Users & Install Requirements](#setup-users--install-requirements)
-   + [Clone & Configure the Project](#clone--configure-the-project)
-   + [SSL Certificate](#ssl-certificate)
- * [Running](#running)
- * [Endpoints](#endpoints)
- * [Logging](#logging)
- * [Service Components](#service-components)
-   + [Suggestion Engine](#suggestion-engine)
-   + [API](#api)
-   + [Embed Script](#embed-script)
-   + [Web Demo](#web-demo)
- * [Updates](#updates)
- * [Local Installation (No Server)](#local-installation-no-server)
+ - [Installation (Server)](#installation-server)
+   * [DigitalOcean Droplet](#digitalocean-droplet)
+   * [DigitalOcean Managed Database](#digitalocean-managed-database)
+   * [Setup Users & Install Requirements](#setup-users-install-requirements)
+   * [Clone & Configure the Project](#clone-configure-the-project)
+   * [SSL Certificate](#ssl-certificate)
+ - [Running](#running)
+ - [Endpoints](#endpoints)
+   * [/api](#get-api)
+   * [/api/ngrams](#get-apingrams)
+   * [/api/{handle}](#get-apihandle)
+   * [/api/{handle}/ngrams](#get-apihandlengrams)
+ - [Logging](#logging)
+ - [Service Components](#service-components)
+   * [Suggestion Engine](#suggestion-engine)
+   * [API](#api)
+   * [Embed Script](#embed-script)
+   * [Web Demo](#web-demo)
+ - [Updates](#updates)
+ - [Local Installation (No Server)](#local-installation-no-server)
## Installation (Server)
@ -188,13 +192,93 @@ docker compose up -d --build
The API provides access to the following endpoints:
- `http://localhost:3001/api/{handle}`
- e.g. http://localhost:3001/api/20.400.12657/47581
- `http://localhost:3001/api/{handle}/?threshold={integer}`
- e.g. http://localhost:3001/api/20.400.12657/47581/?threshold=5
- `http://localhost:3001/api/{handle}/ngrams`
- e.g. http://localhost:3001/api/20.400.12657/47581/ngrams
### GET /api
Returns an array of books, each with an array of its suggestions.
The array of books is ordered by the date they were added (most recent first).
#### Query Parameters
- `limit` (optional): limits the number of results returned. Default is 25, maximum is 100.
- `offset` (optional): offset the list of results. Default is 0.
- `threshold` (optional): sets the minimum similarity score to receive suggestions for. Default is 0, returning all suggestions.
#### Examples
Any combination of the query parameters, in any order, is valid.
- `/api?threshold=3`
Returns suggestions with a similarity score of 3 or more for the 25 most recently added books.
- `/api?threshold=5&limit=100`
Returns suggestions with a similarity score of 5 or more for the 100 most recently added books.
- `/api?limit=50&offset=1000`
Returns 50 books and all of their suggestions, skipping the 1000 most recent.
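The defaults and limits described above can be sketched as a small standalone function. This is a minimal illustration that mirrors the clamping in the route handler later in this diff; `normalizeListParams` is a hypothetical name, not part of the codebase, and the constants repeat the values of `DEFAULT_ITEM_LIMIT` and `MAX_ITEM_LIMIT` from the validation module.

```javascript
// Hypothetical helper, not part of the repository: normalizes the
// query parameters the way the documented defaults and limits describe.
const DEFAULT_ITEM_LIMIT = 25;
const MAX_ITEM_LIMIT = 100;

function normalizeListParams(query) {
  // parseInt yields NaN for missing or non-numeric input, so `||` falls
  // back to the default. An explicit 0 is also falsy and falls back too.
  let threshold = parseInt(query.threshold, 10) || 0;
  let limit = parseInt(query.limit, 10) || DEFAULT_ITEM_LIMIT;
  let offset = parseInt(query.offset, 10) || 0;

  if (threshold < 0) threshold = 0;
  if (limit > MAX_ITEM_LIMIT) limit = MAX_ITEM_LIMIT;
  else if (limit < 1) limit = 1;
  if (offset < 0) offset = 0;

  return { threshold, limit, offset };
}

console.log(normalizeListParams({ threshold: "5", limit: "500" }));
// prints { threshold: 5, limit: 100, offset: 0 }
```

One consequence of the `||` fallback is that `limit=0` yields the default of 25 rather than being clamped to 1.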
### GET /api/ngrams
Returns an array of books, each with an array of its ngrams and their occurrences.
The array of books is ordered by the date they were added (most recent first).
#### Query Parameters
- `limit` (optional): limits the number of results returned. Default is 25, maximum is 100.
- `offset` (optional): offset the list of results. Default is 0.
#### Examples
Any combination of the query parameters, in any order, is valid.
- `/api/ngrams?limit=100`
Returns ngrams for the 100 most recent books.
- `/api/ngrams?offset=1000`
Returns ngrams for 25 books, skipping the 1000 most recent.
### GET /api/{handle}
Returns suggestions for the book with the specified handle.
#### Path Parameters
`{handle}` (required): the handle of the book to retrieve.
#### Query Parameters
`threshold` (optional): sets the minimum similarity score to receive suggestions for. Default is 0, returning all suggestions.
#### Examples
> **NOTE**: The forward slash in a handle does not need to be escaped; it is handled server-side.
- `/api/20.400.12657/47581`
Returns suggestions for [the book](https://library.oapen.org/handle/20.500.12657/37041) with the handle `20.400.12657/47581`.
- `/api/20.400.12657/47581?threshold=3`
Returns suggestions with a similarity score of 3 or more for [the book](https://library.oapen.org/handle/20.500.12657/37041) with the handle `20.400.12657/47581`.
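A client can build these URLs without encoding the slash in the handle. The sketch below uses the standard `URL` class; `buildSuggestionsUrl` is a hypothetical helper name, and the base URL is the local one used elsewhere in this README.

```javascript
// Hypothetical client-side helper, not part of the repository.
// The handle is placed in the path as-is, slash included.
function buildSuggestionsUrl(baseUrl, handle, threshold) {
  const url = new URL(`/api/${handle}`, baseUrl);
  if (threshold !== undefined) {
    url.searchParams.set("threshold", String(threshold));
  }
  return url.toString();
}

console.log(buildSuggestionsUrl("http://localhost:3001", "20.400.12657/47581", 3));
// prints http://localhost:3001/api/20.400.12657/47581?threshold=3
```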
### GET /api/{handle}/ngrams
Returns the ngrams and their occurrences for the book with the specified handle.
#### Path Parameters
`{handle}` (required): the handle of the book to retrieve.
#### Example
`/api/20.400.12657/47581/ngrams`
Returns ngrams and their occurrences for [the book](https://library.oapen.org/handle/20.500.12657/37041) with the handle `20.400.12657/47581`.
## Logging
Log files are automatically generated by Docker for each container. The log files can be found in `/var/lib/docker/containers/<container-id>/*-json.log`.
@ -309,5 +393,5 @@ Configuration info for the web demo is in [`web/README.md`](web/README.md).
POSTGRES_PASSWORD=<Password of the postgres user>
POSTGRES_SSLMODE=<'allow' for a local installation>
```
4. See [Running](#running)

@ -7,21 +7,22 @@ class DatabaseConnectionError extends Error {
}
}
- if (
-   !(
-     process.env.POSTGRES_USERNAME &&
-     process.env.POSTGRES_PASSWORD &&
-     process.env.POSTGRES_HOST &&
-     process.env.POSTGRES_PORT &&
-     process.env.POSTGRES_DB_NAME &&
-     process.env.POSTGRES_SSLMODE
-   )
- )
+ let db;
+ try {
+   const cn = {
+     host: process.env.POSTGRES_HOST,
+     port: process.env.POSTGRES_PORT,
+     database: process.env.POSTGRES_DB_NAME,
+     user: process.env.POSTGRES_USERNAME,
+     password: process.env.POSTGRES_PASSWORD,
+     ssl: process.env.POSTGRES_SSLMODE === "require"
+   };
+   db = pgp(cn);
+ } catch {
    throw new DatabaseConnectionError(
-     "Some Postgres environment variables weren't found. Please configure them in the .env file."
+     "Postgres connection could not be created, please check your .env file."
    );
+ }
- const connection_string = `postgresql://${process.env.POSTGRES_USERNAME}:${process.env.POSTGRES_PASSWORD}@${process.env.POSTGRES_HOST}:${process.env.POSTGRES_PORT}/${process.env.POSTGRES_DB_NAME}?sslmode=${process.env.POSTGRES_SSLMODE}`;
- const db = pgp(connection_string);
- module.exports = db;
+ module.exports = db;
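Review note: this change drops the explicit check for missing environment variables. As far as I know, pg-promise does not connect or validate individual fields when the database object is created, so the new `catch` may not fire for an unset variable; treat that as an assumption to verify. A minimal sketch of keeping the check alongside the connection object follows. `buildConnectionConfig` and `REQUIRED_VARS` are hypothetical names, and converting the port to a number is my own choice.

```javascript
// Hypothetical helper, not part of the commit: validates the environment
// first, then builds the same connection object shape used in this change.
const REQUIRED_VARS = [
  "POSTGRES_HOST",
  "POSTGRES_PORT",
  "POSTGRES_DB_NAME",
  "POSTGRES_USERNAME",
  "POSTGRES_PASSWORD",
  "POSTGRES_SSLMODE",
];

function buildConnectionConfig(env) {
  // Collect every unset or empty variable so the error names all of them.
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(
      `Missing Postgres environment variables: ${missing.join(", ")}`
    );
  }
  return {
    host: env.POSTGRES_HOST,
    port: Number(env.POSTGRES_PORT),
    database: env.POSTGRES_DB_NAME,
    user: env.POSTGRES_USERNAME,
    password: env.POSTGRES_PASSWORD,
    // SSL is enabled only for the exact value "require", as in the commit.
    ssl: env.POSTGRES_SSLMODE === "require",
  };
}
```

The result could then be passed to `pgp(...)` in place of the inline `cn` object, for example `pgp(buildConnectionConfig(process.env))`.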

@ -18,16 +18,15 @@ async function querySuggestions(handle, threshold = 0) {
return { error: { name: error.name, message: error.message } };
});
- if (result?.["error"])
-   return result;
+ if (result?.["error"]) return result;
console.log(result);
const data = {
-   "handle": handle,
-   "suggestions": result
+   handle: handle,
+   suggestions: result,
};
return data;
}
@ -35,18 +34,83 @@ async function queryNgrams(handle) {
await validate.checkHandle(handle);
const query = new PQ({
- text: "SELECT * FROM oapen_suggestions.ngrams WHERE handle = $1",
+ text: `SELECT handle, "name", thumbnail, created_at, updated_at,
+   array_agg(
+     JSON_BUILD_OBJECT(
+       'ngram', ngram.ngram,
+       'count', ngram.count
+     )
+   ) as ngrams
+   FROM oapen_suggestions.ngrams, UNNEST(ngrams) as ngram
+   WHERE handle = $1
+   GROUP BY handle;`,
values: [handle],
});
return db.one(query).catch((error) => {
return { error: { name: error.name, message: error.message } };
});
}
// return await db.any(query);
async function queryManySuggestions(
threshold = 0,
limit = validate.DEFAULT_ITEM_LIMIT,
offset = 0
) {
if (threshold < 0) threshold = 0;
if (limit > validate.MAX_ITEM_LIMIT) {
limit = validate.MAX_ITEM_LIMIT;
} else if (limit < 1) {
limit = 1;
}
if (offset < 0) offset = 0;
const query = new PQ({
text: `SELECT suggestion AS handle, score
FROM oapen_suggestions.suggestions
WHERE score >= $1
ORDER BY created_at DESC
LIMIT $2 OFFSET $3;`,
values: [threshold, limit, offset],
});
return db.query(query).catch((error) => {
return { error: { name: error.name, message: error.message } };
});
}
async function queryManyNgrams(limit = validate.DEFAULT_ITEM_LIMIT, offset = 0) {
if (limit > validate.MAX_ITEM_LIMIT) {
limit = validate.MAX_ITEM_LIMIT;
} else if (limit < 1) {
limit = 1;
}
if (offset < 0) offset = 0;
const query = new PQ({
text: `SELECT handle, "name", thumbnail, created_at, updated_at,
array_agg(
JSON_BUILD_OBJECT(
'ngram', ngram.ngram,
'count', ngram.count
)
) as ngrams
FROM oapen_suggestions.ngrams, UNNEST(ngrams) as ngram
GROUP BY handle
ORDER BY created_at
LIMIT $1 OFFSET $2;
`,
values: [limit, offset],
});
return db.query(query).catch((error) => {
return { error: { name: error.name, message: error.message } };
});
}
module.exports = {
querySuggestions,
queryNgrams,
queryManySuggestions,
queryManyNgrams,
};
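To make the "JSONify ngrams response" change concrete, here is a rough JavaScript equivalent of what the `array_agg(JSON_BUILD_OBJECT(...))` expression produces per book. It assumes each stored ngram is a (text, count) pair, which is inferred from the `ngram.ngram` and `ngram.count` references in the query; `toNgramObjects` and the sample data are hypothetical.

```javascript
// Hypothetical illustration, not part of the commit: reshapes raw
// [ngram, count] pairs into the object form the new query returns.
function toNgramObjects(pairs) {
  return pairs.map(([ngram, count]) => ({ ngram, count }));
}

// Made-up sample row; the handle matches the README examples.
const row = {
  handle: "20.400.12657/47581",
  ngrams: [
    ["open access", 12],
    ["peer review", 7],
  ],
};

console.log({ handle: row.handle, ngrams: toNgramObjects(row.ngrams) });
```

Each entry then becomes an object with `ngram` and `count` keys, so API consumers no longer need to parse a positional tuple.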

@ -59,4 +59,67 @@ router.get("/:handle([0-9]+.[0-9]+.[0-9]+/[0-9]+)/ngrams", async (req, res) => {
}
});
router.get("/", async (req, res) => {
try {
let threshold = parseInt(req.query.threshold) || 0;
if (threshold < 0) threshold = 0;
let limit = parseInt(req.query.limit) || validate.DEFAULT_ITEM_LIMIT;
let offset = parseInt(req.query.offset) || 0;
if (limit > validate.MAX_ITEM_LIMIT) {
limit = validate.MAX_ITEM_LIMIT;
} else if (limit < 1) {
limit = 1;
}
if (offset < 0) offset = 0;
let responseData = await data.queryManySuggestions(threshold, limit, offset);
if (
responseData.error &&
responseData.error.name === pgp.errors.QueryResultError.name
) {
return res.status(404).json({ error: responseData.error.message });
} else if (responseData.error) {
return res.status(500).json(responseData);
}
res.header("Access-Control-Allow-Origin", "*");
return res.status(200).json(responseData);
} catch (e) {
return res.status(500).json({ error: "Internal server error" });
}
})
router.get("/ngrams", async (req, res) => {
try {
let limit = parseInt(req.query.limit) || validate.DEFAULT_ITEM_LIMIT;
let offset = parseInt(req.query.offset) || 0;
if (limit > validate.MAX_ITEM_LIMIT) {
limit = validate.MAX_ITEM_LIMIT;
} else if (limit < 1) {
limit = 1;
}
if (offset < 0) offset = 0;
let responseData = await data.queryManyNgrams(limit, offset);
if (
responseData.error &&
responseData.error.name === pgp.errors.QueryResultError.name
) {
return res.status(404).json({ error: responseData.error.message });
} else if (responseData.error) {
return res.status(500).json(responseData);
}
res.header("Access-Control-Allow-Origin", "*");
return res.status(200).json(responseData);
} catch (e) {
return res.status(500).json({ error: "Internal server error" });
}
})
module.exports = router;
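Both new handlers repeat the same mapping from query result to HTTP status. A minimal sketch of that mapping as a pure function is shown below; `toHttpResponse` is a hypothetical name, and the string `"QueryResultError"` is assumed to equal `pgp.errors.QueryResultError.name`.

```javascript
// Hypothetical refactoring sketch, not part of the commit.
// Assumption: pgp.errors.QueryResultError.name === "QueryResultError".
const QUERY_RESULT_ERROR = "QueryResultError";

function toHttpResponse(responseData) {
  if (responseData && responseData.error) {
    if (responseData.error.name === QUERY_RESULT_ERROR) {
      // The query ran but returned an unexpected number of rows.
      return { status: 404, body: { error: responseData.error.message } };
    }
    // Any other database error is reported as a server error.
    return { status: 500, body: responseData };
  }
  return { status: 200, body: responseData };
}

console.log(toHttpResponse([{ handle: "20.400.12657/47581", score: 4 }]).status);
// prints 200
```

In a handler, the result could be applied with `res.status(r.status).json(r.body)`, which keeps the two routes from drifting apart.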

@ -7,6 +7,9 @@ class UserError extends Error {
}
}
const MAX_ITEM_LIMIT = 100;
const DEFAULT_ITEM_LIMIT = 25;
// RegEx to match formatting of handle
const handleRegExpression = new RegExp("([0-9]+.[0-9]+.[0-9]+/[0-9]+)");
@ -24,4 +27,6 @@ let checkHandle = async (handle) => {
module.exports = {
checkHandle,
MAX_ITEM_LIMIT,
DEFAULT_ITEM_LIMIT
};
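Review note on the handle pattern in this file: the dots are unescaped and the expression is unanchored, so it accepts more than the `digits.digits.digits/digits` shape. The comparison below is a sketch; `strictHandle` is a suggested alternative, not what the commit uses.

```javascript
// The pattern as written in the validation module.
const looseHandle = new RegExp("([0-9]+.[0-9]+.[0-9]+/[0-9]+)");

// Hypothetical stricter variant: literal dots, anchored at both ends.
const strictHandle = /^[0-9]+\.[0-9]+\.[0-9]+\/[0-9]+$/;

console.log(looseHandle.test("20x400y12657/47581")); // prints true
console.log(strictHandle.test("20x400y12657/47581")); // prints false
console.log(strictHandle.test("20.400.12657/47581")); // prints true
```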

@ -1,3 +1,3 @@
- #!/bin/sh
+ #!/bin/bash
python src/tasks/refresh_items.py

@ -0,0 +1,3 @@
#!/bin/bash
python src/test/data/run_tests.py