In `copyrightEntry` XML entries, date elements of the same type are frequently repeated. The repeated values can differ and be more or less complete. The most common example is an entry with two `pub_date` values, `1932` and `1932-06-15`.
Previously this process simply took the first date. This updates the date parser to sort all dates of the same type by length, so that the most complete dates are parsed first. If they cannot be parsed, the other, less complete dates are used.
This was prompted by the presence of false January 1st dates and also the discovery that publication dates are often used as replacement registration dates in renewals.
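The length-based selection described above can be sketched roughly as follows. This is a minimal illustration, not the project's parser: the function names and the list of accepted formats are assumptions. It also shows one way a false January 1st date can arise, since parsing a year-only value with `strptime` fills in January 1st.

```python
from datetime import date, datetime

# Hypothetical list of accepted formats, most complete first
DATE_FORMATS = ("%Y-%m-%d", "%Y-%m", "%Y")


def parse_date(value):
    """Return a date for the first format that matches, or None if none match."""
    for fmt in DATE_FORMATS:
        try:
            # Note: a year-only value such as "1932" parses to 1932-01-01
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    return None


def pick_best_date(values):
    """Try candidates from longest to shortest; return the first that parses.

    Returns a (raw_value, parsed_date) pair, or None if nothing parses.
    """
    candidates = sorted((v.strip() for v in values), key=len, reverse=True)
    for candidate in candidates:
        parsed = parse_date(candidate)
        if parsed is not None:
            return candidate, parsed
    return None
```

With the two `pub_date` values from the example, `pick_best_date(["1932", "1932-06-15"])` prefers `1932-06-15` and only falls back to `1932` if the longer value cannot be parsed.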
To help researchers identify the source for copyright entries the following fields (already stored in the database) are added to the API response:
- volume number
- group
- matter
- number (for 3rd series entries only)
The API checks to see if `3` appears in the `series` field and if so adds the `number` field.
This also adds these fields to the object definition in the Swagger documentation.
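As a rough sketch of the conditional `number` field, assuming the stored record is available as a dictionary: the function name and the key names below are hypothetical and may not match the actual database columns or response keys.

```python
def source_fields(record):
    """Build the source-identifying part of an API response from a stored record."""
    source = {
        "volume": record.get("volume"),
        "group": record.get("group"),
        "matter": record.get("matter"),
    }
    # `number` is only included when `3` appears in the `series` field
    if "3" in str(record.get("series") or ""):
        source["number"] = record.get("number")
    return source
```

Leaving the key out entirely for other series, rather than returning an empty value, keeps the response from suggesting that a number exists for those entries.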
This includes a fix to standardize the output of dates to ISO 8601, replacing the current behavior of displaying them as entered in the copyright volumes.
This also includes several code formatting fixes and updates to the `search` and `uuid` endpoints to ensure that the proper status code is returned with non-200 responses.
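The two response-level changes can be illustrated with a small sketch. The helper names and the payload keys are assumptions, not the project's code. A Flask view can return a `(body, status)` pair like the one built here, which keeps the HTTP status code in line with the error reported in the body.

```python
from datetime import date
from http import HTTPStatus


def iso_date(value):
    """Serialize a parsed date as ISO 8601 (YYYY-MM-DD); pass None through."""
    return value.isoformat() if value is not None else None


def error_payload(status_code, message):
    """Return a (body, status_code) pair for a non-200 response."""
    status = HTTPStatus(status_code)  # raises ValueError for unknown codes
    body = {
        "status": status.value,
        "error": status.phrase,
        "message": message,
    }
    return body, status.value
```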
The important tables `xml` and `registration` did not have their `CASCADE` behavior set properly. In addition, `XML` needed to have the `single_parent` option enabled to allow cascading deletes (since otherwise a single `XML` entry could be referenced by both an entry and an error).
This adds `.ebextensions` options to the repository that control how the Elastic Beanstalk environment is configured. The two files perform different tasks:
- `sfr-bardo-copyright-development.config` is an empty file for environment variables (empty because at present ENV variables contain secrets that cannot be committed to source control)
- `cron-linux.config` contains configuration details for a nightly cron task that checks for updates from the source git repositories
The Elastic Beanstalk application looks for an object named `application` to run with `WSGI`. Previously this object was created with the `create_app` method and used `app` as its name.
Removes a Swagger YAML file that is not currently used. It was too difficult to maintain Swagger in a set of separate YAML files, so a single one was created and loaded in the main Flask app.
Add a basic `Flask` API that responds to queries for copyright data. This includes 5 basic endpoints:
- `/search/fulltext`: queries all text fields in the Registration and Renewal records
- `/search/registration/<regnum>`: queries the collection for a specific copyright registration by registration number
- `/search/renewal/<rennum>`: queries the collection for a specific copyright renewal by renewal number
- `/registration/<uuid>`: fetches a single registration record by internally assigned UUID
- `/renewal/<uuid>`: fetches a single renewal record by internally assigned UUID
The API can be run with the standard `python -m flask run` from the root of the project and by default will run in `production` mode. To set `development` mode, run `export FLASK_ENV=development` before starting the application.
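As a usage sketch from the client side, the path-based endpoints can be addressed as below. The base URL assumes a local Flask development server on its default port, and the helper names are hypothetical.

```python
from urllib.parse import quote

# Assumption: a local Flask development server on its default port
BASE_URL = "http://localhost:5000"


def _path(*segments):
    """Join path segments onto BASE_URL, percent-encoding each one."""
    return BASE_URL + "/" + "/".join(quote(str(s), safe="") for s in segments)


def search_registration_url(regnum):
    """URL for looking up a registration by registration number."""
    return _path("search", "registration", regnum)


def search_renewal_url(rennum):
    """URL for looking up a renewal by renewal number."""
    return _path("search", "renewal", rennum)


def registration_url(uuid):
    """URL for fetching a single registration by its internal UUID."""
    return _path("registration", uuid)
```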
This adds a full `Claimant` object to the Elasticsearch index, including the `claimant_type` field, which helps users see the specific relationship a claimant has to a renewal. It would be good to provide translations of these codes in the future, but this is not currently necessary.
Includes an initial version of the utility script used to generate the copyright entry/renewal database, along with instructions on how to run the script and create a version of the database locally.