regluit
=======
A 'monolithic' alternative to [unglu](http://github.com/gluejar/unglu)
for the unglue.it website. regluit is essentially a Django project that
contains three applications (`frontend`, `api` and `core`) that can be deployed
and configured on as many EC2 instances as are needed to support traffic.
The key difference from [unglu](http://github.com/gluejar/unglu) is that the
`frontend` app can access database models from `core` in the same
way that the `api` can, which should simplify some things.

Develop
-------
Here are some instructions for setting up regluit for development on
an Ubuntu system. If you are on OS X, see the notes below
for installing python-setuptools in step 1:
1. `aptitude install python-setuptools git python-lxml`
1. `sudo easy_install virtualenv virtualenvwrapper`
1. `git clone git@github.com:Gluejar/regluit.git`
1. `cd regluit`
1. `mkvirtualenv regluit`
1. `pip install -r requirements_versioned.pip`
1. `add2virtualenv ..`
1. `cp settings/dev.py settings/me.py`
1. edit `settings/me.py` and set `EMAIL_HOST_USER` and `EMAIL_HOST_PASSWORD` to your Gmail username and password, if you want to check that registration emails work properly.
1. edit `settings/me.py` and look at the Facebook, Twitter and Google auth settings to enable federated logins from those sites
1. `echo 'export DJANGO_SETTINGS_MODULE=regluit.settings.me' >> ~/.virtualenvs/regluit/bin/postactivate`
1. `deactivate ; workon regluit`
1. `django-admin.py syncdb --migrate --noinput`
1. `django-admin.py celeryd --loglevel=INFO` to start the celery daemon, which performs asynchronous tasks like adding related editions and displays logging information in the foreground.
1. `django-admin.py celerybeat -l INFO` to start the celerybeat daemon to handle scheduled tasks.
1. `django-admin.py runserver 0.0.0.0:8000` (you can change the port number from the default value of 8000)
1. point your browser at http://localhost:8000/
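
As an illustration of the `settings/me.py` email step above, here is a minimal sketch of the Gmail-related Django settings. It assumes Gmail's standard SMTP endpoint (`smtp.gmail.com`, port 587 with TLS). The helper name `gmail_email_settings` and the environment variables `REGLUIT_EMAIL_USER` and `REGLUIT_EMAIL_PASSWORD` are hypothetical and not part of regluit, and `settings/dev.py` may already define some of these values.

```python
import os


def gmail_email_settings(env=None):
    """Build Django email settings for Gmail SMTP from environment variables.

    REGLUIT_EMAIL_USER and REGLUIT_EMAIL_PASSWORD are hypothetical names
    chosen for this sketch; regluit itself does not define them.
    """
    env = os.environ if env is None else env
    return {
        "EMAIL_HOST": "smtp.gmail.com",  # Gmail's SMTP server
        "EMAIL_PORT": 587,               # submission port used with STARTTLS
        "EMAIL_USE_TLS": True,
        "EMAIL_HOST_USER": env.get("REGLUIT_EMAIL_USER", ""),
        "EMAIL_HOST_PASSWORD": env.get("REGLUIT_EMAIL_PASSWORD", ""),
    }
```

In `settings/me.py`, the returned values could be applied with `globals().update(gmail_email_settings())` after the settings copied from `dev.py`. Reading the password from the environment keeps it out of the repository.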

CSS development
---------------
1. We are using Less version 2.8 for CSS (http://incident57.com/less/). We use minified CSS.

Production Deployment
---------------------
Below are the steps for getting regluit running on EC2 with Apache and mod_wsgi, and talking to an Amazon Relational Database Service (RDS) instance.
Instructions for setting please are slightly different.
1. create an Ubuntu EC2 instance (e.g., go to http://alestic.com/ to find various Ubuntu images)
1. `sudo aptitude update`
1. `sudo aptitude upgrade`
1. `sudo aptitude install git-core apache2 libapache2-mod-wsgi mysql-client python-virtualenv python-mysqldb redis-server python-lxml postfix python-dev`
1. `sudo mkdir /opt/regluit`
1. `sudo chown ubuntu:ubuntu /opt/regluit`
1. `cd /opt`
1. `git config --global user.name "Raymond Yee"`
1. `git config --global user.email "rdhyee@gluejar.com"`
1. `ssh-keygen`
1. add `~/.ssh/id_rsa.pub` as a deploy key on GitHub: https://github.com/Gluejar/regluit/admin/keys
1. `git clone git@github.com:Gluejar/regluit.git`
1. `cd /opt/regluit`
1. create an Amazon RDS instance
1. connect to it, e.g. `mysql -u root -h gluejardb.cboagmr25pjs.us-east-1.rds.amazonaws.com -p`
1. `CREATE DATABASE unglueit CHARSET utf8;`
1. `GRANT ALL ON unglueit.* TO unglueit@ip-10-244-250-168.ec2.internal IDENTIFIED BY 'unglueit' REQUIRE SSL;`
1. update settings/prod.py with database credentials
1. `virtualenv ENV`
1. `source ENV/bin/activate`
1. `pip install -r requirements_versioned.pip`
1. `echo "/opt/" > ENV/lib/python2.7/site-packages/regluit.pth`
1. `django-admin.py syncdb --migrate --settings regluit.settings.prod`
1. `sudo mkdir /var/www/static`
1. `sudo chown ubuntu:ubuntu /var/www/static`
1. `django-admin.py collectstatic --settings regluit.settings.prod`
1. `sudo ln -s /opt/regluit/deploy/regluit.conf /etc/apache2/sites-available/regluit`
1. `sudo a2ensite regluit`
1. `sudo a2enmod ssl rewrite`
1. `cd /home/ubuntu`
1. copy SSL server key to `/etc/ssl/private/server.key`
1. copy SSL certificate to `/etc/ssl/certs/server.crt`
1. `sudo /etc/init.d/apache2 restart`
1. `sudo adduser --no-create-home celery --disabled-password --disabled-login` (just enter return for all?)
1. `sudo cp deploy/celeryd /etc/init.d/celeryd`
1. `sudo chmod 755 /etc/init.d/celeryd`
1. `sudo cp deploy/celeryd.conf /etc/default/celeryd`
1. `sudo mkdir /var/log/celery`
1. `sudo mkdir /var/run/celery`
1. `sudo chown celery:celery /var/log/celery /var/run/celery`
1. `sudo /etc/init.d/celeryd start`
1. `sudo cp deploy/celerybeat /etc/init.d/celerybeat`
1. `sudo chmod 755 /etc/init.d/celerybeat`
1. `sudo cp deploy/celerybeat.conf /etc/default/celerybeat`
1. `sudo mkdir /var/log/celerybeat`
1. `sudo chown celery:celery /var/log/celerybeat`
1. `sudo /etc/init.d/celerybeat start`
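
The `regluit.pth` step above relies on Python's standard `.pth` mechanism: each existing directory listed in a `.pth` file inside `site-packages` is appended to `sys.path`, which is what lets `import regluit` resolve from `/opt/`. The sketch below reproduces that behaviour with temporary directories so it can be run anywhere. The helper name `add_path_via_pth` is hypothetical, and the temporary directories merely stand in for `ENV/lib/python2.7/site-packages` and `/opt/`.

```python
import os
import site
import sys
import tempfile


def add_path_via_pth(site_dir, target_dir, name="regluit.pth"):
    """Write a .pth file naming target_dir, then let `site` process site_dir."""
    pth_file = os.path.join(site_dir, name)
    with open(pth_file, "w") as handle:
        handle.write(target_dir + "\n")
    # addsitedir() adds site_dir itself to sys.path, reads every *.pth file
    # in it, and appends each listed directory that exists to sys.path.
    site.addsitedir(site_dir)
    return pth_file


if __name__ == "__main__":
    fake_site_packages = tempfile.mkdtemp()  # stand-in for site-packages
    fake_opt = tempfile.mkdtemp()            # stand-in for /opt/
    add_path_via_pth(fake_site_packages, fake_opt)
    print(fake_opt in sys.path)
```

In the real deployment the interpreter processes `site-packages` at startup, so no explicit `addsitedir()` call is needed; it is only used here to trigger the same processing on demand.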
## setup to enable ckeditor to work properly
1. `mkdir /var/www/static/media/`
1. `sudo chown ubuntu:www-data /var/www/static/media/`

Updating Production
--------------------
1. Study the latest changes in the master branch, keeping in mind how
it has [changed from what's in production](https://github.com/Gluejar/regluit/compare/production...master).
1. Update the production branch accordingly. If everything in `master` is ready to be moved into `production`, you can just merge `master` into `production`. Otherwise, you can grab specific parts. (How to do so should probably be described in greater detail.)
1. Log in to unglue.it and run [`/opt/regluit/deploy/update-prod`](https://github.com/Gluejar/regluit/blob/master/deploy/update-prod)

OS X Developer Notes
--------------------
To run regluit on OS X you should have Xcode installed.

Install virtualenvwrapper according
to the process at http://blog.praveengollakota.com/47430655:

1. `sudo easy_install pip`
1. `sudo pip install virtualenv`
1. `pip install virtualenvwrapper`

Edit or create `.bashrc` in `~` to enable virtualenvwrapper commands:

1. `mkdir ~/.virtualenvs`
1. Edit `.bashrc` to include the following lines:

       export WORKON_HOME=$HOME/.virtualenvs
       source your_path_to_virtualenvwrapper.sh_here

In the above web site, the path to virtualenvwrapper.sh was
`/Library/Frameworks/Python.framework/Versions/2.7/bin/virtualenvwrapper.sh`.
In Snow Leopard, this may be `/usr/local/bin/virtualenvwrapper.sh`.

Configure Terminal to automatically notice this at startup:

1. Terminal > Preferences > Settings > Shell
1. Click "run command"; add `source ~/.bashrc`

If you get `EnvironmentError: mysql_config not found`,
edit the following line in `~/.virtualenvs/regluit/build/MySQL-python/setup_posix.py`:

    mysql_config.path = "mysql_config"

to be (using a path that exists on your system):

    mysql_config.path = "/usr/local/mysql-5.5.20-osx10.6-x86_64/bin/mysql_config"
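
To find a `mysql_config` path that actually exists on your system before editing `setup_posix.py`, a small lookup like the following may help. This is only a sketch: the function name `find_mysql_config` and its `extra_dirs` parameter are hypothetical, and the directories you pass in depend on where MySQL is installed on your machine.

```python
import shutil


def find_mysql_config(extra_dirs=()):
    """Return the full path to mysql_config, or None if it is not found.

    PATH is searched first, then each directory in extra_dirs in order.
    """
    found = shutil.which("mysql_config")
    if found:
        return found
    for directory in extra_dirs:
        candidate = shutil.which("mysql_config", path=directory)
        if candidate:
            return candidate
    return None
```

For example, `find_mysql_config(["/usr/local/mysql/bin"])` returns a path you could paste into `mysql_config.path`, or `None` if nothing is found; the directory shown is only an example location.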

You may need to set utf8 in `/etc/my.cnf`:

    collation-server = utf8_unicode_ci
    init-connect='SET NAMES utf8'
    character-set-server = utf8

Selenium Install
----------------
Download the selenium server:
http://selenium.googlecode.com/files/selenium-server-standalone-2.5.0.jar

Start the selenium server:
`java -jar selenium-server-standalone-2.5.0.jar`
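
Once the server has been started, a quick way to confirm that it is accepting connections is a plain TCP check. The sketch below assumes the standalone server is listening on port 4444, its usual default; the text above does not state the port, so adjust it if you started the server differently. The function name `server_is_listening` is hypothetical.

```python
import socket


def server_is_listening(host="localhost", port=4444, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False
```

This only shows that a process is listening on the port; it does not verify that the process is Selenium.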

MARC Records
------------
### For unglued books with existing print edition MARC records
1. Get the MARCXML record for the print edition from the Library of Congress.
    1. Find the book in [catalog.loc.gov](http://catalog.loc.gov/)
    1. Click on the permalink in its record (will look something like [lccn.loc.gov/2009009516](http://lccn.loc.gov/2009009516))
    1. Download MARCXML
1. At /marc/ungluify/, enter the _unglued edition_ in the Edition field, upload file, choose license
1. The XML record will be automatically...
    * converted to suitable MARCXML and .mrc records, with both direct and via-unglue.it download links
    * written to S3
    * added to a new instance of MARCRecord
    * provided to ungluers at /marc/
### For CC/PD books with existing records that link to the ebook edition
1. Use /admin to create a new MARC record instance
1. Upload the MARC records to s3 (or wherever)
1. Add the URLs of the .xml and/or .mrc record(s) to the appropriate field(s)
1. Select the relevant edition
1. Select an appropriate marc_format:
    * use DIRECT if it links directly to the ebook file
    * use UNGLUE if it links to the unglue.it download page
    * if you have records with both DIRECT and UNGLUE links, put them in _separate_ records: you'll need two MARCRecord instances, as marc_format can only take one value

`ungluify_record.py` should only be used to modify records of print editions of unglued ebooks. It will not produce appropriate results for CC/PD ebooks.
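
The marc_format rule above (one value per MARCRecord instance) can be illustrated by grouping record links by format before creating instances. This is a hypothetical planning helper, not regluit code: `group_links_by_format` and the `(marc_format, url)` input shape are assumptions, and only the two format names mentioned in this README are accepted.

```python
from collections import defaultdict

# The two marc_format values named in this README.
MARC_FORMATS = ("DIRECT", "UNGLUE")


def group_links_by_format(links):
    """Group (marc_format, url) pairs so each format gets its own record.

    The number of keys in the result is the number of MARCRecord
    instances that would be needed.
    """
    grouped = defaultdict(list)
    for marc_format, url in links:
        if marc_format not in MARC_FORMATS:
            raise ValueError("unknown marc_format: %r" % (marc_format,))
        grouped[marc_format].append(url)
    return dict(grouped)
```

If the result has both a `DIRECT` and an `UNGLUE` key, that corresponds to the two-instance case described in the list above.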
### For unglued ebooks without print edition MARC records, or CC/PD books without ebook MARC records
1. Get a contract cataloger to produce quality records (.xml and .mrc formats)
    * we are using ung[x] as the format for our accession numbers, where [x] is the id of the MARCRecord instance, plus leading zeroes
1. Upload those records to S3 (or wherever)
1. Create a MARCRecord instance in /admin
1. Add the URLs of the .xml and .mrc records to the appropriate fields
1. Select the relevant edition
1. Select an appropriate marc_format:
    * use DIRECT if it links directly to the ebook file
    * use UNGLUE if it links to the unglue.it download page
    * if you have records with both DIRECT and UNGLUE links, put them in _separate_ records: you'll need two MARCRecord instances, as marc_format can only take one value
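
The ung[x] accession number format mentioned above can be sketched as a small helper. The README does not say how many digits the zero-padded id should have, so the `width=8` default below is purely an assumption, and the function name `accession_number` is hypothetical.

```python
def accession_number(record_id, width=8):
    """Format a MARCRecord id as 'ung' followed by a zero-padded id.

    width is an assumed padding length; the README only says
    'plus leading zeroes' without giving a digit count.
    """
    if record_id < 0:
        raise ValueError("record_id must be non-negative")
    return "ung" + str(record_id).zfill(width)
```

With the assumed default width, `accession_number(42)` gives `ung00000042`; pass a different `width` to match whatever length your cataloger uses.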