regluit/README.md

regluit
=======

A 'monolithic' alternative to [unglu](http://github.com/gluejar/unglu)
for the unglue.it website. regluit is essentially a Django project that
contains three applications: `frontend`, `api` and `core` that can be deployed
and configured on as many ec2 instances that are needed to support traffic.
The key difference with [unglu](http://github.com/gluejar/unglu) is that the
`frontend` app is able to access database models from `core` in the same
way that the `api` is able to...which hopefully should simplify some things.

Develop
-------

Here are some instructions for setting up regluit for development on
an Ubuntu system. If you are on OS X see notes below
to install python-setuptools in step 1:

1. `aptitude install python-setuptools git python-lxml`
1. `sudo easy_install virtualenv virtualenvwrapper`
1. `git clone git@github.com:Gluejar/regluit.git`
1. `cd regluit`
1. `mkvirtualenv regluit`
1. `pip install -r requirements_versioned.pip`
1. `add2virtualenv ..`
1. `cp settings/dev.py settings/me.py`
1. edit `settings/me.py` and set `EMAIL_HOST_USER` and `EMAIL_HOST_PASSWORD`  to your gmail username and password, if you want to see that registration emails will work properly.
1. edit `settings/me.py` and look at the facebook, twitter and google auth settings to enable federated logins from those sites
1. `echo 'export DJANGO_SETTINGS_MODULE=regluit.settings.me' >> ~/.virtualenvs/regluit/bin/postactivate`
1. `deactivate ; workon regluit`
1. `django-admin.py syncdb --migrate --noinput`
1. `django-admin.py celeryd --loglevel=INFO` start the celery daemon to perform asynchronous tasks like adding related editions, and display logging information in the foreground.`
1. `django-admin.py celerybeat -l INFO` to start the celerybeat daemon to handle scheduled tasks.
1. `django-admin.py runserver 0.0.0.0:8000` (you can change the port number from the default value of 8000)
1. point your browser at http://localhost:8000/

CSS development

1. We are using Less version 2.8 for CSS. http://incident57.com/less/. We use minified CSS.

Production Deployment
---------------------

Below are the steps for getting regluit running on EC2 with Apache and mod_wsgi, and talking to an Amazon Relational Data Store instance.
Instructions for setting please are slightly different.

1. create an ubuntu ec2 instance (e.g, go http://alestic.com/ to find various ubuntu images)
1. `sudo aptitude update`
1. `sudo aptitude upgrade`
1. `sudo aptitude install git-core apache libapache2-mod-wsgi mysql-client python-virtualenv python-mysqldb redis-server python-lxml postfix python-dev libmysqlclient-dev`
1. `sudo mkdir /opt/regluit`
1. `sudo chown ubuntu:ubuntu /opt/regluit`
1. `cd /opt`
1. `git config --global user.name "Raymond Yee"`
1. `git config --global user.email "rdhyee@gluejar.com"`
1. `ssh-keygen`
1. add `~/.ssh/id\_rsa.pub` as a deploy key on github https://github.com/Gluejar/regluit/admin/keys
1. `git clone git@github.com:Gluejar/regluit.git`
1. `cd /opt/regluit`
1. create an Amazon RDS instance
1. connect to it, e.g. `mysql -u root -h gluejardb.cboagmr25pjs.us-east-1.rds.amazonaws.com -p`
1. `CREATE DATABASE unglueit CHARSET utf8;`
1. `GRANT ALL ON unglueit.\* TO ‘unglueit’@’ip-10-244-250-168.ec2.internal’ IDENTIFIED BY 'unglueit' REQUIRE SSL;`
1. update settings/prod.py with database credentials
1. `virtualenv ENV`
1. `source ENV/bin/activate`
1. `pip install -r requirements_versioned.pip`
1. `echo "/opt/" > ENV/lib/python2.7/site-packages/regluit.pth`
1. `django-admin.py syncdb --migrate --settings regluit.settings.prod`
1. `sudo mkdir /var/www/static`
1. `sudo chown ubuntu:ubuntu /var/www/static`
1. `django-admin.py collectstatic --settings regluit.settings.prod`
1. `sudo ln -s /opt/regluit/deploy/regluit.conf /etc/apache2/sites-available/regluit`
1. `sudo a2ensite regluit`
1. `sudo a2enmod ssl rewrite`
1. `cd /home/ubuntu`
1. copy SSL server key to `/etc/ssl/private/server.key`
1. copy SSL certificate to `/etc/ssl/certs/server.crt`
1. `sudo /etc/init.d/apache2 restart`
1. `sudo adduser --no-create-home celery --disabled-password --disabled-login` (just enter return for all?)
1. `sudo cp deploy/celeryd /etc/init.d/celeryd`
1. `sudo chmod 755 /etc/init.d/celeryd`
1. `sudo cp deploy/celeryd.conf /etc/default/celeryd`
1. `sudo mkdir /var/log/celery`
1. `sudo mkdir /var/run/celery`
1. `sudo chown celery:celery /var/log/celery /var/run/celery`
1. `sudo /etc/init.d/celeryd start`
1. `sudo cp deploy/celerybeat /etc/init.d/celerybeat`
1. `sudo chmod 755 /etc/init.d/celerybeat`
1. `sudo cp deploy/celerybeat.conf /etc/default/celerybeat`
1. `sudo mkdir /var/log/celerybeat`
1. `sudo chown celery:celery /var/log/celerybeat`
1. `sudo /etc/init.d/celerybeat start`

## setup to enable ckeditor to work properly

1. `mkdir /var/www/static/media/`
1. `sudo chown ubuntu:www-data /var/www/static/media/`


Updating Production
--------------------

1. Study the latest changes in the master branch, especially keep in mind how
it has [changed from what's in production](https://github.com/Gluejar/regluit/compare/production...master).
1. Update the production branch accordingly.  If everything in `master` is ready to be moved into `production`, you can just merge `master` into `production`. Otherwise, you can grab specific parts.  (How to do so is something that should probably be described in greater detail.)
1. Login to unglue.it and run [`/opt/regluit/deploy/update-prod`](https://github.com/Gluejar/regluit/blob/master/deploy/update-prod)


OS X Developer Notes
-------------------

To run regluit on OS X you should have XCode installed

Install virtualenvwrapper according
to the process at http://blog.praveengollakota.com/47430655:

1. `sudo easy\_install pip`
1. `sudo pip install virtualenv`
1. `pip install virtualenvwrapper`

Edit or create .bashrc in ~ to enable virtualenvwrapper commands:
1. `mkdir ~/.virtualenvs`
1. Edit .bashrc to include the following lines:

    export WORKON_HOME=$HOME/.virtualenvs
    source your_path_to_virtualenvwrapper.sh_here

In the above web site, the path to virtualenvwrapper.sh was
/Library/Frameworks/Python.framework/Versions/2.7/bin/virtualenvwrapper.sh
In Snow Leopard, this may be /usr/local/bin/virtualenvwrapper.sh

Configure Terminal to automatically notice this at startup:
Terminal –> Preferences –> Settings –> Shell
Click "run command"; add `source ~/.bashrc`

If you get 'EnvironmentError: mysql_config not found'
edit the line ~/.virtualenvs/regluit/build/MySQL-python/setup_posix.py
1. mysql_config.path = "mysql_config"
to be (using a path that exists on your system)
1. mysql_config.path = "/usr/local/mysql-5.5.20-osx10.6-x86_64/bin/mysql_config"

You may need to set utf8 in /etc/my.cnf
collation-server = utf8_unicode_ci

    init-connect='SET NAMES utf8'
    character-set-server = utf8

Selenium Install
---------------

Download the selenium server:
http://selenium.googlecode.com/files/selenium-server-standalone-2.5.0.jar

Start the selenium server:
'java -jar selenium-server-standalone-2.5.0.jar'

MARC Records
------------

### For unglued books with existing print edition MARC records
1. Get the MARCXML record for the print edition from the Library of Congress.
    1. Find the book in [catalog.loc.gov](http://catalog.loc.gov/)
    1. Click on the permalink in its record (will look something like [lccn.loc.gov/2009009516](http://lccn.loc.gov/2009009516))
    1. Download MARCXML
1. At /marc/ungluify/ , enter the _unglued edition_ in the Edition field, upload file, choose license
1. The XML record will be automatically...
    * converted to suitable MARCXML and .mrc records, with both direct and via-unglue.it download links
    * written to S3
    * added to a new instance of MARCRecord
    * provided to ungluers at /marc/

### For CC/PD books with existing records that link to the ebook edition
1. Use /admin to create a new MARC record instance
1. Upload the MARC records to s3 (or wherever)
1. Add the URLs of the .xml and/or .mrc record(s) to the appropriate field(s)
1. Select the relevant edition
1. Select an appropriate marc_format:
    * use DIRECT if it links directly to the ebook file
    * use UNGLUE if it links to the unglue.it download page
    * if you have records with both DIRECT and UNGLUE links, you'll need two MARCRecord instances
    * if you have both kinds of link, put them in _separate_ records, as marc_format can only take one value
`ungluify_record.py` should only be used to modify records of print editions of unglued ebooks.  It will not produce appropriate results for CC/PD ebooks.

### For unglued ebooks without print edition MARC records, or CC/PD books without ebook MARC records
1. Get a contract cataloger to produce quality records (.xml and .mrc formats)
    * we are using ung[x] as the format for our accession numbers, where [x] is the id of the MARCRecord instance, plus leading zeroes
1. Upload those records to s3 (or wherever)
1. Create a MARCRecord instance in /admin
1. Add the URLs of the .xml and .mrc records to the appropriate fields
1. Select the relevant edition
1. Select an appropriate marc_format:
    * use DIRECT if it links directly to the ebook file
    * use UNGLUE if it links to the unglue.it download page
    * if you have records with both DIRECT and UNGLUE links, you'll need two MARCRecord instances
    * if you have both kinds of link, put them in _separate_ records, as marc_format can only take one value


# vagrant / ansible

[How to build machines using Vagrant/ansible](docs/vagrant_ansible.md)