webapp for unglue.it

eric 0600078c27 empty the repair migration		2016-07-29 18:49:08 -04:00
api	context_instance is deprecated	2016-07-27 14:49:10 -04:00
bisac	patterns in urlpatterns are deprecated	2016-07-27 13:02:47 -04:00
bookdata	code can now load description, subjects and covers for the pdf files	2014-07-24 16:29:28 -07:00
booxtream	migrations	2016-07-21 15:38:09 -04:00
core	refactor libraryauth	2016-07-28 15:28:05 -04:00
deploy	forgot to fix update-* scripts for new celery configuration	2016-06-28 12:05:38 -07:00
distro	migrations	2016-07-21 15:38:09 -04:00
docs	implementation of read-only api for Work, Edition, Subject, Campaign, Author	2011-09-12 14:50:29 -07:00
experimental	switch to contrib_comments	2016-07-21 16:05:57 -04:00
frontend	refactor libraryauth	2016-07-28 15:28:05 -04:00
libraryauth	empty the repair migration	2016-07-29 18:49:08 -04:00
logs	need this log directory	2011-09-04 05:40:12 +00:00
marc	patterns in urlpatterns are deprecated	2016-07-27 13:02:47 -04:00
mobi	add checker for mobi	2014-02-05 18:17:26 -05:00
not_maintained	moved to not_maintained	2013-08-19 22:27:25 -04:00
notebooks	change the various crontabs	2016-06-24 14:23:03 -07:00
payment	patterns in urlpatterns are deprecated	2016-07-27 13:02:47 -04:00
pyepub	switch from deprecated get_model a app registry	2016-07-24 18:39:36 -04:00
questionnaire	context_instance is deprecated	2016-07-27 14:49:10 -04:00
selenium	Adding explicit waits to selenium payment tests in order to wait for very slow js when running headless on ec2	2012-01-26 19:50:14 +00:00
settings	refactor libraryauth	2016-07-28 15:28:05 -04:00
static	add link to author name	2016-06-06 13:49:22 -04:00
sysadmin	Added rrsets_for_domain to aws.py and some mods to my notebooks	2015-05-04 10:51:11 -07:00
test	switch to contrib_comments	2016-07-21 16:05:57 -04:00
test-data	removed duplicate works until work deduping is working again	2011-10-10 17:25:55 -04:00
utils	allow blank ISBN	2014-08-06 10:59:16 -04:00
vagrant	now use master branch	2016-06-27 09:28:44 -07:00
.gitignore	stop ignoring marc directory	2014-10-17 17:12:24 -04:00
README.md	work in progress to adapt Fabric script for deploying gluejar.com	2015-05-04 10:51:11 -07:00
STAR_unglue_it.ca-bundle	update just.conf to move from just.unglueit.com -> just.unglue.it and for using a different CA	2013-01-07 20:53:43 -05:00
__init__.py	setup api, core and frontend apps, also added initial homepage template from stefan	2011-08-30 23:46:55 -04:00
admin.py	refactor admin	2016-07-26 10:34:45 -04:00
aws_cleanup.ipynb	* making progress on building please.unglue.it	2015-05-04 10:51:12 -07:00
bitnami_launch.ipynb	buid_ec2_instances_for_django.ipynb now has a Fabric script that can build an Ubuntu instance w/ local mysql server installed and all the regluit code downloaded from github	2015-05-04 10:51:09 -07:00
build_ec2_instances_for_django.ipynb	small change in a comment in build_ec2_instances_for_django.ipynb	2015-05-04 10:51:13 -07:00
build_ec2_instances_for_django.py	fixes to allow for lxml to be installed	2015-05-04 10:51:12 -07:00
build_just.ipynb	add build_just.ipynb	2015-05-04 10:51:13 -07:00
context_processors.py	update noftifications	2016-07-22 18:44:54 -04:00
deploy_gluejar_dot_com.ipynb	latest notebook update	2015-05-04 10:51:12 -07:00
deploy_gluejar_dot_com.py	* making progress on building please.unglue.it	2015-05-04 10:51:12 -07:00
fabfile.py	update selenium dependency	2016-04-18 10:17:21 -07:00
manage.py	update manage.py for Django 1.6	2016-04-11 13:17:16 -07:00
requirements_versioned.pip	switch from deprecated get_model a app registry	2016-07-24 18:39:36 -04:00
ssh_fingerprint.ipynb	buid_ec2_instances_for_django.ipynb now has a Fabric script that can build an Ubuntu instance w/ local mysql server installed and all the regluit code downloaded from github	2015-05-04 10:51:09 -07:00
urls.py	refactor libraryauth	2016-07-28 15:28:05 -04:00

README.md

regluit

A 'monolithic' alternative to unglu for the unglue.it website. regluit is essentially a Django project that contains three applications (frontend, api and core) and can be deployed and configured on as many EC2 instances as are needed to support traffic. The key difference from unglu is that the frontend app can access database models from core in the same way that the api can, which should simplify some things.

Develop

Here are some instructions for setting up regluit for development on an Ubuntu system. If you are on OS X see notes below to install python-setuptools in step 1:

aptitude install python-setuptools git python-lxml
sudo easy_install virtualenv virtualenvwrapper
git clone git@github.com:Gluejar/regluit.git
cd regluit
mkvirtualenv regluit
pip install -r requirements_versioned.pip
add2virtualenv ..
cp settings/dev.py settings/me.py
edit settings/me.py and set EMAIL_HOST_USER and EMAIL_HOST_PASSWORD to your gmail username and password, if you want to see that registration emails will work properly.
edit settings/me.py and look at the facebook, twitter and google auth settings to enable federated logins from those sites
echo 'export DJANGO_SETTINGS_MODULE=regluit.settings.me' >> ~/.virtualenvs/regluit/bin/postactivate
deactivate ; workon regluit
django-admin.py syncdb --migrate --noinput
django-admin.py celeryd --loglevel=INFO to start the celery daemon to perform asynchronous tasks like adding related editions, and display logging information in the foreground.
django-admin.py celerybeat -l INFO to start the celerybeat daemon to handle scheduled tasks.
django-admin.py runserver 0.0.0.0:8000 (you can change the port number from the default value of 8000)
point your browser at http://localhost:8000/
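
As a sketch of the settings/me.py edit described above, the email credentials can be read from the environment instead of being typed into the file. The environment variable names REGLUIT_EMAIL_USER and REGLUIT_EMAIL_PASSWORD and the EMAIL_CONFIGURED flag are hypothetical and not something regluit defines; only EMAIL_HOST_USER and EMAIL_HOST_PASSWORD come from the steps above.

```python
# Lines that could be appended to settings/me.py (a sketch, not regluit code).
# REGLUIT_EMAIL_USER and REGLUIT_EMAIL_PASSWORD are hypothetical variable names.
import os

EMAIL_HOST_USER = os.environ.get("REGLUIT_EMAIL_USER", "")
EMAIL_HOST_PASSWORD = os.environ.get("REGLUIT_EMAIL_PASSWORD", "")

# Hypothetical convenience flag: True only when both credentials are present.
EMAIL_CONFIGURED = bool(EMAIL_HOST_USER and EMAIL_HOST_PASSWORD)
if not EMAIL_CONFIGURED:
    print("Email credentials are not set; registration emails may fail to send.")
```

Keeping the password out of the file means settings/me.py can be shared or diffed against settings/dev.py without exposing it.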

CSS development

We are using Less version 2.8 for CSS (http://incident57.com/less/). We use minified CSS.

Production Deployment

Below are the steps for getting regluit running on EC2 with Apache and mod_wsgi, and talking to an Amazon Relational Database Service (RDS) instance. Instructions for setting up please.unglue.it are slightly different.

create an Ubuntu EC2 instance (e.g., go to http://alestic.com/ to find various Ubuntu images)
sudo aptitude update
sudo aptitude upgrade
sudo aptitude install git-core apache2 libapache2-mod-wsgi mysql-client python-virtualenv python-mysqldb redis-server python-lxml postfix python-dev libmysqlclient-dev
sudo mkdir /opt/regluit
sudo chown ubuntu:ubuntu /opt/regluit
cd /opt
git config --global user.name "Raymond Yee"
git config --global user.email "rdhyee@gluejar.com"
ssh-keygen
add ~/.ssh/id_rsa.pub as a deploy key on github https://github.com/Gluejar/regluit/admin/keys
git clone git@github.com:Gluejar/regluit.git
cd /opt/regluit
create an Amazon RDS instance
connect to it, e.g. mysql -u root -h gluejardb.cboagmr25pjs.us-east-1.rds.amazonaws.com -p
CREATE DATABASE unglueit CHARSET utf8;
GRANT ALL ON unglueit.* TO 'unglueit'@'ip-10-244-250-168.ec2.internal' IDENTIFIED BY 'unglueit' REQUIRE SSL;
update settings/prod.py with database credentials
virtualenv ENV
source ENV/bin/activate
pip install -r requirements_versioned.pip
echo "/opt/" > ENV/lib/python2.7/site-packages/regluit.pth
django-admin.py syncdb --migrate --settings regluit.settings.prod
sudo mkdir /var/www/static
sudo chown ubuntu:ubuntu /var/www/static
django-admin.py collectstatic --settings regluit.settings.prod
sudo ln -s /opt/regluit/deploy/regluit.conf /etc/apache2/sites-available/regluit
sudo a2ensite regluit
sudo a2enmod ssl rewrite
cd /home/ubuntu
copy SSL server key to /etc/ssl/private/server.key
copy SSL certificate to /etc/ssl/certs/server.crt
sudo /etc/init.d/apache2 restart
sudo adduser --no-create-home celery --disabled-password --disabled-login (just enter return for all?)
sudo cp deploy/celeryd /etc/init.d/celeryd
sudo chmod 755 /etc/init.d/celeryd
sudo cp deploy/celeryd.conf /etc/default/celeryd
sudo mkdir /var/log/celery
sudo mkdir /var/run/celery
sudo chown celery:celery /var/log/celery /var/run/celery
sudo /etc/init.d/celeryd start
sudo cp deploy/celerybeat /etc/init.d/celerybeat
sudo chmod 755 /etc/init.d/celerybeat
sudo cp deploy/celerybeat.conf /etc/default/celerybeat
sudo mkdir /var/log/celerybeat
sudo chown celery:celery /var/log/celerybeat
sudo /etc/init.d/celerybeat start
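
The regluit.pth step above works because Python adds every existing directory listed in a site-packages .pth file to sys.path, which is what makes the regluit package under /opt/ importable. A minimal sketch of that mechanism follows, using temporary directories as stand-ins for ENV/lib/python2.7/site-packages and /opt/; the regluit_demo package and the helper name are hypothetical.

```python
import os
import site
import sys
import tempfile


def add_parent_via_pth(site_dir, parent_dir, name="regluit.pth"):
    """Write a .pth file naming parent_dir, then have Python process it."""
    pth_path = os.path.join(site_dir, name)
    with open(pth_path, "w") as handle:
        handle.write(parent_dir + "\n")
    # addsitedir() appends each existing directory listed in the .pth files
    # of site_dir to sys.path, as happens for site-packages at startup.
    site.addsitedir(site_dir)
    return pth_path


# Temporary stand-ins for ENV/lib/python2.7/site-packages and /opt/.
fake_site_packages = tempfile.mkdtemp()
fake_opt = tempfile.mkdtemp()

# A hypothetical package playing the role of /opt/regluit.
os.mkdir(os.path.join(fake_opt, "regluit_demo"))
open(os.path.join(fake_opt, "regluit_demo", "__init__.py"), "w").close()

add_parent_via_pth(fake_site_packages, fake_opt)

import regluit_demo  # importable because fake_opt is now on sys.path
```

Note that the .pth file names the parent directory (/opt/), not /opt/regluit itself, so that `import regluit` and `regluit.settings.prod` resolve.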

setup to enable ckeditor to work properly

mkdir /var/www/static/media/
sudo chown ubuntu:www-data /var/www/static/media/

Updating Production

Study the latest changes in the master branch, especially keep in mind how it has changed from what's in production.
Update the production branch accordingly. If everything in master is ready to be moved into production, you can just merge master into production. Otherwise, you can grab specific parts. (How to do so is something that should probably be described in greater detail.)
Log in to unglue.it and run /opt/regluit/deploy/update-prod

OS X Developer Notes

To run regluit on OS X you should have Xcode installed.

Install virtualenvwrapper according to the process at http://blog.praveengollakota.com/47430655:

sudo easy_install pip
sudo pip install virtualenv
pip install virtualenvwrapper

Edit or create .bashrc in ~ to enable virtualenvwrapper commands:

mkdir ~/.virtualenvs
Edit .bashrc to include the following lines:

export WORKON_HOME=$HOME/.virtualenvs
source your_path_to_virtualenvwrapper.sh_here

In the above web site, the path to virtualenvwrapper.sh was /Library/Frameworks/Python.framework/Versions/2.7/bin/virtualenvwrapper.sh. In Snow Leopard, this may be /usr/local/bin/virtualenvwrapper.sh

Configure Terminal to automatically notice this at startup: Terminal -> Preferences -> Settings -> Shell. Click "run command" and add source ~/.bashrc

If you get 'EnvironmentError: mysql_config not found', edit ~/.virtualenvs/regluit/build/MySQL-python/setup_posix.py and change the line

mysql_config.path = "mysql_config"

to use a path that exists on your system, for example:

mysql_config.path = "/usr/local/mysql-5.5.20-osx10.6-x86_64/bin/mysql_config"
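
To find a mysql_config path that exists on your system, a small search along these lines may help. This is only a sketch: the function name is hypothetical, and the extra candidate is simply the example path from above, which may not exist on your machine.

```python
import os


def find_mysql_config(extra_candidates=()):
    """Return the first executable mysql_config found, or None."""
    # Explicit candidates are checked first, then every directory on PATH.
    candidates = list(extra_candidates)
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        if directory:
            candidates.append(os.path.join(directory, "mysql_config"))
    for candidate in candidates:
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


path = find_mysql_config(
    extra_candidates=["/usr/local/mysql-5.5.20-osx10.6-x86_64/bin/mysql_config"]
)
print(path or "mysql_config not found; install MySQL or add its location to the candidates")
```

Whatever path it prints is the value to use for mysql_config.path in setup_posix.py.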

You may need to set utf8 in /etc/my.cnf:

collation-server = utf8_unicode_ci
init-connect='SET NAMES utf8'
character-set-server = utf8

Selenium Install

Download the selenium server: http://selenium.googlecode.com/files/selenium-server-standalone-2.5.0.jar

Start the selenium server: 'java -jar selenium-server-standalone-2.5.0.jar'
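
Before running the selenium tests it can be useful to confirm that the server is accepting connections. The check below is a sketch and not part of regluit. It assumes the server runs on localhost and listens on port 4444, the standalone server's usual default, so adjust both if you started it differently.

```python
import socket


def selenium_server_is_up(host="localhost", port=4444, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        connection = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        return False
    connection.close()
    return True


if __name__ == "__main__":
    print("selenium server reachable: %s" % selenium_server_is_up())
```

This only verifies that the port is open; it does not confirm that the process listening there is actually selenium.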

MARC Records

For unglued books with existing print edition MARC records

Get the MARCXML record for the print edition from the Library of Congress.
1. Find the book in catalog.loc.gov
2. Click on the permalink in its record (will look something like lccn.loc.gov/2009009516)
3. Download MARCXML
At /marc/ungluify/ , enter the unglued edition in the Edition field, upload file, choose license
The XML record will be automatically...
- converted to suitable MARCXML and .mrc records, with both direct and via-unglue.it download links
- written to S3
- added to a new instance of MARCRecord
- provided to ungluers at /marc/

For CC/PD books with existing records that link to the ebook edition

Use /admin to create a new MARC record instance
Upload the MARC records to s3 (or wherever)
Add the URLs of the .xml and/or .mrc record(s) to the appropriate field(s)
Select the relevant edition
Select an appropriate marc_format:
- use DIRECT if it links directly to the ebook file
- use UNGLUE if it links to the unglue.it download page
- if you have records with both DIRECT and UNGLUE links, put them in separate MARCRecord instances, as marc_format can only take one value

ungluify_record.py should only be used to modify records of print editions of unglued ebooks. It will not produce appropriate results for CC/PD ebooks.
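
Because marc_format can only take one value, the number of MARCRecord instances to create follows from which kinds of link your records contain. A small sketch of that rule follows; the function name is hypothetical and this is not code from the marc app.

```python
DIRECT = "DIRECT"
UNGLUE = "UNGLUE"


def marc_formats_needed(has_direct_link, has_unglue_link):
    """Return one marc_format value per MARCRecord instance to create."""
    formats = []
    if has_direct_link:
        formats.append(DIRECT)  # record links directly to the ebook file
    if has_unglue_link:
        formats.append(UNGLUE)  # record links to the unglue.it download page
    return formats


print(marc_formats_needed(True, True))   # ['DIRECT', 'UNGLUE']: two instances
print(marc_formats_needed(True, False))  # ['DIRECT']: one instance
```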

For unglued ebooks without print edition MARC records, or CC/PD books without ebook MARC records

Get a contract cataloger to produce quality records (.xml and .mrc formats)
- we are using ung followed by a number as the format for our accession numbers, where the number is the id of the MARCRecord instance, plus leading zeroes
Upload those records to s3 (or wherever)
Create a MARCRecord instance in /admin
Add the URLs of the .xml and .mrc records to the appropriate fields
Select the relevant edition
Select an appropriate marc_format:
- use DIRECT if it links directly to the ebook file
- use UNGLUE if it links to the unglue.it download page
- if you have records with both DIRECT and UNGLUE links, put them in separate MARCRecord instances, as marc_format can only take one value
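
The accession number convention mentioned above (the prefix ung plus the MARCRecord id with leading zeroes) can be sketched as follows. The total digit width is not stated in this document, so the width of 8 used here is purely an assumption, as is the function name.

```python
def accession_number(record_id, width=8):
    """Format a MARCRecord id as 'ung' plus a zero-padded number.

    width=8 is an assumed digit count; use whatever width the cataloger uses.
    """
    return "ung" + str(record_id).zfill(width)


print(accession_number(42))  # ung00000042
```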
