{
|
|
"metadata": {
|
|
"name": "",
|
|
"signature": "sha256:79e7f4505df4df7b3f16885d4d975832795dea0dd4b4d0790179dc25b15f8eee"
|
|
},
|
|
"nbformat": 3,
|
|
"nbformat_minor": 0,
|
|
"worksheets": [
|
|
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"\n",
|
|
"\n",
"Let's look at some examples of OPDS feeds in the wild to see how the format works.\n",
"\n",
"Available feeds: https://code.google.com/p/openpub/wiki/AvailableFeeds\n",
"\n",
"Let's start with archive.org, which presumably has a good feed:\n",
|
|
"\n",
|
|
"* archive.org: http://bookserver.archive.org/catalog/\n",
|
|
"* feedbooks.com: http://www.feedbooks.com/catalog.atom\n",
|
|
"* oreilly.com: http://opds.oreilly.com/opds/\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "heading",
|
|
"level": 1,
|
|
"metadata": {},
|
|
"source": [
|
|
"Some concepts"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"http://www.slideshare.net/fullscreen/HadrienGardeur/understanding-opds/7\n",
|
|
"\n",
|
|
"OPDS is based on\n",
|
|
"\n",
|
|
"* resources\n",
|
|
"* collections \n",
|
|
"\n",
|
|
"A collection aggregates resources.\n",
|
|
"\n",
|
|
"Two kinds of resources:\n",
|
|
"\n",
|
|
"* Navigation link \n",
|
|
"* Catalog entry \n",
|
|
"\n",
"These map onto two kinds of collections:\n",
|
|
"\n",
|
|
"* Navigation \n",
|
|
"* Acquisition"
|
|
]
|
|
},
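{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the resource distinction above. It assumes, following OPDS 1.1, that a catalog entry is an Atom entry carrying at least one link whose `rel` starts with `http://opds-spec.org/acquisition`, and that any other entry acts as a navigation link. `SAMPLE_FEED` and `entry_kind` are hypothetical names, and the two-entry feed is made up for illustration rather than fetched from a live catalog.\n",
"\n",
"````python\n",
"import xml.etree.ElementTree as ET\n",
"\n",
"ATOM_NS = 'http://www.w3.org/2005/Atom'\n",
"ACQUISITION_REL = 'http://opds-spec.org/acquisition'\n",
"\n",
"# Made-up feed: one entry points at another feed, one at an ebook file.\n",
"SAMPLE_FEED = '''<feed xmlns='http://www.w3.org/2005/Atom'>\n",
"  <entry>\n",
"    <title>Creative Commons</title>\n",
"    <link href='creativecommons.xml'\n",
"          type='application/atom+xml;profile=opds-catalog;kind=acquisition'/>\n",
"  </entry>\n",
"  <entry>\n",
"    <title>Oral Literature In Africa</title>\n",
"    <link href='https://unglue.it/download_ebook/905/'\n",
"          type='application/epub+zip'\n",
"          rel='http://opds-spec.org/acquisition'/>\n",
"  </entry>\n",
"</feed>'''\n",
"\n",
"def entry_kind(entry):\n",
"    # An entry with at least one acquisition link is a catalog entry;\n",
"    # anything else is treated as a navigation link (an assumption of this sketch).\n",
"    rels = [link.get('rel', '') for link in entry.findall('{%s}link' % ATOM_NS)]\n",
"    if any(rel.startswith(ACQUISITION_REL) for rel in rels):\n",
"        return 'catalog entry'\n",
"    return 'navigation link'\n",
"\n",
"feed = ET.fromstring(SAMPLE_FEED)\n",
"kinds = [(entry.findtext('{%s}title' % ATOM_NS), entry_kind(entry))\n",
"         for entry in feed.findall('{%s}entry' % ATOM_NS)]\n",
"print(kinds)\n",
"````\n",
"\n",
"Note that the first entry's link has `kind=acquisition` in its `type`: it is still a navigation link, because it leads to an acquisition feed rather than to a downloadable file."
]
},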
|
|
{
|
|
"cell_type": "heading",
|
|
"level": 2,
|
|
"metadata": {},
|
|
"source": [
|
|
"Acquisition scenarios"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Multiple acquisition scenarios:\n",
|
|
" \n",
|
|
"* Open Access\n",
|
|
"* Sale\n",
|
|
"* Lending\n",
|
|
"* Subscription\n",
|
|
"* Extract\n",
|
|
"* Undefined"
|
|
]
|
|
},
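{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough sketch, the scenarios above can be lined up with the acquisition link relations defined in OPDS 1.1. The relation URIs come from the spec; pairing them with the scenario names used in this list is an assumption, and `SCENARIO_TO_REL` and `scenario_for_rel` are hypothetical names.\n",
"\n",
"````python\n",
"# Relation shared by all OPDS acquisition links; the generic form has no suffix.\n",
"ACQUISITION = 'http://opds-spec.org/acquisition'\n",
"\n",
"# Scenario name (as listed above) -> OPDS 1.1 link relation.\n",
"SCENARIO_TO_REL = {\n",
"    'Open Access': ACQUISITION + '/open-access',\n",
"    'Sale': ACQUISITION + '/buy',\n",
"    'Lending': ACQUISITION + '/borrow',\n",
"    'Subscription': ACQUISITION + '/subscribe',\n",
"    'Extract': ACQUISITION + '/sample',\n",
"    'Undefined': ACQUISITION,\n",
"}\n",
"\n",
"def scenario_for_rel(rel):\n",
"    # Return the scenario name for a link relation, or None if it is\n",
"    # not one of the acquisition relations above.\n",
"    for scenario, known_rel in SCENARIO_TO_REL.items():\n",
"        if rel == known_rel:\n",
"            return scenario\n",
"    return None\n",
"\n",
"print(scenario_for_rel('http://opds-spec.org/acquisition/borrow'))\n",
"````"
]
},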
|
|
{
|
|
"cell_type": "code",
|
|
"collapsed": false,
|
|
"input": [
|
|
"import requests\n",
|
|
"from lxml.etree import fromstring\n",
|
|
"\n",
|
|
"ATOM_NS = \"http://www.w3.org/2005/Atom\"\n",
|
|
"\n",
|
|
"def nsq(url, tag):\n",
|
|
" return \"{\" + url +\"}\" + tag\n",
|
|
"\n",
|
|
"url = \"http://bookserver.archive.org/catalog/\"\n",
|
|
" \n",
|
|
"r = requests.get(url)"
|
|
],
|
|
"language": "python",
|
|
"metadata": {},
|
|
"outputs": []
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"collapsed": false,
|
|
"input": [
"doc = fromstring(r.content)  # pass bytes: lxml rejects unicode input that carries an XML encoding declaration\n",
|
|
"doc"
|
|
],
|
|
"language": "python",
|
|
"metadata": {},
|
|
"outputs": []
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"collapsed": false,
|
|
"input": [
|
|
"# get links\n",
|
|
"# what types specified in spec?\n",
|
|
"\n",
|
|
"[link.attrib for link in doc.findall(nsq(ATOM_NS,'link'))]"
|
|
],
|
|
"language": "python",
|
|
"metadata": {},
|
|
"outputs": []
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
"It might be useful to use a specialized library to handle Atom or AtomPub."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"collapsed": false,
|
|
"input": [
|
|
"doc.findall(nsq(ATOM_NS, \"entry\"))"
|
|
],
|
|
"language": "python",
|
|
"metadata": {},
|
|
"outputs": []
|
|
},
|
|
{
|
|
"cell_type": "heading",
|
|
"level": 1,
|
|
"metadata": {},
|
|
"source": [
|
|
"Atom feed generation"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"https://github.com/sramana/pyatom\n",
|
|
"\n",
|
|
" pip install pyatom"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"collapsed": false,
|
|
"input": [
|
|
"# let's try the basics of pyatom\n",
"# puzzled where <links> come from.\n",
|
|
"\n",
|
|
"from pyatom import AtomFeed\n",
|
|
"import datetime\n",
|
|
"\n",
|
|
"feed = AtomFeed(title=\"Unglue.it\",\n",
|
|
" subtitle=\"Unglue.it OPDS Navigation\",\n",
|
|
" feed_url=\"https://unglue.it/opds\",\n",
|
|
" url=\"https://unglue.it/\",\n",
|
|
" author=\"unglue.it\")\n",
|
|
"\n",
|
|
"# Do this for each feed entry\n",
|
|
"feed.add(title=\"My Post\",\n",
|
|
" content=\"Body of my post\",\n",
|
|
" content_type=\"html\",\n",
|
|
" author=\"Me\",\n",
|
|
" url=\"http://example.org/entry1\",\n",
|
|
" updated=datetime.datetime.utcnow())\n",
|
|
"\n",
"print(feed.to_string())"
|
|
],
|
|
"language": "python",
|
|
"metadata": {},
|
|
"outputs": []
|
|
},
|
|
{
|
|
"cell_type": "heading",
|
|
"level": 1,
|
|
"metadata": {},
|
|
"source": [
|
|
"Creating navigation feed"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"template: https://gist.github.com/rdhyee/94d82f6639809fb7796f#file-unglueit_nav_opds-xml"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"\n",
|
|
"````xml\n",
|
|
"<feed xmlns:dcterms=\"http://purl.org/dc/terms/\" xmlns:opds=\"http://opds-spec.org/\"\n",
|
|
" xmlns=\"http://www.w3.org/2005/Atom\"\n",
|
|
" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"\n",
|
|
" xsi:noNamespaceSchemaLocation=\"http://www.kbcafe.com/rss/atom.xsd.xml\">\n",
|
|
" <title>Unglue.it Catalog</title>\n",
|
|
" <id>https://unglue.it/opds</id>\n",
|
|
" <updated>2014-06-13T21:48:34Z</updated>\n",
|
|
" <author>\n",
|
|
" <name>unglue.it</name>\n",
|
|
" <uri>https://unglue.it</uri>\n",
|
|
" </author>\n",
|
|
" <!-- crawlable link in archive.org (optional for unglue.it) -->\n",
|
|
" <link rel=\"http://opds-spec.org/crawlable\" type=\"application/atom+xml;profile=opds-catalog;kind=acquisition\" href=\"https://unglue.it/opds/crawlable\" title=\"Crawlable feed\"/>\n",
|
|
" <link rel=\"start\" href=\"https://unglue.it/opds\" type=\"application/atom+xml;profile=opds-catalog;kind=navigation\" />\n",
|
|
" <entry>\n",
|
|
" <title>Creative Commons</title>\n",
|
|
" <id>https://unglue.it/creativecommons/</id>\n",
|
|
" <updated>2014-06-13T00:00:00Z</updated>\n",
|
|
" <link href=\"creativecommons.xml\" type=\"application/atom+xml;profile=opds-catalog;kind=acquisition\" />\n",
"        <content>These Creative Commons licensed ebooks are ready to read - the people who created them want you to read and share them.</content>\n",
|
|
" </entry>\n",
|
|
" <entry>\n",
|
|
" <title>Active Campaigns</title>\n",
|
|
" <id>https://unglue.it/campaigns/ending#2</id>\n",
|
|
" <updated>2014-06-13T00:00:00Z</updated>\n",
|
|
" <link href=\"active_campaigns.xml\" type=\"application/atom+xml;profile=opds-catalog;kind=acquisition\"/>\n",
|
|
" <content>With your help we're raising money to make these books free to the world.</content>\n",
|
|
" </entry>\n",
"</feed>\n",
"````"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"collapsed": false,
|
|
"input": [
|
|
"from lxml import etree\n",
|
|
"import datetime\n",
|
|
"import pytz\n",
|
|
"\n",
|
|
"def text_node(tag, text):\n",
|
|
" node = etree.Element(tag)\n",
|
|
" node.text = text\n",
|
|
" return node\n",
|
|
"\n",
|
|
"def entry_node(title, id_, updated, link_href, link_type, content):\n",
|
|
" node = etree.Element(\"entry\")\n",
|
|
" node.append(text_node(\"title\", title))\n",
|
|
" node.append(text_node(\"id\", id_))\n",
|
|
" node.append(text_node(\"updated\", updated))\n",
|
|
" \n",
|
|
" link_node = etree.Element(\"link\")\n",
|
|
" link_node.attrib.update({'href':link_href, 'type':link_type})\n",
|
|
" node.append(link_node)\n",
|
|
" \n",
|
|
" node.append(text_node(\"content\", content))\n",
|
|
" return node\n",
|
|
"\n",
|
|
"feed_xml = \"\"\"<feed xmlns:dcterms=\"http://purl.org/dc/terms/\" \n",
|
|
" xmlns:opds=\"http://opds-spec.org/\"\n",
|
|
" xmlns=\"http://www.w3.org/2005/Atom\"\n",
|
|
" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"\n",
|
|
" xsi:noNamespaceSchemaLocation=\"http://www.kbcafe.com/rss/atom.xsd.xml\"\n",
|
|
" xsi:schemaLocation=\"http://purl.org/dc/elements/1.1/ http://dublincore.org/schemas/xmls/qdc/2008/02/11/dc.xsd http://purl.org/dc/terms/ http://dublincore.org/schemas/xmls/qdc/2008/02/11/dcterms.xsd\"/>\"\"\"\n",
|
|
"\n",
|
|
"feed = etree.fromstring(feed_xml)\n",
|
|
"\n",
|
|
"# add title\n",
|
|
"\n",
|
|
"feed.append(text_node('title', \"Unglue.it Catalog\"))\n",
|
|
"\n",
|
|
"# id \n",
|
|
"\n",
|
|
"feed.append(text_node('id', \"https://unglue.it/opds\"))\n",
|
|
"\n",
|
|
"# updated\n",
|
|
"\n",
|
|
"feed.append(text_node('updated',\n",
|
|
" pytz.utc.localize(datetime.datetime.utcnow()).isoformat()))\n",
|
|
"\n",
|
|
"# author\n",
|
|
"\n",
|
|
"author_node = etree.Element(\"author\")\n",
|
|
"author_node.append(text_node('name', 'unglue.it'))\n",
|
|
"author_node.append(text_node('uri', 'https://unglue.it'))\n",
|
|
"feed.append(author_node)\n",
|
|
"\n",
|
|
"# start link\n",
|
|
"\n",
|
|
"start_link = etree.Element(\"link\")\n",
|
|
"start_link.attrib.update({\"rel\":\"start\",\n",
|
|
" \"href\":\"https://unglue.it/opds\",\n",
|
|
" \"type\":\"application/atom+xml;profile=opds-catalog;kind=navigation\",\n",
|
|
"})\n",
|
|
"feed.append(start_link)\n",
|
|
"\n",
|
|
"# crawlable link\n",
|
|
"\n",
|
|
"crawlable_link = etree.Element(\"link\")\n",
|
|
"crawlable_link.attrib.update({\"rel\":\"http://opds-spec.org/crawlable\", \n",
|
|
" \"href\":\"https://unglue.it/opds/crawlable\",\n",
|
|
" \"type\":\"application/atom+xml;profile=opds-catalog;kind=acquisition\",\n",
|
|
" \"title\":\"Crawlable feed\"})\n",
|
|
"feed.append(crawlable_link)\n",
|
|
"\n",
|
|
"# CC entry_node\n",
|
|
"\n",
|
|
"cc_entry = entry_node(title=\"Creative Commons\",\n",
|
|
" id_=\"https://unglue.it/creativecommons/\",\n",
|
|
" updated=\"2014-06-13T00:00:00Z\",\n",
|
|
" link_href=\"creativecommons.xml\",\n",
|
|
" link_type=\"application/atom+xml;profile=opds-catalog;kind=acquisition\",\n",
"                    content=\"These Creative Commons licensed ebooks are ready to read - the people who created them want you to read and share them.\")\n",
|
|
"feed.append(cc_entry)\n",
|
|
"\n",
"print(etree.tostring(feed, pretty_print=True))\n"
|
|
],
|
|
"language": "python",
|
|
"metadata": {},
|
|
"outputs": []
|
|
},
|
|
{
|
|
"cell_type": "heading",
|
|
"level": 1,
|
|
"metadata": {},
|
|
"source": [
|
|
"Writing Crawlable Feed"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"````xml\n",
|
|
"<feed xmlns:dcterms=\"http://purl.org/dc/terms/\" xmlns:opds=\"http://opds-spec.org/\"\n",
|
|
" xmlns=\"http://www.w3.org/2005/Atom\"\n",
|
|
" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"\n",
|
|
" xsi:noNamespaceSchemaLocation=\"http://www.kbcafe.com/rss/atom.xsd.xml\"\n",
|
|
" xsi:schemaLocation=\"http://purl.org/dc/elements/1.1/ http://dublincore.org/schemas/xmls/qdc/2008/02/11/dc.xsd http://purl.org/dc/terms/ http://dublincore.org/schemas/xmls/qdc/2008/02/11/dcterms.xsd\"> \n",
|
|
" <title>Unglue.it Catalog -- 1 to 1 of 2000 -- crawlable feed</title>\n",
|
|
" <id>https://unglue.it/opds/crawlable</id>\n",
|
|
" <updated>2014-06-16T00:00:00Z</updated>\n",
|
|
" <link rel=\"start\" href=\"https://unglue.it/opds\" type=\"application/atom+xml;profile=opds-catalog;kind=navigation\" />\n",
|
|
" <link rel=\"self\" type=\"application/atom+xml;profile=opds-catalog;kind=acquisition\" href=\"https://unglue.it/opds/crawlable\"/>\n",
|
|
" <author>\n",
|
|
" <name>unglue.it</name>\n",
|
|
" <uri>https://unglue.it</uri>\n",
|
|
" </author>\n",
|
|
" <link rel=\"next\" type=\"application/atom+xml;profile=opds-catalog;kind=acquisition\" href=\"/opds/crawlable/1\" title=\"Next results\"/>\n",
|
|
" <entry>\n",
|
|
" <title>Oral Literature In Africa</title>\n",
|
|
" <id>https://unglue.it/work/81834/</id>\n",
|
|
" <updated>2013-07-17T23:27:37Z</updated>\n",
|
|
" <link href=\"https://unglue.it/download_ebook/904/\" type=\"application/pdf\" rel=\"http://opds-spec.org/acquisition\"/>\n",
|
|
" <link href=\"https://unglue.it/download_ebook/905/\" type=\"application/epub+zip\" rel=\"http://opds-spec.org/acquisition\"/>\n",
|
|
" <link href=\"https://unglue.it/download_ebook/906/\" type=\"application/x-mobipocket-ebook\" rel=\"http://opds-spec.org/acquisition\"/>\n",
|
|
" <link href=\"https://unglueit.files.wordpress.com/2012/05/olacover_thumbnail.jpg\" type=\"image/jpeg\" rel=\"http://opds-spec.org/image/thumbnail\"/>\n",
|
|
" <dcterms:issued>2012</dcterms:issued>\n",
|
|
" <author>\n",
|
|
" <name>Ruth Finnegan</name>\n",
|
|
" </author>\n",
|
|
" <category term=\"Africa\"/>\n",
|
|
" <category term=\"African Folk literature\"/>\n",
|
|
" <category term=\"Folk literature\"/>\n",
|
|
" <dcterms:publisher>Open Book Publishers</dcterms:publisher>\n",
|
|
" <dcterms:language>en</dcterms:language>\n",
|
|
" <content type=\"html\"></content>\n",
|
|
" </entry>\n",
|
|
"</feed>\n",
|
|
"````"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"collapsed": false,
|
|
"input": [
|
|
"# crawlable feed\n",
|
|
"\n",
|
|
"from itertools import islice\n",
|
|
"\n",
|
|
"from lxml import etree\n",
|
|
"import datetime\n",
|
|
"import urlparse\n",
|
|
"\n",
|
|
"import pytz\n",
|
|
"\n",
|
|
"from regluit.core import models\n",
|
|
"import regluit.core.cc as cc\n",
|
|
"\n",
|
|
"licenses = cc.LICENSE_LIST\n",
|
|
"\n",
|
|
"FORMAT_TO_MIMETYPE = {'pdf':\"application/pdf\",\n",
|
|
" 'epub':\"application/epub+zip\",\n",
|
|
" 'mobi':\"application/x-mobipocket-ebook\",\n",
|
|
" 'html':\"text/html\",\n",
|
|
" 'text':\"text/html\"}\n",
|
|
"\n",
|
|
"def text_node(tag, text):\n",
|
|
" node = etree.Element(tag)\n",
|
|
" node.text = text\n",
|
|
" return node\n",
|
|
"\n",
|
|
"def map_to_unglueit(url):\n",
|
|
" m = list(urlparse.urlparse(url))\n",
|
|
" (m[0], m[1]) = ('https','unglue.it')\n",
|
|
" return urlparse.urlunparse(m)\n",
|
|
"\n",
|
|
"def work_node(work):\n",
|
|
" node = etree.Element(\"entry\")\n",
|
|
" # title\n",
|
|
" node.append(text_node(\"title\", work.title))\n",
|
|
" \n",
|
|
" # id\n",
|
|
" node.append(text_node('id', \"https://unglue.it{0}\".format(work.get_absolute_url())))\n",
|
|
" \n",
|
|
" # updated -- using creation date\n",
|
|
" node.append(text_node('updated', work.created.isoformat()))\n",
|
|
" \n",
|
|
" # links for all ebooks\n",
|
|
" \n",
|
|
" for ebook in work.ebooks():\n",
|
|
" link_node = etree.Element(\"link\")\n",
|
|
" link_node.attrib.update({\"href\":map_to_unglueit(ebook.download_url),\n",
|
|
" \"type\":FORMAT_TO_MIMETYPE.get(ebook.format, \"\"),\n",
|
|
" \"rel\":\"http://opds-spec.org/acquisition\"})\n",
|
|
" node.append(link_node)\n",
|
|
" \n",
|
|
" # get the cover -- assume jpg?\n",
|
|
" \n",
|
|
" cover_node = etree.Element(\"link\")\n",
|
|
" cover_node.attrib.update({\"href\":work.cover_image_small(),\n",
|
|
" \"type\":\"image/jpeg\",\n",
|
|
" \"rel\":\"http://opds-spec.org/image/thumbnail\"})\n",
|
|
" node.append(cover_node)\n",
|
|
" \n",
|
|
" # <dcterms:issued>2012</dcterms:issued>\n",
|
|
" node.append(text_node(\"{http://purl.org/dc/terms/}issued\", work.publication_date_year))\n",
|
|
" \n",
|
|
" # author\n",
|
|
" # TO DO: include all authors?\n",
|
|
" author_node = etree.Element(\"author\")\n",
|
|
" author_node.append(text_node(\"name\", work.author()))\n",
|
|
" node.append(author_node)\n",
|
|
" \n",
|
|
" # publisher\n",
|
|
" #<dcterms:publisher>Open Book Publishers</dcterms:publisher>\n",
|
|
" if len(work.publishers()):\n",
|
|
" for publisher in work.publishers():\n",
"            node.append(text_node(\"{http://purl.org/dc/terms/}publisher\", publisher.name.name))\n",
|
|
" \n",
|
|
" # language\n",
|
|
" #<dcterms:language>en</dcterms:language>\n",
|
|
" node.append(text_node(\"{http://purl.org/dc/terms/}language\", work.language))\n",
|
|
"\n",
|
|
" # subject tags\n",
|
|
" # [[subject.name for subject in work.subjects.all()] for work in ccworks if work.subjects.all()]\n",
|
|
" if work.subjects.all():\n",
|
|
" for subject in work.subjects.all():\n",
|
|
" category_node = etree.Element(\"category\")\n",
|
|
" category_node.attrib[\"term\"] = subject.name \n",
|
|
" node.append(category_node)\n",
|
|
" \n",
|
|
" return node\n",
|
|
"\n",
|
|
"feed_xml = \"\"\"<feed xmlns:dcterms=\"http://purl.org/dc/terms/\" \n",
|
|
" xmlns:opds=\"http://opds-spec.org/\"\n",
|
|
" xmlns=\"http://www.w3.org/2005/Atom\"\n",
|
|
" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"\n",
|
|
" xsi:noNamespaceSchemaLocation=\"http://www.kbcafe.com/rss/atom.xsd.xml\"\n",
|
|
" xsi:schemaLocation=\"http://purl.org/dc/elements/1.1/ http://dublincore.org/schemas/xmls/qdc/2008/02/11/dc.xsd http://purl.org/dc/terms/ http://dublincore.org/schemas/xmls/qdc/2008/02/11/dcterms.xsd\"/>\"\"\"\n",
|
|
"\n",
|
|
"feed = etree.fromstring(feed_xml)\n",
|
|
"\n",
|
|
"# add title\n",
"# TO DO: will need to calculate the number of items and where in the feed we are\n",
|
|
"\n",
|
|
"feed.append(text_node('title', \"Unglue.it Catalog: crawlable feed\"))\n",
|
|
"\n",
|
|
"# id \n",
|
|
"\n",
|
|
"feed.append(text_node('id', \"https://unglue.it/opds/crawlable\"))\n",
|
|
"\n",
|
|
"# updated\n",
|
|
"# TO DO: fix time zone?\n",
|
|
"\n",
|
|
"feed.append(text_node('updated',\n",
|
|
" pytz.utc.localize(datetime.datetime.utcnow()).isoformat()))\n",
|
|
"\n",
|
|
"# author\n",
|
|
"\n",
|
|
"author_node = etree.Element(\"author\")\n",
|
|
"author_node.append(text_node('name', 'unglue.it'))\n",
|
|
"author_node.append(text_node('uri', 'https://unglue.it'))\n",
|
|
"feed.append(author_node)\n",
|
|
"\n",
|
|
"# links: start, self, next/prev (depending what's necessary -- to start with put all CC books)\n",
|
|
"\n",
|
|
"# start link\n",
|
|
"\n",
|
|
"start_link = etree.Element(\"link\")\n",
|
|
"start_link.attrib.update({\"rel\":\"start\",\n",
|
|
" \"href\":\"https://unglue.it/opds\",\n",
|
|
" \"type\":\"application/atom+xml;profile=opds-catalog;kind=navigation\",\n",
|
|
"})\n",
|
|
"feed.append(start_link)\n",
|
|
"\n",
|
|
"# self link\n",
|
|
"\n",
|
|
"self_link = etree.Element(\"link\")\n",
|
|
"self_link.attrib.update({\"rel\":\"self\",\n",
|
|
" \"href\":\"https://unglue.it/opds/crawlable\",\n",
|
|
" \"type\":\"application/atom+xml;profile=opds-catalog;kind=acquisition\",\n",
|
|
"})\n",
|
|
"feed.append(self_link)\n",
|
|
"\n",
|
|
"licenses = cc.LICENSE_LIST\n",
|
|
"\n",
|
|
"ccworks = models.Work.objects.filter(editions__ebooks__isnull=False, \n",
|
|
" editions__ebooks__rights__in=licenses).distinct().order_by('-created')\n",
|
|
"\n",
|
|
"for work in islice(ccworks,None):\n",
|
|
" node = work_node(work)\n",
|
|
" feed.append(node)\n",
|
|
"\n",
"print(etree.tostring(feed, pretty_print=True))\n"
|
|
],
|
|
"language": "python",
|
|
"metadata": {},
|
|
"outputs": []
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"collapsed": false,
|
|
"input": [
|
|
"# how to get CC books?\n",
|
|
"# make use of CCListView: https://github.com/Gluejar/regluit/blob/b675052736f79dcb8d84ddc6349c99fa392fa9bc/frontend/views.py#L878\n",
|
|
"# template: https://github.com/Gluejar/regluit/blob/b675052736f79dcb8d84ddc6349c99fa392fa9bc/frontend/templates/cc_list.html\n",
|
|
"\n",
|
|
"from regluit.core import models\n",
|
|
"import regluit.core.cc as cc\n",
|
|
"\n",
|
|
"licenses = cc.LICENSE_LIST\n",
|
|
"\n",
|
|
"ccworks = models.Work.objects.filter(editions__ebooks__isnull=False, \n",
|
|
" editions__ebooks__rights__in=licenses).distinct().order_by('-created')\n",
|
|
"ccworks"
|
|
],
|
|
"language": "python",
|
|
"metadata": {},
|
|
"outputs": []
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"collapsed": false,
|
|
"input": [
|
|
"dir(ccworks[0])"
|
|
],
|
|
"language": "python",
|
|
"metadata": {},
|
|
"outputs": []
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"collapsed": false,
|
|
"input": [
|
|
"work = ccworks[0]\n",
|
|
"ebook = work.ebooks()[0]\n",
|
|
"dir(ebook)"
|
|
],
|
|
"language": "python",
|
|
"metadata": {},
|
|
"outputs": []
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"collapsed": false,
|
|
"input": [
|
|
"from collections import Counter\n",
|
|
"\n",
|
|
"c = Counter()\n",
|
|
"\n",
|
|
"for work in islice(ccworks,None):\n",
|
|
" c.update([ebook.format for ebook in work.ebooks()])\n",
|
|
" \n",
"print(c)\n",
|
|
"\n",
|
|
"#[[ebook.format for ebook in work.ebooks()] for work in islice(ccworks,1)]"
|
|
],
|
|
"language": "python",
|
|
"metadata": {},
|
|
"outputs": []
|
|
},
|
|
{
|
|
"cell_type": "heading",
|
|
"level": 1,
|
|
"metadata": {},
|
|
"source": [
|
|
"Calling regluit.core.opds code"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"collapsed": false,
|
|
"input": [
|
|
"from regluit.core import opds\n",
|
|
"opds.creativecommons()"
|
|
],
|
|
"language": "python",
|
|
"metadata": {},
|
|
"outputs": []
|
|
},
|
|
{
|
|
"cell_type": "heading",
|
|
"level": 1,
|
|
"metadata": {},
|
|
"source": [
"Dealing with URLs of downloaded books"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"collapsed": false,
|
|
"input": [
|
|
"from regluit.core.models import Work\n",
|
|
"\n",
|
|
"work = Work.objects.get(id=137688)\n",
|
|
"[ebook.download_url for ebook in work.ebooks()]"
|
|
],
|
|
"language": "python",
|
|
"metadata": {},
|
|
"outputs": []
|
|
},
|
|
{
|
|
"cell_type": "heading",
|
|
"level": 1,
|
|
"metadata": {},
|
|
"source": [
|
|
"Tacking on a query component to a URL"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"collapsed": false,
|
|
"input": [
"import urlparse\n",
|
|
"\n",
|
|
"def add_query_component(url, qc):\n",
|
|
" m = list(urlparse.urlparse(url))\n",
|
|
" if len(m[4]):\n",
|
|
" m[4] = \"&\".join([m[4],qc])\n",
|
|
" else:\n",
|
|
" m[4] = qc\n",
|
|
" return urlparse.urlunparse(m)\n",
|
|
"\n",
|
|
"add_query_component(\"https://unglue.it/download_ebook/906/\", \"feed=opds\")"
|
|
],
|
|
"language": "python",
|
|
"metadata": {},
|
|
"outputs": []
|
|
},
|
|
{
|
|
"cell_type": "heading",
|
|
"level": 1,
|
|
"metadata": {},
|
|
"source": [
|
|
"Getting works of active campaigns "
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"collapsed": false,
|
|
"input": [
|
|
"campaigns = models.Campaign.objects.filter(status='ACTIVE').order_by('deadline')\n",
|
|
"campaigns"
|
|
],
|
|
"language": "python",
|
|
"metadata": {},
|
|
"outputs": []
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"collapsed": false,
|
|
"input": [
|
|
"models.Work.objects.filter(campaigns__status='ACTIVE')"
|
|
],
|
|
"language": "python",
|
|
"metadata": {},
|
|
"outputs": []
|
|
},
|
|
{
|
|
"cell_type": "heading",
|
|
"level": 2,
|
|
"metadata": {},
|
|
"source": [
|
|
"must exclude campaigns without ebooks"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"collapsed": false,
|
|
"input": [
|
|
"from regluit.core import models\n",
|
|
"from django.db.models import Q\n",
|
|
"\n",
"models.Work.objects.filter(campaigns__status='ACTIVE').count()"
|
|
],
|
|
"language": "python",
|
|
"metadata": {},
|
|
"outputs": []
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"collapsed": false,
|
|
"input": [
|
|
"works = models.Work.objects.filter(campaigns__status='ACTIVE',\n",
|
|
" editions__ebooks__isnull=False).distinct().order_by('-created')\n",
|
|
"[w.ebooks() for w in works]"
|
|
],
|
|
"language": "python",
|
|
"metadata": {},
|
|
"outputs": []
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"collapsed": false,
|
|
"input": [
|
|
"works = Work.objects.all()\n",
|
|
"work = works[0]\n",
|
|
"work.ebooks()"
|
|
],
|
|
"language": "python",
|
|
"metadata": {},
|
|
"outputs": []
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"collapsed": false,
|
|
"input": [
"# filter() takes keyword lookups, not a model instance\n",
"models.Work.objects.filter(id=work.id)"
|
|
],
|
|
"language": "python",
|
|
"metadata": {},
|
|
"outputs": []
|
|
},
|
|
{
|
|
"cell_type": "heading",
|
|
"level": 1,
|
|
"metadata": {},
|
|
"source": [
|
|
"Appendix: dealing with namespaces in ElementTree"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Maybe come back to http://effbot.org/zone/element-namespaces.htm for more sophisticated ways to register namespaces."
|
|
]
|
|
}
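,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible approach, sketched here with the standard-library `xml.etree.ElementTree` rather than the `lxml` calls used in the cells above: register a prefix for each namespace once with `register_namespace`, then build elements with the `{uri}tag` form. The `atom` prefix is an arbitrary choice for this sketch; the feeds above use Atom as the default namespace instead.\n",
"\n",
"````python\n",
"import xml.etree.ElementTree as ET\n",
"\n",
"ATOM_NS = 'http://www.w3.org/2005/Atom'\n",
"DCTERMS_NS = 'http://purl.org/dc/terms/'\n",
"\n",
"# Register prefixes once; they apply to every later serialization.\n",
"ET.register_namespace('atom', ATOM_NS)\n",
"ET.register_namespace('dcterms', DCTERMS_NS)\n",
"\n",
"entry = ET.Element('{%s}entry' % ATOM_NS)\n",
"language = ET.SubElement(entry, '{%s}language' % DCTERMS_NS)\n",
"language.text = 'en'\n",
"\n",
"# tostring() returns a byte string by default, so decode it for display.\n",
"xml_text = ET.tostring(entry).decode('utf-8')\n",
"print(xml_text)\n",
"````\n",
"\n",
"Without the two `register_namespace` calls the serializer falls back to generated prefixes such as `ns0`, which is valid XML but harder to read."
]
}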
|
|
],
|
|
"metadata": {}
|
|
}
|
|
]
|
|
}