Commit Graph

17 Commits (cb9eb1e24eb88fa26be6a4de5fec16994666c337)

Author SHA1 Message Date
Raymond Yee 359ff71984 Add g_seed_isbn.json which holds the Gutenberg editions I'm loading. 2012-02-27 13:19:58 -08:00
Raymond Yee 7c9b6f9eba Compute similarity measures and allow filtering of Gutenberg editions by these measures 2012-02-27 12:12:06 -08:00
Raymond Yee dcfc24e380 Code to repick the seed isbn to find isbns that are more likely to be found in a wide variety of data sources 2012-02-27 08:46:34 -08:00
Raymond Yee a8f1c157be Check current progress in so that I can focus on a change in the master branch to add missing isbns to Editions 2012-02-15 16:06:40 -08:00
Raymond Yee 9fb57a6b4e At this point, I have logic in regluit.test.bookloader.load_gutenberg_books to read the data from regluit/experimental/gutenberg/g_seed_isbn.json and load books into the db. Still shaking out bugs from the process though. 2012-02-14 18:01:13 -08:00
Raymond Yee a9c91bf9c8 Changed the number of Gutenberg books to process 2012-02-11 18:01:37 -08:00
Raymond Yee cfc3dd3549 Code that I'm now running in quasi-production on my laptop to compute the seed isbn. Let's see how it goes 2012-02-10 19:15:35 -08:00
Raymond Yee b5c663f82f basics of database structure for running through all the Gutenberg books. Generating a report on each seed isbn calc 2012-02-10 10:56:08 -08:00
Raymond Yee d3a183bc61 OK: I'm able to return a single candidate isbn seed now while at the same time caching the results 2012-02-08 14:28:46 -08:00
Raymond Yee 3bc5da4685 Now able to cluster isbns by language of work 2012-02-08 10:44:18 -08:00
Raymond Yee d06ee6a67e Progress towards calculating the seed isbn: calculating a union of Freebase + OpenLibrary ISBNs -- then clustered with thingisbn and feeding these ISBNs to Google Books 2012-02-07 22:52:50 -08:00
Raymond Yee 6e5f52db4b work in progress, especially openlibrary xisbn 2012-02-02 23:07:25 -08:00
Raymond Yee 6c21074ee7 Added some comments to gutenberg.py. Trying to debug zotero_books.py -- pyzotero seems to be quite broken now 2012-01-12 17:52:10 -08:00
Raymond Yee 4818e92ba2 Writing out the mapping of Gutenberg epub file to OpenLibrary workid 2011-12-12 10:49:33 -08:00
Raymond Yee d1b58c89ad Added bookdata.json_for_olid to pull out metadata for any given OpenLibrary ID (olid), including work, edition, author. Added map_refine_fb_links_to_openlibrary_work_ids in gutenberg.py to do the mapping of Freebase IDs -> OpenLibrary work ids and capture in database 2011-12-10 14:18:22 -08:00
Raymond Yee a349cb0adf Current results post-Refine processing of Gutenberg etext_id -> Freebase IDs (via Wikipedia links) 2011-12-05 09:47:52 -08:00
Raymond Yee 810e8ac3e7 Code so far to parse Project Gutenberg catalog, extract Wikipedia links, do some Google Refine munging, and then map Freebase ids to OpenLibrary Work IDs 2011-12-05 09:23:17 -08:00