regluit

Commit Graph

Author	SHA1	Message	Date
Raymond Yee	359ff71984	Add g_seed_isbn.json, which holds the Gutenberg editions I'm loading.	2012-02-27 13:19:58 -08:00
Raymond Yee	7c9b6f9eba	Compute similarity measures and allow filtering of Gutenberg editions by these measures	2012-02-27 12:12:06 -08:00
Raymond Yee	dcfc24e380	Code to repick the seed isbn to find isbns that are more likely to be found in a wide variety of data sources	2012-02-27 08:46:34 -08:00
Raymond Yee	a8f1c157be	Check current progress in so that I can focus on a change in the master branch to add missing isbns to Editions	2012-02-15 16:06:40 -08:00
Raymond Yee	9fb57a6b4e	At this point, I have logic in regluit.test.bookloader.load_gutenberg_books to read the data from regluit/experimental/gutenberg/g_seed_isbn.json and load books into the db. Still shaking out bugs from the process though.	2012-02-14 18:01:13 -08:00
Raymond Yee	a9c91bf9c8	Changed the number of Gutenberg books to process	2012-02-11 18:01:37 -08:00
Raymond Yee	cfc3dd3549	Code that I'm now running in quasi-production on my laptop to compute the seed isbn. Let's see how it goes	2012-02-10 19:15:35 -08:00
Raymond Yee	b5c663f82f	basics of database structure for running through all the Gutenberg books. Generating a report on each seed isbn calc	2012-02-10 10:56:08 -08:00
Raymond Yee	d3a183bc61	OK: I'm able to return a single candidate isbn seed now while at the same time caching the results	2012-02-08 14:28:46 -08:00
Raymond Yee	3bc5da4685	Now able to cluster isbns by language of work	2012-02-08 10:44:18 -08:00
Raymond Yee	d06ee6a67e	Progress towards calculating the seed isbn: calculating a union of Freebase + OpenLibrary ISBNs -- then clustered with thingisbn and feeding these ISBNs to Google Books	2012-02-07 22:52:50 -08:00
Raymond Yee	6e5f52db4b	work in progress, especially openlibrary xisbn	2012-02-02 23:07:25 -08:00
Raymond Yee	6c21074ee7	Added some comments to gutenberg.py. Trying to debug zotero_books.py -- pyzotero seems to be quite broken now	2012-01-12 17:52:10 -08:00
Raymond Yee	4818e92ba2	Writing out the mapping of Gutenberg epub file to OpenLibrary workid	2011-12-12 10:49:33 -08:00
Raymond Yee	d1b58c89ad	Added bookdata.json_for_olid to pull out metadata for any given OpenLibrary ID (olid), including work, edition, author. Added map_refine_fb_links_to_openlibrary_work_ids in gutenberg.py to do the mapping of Freebase IDs -> OpenLibrary work ids and capture in database	2011-12-10 14:18:22 -08:00
Raymond Yee	a349cb0adf	Current results post-Refine processing of Gutenberg etext_id -> Freebase IDs (via Wikipedia links)	2011-12-05 09:47:52 -08:00
Raymond Yee	810e8ac3e7	Code so far to parse Project Gutenberg catalog, extract Wikipedia links, do some Google Refine munging, and then map Freebase ids to OpenLibrary Work IDs	2011-12-05 09:23:17 -08:00

17 Commits (63143bf860e225188956ca47dc8112dbfb7be629)