Raymond Yee
|
359ff71984
|
Add g_seed_isbn.json, which holds the Gutenberg editions I'm loading.
|
2012-02-27 13:19:58 -08:00 |
Raymond Yee
|
7c9b6f9eba
|
Compute similarity measures and allow filtering of Gutenberg editions by these measures
|
2012-02-27 12:12:06 -08:00 |
Raymond Yee
|
dcfc24e380
|
Code to repick the seed isbn to find isbns that are more likely to be found in a wide variety of data sources
|
2012-02-27 08:46:34 -08:00 |
Raymond Yee
|
a8f1c157be
|
Check current progress in so that I can focus on a change in the master branch to add missing isbns to Editions
|
2012-02-15 16:06:40 -08:00 |
Raymond Yee
|
9fb57a6b4e
|
At this point, I have logic in regluit.test.bookloader.load_gutenberg_books to read the data from regluit/experimental/gutenberg/g_seed_isbn.json and load books into the db. Still shaking out bugs from the process though.
|
2012-02-14 18:01:13 -08:00 |
Raymond Yee
|
a9c91bf9c8
|
Changed the number of Gutenberg books to process
|
2012-02-11 18:01:37 -08:00 |
Raymond Yee
|
cfc3dd3549
|
Code that I'm now running in quasi-production on my laptop to compute the seed isbn. Let's see how it goes
|
2012-02-10 19:15:35 -08:00 |
Raymond Yee
|
b5c663f82f
|
basics of database structure for running through all the Gutenberg books.
Generating a report on each seed isbn calc
|
2012-02-10 10:56:08 -08:00 |
Raymond Yee
|
d3a183bc61
|
OK: I'm able to return a single candidate isbn seed now while at the same time caching the results
|
2012-02-08 14:28:46 -08:00 |
Raymond Yee
|
3bc5da4685
|
Now able to cluster isbns by language of work
|
2012-02-08 10:44:18 -08:00 |
Raymond Yee
|
d06ee6a67e
|
Progress towards calculating the seed isbn: calculating a union of Freebase + OpenLibrary ISBNs -- then clustering with thingisbn and feeding these ISBNs to Google Books
|
2012-02-07 22:52:50 -08:00 |
Raymond Yee
|
6e5f52db4b
|
work in progress, especially openlibrary xisbn
|
2012-02-02 23:07:25 -08:00 |
Raymond Yee
|
6c21074ee7
|
Added some comments to gutenberg.py
Trying to debug zotero_books.py -- pyzotero seems to be quite broken now
|
2012-01-12 17:52:10 -08:00 |
Raymond Yee
|
4818e92ba2
|
Writing out the mapping of Gutenberg epub file to OpenLibrary workid
|
2011-12-12 10:49:33 -08:00 |
Raymond Yee
|
d1b58c89ad
|
Added bookdata.json_for_olid to pull out metadata for any given OpenLibrary ID (olid), including work, edition, author
Added map_refine_fb_links_to_openlibrary_work_ids in gutenberg.py to do the mapping of Freebase IDs -> OpenLibrary work ids and capture in database
|
2011-12-10 14:18:22 -08:00 |
Raymond Yee
|
a349cb0adf
|
Current results post-Refine processing of Gutenberg etext_id -> Freebase IDs (via Wikipedia links)
|
2011-12-05 09:47:52 -08:00 |
Raymond Yee
|
810e8ac3e7
|
Code so far to parse Project Gutenberg catalog, extract Wikipedia links, do some Google Refine munging, and then map Freebase ids to OpenLibrary Work IDs
|
2011-12-05 09:23:17 -08:00 |