Raymond Yee
|
ddd60f5a34
|
Deleting lt_data.json.gz from git repo
|
2012-02-27 14:16:23 -08:00 |
Raymond Yee
|
c98258e459
|
Add g_seed_isbn.json, which holds the Gutenberg editions I'm loading.
|
2012-02-27 13:58:47 -08:00 |
Raymond Yee
|
446907109f
|
Compute similarity measures and allow filtering of Gutenberg editions by these measures
|
2012-02-27 13:58:34 -08:00 |
Raymond Yee
|
86fb15b8bc
|
Code to repick the seed isbn to find isbns that are more likely to be found in a wide variety of data sources
|
2012-02-27 13:58:17 -08:00 |
Raymond Yee
|
f7220d9812
|
Programs and data for fighting Frankenworks
|
2012-02-24 12:06:24 -08:00 |
Raymond Yee
|
1d001f33ba
|
Now I think I'm able to calculate the datetime of when the latest "frankenwork" merging is happening
|
2012-02-21 08:54:12 -08:00 |
Raymond Yee
|
e4c23500fb
|
Putting a copy of the Librarything data into the repo
|
2012-02-17 13:48:57 -08:00 |
Raymond Yee
|
2e079b2c2e
|
Now I have booktests to recalculate clusters
|
2012-02-17 10:30:09 -08:00 |
eric
|
471cb62fd2
|
changed core.tasks to not use models
|
2012-02-16 13:19:36 -05:00 |
Raymond Yee
|
a8f1c157be
|
Check current progress in so that I can focus on a change in the master branch to add missing isbns to Editions
|
2012-02-15 16:06:40 -08:00 |
Raymond Yee
|
9fb57a6b4e
|
At this point, I have logic in regluit.test.bookloader.load_gutenberg_books to read the data from regluit/experimental/gutenberg/g_seed_isbn.json and load books into the db. Still shaking out bugs from the process though.
|
2012-02-14 18:01:13 -08:00 |
Raymond Yee
|
c04aacec4a
|
Putting away my work for ry...hope it's ok
|
2012-02-13 11:28:21 -08:00 |
Raymond Yee
|
a9c91bf9c8
|
Changed the number of Gutenberg books to process
|
2012-02-11 18:01:37 -08:00 |
Raymond Yee
|
cfc3dd3549
|
Code that I'm now running in quasi-production on my laptop to compute the seed isbn. Let's see how it goes
|
2012-02-10 19:15:35 -08:00 |
Raymond Yee
|
b5c663f82f
|
basics of database structure for running through all the Gutenberg books.
Generating a report on each seed isbn calc
|
2012-02-10 10:56:08 -08:00 |
Raymond Yee
|
d3a183bc61
|
OK: I'm able to return a single candidate isbn seed now while at the same time caching the results
|
2012-02-08 14:28:46 -08:00 |
Raymond Yee
|
3bc5da4685
|
Now able to cluster isbns by language of work
|
2012-02-08 10:44:18 -08:00 |
Raymond Yee
|
d06ee6a67e
|
Progress towards calculating the seed isbn: calculating a union of Freebase + OpenLibrary ISBNs, then clustering them with thingisbn and feeding these ISBNs to Google Books
|
2012-02-07 22:52:50 -08:00 |
Raymond Yee
|
2d98cf9b0a
|
Now looking at thingisbn data and printing out more data from Google Books (publication date, publisher)
|
2012-02-03 10:08:48 -08:00 |
Raymond Yee
|
9cf875c62a
|
ol.xisbn working now. Running a test comparing OL, Freebase and Google Books on editions for Surfacing
|
2012-02-03 09:00:52 -08:00 |
Raymond Yee
|
6e5f52db4b
|
work in progress, especially openlibrary xisbn
|
2012-02-02 23:07:25 -08:00 |
Raymond Yee
|
6d6f9a2724
|
Small change to the basic hello world tests
|
2012-01-13 09:39:46 -08:00 |
Raymond Yee
|
a08944c465
|
Make sure there are creators before printing them
|
2012-01-13 09:36:40 -08:00 |
Raymond Yee
|
33b5548b41
|
Changing Zotero.items() -> Zotero.top() and put exception handling to see what does work vs what doesn't.
|
2012-01-13 09:20:54 -08:00 |
Raymond Yee
|
16d8716f87
|
Adding a "hello world" test file to test basic functionality of pyzotero
|
2012-01-13 07:37:25 -08:00 |
Raymond Yee
|
6c21074ee7
|
Added some comments to gutenberg.py
Trying to debug zotero_books.py -- pyzotero seems to be quite broken now
|
2012-01-12 17:52:10 -08:00 |
Ed Summers
|
55656e2d3d
|
now getting subjects from openlibrary instead of from googlebooks. You will need to APPLY MIGRATIONS!
|
2011-12-19 01:33:13 -05:00 |
Raymond Yee
|
4818e92ba2
|
Writing out the mapping of Gutenberg epub file to OpenLibrary workid
|
2011-12-12 10:49:33 -08:00 |
Raymond Yee
|
d1b58c89ad
|
Added bookdata.json_for_olid to pull out metadata for any given OpenLibrary ID (olid), including work, edition, author
Added map_refine_fb_links_to_openlibrary_work_ids in gutenberg.py to do the mapping of Freebase IDs -> OpenLibrary work ids and capture in database
|
2011-12-10 14:18:22 -08:00 |
eric
|
167dccf574
|
Wishlists are now filled using the Wishes intermediate table. It's named the same as the previous intermediate table, and I've edited the migration so data is not lost.
Also, I've added methods on Wishlists to add and remove Works. There
are "source" and "created" columns on the Wishes table.
|
2011-12-08 18:22:20 -05:00 |
Raymond Yee
|
a349cb0adf
|
Current results post-Refine processing of Gutenberg etext_id -> Freebase IDs (via Wikipedia links)
|
2011-12-05 09:47:52 -08:00 |
Raymond Yee
|
810e8ac3e7
|
Code so far to parse Project Gutenberg catalog, extract Wikipedia links, do some Google Refine munging, and then map Freebase ids to OpenLibrary Work IDs
|
2011-12-05 09:23:17 -08:00 |
Raymond Yee
|
e121e07e72
|
Added xisbn-like method based on Freebase data; Added a Freebase /book/book id to OpenLibrary work id mapper
|
2011-12-05 09:19:07 -08:00 |
Ed Summers
|
30e6dc38cd
|
experimental scripts to try to match metadata in oai-pmh feeds (online books page) to googlebooks
|
2011-12-04 21:45:53 -05:00 |
Raymond Yee
|
31edebe769
|
Fleshing out Freebase book data search
|
2011-11-09 09:09:58 -08:00 |
Raymond Yee
|
68b4da17d1
|
Some code for OpenLibrary, Freebase, HathiTrust to explore the nature of the data available in those sources
|
2011-11-06 07:55:07 -05:00 |
Raymond Yee
|
820107bd4d
|
Got oauth signing to work with goodreads reviews_list
|
2011-11-04 14:04:32 -07:00 |
Raymond Yee
|
29104f6347
|
Setting up an experimental folder to hold proof of concept code
|
2011-11-02 17:48:38 -07:00 |