Commit Graph

97 Commits (c1ecedcb58a977c8f51d1347fffcc1f88e4f763e)

Author SHA1 Message Date
eric b35aa2ce93 fix test failures for django 1.10.8 2018-07-22 13:14:27 -04:00
eric ee03d2d434 add hosts 2018-07-12 12:56:09 -04:00
eric da601a77f6 final fixes 2018-07-11 13:41:52 -04:00
eric 40794ee3f9 use rights info to set rights 2018-07-10 13:58:38 -04:00
eric ec3d26118e fr/en 2018-07-10 13:58:06 -04:00
eric 2f532b97f9 scrape multiple books from one url 2018-07-09 15:46:36 -04:00
eric 2f9dda8432 less aggressive merging in doab 2018-06-18 17:04:40 -04:00
eric 3bc7d5c003 fix loader tests 2018-06-18 17:03:41 -04:00
eric 7593944dc0 reset default to 15 days 2018-06-15 15:30:04 -04:00
eric bade8e7f4d handle records without downloads 2018-06-15 10:34:23 -04:00
eric 05fae60ddb delint 2018-05-11 11:46:04 -04:00
eric db9b6e5221 harvest_online_ebooks should count books actually harvested 2018-05-10 16:17:16 -04:00
eric 6585bdd52a provide fallback for hathi scraper
It turns out http://hdl.handle.net/2027/ is used for all of umich, not
just hathitrust
2018-04-27 10:54:41 -04:00
eric a0dc106f6d fix issue with merged works coming back from related editions 2018-04-26 14:57:55 -04:00
eric 8d5da39e5f make populate edition synchronous for doab 2018-04-25 11:21:02 -04:00
eric fa82411921 don't load chapters 2018-04-23 15:41:42 -04:00
eric 6bca7f0983 bugs 2018-04-18 21:39:40 -04:00
eric bbd421d1f2 fix various bugs 2018-04-18 17:53:21 -04:00
eric c9e7d5d5ac avoid errors with using string methods on content_type 2018-04-18 14:56:26 -04:00
eric 3590c1a59f default load_doab to last 45 days 2018-04-18 14:53:42 -04:00
eric 1d6af73cf2 handle isbns separated by '/' 2018-04-18 11:29:57 -04:00
eric 78d66a247e don't fail if null edition 2018-04-17 14:21:21 -04:00
eric 447ed4b2d5 fix cover loading 2018-04-17 14:20:44 -04:00
eric 8dd1fb1822 remove doab author loader
now uses oai functionality
2018-04-16 13:44:10 -04:00
eric b849f3a6ef finish mapping languages 2018-04-16 12:32:21 -04:00
eric a6039e4015 better handling of language codes 2018-04-13 14:39:03 -04:00
eric e433c13108 fix online_to_download bugs 2018-04-13 14:38:39 -04:00
eric 9a6b1efd0d fix bugs for records with missing fields 2018-04-13 14:37:50 -04:00
eric ba7b02b939 add alternate url pattern for doab_id 2018-04-12 15:09:07 -04:00
eric bf7a9d8106 patch for missing language 2018-04-12 15:08:29 -04:00
eric 748b0eaa63 add test 2018-04-09 17:26:04 -04:00
eric c26e365a64 fixed imports 2018-04-09 16:58:58 -04:00
eric ca94c128de online to download handling
+ fix bug that made everything 'online'
+ handle online ebooks with multiple format downloads
+ download ebooks with volatile links
+ move contenttyper to core.loaders.utils
+ add handling for really html ebooks
2018-04-09 16:32:52 -04:00
eric 07fd095b9a fix bugs 2018-04-09 11:54:16 -04:00
eric 0ba2906c62 delint 2018-04-07 18:38:33 -04:00
eric e03fa239b4 revamp doab loading
- doab loading now done primarily by oai, no processing of csv.
- added pyoai and updated lxml
- doab ids or urls in ebook submission now handled by oai scrape
- doab_load_books removed
- doab_utils moved from Gluejar/DOAB
- licenses now recognizes OpenEdition
- new ebook type "online" will implement in UI after mobile launch;
ebooks now created for html contenttype
2018-04-07 17:11:36 -04:00
eric 533eb94152 load springer improvements
We've loaded about half the Springer Open books catalog, adding 20
books at a time. I wanted to load page 23 of results without having to
load pages 1-22. Also added some exception handling.
2018-03-22 16:13:55 -04:00
eric ad9523314d fix bug in ubiquity scraper 2018-02-20 13:07:44 -05:00
eric 33f4b75417 stricter RE 2018-01-04 16:53:29 -05:00
eric ba381add02 add smashwords 2018-01-03 15:53:02 -05:00
eric 59388933a9 one scraper per file 2018-01-03 13:58:45 -05:00
eric e837dd6ff2 added date validation 2018-01-03 13:30:36 -05:00
eric c8837c3c74 make check_metas case insensitive for name 2018-01-03 11:54:48 -05:00
eric 3f3428a68b add some opengraph support 2018-01-02 18:20:34 -05:00
eric f1213d590c fix can_scrape 2018-01-01 19:25:00 -05:00
eric cf093c945d add some custom code for ubiquity press sites 2017-12-23 18:29:16 -05:00
eric e6dbae05db update springer 2017-12-23 18:15:59 -05:00
eric f701f1ba36 refactor can_scrape 2017-12-23 18:12:07 -05:00
eric d1cf6e6fb3 fix some scraping bugs 2017-12-15 19:26:50 -05:00
eric ebf68befeb add Springer publisher 2017-12-10 16:38:30 -05:00
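
Note on commit 533eb94152 (load springer improvements): the message describes loading the catalog 20 books at a time and wanting to start at page 23 without first loading pages 1-22. The sketch below illustrates that idea only. It is not the repository's code; `PAGE_SIZE`, `page_bounds`, and `pages_to_load` are hypothetical names, and the assumption that result pages are numbered from 1 is inferred from the message rather than confirmed by the source.

```python
# Hypothetical sketch: start loading paged results from an arbitrary page.
# The page size of 20 comes from the commit message; everything else is assumed.
PAGE_SIZE = 20


def page_bounds(page, page_size=PAGE_SIZE):
    """Return the (first, last) 1-based result positions covered by a page."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    first = (page - 1) * page_size + 1
    return first, first + page_size - 1


def pages_to_load(start_page, num_pages):
    """Return the page numbers to fetch, beginning at start_page."""
    if start_page < 1 or num_pages < 0:
        raise ValueError("start_page must be >= 1 and num_pages >= 0")
    return list(range(start_page, start_page + num_pages))


if __name__ == "__main__":
    # Jump straight to page 23 and load two pages from there.
    for page in pages_to_load(23, 2):
        print(page, page_bounds(page))
```

Passing the start page as a parameter keeps the loader resumable: a run that stops partway can pick up at the next unloaded page instead of repeating earlier ones.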