75 Commits (447ed4b2d5f1e30f804c9bdd189a2b50ab0ed3cc)

Author SHA1 Message Date
eric 447ed4b2d5 fix cover loading 2018-04-17 14:20:44 -04:00
eric 8dd1fb1822 remove doab author loader
now uses oai functionality
2018-04-16 13:44:10 -04:00
eric b849f3a6ef finish mapping languages 2018-04-16 12:32:21 -04:00
eric a6039e4015 better handling of language codes 2018-04-13 14:39:03 -04:00
eric e433c13108 fix online_to_download bugs 2018-04-13 14:38:39 -04:00
eric 9a6b1efd0d fix bugs for records with missing fields 2018-04-13 14:37:50 -04:00
eric ba7b02b939 add alternate url pattern for doab_id 2018-04-12 15:09:07 -04:00
eric bf7a9d8106 patch for missing language 2018-04-12 15:08:29 -04:00
eric 748b0eaa63 add test 2018-04-09 17:26:04 -04:00
eric c26e365a64 fixed imports 2018-04-09 16:58:58 -04:00
eric ca94c128de online to download handling
+ fix bug that made everything 'online'
+ handle online ebooks with multiple format downloads
+ download ebooks with volatile links
+ move contenttyper to core.loaders.utils
+ add handling for ebooks that are really html
2018-04-09 16:32:52 -04:00
eric 07fd095b9a fix bugs 2018-04-09 11:54:16 -04:00
eric 0ba2906c62 delint 2018-04-07 18:38:33 -04:00
eric e03fa239b4 revamp doab loading
- doab loading now done primarily by oai, no processing of csv.
- added pyoai and updated lxml
- doab ids or urls in ebook submission now handled by oai scrape
- doab_load_books removed
- doab_utils moved from Gluejar/DOAB
- licenses now recognizes OpenEdition
- new ebook type "online" will implement in UI after mobile launch;
ebooks now created for html contenttype
2018-04-07 17:11:36 -04:00
eric 533eb94152 load springer improvements
We've loaded about half the Springer Open books catalog, adding 20
books at a time. I wanted to load page 23 of results without having to
load pages 1-22. Also added some exception handling.
2018-03-22 16:13:55 -04:00
eric ad9523314d fix bug in ubiquity scraper 2018-02-20 13:07:44 -05:00
eric 33f4b75417 stricter RE 2018-01-04 16:53:29 -05:00
eric ba381add02 add smashwords 2018-01-03 15:53:02 -05:00
eric 59388933a9 one scraper per file 2018-01-03 13:58:45 -05:00
eric e837dd6ff2 added date validation 2018-01-03 13:30:36 -05:00
eric c8837c3c74 make check_metas case insensitive for name 2018-01-03 11:54:48 -05:00
eric 3f3428a68b add some opengraph support 2018-01-02 18:20:34 -05:00
eric f1213d590c fix can_scrape 2018-01-01 19:25:00 -05:00
eric cf093c945d add some custom code for ubiquity press sites 2017-12-23 18:29:16 -05:00
eric e6dbae05db update springer 2017-12-23 18:15:59 -05:00
eric f701f1ba36 refactor can_scrape 2017-12-23 18:12:07 -05:00
eric d1cf6e6fb3 fix some scraping bugs 2017-12-15 19:26:50 -05:00
eric ebf68befeb add Springer publisher 2017-12-10 16:38:30 -05:00
eric 3c7c9ade00 add Springer to get_scraper 2017-12-07 17:36:35 -05:00
eric d53b3bcc8d delint 2017-12-07 17:36:08 -05:00
eric 5ccd7a0a47 add get_role to scraper 2017-12-07 17:35:52 -05:00
eric c6885ff84b fix springer descriptions 2017-12-07 16:35:11 -05:00
eric 81c3268f70 fix license url 2017-12-07 16:34:25 -05:00
eric 82784778c4 add springer scraper 2017-12-06 18:13:46 -05:00
eric 28fa60ffba fix cover finding 2017-11-21 11:10:46 -05:00
eric a09f3907b3 add pressbooks sites, improve pubdata scraper 2017-11-20 18:05:07 -05:00
eric 98cbef7104 gather isbns from schema.org
and stop raising unwanted exceptions
2017-11-06 12:42:52 -05:00
eric 6487916adb omit review metadata 2017-11-06 12:38:06 -05:00
eric b5e52effd9 optimize id access
See
https://docs.djangoproject.com/en/1.11/topics/db/optimization/#use-foreign-key-values-directly
2017-10-28 18:33:58 -04:00
eric 2a7773fafa add hathitrust scraper 2017-10-27 12:09:03 -04:00
eric f2fb171708 fix bug 2017-09-28 14:17:12 -04:00
eric fa4573a74d authlist cleaner, definition lists 2017-09-28 13:25:56 -04:00
eric 467ab8a425 add scraper selector 2017-09-27 19:20:14 -04:00
eric db03b59fb4 add code for pressbooks scraping 2017-09-27 17:54:44 -04:00
eric 1ce4323bc4 precheck every new subject
fix bug with '/' in subject
interpret ';' as list delimiter
add cleaner script
2017-09-15 15:55:37 -04:00
eric 5bbeb45053 improve merge_works
work_relations were not being updated
2017-09-04 16:10:24 -04:00
eric 6895302338 add OpenGraph type, title, and cover to scraper 2017-08-24 14:43:31 -04:00
eric e7847ae349 remove debug code 2017-08-23 12:24:04 -04:00
eric 0c687fdad4 add command to load from sitemaps 2017-08-23 12:21:56 -04:00
eric 1bd1f943f6 fix bug in edition assignment 2017-08-18 16:39:11 -04:00