eric
447ed4b2d5
fix cover loading
2018-04-17 14:20:44 -04:00
eric
8dd1fb1822
remove doab author loader
...
now uses oai functionality
2018-04-16 13:44:10 -04:00
eric
b849f3a6ef
finish mapping languages
2018-04-16 12:32:21 -04:00
eric
a6039e4015
better handling of language codes
2018-04-13 14:39:03 -04:00
eric
e433c13108
fix online_to_download bugs
2018-04-13 14:38:39 -04:00
eric
9a6b1efd0d
fix bugs for records with missing fields
2018-04-13 14:37:50 -04:00
eric
ba7b02b939
add alternate url pattern for doab_id
2018-04-12 15:09:07 -04:00
eric
bf7a9d8106
patch for missing language
2018-04-12 15:08:29 -04:00
eric
748b0eaa63
add test
2018-04-09 17:26:04 -04:00
eric
c26e365a64
fixed imports
2018-04-09 16:58:58 -04:00
eric
ca94c128de
online to download handling
...
+ fix bug that made everything 'online'
+ handle online ebooks with multiple format downloads
+ download ebooks with volatile links
+ move contenttyper to core.loaders.utils
+ add handling for really html ebooks
2018-04-09 16:32:52 -04:00
eric
07fd095b9a
fix bugs
2018-04-09 11:54:16 -04:00
eric
0ba2906c62
delint
2018-04-07 18:38:33 -04:00
eric
e03fa239b4
revamp doab loading
...
- doab loading now done primarily by oai, no processing of csv.
- added pyoai and updated lxml
- doab ids or urls in ebook submission now handled by oai scrape
- doab_load_books removed
- doab_utils moved from Gluejar/DOAB
- licenses now recognizes OpenEdition
- new ebook type "online" will implement in UI after mobile launch;
ebooks now created for html contenttype
2018-04-07 17:11:36 -04:00
eric
533eb94152
load springer improvements
...
We've loaded about half the Springer Open books catalog, adding 20
books at a time. I wanted to load page 23 of results without having to
load pages 1-22. Also added some exception handling.
2018-03-22 16:13:55 -04:00
eric
ad9523314d
fix bug in ubiquity scraper
2018-02-20 13:07:44 -05:00
eric
33f4b75417
stricter RE
2018-01-04 16:53:29 -05:00
eric
ba381add02
add smashwords
2018-01-03 15:53:02 -05:00
eric
59388933a9
one scraper per file
2018-01-03 13:58:45 -05:00
eric
e837dd6ff2
added date validation
2018-01-03 13:30:36 -05:00
eric
c8837c3c74
make check_metas case insensitive for name
2018-01-03 11:54:48 -05:00
eric
3f3428a68b
add some opengraph support
2018-01-02 18:20:34 -05:00
eric
f1213d590c
fix can_scrape
2018-01-01 19:25:00 -05:00
eric
cf093c945d
add some custom code for ubiquity press sites
2017-12-23 18:29:16 -05:00
eric
e6dbae05db
update springer
2017-12-23 18:15:59 -05:00
eric
f701f1ba36
refactor can_scrape
2017-12-23 18:12:07 -05:00
eric
d1cf6e6fb3
fix some scraping bugs
2017-12-15 19:26:50 -05:00
eric
ebf68befeb
add Springer publisher
2017-12-10 16:38:30 -05:00
eric
3c7c9ade00
add Springer to get_scraper
2017-12-07 17:36:35 -05:00
eric
d53b3bcc8d
delint
2017-12-07 17:36:08 -05:00
eric
5ccd7a0a47
add get_role to scraper
2017-12-07 17:35:52 -05:00
eric
c6885ff84b
fix springer descriptions
2017-12-07 16:35:11 -05:00
eric
81c3268f70
fix license url
2017-12-07 16:34:25 -05:00
eric
82784778c4
add springer scraper
2017-12-06 18:13:46 -05:00
eric
28fa60ffba
fix cover finding
2017-11-21 11:10:46 -05:00
eric
a09f3907b3
add pressbooks sites, improve pubdata scraper
2017-11-20 18:05:07 -05:00
eric
98cbef7104
gather isbns from schema.org
...
and stop raising unwanted exceptions
2017-11-06 12:42:52 -05:00
eric
6487916adb
omit review metadata
2017-11-06 12:38:06 -05:00
eric
b5e52effd9
optimize id access
...
See
https://docs.djangoproject.com/en/1.11/topics/db/optimization/#use-foreign-key-values-directly
2017-10-28 18:33:58 -04:00
eric
2a7773fafa
add hathitrust scraper
2017-10-27 12:09:03 -04:00
eric
f2fb171708
fix bug
2017-09-28 14:17:12 -04:00
eric
fa4573a74d
authlist cleaner, definition lists
2017-09-28 13:25:56 -04:00
eric
467ab8a425
add scraper selector
2017-09-27 19:20:14 -04:00
eric
db03b59fb4
add code for pressbooks scraping
2017-09-27 17:54:44 -04:00
eric
1ce4323bc4
precheck every new subject
...
fix bug with '/' in subject
interpret ';' as list delimiter
add cleaner script
2017-09-15 15:55:37 -04:00
eric
5bbeb45053
improve merge_works
...
work_relations were not being updated
2017-09-04 16:10:24 -04:00
eric
6895302338
add OpenGraph type, title, and cover to scraper
2017-08-24 14:43:31 -04:00
eric
e7847ae349
remove debug code
2017-08-23 12:24:04 -04:00
eric
0c687fdad4
add command to load from sitemaps
2017-08-23 12:21:56 -04:00
eric
1bd1f943f6
fix bug in edition assignment
2017-08-18 16:39:11 -04:00