eric
b35aa2ce93
fix test failures for django 1.10.8
2018-07-22 13:14:27 -04:00
eric
ee03d2d434
add hosts
2018-07-12 12:56:09 -04:00
eric
da601a77f6
final fixes
2018-07-11 13:41:52 -04:00
eric
40794ee3f9
use rights info to set rights
2018-07-10 13:58:38 -04:00
eric
ec3d26118e
fr/en
2018-07-10 13:58:06 -04:00
eric
2f532b97f9
scrape multiple books from one url
2018-07-09 15:46:36 -04:00
eric
2f9dda8432
less aggressive merging in doab
2018-06-18 17:04:40 -04:00
eric
3bc7d5c003
fix loader tests
2018-06-18 17:03:41 -04:00
eric
7593944dc0
reset default to 15 days
2018-06-15 15:30:04 -04:00
eric
bade8e7f4d
handle records without downloads
2018-06-15 10:34:23 -04:00
eric
05fae60ddb
delint
2018-05-11 11:46:04 -04:00
eric
db9b6e5221
harvest_online_ebooks should count books actually harvested
2018-05-10 16:17:16 -04:00
eric
6585bdd52a
provide fallback for hathi scraper
...
It turns out http://hdl.handle.net/2027/ is used for all of umich, not
just hathitrust
2018-04-27 10:54:41 -04:00
eric
a0dc106f6d
fix issue with merged works coming back from related editions
2018-04-26 14:57:55 -04:00
eric
8d5da39e5f
make populate edition synchronous for doab
2018-04-25 11:21:02 -04:00
eric
fa82411921
don't load chapters
2018-04-23 15:41:42 -04:00
eric
6bca7f0983
bugs
2018-04-18 21:39:40 -04:00
eric
bbd421d1f2
fix various bugs
2018-04-18 17:53:21 -04:00
eric
c9e7d5d5ac
avoid errors with using string methods on content_type
2018-04-18 14:56:26 -04:00
eric
3590c1a59f
default load_doab to last 45 days
2018-04-18 14:53:42 -04:00
eric
1d6af73cf2
handle isbns separated by '/'
2018-04-18 11:29:57 -04:00
eric
78d66a247e
don't fail if null edition
2018-04-17 14:21:21 -04:00
eric
447ed4b2d5
fix cover loading
2018-04-17 14:20:44 -04:00
eric
8dd1fb1822
remove doab author loader
...
now uses oai functionality
2018-04-16 13:44:10 -04:00
eric
b849f3a6ef
finish mapping languages
2018-04-16 12:32:21 -04:00
eric
a6039e4015
better handling of language codes
2018-04-13 14:39:03 -04:00
eric
e433c13108
fix online_to_download bugs
2018-04-13 14:38:39 -04:00
eric
9a6b1efd0d
fix bugs for records with missing fields
2018-04-13 14:37:50 -04:00
eric
ba7b02b939
add alternate url pattern for doab_id
2018-04-12 15:09:07 -04:00
eric
bf7a9d8106
patch for missing language
2018-04-12 15:08:29 -04:00
eric
748b0eaa63
add test
2018-04-09 17:26:04 -04:00
eric
c26e365a64
fixed imports
2018-04-09 16:58:58 -04:00
eric
ca94c128de
online to download handling
...
+ fix bug that made everything 'online'
+ handle online ebooks with multiple format downloads
+ download ebooks with volatile links
+ move contenttyper to core.loaders.utils
+ add handling for really html ebooks
2018-04-09 16:32:52 -04:00
eric
07fd095b9a
fix bugs
2018-04-09 11:54:16 -04:00
eric
0ba2906c62
delint
2018-04-07 18:38:33 -04:00
eric
e03fa239b4
revamp doab loading
...
- doab loading now done primarily by oai, no processing of csv.
- added pyoai and updated lxml
- doab ids or urls in ebook submission now handled by oai scrape
- doab_load_books removed
- doab_utils moved from Gluejar/DOAB
- licenses now recognizes OpenEdition
- new ebook type "online" (to be implemented in UI after mobile launch);
ebooks now created for html contenttype
2018-04-07 17:11:36 -04:00
eric
533eb94152
load springer improvements
...
We've loaded about half the Springer Open books catalog, adding 20
books at a time. I wanted to load page 23 of results without having to
load pages 1-22. Also added some exception handling.
2018-03-22 16:13:55 -04:00
eric
ad9523314d
fix bug in ubiquity scraper
2018-02-20 13:07:44 -05:00
eric
33f4b75417
stricter RE
2018-01-04 16:53:29 -05:00
eric
ba381add02
add smashwords
2018-01-03 15:53:02 -05:00
eric
59388933a9
one scraper per file
2018-01-03 13:58:45 -05:00
eric
e837dd6ff2
added date validation
2018-01-03 13:30:36 -05:00
eric
c8837c3c74
make check_metas case insensitive for name
2018-01-03 11:54:48 -05:00
eric
3f3428a68b
add some opengraph support
2018-01-02 18:20:34 -05:00
eric
f1213d590c
fix can_scrape
2018-01-01 19:25:00 -05:00
eric
cf093c945d
add some custom code for ubiquity press sites
2017-12-23 18:29:16 -05:00
eric
e6dbae05db
update springer
2017-12-23 18:15:59 -05:00
eric
f701f1ba36
refactor can_scrape
2017-12-23 18:12:07 -05:00
eric
d1cf6e6fb3
fix some scraping bugs
2017-12-15 19:26:50 -05:00
eric
ebf68befeb
add Springer publisher
2017-12-10 16:38:30 -05:00