eric
|
4969994a87
|
urllib2 didn't handle chunked method
|
2019-06-13 16:20:05 -04:00 |
eric
|
e3a5a50f34
|
catch S3 exception
|
2019-06-13 16:18:54 -04:00 |
eric
|
703db9ed98
|
add SciELO to good providers
|
2019-06-12 17:12:54 -04:00 |
eric
|
6814380aa4
|
tweak scielo handling
and add a management command to fix the old ones
|
2019-06-12 17:02:57 -04:00 |
eric
|
d5f5656d3c
|
fix missing logger
|
2019-06-12 17:02:11 -04:00 |
eric
|
e60e8bfbf8
|
get dl url from dl link
|
2019-06-07 15:20:05 -04:00 |
eric
|
e42d77589b
|
tighten exception handling
got a bunch of integrity error failures; probably some other exception being thrown here.
|
2019-06-06 17:23:45 -04:00 |
eric
|
e5ba5caab4
|
revert search method
fulltext search returned too many results
|
2019-06-05 14:21:02 -04:00 |
eric
|
de3e6c499c
|
try to fix missing scheme
|
2019-05-05 12:50:52 -04:00 |
eric
|
14346ed868
|
delint
|
2019-03-27 21:46:25 -04:00 |
eric
|
c142533898
|
db cleaning
|
2019-03-27 21:22:56 -04:00 |
eric
|
e563da9655
|
refactor lang validation
|
2019-03-27 21:22:37 -04:00 |
eric
|
6fd33d989c
|
don't create bad works
|
2019-03-27 21:21:25 -04:00 |
eric
|
5fc6a2ee82
|
harvest more ebooks
|
2019-03-25 12:47:20 -04:00 |
eric
|
fe05ff9f88
|
don't stall on super big pdf files
|
2019-03-25 12:47:04 -04:00 |
eric
|
2396e23ae4
|
fix missing lang string
|
2019-03-25 12:46:20 -04:00 |
eric
|
174b46abd1
|
add mobied to ebf admin
|
2019-03-25 12:45:53 -04:00 |
eric
|
c190fc0bb1
|
fix undefined "stapled"
|
2019-03-08 23:45:54 -05:00 |
eric
|
9b12418ada
|
catch more pdf errors
|
2019-03-05 12:02:42 -05:00 |
eric
|
cefbc7c56f
|
bugfix
|
2019-03-05 10:12:51 -05:00 |
eric
|
d87578c5a0
|
harden stapler
|
2019-03-04 17:27:55 -05:00 |
eric
|
52b1621633
|
bugfix
|
2019-03-02 20:55:42 -05:00 |
eric
|
7c33cae82e
|
refinements
- handle dropbox urls with no params
- catch exceptions in stapler
- fix dedupe summary
|
2019-03-02 19:16:47 -05:00 |
eric
|
9bf2d85108
|
fix degruyter signifier
also propagate user_agent
|
2019-03-02 16:00:11 -05:00 |
eric
|
943031ca22
|
whoops
|
2019-03-01 22:38:46 -05:00 |
eric
|
02170c9bc2
|
management commands
1. run an update of providers
2. dedupe the online ebooks
3. should have half the onlines to harvest
|
2019-03-01 21:26:39 -05:00 |
eric
|
ac5c241e09
|
resolve doi in doab provider
- resolve the doi before setting the provider
- strip "www." from netloc
- strip url before setting provider
|
2019-03-01 21:23:54 -05:00 |
eric
|
1fdac9c548
|
remove dead code
|
2019-02-28 16:34:14 -05:00 |
eric
|
0282ed8136
|
delint
|
2019-02-28 16:22:23 -05:00 |
eric
|
72a40976bc
|
add degruyter handling
- move harvest to separate module
- add ratelimiter class
- add pdf stapler
- add a googlebot UA
- add base url storage in get_soup
|
2019-02-28 15:32:41 -05:00 |
eric
|
e162308191
|
change to a fulltext query and indices
(this is only a ~20% improvement)
|
2019-02-27 16:40:21 -05:00 |
eric
|
390f403e6c
|
missing import
|
2019-02-18 15:29:16 -05:00 |
eric
|
1a8f22411a
|
change to ku sso
|
2019-02-18 15:06:40 -05:00 |
eric
|
8652ce0b77
|
add rounds to ku
|
2019-01-18 12:03:04 -05:00 |
eric
|
c6771f2eed
|
fix limit on harvest_online
|
2018-12-10 14:30:54 -05:00 |
eric
|
260650ba92
|
handle application/binary
|
2018-12-10 14:28:39 -05:00 |
eric
|
24ab902e00
|
added ebook activation
|
2018-11-05 18:48:35 -05:00 |
eric
|
ed64dc2b3f
|
bugfix
|
2018-11-05 18:17:46 -05:00 |
eric
|
6535505e4d
|
Revert "Merge branch 'master' into master"
This reverts commit bd52df020d, reversing
changes made to e455d9a766.
|
2018-11-03 17:23:07 -04:00 |
eshellman
|
bd52df020d
|
Merge branch 'master' into master
|
2018-11-03 17:06:09 -04:00 |
eric
|
f4d7e6f888
|
working ku code
|
2018-11-03 14:47:41 -04:00 |
eric
|
f98de7114e
|
add oapn id
|
2018-11-03 14:33:23 -04:00 |
eric
|
add0375ac3
|
working scraper
|
2018-11-02 14:03:30 -04:00 |
eshellman
|
b727aaf9a9
|
Merge pull request #813 from Gluejar/kuscrape
Kuscrape
|
2018-11-02 13:58:24 -04:00 |
eric
|
57769f65a1
|
Update core/loaders/multiscrape.py
update to facilitate merge
|
2018-11-02 13:24:23 -04:00 |
eric
|
53995ffb4a
|
allow scrapers to set parser
needed to support xml harvests
|
2018-10-29 22:42:49 -04:00 |
eric
|
3697789274
|
wip
|
2018-10-09 09:05:31 -04:00 |
eric
|
272616895d
|
fix github3 issue
|
2018-09-10 12:04:12 -04:00 |
eric
|
a87cdfc8ef
|
make sure cc url is not garbage
|
2018-09-09 22:12:42 -04:00 |
eric
|
04aed3bf16
|
add opentextbc to pressbooks list
|
2018-09-09 21:55:38 -04:00 |