eric
|
70581b26e7
|
2020 -> 2030
Fix test broken by passage of time
|
2020-01-01 10:13:46 -05:00 |
eric
|
07e4d9b937
|
weird pubdate instability
|
2019-11-30 22:50:11 -05:00 |
eric
|
e650596a64
|
pass tests
|
2019-11-30 18:00:54 -05:00 |
eric
|
c69de41628
|
refactor add_subject
|
2019-11-30 18:00:32 -05:00 |
eric
|
4ec9b73a1a
|
fix subject auth loading
|
2019-11-30 15:29:59 -05:00 |
eric
|
21810d5641
|
handle redirected ebook links
|
2019-11-06 12:40:35 -05:00 |
eric
|
49929763ce
|
seems import order mattered
|
2019-11-05 15:45:32 -05:00 |
eric
|
0bc92ea98d
|
add routledge md scraper
|
2019-11-05 15:37:55 -05:00 |
eric
|
e2571abc42
|
fix bisac loading, add headings
|
2019-11-05 15:34:12 -05:00 |
eric
|
ac599f5d69
|
fix strip
|
2019-07-01 17:21:22 -04:00 |
eric
|
917d90aee1
|
delint
|
2019-07-01 16:32:41 -04:00 |
eric
|
1c5c48ac42
|
Update cc.py
|
2019-07-01 16:25:18 -04:00 |
eric
|
0d748b2498
|
don't get fooled by version strings on CC
|
2019-07-01 16:21:22 -04:00 |
eric
|
4969994a87
|
urllib2 didn't handle chunked method
|
2019-06-13 16:20:05 -04:00 |
eric
|
e3a5a50f34
|
catch S3 exception
|
2019-06-13 16:18:54 -04:00 |
eric
|
703db9ed98
|
add SciELO to good providers
|
2019-06-12 17:12:54 -04:00 |
eric
|
6814380aa4
|
tweak scielo handling
and add a management command to fix the old ones
|
2019-06-12 17:02:57 -04:00 |
eric
|
d5f5656d3c
|
fix missing logger
|
2019-06-12 17:02:11 -04:00 |
eric
|
e60e8bfbf8
|
get dl url from dl link
|
2019-06-07 15:20:05 -04:00 |
eric
|
e42d77589b
|
tighten exception handling
got a bunch of integrity error failures; probably some other exception being thrown here.
|
2019-06-06 17:23:45 -04:00 |
eric
|
e5ba5caab4
|
revert search method
fulltext search returned too many results
|
2019-06-05 14:21:02 -04:00 |
eric
|
de3e6c499c
|
try to fix missing scheme
|
2019-05-05 12:50:52 -04:00 |
eric
|
14346ed868
|
delint
|
2019-03-27 21:46:25 -04:00 |
eric
|
c142533898
|
db cleaning
|
2019-03-27 21:22:56 -04:00 |
eric
|
e563da9655
|
refactor lang validation
|
2019-03-27 21:22:37 -04:00 |
eric
|
6fd33d989c
|
don't create bad works
|
2019-03-27 21:21:25 -04:00 |
eric
|
5fc6a2ee82
|
harvest more ebooks
|
2019-03-25 12:47:20 -04:00 |
eric
|
fe05ff9f88
|
don't stall on super big pdf files
|
2019-03-25 12:47:04 -04:00 |
eric
|
2396e23ae4
|
fix missing lang string
|
2019-03-25 12:46:20 -04:00 |
eric
|
174b46abd1
|
add mobied to ebf admin
|
2019-03-25 12:45:53 -04:00 |
eric
|
c190fc0bb1
|
fix undefined "stapled"
|
2019-03-08 23:45:54 -05:00 |
eric
|
9b12418ada
|
catch more pdf errors
|
2019-03-05 12:02:42 -05:00 |
eric
|
cefbc7c56f
|
bugfix
|
2019-03-05 10:12:51 -05:00 |
eric
|
d87578c5a0
|
harden stapler
|
2019-03-04 17:27:55 -05:00 |
eric
|
52b1621633
|
bugfix
|
2019-03-02 20:55:42 -05:00 |
eric
|
7c33cae82e
|
refinements
- handle dropbox urls with no params
- catch exceptions in stapler
- fix dedupe summary
|
2019-03-02 19:16:47 -05:00 |
eric
|
9bf2d85108
|
fix degruyter signifier
also propagate user_agent
|
2019-03-02 16:00:11 -05:00 |
eric
|
943031ca22
|
whoops
|
2019-03-01 22:38:46 -05:00 |
eric
|
02170c9bc2
|
management commands
1. run an update of providers
2. dedupe the online ebooks
3. should have half the onlines to harvest
|
2019-03-01 21:26:39 -05:00 |
eric
|
ac5c241e09
|
resolve doi in doab provider
- resolve the doi before setting the provider
- strip "www." from netloc
- strip url before setting provider
|
2019-03-01 21:23:54 -05:00 |
eric
|
1fdac9c548
|
remove dead code
|
2019-02-28 16:34:14 -05:00 |
eric
|
0282ed8136
|
delint
|
2019-02-28 16:22:23 -05:00 |
eric
|
72a40976bc
|
add degruyter handling
- move harvest to separate module
- add ratelimiter class
- add pdf stapler
- add a googlebot UA
- add base url storage in get_soup
|
2019-02-28 15:32:41 -05:00 |
eric
|
e162308191
|
change to a fulltext query and indices
(this is only a ~20% improvement)
|
2019-02-27 16:40:21 -05:00 |
eric
|
390f403e6c
|
missing import
|
2019-02-18 15:29:16 -05:00 |
eric
|
1a8f22411a
|
change to ku sso
|
2019-02-18 15:06:40 -05:00 |
eric
|
8652ce0b77
|
add rounds to ku
|
2019-01-18 12:03:04 -05:00 |
eric
|
c6771f2eed
|
fix limit on harvest_online
|
2018-12-10 14:30:54 -05:00 |
eric
|
260650ba92
|
handle application/binary
|
2018-12-10 14:28:39 -05:00 |
eric
|
24ab902e00
|
added ebook activation
|
2018-11-05 18:48:35 -05:00 |