eric
ca94c128de
online to download handling
...
+ fix bug that made everything 'online'
+ handle online ebooks with multiple format downloads
+ download ebooks with volatile links
+ move contenttyper to core.loaders.utils
+ add handling for ebooks that are really html
2018-04-09 16:32:52 -04:00
eric
07fd095b9a
fix bugs
2018-04-09 11:54:16 -04:00
eric
0ba2906c62
delint
2018-04-07 18:38:33 -04:00
eric
e03fa239b4
revamp doab loading
...
- doab loading now done primarily by oai, no processing of csv.
- added pyoai and updated lxml
- doab ids or urls in ebook submission now handled by oai scrape
- doab_load_books removed
- doab_utils moved from Gluejar/DOAB
- licenses now recognizes OpenEdition
- new ebook type "online", to be implemented in UI after mobile launch;
ebooks now created for html contenttype
2018-04-07 17:11:36 -04:00
eric
533eb94152
load springer improvements
...
We've loaded about half the Springer Open books catalog, adding 20
books at a time. I wanted to load page 23 of results without having to
load pages 1-22. Also added some exception handling.
2018-03-22 16:13:55 -04:00
eric
ad9523314d
fix bug in ubiquity scraper
2018-02-20 13:07:44 -05:00
eric
33f4b75417
stricter RE
2018-01-04 16:53:29 -05:00
eric
ba381add02
add smashwords
2018-01-03 15:53:02 -05:00
eric
59388933a9
one scraper per file
2018-01-03 13:58:45 -05:00
eric
e837dd6ff2
added date validation
2018-01-03 13:30:36 -05:00
eric
c8837c3c74
make check_metas case insensitive for name
2018-01-03 11:54:48 -05:00
eric
3f3428a68b
add some opengraph support
2018-01-02 18:20:34 -05:00
eric
f1213d590c
fix can_scrape
2018-01-01 19:25:00 -05:00
eric
cf093c945d
add some custom code for ubiquity press sites
2017-12-23 18:29:16 -05:00
eric
e6dbae05db
update springer
2017-12-23 18:15:59 -05:00
eric
f701f1ba36
refactor can_scrape
2017-12-23 18:12:07 -05:00
eric
d1cf6e6fb3
fix some scraping bugs
2017-12-15 19:26:50 -05:00
eric
ebf68befeb
add Springer publisher
2017-12-10 16:38:30 -05:00
eric
3c7c9ade00
add Springer to get_scraper
2017-12-07 17:36:35 -05:00
eric
d53b3bcc8d
delint
2017-12-07 17:36:08 -05:00
eric
5ccd7a0a47
add get_role to scraper
2017-12-07 17:35:52 -05:00
eric
c6885ff84b
fix springer descriptions
2017-12-07 16:35:11 -05:00
eric
81c3268f70
fix license url
2017-12-07 16:34:25 -05:00
eric
82784778c4
add springer scraper
2017-12-06 18:13:46 -05:00
eric
28fa60ffba
fix cover finding
2017-11-21 11:10:46 -05:00
eric
a09f3907b3
add pressbooks sites, improve pubdata scraper
2017-11-20 18:05:07 -05:00
eric
98cbef7104
gather isbns from schema.org
...
and stop raising unwanted exceptions
2017-11-06 12:42:52 -05:00
eric
6487916adb
omit review metadata
2017-11-06 12:38:06 -05:00
eric
b5e52effd9
optimize id access
...
See
https://docs.djangoproject.com/en/1.11/topics/db/optimization/#use-foreign-key-values-directly
2017-10-28 18:33:58 -04:00
eric
2a7773fafa
add hathitrust scraper
2017-10-27 12:09:03 -04:00
eric
f2fb171708
fix bug
2017-09-28 14:17:12 -04:00
eric
fa4573a74d
authlist cleaner, definition lists
2017-09-28 13:25:56 -04:00
eric
467ab8a425
add scraper selector
2017-09-27 19:20:14 -04:00
eric
db03b59fb4
add code for pressbooks scraping
2017-09-27 17:54:44 -04:00
eric
1ce4323bc4
precheck every new subject
...
fix bug with '/' in subject
interpret ';' as list delimiter
add cleaner script
2017-09-15 15:55:37 -04:00
eric
5bbeb45053
improve merge_works
...
work_relations were not being updated
2017-09-04 16:10:24 -04:00
eric
6895302338
add OpenGraph type, title, and cover to scraper
2017-08-24 14:43:31 -04:00
eric
e7847ae349
remove debug code
2017-08-23 12:24:04 -04:00
eric
0c687fdad4
add command to load from sitemaps
2017-08-23 12:21:56 -04:00
eric
1bd1f943f6
fix bug in edition assignment
2017-08-18 16:39:11 -04:00
eric
ca5d9e1053
fix edition note assignment
2017-08-09 21:14:38 -04:00
eric
f9d31b0f51
fix glue resolution
2017-08-07 21:46:21 -04:00
eric
489790fa2f
add ebook loading code
2017-08-07 16:17:00 -04:00
eric
e8bd8725cc
handle edition ids better
...
also, allow contributor to request unglue.it id
2017-08-04 17:12:05 -04:00
eric
08702a7b08
scrapes the metadata
...
also moves id validation to core
2017-08-03 16:15:06 -04:00
eric
7bc72692c5
add exception handling
2017-07-30 13:55:46 -04:00
eric
aaef670798
add scraper for webpages
...
gets title, description, language
adds beautiful soup to requirements
updates gitenberg.metadata import
2017-07-29 20:46:22 -04:00
eric
2adf3cc7cd
handle isbn and goog lookups
2017-07-27 15:13:04 -04:00
eric
7294a5c679
update doi regexp and display
...
https://www.crossref.org/display-guidelines/
2017-02-22 11:21:24 -05:00
eric
652d9a3456
modify doab load to handle authlists
...
also fix a few encoding issues and null data problems resulting in
non-loading and ftp redirects
2016-12-02 15:50:07 -05:00