regluit

Commit Graph

Author	SHA1	Message	Date
eric	ca94c128de	online to download handling + fix bug that made everything 'online' + handle online ebooks with multiple format downloads + download ebooks with volatile links + move contenttyper to core.loaders.utils + add handling for really html ebooks	2018-04-09 16:32:52 -04:00
eric	07fd095b9a	fix bugs	2018-04-09 11:54:16 -04:00
eric	0ba2906c62	delint	2018-04-07 18:38:33 -04:00
eric	e03fa239b4	revamp doab loading - doab loading now done primarily by oai, no processing of csv. - added pyoai and updated lxml - doab ids or urls in ebook submission now handled by oai scrape - doab_load_books removed - doab_utils moved from Gluejar/DOAB - licenses now recognizes OpenEdition - new ebook type "online" will implement in UI after mobile launch; ebooks now created for html contenttype	2018-04-07 17:11:36 -04:00
eric	533eb94152	load springer improvements We've loaded about half the Springer Open books catalog, adding 20 books at a time. I wanted to load page 23 of results without having to load pages 1-22. Also added some exception handling.	2018-03-22 16:13:55 -04:00
eric	ad9523314d	fix bug in ubiquity scraper	2018-02-20 13:07:44 -05:00
eric	33f4b75417	stricter RE	2018-01-04 16:53:29 -05:00
eric	ba381add02	add smashwords	2018-01-03 15:53:02 -05:00
eric	59388933a9	one scraper per file	2018-01-03 13:58:45 -05:00
eric	e837dd6ff2	added date validation	2018-01-03 13:30:36 -05:00
eric	c8837c3c74	make check_metas case insensitive for name	2018-01-03 11:54:48 -05:00
eric	3f3428a68b	add some opengraph support	2018-01-02 18:20:34 -05:00
eric	f1213d590c	fix can_scrape	2018-01-01 19:25:00 -05:00
eric	cf093c945d	add some custom code for ubiquity press sites	2017-12-23 18:29:16 -05:00
eric	e6dbae05db	update springer	2017-12-23 18:15:59 -05:00
eric	f701f1ba36	refactor can_scrape	2017-12-23 18:12:07 -05:00
eric	d1cf6e6fb3	fix some scraping bugs	2017-12-15 19:26:50 -05:00
eric	ebf68befeb	add Springer publisher	2017-12-10 16:38:30 -05:00
eric	3c7c9ade00	add Springer to get_scraper	2017-12-07 17:36:35 -05:00
eric	d53b3bcc8d	delint	2017-12-07 17:36:08 -05:00
eric	5ccd7a0a47	add get_role to scraper	2017-12-07 17:35:52 -05:00
eric	c6885ff84b	fix springer descriptions	2017-12-07 16:35:11 -05:00
eric	81c3268f70	fix license url	2017-12-07 16:34:25 -05:00
eric	82784778c4	add springer scraper	2017-12-06 18:13:46 -05:00
eric	28fa60ffba	fix cover finding	2017-11-21 11:10:46 -05:00
eric	a09f3907b3	add pressbooks sites, improve pubdata scraper	2017-11-20 18:05:07 -05:00
eric	98cbef7104	gather isbns from schema.org and stop raising unwanted exceptions	2017-11-06 12:42:52 -05:00
eric	6487916adb	omit review metadata	2017-11-06 12:38:06 -05:00
eric	b5e52effd9	optimize id access See https://docs.djangoproject.com/en/1.11/topics/db/optimization/#use-foreign-key-values-directly	2017-10-28 18:33:58 -04:00
eric	2a7773fafa	add hathitrust scraper	2017-10-27 12:09:03 -04:00
eric	f2fb171708	fix bug	2017-09-28 14:17:12 -04:00
eric	fa4573a74d	authlist cleaner, definition lists	2017-09-28 13:25:56 -04:00
eric	467ab8a425	add scraper selector	2017-09-27 19:20:14 -04:00
eric	db03b59fb4	add code for pressbooks scraping	2017-09-27 17:54:44 -04:00
eric	1ce4323bc4	precheck every new subject fix bug with '/' in subject interpret ';' as list delimiter add cleaner script	2017-09-15 15:55:37 -04:00
eric	5bbeb45053	improve merge_works work_relations were not being updated	2017-09-04 16:10:24 -04:00
eric	6895302338	add OpenGraph type, title, and cover to scraper	2017-08-24 14:43:31 -04:00
eric	e7847ae349	remove debug code	2017-08-23 12:24:04 -04:00
eric	0c687fdad4	add command to load from sitemaps	2017-08-23 12:21:56 -04:00
eric	1bd1f943f6	fix bug in edition assignment	2017-08-18 16:39:11 -04:00
eric	ca5d9e1053	fix edition note assignment	2017-08-09 21:14:38 -04:00
eric	f9d31b0f51	fix glue resolution	2017-08-07 21:46:21 -04:00
eric	489790fa2f	add ebook loading code	2017-08-07 16:17:00 -04:00
eric	e8bd8725cc	handle edition ids better also, allow contributor to request unglue.it id	2017-08-04 17:12:05 -04:00
eric	08702a7b08	scrapes the metadata also moves id validation to core	2017-08-03 16:15:06 -04:00
eric	7bc72692c5	add exception handling	2017-07-30 13:55:46 -04:00
eric	aaef670798	add scraper for webpages gets title, description, language adds beautiful soup to requirements updates gitenberg.metadata import	2017-07-29 20:46:22 -04:00
eric	2adf3cc7cd	handle isbn and goog lookups	2017-07-27 15:13:04 -04:00
eric	7294a5c679	update doi regexp and display https://www.crossref.org/display-guidelines/	2017-02-22 11:21:24 -05:00
eric	652d9a3456	modify doab load to handle authlists also fix a few encoding issues and null data problems resulting in non-loading and ftp redirects	2016-12-02 15:50:07 -05:00


65 Commits (ca94c128deda4d32ec1850a8cc79b18743c6e382)