Commit Graph

29 Commits (f476295b7be89ddcd2b5631707da39918a542c1f)

Author SHA1 Message Date
eric 6895302338 add OpenGraph type, title, and cover to scraper 2017-08-24 14:43:31 -04:00
eric e7847ae349 remove debug code 2017-08-23 12:24:04 -04:00
eric 0c687fdad4 add command to load from sitemaps 2017-08-23 12:21:56 -04:00
eric 1bd1f943f6 fix bug in edition assignment 2017-08-18 16:39:11 -04:00
eric ca5d9e1053 fix edition note assignment 2017-08-09 21:14:38 -04:00
eric f9d31b0f51 fix glue resolution 2017-08-07 21:46:21 -04:00
eric 489790fa2f add ebook loading code 2017-08-07 16:17:00 -04:00
eric e8bd8725cc handle edition ids better 2017-08-04 17:12:05 -04:00
    also, allow contributor to request unglue.it id
eric 08702a7b08 scrapes the metadata 2017-08-03 16:15:06 -04:00
    also moves id validation to core
eric 7bc72692c5 add exception handling 2017-07-30 13:55:46 -04:00
eric aaef670798 add scraper for webpages 2017-07-29 20:46:22 -04:00
    gets title, description, language

    adds beautiful soup to requirements
    updates gitenberg.metadata import
eric 2adf3cc7cd handle isbn and goog lookups 2017-07-27 15:13:04 -04:00
eric 7294a5c679 update doi regexp and display 2017-02-22 11:21:24 -05:00
    https://www.crossref.org/display-guidelines/
eric 652d9a3456 modify doab load to handle authlists 2016-12-02 15:50:07 -05:00
    also fix a few encoding issues and null data problems resulting in non-loading and ftp redirects
eric 1c52c42e60 doab author parsing and loader command 2016-11-29 15:37:02 -05:00
eric 671017fced pass edition to update_cover_doab 2016-11-18 13:28:59 -05:00
eric 60e4994756 remove debugging prints 2016-11-01 13:42:40 -04:00
eric b82b51f358 forgot to re-enable error catching 2016-10-31 22:19:00 -04:00
eric 1c7df5e00e get rid of some loader issues 2016-10-28 14:40:16 -04:00
eric 39cf8c9c0b fix load errors 2016-10-27 20:05:43 -04:00
eric 182887fdc2 remove async option 2016-10-12 16:19:43 -04:00
eric 167c7fc60a update doab loader 2016-10-12 16:07:54 -04:00
eric 3ac7769656 move doab.py into loaders 2016-10-11 15:46:03 -04:00
eric c3057b6aef add code to deal with OBP, fix bugs. 2016-06-10 17:57:53 -04:00
Raymond Yee 9364bd7a78 a bit more cleanup 2016-05-24 16:21:36 -07:00
Raymond Yee c39324831e get authors to match now that the utf8_general_ci collation we're using for authors is taken into account. 2016-05-24 15:01:56 -07:00
Raymond Yee 7f9478e758 first pass at tests -- some cleanup needed 2016-05-23 17:03:55 -07:00
Raymond Yee 83756c5779 code in progress to test https://github.com/Gluejar/regluit/pull/584 2016-05-21 14:51:52 -07:00
eric cb3581e932 code for loading umich spreadsheet 2016-05-19 09:17:23 -04:00