Commit Graph

152 Commits (4f7f3c566c478487f4713d212ec97a9184753fc4)

Author SHA1 Message Date
eric e03fa239b4 revamp doab loading
- doab loading now done primarily by oai, no processing of csv.
- added pyoai and updated lxml
- doab ids or urls in ebook submission now handled by oai scrape
- doab_load_books removed
- doab_utils moved from Gluejar/DOAB
- licenses now recognizes OpenEdition
- new ebook type "online" will implement in UI after mobile launch;
ebooks now created for html contenttype
2018-04-07 17:11:36 -04:00
eric 4499b556c6 protect long descriptions
scraper was over-writing edited descriptions
2017-12-11 13:45:47 -05:00
eric a3f1509cc2 fix multiple editor setting 2017-12-07 17:33:29 -05:00
eric 6bba688f03 fix kw loading 2017-12-07 16:33:53 -05:00
eric 5c3137a85d delint 2017-12-07 12:50:08 -05:00
eric 82784778c4 add springer scraper 2017-12-06 18:13:46 -05:00
eric af4cac5cf8 http should be a work id 2017-11-21 15:47:02 -05:00
eric d04ebbb694 also add http ids 2017-10-30 19:52:21 -04:00
eric b5e52effd9 optimize id access
See
https://docs.djangoproject.com/en/1.11/topics/db/optimization/#use-foreign-key-values-directly
2017-10-28 18:33:58 -04:00
eric efbffa683c Open up editing privileges
keep track of who has added the work with a many-to-many table
2017-10-26 13:03:05 -04:00
eric 86e38d08bb improve namelist parsing 2017-10-06 16:04:59 -04:00
eric 467ab8a425 add scraper selector 2017-09-27 19:20:14 -04:00
eric 326dc6442f tg for tests 2017-09-15 16:50:31 -04:00
eric 1ce4323bc4 precheck every new subject
fix bug with '/' in subject
interpret ';' as list delimiter
add cleaner script
2017-09-15 15:55:37 -04:00
eric 5bbeb45053 improve merge_works
work_relations were not being updated
2017-09-04 16:10:24 -04:00
eric e2e1eac41e merge works when appropriate
pandata bookloader was not merging works
2017-08-24 14:42:35 -04:00
eric 0c687fdad4 add command to load from sitemaps 2017-08-23 12:21:56 -04:00
eric 2a8dff4336 loader shouldn't always believe metadata 2017-08-15 16:51:35 -04:00
eric ca5d9e1053 fix edition note aignment 2017-08-09 21:14:38 -04:00
eric 22e2b8587e fix edition assignment, add doi 2017-08-08 14:06:29 -04:00
eric 8de43cfda8 set user on ebooks loaded from webpage 2017-08-08 12:38:54 -04:00
eric 0ebbb21d47 add source to EbookFile
Want to be able to avoid downloading duplicate ebooks
2017-08-08 10:02:25 -04:00
eric 489790fa2f add ebook loading code 2017-08-07 16:17:00 -04:00
eric e8bd8725cc handle edition ids better
also, allow contributor to request unglue.it id
2017-08-04 17:12:05 -04:00
eric ada73a909c nits and tests 2017-08-03 17:09:42 -04:00
eric 08702a7b08 scrapes the metadata
also moves id validation to core
2017-08-03 16:15:06 -04:00
eric aaef670798 add scraper for webpages
gets title, description, language

adds beautiful soup to requirements
updates gitenberg.metadata import
2017-07-29 20:46:22 -04:00
eric 05af45d13e delint 2017-07-28 12:45:17 -04:00
eric 2adf3cc7cd handle isbn and goog lookups 2017-07-27 15:13:04 -04:00
eric db97a98ae8 https 2017-07-27 10:33:13 -04:00
eric 4cac608362 forgot to move reference 2017-03-16 11:50:10 -04:00
eric 31b6187a5c fix #120246845
2 isbns map to one google id
2016-12-30 10:24:01 -05:00
eric 1c7df5e00e get rid of some loader issues 2016-10-28 14:40:16 -04:00
eric 39cf8c9c0b fix load errors 2016-10-27 20:05:43 -04:00
eric d4f47b2a5e handle age_level in merge_works 2016-10-12 13:55:27 -04:00
eric 5fc4d631ff split version into label and iter 2016-09-23 14:53:58 -04:00
eric 5eabbbb4d2 implement versions in api 2016-08-24 15:43:28 -04:00
eric aafbd7c70b set translation relation in add_related 2016-08-16 11:42:58 -04:00
eric f3cb6c9edf switch to contrib_comments
removed in 1.8
2016-07-21 16:05:57 -04:00
eric f110e02297 match licenses
noted that rights for gitenberg ebooks was not getting set properly
2016-07-14 19:02:22 -04:00
Raymond Yee 15888b8a76 fix regluit.core.tests.BookLoaderTests.test_add_by_local_yaml by adjusting how ebook_name is mocked 2016-05-16 15:11:03 -07:00
Raymond Yee bf41bfccc6 change bookloader to load books by names of book in release and add a command to deactivate currently broken ebooks 2016-05-16 12:43:11 -07:00
Raymond Yee bf914a53de make use of settings.CONTENT_TYPES to compute EBOOK_FORMATS 2016-03-07 13:30:40 -08:00
Raymond Yee c5b7c20593 mock_ebook -> test_mode (much clearer) 2016-03-07 13:17:33 -08:00
Raymond Yee abe04a02c7 * Modified core.bookloader.load_from_yaml to go from assuming that there is an epub to enumerating
ebooks from corresponding release specified in yaml_url

* add GitHubTests.test_ebooks_in_github_release

* modified bookloader.load_from_yaml to allow for mock loading of epub in core.tests.BookLoaderTests.test_add_by_local_yaml
2016-03-04 12:09:30 -08:00
Raymond Yee 1ab3711bbf placeholder for updating load_from_yaml to handle formats other than epub -- e.g., mobi, pdf 2016-03-03 09:49:48 -08:00
Raymond Yee f9320c6279 recognize that get_or_create returns (ebook, created) in load_from_yaml 2016-02-10 16:23:58 -08:00
Raymond Yee 45cb7d4eac [#113378215] prevent duplicate ebooks with exact same metadata from being created by /api/loader/yaml
add dedupe_ebooks_with_same_urls.py command for deleting duplicate ebooks
2016-02-10 11:04:06 -08:00
eric a74a2c47b2 now handles loading multiple editions 2015-09-24 17:58:34 -04:00
eric d69921c109 loader now aware that agent_name is reversed 2015-09-12 19:20:08 -04:00