7284 Commits (60bbc7b9f7a7264c8db565fc00cc5836776a6888)

Author SHA1 Message Date
eric c3d317da19 add pulp harvest 2020-07-30 13:22:23 -04:00
eric 462e097965 one time use 2020-07-30 13:21:44 -04:00
Eric Hellman 284effbb0e
Merge pull request #889 from Gluejar/maintenance2020
remove duplicate chaps from nomos
2020-07-30 11:31:49 -04:00
eric e4a34a0ba5 mgmt command to clear nomos 2020-07-30 11:21:22 -04:00
eric a4191a99d0 omit duplicates in nomos harvest 2020-07-30 10:36:53 -04:00
Eric Hellman 4280ffbe83
Merge pull request #888 from Gluejar/maintenance2020
misc cleanup
2020-07-29 20:00:17 -04:00
eric 64b03fd40f add OAPEN "harvest" 2020-07-29 19:52:32 -04:00
eric c0505f299b email-change reversion? 2020-07-29 15:57:07 -04:00
eric 8052fde357 some delinting 2020-07-29 15:38:58 -04:00
eric 7a6332d641 fix card-declined error for anon user 2020-07-29 15:26:35 -04:00
Eric Hellman 281a0e5848
Merge pull request #887 from Gluejar/maintenance2020
Exception handling, single ebook harvest
2020-07-29 14:17:41 -04:00
eric 6566afd92f support single ebook harvest 2020-07-29 13:34:11 -04:00
eric 73b863450e handle RecursionError 2020-07-29 12:51:14 -04:00
eric 1c380b0e9f add connection refused handling in get_soup 2020-07-29 11:52:45 -04:00
Eric Hellman e5371b5e21
Merge pull request #886 from Gluejar/maintenance2020
add harvests
2020-07-28 21:07:52 -04:00
eric b25b269a45 add springer harvest 2020-07-28 20:59:31 -04:00
eric 1f2b223c0f add frontiersin harvest 2020-07-28 20:59:13 -04:00
eric 01f7273023 add nomos harvest 2020-07-28 20:58:45 -04:00
eric 066b81fb74 add digitalis harvest 2020-07-28 20:58:25 -04:00
eric 19d39cf4a6 add code to deal with ebooks already harvested from different source 2020-07-28 20:57:46 -04:00
Eric Hellman 68612ab55d
Merge pull request #885 from Gluejar/maintenance2020
add ksp.kit.edu harvest
2020-07-28 10:00:44 -04:00
eric b799b2a4c9 increase harvest limit to 500 2020-07-28 09:27:40 -04:00
eric bf73124250 deal with no head 2020-07-27 20:39:45 -04:00
eric 74005584d0 add kit.edu 2020-07-27 19:06:15 -04:00
Eric Hellman b2a8f9fc8c
Merge pull request #884 from Gluejar/maintenance2020
harvest for degruyter and transcript
2020-07-27 18:08:58 -04:00
eric a14a94aba1 fix jbe condition 2020-07-27 17:54:16 -04:00
eric e306f319ce add transcript verlag harvest 2020-07-27 17:53:59 -04:00
eric 5932bc09ed allow harvest to harvest multiple ebooks 2020-07-27 17:53:32 -04:00
eric 26e32a4738 sometimes there's no contenttype header! 2020-07-27 17:50:21 -04:00
eric 2f28b32fbf also use disposition from contenttyper 2020-07-27 17:49:04 -04:00
eric dd76c112e9 fix degruyter harvest 2020-07-27 17:47:53 -04:00
Eric Hellman 9fd50192e3
Merge pull request #883 from Gluejar/maintenance2020
add error handling for doab 404s
2020-07-26 16:15:26 -04:00
eric 15424eaf4d add error handling for doab 404s 2020-07-26 16:06:33 -04:00
Eric Hellman 0d447fe583
Merge pull request #882 from Gluejar/maintenance2020
improve handling when G doesn't return an item with same isbn
2020-07-24 13:38:19 -04:00
eric d86ce969b8 improve handling when G doesn't return an item with same isbn 2020-07-24 13:00:08 -04:00
Eric Hellman a2d295ce9b
Merge pull request #881 from Gluejar/maintenance2020
bugfix
2020-07-23 16:15:20 -04:00
eric 1a8813832e bugfix 2020-07-23 15:48:11 -04:00
Eric Hellman 57823a719b
Merge pull request #880 from Gluejar/maintenance2020
refactor harvest.py
2020-07-23 11:42:19 -04:00
eric eb6ca2d570 refactor harvest.py
also don't remake ebooks
2020-07-23 10:42:34 -04:00
Eric Hellman 9b871ae7ab
Merge pull request #879 from Gluejar/maintenance2020
doab and harvest
2020-07-22 19:47:21 -04:00
eric 28a49a5e11 add some providers 2020-07-22 19:28:23 -04:00
eric 79aa49a1f1 one more thing for doi 2020-07-22 19:28:02 -04:00
eric 42899559e2 enrich management command
can now harvest doab from a date or starting at a doab_id
2020-07-22 19:27:45 -04:00
eric e036570068 document puzzling method 2020-07-22 19:16:03 -04:00
eric 737d40593b add OBP harvest
also add support for harvesting books via post
2020-07-22 19:15:34 -04:00
eric 5882c07854 add dois from doab 2020-07-22 19:10:05 -04:00
eric 961da4f081 improved content typing
ContentTyper now
- follows head redirects
- considers content-disposition header
- checks to see if we already know format
- tries get if head not allowed (405)
2020-07-22 19:04:48 -04:00
Eric Hellman 1d779229f2
Merge pull request #878 from Gluejar/maintenance2020
remove facebook
2020-07-20 15:08:13 -04:00
eric 61d0c80b12 remove facebook 2020-07-20 13:29:47 -04:00
Eric Hellman a975830500
Merge pull request #877 from Gluejar/maintenance2020
Maintenance2020
2020-07-20 13:02:14 -04:00