update for 23-24

master
eric 2023-07-29 18:04:39 -04:00
parent 5486d766b3
commit 8fcae3d487
4 changed files with 98 additions and 30 deletions


@ -1,22 +1,35 @@
# capstone-projects
This repo is used to organize Free Ebook Foundation projects for Stevens Institute of Technology Senior-year computer science capstone projects.
## 2023-2024
Proposed project:
- [AI for Accessible Alt-text](alt-text.md)
Students interested in this project should use GitHub issues and pull requests to develop and propose teams. For example, students who are interested in a project but need team members, and teams that need additional members, should create an issue describing their interest and needs. Use issues to ask questions or seek clarification about the projects. To propose a team for a specific project, create a pull request adding the names of team members to the project page. You may also want to include the roles, capabilities, and approach of the team.
We will not accept a proposal PR until September 14. But do not wait until then to start a pull request, even if your team is incomplete or you're still deciding; we will comment on PRs with the goal of improving them, and you can close the PR to withdraw the proposal. It is possible that this project could be divided for two teams to work on.
I expect to meet with teams weekly via Zoom and at least once in person. We will use Slack for meetings and discussions.
## 2022-2023
Proposed projects:
Completed project:
- [OAPEN Suggestion Service](oapen-doab.md#oapen-suggestion-service)
Proposed project:
- [Expert System for Open Access Book Website Analysis](oapen-doab.md#expert-system-for-open-access-book-website-analysis)
Students interested in these projects should use GitHub issues and pull requests to develop and propose teams. For example, students who are interested in a project but need team members, and teams that need additional members, should create an issue describing their interest and needs. Use issues to ask questions or seek clarification about the projects. To propose a team for a specific project, create a pull request adding the names of team members to the project page. You may also want to include the roles, capabilities, and approach of the team.
We will not accept a proposal PR until September 14. But do not wait until then to start a pull request, even if your team is incomplete or you're still deciding; we will comment on PRs with the goal of improving them, and you can close the PR to withdraw the proposal. If there are competing proposals, we will give preference to the best-developed proposal. We are happy to schedule a Q&A session via Zoom; just request one via GitHub issues.
I expect to meet with teams weekly via Zoom and at least once in person. We will use Slack for meetings and discussions. Note that the Suggestion Service project will need a regularly scheduled meeting before noon to enable conferencing with Dr. Snijder, who resides in Amsterdam.
## 2021-2022
Completed project:
- [Free-Programming-Books-Search](fpb.md)
- [repo](https://github.com/EbookFoundation/free-programming-books-search/)
- [search page](https://EbookFoundation.github.io/free-programming-books-search/)
@ -26,6 +39,7 @@ I expect to meet with teams weekly via zoom and at least once in person - We wil
- [Project Gutenberg Bookshelves](bookshelves.md)
## 2019-2020
Completed projects:

BIN
ai_for_alt_text.png Normal file

Binary file not shown (934 KiB).

65
alt-text.md Normal file

@ -0,0 +1,65 @@
# OAPEN/DOAB projects
# Using AI and Crowdsourcing for Accessible Alt-text
## Background
Images in web pages and books can cause accessibility issues for the blind and reading-disabled. The most important mechanism to mitigate this is to provide "alt text" in an "alt" attribute to allow screen reader software to describe the images.
Project Gutenberg (PG) makes available on its website over 70,000 public domain books that have been transcribed into digital files by processes that use OCR, multi-pass proofreading, and HTML coding, most of which is done by Distributed Proofreaders (DP), a partner organization. Creation of alt-text descriptions of the accompanying images has lagged significantly, as this process requires special skills. A recent census of the corpus found that over 486,000 images in the PG catalog have no alt-text descriptions. In some cases this is appropriate, because adjacent captions are provided, but the majority of the images are effectively hidden from blind and reading-disabled users.
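As a rough illustration of how a census like this might be taken, the following Python sketch counts `<img>` tags whose `alt` attribute is absent or empty. It is not the tool used for the census cited above: the class name, function name, and sample HTML are hypothetical, and the sketch makes no attempt to detect adjacent captions.

```python
from html.parser import HTMLParser


class AltTextCensus(HTMLParser):
    """Hypothetical helper: counts <img> tags and notes those lacking alt text."""

    def __init__(self):
        super().__init__()
        self.total = 0
        self.missing = []  # src values of images with absent or empty alt

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        self.total += 1
        attributes = dict(attrs)
        alt = attributes.get("alt")
        # Assumption: an absent attribute and an empty or whitespace-only
        # one are both treated as "no alt text" here.
        if alt is None or not alt.strip():
            self.missing.append(attributes.get("src", ""))


def census(html_text):
    """Return (number of images, list of src values lacking alt text)."""
    parser = AltTextCensus()
    parser.feed(html_text)
    parser.close()
    return parser.total, parser.missing


# Made-up sample markup, not taken from a real PG book.
sample = """
<p>Some text</p>
<img src="images/i008.jpg" alt="">
<img src="images/i009.jpg" alt="A sailing ship at anchor">
<img src="images/i010.jpg">
"""
total, missing = census(sample)
print(total, missing)
```

A real census would also need to decide how to treat images that are purely decorative or already described by a caption, which this sketch simply counts as missing.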
With recent advances in machine vision and artificial intelligence, it seems timely to begin creating processes for alt-text description analogous to the successful OCR/proofreading processes at DP. Large language models (LLMs) such as those behind ChatGPT have already been trained on the Project Gutenberg corpus, so they likely have the data needed to contextually describe the images contained in the text.
## Possible Solution
- Emulate DP OCR/Proofreading workflow with AI/Copyediting/Approval workflow
- Consider the context of the image
- Keep alt-text database separate to allow for tech improvements
Here's an example from a real book:
- [An image in context](https://gutenberg.org/cache/epub/40078/pg40078-images.html#ILL_008)
- [What an AI thinks of it](ai_for_alt_text.png)
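One way to picture the workflow listed under Possible Solution is as a small state machine over alt-text records that live in their own store, separate from the book files. The Python sketch below is only an illustration under that assumption: every name in it (`Status`, `AltTextRecord`, `AltTextStore`) is hypothetical, the store is an in-memory stand-in for a real database, and the image path and description strings are placeholders. Only the book number comes from the linked example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    AI_SUGGESTED = "ai_suggested"  # text produced by an AI system
    EDITED = "edited"              # text revised by a volunteer editor
    APPROVED = "approved"          # text accepted for publication


@dataclass
class AltTextRecord:
    book_id: int
    image_src: str
    text: str
    status: Status = Status.AI_SUGGESTED
    history: list = field(default_factory=list)  # earlier versions of the text

    def edit(self, new_text):
        self.history.append(self.text)
        self.text = new_text
        self.status = Status.EDITED

    def approve(self):
        self.status = Status.APPROVED


class AltTextStore:
    """In-memory stand-in for a database kept separate from the book files."""

    def __init__(self):
        self._records = {}

    def suggest(self, book_id, image_src, text):
        record = AltTextRecord(book_id, image_src, text)
        self._records[(book_id, image_src)] = record
        return record

    def approved_text(self, book_id, image_src):
        record = self._records.get((book_id, image_src))
        if record is not None and record.status is Status.APPROVED:
            return record.text
        return None


store = AltTextStore()
# 40078 is the book number in the linked example; the rest is placeholder data.
record = store.suggest(40078, "images/example.jpg", "placeholder AI description")
pending = store.approved_text(40078, "images/example.jpg")  # None until approved
record.edit("placeholder description after volunteer editing")
record.approve()
final = store.approved_text(40078, "images/example.jpg")
```

Keying records by book and image, rather than writing descriptions straight into the HTML, means a better AI model could later regenerate suggestions without touching text that editors have already approved.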
## Objectives
How well can existing machine vision/AI systems provide useful alt text for images in PG texts? How might a web application that lets volunteer editors proofread existing or AI-suggested descriptions best integrate with AI capabilities?
To address these questions, the team will:
- Evaluate AI platforms and feasibility
- Develop accessibility guidelines for alt-text
- Build an Alt-text Editing/Approval Environment that interacts with the chosen AI
### Advisors
- Eric Hellman, Free Ebook Foundation
### Associated Organizations
- [Project Gutenberg](https://gutenberg.org)
- [Distributed Proofreaders](https://pgdp.net)
### Proposed Team
(fill this in!)
-
-
-
-
-
### More about the team
(fill this in!)
### Resources
- [Alt-texts: The Ultimate Guide](https://axesslab.com/alt-texts/)
- [Tips and Tricks](https://www.w3.org/WAI/tutorials/images/tips/)
- [What is alt text?](https://moz.com/learn/seo/alt-text)


@ -3,8 +3,6 @@
# OAPEN Suggestion Service
## (Project #1)
## Background
[OAPEN](https://oapen.org/) promotes and supports the transition to open access for academic books by providing open infrastructure services to stakeholders in scholarly communication. Over 24,000 books and book chapters are available from the OAPEN platform.
@ -13,27 +11,30 @@ OAPEN has experimented with a suggestion service based on semantic inferencing b
## Goals
This project will build
This project built
- an analysis/mining engine that will ingest the 24,000 texts at OAPEN and produce a trigram map
- a web-service application that will use the trigram map to allow websites to present suggestions from the OAPEN catalog based on a book identifier.
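To make the idea of a trigram map concrete, here is a minimal Python sketch. It is not the project's implementation. It assumes trigrams are sequences of three consecutive words (the text above does not say whether word or character trigrams are meant) and ranks suggestions by the raw count of shared trigrams. All function names and the sample texts are hypothetical.

```python
from collections import Counter, defaultdict


def word_trigrams(text):
    """Return the set of three-word sequences in text (assumed definition)."""
    words = text.lower().split()
    return {tuple(words[i:i + 3]) for i in range(len(words) - 2)}


def build_trigram_map(texts):
    """Map each trigram to the set of book identifiers containing it."""
    trigram_map = defaultdict(set)
    for book_id, text in texts.items():
        for trigram in word_trigrams(text):
            trigram_map[trigram].add(book_id)
    return trigram_map


def suggest(book_id, texts, trigram_map, limit=5):
    """Rank other books by the number of trigrams they share with book_id."""
    shared = Counter()
    for trigram in word_trigrams(texts[book_id]):
        for other in trigram_map[trigram]:
            if other != book_id:
                shared[other] += 1
    return [other for other, _ in shared.most_common(limit)]


# Made-up miniature "catalog" keyed by hypothetical book identifiers.
texts = {
    "a": "open access books for scholarly communication",
    "b": "open access books in the humanities",
    "c": "a history of printing presses",
}
trigram_map = build_trigram_map(texts)
suggestions_for_a = suggest("a", texts, trigram_map)
suggestions_for_c = suggest("c", texts, trigram_map)
```

A production service would likely filter out very common trigrams and weight the rest, since raw overlap counts favor long books.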
The team will use off-the-shelf components such as Django, Node, and Postgres for deployment on Digital Ocean.
The team used off-the-shelf components such as Docker, Node, and Postgres for deployment on Digital Ocean.
### Project Repo
[OAPEN Suggestion Service](https://github.com/EbookFoundation/oapen-suggestion-service)
### Advisors
- Ronald Snijder, OAPEN
- Eric Hellman, Free Ebook Foundation
### Proposed Team
### Team
-
-
-
-
-
- Celina Peralta
- Justin O'Boyle
- Maxim Zaremba
- Peter Rauscher
- Joseph Sofia (1st semester)
### More about the team
### Reference
@ -43,7 +44,7 @@ The team will use off-the-shelf components such as Django, Node, and Postgres fo
# Expert System for Open Access Book Website Analysis
## (Project #2)
## (this project was not worked on)
## Background
@ -66,15 +67,3 @@ Advisors
- Eric Hellman, Free Ebook Foundation
- Ronald Snijder, OAPEN
### Proposed Team
-
-
-
-
-
### More about the team