Added support for more models in ReplicateAPI. Completed automation script for Project Gutenberg. Updated README.
parent ab3228b234
commit 068b4df6ea
@@ -8,6 +8,9 @@
 **/empty_alt_text_sample.txt
 **/book_outputs
 **/downloaded_books
+**/results
+**/alts.txt
+**/images.txt
 
 **/keys.py
 **/vertex-key.json

144 README.md
@@ -1,6 +1,10 @@
 # Alt-Text
 
-A PyPi package used for finding, generating, and setting alt-text for images in HTML and EPUB files.
+A PyPI package used for finding, generating, and setting alt-text for images in HTML files.
 
+Developed as a Computer Science Senior Design Project at [Stevens Institute of Technology](https://www.stevens.edu/) in collaboration with the [Free Ebook Foundation](https://ebookfoundation.org/).
+
+[Learn more about the developers](#the-developers).
+
 ## Getting Started
 
@@ -26,14 +30,18 @@ As of the moment, the image analyzation tools that Alt-Text uses are not fully b
 
 Description Engines are used to generate descriptions of an image. If you are to use one of these, you will need to fulfill that specific Engine's dependencies before use.
 
-##### ReplicateMiniGPT4API
+##### ReplicateAPI
 
-ReplicateMiniGPT4API Engine uses the [Replicate API](https://replicate.com/), hence you will need to get an API key via [Logging in with Github](https://replicate.com/signin) on the Replicate website.
+The ReplicateAPI Engine uses the [Replicate API](https://replicate.com/), hence you will need to get an API key via [Logging in with GitHub](https://replicate.com/signin) on the Replicate website.
 
 ##### GoogleVertexAPI
 
 GoogleVertexAPI Engine uses the [Vertex AI API](https://cloud.google.com/vertex-ai), hence you will need to get access from the [Google API Marketplace](https://console.cloud.google.com/marketplace/product/google/aiplatform.googleapis.com). Additionally, Alt-Text uses Service Account Keys to authenticate with Google Cloud, hence you will need to [Create a Service Account Key](https://cloud.google.com/iam/docs/keys-create-delete#creating) with permission for the Vertex AI API and download its JSON key file.
 
+##### BlipLocal
+
+The BlipLocal Engine uses a modified version of the [cobanov/image-captioning repository](https://github.com/cobanov/image-captioning), which allows BLIP to run locally via a CLI. To get started, you must clone [this fork](https://github.com/xxmistacruzxx/image-captioning) of the repository and download the [BLIP-Large](https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_large_caption.pth) checkpoint as described in its README.
+
 #### OCR Engines
 
 Optical Character Recognition Engines are used to find text within images. If you are to use one of these, you will need to fulfill that specific Engine's dependencies before use.
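Both remote Description Engines described in this hunk need credentials: a Replicate API token and a Google Cloud service account key JSON. A minimal sketch, assuming you keep both out of source control (the same commit's ignore patterns already cover `keys.py` and `vertex-key.json`). It reads the token from `REPLICATE_API_TOKEN`, the variable the official `replicate` client looks for, and sanity-checks a service account key file before you hand it to `GoogleVertexAPI`. Both helper names are hypothetical and are not part of Alt-Text. How `GoogleVertexAPI` consumes the key (the automation script passes `keys.VertexGAC()`) is not shown in this diff.

```python
import json
import os
from collections.abc import Mapping
from pathlib import Path

# Fields that a Google Cloud service account key file contains.
REQUIRED_SA_FIELDS = ("type", "project_id", "private_key", "client_email")


def load_replicate_token(env: Mapping = os.environ) -> str:
    """Return the Replicate API token from the environment (hypothetical helper)."""
    token = env.get("REPLICATE_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError("Set REPLICATE_API_TOKEN before creating ReplicateAPI")
    return token


def check_service_account_key(path: str) -> str:
    """Validate a service account key JSON and return its project_id (hypothetical helper)."""
    data = json.loads(Path(path).read_text())
    missing = [field for field in REQUIRED_SA_FIELDS if field not in data]
    if missing:
        raise ValueError(f"Key file is missing fields: {missing}")
    if data["type"] != "service_account":
        raise ValueError("Key file is not a service account key")
    return data["project_id"]
```

Usage would look like `ReplicateAPI(load_replicate_token())`. Failing early with a clear message is cheaper than a rejected API call halfway through a benchmark run.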
@@ -42,9 +50,126 @@ Optical Character Recognition Engines are used to find text within images. If yo
 
 The Tesseract Engine uses [Tesseract](https://github.com/tesseract-ocr/tesseract), hence you will need to install the [Tesseract OCR](https://tesseract-ocr.github.io/tessdoc/Installation.html).
 
+#### Language Engines
+
+Language Engines are used to generate alt-text given an image description (from the [Description Engine](#Description-Engines)), characters found in an image (from the [OCR Engine](#OCR-Engines)), and context within the Ebook. If you are to use one of these, you will need to fulfill that specific Engine's dependencies before use.
+
+##### OpenAI API
+
+The OpenAI API Engine gives access to [OpenAI's GPT models via their API](https://platform.openai.com/docs/models). To use this, you will need an [API Key](https://openai.com/blog/openai-api) with access to the appropriate tier (more info on their [pricing page](https://openai.com/pricing)).
+
+##### PrivateGPT
+
+The PrivateGPT Engine allows for easy integration with an instance of [PrivateGPT](https://github.com/zylon-ai/private-gpt). To use this, you'll need a running instance of a [PrivateGPT API Server](https://docs.privategpt.dev/overview/welcome/introduction).
+
 ## Quickstart & Usage
 
-To be added...
+### Setup
+
+#### Standard Setup
+
+The standard setup assumes that you have access to a [Description Engine](#Description-Engines) and a [Language Engine](#Language-Engines) (the [OCR Engine](#OCR-Engines) being optional).
+
+```python
+from alttext.alttext import AltTextHTML
+
+alt = AltTextHTML(
+    ReplicateAPI("REPLICATE_KEY"),
+    None,  # OCR Engine slot (optional), e.g. Tesseract()
+    OpenAIAPI("OPENAI_KEY", "gpt-3.5-turbo"),
+)
+```
+
+#### Legacy Setup
+
+This setup assumes that you have access to a [Description Engine](#Description-Engines) (the [OCR Engine](#OCR-Engines) and [Language Engine](#Language-Engines) being optional).
+
+```python
+from alttext.alttext import AltTextHTML
+
+alt = AltTextHTML(
+    ReplicateAPI("REPLICATE_KEY"),
+    # Tesseract(),
+    # OpenAIAPI("OPENAI_KEY", "gpt-3.5-turbo"),
+    options={"version": 1},
+)
+```
+
+#### Options
+
+Below are the default options for the `AltTextHTML` class. You can change these by passing a `dict` into the `options` parameter during instantiation. The `dict` only needs to contain the options you want to change from their default values.
+
+```python
+DEFOPTIONS = {
+    "withContext": True,
+    "withHash": True,
+    "multiThreaded": True,
+    "version": 2,
+}
+```
+
+### Basic Usage
+
+#### Loading an Ebook
+
+```python
+# from a file
+alt.parseFile("/path/to/ebook.html")
+
+# or from a string
+alt.parse("<HTML>...</HTML>")
+```
+
+#### Getting Images
+
+```python
+# getting all images
+imgs = alt.getAllImgs()  # list[bs4.element.Tag]
+
+# getting all images with no alt attribute or where alt = ""
+imgs_noalt = alt.getNoAltImgs()  # list[bs4.element.Tag]
+
+# get a specific image by src
+img = alt.getImg("path_as_in_html/image.png")  # bs4.element.Tag
+```
+
+#### Generating Alt-Text
+
+```python
+# generate alt-text for a single image by src
+alt_text: str = alt.genAltText("path_as_in_html/image.png")
+
+# generate an association from an image tag
+# example_association = {
+#     "src": "path_as_in_html/image.png",
+#     "alt": "generated alt text",
+#     "hash": 1234,
+# }
+association = alt.genAssociation(img)  # dict
+
+# generate a list of associations given a list of image tags
+associations = alt.genAltAssociations(imgs)  # list[dict]
+```
+
+#### Setting Alt-Text
+
+```python
+# setting alt-text for a single image by src
+new_img_tag = alt.setAlt("path_as_in_html/image.png", "new alt")  # bs4.element.Tag
+
+# setting alt-text for multiple images given a list of associations
+new_img_tags = alt.setAlts(associations)  # list[bs4.element.Tag]
+```
+
+#### Exporting Current HTML Status
+
+```python
+# getting current html as string
+html: str = alt.export()
+
+# exporting to a file
+path: str = alt.exportToFile("path/to/new_html.html")
+```
 
 ## Our Mission
 
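The README's Options section says a partial `options` dict only overrides the keys it names. A minimal sketch of that merge, assuming it is a shallow override of `DEFOPTIONS`. This illustrates the described behaviour and is not Alt-Text's actual code. Rejecting unknown keys is an extra safeguard added here so that a typo such as `"verison"` fails loudly instead of being ignored. `merge_options` is a hypothetical name.

```python
from typing import Optional

# Defaults as listed in the README.
DEFOPTIONS = {
    "withContext": True,
    "withHash": True,
    "multiThreaded": True,
    "version": 2,
}


def merge_options(user_options: Optional[dict] = None) -> dict:
    """Return DEFOPTIONS with any user-supplied keys overridden (hypothetical helper)."""
    user_options = user_options or {}
    unknown = set(user_options) - set(DEFOPTIONS)
    if unknown:
        raise KeyError(f"Unknown option(s): {sorted(unknown)}")
    # Build a new dict so the module-level defaults are never mutated.
    return {**DEFOPTIONS, **user_options}
```

With the Legacy Setup's `options={"version": 1}`, the result keeps `withContext`, `withHash`, and `multiThreaded` at `True` and sets only `version` to `1`.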
@@ -52,9 +177,9 @@ The Alt-Text project is developed for the [Free Ebook Foundation](https://ebookf
 
 As Ebooks become a more prominent way to consume written materials, it becomes increasingly important for them to be accessible to all people. Alternative text (aka alt-text) in Ebooks is used as a way for people to understand images in Ebooks if they are unable to view images as intended (e.g., a visually impaired person using a screen reader to read an Ebook).
 
-While this feature exists, it is still not fully utilized and many Ebooks lack alt-text in some, or even all their images. To illustrate this, the [Gutenberg Project](https://gutenberg.org/), the creator of the Ebook and now a distributor of Public Domain Ebooks, have over 70,000 Ebooks in their collection and of those, there are about 470,000 images without alt-text.
+While this feature exists, it is still not fully utilized, and many Ebooks lack alt-text in some, or even all, of their images. To illustrate this, [Project Gutenberg](https://gutenberg.org/), the creator of the Ebook and now a distributor of Public Domain Ebooks, has over 70,000 Ebooks in its collection, and of those, there are about 470,000 images without alt-text (not including images with insufficient alt-text).
 
-The Alt-Text project's goal is to use the power of AI, Automation, and the Internet to craft a solution capable of automatically generating descriptions for images lacking alt-text in Ebooks, closing the accessibility gap and improving collections, such as the [Gutenberg Project](https://gutenberg.org/).
+The Alt-Text project's goal is to use the power of various AI technologies, such as machine vision and large language models, to craft a solution capable of assisting in the creation of alt-text for Ebooks, closing the accessibility gap and improving collections such as [Project Gutenberg](https://gutenberg.org/).
 
 ### Contact Information
 
@@ -90,7 +215,7 @@ The emails and relevant information of those involved in the Alt-Text project ca
 
 ## APIs, Tools, & Libraries Used
 
-Alt-Text is developed using an assortment of modern Python tools...
+Alt-Text is developed using an assortment of tools...
 
 ### Development Tools
 
@@ -100,13 +225,18 @@ Alt-Text is developed using...
 - [EbookLib](https://pypi.org/project/EbookLib/)
 - [Replicate](https://pypi.org/project/replicate/)
 - [Google-Cloud-AIPlatform](https://pypi.org/project/google-cloud-aiplatform/)
+- [PyTorch](https://pypi.org/project/torch/)
 - [PyTesseract](https://pypi.org/project/pytesseract/)
+- [OpenAI Python API](https://pypi.org/project/openai/)
 
 ### APIs and Supplementary Tools
 
 - [Replicate API](https://replicate.com/)
 - [Vertex AI API](https://cloud.google.com/vertex-ai)
+- [cobanov/image-captioning](https://github.com/cobanov/image-captioning)
 - [Tesseract](https://github.com/tesseract-ocr/tesseract)
+- [OpenAI API](https://openai.com/blog/openai-api)
+- [PrivateGPT](https://github.com/zylon-ai/private-gpt)
 
 ### Packaging/Distribution Tools
 

@@ -5,18 +5,22 @@ import os
 from .descengine import DescEngine
 
 REPLICATE_MODELS = {
+    "blip-2": "andreasjansson/blip-2:f677695e5e89f8b236e52ecd1d3f01beb44c34606419bcc19345e046d8f786f9",
     "blip": "salesforce/blip:2e1dddc8621f72155f24cf2e0adbde548458d3cab9f00c0139eea840d0ac4746",
-    "clip_prefix_caption": "rmokady/clip_prefix_caption:9a34a6339872a03f45236f114321fb51fc7aa8269d38ae0ce5334969981e4cd8",
-    "clip-caption-reward": "j-min/clip-caption-reward:de37751f75135f7ebbe62548e27d6740d5155dfefdf6447db35c9865253d7e06",
+    "llava-13b": "yorickvp/llava-13b:b5f6212d032508382d61ff00469ddda3e32fd8a0e75dc39d8a4191bb742157fb",
     "img2prompt": "methexis-inc/img2prompt:50adaf2d3ad20a6f911a8a9e3ccf777b263b8596fbd2c8fc26e8888f8a0edbb5",
+    "clip_prefix_caption": "rmokady/clip_prefix_caption:9a34a6339872a03f45236f114321fb51fc7aa8269d38ae0ce5334969981e4cd8",
+    "clip-interrogator": "pharmapsychotic/clip-interrogator:8151e1c9f47e696fa316146a2e35812ccf79cfc9eba05b11c7f450155102af70",
+    "clip-caption-reward": "j-min/clip-caption-reward:de37751f75135f7ebbe62548e27d6740d5155dfefdf6447db35c9865253d7e06",
     "minigpt4": "daanelson/minigpt-4:b96a2f33cc8e4b0aa23eacfce731b9c41a7d9466d9ed4e167375587b54db9423",
     "image-captioning-with-visual-attention": "nohamoamary/image-captioning-with-visual-attention:9bb60a6baa58801aa7cd4c4fafc95fcf1531bf59b84962aff5a718f4d1f58986",
 }
 
+
 class ReplicateAPI(DescEngine):
-    def __init__(self, key: str, model: str = "blip") -> None:
+    def __init__(self, key: str, modelName: str = "blip") -> None:
         self.__setKey(key)
-        self.__setModel(model)
+        self.__setModel(modelName)
         return None
 
     def __getModel(self) -> str:
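After this change, the constructor takes a short `modelName` that should be one of the `REPLICATE_MODELS` keys, for example `"llava-13b"`. How `__setModel` resolves that name is not shown in the diff. A minimal sketch of one way to do it, with a clearer error for typos. The function name is hypothetical, and the two-entry table is copied from the mapping above purely to keep the example short.

```python
import difflib

# Two entries copied from REPLICATE_MODELS, enough for the example.
REPLICATE_MODELS = {
    "blip": "salesforce/blip:2e1dddc8621f72155f24cf2e0adbde548458d3cab9f00c0139eea840d0ac4746",
    "llava-13b": "yorickvp/llava-13b:b5f6212d032508382d61ff00469ddda3e32fd8a0e75dc39d8a4191bb742157fb",
}


def resolve_model(model_name: str) -> str:
    """Map a short model name to its pinned 'owner/name:version' reference (hypothetical)."""
    try:
        return REPLICATE_MODELS[model_name]
    except KeyError:
        # Suggest the closest known name, if any, before failing.
        close = difflib.get_close_matches(model_name, REPLICATE_MODELS, n=1)
        hint = f" Did you mean {close[0]!r}?" if close else ""
        raise ValueError(f"Unknown Replicate model {model_name!r}.{hint}") from None
```

Pinning the full version hash, as the mapping already does, keeps benchmark results reproducible even if a model owner later publishes a new version.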
@@ -42,10 +46,18 @@ class ReplicateAPI(DescEngine):
         base64_utf8_str = base64.b64encode(imgData).decode("utf-8")
         model = self.__getModel()
         ext = src.split(".")[-1]
-        prompt = "Create alternative-text for this image."
-        if context != None:
-            prompt = f"Create alternative-text for this image given the following context...\n{context}"
 
         dataurl = f"data:image/{ext};base64,{base64_utf8_str}"
-        output = replicate.run(model, input={"image": dataurl, "prompt": prompt})
-        return output
+        input = {"image": dataurl}
+        if model == REPLICATE_MODELS["blip-2"]:
+            input["caption"] = True
+            input["question"] = ""
+        if model == REPLICATE_MODELS["llava-13b"]:
+            input["prompt"] = "What is this a picture of?"
+        if model == REPLICATE_MODELS["minigpt4"]:
+            input["prompt"] = "What is this a picture of?"
+
+        output = replicate.run(model, input=input)
+        if model == REPLICATE_MODELS["llava-13b"]:
+            return "".join(output)
+        return output

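The new `genDesc` body builds a model-specific `input` dict before calling `replicate.run`. A minimal sketch of the same logic as pure, table-driven functions that can be unit-tested without a network call. It makes a few assumptions and changes:

- It keys on the short model name rather than the pinned version string.
- It normalises a `jpg` extension to `jpeg`, because `image/jpeg` is the registered MIME type. This is an addition that is not in the diff.
- The per-model fields are copied from the diff and have not been checked against each model's published input schema.

All function names are hypothetical.

```python
import base64

# Extra inputs per model, copied from the genDesc change above.
EXTRA_INPUTS = {
    "blip-2": {"caption": True, "question": ""},
    "llava-13b": {"prompt": "What is this a picture of?"},
    "minigpt4": {"prompt": "What is this a picture of?"},
}


def to_data_url(img_data: bytes, src: str) -> str:
    """Encode image bytes as a base64 data URL, using the file extension as subtype."""
    ext = src.rsplit(".", 1)[-1].lower()
    if ext == "jpg":
        ext = "jpeg"  # image/jpeg is the registered MIME type
    encoded = base64.b64encode(img_data).decode("utf-8")
    return f"data:image/{ext};base64,{encoded}"


def build_input(model_name: str, img_data: bytes, src: str) -> dict:
    """Return the dict to pass as replicate.run(model, input=...)."""
    model_input = {"image": to_data_url(img_data, src)}
    model_input.update(EXTRA_INPUTS.get(model_name, {}))
    return model_input


def join_output(model_name: str, output):
    """llava-13b yields text in chunks, so join them; other models return as-is."""
    return "".join(output) if model_name == "llava-13b" else output
```

Keeping the per-model differences in a table means supporting another model is a one-line change rather than another `if` branch.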
@@ -10,9 +10,13 @@ import keys
 
 sys.path.append("../")
 from src.alttext.alttext import AltTextHTML
+from src.alttext.descengine.descengine import DescEngine
 from src.alttext.descengine.replicateapi import ReplicateAPI
+from src.alttext.descengine.bliplocal import BlipLocal
+from src.alttext.descengine.googlevertexapi import GoogleVertexAPI
 from src.alttext.ocrengine.tesseract import Tesseract
 from src.alttext.langengine.openaiapi import OpenAIAPI
+from src.alttext.langengine.privategpt import PrivateGPT
 
 
 class AltTextGenerator(AltTextHTML):
@@ -29,10 +33,15 @@ class AltTextGenerator(AltTextHTML):
 
         # Description generation timing
         # print("starting desc")
-        genDesc_start_time = time.time()
-        desc = self.genDesc(imgdata, src, context)
-        genDesc_end_time = time.time()
-        genDesc_total_time = genDesc_end_time - genDesc_start_time
+        genDesc = None
+        with open("./results/llava-13b.csv", mode="r") as csvfile:
+            reader = csv.DictReader(csvfile)
+            for row in reader:
+                if row["book"] == book_id and row["image"] == src:
+                    genDesc = row["genDesc"]
+                    break
+        if genDesc is None:
+            raise Exception("Description not found in llava-13b.csv")
 
         # OCR processing timing
         # print("starting ocr")
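`genAltTextV2` now reopens `./results/llava-13b.csv` and scans it from the top for every image. A minimal sketch of an alternative, assuming the CSV has the `book`, `image`, and `genDesc` columns used above: load the file once into a dict keyed by `(book, image)`, then look up each description directly. `load_desc_cache` and `lookup_desc` are hypothetical helpers, not part of the script.

```python
import csv
from typing import Dict, TextIO, Tuple

DescCache = Dict[Tuple[str, str], str]


def load_desc_cache(csvfile: TextIO) -> DescCache:
    """Read a benchmark CSV once and index genDesc by (book, image)."""
    cache: DescCache = {}
    for row in csv.DictReader(csvfile):
        # Keep the first row for a key, matching the `break` in the original loop.
        cache.setdefault((row["book"], row["image"]), row["genDesc"])
    return cache


def lookup_desc(cache: DescCache, book_id: str, src: str) -> str:
    """Return the cached description or raise a descriptive LookupError."""
    try:
        return cache[(book_id, src)]
    except KeyError:
        raise LookupError(f"Description not found for {book_id} | {src}") from None
```

You would build the cache once, for example in the generator's constructor via `with open("./results/llava-13b.csv") as f: self.descCache = load_desc_cache(f)`, and then call `lookup_desc` per image.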
@@ -44,7 +53,11 @@ class AltTextGenerator(AltTextHTML):
         # Refinement processing timing
         # print("starting refinement")
         refine_start_time = time.time()
-        refined_desc = self.langEngine.refineAlt(desc, chars, context, None)
+        if context[0] is not None:
+            context[0] = context[0][:1000]
+        if context[1] is not None:
+            context[1] = context[1][:1000]
+        refined_desc = self.langEngine.refineAlt(genDesc, chars[:1000], context, None)
         refine_end_time = time.time()
         refine_total_time = refine_end_time - refine_start_time
 
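The refinement step now caps the before/after context and the OCR text at 1,000 characters each before calling `refineAlt`, presumably to keep the prompt within the language engine's limits. Two edge cases are worth guarding against. First, `chars[:1000]` raises `TypeError` if `chars` is ever `None`, for instance if OCR is skipped; whether `genChars` can return `None` is not shown here. Second, assigning to `context[0]` would fail if `getContext` returned a tuple. A minimal `None`-safe sketch that returns truncated copies instead of mutating in place; `clip`, `clip_inputs`, and `MAX_CHARS` are hypothetical names.

```python
from typing import List, Optional, Sequence, Tuple

MAX_CHARS = 1000  # same cap as the diff


def clip(text: Optional[str], limit: int = MAX_CHARS) -> Optional[str]:
    """Truncate text to `limit` characters, passing None through unchanged."""
    return None if text is None else text[:limit]


def clip_inputs(
    context: Sequence[Optional[str]], chars: Optional[str], limit: int = MAX_CHARS
) -> Tuple[List[Optional[str]], Optional[str]]:
    """Return a truncated copy of [beforeContext, afterContext] and the OCR text."""
    return [clip(part, limit) for part in context], clip(chars, limit)
```

The call site would then read `ctx, ocr = clip_inputs(context, chars)` followed by `self.langEngine.refineAlt(genDesc, ocr, ctx, None)`.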
@@ -60,10 +73,7 @@ class AltTextGenerator(AltTextHTML):
             "status": status,  # Set false if failed, set true if worked
             "beforeContext": context[0],
             "afterContext": context[1],
-            "genDesc": desc,
-            "genDesc-Start": genDesc_start_time,
-            "genDesc-End": genDesc_end_time,
-            "genDesc-Time": genDesc_total_time,
+            "genDesc": genDesc,
             "genOCR": chars,
             "genOCR-Start": ocr_start_time,
             "genOCR-End": ocr_end_time,
@@ -95,11 +105,14 @@ def benchmarkBooks(booksDir: str, srcsDir: str):
     generator = AltTextGenerator(
         ReplicateAPI(keys.ReplicateEricKey()),
         Tesseract(),
-        OpenAIAPI(keys.OpenAIKey(), "gpt-3.5-turbo"),
+        # OpenAIAPI(keys.OpenAIKey(), "gpt-4-0125-preview"),
+        PrivateGPT("http://127.0.0.1:8001"),
     )
 
     records = []
-    for bookId in os.listdir(booksDir):
+    for bookId in os.listdir(srcsDir):
+        bookId = bookId.split("_")[1].split(".")[0]
+        time.sleep(1)
         try:
             bookPath = os.path.join(booksDir, bookId)
 
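The loop now iterates over the per-book files in `srcsDir`, which the splitting script names `ebook_{book_number}.txt`. It recovers the ID with `split("_")[1].split(".")[0]`, which raises `IndexError` on any stray file without an underscore, such as `.DS_Store`. A minimal sketch of a stricter parser that simply skips names that do not match. The regex and the helper name are assumptions.

```python
import re
from typing import Iterable, Iterator

EBOOK_FILE = re.compile(r"^ebook_(\d+)\.txt$")


def book_ids(filenames: Iterable[str]) -> Iterator[str]:
    """Yield the numeric book ID from each 'ebook_<id>.txt' name, skipping others."""
    for name in sorted(filenames):
        match = EBOOK_FILE.match(name)
        if match:
            yield match.group(1)
```

Used as `for bookId in book_ids(os.listdir(srcsDir)):`. Sorting also makes the processing order deterministic, since `os.listdir` returns entries in arbitrary order.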
@@ -120,13 +133,77 @@ def benchmarkBooks(booksDir: str, srcsDir: str):
                     record = generator.genAltTextV2(src, bookId, src, bookPath)
                     records.append(record)
                 except Exception as e:
-                    print(f"Error processing image {src} in book {bookId}: {e}")
+                    print(f"ERROR processing image {bookId} | {src}: {e}")
         except Exception as e:
-            print(f"Error processing book {bookId}: {e}")
+            print(f"ERROR processing book {bookId}: {e}")
 
-    generateCSV("test_benchmark.csv", records)
+    generateCSV("private-gpt.csv", records)
 
+
+def benchmarkDescEngine(
+    descEngine: DescEngine, booksDir: str, srcsDir: str, outputFilename: str
+):
+    generator = AltTextHTML(descEngine)
+
+    records = []
+    for bookId in os.listdir(srcsDir):
+        bookId = bookId.split("_")[1].split(".")[0]
+        try:
+            print("STARTING BOOK ID: ", bookId)
+            bookPath = os.path.join(booksDir, bookId)
+
+            htmlpath = None
+            for object in os.listdir(bookPath):
+                if object.endswith(".html"):
+                    htmlpath = os.path.join(bookPath, object)
+                    break
+            generator.parseFile(htmlpath)
+
+            srcs = []
+            with open(f"{srcsDir}/ebook_{bookId}.txt", "r") as file:
+                for line in file:
+                    srcs.append(line.split(f"{bookId}/")[1].strip())
+
+            for src in srcs:
+                time.sleep(8)
+                try:
+                    print("STARTING IMAGE: ", src)
+                    context = generator.getContext(generator.getImg(src))
+                    genDesc_start_time = time.time()
+                    desc = generator.genDesc(generator.getImgData(src), src, context)
+                    print(f"TEST: {desc}")
+                    genDesc_end_time = time.time()
+                    genDesc_total_time = genDesc_end_time - genDesc_start_time
+                    record = {
+                        "book": bookId,
+                        "image": src,
+                        "path": bookPath,
+                        # "beforeContext": context[0],
+                        # "afterContext": context[1],
+                        "genDesc": desc.replace('"', "'"),
+                        "genDesc-Start": genDesc_start_time,
+                        "genDesc-End": genDesc_end_time,
+                        "genDesc-Time": genDesc_total_time,
+                    }
+                    records.append(record)
+                except Exception as e:
+                    print(f"ERROR processing image {bookId} | {src}: {e}")
+        except Exception as e:
+            print(f"ERROR processing book {bookId}: {e}")
+
+    generateCSV(outputFilename, records)
+
 
 if __name__ == "__main__":
     print("RUNNING AUTOMATE.PY")
     benchmarkBooks("./downloaded_books", "./book_outputs")
+    # benchmarkDescEngine(
+    #     ReplicateAPI(
+    #         keys.ReplicateEricKey(), modelName="image-captioning-with-visual-attention"
+    #     ),
+    #     BlipLocal("C:/Users/dacru/Desktop/ALT/image-captioning"),
+    #     GoogleVertexAPI(keys.VertexProject(), keys.VertexRegion(), keys.VertexGAC()),
+    #     "./downloaded_books",
+    #     "./book_outputs2",
+    #     "vertexai.csv",
+    # )

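`benchmarkDescEngine` times each `genDesc` call with `time.time()` and writes the start, end, and elapsed values to the CSV. A minimal sketch of the same measurement factored into a reusable helper. It measures the duration with `time.perf_counter()`, which the Python documentation recommends for timing short intervals, while still recording wall-clock timestamps for the `-Start`/`-End` columns. `timed_record` is a hypothetical name, and the record keys mirror the ones in the diff.

```python
import time
from typing import Any, Callable, Dict


def timed_record(book_id: str, src: str, path: str, gen: Callable[[], str]) -> Dict[str, Any]:
    """Run gen() and return a benchmark row with wall-clock and elapsed timings."""
    start_wall = time.time()
    start = time.perf_counter()
    desc = gen()
    elapsed = time.perf_counter() - start
    return {
        "book": book_id,
        "image": src,
        "path": path,
        "genDesc": desc.replace('"', "'"),  # same quote swap as the script
        "genDesc-Start": start_wall,
        "genDesc-End": start_wall + elapsed,
        "genDesc-Time": elapsed,
    }
```

Inside the image loop this becomes `records.append(timed_record(bookId, src, bookPath, lambda: generator.genDesc(generator.getImgData(src), src, context)))`.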
@@ -0,0 +1,83 @@
+import random
+import requests
+import bs4
+import time
+import os
+
+
+def extractImage(imgs: list[bs4.element.Tag]) -> "bs4.element.Tag | None":
+    if len(imgs) == 0:
+        return None
+    index = random.randint(0, len(imgs) - 1)
+    img = imgs[index]
+    if img.has_attr("alt") and img.attrs["alt"].strip() != "":
+        return img
+    return extractImage(imgs[:index] + imgs[index + 1 :])
+
+
+def collect(
+    num: int, image_output: str = "images.txt", alt_output: str = "alts.txt"
+) -> bool:
+    """
+    Collect images with alt-text from random ebooks
+
+    Args:
+        num (int): Number of images to collect.
+        image_output (str, optional): Path to output image URLs. Defaults to "images.txt".
+        alt_output (str, optional): Path to output alt-text. Defaults to "alts.txt".
+    """
+    count = 0
+    while count < num:
+        time.sleep(0.5)
+        bookid = random.randint(1, 70000)
+        bookurl = f"https://gutenberg.org/cache/epub/{bookid}/pg{bookid}-images.html"
+
+        response = requests.get(bookurl, timeout=30)
+        if response.status_code != 200:
+            print(f"Failed to fetch book {bookid}.")
+            continue
+
+        soup = bs4.BeautifulSoup(response.text, "html.parser")
+        div = soup.find("div", id="pg-machine-header")
+        if not div:
+            print(f"No 'pg-machine-header' found in book {bookid}.")
+            continue
+
+        languageP = div.find_all(recursive=False)[3]
+        if languageP.text.strip() != "Language: English":
+            print(f"Book {bookid} is not in English.")
+            continue
+
+        imgs: list[bs4.element.Tag] = soup.find_all("img")
+        img = extractImage(imgs)
+        if img is None:
+            print(
+                f"Out of {len(imgs)} images, no images with alt-text found in book {bookid}."
+            )
+            continue
+
+        with open(image_output, "a") as imagefile:
+            imagefile.write(f"{bookid} cache/epub/{bookid}/{img['src']}\n")
+        with open(alt_output, "a") as altfile:
+            altfile.write(f"{img['alt'].encode('ascii', 'ignore').decode()}\n")
+
+        count += 1
+
+    return True
+
+
+def split(input_file, book_output, image_output):
+    with open(input_file, "r") as file:
+        for line in file:
+            book_number = line.split()[0]  # Extracting book number
+            image = line.split()[1]  # Extracting image
+
+            with open(book_output, "a") as output_file:
+                output_file.write(f"{book_number}\n")
+            with open(image_output, "a") as output_file:
+                output_file.write(f"{image}\n")
+
+
+if __name__ == "__main__":
+    # collect(150)
+    split("images.txt", "books.txt", "images2.txt")

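`extractImage` picks a random `<img>` and recurses on the remaining list until it finds one with non-blank alt-text. For a book with many alt-less images, that copies the list at every step and can exceed Python's default recursion limit of 1000. Rejection sampling without replacement picks uniformly among the qualifying images, so filtering first and then choosing once gives the same distribution without recursion. A minimal sketch of that approach. It takes plain attribute dicts so it runs without bs4; with bs4 you would pass each tag's `img.attrs`. The function name is hypothetical.

```python
import random
from typing import Dict, List, Optional


def pick_image_with_alt(
    imgs: List[Dict[str, str]], rng: Optional[random.Random] = None
) -> Optional[Dict[str, str]]:
    """Return a uniformly random image whose alt text is non-blank, or None."""
    candidates = [img for img in imgs if img.get("alt", "").strip()]
    if not candidates:
        return None
    # Use the supplied generator for reproducible runs, else the module-level one.
    return (rng or random).choice(candidates)
```

Passing a seeded `random.Random` makes a collection run reproducible, which is useful when comparing engines on the same sample.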
@@ -10,7 +10,7 @@ download_folder = "downloaded_books/download_files"
 extraction_folder = "downloaded_books"
 
 
-def download_and_unzip_books(folder_path, download_folder, extraction_folder):
+def downloadAndUnzipBooks(folder_path, download_folder, extraction_folder):
     base_url = "https://www.gutenberg.org/cache/epub/{book_id}/pg{book_id}-h.zip"
 
     # Ensure the download and extraction folders exist
@@ -68,4 +68,5 @@ def download_and_unzip_books(folder_path, download_folder, extraction_folder):
             print(f"No book ID found in {filename}")
 
 
-download_and_unzip_books(folder_path, download_folder, extraction_folder)
+if __name__ == "__main__":
+    downloadAndUnzipBooks(folder_path, download_folder, extraction_folder)

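`downloadAndUnzipBooks` fetches `pg{book_id}-h.zip` archives from Project Gutenberg and extracts them, but most of its body is not shown in this diff. A minimal sketch of the extraction half using only the standard library. It assumes each book should land in `extraction_folder/<book_id>`, which is where the benchmark scripts look for the `.html` file. As an extra safeguard, it rejects any archive member whose path would resolve outside that folder. The function name and folder layout are assumptions.

```python
import io
import os
import zipfile


def extract_book_zip(data: bytes, book_id: str, extraction_folder: str) -> list:
    """Extract a downloaded -h.zip into extraction_folder/book_id; return member names."""
    target = os.path.realpath(os.path.join(extraction_folder, book_id))
    os.makedirs(target, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        # Refuse members whose resolved path would escape the target folder.
        for member in archive.namelist():
            dest = os.path.realpath(os.path.join(target, member))
            if dest != target and not dest.startswith(target + os.sep):
                raise ValueError(f"Unsafe path in archive: {member}")
        archive.extractall(target)
        return archive.namelist()
```

Working from in-memory bytes (for example `response.content` from `requests`) also avoids leaving partially downloaded zip files in `download_folder` when a request fails midway.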
@@ -4,31 +4,23 @@
 
 import os
 
-input_file = "./empty_alt_text_sample.txt"  # The file path of whatever initial .txt you are working with
+input_file = "./images.txt"
 output_folder = "./book_outputs"
 
 
-def createIndividualBookFiles(input_file, output_folder):
-    # Ensure the output folder exists
+def splitSampleByBook(input_file, output_folder):
     if not os.path.exists(output_folder):
         os.makedirs(output_folder)
 
-    # Keep track of the last book number processed
-    last_book_number = None
-
     with open(input_file, "r") as file:
         for line in file:
             book_number = line.split()[0]  # Extracting book number
-            # Check if this line is for a new book
-            if book_number != last_book_number:
-                output_file_name = f"ebook_{book_number}.txt"
-                output_path = os.path.join(output_folder, output_file_name)
-                # print(f"Creating/Updating file for book {book_number}")
-                last_book_number = book_number
+            output_file_name = f"ebook_{book_number}.txt"
+            output_path = os.path.join(output_folder, output_file_name)
 
-            # Append to the file (creates a new file if it doesn't exist)
             with open(output_path, "a") as output_file:
                 output_file.write(line)
 
 
-createIndividualBookFiles(input_file, output_folder)
+if __name__ == "__main__":
+    splitSampleByBook(input_file, output_folder)

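`splitSampleByBook` reopens an output file in append mode for every input line. Because it appends, running it twice duplicates every entry in `book_outputs`. A minimal sketch that groups lines in memory first and then writes each `ebook_{book_number}.txt` once in write mode, so re-runs are idempotent. It also skips blank lines, which would otherwise raise `IndexError`. It assumes the sample fits in memory, which holds for sample sizes like the `collect(150)` call in the collection script. The helper name is hypothetical.

```python
import os
from collections import defaultdict
from typing import Dict, List


def split_sample_by_book(input_file: str, output_folder: str) -> Dict[str, int]:
    """Write one ebook_<book>.txt per book number; return lines written per book."""
    groups: Dict[str, List[str]] = defaultdict(list)
    with open(input_file, "r") as file:
        for line in file:
            if line.strip():  # skip blank lines instead of raising IndexError
                groups[line.split()[0]].append(line if line.endswith("\n") else line + "\n")
    os.makedirs(output_folder, exist_ok=True)
    for book_number, lines in groups.items():
        path = os.path.join(output_folder, f"ebook_{book_number}.txt")
        with open(path, "w") as output_file:  # "w" makes re-runs idempotent
            output_file.writelines(lines)
    return {book: len(lines) for book, lines in groups.items()}
```

The output keeps the original `"<book> cache/epub/<book>/<src>"` line format, so the automation script's `line.split(f"{bookId}/")[1]` parsing still applies.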
@@ -10,6 +10,7 @@ from src.alttext.langengine.openaiapi import OpenAIAPI
 import keys
 
 # HTML BOOK FILEPATHS
+HTML_ADVENTURES = "../books/pg76-h/pg76-images.html"
 HTML_BIRD = "../books/pg30221-h/pg30221-images.html"
 HTML_HUNTING = "../books/pg37122-h/pg37122-images.html"
 HTML_MECHANIC = "../books/pg71856-h/pg71856-images.html"
@@ -33,11 +34,20 @@ def testHTML():
         OpenAIAPI(keys.OpenAIKey(), "gpt-3.5-turbo"),
     )
 
-    alt.parseFile(HTML_HUNTING)
-    imgs = alt.getAllImgs()
-    src = imgs[7].attrs["src"]
-    print(src)
-    print(alt.genAltText(src))
+    # imgs = alt.getAllImgs()
+    alt.parseFile(HTML_ADVENTURES)
+    img = alt.getImg("images/c01-21.jpg")
+    src = img.attrs["src"]
+    imgData = alt.getImgData(src)
+    chars = alt.genChars(imgData, src)
+    desc = alt.genDesc(imgData, src, alt.getContext(img))
+    altText = alt.genAltText(src)
+    print(chars)
+    print("=====================================")
+    print(desc)
+    print("=====================================")
+    print(altText)
 
     # desc = alt.genDesc(alt.getImgData(src), src)
     # print(desc)