Add OCR support for SiteReview CAPTCHA using tesseract

Add support for input file list of potential domains
Add additional error checking for ExpiredDomains.net parsing
Change -q/--query switch to -k/--keyword to better match its purpose
master
Andrew Chiles 2018-04-11 14:46:15 +02:00
parent 3b6078e9fd
commit 26dd64870d
4 changed files with 181 additions and 86 deletions

.gitignore vendored

@@ -1 +1,3 @@
 *.html
+*.txt
+*.jpg

README.md

@@ -4,38 +4,52 @@ Authors Joe Vest (@joevest) & Andrew Chiles (@andrewchiles)
 Domain name selection is an important aspect of preparation for penetration tests and especially Red Team engagements. Commonly, domains that were used previously for benign purposes and were properly categorized can be purchased for only a few dollars. Such domains can allow a team to bypass reputation based web filters and network egress restrictions for phishing and C2 related tasks.
-This Python based tool was written to quickly query the Expireddomains.net search engine for expired/available domains with a previous history of use. It then optionally queries for domain reputation against services like Symantec Web Filter (BlueCoat), IBM X-Force, and Cisco Talos. The primary tool output is a timestamped HTML table style report.
+This Python based tool was written to quickly query the Expireddomains.net search engine for expired/available domains with a previous history of use. It then optionally queries for domain reputation against services like Symantec WebPulse (BlueCoat), IBM X-Force, and Cisco Talos. The primary tool output is a timestamped HTML table style report.
 ## Changes
-- 9 April 2018
-+ Added -t switch for timing control. -t <1-5>
-+ Added Google SafeBrowsing and PhishTank reputation checks
-+ Fixed bug in IBMXForce response parsing
-- 7 April 2018
-+ Fixed support for Symantec WebPulse Site Review (formerly Blue Coat WebFilter)
-+ Added Cisco Talos Domain Reputation check
-+ Added feature to perform a reputation check against a single non-expired domain. This is useful when monitoring reputation for domains used in ongoing campaigns and engagements.
-- 6 June 2017
-+ Added python 3 support
-+ Code cleanup and bug fixes
-+ Added Status column (Available, Make Offer, Price, Backorder, etc)
+- 11 April 2018
++ Added OCR support for CAPTCHA solving with tesseract. Thanks to t94j0 for the idea in [AIRMASTER](https://github.com/t94j0/AIRMASTER)
++ Added support for input file list of potential domains (-f/--filename)
++ Changed -q/--query switch to -k/--keyword to better match its purpose
++ Added additional error checking for ExpiredDomains.net parsing
+- 9 April 2018
++ Added -t switch for timing control. -t <1-5>
++ Added Google SafeBrowsing and PhishTank reputation checks
++ Fixed bug in IBMXForce response parsing
+- 7 April 2018
++ Fixed support for Symantec WebPulse Site Review (formerly Blue Coat WebFilter)
++ Added Cisco Talos Domain Reputation check
++ Added feature to perform a reputation check against a single non-expired domain. This is useful when monitoring reputation for domains used in ongoing campaigns and engagements.
+- 6 June 2017
++ Added python 3 support
++ Code cleanup and bug fixes
++ Added Status column (Available, Make Offer, Price, Backorder, etc)
 ## Features
 - Retrieve specified number of recently expired and deleted domains (.com, .net, .org primarily) from ExpiredDomains.net
 - Retrieve available domains based on keyword search from ExpiredDomains.net
-- Perform reputation checks against the Symantec Web Filter (BlueCoat), IBM x-Force, Cisco Talos, Google SafeBrowsing, and PhishTank services
+- Perform reputation checks against the Symantec WebPulse Site Review (BlueCoat), IBM x-Force, Cisco Talos, Google SafeBrowsing, and PhishTank services
 - Sort results by domain age (if known)
 - Text-based table and HTML report output with links to reputation sources and Archive.org entry
-## Usage
-Install Requirements
+## Installation
+Install Python requirements
 pip3 install -r requirements.txt
-or
-pip3 install requests texttable beautifulsoup4 lxml
+Optional - Install additional OCR support dependencies
+- Debian/Ubuntu: `apt-get install tesseract-ocr python3-imaging`
+- MAC OSX: `brew install tesseract`
+## Usage
 List DomainHunter options
@@ -48,9 +62,13 @@ List DomainHunter options
 optional arguments:
   -h, --help            show this help message and exit
-  -q QUERY, --query QUERY
+  -k KEYWORD, --keyword KEYWORD
                         Keyword used to refine search results
   -c, --check           Perform domain reputation checks
+  -f FILENAME, --filename FILENAME
+                        Specify input file of line delimited domain names to
+                        check
+  --ocr                 Perform OCR on CAPTCHAs when present
   -r MAXRESULTS, --maxresults MAXRESULTS
                         Number of results to return when querying latest
                         expired/deleted domains
@@ -63,7 +81,7 @@ List DomainHunter options
                         Fastest(5) = no delay
   -w MAXWIDTH, --maxwidth MAXWIDTH
                         Width of text table
-  -v, --version         show program's version number and exit
+  -V, --version         show program's version number and exit
 
 Use defaults to check for most recent 100 domains and check reputation
@@ -89,9 +107,13 @@ Perform all reputation checks for a single domain
 [*] Cisco Talos: mydomain.com
 [+] mydomain.com: Web Hosting (Score: Neutral)
 
-Search for available domains with search term of "dog", max results of 100, and check reputation
-python3 ./domainhunter.py -q dog -r 100 -c
+Perform all reputation checks for a list of domains at max speed with OCR of CAPTCHAs
+python3 ./domainhunter.py -f <domainslist.txt> -t 5 --ocr
+
+Search for available domains with keyword term of "dog", max results of 100, and check reputation
+python3 ./domainhunter.py -k dog -r 100 -c
 
  ____ ___ __ __ _ ___ _ _ _ _ _ _ _ _ _____ _____ ____
 | _ \ / _ \| \/ | / \ |_ _| \ | | | | | | | | | \ | |_ _| ____| _ \
 | | | | | | | |\/| | / _ \ | || \| | | |_| | | | | \| | | | | _| | |_) |

domainhunter.py

@@ -4,17 +4,17 @@
 ## Author: @joevest and @andrewchiles
 ## Description: Checks expired domains, reputation/categorization, and Archive.org history to determine
 ## good candidates for phishing and C2 domain names
 
-# To-do:
-# Code cleanup/optimization
-# Add Authenticated "Members-Only" option to download CSV/txt (https://member.expireddomains.net/domains/expiredcom/)
+# Add OCR support for BlueCoat/SiteReview CAPTCHA using tesseract
+# Add support for input file list of potential domains
+# Add additional error checking for ExpiredDomains.net parsing
+# Changed -q/--query switch to -k/--keyword to better match its purpose
 
 import time
 import random
 import argparse
 import json
+import base64
 
-__version__ = "20180409"
+__version__ = "20180411"
 
 ## Functions
@@ -29,9 +29,7 @@ def doSleep(timing):
         time.sleep(random.randrange(10,20))
     elif timing == 4:
         time.sleep(random.randrange(5,10))
-    else:
-        # Maxiumum speed - no delay
-        pass
+    # There's no elif timing == 5 here because we don't want to sleep for -t 5
 
 def checkBluecoat(domain):
     try:
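Note: the hunk only shows the tail of doSleep, so here is a hedged sketch of how the full helper presumably reads after this change. The 90-120 second bound for -t 0 comes from the -t help text further down; the bounds for levels 1 and 2 are assumptions.

```python
import random
import time

def doSleep(timing):
    """Sleep a random interval based on the -t timing level (0 = slowest, 5 = none)."""
    if timing == 0:
        time.sleep(random.randrange(90, 120))   # 90-120s per the -t help text
    elif timing == 1:
        time.sleep(random.randrange(60, 90))    # assumed bound
    elif timing == 2:
        time.sleep(random.randrange(30, 60))    # assumed bound
    elif timing == 3:
        time.sleep(random.randrange(10, 20))
    elif timing == 4:
        time.sleep(random.randrange(5, 10))
    # No branch for timing == 5: the fastest setting means no delay at all
```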
@@ -43,7 +41,6 @@ def checkBluecoat(domain):
         print('[*] BlueCoat: {}'.format(domain))
         response = s.post(url,headers=headers,json=postData,verify=False)
-
         responseJSON = json.loads(response.text)
 
         if 'errorType' in responseJSON:
@@ -51,17 +48,46 @@ def checkBluecoat(domain):
         else:
             a = responseJSON['categorization'][0]['name']
 
-        # # Print notice if CAPTCHAs are blocking accurate results
-        # if a == 'captcha':
-        # print('[-] Error: Blue Coat CAPTCHA received. Change your IP or manually solve a CAPTCHA at "https://sitereview.bluecoat.com/sitereview.jsp"')
-        # #raw_input('[*] Press Enter to continue...')
+        # Print notice if CAPTCHAs are blocking accurate results and attempt to solve if --ocr
+        if a == 'captcha':
+            if ocr:
+                # This request is performed in a browser, but is not needed for our purposes
+                #captcharequestURL = 'https://sitereview.bluecoat.com/resource/captcha-request'
+                #print('[*] Requesting CAPTCHA')
+                #response = s.get(url=captcharequestURL,headers=headers,cookies=cookies,verify=False)
+                print('[*] Received CAPTCHA challenge!')
+                captcha = solveCaptcha('https://sitereview.bluecoat.com/resource/captcha.jpg',s)
+                if captcha:
+                    b64captcha = base64.b64encode(captcha.encode('utf-8')).decode('utf-8')
+                    # Send CAPTCHA solution via GET since inclusion with the domain categorization request doens't work anymore
+                    captchasolutionURL = 'https://sitereview.bluecoat.com/resource/captcha-request/{0}'.format(b64captcha)
+                    print('[*] Submiting CAPTCHA at {0}'.format(captchasolutionURL))
+                    response = s.get(url=captchasolutionURL,headers=headers,verify=False)
+                    # Try the categorization request again
+                    response = s.post(url,headers=headers,json=postData,verify=False)
+                    responseJSON = json.loads(response.text)
+                    if 'errorType' in responseJSON:
+                        a = responseJSON['errorType']
+                    else:
+                        a = responseJSON['categorization'][0]['name']
+                else:
+                    print('[-] Error: Failed to solve BlueCoat CAPTCHA with OCR! Manually solve at "https://sitereview.bluecoat.com/sitereview.jsp"')
+            else:
+                print('[-] Error: BlueCoat CAPTCHA received. Try --ocr flag or manually solve a CAPTCHA at "https://sitereview.bluecoat.com/sitereview.jsp"')
 
         return a
-    except:
-        print('[-] Error retrieving Bluecoat reputation!')
+    except Exception as e:
+        print('[-] Error retrieving Bluecoat reputation! {0}'.format(e))
         return "-"
 
-def checkIBMxForce(domain):
+def checkIBMXForce(domain):
     try:
         url = 'https://exchange.xforce.ibmcloud.com/url/{}'.format(domain)
         headers = {'User-Agent':useragent,
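Note: the CAPTCHA retry logic above is easier to follow outside the diff. Below is a minimal sketch of the same flow; the captcha-request URL and base64 handling mirror the added lines, while the lookup endpoint and POST body are assumptions (they are defined earlier in checkBluecoat and not shown in this hunk).

```python
import base64
import json

import requests

def submit_captcha_and_retry(domain, captcha_text, session, headers):
    """Sketch of the retry added above: base64-encode the OCR'd text, submit it via
    GET to the captcha-request endpoint, then repeat the categorization lookup."""
    b64captcha = base64.b64encode(captcha_text.encode('utf-8')).decode('utf-8')
    session.get('https://sitereview.bluecoat.com/resource/captcha-request/{0}'.format(b64captcha),
                headers=headers, verify=False)

    # Assumed lookup endpoint and request body; in the script these live earlier in checkBluecoat()
    lookup_url = 'https://sitereview.bluecoat.com/resource/lookup'
    response = session.post(lookup_url, headers=headers, json={'url': domain}, verify=False)
    result = json.loads(response.text)
    return result['errorType'] if 'errorType' in result else result['categorization'][0]['name']

# Example: submit_captcha_and_retry('mydomain.com', 'ABC123', requests.Session(), {'User-Agent': 'Mozilla/5.0'})
```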
@@ -184,7 +210,7 @@ def checkMXToolbox(domain):
 def downloadMalwareDomains(malwaredomainsURL):
     url = malwaredomainsURL
-    response = s.get(url,headers=headers,verify=False)
+    response = s.get(url=url,headers=headers,verify=False)
     responseText = response.text
     if response.status_code == 200:
         return responseText
@@ -203,7 +229,7 @@ def checkDomain(domain):
     bluecoat = checkBluecoat(domain)
     print("[+] {}: {}".format(domain, bluecoat))
-    ibmxforce = checkIBMxForce(domain)
+    ibmxforce = checkIBMXForce(domain)
     print("[+] {}: {}".format(domain, ibmxforce))
     ciscotalos = checkTalos(domain)
@@ -211,9 +237,43 @@ def checkDomain(domain):
     print("")
     return
 
+def solveCaptcha(url,session):
+    # Downloads CAPTCHA image and saves to current directory for OCR with tesseract
+    # Returns CAPTCHA string or False if error occured
+    jpeg = 'captcha.jpg'
+    try:
+        response = session.get(url=url,headers=headers,verify=False, stream=True)
+        if response.status_code == 200:
+            with open(jpeg, 'wb') as f:
+                response.raw.decode_content = True
+                shutil.copyfileobj(response.raw, f)
+        else:
+            print('[-] Error downloading CAPTCHA file!')
+            return False
+        text = pytesseract.image_to_string(Image.open(jpeg))
+        text = text.replace(" ", "")
+        return text
+    except Exception as e:
+        print("[-] Error solving CAPTCHA - {0}".format(e))
+        return False
+
 ## MAIN
 if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description='Finds expired domains, domain categorization, and Archive.org history to determine good candidates for C2 and phishing domains')
+    parser.add_argument('-k','--keyword', help='Keyword used to refine search results', required=False, default=False, type=str, dest='keyword')
+    parser.add_argument('-c','--check', help='Perform domain reputation checks', required=False, default=False, action='store_true', dest='check')
+    parser.add_argument('-f','--filename', help='Specify input file of line delimited domain names to check', required=False, default=False, type=str, dest='filename')
+    parser.add_argument('--ocr', help='Perform OCR on CAPTCHAs when present', required=False, default=False, action='store_true')
+    parser.add_argument('-r','--maxresults', help='Number of results to return when querying latest expired/deleted domains', required=False, default=100, type=int, dest='maxresults')
+    parser.add_argument('-s','--single', help='Performs detailed reputation checks against a single domain name/IP.', required=False, default=False, dest='single')
+    parser.add_argument('-t','--timing', help='Modifies request timing to avoid CAPTCHAs. Slowest(0) = 90-120 seconds, Default(3) = 10-20 seconds, Fastest(5) = no delay', required=False, default=3, type=int, choices=range(0,6), dest='timing')
+    parser.add_argument('-w','--maxwidth', help='Width of text table', required=False, default=400, type=int, dest='maxwidth')
+    parser.add_argument('-V','--version', action='version',version='%(prog)s {version}'.format(version=__version__))
+    args = parser.parse_args()
+
+    # Load dependent modules
     try:
         import requests
         from bs4 import BeautifulSoup
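Note: solveCaptcha relies on the module-level headers dict and on the pytesseract/PIL/shutil imports loaded later under the --ocr flag. A self-contained sketch of the same idea, useful for testing the OCR piece in isolation (function and variable names here are illustrative, not part of the commit):

```python
import shutil

import pytesseract
import requests
from PIL import Image

def ocr_captcha(session, url, path='captcha.jpg'):
    """Illustrative stand-in for solveCaptcha(): download an image to disk and OCR it."""
    response = session.get(url=url, stream=True, verify=False)
    if response.status_code != 200:
        print('[-] Error downloading CAPTCHA file!')
        return False
    response.raw.decode_content = True
    with open(path, 'wb') as f:
        shutil.copyfileobj(response.raw, f)
    # Tesseract tends to insert spaces between CAPTCHA characters, so strip them
    return pytesseract.image_to_string(Image.open(path)).replace(' ', '')

# Example (hypothetical call):
# ocr_captcha(requests.Session(), 'https://sitereview.bluecoat.com/resource/captcha.jpg')
```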
@@ -221,24 +281,30 @@ if __name__ == "__main__":
     except Exception as e:
         print("Expired Domains Reputation Check")
-        print("[-] Missing dependencies: {}".format(str(e)))
-        print("[*] Install required dependencies by running `pip install -r requirements.txt`")
+        print("[-] Missing basic dependencies: {}".format(str(e)))
+        print("[*] Install required dependencies by running `pip3 install -r requirements.txt`")
         quit(0)
 
-    parser = argparse.ArgumentParser(description='Finds expired domains, domain categorization, and Archive.org history to determine good candidates for C2 and phishing domains')
-    parser.add_argument('-q','--query', help='Keyword used to refine search results', required=False, default=False, type=str, dest='query')
-    parser.add_argument('-c','--check', help='Perform domain reputation checks', required=False, default=False, action='store_true', dest='check')
-    parser.add_argument('-r','--maxresults', help='Number of results to return when querying latest expired/deleted domains', required=False, default=100, type=int, dest='maxresults')
-    parser.add_argument('-s','--single', help='Performs detailed reputation checks against a single domain name/IP.', required=False, default=False, dest='single')
-    parser.add_argument('-t','--timing', help='Modifies request timing to avoid CAPTCHAs. Slowest(0) = 90-120 seconds, Default(3) = 10-20 seconds, Fastest(5) = no delay', required=False, default=3, type=int, choices=range(0,6), dest='timing')
-    parser.add_argument('-w','--maxwidth', help='Width of text table', required=False, default=400, type=int, dest='maxwidth')
-    parser.add_argument('-v','--version', action='version',version='%(prog)s {version}'.format(version=__version__))
-    args = parser.parse_args()
+    # Load OCR related modules if --ocr flag is set since these can be difficult to get working
+    if args.ocr:
+        try:
+            import pytesseract
+            from PIL import Image
+            import shutil
+        except Exception as e:
+            print("Expired Domains Reputation Check")
+            print("[-] Missing OCR dependencies: {}".format(str(e)))
+            print("[*] Install required Python dependencies by running `pip3 install -r requirements.txt`")
+            print("[*] Ubuntu\Debian - Install tesseract by running `apt-get install tesseract-ocr python3-imaging`")
+            print("[*] MAC OSX - Install tesseract with homebrew by running `brew install tesseract`")
+            quit(0)
 
     ## Variables
-    query = args.query
+    keyword = args.keyword
     check = args.check
+    filename = args.filename
     maxresults = args.maxresults
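Note: the version switch also moved from -v to -V in this commit. A standalone sketch of how these argparse options behave, including the version action's %(prog)s expansion (an illustration, not the tool's actual file):

```python
import argparse

__version__ = "20180411"

parser = argparse.ArgumentParser(prog='domainhunter.py')
parser.add_argument('-k', '--keyword', default=False, help='Keyword used to refine search results')
parser.add_argument('-f', '--filename', default=False, help='Input file of line delimited domain names')
parser.add_argument('--ocr', action='store_true', help='Perform OCR on CAPTCHAs when present')
parser.add_argument('-V', '--version', action='version',
                    version='%(prog)s {version}'.format(version=__version__))

args = parser.parse_args(['-k', 'dog', '--ocr'])
print(args.keyword, args.filename, args.ocr)  # dog False True
# Running with -V prints "domainhunter.py 20180411" and exits
```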
@@ -251,6 +317,8 @@ if __name__ == "__main__":
     malwaredomainsURL = 'http://mirror1.malwaredomains.com/files/justdomains'
     expireddomainsqueryURL = 'https://www.expireddomains.net/domain-name-search'
+    ocr = args.ocr
+
     timestamp = time.strftime("%Y%m%d_%H%M%S")
     useragent = 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)'
@@ -291,26 +359,21 @@ If you plan to use this content for illegal purpose, don't. Have a nice day :)'
         checkDomain(single)
         quit(0)
 
-    # Calculate estimated runtime based on timing variable if checking domain categorization for all returned domains
-    if check:
-        if timing == 0:
-            seconds = 90
-        elif timing == 1:
-            seconds = 60
-        elif timing == 2:
-            seconds = 30
-        elif timing == 3:
-            seconds = 20
-        elif timing == 4:
-            seconds = 10
-        else:
-            seconds = 0
-        runtime = (maxresults * seconds) / 60
-        print("[*] Peforming Domain Categorization Lookups:")
-        print("[*] Estimated duration is {} minutes. Modify lookup speed with -t switch.\n".format(int(runtime)))
-    else:
-        pass
+    # Perform detailed domain reputation checks against input file
+    if filename:
+        try:
+            with open(filename, 'r') as domainsList:
+                for line in domainsList.read().splitlines():
+                    checkDomain(line)
+                    doSleep(timing)
+        except KeyboardInterrupt:
+            print('Caught keyboard interrupt. Exiting!')
+            quit(0)
+        except Exception as e:
+            print('[-] {}'.format(e))
+            quit(1)
+        quit(0)
 
     # Generic Proxy support
     # TODO: add as a parameter
     proxies = {
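Note: the new -f/--filename branch above simply walks a line-delimited file and checks each entry. A minimal sketch of that pattern with stand-in callbacks (checkDomain and doSleep are the tool's own helpers; print and a no-op delay are used here as placeholders):

```python
def check_domains_from_file(path, check=print, delay=lambda: None):
    """Read a line-delimited file of domains and run a check against each entry."""
    with open(path, 'r') as domains_list:
        for line in domains_list.read().splitlines():
            if not line.strip():
                continue  # skip blank lines (the script itself checks every line as-is)
            check(line)
            delay()

# Example with a hypothetical input file:
# check_domains_from_file('domainslist.txt')
```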
@@ -328,15 +391,15 @@ If you plan to use this content for illegal purpose, don't. Have a nice day :)'
     urls = []
 
     # Use the keyword string to narrow domain search if provided
-    if query:
-        print('[*] Fetching expired or deleted domains containing "{}"'.format(query))
+    if keyword:
+        print('[*] Fetching expired or deleted domains containing "{}"'.format(keyword))
         for i in range (0,maxresults,25):
             if i == 0:
-                urls.append("{}/?q={}".format(expireddomainsqueryURL,query))
-                headers['Referer'] ='https://www.expireddomains.net/domain-name-search/?q={}&start=1'.format(query)
+                urls.append("{}/?q={}".format(expireddomainsqueryURL,keyword))
+                headers['Referer'] ='https://www.expireddomains.net/domain-name-search/?q={}&start=1'.format(keyword)
             else:
-                urls.append("{}/?start={}&q={}".format(expireddomainsqueryURL,i,query))
-                headers['Referer'] ='https://www.expireddomains.net/domain-name-search/?start={}&q={}'.format((i-25),query)
+                urls.append("{}/?start={}&q={}".format(expireddomainsqueryURL,i,keyword))
+                headers['Referer'] ='https://www.expireddomains.net/domain-name-search/?start={}&q={}'.format((i-25),keyword)
 
     # If no keyword provided, retrieve list of recently expired domains in batches of 25 results.
     else:
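Note: the keyword branch pages through ExpiredDomains.net 25 results at a time. A short sketch of the URLs it would build for -k dog -r 100, using the base URL from the diff (Referer handling omitted):

```python
expireddomainsqueryURL = 'https://www.expireddomains.net/domain-name-search'
keyword = 'dog'
maxresults = 100

urls = []
for i in range(0, maxresults, 25):
    if i == 0:
        urls.append("{}/?q={}".format(expireddomainsqueryURL, keyword))
    else:
        urls.append("{}/?start={}&q={}".format(expireddomainsqueryURL, i, keyword))

print(urls)
# ['https://www.expireddomains.net/domain-name-search/?q=dog',
#  'https://www.expireddomains.net/domain-name-search/?start=25&q=dog',
#  'https://www.expireddomains.net/domain-name-search/?start=50&q=dog',
#  'https://www.expireddomains.net/domain-name-search/?start=75&q=dog']
```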
@@ -375,9 +438,9 @@ If you plan to use this content for illegal purpose, don't. Have a nice day :)'
         # Turn the HTML into a Beautiful Soup object
         soup = BeautifulSoup(domains, 'lxml')
-        table = soup.find("table")
 
         try:
+            table = soup.find("table")
             for row in table.findAll('tr')[1:]:
 
                 # Alternative way to extract domain name
@@ -388,7 +451,7 @@ If you plan to use this content for illegal purpose, don't. Have a nice day :)'
                 if len(cells) >= 1:
                     output = ""
 
-                    if query:
+                    if keyword:
                         c0 = row.find('td').find('a').text # domain
                         c1 = cells[1].find(text=True) # bl
@@ -460,7 +523,7 @@ If you plan to use this content for illegal purpose, don't. Have a nice day :)'
                     elif check == True:
                         bluecoat = checkBluecoat(c0)
                         print("[+] {}: {}".format(c0, bluecoat))
-                        ibmxforce = checkIBMxForce(c0)
+                        ibmxforce = checkIBMXForce(c0)
                         print("[+] {}: {}".format(c0, ibmxforce))
                         # Sleep to avoid captchas
                         doSleep(timing)
@@ -470,8 +533,14 @@ If you plan to use this content for illegal purpose, don't. Have a nice day :)'
                     # Append parsed domain data to list
                     data.append([c0,c3,c4,available,status,bluecoat,ibmxforce])
 
         except Exception as e:
-            print(e)
+            #print(e)
+            pass
 
+    # Check for valid results before continuing
+    if not(data):
+        print("[-] No results found for keyword: {0}".format(keyword))
+        quit(0)
+
     # Sort domain list by column 2 (Birth Year)
     sortedData = sorted(data, key=lambda x: x[1], reverse=True)
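Note: the final hunk adds an empty-result guard before the existing sort on the second column (Birth Year). A tiny sketch of that sort with placeholder rows shaped like the data.append call above (the values themselves are made up):

```python
# Placeholder rows shaped like data.append([c0,c3,c4,available,status,bluecoat,ibmxforce]);
# index 1 is the Birth Year column referenced by the sort comment.
data = [
    ['olderdomain.com', '2005', '42', 'yes', 'available', 'Web Hosting', 'Not graded'],
    ['newerdomain.com', '2016', '3', 'yes', 'available', 'Uncategorized', 'Not graded'],
]

if not(data):
    print("[-] No results found for keyword: {0}".format('dog'))
else:
    # Newest birth year first, matching sortedData in the script
    sortedData = sorted(data, key=lambda x: x[1], reverse=True)
    print(sortedData[0][0])  # newerdomain.com
```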

requirements.txt

@@ -2,3 +2,5 @@ requests==2.13.0
 texttable==0.8.7
 beautifulsoup4==4.5.3
 lxml
+pillow==5.0.0
+pytesseract
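Note: pillow and pytesseract only cover the Python side; the tesseract binary itself still needs to be installed (apt-get/brew as described in the README). A quick, hedged way to confirm the whole OCR chain works:

```python
import pytesseract
from PIL import Image

# Run tesseract against a tiny blank image; an exception here usually means
# the tesseract binary (not the Python wrapper) is missing from PATH.
try:
    pytesseract.image_to_string(Image.new('RGB', (100, 30), 'white'))
    print('[+] tesseract OCR chain looks usable')
except Exception as e:
    print('[-] tesseract not usable: {}'.format(e))
```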