59 lines
2.2 KiB
Plaintext
59 lines
2.2 KiB
Plaintext
This version of John is UTF-8 aware, using a new option flag. In short,
|
|
this flag says "my wordlists and input files are encoded in UTF-8" and
|
|
it can be used without knowing if the particular format is affected or
|
|
not.
|
|
|
|
Without this support, John will assume ISO-8859-1 when converting
|
|
plaintexts or salts to UTF-16. This affects many Microsoft formats
|
|
like NT, mscash and mssql. Other formats, like DES or MD5 are never
|
|
doing such conversion, so you would just use wordlists encoded in the
|
|
same format as the hashes once were generated in. The --utf8 flag is
|
|
supported for such formats too, but will only affect the new reject
|
|
rules, see below.
|
|
|
|
To enable UTF-8 conversion, just add the --utf8 flag. This will affect
|
|
not only wordlist files, but also salts, usernames (when used as salt)
|
|
and any info used by --single. You can also test all the UTF-8 aware
|
|
formats at once using "john --test --utf8"
|
|
|
|
To convert a wordlist to UTF-8, use iconv(1):
|
|
|
|
$ iconv -f koi8r -t utf-8 cslang >cslang.utf8
|
|
|
|
You can convert from/to a large number of formats, see iconv's man page.
|
|
|
|
|
|
Two new reject rules are implemented:
|
|
|
|
-u reject rule unless the --utf8 flag is used
|
|
|
|
-U reject rule if the --utf8 flag is used
|
|
|
|
Note that this can be used to enable/disable certain rules even for
|
|
formats that does not use UTF-16 internally (eg. raw md5 of utf-8
|
|
plaintexts).
|
|
|
|
|
|
A new character class is also implemented, ?b (b for binary). It matches
|
|
8-bit characters. This can be used with or without the UTF-8 support.
|
|
|
|
|
|
Caveats:
|
|
- UTF-8 conversion is often slower than the default ISO-8859-1 one, so
|
|
it's advisable to only use this when you have to. That is, when your
|
|
wordlist contains characters not included in ISO-8859-1.
|
|
- Some wordlist rules may cut UTF-8 multibyte sequences in the middle,
|
|
resulting in garbage. You can reject such rules with -U to have them in
|
|
use only when --utf8 is not used.
|
|
- Beware of UTF-8 BOM's. Do not use them (they will cripple the first word
|
|
in your wordlist, that's all).
|
|
|
|
--
|
|
|
|
These contributions to John are hereby placed in the public domain. In
|
|
case that is not applicable, they are Copyright 2009, 2010, 2011 by me and
|
|
hereby released to the general public. Redistribution and use in source
|
|
and binary forms, with or without modification, is permitted.
|
|
|
|
magnum
|