This version of John is UTF-8 aware, using a new option flag. In short, this flag says "my wordlists and input files are encoded in UTF-8" and it can be used without knowing if the particular format is affected or not. Without this support, John will assume ISO-8859-1 when converting plaintexts or salts to UTF-16. This affects many Microsoft formats like NT, mscash and mssql. Other formats, like DES or MD5 are never doing such conversion, so you would just use wordlists encoded in the same format as the hashes once were generated in. The --utf8 flag is supported for such formats too, but will only affect the new reject rules, see below. To enable UTF-8 conversion, just add the --utf8 flag. This will affect not only wordlist files, but also salts, usernames (when used as salt) and any info used by --single. You can also test all the UTF-8 aware formats at once using "john --test --utf8" To convert a wordlist to UTF-8, use iconv(1): $ iconv -f koi8r -t utf-8 cslang >cslang.utf8 You can convert from/to a large number of formats, see iconv's man page. Two new reject rules are implemented: -u reject rule unless the --utf8 flag is used -U reject rule if the --utf8 flag is used Note that this can be used to enable/disable certain rules even for formats that does not use UTF-16 internally (eg. raw md5 of utf-8 plaintexts). A new character class is also implemented, ?b (b for binary). It matches 8-bit characters. This can be used with or without the UTF-8 support. Caveats: - UTF-8 conversion is often slower than the default ISO-8859-1 one, so it's advisable to only use this when you have to. That is, when your wordlist contains characters not included in ISO-8859-1. - Some wordlist rules may cut UTF-8 multibyte sequences in the middle, resulting in garbage. You can reject such rules with -U to have them in use only when --utf8 is not used. - Beware of UTF-8 BOM's. Do not use them (they will cripple the first word in your wordlist, that's all). -- These contributions to John are hereby placed in the public domain. In case that is not applicable, they are Copyright 2009, 2010, 2011 by me and hereby released to the general public. Redistribution and use in source and binary forms, with or without modification, is permitted. magnum