These are "public" functions that can be used for supporting UTF-8 in your
own formats for JtR. All of them live in unicode.c.

If you are not familiar with UCS-2, just consider it a synonym for UTF-16.
For our purposes, they are the same.


Convert from ISO-8859-1 or UTF-8 to UTF-16LE:
=============================================
The source encoding depends on whether the --utf8 flag was given to john
when running. If the length is exceeded or malformed data is found, the
function returns a negative number telling you how much of the source was
read. That is, a return code of -32 means you have an UNKNOWN number of
characters of UCS-2 [use strlen16()] in the destination buffer and you
should truncate your "saved_plain" (if applicable) at 32.

#include "unicode.h"
int plaintowcs(UTF16 *dst, int maxdstlen, const UTF8 *src, int srclen);


Convert UTF-8 to UTF-16LE:
==========================
The source is always UTF-8. This function is optimised for speed. If the
length is exceeded or malformed data is found, the function returns a
negative number telling you how much of the source was read. That is, a
return code of -32 means you have an UNKNOWN number of characters of UCS-2
[use strlen16()] in the destination buffer and you should truncate your
"saved_plain" at 32.

#include "unicode.h"
int utf8towcs(UTF16 *target, int maxtargetlen, const UTF8 *source, int sourcelen);


Convert UTF-16LE to UTF-8:
==========================
Currently used in NT_fmt.c in order to avoid using a saved_plain buffer.

#include "unicode.h"
extern char *utf16toutf8(const UTF16 *source);


Return length (in characters) of a UTF16 string:
================================================
The number of octets is the result * sizeof(UTF16).

#include "unicode.h"
int strlen16(const UTF16 *str);


Create an NT hash:
==================
This converts from ISO-8859-1 or UTF-8 depending on the --utf8 option. The
function uses Alain's fast NT hashing if the length is <= 27 characters;
otherwise it uses Solar's MD4. Lengths up to MAX_PLAINTEXT_LENGTH are thus
supported with no hassle for you. If the length is exceeded or malformed
data is found, the function returns a negative number telling you how much
of the source was read. That is, a return code of -32 means you should
truncate your "saved_plain" at 32.

#include "unicode.h"
int E_md4hash(const UTF8 *passwd, int len, unsigned char *p16);


NOTE that there are more functions available in ConvertUTF.c.original
(ConvertUTF.h.original). If we need any of them, we should copy them to
unicode.c/h and simplify/optimize them.
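

Example: handling the negative return code (sketch)
===================================================
The negative-return convention described above can be handled on the
caller's side roughly as shown below. This is only a minimal sketch and not
code from unicode.c: the names sketch_strlen16(), sketch_octets16() and
sketch_truncate_plain() are hypothetical, the UTF16 typedef is a stand-in
for the one in unicode.h, and it is an assumption that the amount of source
read is counted in bytes of the source string. What a non-negative return
value signifies is not stated here, so the sketch only treats it as "no
truncation needed".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the UTF16 type from unicode.h (assumed: 16-bit code unit). */
typedef uint16_t UTF16;

/* Hypothetical equivalent of strlen16(): counts the 16-bit units that
 * precede the terminating zero unit. */
int sketch_strlen16(const UTF16 *str)
{
    int len = 0;

    while (str[len] != 0)
        len++;
    return len;
}

/* Number of octets occupied by the string, excluding the terminator:
 * the length in characters times sizeof(UTF16). */
size_t sketch_octets16(const UTF16 *str)
{
    return (size_t)sketch_strlen16(str) * sizeof(UTF16);
}

/* Hypothetical caller-side helper. 'ret' is the value returned by one of
 * the conversion functions. A negative value -N is taken to mean that only
 * N bytes of the source were read (assumption), so saved_plain is cut at N.
 * Returns the resulting length of saved_plain. */
int sketch_truncate_plain(int ret, char *saved_plain)
{
    size_t len;

    assert(saved_plain != NULL);
    len = strlen(saved_plain);

    if (ret < 0) {
        size_t consumed = (size_t)(-(long)ret);

        if (consumed < len) {
            saved_plain[consumed] = '\0';
            len = consumed;
        }
    }
    return (int)len;
}
```

In a format's set_key(), one would presumably pass the return value of
plaintowcs() or utf8towcs() to such a helper and then call strlen16() on
the destination buffer to learn how many UCS-2 characters were produced.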