metasploit-framework/data/john/doc/UTF8-DEVEL.txt

These are the "public" functions you can use to support UTF-8 in your own
formats for JtR. All of them are implemented in unicode.c.

If you are not familiar with UCS-2, just consider it a synonym for UTF-16.
For our purposes, they are the same.


Convert from ISO-8859-1 or UTF-8 to UTF-16LE:
=============================================
The source encoding depends on whether the --utf8 flag was given to john.
If the maximum length is exceeded or malformed data is found, the function
returns a negative number whose magnitude tells you how much of the source was
read. That is, a return code of -32 means you have an UNKNOWN number of UCS-2
characters in the destination buffer [use strlen16() to count them] and you
should truncate your "saved_plain" (if applicable) at 32.

    #include "unicode.h"
    int plaintowcs(UTF16 *dst, int maxdstlen, const UTF8 *src, int srclen);
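
The negative-return convention described above can be handled on the caller's
side roughly as follows. This is only a sketch: apply_conversion_result() is a
hypothetical helper, not something found in unicode.c, and it assumes that
"saved_plain" is a NUL-terminated copy of the source string.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper (not part of unicode.c). "ret" is the value returned
 * by a conversion function such as plaintowcs(). A negative value means the
 * conversion stopped after reading -ret units of the source, so the saved
 * plaintext is cut at that offset. Returns 1 if the conversion stopped
 * early, 0 otherwise.
 */
static int apply_conversion_result(char *saved_plain, int ret)
{
    assert(saved_plain != NULL);

    if (ret < 0) {
        size_t consumed = (size_t)(-ret);

        /* Never write past the existing terminator. */
        if (consumed < strlen(saved_plain))
            saved_plain[consumed] = '\0';
        return 1;
    }
    return 0;
}
```

With a return code of -32, the helper leaves the first 32 bytes of
"saved_plain" in place and terminates the string there.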


Convert UTF-8 to UTF-16LE:
==========================
The source is always treated as UTF-8. This function is optimised for speed.
If the maximum length is exceeded or malformed data is found, it returns a
negative number whose magnitude tells you how much of the source was read.
That is, a return code of -32 means you have an UNKNOWN number of UCS-2
characters in the destination buffer [use strlen16() to count them] and you
should truncate your "saved_plain" at 32.

    #include "unicode.h"
    int utf8towcs(UTF16 *target, int maxtargetlen,
           const UTF8 *source, int sourcelen);
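
To make the contract concrete, here is a simplified stand-in that follows the
same calling convention. It is NOT the unicode.c implementation. Assumptions
made for the sketch: the typedefs stand in for the ones in unicode.h, a
successful call returns the number of UTF-16 units written, the target has
room for maxtargetlen units plus a terminator, only 1- to 3-byte sequences are
handled, and overlong encodings are not rejected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t UTF16; /* stand-ins for the unicode.h typedefs */
typedef uint8_t UTF8;

/*
 * Illustrative stand-in only. Units are stored in host byte order, which is
 * UTF-16LE on a little-endian machine. On overflow or malformed input the
 * function returns minus the number of source bytes consumed so far.
 */
static int sketch_utf8towcs(UTF16 *target, int maxtargetlen,
                            const UTF8 *source, int sourcelen)
{
    int in = 0, out = 0;

    assert(target != NULL && source != NULL && maxtargetlen >= 0);

    while (in < sourcelen) {
        unsigned int c = source[in];
        int extra, i;

        if (c < 0x80) {
            extra = 0;                      /* plain ASCII */
        } else if ((c & 0xE0) == 0xC0) {
            c &= 0x1F; extra = 1;           /* 2-byte sequence */
        } else if ((c & 0xF0) == 0xE0) {
            c &= 0x0F; extra = 2;           /* 3-byte sequence */
        } else {
            break;                          /* unsupported or stray byte */
        }

        /* Stop if the target is full or the sequence is cut short. */
        if (out >= maxtargetlen || in + extra >= sourcelen)
            break;

        for (i = 1; i <= extra; i++) {
            if ((source[in + i] & 0xC0) != 0x80)
                break;                      /* not a continuation byte */
            c = (c << 6) | (source[in + i] & 0x3F);
        }
        if (i <= extra)
            break;                          /* malformed sequence */

        target[out++] = (UTF16)c;
        in += extra + 1;
    }

    target[out] = 0;
    return in < sourcelen ? -in : out;
}
```

Note that a failure at the very first byte yields 0, which cannot be told
apart from an empty result; the destination is terminated in either case.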


Convert UTF-16LE to UTF-8:
==========================
Currently used in NT_fmt.c in order to avoid using a saved_plain buffer.

    #include "unicode.h"
    extern char * utf16toutf8 (const UTF16* source);
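
The reverse direction can be sketched in the same spirit. The stand-in below
is NOT the unicode.c function: it writes into a caller-supplied buffer (the
real function returns a char pointer, and this document does not say who owns
that buffer), the typedef stands in for the one in unicode.h, and surrogate
pairs are not combined, so each UTF-16 unit is encoded on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t UTF16; /* stand-in for the unicode.h typedef */

/*
 * Illustrative stand-in only. Encodes a 0-terminated UTF-16 string as UTF-8
 * into dst (at most dstsize bytes including the terminator) and returns the
 * number of bytes written, not counting the terminator.
 */
static size_t sketch_utf16toutf8(char *dst, size_t dstsize,
                                 const UTF16 *source)
{
    size_t out = 0;

    assert(dst != NULL && source != NULL && dstsize > 0);

    for (; *source != 0; source++) {
        unsigned int c = *source;
        size_t need = c < 0x80 ? 1 : c < 0x800 ? 2 : 3;

        if (out + need >= dstsize)
            break;                          /* keep room for the NUL */

        if (need == 1) {
            dst[out++] = (char)c;
        } else if (need == 2) {
            dst[out++] = (char)(0xC0 | (c >> 6));
            dst[out++] = (char)(0x80 | (c & 0x3F));
        } else {
            dst[out++] = (char)(0xE0 | (c >> 12));
            dst[out++] = (char)(0x80 | ((c >> 6) & 0x3F));
            dst[out++] = (char)(0x80 | (c & 0x3F));
        }
    }

    dst[out] = '\0';
    return out;
}
```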


Return length (in characters) of a UTF16 string:
================================================
The number of octets is the result * sizeof(UTF16).

    #include "unicode.h"
    int strlen16(const UTF16 *str);
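
The relation between characters and octets can be illustrated with a small
stand-in. sketch_strlen16() and utf16_octets() are hypothetical names used
only here, and the typedef stands in for the one in unicode.h; the sketch
assumes the string is terminated by a 0 unit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t UTF16; /* stand-in for the unicode.h typedef */

/* Stand-in for strlen16(): counts UTF-16 units before the 0 terminator. */
static int sketch_strlen16(const UTF16 *str)
{
    int len = 0;

    assert(str != NULL);
    while (str[len] != 0)
        len++;
    return len;
}

/* Octet count of the string, excluding the terminator. */
static size_t utf16_octets(const UTF16 *str)
{
    return (size_t)sketch_strlen16(str) * sizeof(UTF16);
}
```

The octet count is typically what you pass on to a hash function, which works
on bytes rather than characters.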


Create an NT hash:
==================
This converts from ISO-8859-1 or UTF-8 depending on the --utf8 option. The
function uses Alain's fast NT hashing if the length is <= 27 characters;
otherwise it uses Solar's MD4. Lengths up to MAX_PLAINTEXT_LENGTH are thus
supported with no hassle for you. If the maximum length is exceeded or
malformed data is found, it returns a negative number whose magnitude tells
you how much of the source was read. That is, a return code of -32 means you
should truncate your "saved_plain" at 32.

    #include "unicode.h"
    int E_md4hash(const UTF8 *passwd, int len, unsigned char *p16);
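
The 27-character limit is not explained here. One plausible reading, offered
as an assumption rather than a statement about the actual code, is that 27
UTF-16 characters are the most that fit into a single 64-byte MD4 block
together with the mandatory padding byte and the 8-byte length field. The
arithmetic can be checked with a small sketch; fits_single_md4_block() is a
hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

enum {
    MD4_BLOCK_BYTES = 64,   /* MD4 processes 64-byte blocks */
    MD4_MIN_PADDING = 1 + 8 /* 0x80 marker byte plus 64-bit length field */
};

/*
 * Hypothetical helper: returns 1 if a password of len_chars UTF-16
 * characters (2 octets each) fits into one MD4 block with its padding.
 */
static int fits_single_md4_block(int len_chars)
{
    size_t octets;

    assert(len_chars >= 0);
    octets = (size_t)len_chars * 2;
    return octets + MD4_MIN_PADDING <= MD4_BLOCK_BYTES;
}
```

Under that assumption, 27 characters need 54 + 9 = 63 bytes and fit, while 28
characters need 56 + 9 = 65 bytes and do not.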


NOTE that there are more functions available in ConvertUTF.c.original
(ConvertUTF.h.original) and if we need any of them, we should copy them to
unicode.c/h and simplify/optimize them.