For most hashes where MD5 is used, building a proper md5 format is likely not the best bet overall. A format is not trivial. It requires maintenance, and will likely require specific enhancements to get it to perform optimally on all hardware. Likely there will need to be 'generic' C code written, then code to tie it into CPU-specific optimizations, such as SSE, MMX, intrinsic SSE, GPU, and so on. This also means that, to stay up to date, the format will require ongoing work and maintenance.

However, there is one format which may reduce most of this maintenance work to very little. That format itself will need to be kept up to date, but the formats that are built upon its internal workings will not. That format is md5-gen. In this 'format', there is a scripting language, where a format developer only needs to describe the actual operations properly, and the format is 'done', and working.

This document will go over how to 'build' a format that uses this md5-gen format, how to optimize it to work faster, and how to build a 'thin' quasi format which insulates the end user from the md5-gen format line building.

**** Introduction ****

To start off with, a little background on 'how' and 'where' to build the scripts that run md5-gen, and what internal data structures are available to be used.

The 'where': a format developer can easily build into john by adding a new md5-gen format 'script' to the john.ini file (john.conf). This file is usually located in the current directory where john is run from (but the --config=file option can override the default behavior). Within john.conf, a new 'section' can be added for an md5 generic format. The new 'section' is named using this pattern:

[List.Generic:md5_gen(NUM)]

Replace NUM with the sub-format number (from 1001 to 9999). Pick a number that is not already used. Within this 'section', multiple lines will be added.
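A quick way to sanity-check a section header is sketched below. This is a hypothetical helper written only for illustration (john does its own parsing of john.conf, and parse_generic_section is not part of it); it encodes just the two rules stated above: the exact [List.Generic:md5_gen(NUM)] form, and a NUM in the 1001 to 9999 range.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not john's own parser): returns 1 and stores NUM
   when `line` is exactly "[List.Generic:md5_gen(NUM)]" with NUM in the
   user range 1001..9999; returns 0 otherwise. */
static int parse_generic_section(const char *line, int *num)
{
    int n = 0;
    int consumed = 0;

    /* %n is only stored if the literal ")]" after the number matched */
    if (sscanf(line, "[List.Generic:md5_gen(%d)]%n", &n, &consumed) != 1)
        return 0;
    /* reject a missing ")]" or any trailing characters after the header */
    if (consumed == 0 || line[consumed] != '\0')
        return 0;
    /* sub-format numbers for user scripts must be 1001..9999 */
    if (n < 1001 || n > 9999)
        return 0;
    *num = n;
    return 1;
}
```

Called with "[List.Generic:md5_gen(1030)]" it stores 1030 and returns 1; a number below 1001, or any text after the closing ']', returns 0. Because it is built on %d, it is lenient about details such as a leading space inside the parentheses.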
The lines within a section are primarily of the form:

Type=Value

The actual contents of these scripts will be addressed later. That will be the 'how', and performing it is actually outside the scope of this intro section.

The 'data' and runtime information is this: inside the md5-gen format, there are 2 input buffers, input1 and input2 (actually ALL data is held in arrays of 128 of each buffer type). The main operations on these buffers are to clear them, and to append data, to build a string which will later be md5 hashed. There are also 2 output buffers. These buffers receive the md5 hash of the 2 input buffers. NOTE: when the format processing is complete, the results MUST be placed into the output1 buffer. This is what all of the comparison functions check against.

In the format, there is a salt (if the format is salted). There may also be a second salt value. There are also 'keys' value(s). These are the passwords being tested at this given time. There are also 8 'constant' strings which can be used within a format. A format such as md5-po has a couple of constants within it. There are also numerous optimization 'flags' which do special things when loading keys or salts, and there are numerous special 'optimization' primitive functions within the format, for speedup of certain operations.

**** Simple format building ****

We will start out with a few simple formats, and simply 'show' how to build a straightforward script. The scripts may or may not be optimal. Later we will optimize these somewhat. When building the formats here, there will be comments interspersed, listing just what is being done, and why.
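Before the first script, it may help to picture the runtime state described in the introduction as C data. The sketch below is an assumption-laden model, not md5-gen's actual declarations: the 128-entry arrays, the two input and two output buffers, the keys, the two salts and the 8 constants come from the text, while the type names and the 256-byte entry size are made up for illustration. The two helpers show the only operations the text names for input buffers: clearing and appending.

```c
#include <assert.h>
#include <string.h>

#define GEN_MAX_KEYS   128  /* "arrays of 128 of each buffer type" */
#define GEN_BUF_LEN    256  /* assumed size of one buffer entry */
#define GEN_NUM_CONSTS 8    /* the 8 'constant' strings */

typedef struct {              /* one entry of input1 or input2 */
    char     data[GEN_BUF_LEN];
    unsigned len;
} gen_input;

typedef struct {              /* one entry of output1 or output2 */
    unsigned char digest[16]; /* raw 128-bit MD5 result */
} gen_output;

typedef struct {
    gen_input  input1[GEN_MAX_KEYS], input2[GEN_MAX_KEYS];
    gen_output output1[GEN_MAX_KEYS];      /* final result MUST end up here */
    gen_output output2[GEN_MAX_KEYS];
    char keys[GEN_MAX_KEYS][GEN_BUF_LEN];  /* passwords currently being tested */
    char salt[GEN_BUF_LEN];                /* used only by salted formats */
    char salt2[GEN_BUF_LEN];               /* optional second salt */
    const char *constants[GEN_NUM_CONSTS];
} gen_state;

/* Clear an input buffer entry. */
static void gen_clean(gen_input *in)
{
    in->len = 0;
    in->data[0] = '\0';
}

/* Append n bytes to an input buffer entry; returns 0 (and changes nothing)
   if the result would not fit. */
static int gen_append(gen_input *in, const char *s, unsigned n)
{
    if (in->len + n >= GEN_BUF_LEN)
        return 0;
    memcpy(in->data + in->len, s, n);
    in->len += n;
    in->data[in->len] = '\0';
    return 1;
}
```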
We will build these formats:

md5_gen(1030) md5($p.$p)
md5_gen(1031) md5($s.md5($p).$p)
md5_gen(1032) md5(md5($s).md5($p).$p)

[List.Generic:md5_gen(1030)]
Expression=md5_gen(1030): md5($p.$p)
Func=MD5GenBaseFunc__clean_input
Func=MD5GenBaseFunc__append_keys
Func=MD5GenBaseFunc__append_keys
Func=MD5GenBaseFunc__crypt
Test=md5_gen(1030)42b72f913c3201fc62660d512f5ac746:test1

Here is the exact same format, with some comments added, describing the sub-sections, and exactly what is being done.

#first line is the section name. It MUST be of the format shown.
[List.Generic:md5_gen(1030)]
#
#the next line is a required line. It serves 2 purposes. It is output
#by john when the format 'starts'. Also, the md5_gen(#) part is used
#to distinguish this exact format (so the command line of --sub=md5_gen(1030)
#would specify this and only this format)
#
Expression=md5_gen(1030): md5($p.$p)
#
#This is the set of functions. This is the ONLY section of the format
#where order IS important. The functions ARE handled one after the
#other, from top to bottom, to perform the string operations and md5
#operations which are needed to perform the hash of this format.
#The functions ARE a required part of the format.
#
#first step, clean the input. All work for this format is done using
#only the input 1 and output 1 buffers.
Func=MD5GenBaseFunc__clean_input
#
#Step 2, append the keys. Note, the buffer is clean, so this is simply
#the same as Input=keys (but requires 2 steps, the clean and append keys).
Func=MD5GenBaseFunc__append_keys
#
#Step 3, append keys again (the format is ($p.$p), i.e. keys appended to keys).
Func=MD5GenBaseFunc__append_keys
#
#Step 4, the final step performs md5 of $p.$p. This will properly leave the
#results in output1
Func=MD5GenBaseFunc__crypt
#
#This is the test string. These ARE required. You can provide more than
#one. 5 or 6 are best, to make sure the format is valid.
#
Test=md5_gen(1030)42b72f913c3201fc62660d512f5ac746:test1

Ok, here is the second format.
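Before looking at it, here is a rough C model of how the 1030 script above executes: one Func= line after another, top to bottom, against input1 and output1. The op codes, run_script and placeholder_digest are hypothetical names; placeholder_digest is deliberately NOT MD5, it only fills output1 so the sketch runs without a crypto library.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical op codes standing in for the Func= lines used by 1030. */
typedef enum { OP_CLEAN_INPUT, OP_APPEND_KEYS, OP_CRYPT } gen_op;

typedef struct {
    char          input1[256];
    size_t        len1;
    unsigned char output1[16];
} gen_ctx;

/* Placeholder only, NOT MD5: folds the bytes into 16 so the sketch runs.
   A real implementation would call an MD5 routine here. */
static void placeholder_digest(const char *msg, size_t len, unsigned char out[16])
{
    memset(out, 0, 16);
    for (size_t i = 0; i < len; i++)
        out[i % 16] ^= (unsigned char)msg[i];
}

/* Execute the script strictly top to bottom for one candidate password. */
static void run_script(const gen_op *ops, size_t nops, const char *key, gen_ctx *ctx)
{
    size_t klen = strlen(key);

    for (size_t i = 0; i < nops; i++) {
        switch (ops[i]) {
        case OP_CLEAN_INPUT:    /* MD5GenBaseFunc__clean_input */
            ctx->len1 = 0;
            break;
        case OP_APPEND_KEYS:    /* MD5GenBaseFunc__append_keys */
            if (ctx->len1 + klen <= sizeof ctx->input1) {
                memcpy(ctx->input1 + ctx->len1, key, klen);
                ctx->len1 += klen;
            }
            break;
        case OP_CRYPT:          /* MD5GenBaseFunc__crypt: input1 -> output1 */
            placeholder_digest(ctx->input1, ctx->len1, ctx->output1);
            break;
        }
    }
}

/* md5_gen(1030): md5($p.$p) */
static const gen_op script_1030[] = {
    OP_CLEAN_INPUT, OP_APPEND_KEYS, OP_APPEND_KEYS, OP_CRYPT
};
```

Running script_1030 with the key "test1" leaves "test1test1" (10 bytes) in input1 when the crypt step runs, which is the $p.$p string that the Test= line hashes. Replacing placeholder_digest with a real MD5 routine should be the only change needed for the model to produce actual hashes.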
The format being done is md5($s.md5($p).$p). Here are a few comments about this format:

1. There is a Flag= value. This is because this is a salted format. This REQUIRES the MGF_SALTED flag.
2. We only use input 1 and output 1.
3. There are a couple of calls to crypt (md5). The first simply gets md5($p) and puts it into output1, which will later be appended in base-16 format as we build our string.
4. After the first crypt (md5), we clear our input buffer, then put the salt in, append the base-16 of md5($p), and then append $p.
5. Finally, a call to crypt is done, which leaves the results in output1, so the rest of the md5-gen format can properly compare it.

[List.Generic:md5_gen(1031)]
Expression=md5_gen(1031): md5($s.md5($p).$p)
Flag=MGF_SALTED
Func=MD5GenBaseFunc__clean_input
Func=MD5GenBaseFunc__append_keys
Func=MD5GenBaseFunc__crypt
Func=MD5GenBaseFunc__clean_input
Func=MD5GenBaseFunc__append_salt
Func=MD5GenBaseFunc__append_from_last_output_as_base16
Func=MD5GenBaseFunc__append_keys
Func=MD5GenBaseFunc__crypt
Test=md5_gen(1031)a459f60614498dbdd9a79dcc9c538749$aabbccdd:test1

Now, here is the final format: md5(md5($s).md5($p).$p)

[List.Generic:md5_gen(1032)]
Expression=md5_gen(1032): md5(md5($s).md5($p).$p)
Flag=MGF_SALTED
Func=MD5GenBaseFunc__clean_input
Func=MD5GenBaseFunc__append_salt
Func=MD5GenBaseFunc__crypt
Func=MD5GenBaseFunc__clean_input2
Func=MD5GenBaseFunc__append_keys2
Func=MD5GenBaseFunc__crypt2
Func=MD5GenBaseFunc__clean_input
Func=MD5GenBaseFunc__append_from_last_output_as_base16
Func=MD5GenBaseFunc__append_from_last_output2_to_input1_as_base16
Func=MD5GenBaseFunc__append_keys
Func=MD5GenBaseFunc__crypt
Test=md5_gen(1032)042d1f15ed57929a2ac8ee4f0a924679$aabbccdd:test1

Ok, now that these have been built, here are a few 'benchmarks' showing that they are WORKING, and at what speed:

Here is the MinGW 'x86' build:

john_x86 -test -for=md5-gen -sub=md5_gen(1030)
Benchmarking: md5_gen(1030) md5_gen(1030): md5($p.$p) [128x1 (MD5_Go)]...
DONE
Raw: 3530K c/s

john_x86 -test -for=md5-gen -sub=md5_gen(1031)
Benchmarking: md5_gen(1031) md5_gen(1031): md5($s.md5($p).$p) [128x1 (MD5_Go)]... DONE
Many salts: 1945K c/s
Only one salt: 1890K c/s

john_x86 -test -for=md5-gen -sub=md5_gen(1032)
Benchmarking: md5_gen(1032) md5_gen(1032): md5(md5($s).md5($p).$p) [128x1 (MD5_Go)]... DONE
Many salts: 1016K c/s
Only one salt: 1031K c/s

Here is the MinGW SSE2 build:

john_sse2 -test -for=md5-gen -sub=md5_gen(1030)
Benchmarking: md5_gen(1030) md5_gen(1030): md5($p.$p) SSE2 [SSE2 32x4 (.S)]... DONE
Raw: 7250K c/s

john_sse2 -test -for=md5-gen -sub=md5_gen(1031)
Benchmarking: md5_gen(1031) md5_gen(1031): md5($s.md5($p).$p) SSE2 [SSE2 32x4 (.S)]... DONE
Many salts: 5065K c/s
Only one salt: 4436K c/s

john_sse2 -test -for=md5-gen -sub=md5_gen(1032)
Benchmarking: md5_gen(1032) md5_gen(1032): md5(md5($s).md5($p).$p) SSE2 [SSE2 32x4 (.S)]... FAILED (get_hash[0](0))

Here are some timings to check against:

john_x86 -test -for=md5-gen -sub=md5_gen(0)
Benchmarking: md5_gen(0): md5($p) (raw-md5) [128x1 (MD5_Go)]... DONE
Raw: 4005K c/s

john_sse2 -test -for=md5-gen -sub=md5_gen(0)
Benchmarking: md5_gen(0): md5($p) (raw-md5) SSE2 [SSE2 32x4 (.S)]... DONE
Raw: 10740K c/s

**** Optimizations of prior formats ****

For format 1030, the speed should be very close to that of md5_gen(0). In both formats, there is only 1 call to md5(). However, we are seeing that (1030) is slower than (0). The explanation is that the (0) format uses an optimization which we can not use in (1030). The (1030) is likely about as optimal as it can be made in the current md5-gen format. The optimization for format (0) is:

Flag=MGF_KEYS_INPUT

What that does is place the keys directly into the input field; later, when john gets the keys back (it does this if a hash is cracked), john gets them from the input. In the (1030) format, we load the keys into the 'keys' arrays.
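The difference between the two ways of loading a key can be sketched as follows. This is a hypothetical model: the array and function names are invented for illustration and only mirror the behavior the text attributes to MGF_KEYS_INPUT (keys placed directly in input1, and read back from input1 when a hash is cracked).

```c
#include <assert.h>
#include <string.h>

#define MAX_KEYS 128
#define BUF_LEN  256

static char   keys[MAX_KEYS][BUF_LEN];    /* the 'keys' arrays */
static char   input1[MAX_KEYS][BUF_LEN];  /* input buffer 1 */
static size_t input1_len[MAX_KEYS];

/* Default path: the key is stored in keys[i] only. */
static void set_key_default(int i, const char *key)
{
    strncpy(keys[i], key, BUF_LEN - 1);
    keys[i][BUF_LEN - 1] = '\0';
}

/* What clean_input + append_keys then have to do for each candidate:
   an extra copy from keys[i] into input1[i]. */
static void clean_and_append_keys(int i)
{
    size_t n = strlen(keys[i]);
    memcpy(input1[i], keys[i], n + 1);   /* +1 copies the terminator too */
    input1_len[i] = n;
}

/* Model of the MGF_KEYS_INPUT idea: the key goes straight into input1. */
static void set_key_into_input1(int i, const char *key)
{
    size_t n = strlen(key);
    if (n > BUF_LEN - 1)
        n = BUF_LEN - 1;
    memcpy(input1[i], key, n);
    input1[i][n] = '\0';
    input1_len[i] = n;
}

/* With MGF_KEYS_INPUT, recovering a cracked key reads input1 back, so
   input1 must contain ONLY the key; md5($p.$p) breaks that rule. */
static const char *get_key_from_input1(int i)
{
    return input1[i];
}
```

In the default path every candidate pays for the extra copy in clean_and_append_keys before the crypt; writing the key straight into input1 removes that copy, but only works because input1 then holds nothing except the key.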
We then have to call a function to clean input buffer 1, and to append the keys (twice). Thus, what we have is additional memory movement, and that slows things down. However, to use the MGF_KEYS_INPUT optimization, we would have had to keep the input1 buffer pristine and ONLY put in the keys (passwords). Since we had to append the keys twice, we simply 'blew' that requirement, and thus could NOT use it. At a later time, we will show a format WHERE we can use this optimization.

For format 1031, there also appear to be no optimizations available.

For 1032, there are optimizations. In this format, we notice that we have this sub-expression: md5($s). Well, there is an optimization (flag MGF_SALT_AS_HEX, used in format 1042) which, when the input file is loaded, converts all salts into md5($s) and uses that value instead. So, at startup time, we perform md5 hashes of all salts, but at runtime, we simply place the salt into the string being built, instead of performing an MD5 on the salt. So, in 1032, we had 3 calls to crypt. By using this optimization, we can reduce that to 2 crypts. The starting format is:

md5(md5($s).md5($p).$p)

This optimization makes the format 'behave' at runtime like it is md5($s.md5($p).$p), which was format 1031. Note, after we make this optimization, the timings should move much closer to the 1031 timings. Also note, the Test strings for 1032 and 1042 are exactly the same. These are the same formats. It is just that 1042 performs fewer crypt calls per test.

Also note, in the 'original' run of SSE2, the 1032 format failed. This failure is due to the SSE2 / MMX code only working for strings up to 54 bytes (for optimization reasons). The string md5($s).md5($p) is 64 bytes by itself, and we also append $p to that. Thus, our string is OVER 54 bytes in length, and can not be used in SSE2 mode. We do have a couple of workarounds for this, to get it working properly on SSE2 builds.
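The string 1032 builds, and why it trips the SSE2 limit, can be modeled as below. build_1032 walks the same Func= sequence as the 1032 script; placeholder_digest is NOT MD5 (it only produces 16 bytes so the code runs), lowercase hex output is an assumption, and SSE2_MAX_LEN is simply the 54-byte figure quoted above.

```c
#include <assert.h>
#include <string.h>

#define SSE2_MAX_LEN 54   /* limit quoted in the text for the SSE2/MMX path */

/* Placeholder only, NOT MD5 (stands in for the real digest). */
static void placeholder_digest(const char *msg, size_t len, unsigned char out[16])
{
    memset(out, 0, 16);
    for (size_t i = 0; i < len; i++)
        out[i % 16] ^= (unsigned char)msg[i];
}

/* Append a 16-byte digest as 32 base-16 characters (lowercase assumed). */
static void append_base16(char *buf, size_t *len, const unsigned char d[16])
{
    static const char hex[] = "0123456789abcdef";
    for (int i = 0; i < 16; i++) {
        buf[(*len)++] = hex[d[i] >> 4];
        buf[(*len)++] = hex[d[i] & 0x0f];
    }
}

/* Models the Func= lines of md5_gen(1032) for one salt/key pair and returns
   the length of the string handed to the final crypt. Assumes salt and key
   are each shorter than 128 bytes. */
static size_t build_1032(const char *salt, const char *key,
                         char input1[256], unsigned char output1[16])
{
    char input2[256];
    unsigned char output2[16];
    size_t slen = strlen(salt), klen = strlen(key), len1;

    /* clean_input, append_salt, crypt: output1 = md5($s) */
    memcpy(input1, salt, slen);
    placeholder_digest(input1, slen, output1);

    /* clean_input2, append_keys2, crypt2: output2 = md5($p) */
    memcpy(input2, key, klen);
    placeholder_digest(input2, klen, output2);

    /* clean_input, append both digests as base-16, then append_keys */
    len1 = 0;
    append_base16(input1, &len1, output1);
    append_base16(input1, &len1, output2);
    memcpy(input1 + len1, key, klen);
    len1 += klen;

    /* crypt: the result lands in output1 for comparison */
    placeholder_digest(input1, len1, output1);
    return len1;
}

/* 1 if a string of this length fits the SSE2 path described above. */
static int sse2_safe(size_t len)
{
    return len <= SSE2_MAX_LEN;
}
```

Because the two base-16 digests alone are 64 bytes, build_1032 returns at least 64 even for an empty password, so the final crypt of 1032 can never fit under 54 bytes, while a 1031 string with an 8-byte salt and a 5-byte password (8 + 32 + 5 = 45 bytes) does.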
We can use a flag which simply stops SSE2 dead in its tracks (and performs all work using x86 code). This is flag MGF_NOTSSE2Safe.

[List.Generic:md5_gen(1042)]
Expression=md5_gen(1042): md5(md5($s).md5($p).$p)
Flag=MGF_SALTED
Flag=MGF_SALT_AS_HEX
Flag=MGF_NOTSSE2Safe
Func=MD5GenBaseFunc__clean_input
Func=MD5GenBaseFunc__append_keys
Func=MD5GenBaseFunc__crypt
Func=MD5GenBaseFunc__clean_input
Func=MD5GenBaseFunc__append_salt
Func=MD5GenBaseFunc__append_from_last_output_as_base16
Func=MD5GenBaseFunc__append_keys
Func=MD5GenBaseFunc__crypt
Test=md5_gen(1042)042d1f15ed57929a2ac8ee4f0a924679$aabbccdd:test1

Once the above changes have been made, here are the speeds:

john_x86 -test=5 -for=md5-gen -sub=md5_gen(1031)
Benchmarking: md5_gen(1031) md5_gen(1031): md5($s.md5($p).$p) [128x1 (MD5_Go)]... DONE
Many salts: 2007K c/s
Only one salt: 1913K c/s

john_x86 -test=5 -for=md5-gen -sub=md5_gen(1032)
Benchmarking: md5_gen(1032) md5_gen(1032): md5(md5($s).md5($p).$p) [128x1 (MD5_Go)]... DONE
Many salts: 1052K c/s
Only one salt: 1030K c/s

john_x86 -test=5 -for=md5-gen -sub=md5_gen(1042)
Benchmarking: md5_gen(1042) md5_gen(1042): md5(md5($s).md5($p).$p) [128x1 (MD5_Go)]... DONE
Many salts: 1420K c/s
Only one salt: 1372K c/s

john_sse2 -test=5 -for=md5-gen -sub=md5_gen(1042)
Benchmarking: md5_gen(1042) md5_gen(1042): md5(md5($s).md5($p).$p) SSE2 [128x1 (MD5_Go)]... DONE
Many salts: 1416K c/s
Only one salt: 1372K c/s

Note that the SSE2 build now reports [128x1 (MD5_Go)]: with MGF_NOTSSE2Safe set, it runs the x86 code, which is why its speed matches the john_x86 result.

We can perform even more optimizations in this format. First, we md5 the salt (when we first load the file). Thus the salts which john works with are really md5($s) (same as we did in format 1042). Then we use a different flag (MGF_KEYS_BASE16_IN1_Offset32), which puts md5($p) in base-16 into offset 32 of input1 (where we want it). Then we simply overwrite the data in input 1 with the salt (which is md5($s) in base-16 format), then force-set the length to 64, then append the keys, then crypt.
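The input1 layout that 1052 relies on can be modeled as below. The function names echo the Func= and Flag= names of the 1052 script that follows, but the bodies are a hypothetical sketch of the described behavior, not md5-gen's code; pass_hex and salt_hex are assumed to be the 32-character base-16 forms of md5($p) and md5($s).

```c
#include <assert.h>
#include <string.h>

typedef struct {
    char   data[256];
    size_t len;
} input_buf;

/* Key-load time (models MGF_KEYS_BASE16_IN1_Offset32): the 32 base-16
   characters of md5($p) are written at offset 32 of input1. */
static void load_key_hex_at_32(input_buf *in, const char *pass_hex)
{
    memcpy(in->data + 32, pass_hex, 32);
}

/* MD5GenBaseFunc__overwrite_salt_to_input1_no_size_fix: the salt, already
   md5($s) in base-16 (MGF_SALT_AS_HEX), goes to offset 0; len is untouched. */
static void overwrite_salt_no_size_fix(input_buf *in, const char *salt_hex)
{
    memcpy(in->data, salt_hex, 32);
}

/* MD5GenBaseFunc__set_input_len_64 */
static void set_input_len_64(input_buf *in)
{
    in->len = 64;
}

/* MD5GenBaseFunc__append_keys (skips the append if it would overflow) */
static void append_keys(input_buf *in, const char *key)
{
    size_t n = strlen(key);
    if (in->len + n <= sizeof in->data) {
        memcpy(in->data + in->len, key, n);
        in->len += n;
    }
}
```

In this model only load_key_hex_at_32 depends on the password alone, so in a crypt-all loop that sets the keys once and then walks every salt, it would run once per key rather than once per salt; per salt, only the overwrite, the length fix, the append and one MD5 remain. That is presumably why the 'Many salts' figure for 1052 improves far more than the 'Only one salt' figure.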
[List.Generic:md5_gen(1052)]
Expression=md5_gen(1052): md5(md5($s).md5($p).$p)
Flag=MGF_SALTED
Flag=MGF_SALT_AS_HEX
Flag=MGF_KEYS_BASE16_IN1_Offset32
Flag=MGF_NOTSSE2Safe
Func=MD5GenBaseFunc__overwrite_salt_to_input1_no_size_fix
Func=MD5GenBaseFunc__set_input_len_64
Func=MD5GenBaseFunc__append_keys
Func=MD5GenBaseFunc__crypt
Test=md5_gen(1052)042d1f15ed57929a2ac8ee4f0a924679$aabbccdd:test1

Here are the benchmarks for the above format:

john_x86 -test=5 -for=md5-gen -sub=md5_gen(1052)
Benchmarking: md5_gen(1052) md5_gen(1052): md5(md5($s).md5($p).$p) [128x1 (MD5_Go)]... DONE
Many salts: 2251K c/s
Only one salt: 1369K c/s

john_sse2 -test=5 -for=md5-gen -sub=md5_gen(1052)
Benchmarking: md5_gen(1052) md5_gen(1052): md5(md5($s).md5($p).$p) SSE2 [128x1 (MD5_Go)]... DONE
Many salts: 2251K c/s
Only one salt: 1369K c/s

Now, note the speed for 'many salts'. It is very close to the speed of (1031), actually faster. This is the speed john will have for normal password cracking, where you have dozens (or hundreds, or 1000's) of password hashes to crack. To understand WHY this format is this much faster ('Many salts' is the normal way to benchmark the speed of a salted hash), you need to understand what is happening under the hood within john's 'crypt all' loop.

while (!feof(password_file)) {
    for (i = 0 to max_num_passwords)
        SetKey(i, getnextpassword(password_file));
    if (salted) {
        while (z