==================== PRELUDE: ====================

The original implementation was ca. 2004 by Ryan Lim, as an academic
project. It was later picked up and maintained at bindshell.net, adding
fixes for the JtR 1.7 releases and various cipher patches.

In 2008, it was picked up by AoZ and stripped back down to the original
MPI-only changes, to improve its compatibility with the 'jumbo'
patchsets, which had better-maintained alternate cipher support. This
is often referred to as "the mpi10 patch".

In 2010, it was extended by magnum to support all cracking modes. This
should be referred to as "the fullmpi patch" to avoid confusion. With
the exception of Markov mode, it is far from perfect, but it works just
fine and should support correct resuming in all modes. It is well
tested but you have absolutely NO guarantees.

==================== COMPILING: ====================

Unless using OMP, you should consider applying the nsk-3 patch, also
known as "Faster bitslice DES key setup".

To enable MPI in John, uncomment these two lines in the Makefile:

----8<--------------8<--------------8<--------------8<--------------8<----------
# Uncomment the TWO lines below for MPI (can be used together with OMP as well)
CC = mpicc -DHAVE_MPI
MPIOBJ = john-mpi.o
----8<--------------8<--------------8<--------------8<--------------8<----------

You must have an operational MPI environment prior to both compiling
and using the MPI version. Configuring one is outside the scope of this
document, but for a single multi-core host you don't need much
configuration. Either MPICH2 or OpenMPI will do the job fine. Most
testing of fullmpi is now done under the latest stable OpenMPI.

Debian Linux example for installing OpenMPI:

  sudo apt-get install libopenmpi-dev openmpi-bin

Note that this patch works just fine with OMP enabled as well. When MPI
is in use (with more than one process), OMP is automatically disabled
by default. Advanced users may want to change this setting (change
MPIOMPmutex to N in john.conf) and start one MPI node per multi-core
host, letting OMP do the rest. Warnings are printed; these can be muted
in john.conf too.

==================== USAGE: ====================

Typical invocation is as follows:

  mpiexec -np 4 ./john --incremental passwd

The above will launch four parallel processes that split the
Incremental keyspace in a more-or-less even fashion. If you run it to
completion, however, some nodes will finish very early due to how this
mode is implemented, decreasing overall performance. This problem gets
much worse with a large number of nodes.

In MARKOV mode, the range is automatically split evenly across the
nodes, just like you could do manually. This does not introduce any
overhead, assuming the job runs to completion - and also assuming your
MPI compiler behaves.

The SINGLE and WORDLIST modes scale fairly well, and no cleartext will
be tried by more than one node (except when different word + rule
combinations produce the same candidate, but that problem is not MPI
specific).

In SINGLE mode, and sometimes in WORDLIST mode (see below), john will
distribute (actually leapfrog) the rules, after preprocessor expansion.
This works very well but will likely not result in a perfectly even
workload across nodes.

WORDLIST mode with rules works the same way. Without rules, or when
rules can't be split across the nodes, john will distribute (again, it
really just leapfrogs) the words instead. This is practically the same
as using the External:Parallel example filter in john.conf (shown
abridged below), but much more user friendly.
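
For comparison, the External:Parallel filter shipped in john.conf looks
roughly like this (abridged here; note that node and total have to be
hand-edited for every session, which is exactly the chore the MPI
version automates):

  [List.External:Parallel]
  int node, total;              // This node's number, and node count
  int number;                   // Current word number

  void init()
  {
          node = 1; total = 2;  // Node 1 of 2; edit for each session
          number = node - 1;
  }

  void filter()
  {
          if (number++ % total) // Word for another node?
                  word = 0;     // Yes, skip it
  }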
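
Conceptually, the MPI leapfrog is the same modulo scheme, except that
each process gets its node number and the node count from the MPI
runtime rather than from a hand-edited config file. The following is a
minimal stand-alone sketch of the idea, not the actual john-mpi code
(the word list is just a stand-in):

  /* leapfrog.c - each rank keeps every size-th word, offset by rank */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank, size, i;
      const char *words[] = { "alpha", "bravo", "charlie", "delta",
                              "echo", "foxtrot", "golf", "hotel" };
      int count = sizeof(words) / sizeof(words[0]);

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this node, 0..size-1 */
      MPI_Comm_size(MPI_COMM_WORLD, &size); /* total number of nodes */

      for (i = 0; i < count; i++)
          if (i % size == rank)             /* our share; skip the rest */
              printf("node %d tries %s\n", rank, words[i]);

      MPI_Finalize();
      return 0;
  }

Build with "mpicc leapfrog.c -o leapfrog" and run with "mpiexec -np 4
./leapfrog"; every word is tried by exactly one rank, with no
coordination needed beyond the initial rank assignment.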

If the --mem-file-size parameter (default 5000000) allows the file to
be loaded into memory, this will be preferred, and each node will load
only its own share of the words. In this case there is no further
leapfrogging and no other overhead. Note that the limit is per node, so
with the default value and four nodes, a 16 MB file WILL be loaded into
memory, 4 MB on each node.

You can override the leapfrog selection. This is really debug code and
should eventually be replaced by proper options:

  --mem-file-size=0 (force split loading, no leapfrog)
  --mem-file-size=1 (force leapfrogging of words)
  --mem-file-size=2 (force leapfrogging of rules)

In EXTERNAL mode, john will distribute candidates in the same way as in
WORDLIST mode without rules. That is, all candidates will be produced
on all nodes and then skipped by all nodes but one. This is the mode
where the fullmpi patch performs worst; when attacking very fast
formats, it scales VERY poorly.

You may send a USR1 signal to the parent MPI process (or HUP to all
individual processes) to make the subprocesses print out their status.
Be aware that the lines may not appear in order, because the processes
blindly share the same terminal.

  skill -USR1 -c mpiexec

Another approach is a normal status print. This must be done with
mpiexec, using the same -np as was used for starting the job:

  mpiexec -np 4 ./john --status

This will dump the status of each process as recorded in the .rec
files. This way you also get a line with the total statistics.

==================== CAVEATS: ====================

- This implementation does not account for heterogeneous clusters or
  nodes that come and go.
- In the interest of cooperating with other patches, benchmarking is
  less accurate. Specifically, it assumes all participating cores are
  as fast as the fastest one.
- The benchmark virtual c/s figure will appear inflated if you launch
  more processes than there are cores available. It basically indicates
  what the speed would be with that many real cores.
- There is no inter-process communication of cracked hashes yet. This
  means that if one node cracks a hash, all other nodes will continue
  to waste time on it. The current workaround is aborting and
  restarting the jobs regularly. It also means that you may have to
  manually stop some or all nodes after all hashes are cracked.
- Aborting a job using ctrl-c will often kill all nodes without
  updating state files and logs. I have tried to mitigate this, but it
  is still a good idea to send a USR1 to the parent before killing
  them. You should lower the SAVE parameter in john.conf to 60
  (seconds) when running MPI; this will then be the maximum amount of
  work repeated after a restart.

============================================================
Following is the verbatim original content of this file:
============================================================

This distribution of John the Ripper (1.6.36) requires MPI to compile.
If you don't have MPI, download and install it before proceeeding.

Any bugs, patches, comments or love letters should be sent to
jtr-mpi@hash.ryanlim.com. Hate mail, death threates should be sent to
/dev/null.

Enjoy.
-- Ryan Lim