Testing the cluster will be done using a proof-of-work concept. A SHA256 hash is a 64 character hex string from which the initial input cannot be determined. If the string "aabemr" is hashed the output is:
00008e6ca6344edf9c18f10aece54f9310e4168d04dcc311dc86076ec07cff8c
Note that this hash has four leading zeros. This will be the basis of our test. We will be looking for combinations of six lower case letters that, when hashed using the SHA256 algorithm, produce a hash with four or more leading zeros.
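The hashing step can be reproduced directly with Python's standard hashlib module:

```python
from hashlib import sha256

# Hash the string 'aabemr' and inspect the 64 character hex digest.
digest = sha256('aabemr'.encode()).hexdigest()
print(digest)       # the full 64 character hex string
print(digest[:4])   # the leading characters we test against '0000'
```
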
Set up performance
For this project I will use six letter combinations of lower case letters with repeats. This mimics a password cracker that tries to brute force a password by testing every available possibility. ISPs often use combinations of uppercase, lowercase and numbers to set wifi passwords. The length of these passwords is usually ten characters. The number of combinations (with repetition, order ignored) can be calculated using:
C = (n + r - 1)! / (r! (n - 1)!)
If the value for n is 62 (26 uppercase, 26 lowercase and 10 numerals) and the value for r is 10, the value for C is 461,738,052,776, which makes brute forcing passwords almost impossible for an ordinary mortal.
We will be using n = 26 and r = 6, so our number of combinations is 736,281, which should be enough to make timing worthwhile.
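Both counts can be checked against the formula above using math.comb from the standard library:

```python
from math import comb

# Combinations with repetition: C = (n + r - 1)! / (r! (n - 1)!) = comb(n + r - 1, r)
def n_combos(n, r):
    return comb(n + r - 1, r)

print(n_combos(62, 10))  # ten-character passwords from a 62-symbol alphabet
print(n_combos(26, 6))   # six-letter lower case nonces
```
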
The code to generate these strings is unassuming:
import time
import itertools

start = time.perf_counter()
combos = list(itertools.combinations_with_replacement('abcdefghijklmnopqrstuvwxyz', 6))
nonce_list = []
for combo in combos:
    nonce = ''
    for letter in combo:
        nonce += letter
    nonce_list.append(nonce)
stop = time.perf_counter()
print(f'Data generated in {stop - start} sec.')
On a Raspberry Pi 3B this takes ~13 sec to run. The combos list requires ~71 MB of memory and the nonce_list requires ~47 MB of memory. The final nonce list contains 736,281 entries.
The proof-of-work function
The POW function needs to look through a list of nonces and find those that, when hashed using the SHA256 algorithm, generate a hash with at least four leading zeros. The function looks like this:
def check_hash(nonce_list):
    results = []
    for nonce in nonce_list:
        hash = sha256(nonce.encode()).hexdigest()
        if hash[:4] == '0000':
            results.append([nonce, hash])
    return results
This structure also allows us to compare the effects of parallel processing, since the task being performed is the same for both the single computer and the cluster.
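As a quick sanity check, calling check_hash on a short list containing the "aabemr" example from earlier should return that nonce paired with its hash (the second string here is just an arbitrary non-matching filler):

```python
from hashlib import sha256

def check_hash(nonce_list):
    results = []
    for nonce in nonce_list:
        hash = sha256(nonce.encode()).hexdigest()
        if hash[:4] == '0000':
            results.append([nonce, hash])
    return results

# 'aabemr' is known to hash to four leading zeros.
matches = check_hash(['aabemr', 'abcdef'])
for nonce, digest in matches:
    print(nonce, digest)
```
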
Mining code - single machine
The code to analyse the entire nonce list is:
import time
import itertools
from hashlib import sha256

alpha = 'abcdefghijklmnopqrstuvwxyz'
combos = list(itertools.combinations_with_replacement(alpha, 6))

def check_hash(nonce_list):
    results = []
    for nonce in nonce_list:
        hash = sha256(nonce.encode()).hexdigest()
        if hash[:4] == '0000':
            results.append([nonce, hash])
    return results

print('Code starting. Generating data. Please wait.')
start = time.process_time()
nonce_list = []
for combo in combos:
    nonce = ''
    for letter in combo:
        nonce += letter
    nonce_list.append(nonce)
lap_time = time.process_time()
print(f'Data generated in {lap_time - start} sec.')
print('Processing data.')
results = check_hash(nonce_list)
for result in results:
    print(result)
stop = time.process_time()
print('Execution time: ', (stop - lap_time))
print('Total time: ', (stop - start))
As you can see, it takes about three times as long to generate the data as it does to analyse it. That aside, we have a benchmark time of 5.5 sec to beat with parallel processing.
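Since data generation dominates the total time, one alternative (not used in this project, so treat it as a sketch) is to skip the intermediate lists entirely and hash each candidate as it is generated:

```python
import time
import itertools
from hashlib import sha256

start = time.process_time()
results = []
# Hash each six-letter combination as it is produced, without
# materialising the combos or nonce_list lists first.
for combo in itertools.combinations_with_replacement('abcdefghijklmnopqrstuvwxyz', 6):
    nonce = ''.join(combo)
    digest = sha256(nonce.encode()).hexdigest()
    if digest[:4] == '0000':
        results.append([nonce, digest])
stop = time.process_time()
print(f'{len(results)} matches in {stop - start} sec.')
```

This trades the ~118 MB of list storage for a single pass, though it cannot be scattered across nodes in this form.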
Mining code - parallel processing
To run this code on the cluster it needs to be modified to this:
import time
import itertools
from hashlib import sha256
from mpi4py import MPI
from socket import gethostname

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

alpha = 'abcdefghijklmnopqrstuvwxyz'
combos = list(itertools.combinations_with_replacement(alpha, 6))

def check_hash(nonce_list):
    results = []
    for nonce in nonce_list:
        hash = sha256(nonce.encode()).hexdigest()
        if hash[:4] == '0000':
            results.append([nonce, hash])
    #print(gethostname(), results)
    return results

start = time.perf_counter()
if rank == 0:
    print('Generating data. Please wait.')
    nonce_list = []
    for combo in combos:
        nonce = ''
        for letter in combo:
            nonce += letter
        nonce_list.append(nonce)
    data_time = time.perf_counter()
    print(f'Data generated in {data_time - start} sec.')

NODES = 4  # This has to be hard coded to match the nodes in the execution command.

if rank == 0:
    print('Preparing data for scatter')
    data = []
    for i in range(NODES):
        data.append([])
    for i in range(len(nonce_list)):
        data[i % NODES].append(nonce_list[i])
    divide_time = time.perf_counter()
    print(f'Prepared for scatter in {divide_time - data_time} sec.')
else:
    data = None

# This is the actual parallel processing part so this is the part I will time.
if rank == 0:
    print('Parallel processing commencing.')
parallel_start = time.perf_counter()
data = comm.scatter(data, root=0)
results = check_hash(data)
new_data = comm.gather(results, root=0)
if rank == 0:
    for result in new_data:
        print(result)
stop = time.perf_counter()
if rank == 0:  # Only the master should print the execution time.
    print('Processing time: ', (stop - parallel_start))
    print('Total time: ', (stop - start))
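The round-robin split performed by data[i % NODES] in the scatter-preparation step is easier to see on a toy list:

```python
# Distribute ten items across four buckets the same way the
# scatter-preparation loop does: item i goes to bucket i % NODES.
NODES = 4
items = list(range(10))
data = [[] for _ in range(NODES)]
for i in range(len(items)):
    data[i % NODES].append(items[i])
print(data)  # [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]
```

Each of the NODES sub-lists then becomes the chunk that comm.scatter hands to one process.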
Even when the task is scattered over only four processes, the output is:
andrew@master:~ $ python execute.py mining_parallel.py 4
mining_parallel.py 100% 1724 813.4KB/s 00:00
mining_parallel.py 100% 1724 793.8KB/s 00:00
mining_parallel.py 100% 1724 783.6KB/s 00:00
Generating data. Please wait.
Data generated in 14.442740144993877 sec.
Preparing data for scatter
Prepared for scatter in 1.1794833989988547 sec.
Parallel processing commencing.
[['bcjlwz', '0000c9185bb896a481c681a3924717f8a7978571f4fee4bc7dc3d80294ee1bd9'],
['dgjkpt', '0000df90701603046058377eff15e8392f33661cf05a994a4cec491a0a3a2848'],
['ffmnot', '00005cefea5db1363b60dc309f5d0bac0c6560dbe048a1943949ea80fcca57c7'],
['gnptyz', '00003c09995fc8d10f7de62bdaedfb10fc672592e8e7b107d904d267e867b1d9']]
[['bfikrt', '0000805b0c879dfd5caed75add6709226647c533a94cbd40f3903716e49261f4'],
['cghmqu', '00007805f52991eca88484bf6cf228e3b1f1a40909a613fd1fa0f6ffe16b5747'],
['filuuz', '0000a28ef6055e9a1813881f28e6eb8a0ba939868017705c5a76156375948bab']]
[['aabemr', '00008e6ca6344edf9c18f10aece54f9310e4168d04dcc311dc86076ec07cff8c'],
['ahirsv', '00000f27f60cb204372da2817b04a7141bef0420d2668f91104fb418719f0e72'],
['ccklvx', '0000197ba3cba0c4963bc51bbf8cb5ecb9d0b0b4d6eaf8a8b22cf7659dd5bb98'],
['cgikkz', '000023e46b7bd23d513456e39abbd0b11c8525241cf621933086340f03d439c9'],
['dfgizz', '00002511546d6f76d576efef2a63bd190a5a8c1ee5c70fd14d432873495fe364']]
[['acknop', '0000b08e6f1d2394ecdbc8a6e9f0059691ecc09f4d1259d2aab94cb30497c1a6'],
['lmqrst', '00004174ccf6831504a807bbe7fbfc47228d34c8d483b9ad3aaf4bd1336c5ff5']]
Processing time: 3.4473998640023638
Total time: 19.069701376996818
The line breaks at the end of each nonce, hash pair are my formatting to make this easier to interpret. The processing time can be seen as 3.45 sec, about 40% less than the 5.5 sec single-machine benchmark when run across all four nodes.
The final test is to use the full 16 cores at our disposal. The line,
NODES = 4
needs to be changed to,
NODES = 16
The final output is:
andrew@master:~ $ python execute.py mining_parallel.py 16
mining_parallel.py 100% 1725 717.3KB/s 00:00
mining_parallel.py 100% 1725 823.0KB/s 00:00
mining_parallel.py 100% 1725 767.1KB/s 00:00
Generating data. Please wait.
Data generated in 19.46768968399556 sec.
Preparing data for scatter
Prepared for scatter in 1.1892309429968009 sec.
Parallel processing commencing.
[['gnptyz', '00003c09995fc8d10f7de62bdaedfb10fc672592e8e7b107d904d267e867b1d9']]
[['filuuz', '0000a28ef6055e9a1813881f28e6eb8a0ba939868017705c5a76156375948bab']]
[]
[['acknop', '0000b08e6f1d2394ecdbc8a6e9f0059691ecc09f4d1259d2aab94cb30497c1a6']]
[['dgjkpt', '0000df90701603046058377eff15e8392f33661cf05a994a4cec491a0a3a2848'],
['ffmnot', '00005cefea5db1363b60dc309f5d0bac0c6560dbe048a1943949ea80fcca57c7']]
[['bfikrt', '0000805b0c879dfd5caed75add6709226647c533a94cbd40f3903716e49261f4']]
[['ahirsv', '00000f27f60cb204372da2817b04a7141bef0420d2668f91104fb418719f0e72'],
['cgikkz', '000023e46b7bd23d513456e39abbd0b11c8525241cf621933086340f03d439c9']]
[['lmqrst', '00004174ccf6831504a807bbe7fbfc47228d34c8d483b9ad3aaf4bd1336c5ff5']]
[['bcjlwz', '0000c9185bb896a481c681a3924717f8a7978571f4fee4bc7dc3d80294ee1bd9']]
[['cghmqu', '00007805f52991eca88484bf6cf228e3b1f1a40909a613fd1fa0f6ffe16b5747']]
[['aabemr', '00008e6ca6344edf9c18f10aece54f9310e4168d04dcc311dc86076ec07cff8c'],
['ccklvx', '0000197ba3cba0c4963bc51bbf8cb5ecb9d0b0b4d6eaf8a8b22cf7659dd5bb98'],
['dfgizz', '00002511546d6f76d576efef2a63bd190a5a8c1ee5c70fd14d432873495fe364']]
[]
[]
[]
[]
[]
Processing time: 2.5700334959983593
Total time: 23.22703177999938
This time we can see that some processes (six, in fact) did not find any suitable hashes at all. Note also that the processing time has dropped to 2.57 sec. Even with the added overhead of scattering and gathering, the cluster is still more than twice as fast as the single computer, which was probably accessing all four of its cores anyway.
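Putting the measured times side by side gives a rough speedup figure (parallel efficiency here is simply speedup divided by the process count):

```python
# Speedup relative to the 5.5 sec single-machine benchmark.
single = 5.5
four_proc = 3.45
sixteen_proc = 2.57

print(f'4 processes:  {single / four_proc:.2f}x speedup')
print(f'16 processes: {single / sixteen_proc:.2f}x speedup, '
      f'{single / sixteen_proc / 16:.0%} parallel efficiency')
```

The low efficiency at 16 processes reflects the fixed scatter/gather overhead and the small amount of work each process receives.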