Testing the cluster will be done using a proof-of-work concept. A SHA256 hash is a 64 character hex string from which the initial input cannot be determined. If the string "aabemr" is hashed the output is:
00008e6ca6344edf9c18f10aece54f9310e4168d04dcc311dc86076ec07cff8c
Note that this hash has four leading zeros. This will be the basis of our test. We will be looking for combinations of six lower case letters that, when hashed using the SHA256 algorithm, produce a hash with four or more leading zeros.
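The hashing step can be reproduced directly with Python's standard hashlib module:

```python
from hashlib import sha256

# Hash the string 'aabemr' and inspect the 64 character hex digest.
digest = sha256('aabemr'.encode()).hexdigest()
print(digest)       # the full 64 character hex string
print(digest[:4])   # the leading characters we test against '0000'
```
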
Set up performance
For this project I will use six letter combinations of lower case letters with repeats. This mimics a password cracker that tries to brute force a password by testing every available possibility. ISPs often use combinations of uppercase, lowercase and numbers to set wifi passwords. The length of these passwords is usually ten characters. The number of combinations (with repetition, order ignored) can be calculated using:
C = (n + r - 1)! / (r! (n - 1)!)
If the value for n is 62 (26 uppercase, 26 lowercase and 10 numerals) and the value for r is 10, the value for C is 461,738,052,776, which makes brute forcing passwords almost impossible for an ordinary mortal.
We will be using n = 26 and r = 6, so our number of combinations is 736,281, which should be enough to make timing worthwhile.
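Both counts can be checked against the formula above using math.comb from the standard library:

```python
from math import comb

# Combinations with repetition: C = (n + r - 1)! / (r! (n - 1)!) = comb(n + r - 1, r)
def n_combos(n, r):
    return comb(n + r - 1, r)

print(n_combos(62, 10))  # ten-character passwords from a 62-symbol alphabet
print(n_combos(26, 6))   # six-letter lower case nonces
```
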
The code to generate these strings is unassuming:
import time
import itertools

start = time.perf_counter()
combos = list(itertools.combinations_with_replacement('abcdefghijklmnopqrstuvwxyz', 6))
nonce_list = []
for combo in combos:
    nonce = ''
    for letter in combo:
        nonce += letter
    nonce_list.append(nonce)
stop = time.perf_counter()
print(f'Data generated in {stop - start} sec.')
On a Raspberry Pi 3B this takes ~13 sec to run. The combos list requires ~71 MB of memory and the nonce_list requires ~47 MB of memory. The final nonce list contains 736,281 entries.
The proof-of-work function
The POW function needs to look through a list of nonces and find those that, when hashed using the SHA256 algorithm, generate a hash with at least four leading zeros. The function looks like this:
def check_hash(nonce_list):
    results = []
    for nonce in nonce_list:
        hash = sha256(nonce.encode()).hexdigest()
        if hash[:4] == '0000':
            results.append([nonce, hash])
    return results
This structure also allows us to compare the effects of parallel processing, since the task being performed is the same for both the single computer and the cluster.
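As a quick sanity check, calling check_hash on a short list containing the "aabemr" example from earlier should return that nonce paired with its hash (the second string here is just an arbitrary non-matching filler):

```python
from hashlib import sha256

def check_hash(nonce_list):
    results = []
    for nonce in nonce_list:
        hash = sha256(nonce.encode()).hexdigest()
        if hash[:4] == '0000':
            results.append([nonce, hash])
    return results

# 'aabemr' is known to hash to four leading zeros.
matches = check_hash(['aabemr', 'abcdef'])
for nonce, digest in matches:
    print(nonce, digest)
```
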
Mining code - single machine
The code to analyse the entire nonce list is:
import time
import itertools
from hashlib import sha256

alpha = 'abcdefghijklmnopqrstuvwxyz'
combos = list(itertools.combinations_with_replacement(alpha, 6))

def check_hash(nonce_list):
    results = []
    for nonce in nonce_list:
        hash = sha256(nonce.encode()).hexdigest()
        if hash[:4] == '0000':
            results.append([nonce, hash])
    return results

print('Code starting. Generating data. Please wait.')
start = time.process_time()
nonce_list = []
for combo in combos:
    nonce = ''
    for letter in combo:
        nonce += letter
    nonce_list.append(nonce)
lap_time = time.process_time()
print(f'Data generated in {lap_time - start} sec.')
print('Processing data.')
results = check_hash(nonce_list)
for result in results:
    print(result)
stop = time.process_time()
print('Execution time: ', (stop - lap_time))
print('Total time: ', (stop - start))
As you can see, it takes about three times as long to generate the data as it does to analyse it. That aside, we have a benchmark time of 5.5 sec to beat with parallel processing.
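Since data generation dominates the total time, one alternative (not used in this project, so treat it as a sketch) is to skip the intermediate lists entirely and hash each candidate as it is generated:

```python
import time
import itertools
from hashlib import sha256

start = time.process_time()
results = []
# Hash each six-letter combination as it is produced, without
# materialising the combos or nonce_list lists first.
for combo in itertools.combinations_with_replacement('abcdefghijklmnopqrstuvwxyz', 6):
    nonce = ''.join(combo)
    digest = sha256(nonce.encode()).hexdigest()
    if digest[:4] == '0000':
        results.append([nonce, digest])
stop = time.process_time()
print(f'{len(results)} matches in {stop - start} sec.')
```

This trades the ~118 MB of list storage for a single pass, though it cannot be scattered across nodes in this form.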
Mining code - parallel processing
To run this code on the cluster it needs to be modified to this:
import time
import itertools
from hashlib import sha256
from mpi4py import MPI
from socket import gethostname

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

alpha = 'abcdefghijklmnopqrstuvwxyz'
combos = list(itertools.combinations_with_replacement(alpha, 6))

def check_hash(nonce_list):
    results = []
    for nonce in nonce_list:
        hash = sha256(nonce.encode()).hexdigest()
        if hash[:4] == '0000':
            results.append([nonce, hash])
    #print(gethostname(), results)
    return results

start = time.perf_counter()
if rank == 0:
    print('Generating data. Please wait.')
    nonce_list = []
    for combo in combos:
        nonce = ''
        for letter in combo:
            nonce += letter
        nonce_list.append(nonce)
    data_time = time.perf_counter()
    print(f'Data generated in {data_time - start} sec.')

NODES = 4  # This has to be hard coded to match the nodes in the execution command.

if rank == 0:
    print('Preparing data for scatter')
    data = []
    for i in range(NODES):
        data.append([])
    for i in range(len(nonce_list)):
        data[i % NODES].append(nonce_list[i])
    divide_time = time.perf_counter()
    print(f'Prepared for scatter in {divide_time - data_time} sec.')
else:
    data = None

# This is the actual parallel processing part so this is the part I will time.
if rank == 0:
    print('Parallel processing commencing.')
parallel_start = time.perf_counter()
data = comm.scatter(data, root=0)
results = check_hash(data)
new_data = comm.gather(results, root=0)
if rank == 0:
    for result in new_data:
        print(result)
stop = time.perf_counter()
if rank == 0:  # Only the master should print the execution time.
    print('Processing time: ', (stop - parallel_start))
    print('Total time: ', (stop - start))
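The round-robin split performed by data[i % NODES] in the scatter-preparation step is easier to see on a toy list:

```python
# Distribute ten items across four buckets the same way the
# scatter-preparation loop does: item i goes to bucket i % NODES.
NODES = 4
items = list(range(10))
data = [[] for _ in range(NODES)]
for i in range(len(items)):
    data[i % NODES].append(items[i])
print(data)  # [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]
```

Each of the NODES sub-lists then becomes the chunk that comm.scatter hands to one process.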
Even when the task is scattered over only four processes, the output is:
andrew@master:~ $ python execute.py mining_parallel.py 4
mining_parallel.py 100% 1724 813.4KB/s 00:00
mining_parallel.py 100% 1724 793.8KB/s 00:00
mining_parallel.py 100% 1724 783.6KB/s 00:00
Generating data. Please wait.
Data generated in 14.442740144993877 sec.
Preparing data for scatter
Prepared for scatter in 1.1794833989988547 sec.
Parallel processing commencing.
[['bcjlwz', '0000c9185bb896a481c681a3924717f8a7978571f4fee4bc7dc3d80294ee1bd9'],
['dgjkpt', '0000df90701603046058377eff15e8392f33661cf05a994a4cec491a0a3a2848'],
['ffmnot', '00005cefea5db1363b60dc309f5d0bac0c6560dbe048a1943949ea80fcca57c7'],
['gnptyz', '00003c09995fc8d10f7de62bdaedfb10fc672592e8e7b107d904d267e867b1d9']]
[['bfikrt', '0000805b0c879dfd5caed75add6709226647c533a94cbd40f3903716e49261f4'],
['cghmqu', '00007805f52991eca88484bf6cf228e3b1f1a40909a613fd1fa0f6ffe16b5747'],
['filuuz', '0000a28ef6055e9a1813881f28e6eb8a0ba939868017705c5a76156375948bab']]
[['aabemr', '00008e6ca6344edf9c18f10aece54f9310e4168d04dcc311dc86076ec07cff8c'],
['ahirsv', '00000f27f60cb204372da2817b04a7141bef0420d2668f91104fb418719f0e72'],
['ccklvx', '0000197ba3cba0c4963bc51bbf8cb5ecb9d0b0b4d6eaf8a8b22cf7659dd5bb98'],
['cgikkz', '000023e46b7bd23d513456e39abbd0b11c8525241cf621933086340f03d439c9'],
['dfgizz', '00002511546d6f76d576efef2a63bd190a5a8c1ee5c70fd14d432873495fe364']]
[['acknop', '0000b08e6f1d2394ecdbc8a6e9f0059691ecc09f4d1259d2aab94cb30497c1a6'],
['lmqrst', '00004174ccf6831504a807bbe7fbfc47228d34c8d483b9ad3aaf4bd1336c5ff5']]
Processing time: 3.4473998640023638
Total time: 19.069701376996818
The line breaks at the end of each nonce, hash pair are my formatting to make this easier to interpret. The processing time can be seen as 3.45 sec, about 40% less than the 5.5 sec single-machine benchmark when run across all four nodes.
The final test is to use the full 16 cores at our disposal. The line,
NODES = 4
needs to be changed to,
NODES = 16
The final output is:
andrew@master:~ $ python execute.py mining_parallel.py 16
mining_parallel.py 100% 1725 717.3KB/s 00:00
mining_parallel.py 100% 1725 823.0KB/s 00:00
mining_parallel.py 100% 1725 767.1KB/s 00:00
Generating data. Please wait.
Data generated in 19.46768968399556 sec.
Preparing data for scatter
Prepared for scatter in 1.1892309429968009 sec.
Parallel processing commencing.
[['gnptyz', '00003c09995fc8d10f7de62bdaedfb10fc672592e8e7b107d904d267e867b1d9']]
[['filuuz', '0000a28ef6055e9a1813881f28e6eb8a0ba939868017705c5a76156375948bab']]
[]
[['acknop', '0000b08e6f1d2394ecdbc8a6e9f0059691ecc09f4d1259d2aab94cb30497c1a6']]
[['dgjkpt', '0000df90701603046058377eff15e8392f33661cf05a994a4cec491a0a3a2848'],
['ffmnot', '00005cefea5db1363b60dc309f5d0bac0c6560dbe048a1943949ea80fcca57c7']]
[['bfikrt', '0000805b0c879dfd5caed75add6709226647c533a94cbd40f3903716e49261f4']]
[['ahirsv', '00000f27f60cb204372da2817b04a7141bef0420d2668f91104fb418719f0e72'],
['cgikkz', '000023e46b7bd23d513456e39abbd0b11c8525241cf621933086340f03d439c9']]
[['lmqrst', '00004174ccf6831504a807bbe7fbfc47228d34c8d483b9ad3aaf4bd1336c5ff5']]
[['bcjlwz', '0000c9185bb896a481c681a3924717f8a7978571f4fee4bc7dc3d80294ee1bd9']]
[['cghmqu', '00007805f52991eca88484bf6cf228e3b1f1a40909a613fd1fa0f6ffe16b5747']]
[['aabemr', '00008e6ca6344edf9c18f10aece54f9310e4168d04dcc311dc86076ec07cff8c'],
['ccklvx', '0000197ba3cba0c4963bc51bbf8cb5ecb9d0b0b4d6eaf8a8b22cf7659dd5bb98'],
['dfgizz', '00002511546d6f76d576efef2a63bd190a5a8c1ee5c70fd14d432873495fe364']]
[]
[]
[]
[]
[]
Processing time: 2.5700334959983593
Total time: 23.22703177999938
This time we can see that some processes (six, in fact) did not find any suitable hashes at all. Note also that the processing time has dropped to 2.57 sec. Even with the added overhead of scattering and gathering, the cluster is still more than twice as fast as the single computer, which was probably accessing all four of its cores anyway.
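Putting the measured times side by side gives a rough speedup figure (parallel efficiency here is simply speedup divided by the process count):

```python
# Speedup relative to the 5.5 sec single-machine benchmark.
single = 5.5
four_proc = 3.45
sixteen_proc = 2.57

print(f'4 processes:  {single / four_proc:.2f}x speedup')
print(f'16 processes: {single / sixteen_proc:.2f}x speedup, '
      f'{single / sixteen_proc / 16:.0%} parallel efficiency')
```

The low efficiency at 16 processes reflects the fixed scatter/gather overhead and the small amount of work each process receives.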