This post is about benchmarking the Free Tiger hashing algorithm, Stage II of the project.
Compilation and build:
Although the package was last updated in 2012, I was able to build it successfully without any changes to the code. The package includes a tester which exercises a particular hashing function with string data ranging from a few bytes to 2 MB. The build ran successfully and output numbers in milliseconds indicating how long it took to hash each string. The results are attached below. The package doesn't have any architecture-specific files, and the test I ran was on a 64-bit, x86-64 machine with Fedora 28.
The test setup inside the tester is pretty straightforward. The test data is in the form of C strings, and the tester exercises one hashing function at a time. I changed the tester to use the tiger_1025 hashing function, which is optimized to handle large string values, and added a block of code that reads a large text file (2 MB in this case) and feeds it to the hashing function.
Benchmarking Results and Analysis
The numbers I got are averages over 50,000 runs of the tester using the same data set on each iteration. The test gives a pretty good idea of how much time each type of data takes to process, and at the end of the run you also get the processing rate in bits/sec, which is useful in scenarios where we're dealing with huge data sets.
I benchmarked the tiger_1025 function, but to benchmark the others you only have to change the call at one location in the tester. So the same data set and the same tester can be reused for each of the functions in the package. The arguments for each function do differ a little, but they are documented nicely, so changing up the function call is easy.
| String type | Small strings | 512-bit strings | 128-byte/1024-bit strings | 64Kb strings | 2+ MB strings |
|---|---|---|---|---|---|
| Time taken (secs) | 0.0041363 | 0.004556093 | 0.005078058 | 0.39185025 | 13.31463184 |
During the hashing, most of the time is spent in randomizing and salting, which may explain why the times taken to hash small strings, 512-bit strings, and 1024-bit strings are so close to each other.
As shown in this graph of the algorithm's size-time relationship, the curve looks exponential, but this is misleading because the points plotted in the graph are not at equal intervals. Pinning down the exact relationship would require a lot more data.
For the third stage of this project I'll be working on the function discussed here to see if I can improve on the current implementation and hopefully bring down the run-times of the algorithm overall.