Article
Peer-Review Record

FPGA-Based High-Throughput Key-Value Store Using Hashing and B-Tree for Securities Trading System

Electronics 2023, 12(1), 183; https://doi.org/10.3390/electronics12010183
by Sunil Puranik 1,*, Mahesh Barve 1, Swapnil Rodi 1 and Rajendra Patrikar 2
Submission received: 9 December 2022 / Revised: 22 December 2022 / Accepted: 23 December 2022 / Published: 30 December 2022
(This article belongs to the Special Issue Applications Enabled by FPGA-Based Technology)

Round 1

Reviewer 1 Report

The authors propose a novel key-value storage (KVS) system that is able to batch-store key-value pairs and handle potential key collisions with ultra-low latency. The proposed system has to meet a number of criteria, such as a fixed key length and support for at least 40 million keys. The proposal includes some clever ideas to overcome limitations and increase efficiency in storing and retrieving keys. The algorithms are explicitly laid out and profiled. There are a few issues that should be addressed:
  1. I would like to see a direct comparison of the performance of the proposed method against one or two standard methods. What is the speed advantage? How does it compare with respect to other resources? Does it scale?
  2. Some images appear to be slightly distorted, as if they were stretched. This should be improved. Also, there is a missing space between the figure label colon and the caption, e.g. “Figure 3:Hash Table...” should be “Figure 3: Hash Table...”
  3. Figure 2: There is an arrow pointing upwards that does not enter the block “Cmd_ ID_alloc_ &_ Reseq” but instead joins another arrow entering the block. What does this mean? Are the values somehow merged before entering the block?
  4. P4: “front-End” should be “front-end”
  5. P4: The name “KVS_Top” is mentioned before being introduced. Is this the name of the entire algorithm, or is it just a part of it? Please clarify.
  6. P6: Right after a list describing the individual buckets, the text continues with a reiteration of the bucket sizes, although they are all laid out just above. This is unnecessary and can be removed. Also, the authors state that their algorithm does not use buckets of “fixed length”, while their bucket sizes are indeed also fixed, as they do not vary in time. What the authors mean here is that they are of different size, or varying size, instead of being of equal size. In the same sentence, there is a “which” that has lost its reference. It is grammatically unclear whether it is the proposed method or the standard method using “fixed” bucket sizes that “results in a wastage of BRAM”. This paragraph should be reworked.
  7. P7: Instead of speaking of “1 16-key bucket” and “2 8-key buckets”, etc., the authors should use number words and write “one 16-key bucket”, “two 8-key buckets”, etc., to improve readability.
  8. P7: The authors should stay consistent throughout with the naming in their text. First, they speak of “16-key bucket”, and then a few lines later it is “16_key_bucket”.
  9. What do the expressions “3:0” and “3:2” after the bucket number represent?
  10. P7: “The BRAM which stores the data corresponding to the keys is called data memory.” Is this definition of the generic term “data memory” really necessary?
  11. P14: The list can be improved by removing the repetition of the list item descriptor, e.g. 
    “1. CMD/Key_Generator – Command/key_ generator block generates a 36 bit index randomly” can be replaced by 
    “1. CMD/Key_Generator – generates a 36 bit index randomly.”, and so on. Also, “This block” or similar redundant expressions can simply be removed, e.g. 
    “4. Statistics_Report_gen: This block maintains the statistics” can be replaced by 
    “4. Statistics_Report_gen: maintains the statistics”
In conclusion, once the above issues are adequately addressed, I recommend publication of the paper.

Author Response

Reviewer 1

We would like to thank the reviewer for the comments and feedback. Please find our response inline below:

 

Comments and Suggestions for Authors

 

The authors propose a novel key-value storage (KVS) system that is able to batch-store key-value pairs and handle potential key collisions with ultra-low latency. The proposed system has to meet a number of criteria, such as a fixed key length and support for at least 40 million keys. The proposal includes some clever ideas to overcome limitations and increase efficiency in storing and retrieving keys. The algorithms are explicitly laid out and profiled. There are a few issues that should be addressed:

  1. I would like to see a direct comparison of the performance of the proposed method against one or two standard methods. What is the speed advantage? How does it compare with respect to other resources? Does it scale?

Author Response – One of the standard methods for look-up is cuckoo hashing [Ref 16]. It achieves 200 million searches/sec, compared with 33 million searches/sec for our method. However, because of the “cycles” that can arise during Insert operations, Inserts can have very high response times, so it is not suitable for our use case, where Inserts need to be performed in bulk. In addition, the objective of our design was to conserve memory. We have compared the BRAM utilization of our scheme (buckets of varying lengths) with the approach that uses fixed-length buckets for storing colliding keys [Ref 9]: the memory utilization of our approach is less than 20% of the memory used by the fixed-length bucket scheme. Another approach, using a Bloom filter [Ref 11], achieves 160,000 searches/sec, which is much lower than the search performance we obtain (33 million searches/sec for kvs_bram).
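
To put the quoted search rates side by side, the short Python sketch below computes each rate relative to kvs_bram. The three figures are those stated in the response above; the dictionary layout and variable names are assumptions made only for this illustration, and the rates come from different designs and platforms, so the ratios are indicative only.

```python
# Search rates (searches/sec) as quoted in the author response.
searches_per_sec = {
    "cuckoo hashing [Ref 16]": 200_000_000,
    "kvs_bram (this work)": 33_000_000,
    "Bloom filter [Ref 11]": 160_000,
}

ours = searches_per_sec["kvs_bram (this work)"]

# Rate of each method relative to kvs_bram (1.0 means equal search rate).
relative = {name: rate / ours for name, rate in searches_per_sec.items()}

for name, ratio in relative.items():
    print(f"{name}: {ratio:.3f}x the kvs_bram search rate")
```

On these figures, cuckoo hashing searches roughly six times faster than kvs_bram, while kvs_bram searches roughly two hundred times faster than the Bloom filter approach.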

Our approach is scalable in the number of keys. The number of keys can be increased and is limited only by the availability of BRAM for kvs_bram and HBM for kvs_hbm. However, with kvs_hbm, an increase in the number of B-tree levels would increase the latency. The performance can still be maintained using the pipelined design. If the key length is increased, the time for hashing and key comparison would increase slightly, reducing the performance. This reduction will not be significant, however, because of the pipelined nature of the design: the time for hashing and comparisons would increase by at most 1 or 2 clocks even if the key length is increased to 256/512 bits.
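
The argument that one or two extra clocks matter little in a pipelined design can be illustrated with a simple cycle-count model. The Python sketch below is not taken from the manuscript: the function name `pipeline_cycles`, the pipeline depths (10 and 12 stages), and the assumption that a new command can be issued every clock are all hypothetical, chosen only to show the effect.

```python
def pipeline_cycles(num_ops: int, depth: int, interval: int = 1) -> int:
    """Cycles to complete num_ops operations in a pipeline.

    depth    -- cycles for one operation to traverse the pipeline (latency)
    interval -- cycles between issuing consecutive operations (assumed 1)
    """
    if num_ops <= 0:
        return 0
    # The first operation pays the full latency; each later one adds one interval.
    return depth + (num_ops - 1) * interval


# Hypothetical depths: 10 stages originally, 12 after lengthening the key.
base = pipeline_cycles(1_000_000, depth=10)
longer = pipeline_cycles(1_000_000, depth=12)

# Relative increase in total time for a bulk run of one million operations.
slowdown = longer / base - 1
print(f"base={base} cycles, longer={longer} cycles, slowdown={slowdown:.2e}")
```

Under these assumptions the per-operation latency grows by two clocks, but the total time for a bulk run grows by only two clocks in about a million, which is consistent with the claim that the reduction is not significant.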

  2. Some images appear to be slightly distorted, as if they were stretched. This should be improved. Also, there is a missing space between the figure label colon and the caption, e.g. “Figure 3:Hash Table...” should be “Figure 3: Hash Table...”

Author Response – This has been corrected.

  3. Figure 2: There is an arrow pointing upwards that does not enter the block “Cmd_ ID_alloc_ &_ Reseq” but instead joins another arrow entering the block. What does this mean? Are the values somehow merged before entering the block?

Author Response – Yes, this means that the values are merged. The results from kvs_bram and kvs_hbm are merged and then checked by the Cmd_ ID_alloc_ &_ Reseq block.

  4. P4: “front-End” should be “front-end”

Author Response – This has been corrected.

  5. P4: The name “KVS_Top” is mentioned before being introduced. Is this the name of the entire algorithm, or is it just a part of it? Please clarify.

Author Response – Yes, KVS_top is the name of our top-level design. We have modified the manuscript to explain this.

  6. P6: Right after a list describing the individual buckets, the text continues with a reiteration of the bucket sizes, although they are all laid out just above. This is unnecessary and can be removed. Also, the authors state that their algorithm does not use buckets of “fixed length”, while their bucket sizes are indeed also fixed, as they do not vary in time. What the authors mean here is that they are of different size, or varying size, instead of being of equal size. In the same sentence, there is a “which” that has lost its reference. It is grammatically unclear whether it is the proposed method or the standard method using “fixed” bucket sizes that “results in a wastage of BRAM”. This paragraph should be reworked.

Author Response – This has been corrected. We have removed the word “variable”, which gives the impression that bucket capacity varies with time. Please let us know if this is acceptable.

  7. P7: Instead of speaking of “1 16-key bucket” and “2 8-key buckets”, etc., the authors should use number words and write “one 16-key bucket”, “two 8-key buckets”, etc., to improve readability.

Author Response - This has been corrected.

  8. P7: The authors should stay consistent throughout with the naming in their text. First, they speak of “16-key bucket”, and then a few lines later it is “16_key_bucket”.

Author Response - This has been corrected.

  9. What do the expressions “3:0” and “3:2” after the bucket number represent?

Author Response – Bucket_no is a 4-bit vector, so 3:0 means all 4 bits of bucket_no, while 3:2 means bits 3 and 2 of bucket_no.
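
As an illustration of this notation, the following Python sketch extracts the same bit ranges from a 4-bit value. The helper name `bit_slice` and the sample value `0b1101` are assumptions made for this example and do not come from the manuscript; the inclusive `hi:lo` convention mirrors a Verilog-style part-select.

```python
def bit_slice(value: int, hi: int, lo: int) -> int:
    """Return bits hi..lo (inclusive) of value, like value[hi:lo] in Verilog."""
    if hi < lo or lo < 0:
        raise ValueError("expected hi >= lo >= 0")
    width = hi - lo + 1
    # Shift the wanted field down to bit 0, then mask off everything above it.
    return (value >> lo) & ((1 << width) - 1)


bucket_no = 0b1101  # hypothetical 4-bit bucket number

all_bits = bit_slice(bucket_no, 3, 0)  # bucket_no[3:0]: all four bits
top_two = bit_slice(bucket_no, 3, 2)   # bucket_no[3:2]: the two upper bits

print(f"[3:0] = {all_bits:04b}, [3:2] = {top_two:02b}")
```

With this sample value, the `3:0` slice returns the whole vector (`1101`) and the `3:2` slice returns only its two most significant bits (`11`).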

  10. P7: “The BRAM which stores the data corresponding to the keys is called data memory.” Is this definition of the generic term “data memory” really necessary?

Author Response – The design has two memories: “bucket_memory”, which stores the keys in buckets, and “data memory”, which stores the data corresponding to those keys.
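
The split between a key memory and a data memory can be sketched in software as follows. This is a minimal illustration and not the authors' hardware design: the class name `TwoMemoryStore`, the modulo hash, the equal-sized buckets, and the use of a stored index to link a key to its data are all assumptions made for the example.

```python
from typing import List, Optional, Tuple


class TwoMemoryStore:
    """Toy key-value store with separate key (bucket) and data memories."""

    def __init__(self, num_buckets: int, slots_per_bucket: int) -> None:
        # bucket_memory[b] holds (key, data_index) pairs for keys hashing to b.
        self.bucket_memory: List[List[Tuple[int, int]]] = [
            [] for _ in range(num_buckets)
        ]
        # data_memory holds the values; buckets refer to it by index.
        self.data_memory: List[bytes] = []
        self.slots_per_bucket = slots_per_bucket

    def _bucket(self, key: int) -> List[Tuple[int, int]]:
        # Placeholder hash (assumption): simple modulo on the key.
        return self.bucket_memory[key % len(self.bucket_memory)]

    def insert(self, key: int, value: bytes) -> bool:
        bucket = self._bucket(key)
        for stored_key, data_index in bucket:
            if stored_key == key:
                self.data_memory[data_index] = value  # update existing key
                return True
        if len(bucket) >= self.slots_per_bucket:
            return False  # bucket full; a real design needs an overflow path
        self.data_memory.append(value)
        bucket.append((key, len(self.data_memory) - 1))
        return True

    def search(self, key: int) -> Optional[bytes]:
        for stored_key, data_index in self._bucket(key):
            if stored_key == key:
                return self.data_memory[data_index]
        return None


store = TwoMemoryStore(num_buckets=4, slots_per_bucket=2)
store.insert(1, b"order-A")
store.insert(5, b"order-B")  # collides with key 1 under the modulo hash
print(store.search(5))
```

Keeping only keys and short indices in the bucket memory means that key comparison touches the small memory, and the data memory is read only once a match is found.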

  11. P14: The list can be improved by removing the repetition of the list item descriptor, e.g. 
    “1. CMD/Key_Generator – Command/key_ generator block generates a 36 bit index randomly” can be replaced by 
    “1. CMD/Key_Generator – generates a 36 bit index randomly.”, and so on. Also, “This block” or similar redundant expressions can simply be removed, e.g. 
    “4. Statistics_Report_gen: This block maintains the statistics” can be replaced by 
    “4. Statistics_Report_gen: maintains the statistics”

Author Response – This has been corrected.

Reviewer 2 Report

The paper presents a specific implementation of a KVS algorithm for a trading application. The paper is well written and the results are clearly presented. My only question concerns the scalability of the design.

 

- Is it possible to scale the design, or must the architecture be redesigned from scratch if more performance/keys are required? The usage of FPGA resources is low, even for the RAM memories; is it possible to further improve the performance by using additional resources? Please add a short paragraph addressing this point.

 

Minor issues:

- The resolution of Figure 1 is very low, please improve it.

- Increase the font size in Figure 2. Please move it to page 5 to improve the readability of the paper.

Author Response

Reviewer 2

 

We would like to thank the reviewer for the comments and feedback. Please find our response inline below:

 

 

Comments and Suggestions for Authors

The paper presents a specific implementation of a KVS algorithm for a trading application. The paper is well written and the results are clearly presented. My only question concerns the scalability of the design.

 

- Is it possible to scale the design, or must the architecture be redesigned from scratch if more performance/keys are required? The usage of FPGA resources is low, even for the RAM memories; is it possible to further improve the performance by using additional resources? Please add a short paragraph addressing this point.

Author Response – The design is scalable in the number of keys. It is limited only by the amount of BRAM available for kvs_bram and the amount of HBM available for kvs_hbm. With kvs_hbm, an increase in the number of B-tree levels would increase the latency, but the performance can still be maintained using the pipelined design. If the key length is increased, the time for hashing and key comparison would increase slightly, reducing the performance. This reduction will not be significant, however, because of the pipelined nature of the design: the time for hashing and comparisons would increase by at most 1 or 2 clocks even if the key length is increased to 256/512 bits.

If the design is to be scaled for the key_length, many blocks in the design can be reused (hash algorithm, comparator logic, etc.), as these are parameterized. To improve the performance of kvs_hbm, we are currently modifying its design to make the B-tree processing logic pipelined. This requires major modifications to the design, although the comparator logic and the logic for fetching keys and data from memory can be reused.

- The resolution of Figure 1 is very low, please improve it.

Author Response – This has been corrected. We have replaced Figure 1.

- Increase the font size in Figure 2. Please move it to page 5 to improve the readability of the paper.

Author Response – This has been corrected. The figure has been stretched to make the text clearer.
