Abstract:
Most distributed data storage systems use leader-based replication. In a geo-distributed environment, however, this architecture introduces significant latency, because writes must be directed to a leader node that may be located far away. The aim of this study was to develop a distributed key-value data store with leaderless replication, allowing write operations to be performed on the geographically nearest node and thereby reducing the latency caused by physical distance. The system was implemented in the Go programming language and employs a leaderless consensus algorithm called Caesar to ensure data consistency. The final system can be configured to use either HTTP or gRPC as the communication protocol, and either a B-tree or an LSM-tree (log-structured merge-tree) as the storage index. Benchmarking on a local computer was performed to assess the performance of the implemented system. The system achieved a peak throughput of 14,000 read requests per second across three nodes. Under peak load, the maximum observed read latency was 10 ms, and 95% of read requests completed in under 1.58 ms, as monitored via Grafana dashboards. Write operations showed higher latency, with 95% of write requests completing within 150 ms at a sustained load of 120 writes per second. The benchmarks confirmed the system's capacity for high-throughput, low-latency operation, but the results are constrained by the non-distributed testing setup.