Optimizing selfish mining strategies through deep reinforcement learning

Wijewardhana, W. T. R. N. D. K.; Vidanagamachchi, S. M.; Arachchilage, N. A. G.

Files

ICAPS 2024-Proceedings Book_20241027-49-217-pages-133.pdf (591.12 KB)

Date

2024

Authors

Wijewardhana, W. T. R. N. D. K.

Vidanagamachchi, S. M.

Arachchilage, N. A. G.

Publisher

Faculty of Science, University of Kelaniya Sri Lanka

Abstract

Selfish mining is a type of mining attack in which miners strategically release blocks to create forks in the main branch, with the intention of acquiring a large portion of the mining reward. Traditional strategies use a Markov Decision Process (MDP) with a non-linear objective function, which requires blockchain parameters that vary and are hard to determine. Model-free approaches such as multidimensional Q-learning overcome this by learning optimal policies without prior blockchain information. Even so, existing algorithms remain largely impractical for real blockchain networks: they fail to account for realistic blockchain features, learn inefficiently in large state spaces, and converge slowly. In this work, we propose a novel model-free Deep Reinforcement Learning (DRL) algorithm for optimal selfish mining that learns dynamically without requiring prior knowledge of network parameters. The study aims to leverage deep neural networks, along with advanced exploration and experience replay mechanisms, to achieve faster convergence and improved learning efficiency in the large state spaces that are inherent in real-world blockchain instances. The non-linearity of the objective function is addressed by incorporating two Double DQNs (DDQNs), one for the adversary and one for the honest network, which work together to optimize the non-linear objective function. The proposed model is evaluated on a purpose-built Bitcoin-like Proof-of-Work blockchain simulator that takes into account real-world blockchain parameters such as stale block rates, propagation delays, and eclipse attacks. Our simulations indicate that the proposed model achieves optimal gains while improving the robustness and convergence of the algorithm in large state spaces and dynamically adjusting the mining policy as the blockchain environment evolves.
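
The abstract gives no implementation details, so the following Python sketch is only an illustration of the two ideas it names, not the authors' method. It assumes the non-linear objective is the adversary's relative revenue (adversary reward divided by total reward), which is the usual selfish-mining objective but is not stated in the abstract. The function names and the list-based Q-value inputs are hypothetical; in a real DRL setup those values would come from the adversary and honest-network DDQNs.

```python
def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Standard Double DQN target for one transition.

    The online network's Q-values select the next action; the target
    network's Q-values evaluate it, which reduces overestimation bias.
    """
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


def greedy_action_relative_revenue(q_adversary, q_honest):
    """Pick the action with the highest estimated relative revenue.

    Assumption: the objective is q_a / (q_a + q_h), where q_a and q_h are
    the per-action estimates from the adversary and honest value functions.
    """
    def ratio(action):
        total = q_adversary[action] + q_honest[action]
        return q_adversary[action] / total if total > 0 else 0.0

    return max(range(len(q_adversary)), key=ratio)


# Hypothetical numbers: action 1 earns the adversary more in absolute terms,
# but action 0 gives the larger share of the total reward.
target = double_dqn_target(1.0, [0.2, 0.9, 0.5], [0.3, 0.4, 0.8], gamma=0.5)
choice = greedy_action_relative_revenue([2.0, 3.0], [2.0, 9.0])
```

Keeping separate estimates for adversary and honest rewards lets the ratio be formed at action-selection time, which a single scalar Q-value trained on a linear return cannot express directly.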

Keywords

Blockchain, Bitcoin, Selfish mining, Deep reinforcement learning

Citation

Wijewardhana W. T. R. N. D. K.; Vidanagamachchi S. M.; Arachchilage N. A. G. (2024), Optimizing selfish mining strategies through deep reinforcement learning, Proceedings of the International Conference on Applied and Pure Sciences (ICAPS 2024-Kelaniya) Volume 4, Faculty of Science, University of Kelaniya Sri Lanka. Page 133

URI

http://repository.kln.ac.lk/handle/123456789/28878

Collections

ICAPS 2024
