Benchmarking hybrid architectures of BERT-based embeddings with CNN and LSTM for real-time phishing URL detection

Publisher

International Conference on Applied and Pure Sciences, 2025

Abstract

Phishing and malicious URL attacks are an escalating threat that demands real-time detection. Existing research is constrained by its reliance on feature-engineered datasets and by the limited range of model combinations it explores. This study presents a systematic investigation of hybrid deep learning architectures for malicious URL classification that operate on raw URL strings, without pre-extracted features. We combine pre-trained transformers (BERT, URLBERT, DomURLs-BERT) with Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks, so that the transformer embeddings capture semantic patterns while the CNN and LSTM layers capture spatial and sequential patterns. Six models are evaluated on a large benchmark dataset. A comparative analysis quantifies the performance differences among them and identifies the most effective architecture in terms of accuracy and F1 score. This work provides a benchmarking framework and demonstrates the promise of BERT-based string-level processing for real-world phishing detection.
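The comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not list the six models, so pairing each of the three encoders with each of the two heads is an assumption here, as are the function names `model_grid`, `accuracy_and_f1`, and `rank_models`. The labels used in any call are illustrative placeholders, not results from the paper.

```python
from itertools import product

# Encoder and head names come from the abstract. Pairing every encoder with
# every head (3 x 2 = 6) is an assumption about how the six models are formed.
ENCODERS = ["BERT", "URLBERT", "DomURLs-BERT"]
HEADS = ["CNN", "LSTM"]


def model_grid():
    """Return every encoder/head pairing as a label such as 'BERT+CNN'."""
    return [f"{enc}+{head}" for enc, head in product(ENCODERS, HEADS)]


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Compute accuracy and binary F1, treating `positive` as the phishing class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there are no positives
    # in either the labels or the predictions.
    denom = 2 * tp + fp + fn
    f1 = (2 * tp / denom) if denom else 0.0
    return accuracy, f1


def rank_models(results):
    """Sort (name, accuracy, f1) tuples by F1, then accuracy, best first."""
    return sorted(results, key=lambda r: (r[2], r[1]), reverse=True)
```

In use, each label from `model_grid()` would be mapped to a trained classifier, its predictions on a held-out set passed to `accuracy_and_f1`, and the resulting tuples passed to `rank_models`. Ranking by F1 before accuracy is a design choice of this sketch, not something the abstract specifies.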

Citation

Maduwanthi, W. V. C., Tharaka, Y. M. S., & Hewapathirana, I. U. (2025). Benchmarking hybrid architectures of BERT-based embeddings with CNN and LSTM for real-time phishing URL detection. International Conference on Applied and Pure Sciences, 2025, pp. 330-335.