Benchmarking hybrid architectures of BERT-based embeddings with CNN and LSTM for real-time phishing URL detection
Publisher
International Conference on Applied and Pure Sciences, 2025
Abstract
Phishing and malicious URL attacks are escalating threats in the digital landscape, demanding real-time
detection. Current research is constrained by feature-engineered datasets and limited model
combinations. This study introduces a novel, exhaustive investigation into hybrid deep learning
architectures for malicious URL classification, using raw URL strings without pre-extracted features.
We combine pre-trained transformers (BERT, URLBERT, DomURLs-BERT) with Convolutional
Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks, so that the transformer
embeddings supply semantic context while the CNN and LSTM layers capture spatial and sequential
patterns, respectively. Six models are evaluated on a large benchmark dataset. Comparative
analysis reveals performance differentials, identifying the most effective architecture based on accuracy
and F1 score. This work provides a comprehensive benchmarking framework and demonstrates the
promise of BERT-based string-level processing for real-world phishing detection.
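
As an illustration of the evaluation described in the abstract, the sketch below enumerates encoder and head pairings and computes accuracy and F1 for a binary malicious/benign labelling. It is a minimal sketch and not the authors' code: the abstract does not say how the six models are composed, so pairing three encoders with two heads is an assumption, and the function names `model_grid` and `accuracy_and_f1`, the label convention (1 = malicious), and the toy labels in the usage note are hypothetical.

```python
from itertools import product

# Encoders and heads named in the abstract. Pairing each encoder with each
# head (3 x 2 = 6) is an ASSUMPTION about how the six models are formed.
ENCODERS = ["BERT", "URLBERT", "DomURLs-BERT"]
HEADS = ["CNN", "LSTM"]


def model_grid():
    """Return every encoder/head pairing as a label such as 'BERT+CNN'."""
    return [f"{enc}+{head}" for enc, head in product(ENCODERS, HEADS)]


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Compute accuracy and F1 for the positive (here: malicious) class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is zero.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1
```

With the made-up labels `y_true = [1, 1, 0, 0, 1]` and `y_pred = [1, 0, 0, 1, 1]`, the function returns an accuracy of 0.6 and an F1 of 2/3. Reporting both metrics matters when benign and malicious URLs are imbalanced, since accuracy alone can hide poor recall on the malicious class.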
Citation
Maduwanthi, W. V. C., Tharaka, Y. M. S., & Hewapathirana, I. U. (2025). Benchmarking hybrid architectures of BERT-based embeddings with CNN and LSTM for real-time phishing URL detection. International Conference on Applied and Pure Sciences, 2025, 330-335.