AI Spam Detection

Why This Project Area?

I am writing this two years in retrospect, so some details may be missing or brief. This was the final-year project I completed for my degree at Edge Hill University. I graduated in 2024, two years after ChatGPT was released with the GPT-3.5 model, a release often cited as the beginning of the AI boom. Because of this, I felt I would be at a disadvantage without some kind of AI project for my third-year project, and at the time it was something new to learn that would be both fun and a real challenge. I also loved the cyber security modules at university, so I tried to combine the two areas, focusing on phishing and spam emails. This led to a rather comically long title: "Using Machine Learning to Detect Phishing Email Attacks to Overcome the Flaws of Traditional Spam Filters".

Abstract

Harry from 2024 would know more about this than me, so I'll let him explain with the abstract. "Phishing email attacks represent a significant amount of the threat in the cyber security landscape, vulnerable individuals and organisations are at risk due to human error in recognition of social engineering techniques and flaws with current phishing detectors such as traditional spam filters which rely on predominantly trigger words for detection. A thorough literature review creates a basis to expand upon existing knowledge analysing the existing methods of evaluation and the current state-of-the-art. This project explored the use of machine learning techniques and the natural language processing tool BERT for creating representations of semantic meaning, while implementing a pragmatic approach to model training. The model is evaluated with industry standard metrics and compared to other studies concluding that there is evidence of some improvement amongst existing methods, with this project yielding an accuracy and recall of approximately 98%. The model has also been integrated into a piece of software to demonstrate how the model could be incorporated into a live application. The report also provides insight into future work on how this model could be improved to increase its performance metrics."
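
For anyone unfamiliar with the two metrics quoted in the abstract, here is a minimal sketch of how accuracy and recall are computed for a binary phishing classifier. It is not the code from the project: the function name `accuracy_and_recall` and the toy labels are invented purely for illustration, and I am assuming the usual convention that 1 means phishing and 0 means legitimate.

```python
def accuracy_and_recall(y_true, y_pred):
    """Return (accuracy, recall) for binary labels: 1 = phishing, 0 = legitimate."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # phishing caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate passed
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # phishing missed

    total = len(pairs)
    # Accuracy: share of all emails classified correctly.
    accuracy = (tp + tn) / total if total else 0.0
    # Recall: share of actual phishing emails that were caught.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


# Toy labels, made up for illustration only (NOT the project's data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

accuracy, recall = accuracy_and_recall(y_true, y_pred)
print(f"accuracy={accuracy:.2f} recall={recall:.2f}")  # accuracy=0.80 recall=0.75
```

Recall matters a lot for phishing detection because a missed phishing email (a false negative) is usually more costly than a legitimate email being flagged by mistake.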

Presentation

I went on to present my project at the 2024 computer science end-of-year project showcase, where we got to show off our projects to the entire department and defend our findings and methodology. If you would like to read the whole report, click here. I received a first-class mark for this project and achieved a first overall for my three years at Edge Hill.
