Trịnh Minh Đức, Tạp chí KHOA HỌC & CÔNG NGHỆ (Journal of Science and Technology) 118(04): 133-137

SPAM EMAIL FILTERING BASED ON MACHINE LEARNING
Trinh Minh Duc
College of Information and Communication Technology, TNU

SUMMARY
In this paper, we present a spam email filtering method based on machine learning, namely the Naïve Bayes classification method, which we chose because this approach is highly effective. Because it can learn (that is, improve its own performance), a system that applies this method can automatically learn and improve the effectiveness of spam email classification. At the same time, the system's classification ability is continually updated with newly arriving emails, which makes the classifier much harder for spammers to circumvent than traditional solutions.
Keywords: Machine learning, email spam filtering, Naïve Bayes.

INTRODUCTION
Email classification is essentially a two-class text classification problem: the initial dataset consists of spam and non-spam emails, and the texts to be classified are the emails arriving in the inbox. The output of the classification process is a class label for each email, assigning it to one of the two classes: spam or non-spam. The general model of the spam email classification problem can be described as follows. The categorization process is divided into two phases:
The training phase: The input of this phase is the set of spam and non-spam emails.
The output of this phase is the trained data, that is, the model obtained by applying a suitable classification method to the training emails, which is then used in the classification phase.
The classification phase: The input of this phase is an email, together with the trained data. The output is the classification result for the email: spam or non-spam.

The rest of this paper is organized as follows. In Sect. 2, we formulate the Naïve Bayes classification method and our solution. In Sect. 3, we present experimental results evaluating the efficiency of this method. Finally, in Sect. 4, we conclude and outline possible future directions.

NAÏVE BAYES METHOD [4]
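As an illustration of the two phases described in the introduction, the following is a minimal sketch of a Naïve Bayes spam filter, assuming a multinomial model over word counts with add-one (Laplace) smoothing and a simple regular-expression tokenizer. It is not necessarily the formulation used in this paper. The training phase only increments counts, so the same `train` call can also absorb newly arriving labelled emails, which is the kind of incremental updating the summary refers to. The classification phase picks the label c that maximizes log P(c) + Σ log P(w | c) over the email's words. The names `NaiveBayesSpamFilter`, `tokenize`, `train`, `classify` and the labels "spam"/"ham" are hypothetical, introduced only for this example.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into simple word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Hypothetical multinomial Naive Bayes filter with add-one smoothing."""

    LABELS = ("spam", "ham")  # "ham" = non-spam

    def __init__(self) -> None:
        self.doc_counts = Counter()  # number of training emails per class
        self.word_counts = {label: Counter() for label in self.LABELS}  # word frequencies per class
        self.vocabulary: set[str] = set()  # every word seen during training

    def train(self, text: str, label: str) -> None:
        """Training phase: add one labelled email to the counts.

        Training only increments counts, so it can be called again later
        on newly arriving emails to keep the model up to date.
        """
        if label not in self.LABELS:
            raise ValueError(f"unknown label: {label!r}")
        tokens = tokenize(text)
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokens)
        self.vocabulary.update(tokens)

    def _log_score(self, tokens: list[str], label: str) -> float:
        """Return log P(label) + sum of log P(word | label) for the given tokens."""
        total_docs = sum(self.doc_counts.values())
        # Prior is also add-one smoothed so an untrained class never gives log(0).
        prior = (self.doc_counts[label] + 1) / (total_docs + len(self.LABELS))
        score = math.log(prior)
        denominator = sum(self.word_counts[label].values()) + len(self.vocabulary)
        for token in tokens:
            score += math.log((self.word_counts[label][token] + 1) / denominator)
        return score

    def classify(self, text: str) -> str:
        """Classification phase: return the label with the highest posterior score."""
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        return max(self.LABELS, key=lambda label: self._log_score(tokens, label))


if __name__ == "__main__":
    nb = NaiveBayesSpamFilter()
    nb.train("win a free prize now", "spam")
    nb.train("cheap pills free offer", "spam")
    nb.train("meeting agenda for monday", "ham")
    nb.train("please review the project report", "ham")
    print(nb.classify("free prize offer"))        # spam
    print(nb.classify("project meeting monday"))  # ham
```

Working in log space avoids numeric underflow when an email contains many words. Smoothing keeps a single unseen class/word pair from zeroing out the whole product.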