A Decade of Mal-Activity Reporting: A Retrospective Analysis of Internet Malicious Activity Blacklists
Published in Asia Computer and Communication Security Conference (CCS), 2019
Recommended citation: Benjamin Zi Hao Zhao, Muhammad Ikram, Hassan Asghar, Mohamed Ali Kaafar, Abdelberi Chaabane, and Kanchana Thilakarathna, "A Decade of Mal-Activity Reporting: A Retrospective Analysis of Internet Malicious Activity Blacklists", In Asia Computer and Communication Security Conference (CCS), 2019. https://internetmaliciousactivity.github.io
Abstract: This paper focuses on reporting of Internet malicious activity (or mal-activity in short) by public blacklists with the objective of providing a systematic characterization of what has been reported over the years, and more importantly, the evolution of reported activities. Using an initial seed of 22 blacklists, covering the period from January 2007 to June 2017, we collect more than 51 million mal-activity reports involving 662K unique IP addresses worldwide. Leveraging the Wayback Machine, antivirus (AV) tool reports and several additional public datasets (e.g., BGP Route Views and Internet registries) we enrich the data with historical meta-information including geo-locations (countries), autonomous system (AS) numbers and types of mal-activity. Furthermore, we use the initially labelled dataset of approx. 1.57 million mal-activities (obtained from public blacklists) to train a machine learning classifier to classify the remaining unlabeled dataset of approx. 44 million mal-activities obtained through additional sources. We make our unique collected dataset (and scripts used) publicly available for further research.
The main contributions of the paper are a novel means of report collection, with a machine learning approach to classify reported activities, characterization of the dataset and, most importantly, temporal analysis of mal-activity reporting behavior. Inspired by P2P behavior modeling, our analysis shows that some classes of mal-activities (e.g., phishing) and a small number of mal-activity sources are persistent, suggesting that either blacklist-based prevention systems are ineffective or have unreasonably long update periods. Our analysis also indicates that resources can be better utilized by focusing on heavy mal-activity contributors, which constitute the bulk of mal-activities.