Data Mining Applications in Cyber Security

The world we live in is increasingly dependent on the internet and on the safe operation of internet-connected systems. The 2017 WannaCry ransomware attack has already demonstrated how hospitals, police departments, and large companies can be disrupted by a single well-placed attack on their systems [1]. Cyber security is an emerging field that addresses the considerable threats facing activity in the cyber world [2]. Recent advances in computing have made it possible to mine data sets ranging from hundreds of terabytes to petabytes [3]. Cyber security professionals are using this capacity to develop algorithms that analyze network activity and detect when new activity on that network is anomalous and may indicate a threat [4]. Others monitor online activity for chatter among those intent on creating new threats, gaining the information needed to stop attacks proactively before they emerge [5]. The data mining applications now under development will help cyber security professionals build a safer future for a world that relies ever more heavily on interconnected systems.

Introduction


As computers and internet-connected devices become an essential part of the economy and daily life, protecting these devices and the activities they support becomes ever more important. As recent events such as the 2017 WannaCry ransomware attack have demonstrated, a single attack can take down entire networks of police departments, hospitals, and government offices [1]. Networks and the internet have long held huge bodies of data; what is now emerging is a new field in which that data is mined for insights that can help protect against future cybercrimes.

Literature Review

Cyber Security

Cyber security is one of the fastest-growing sectors of the information technology industry. As the number of cybercrimes increases, so does the need for cyber security to keep pace with the growing threat [2], [6], [7]. Because the web is ubiquitous and increasingly used to share data, many new devices are coming online as part of the Internet of Things (IoT) [8]–[10]. The tremendous volume of data sent across the internet and local networks also places strain on already limited resources within businesses [3], [11].

Individuals and groups involved in cyber security perform a number of functions. Cyber security experts may be called upon to conduct forensic investigations to determine the tactics and methods a criminal used to commit a cybercrime [2]. Cyber security professionals may proactively search for emerging threats on the internet or in large data pools [5], [12], [13]. They may also help design more robust intrusion detection systems or intrusion prevention systems to secure existing networks against future attacks [4], [14], [15].

Data Mining

The unprecedented growth of data on the internet has led to the creation of data warehouses that hold amounts of data that were inconceivable when the PC was first introduced. Data storage has gone from floppy disks holding a few hundred kilobytes to little more than a megabyte [1] to data sets of hundreds of terabytes and even petabytes [3]. This explosion of data has led organizations around the world, and their information technology departments, to develop data mining techniques that yield business insights [9], [16], [17], educational insights [18], [19], medical insights [20], [21], and even insights into cyber security [22], [23].

Data mining involves applying a variety of tools to large data sets, such as those already discussed in this review, to extract insights from them. IBM’s Watson, a cognitive computing system built on machine learning, has been used to pore over millions of records in medical data sets in order to “learn” how to give doctors better insight into particular conditions based on the latest evidence-based medicine [20]. Governments use data mining to analyze tremendous amounts of data and predict crimes that may occur [24]. Data mining algorithms can also predict future behavior, such as which products an individual is likely to purchase or how likely a store is to run out of a particular product, based on buying trends [25]. As this section has shown, data mining is a powerful tool for gaining insight from the large data sets now being generated.

Data Mining Applications for Cyber Security

Data Mining in the Network Context

Data mining can be applied in several contexts within information technology. The first to consider is the organizational network. Network analysts must find the best way to secure an organization’s assets against both internal and external threats [6], [23], [26], [27]. Commonly used tools for this include firewalls, intrusion detection systems (IDS), and intrusion prevention systems (IPS) [28]. An IDS is a device that sits inside the network (or within a network partition) and monitors it for attacks against its hosts or against the network itself [29]. An IDS may reveal network intrusions, attempted rootkit installations, or entries in system logs that point to attempted malicious activity [29]. Logs therefore play a large role in how an IDS operates. On a large network with substantial traffic, however, these logs can grow into enormous volumes that verge on big data.
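
To make the log-analysis role of an IDS concrete, the Python sketch below counts failed login attempts per source address in OpenSSH-style log lines and flags sources at or above a threshold. The log format, the regular expression, and the default threshold of five attempts are illustrative assumptions, not details taken from [29]; a real IDS correlates many more event types.

```python
import re
from collections import Counter

# Hypothetical pattern for OpenSSH-style "Failed password" lines; real log formats vary.
FAILED_LOGIN = re.compile(r"Failed password for .* from (\d{1,3}(?:\.\d{1,3}){3})")


def flag_brute_force(log_lines, threshold=5):
    """Return {source_ip: failure_count} for sources with at least `threshold` failed logins."""
    failures = Counter()
    for line in log_lines:
        match = FAILED_LOGIN.search(line)
        if match:
            failures[match.group(1)] += 1
    return {ip: count for ip, count in failures.items() if count >= threshold}
```

Given six failed-login lines from 10.0.0.5 and one successful login from another host, `flag_brute_force` returns `{"10.0.0.5": 6}`. Even this simple rule shows why log volume matters: every line must be scanned, so on a busy network the input quickly becomes large.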

Information technology experts have been working on ways to harness data mining to bolster IDS defenses. For example, Moon et al. [4] used the C4.5 decision tree algorithm to analyze an 83-dimensional vector describing the behavior of applications running on PCs and determine whether that behavior was anomalous, as shown in Figure 1. The resulting system could detect previously unidentified malware behavior with a 2.0% false negative rate and a 5.4% false positive rate, making it an attractive and accurate data mining approach to thwarting new malware attacks [4].

Figure 1 – [image not available]
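
At its core, C4.5 chooses each split in its tree by gain ratio: the information gain from splitting on a feature, divided by the entropy of the split itself. The minimal Python sketch below computes that criterion for categorical features. The feature names and labels are hypothetical and do not reproduce the 83 features used in [4]; a full C4.5 implementation would also handle continuous attributes, missing values, and pruning.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(feature_values, labels):
    """C4.5 split criterion for a categorical feature: information gain / split information."""
    total = len(labels)
    partitions = {}
    for value, label in zip(feature_values, labels):
        partitions.setdefault(value, []).append(label)

    # Expected entropy remaining after splitting on this feature.
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    gain = entropy(labels) - remainder

    # Split information penalizes features that fragment the data into many small branches.
    split_info = -sum((len(part) / total) * math.log2(len(part) / total) for part in partitions.values())
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical binary behavior features for four applications.
labels = ["malicious", "malicious", "benign", "benign"]
features = {
    "writes_to_system_dir": [1, 1, 0, 0],
    "uses_network": [1, 0, 1, 0],
}
best = max(features, key=lambda name: gain_ratio(features[name], labels))
# best is "writes_to_system_dir" (gain ratio 1.0 versus 0.0)
```

The tree is grown by applying this choice recursively to each branch until the leaves are (nearly) pure.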

Singh and Puthran [30] noted that while classification algorithms such as the C4.5 decision tree may be effective at identifying new threat vectors, malware designed to behave very much like normal applications may slip through such analysis undetected. They note that hybrid approaches, such as combining K-Means with C4.5 or J48 with Random Tree, may pick up these fainter signals and identify even more threats before they can execute their payloads. Mathew and Ajikumar [31] suggested that data mining and machine learning techniques could make IDS more robust. Moustafa and Slay [32] examined publicly available data sets for IDS machine learning, such as the UNSW-NB15 and KDD99 data sets, and noted that using such data sets with IDS can improve detection rates and reduce the false alarm rate (FAR).
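
As commonly defined in the IDS literature, the detection rate is the true positive rate of the detector and the false alarm rate is its false positive rate. The short Python sketch below computes both from confusion-matrix counts. The counts in the example are illustrative, chosen so that the false negative rate (1 minus the detection rate) is 2.0% and the false alarm rate is 5.4%, the rates reported in [4].

```python
def detection_metrics(tp, fn, fp, tn):
    """Return (detection rate, false alarm rate) from confusion-matrix counts.

    detection rate   = TP / (TP + FN)   (share of attacks that are caught)
    false alarm rate = FP / (FP + TN)   (share of benign events wrongly flagged)
    """
    detection_rate = tp / (tp + fn)
    false_alarm_rate = fp / (fp + tn)
    return detection_rate, false_alarm_rate


# Illustrative counts: 1,000 malicious and 1,000 benign samples.
dr, far = detection_metrics(tp=980, fn=20, fp=54, tn=946)
# dr is 0.98 and far is 0.054; the false negative rate 1 - dr is about 0.02
```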

While an IDS plays a more passive role in network security, flagging suspicious activity as it happens, an intrusion prevention system (IPS) plays a more proactive role, actively intercepting identified threats within a network before they can do more harm [33], [34]. As [4] noted, IPS frequently use blacklists or signature lists of known attacks to identify malicious activity on a network. This approach is adequate for threats that are already known, but it fails against zero-day attacks and advanced persistent threats, which constantly evolve. Mining all of the behavior on the network and sorting through it with advanced algorithms can yield much more detailed threat patterns, which can then be loaded into an IPS to make it more robust against attacks that are not yet in its blacklists or signature files.
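
The weakness of signature matching can be shown with a minimal Python sketch of a blacklist check. The two signatures below are hypothetical byte patterns, not rules from any real IPS product; the point is that an exact-match rule catches the known payload but misses a trivially rewritten variant with the same intent.

```python
# Hypothetical signature list; real IPS rule sets are far larger and richer.
SIGNATURES = {
    "sql-injection-tautology": b"' OR '1'='1",
    "path-traversal": b"../../etc/passwd",
}


def match_signatures(payload):
    """Return the names of all known signatures that appear verbatim in the payload bytes."""
    return [name for name, pattern in SIGNATURES.items() if pattern in payload]


known = match_signatures(b"GET /login?user=admin' OR '1'='1")    # ["sql-injection-tautology"]
mutated = match_signatures(b"GET /login?user=admin' OR 2>1--")   # []: same intent, no exact match
```

Both payloads inject an always-true condition, yet only the first matches. Closing that gap by hand means writing a new signature for every variant, which is why mined behavioral patterns are attractive as a complement.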

Proactive Data Mining to Detect New Threats

While the discussion thus far has focused mainly on defensive approaches, data mining can also be used proactively to seek out emerging threats in large data sets. Some cyberattacks are conducted by sophisticated programmers skilled enough to invent attacks unlike anything seen before, but a large number of hackers use tools, tradecraft, and techniques that are shared in open data sources [5]. To mount an attack, attackers must determine which vulnerabilities are present, acquire the expertise and tools to craft the attack, find targets, recruit participants, and then devise and carry out the attack. Because many of these steps involve open-source tools and discussion on web forums, mining those forums could reveal an attack and its nature before it happens [5].
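
One simple way to mine forum text for emerging threats is to track which vulnerability identifiers start being discussed. The Python sketch below extracts CVE identifiers from posts and reports those mentioned in at least a few recent posts but never in a baseline window. This frequency heuristic, the default threshold of three posts, and any sample posts are assumptions for illustration; it is not the DISCOVER pipeline described in [5].

```python
import re
from collections import Counter

# CVE identifiers have the form CVE-YYYY-NNNN, with four or more digits in the sequence number.
CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)


def cve_mentions(posts):
    """Count how many posts mention each CVE identifier (each post counted once per CVE)."""
    counts = Counter()
    for post in posts:
        for cve in {match.upper() for match in CVE_PATTERN.findall(post)}:
            counts[cve] += 1
    return counts


def emerging_cves(baseline_posts, recent_posts, min_mentions=3):
    """CVEs mentioned in at least `min_mentions` recent posts but absent from the baseline."""
    baseline = cve_mentions(baseline_posts)
    recent = cve_mentions(recent_posts)
    return sorted(cve for cve, n in recent.items() if n >= min_mentions and baseline[cve] == 0)
```

If three recent posts mention a CVE that never appeared in the baseline posts, `emerging_cves` returns that identifier, while a CVE already discussed in the baseline is ignored. Real systems would also need to handle exploits discussed without a CVE identifier at all.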

Data mining could also help prevent distributed denial of service (DDoS) attacks, against which traditional defenses such as firewalls, IDS, and IPS generally offer poor protection [13]. By collecting large amounts of data on CPU usage, network bandwidth, and memory usage, a data set can be built and analyzed to model “proper” network activity, so that when an attack occurs, the data mining engine can identify it and shut it down [13].
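
The idea of modeling “proper” activity and flagging departures from it can be sketched with a per-metric z-score test. The Python below learns a mean and standard deviation for CPU, bandwidth, and memory readings from normal operation and flags any reading more than three standard deviations away. The z-score model, the metric names, and the threshold are assumptions chosen for simplicity; they are not the model used by FastDetict [13].

```python
import statistics

METRICS = ("cpu", "bandwidth", "memory")


def fit_baseline(normal_samples):
    """Learn the mean and standard deviation of each metric from normal-operation samples."""
    baseline = {}
    for metric in METRICS:
        values = [sample[metric] for sample in normal_samples]
        baseline[metric] = (statistics.mean(values), statistics.stdev(values))
    return baseline


def is_anomalous(sample, baseline, z_threshold=3.0):
    """Flag a sample if any metric lies more than z_threshold standard deviations from its mean."""
    for metric, (mean, std) in baseline.items():
        std = max(std, 1e-9)  # guard against a metric that never varied during training
        if abs(sample[metric] - mean) / std > z_threshold:
            return True
    return False
```

With a baseline fitted on a handful of ordinary readings (CPU around 20%, bandwidth around 100 Mbps, memory around 40%), a reading close to those values passes, while a flood-like reading with CPU at 95% and bandwidth at 950 Mbps is flagged. A single global threshold is crude; time-of-day effects and correlated metrics would need a richer model.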

Figure 2 – [image not available]

The explosion of internet-connected devices has created the IoT, which presents another major security challenge. As [35] notes, security in this area is often poorly defined, and security standards have not yet been clearly established. As Figure 2 illustrates, security must be provided at four basic levels: the human interface, the IoT processors and platforms, the communication networks and controllers, and the IoT devices themselves. Each of these levels generates a huge amount of data. Steps similar to those noted by [31] could be applied to traffic in an IoT setting to find anomalous traffic, allowing network devices to be configured to protect IoT devices as well as traditional network devices.
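
One simple form of anomaly detection for IoT devices, which tend to talk to a small, fixed set of services, is a learned per-device allowlist. The Python sketch below records which (destination, port) pairs each device used during a trusted training window and reports any later flow outside that profile. The device names, addresses, and the allowlist approach itself are illustrative assumptions rather than methods from [31] or [35].

```python
from collections import defaultdict


def learn_profiles(training_flows):
    """Map each device to the (destination, port) pairs it used during a trusted training window."""
    profiles = defaultdict(set)
    for device, destination, port in training_flows:
        profiles[device].add((destination, port))
    return profiles


def unexpected_flows(live_flows, profiles):
    """Return flows whose (destination, port) pair was never seen for that device during training."""
    return [
        (device, destination, port)
        for device, destination, port in live_flows
        if (destination, port) not in profiles.get(device, set())
    ]


# Hypothetical devices and addresses (203.0.113.0/24 is a documentation range).
profiles = learn_profiles([
    ("thermostat-1", "cloud.example.com", 443),
    ("camera-2", "video.example.com", 443),
])
alerts = unexpected_flows([
    ("thermostat-1", "cloud.example.com", 443),
    ("camera-2", "203.0.113.50", 23),
], profiles)
# alerts == [("camera-2", "203.0.113.50", 23)]
```

Devices never seen in training have an empty profile, so all of their traffic is reported; whether that is desirable depends on how new devices are onboarded.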

Discussion

As this literature review has revealed, the growth in the number of computing devices and the amount of data has been accompanied by a growing risk of attack. As governments, educational institutions, industry, and individual consumers become more reliant on technology for their daily activities, the security of these systems is paramount. As the review showed, IDS and IPS that rely solely on blacklists or signature files are inadequate because they cannot recognize threats that are still under development. Proactive techniques, such as decision tree learning and other machine learning methods, can instead derive intelligence from the massive amounts of data produced both within and outside of networks.

Additionally, data mining can be used to monitor public networks and examine patterns of discussion that may indicate future threats under development. Advance knowledge of these threats and their nature allows network protection systems to be programmed to identify and respond to them as they emerge. In the past, analyzing all of the data generated within a network would have been inconceivable. Today, advanced algorithms, machine learning, and massive database systems make analyzing these data sets possible, allowing very specific patterns to be built against which future activity can be checked. As the host-based system of [4] demonstrated, with a 2.0% false negative rate and a 5.4% false positive rate, this form of data mining can be highly effective.

Conclusion

As this report has discussed, the economy and the lives of most individuals today are highly reliant on secure access to the internet. When threats interrupt business, commerce, or personal activities, the resulting disruption can be severe and costly. Data mining has the potential both to identify threats and to prevent them across the systems and devices we use. Through its intelligent use, information technology and cyber security professionals are working hard to make the world a safer place for all of us.

References

[1] S. Mohurle and M. Patil, “A brief study of Wannacry threat: Ransomware attack 2017,” Int J Adv Res Comput Sci, vol. 8, no. 5, pp. 2016–2018, 2017.

[2] Department of Homeland Security, “The 2014 Quadrennial Homeland Security Review,” 2014.

[3] E. Knapp, “Chapter 9 - Monitoring Enclaves,” in Industrial Network Security, 2011, pp. 215–247.

[4] D. Moon, S. B. Pan, and I. Kim, “Host-based intrusion detection system for secure human-centric computing,” J Supercomput, vol. 72, no. 7, pp. 2520–2536, 2016.

[5] A. Sapienza, S. K. Ernala, A. Bessi, K. Lerman, and E. Ferrara, “DISCOVER: Mining Online Chatter for Emerging Cyber Threats,” in Proceedings of The Web Conference 2018, 2018, pp. 983–990.

[6] C. S. Kruse, B. Frederick, T. Jacobson, and D. K. Monticone, “Cybersecurity in healthcare: A systematic review of modern threats and trends,” Technol Heal Care, vol. 25, no. 1, pp. 1–10, 2017.

[7] Z. Katzir and Y. Elovici, “Quantifying the Resilience of Machine Learning Classifiers Used for Cyber Security,” Expert Syst Appl, vol. 92, pp. 419–429, 2018.

[8] J. Gómez, B. Oviedo, and E. Zhuma, “Patient monitoring system based on Internet of Things,” in Procedia Computer Science, 2016, vol. 83, pp. 90–97.

[9] M. Klun and P. Trkman, “Business process management – at the crossroads,” Bus Process Manag J, vol. 23, no. 6, pp. 1108–1128, 2017.

[10] J. Mohammed, C.-H. Lung, A. Ocneanu, A. Thakral, C. Jones, and A. Adler, “Internet of Things: Remote patient monitoring using web services and cloud computing,” in 2014 IEEE Int Conf Internet Things (iThings), IEEE Green Comput Commun IEEE Cyber, Phys Soc Comput, 2014, pp. 256–263.

[11] R. Henry and S. Venkatraman, “Big Data Analytics the Next Big Learning Opportunity,” Acad Inf Manag Sci J, vol. 18, no. 2, pp. 17–29, 2015.

[12] D. T. Sullivan, “Survey of Malware Threats and Recommendations to Improve Cybersecurity for Industrial Control Systems Version 1.0,” 2015.

[13] M. Nijim, H. Albataineh, M. Khan, and D. Rao, “FastDetict: A Data Mining Engine for Predecting and Preventing DDoS Attacks,” in 2017 IEEE International Symposium on Technologies for Homeland Security, HST 2017, 2017.

[14] PCI SSC, “PCI DSS Quick Reference Guide,” PCI Secur Stand Doc, pp. 1–40, 2015.

[15] I. Butun, S. D. Morgera, and R. Sankar, “A Survey of Intrusion Detection Systems in Wireless Sensor Networks,” IEEE Commun Surv Tutorials, vol. 16, no. 1, pp. 266–282, 2014.

[16] L. McGuigan and G. Murdock, “The medium is the marketplace: Digital systems and the intensification of consumption,” Can J Commun, vol. 40, no. 4, pp. 717–726, 2015.

[17] A. Vera-Baquero, O. Molloy, R. Colomo-Palacios, and M. Elbattah, “Business process improvement by means of big data based decision support systems: A case study on call centers,” Int J Inf Syst Proj Manag, vol. 3, no. 1, pp. 5–26, 2015.

[18] P. J. Piety and D. T. Hickey, “Educational Data Sciences: Framing Emergent Practices for Analytics of Learning, Organizations, and Systems,” in Proceedings of the Fourth International Conference on Learning Analytics And Knowledge - LAK ’14, 2014, pp. 193–202.

[19] G. Siemens and R. S. J. D. Baker, “Learning analytics and educational data mining,” in Proceedings of the 2nd International Conference on Learning Analytics and Knowledge - LAK ’12, 2012, p. 252.

[20] Y. Chen, E. Argentinis, and G. Weber, “IBM Watson: How cognitive computing can be applied to big data challenges in life sciences research,” Clin Ther, vol. 38, no. 4, pp. 688–701, 2016.

[21] Y. Zhang, M. Qiu, C. W. Tsai, M. M. Hassan, and A. Alamri, “Health-CPS: Healthcare Cyber-Physical System Assisted by Cloud and Big Data,” IEEE Syst J, vol. 11, no. 1, pp. 88–95, 2017.

[22] M. Kantarcioglu and B. Xi, “Adversarial Data Mining: Big Data Meets Cyber Security,” in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security - CCS’16, 2016, pp. 1866–1867.

[23] P. S. Raj and G. Silambarasan, “Role of Data Mining in Cyber Security,” Int J Eng Sci Comput, vol. 7, no. 7, pp. 13932–13935, 2017.

[24] D. Lyon, “Surveillance, Snowden, and Big Data: Capacities, consequences, critique,” Big Data Soc, vol. 1, no. 2, p. 205395171454186, 2014.

[25] P. Balaraman and S. Chandrasekar, “E-Commerce Trends and Future Analytics Tools,” Indian J Sci Technol, vol. 9, no. 32, pp. 1–9, 2016.

[26] D. B. Lambert, “Dynamic network security control using software defined networking,” Air Force Institute of Technology, 2016.

[27] M. Jouini, L. B. A. Rabai, and A. Ben Aissa, “Classification of security threats in information systems,” Procedia Comput Sci, vol. 32, pp. 489–496, 2014.

[28] M. J. Herring and K. D. Willett, “Active cyber defense: A vision for real-time cyber defense,” J Inf Warf, vol. 13, no. 2, pp. 46–55, 2014.

[29] A. Pal Singh and M. Deep Singh, “Analysis of Host-Based and Network-Based Intrusion Detection System,” Int J Comput Netw Inf Secur, vol. 6, no. 8, pp. 41–47, 2014.

[30] V. Singh and S. Puthran, “Intrusion Detection System Using Data Mining A Review,” in 2016 International Conference on Global Trends in Signal Processing, Information Computing and Communication (ICGTSPICC), 2016, pp. 587–592.

[31] J. Mathew and S. Ajikumar, “Data Mining and Machine Learning Methods for Cyber Security Intrusion Detection,” Int J Sci Res Comput Sci Eng Inf Technol, vol. 2, no. 2, pp. 92–97, 2017.

[32] N. Moustafa and J. Slay, “The Significant Features of the UNSW-NB15 and the KDD99 Data Sets for Network Intrusion Detection Systems,” in Proceedings - 2015 4th International Workshop on Building Analysis Datasets and Gathering Experience Returns for Security, BADGERS 2015, 2017, pp. 25–31.

[33] I. C. S. Systems, “7.7.1 SCADA and ICS Systems,” pp. 4–6, 2018.

[34] V. Chang, “Towards a Big Data System Disaster Recovery in a Private Cloud,” Ad Hoc Netw, vol. 35, no. C, pp. 65–82, 2015.

[35] G. Barrie, A. Whyte, and J. Bell, “IoT Security: Challenges and Solutions for Mining,” in ACM International Conference Proceeding Series, 2017.