Cyber Security and Confusion Matrix

How is the Confusion Matrix related to cyber attacks?

MishanRG
Published in Nerd For Tech
Jun 4, 2021

We live in a world built on computer science. Much of the data that matters most to us, including the data that protects our privacy, is online. We use social media, do our banking, and handle official work, all online. Yes, it has made our lives very easy: with a single click we can do many things, and with a single click we can access and store our data online. But this also carries risk.

As internet usage has grown, cyber attacks and cyber threats have become huge issues.

What is a Cyber Attack?

A cyber attack is an unauthorized attack on servers or computers on a public or private network, in which the attacker seeks to expose, damage, alter, disable, or steal data, or to change the system configuration. Carrying out such an attack is a cybercrime.

Some of the examples of cyber attacks are:

  • Stealing corporate data and hacking servers
  • Exposing someone's private information and harassing them
  • Stealing bank details and card details
  • Phishing sites and scams
  • IoT device hacking
  • Flooding the servers with unnecessary traffic

These are only a few examples of cyber attacks; the full list is much longer.

What is the solution being used in the industry to prevent it?

The IT industry is trying its best to protect data and servers. Many techniques and applications have been developed to prevent cybercrime, and some organizations work specifically on the security of the Internet. Some of the techniques in use today are:

  • Protecting data in the cloud
  • End-to-end encryption
  • SSH keys and certificates
  • Automated monitoring
  • And Many More…

We have lots of other options and techniques being used by different users and service providers.

Here we will discuss one of the approaches and briefly discuss a small component of that approach.

Machine Learning and Cyber Security

Nowadays, almost every company has a lot of data to handle. Here we focus on server and storage security. Manual effort alone struggles to keep up: it is slow and error-prone, and at this scale we need monitoring to be automatic. Machine Learning helps the team manage the servers and keep them safe.

When we combine the machine with human intelligence, we can achieve great things at great speed. A model can be trained on older attack patterns and on the threats the servers have already had to deal with, so that it learns to recognize those patterns. Then, whenever traffic is exchanged, the trained model can keep an eye on every packet and its activity. When some malicious activity or attack happens, the model can warn the security team, and the team can look into the threat before a big mishap happens. In some cases, the machine can even respond on its own, following rules the user has set for such situations, for example shutting down ingress traffic or blocking a suspected IP or network for a limited time until the developers look into it.
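The alert-and-block flow described above can be sketched roughly as follows. This is only a minimal illustration: `TemporaryBlocklist`, `handle_packet`, and the 600-second block time are hypothetical names and values, and the `looks_malicious` flag stands in for whatever verdict a trained model would return.

```python
import time


class TemporaryBlocklist:
    """Hypothetical helper: remembers blocked IPs until their block expires."""

    def __init__(self, block_seconds=600):
        self.block_seconds = block_seconds
        self._expires = {}  # ip -> timestamp at which the block ends

    def block(self, ip, now=None):
        now = time.time() if now is None else now
        self._expires[ip] = now + self.block_seconds

    def is_blocked(self, ip, now=None):
        now = time.time() if now is None else now
        expiry = self._expires.get(ip)
        if expiry is None:
            return False
        if now >= expiry:
            # The limited-time block is over, so forget the entry.
            del self._expires[ip]
            return False
        return True


def handle_packet(ip, looks_malicious, blocklist, alerts, now=None):
    """Return True if the packet is allowed through, False if it is dropped."""
    if blocklist.is_blocked(ip, now):
        return False
    if looks_malicious:  # assumed to be the model's verdict for this packet
        alerts.append(f"suspicious traffic from {ip}")  # warn the security team
        blocklist.block(ip, now)
        return False
    return True


blocklist = TemporaryBlocklist(block_seconds=600)
alerts = []
first = handle_packet("203.0.113.7", True, blocklist, alerts, now=0)
second = handle_packet("203.0.113.7", False, blocklist, alerts, now=300)
third = handle_packet("203.0.113.7", False, blocklist, alerts, now=900)
print(first, second, third, alerts)
```

In this sketch the first packet is flagged and dropped, the second is dropped because the block is still active, and the third passes because the block has expired.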

So let's take a small part of Machine Learning called the Confusion Matrix. For a two-class problem, the confusion matrix is a 2×2 matrix that describes the performance of a classification model. It gives us 4 counts, and based on them we can check how good our model is and what we need to focus on.

The diagram above shows the confusion matrix. Let me break the image down and describe each component.

A Machine Learning model learns from our older data or older patterns, and when it sees something new it gives us its prediction. The labels we already know to be correct are called the actual data, and the labels the machine gives us are called the predicted data. In our example, the machine can predict one of 2 possibilities: True or False. When we compare the machine's predictions with the actual data, only some percentage of them will be correct. That percentage depends on the domain and on the data we gave the model; in the real world it is often somewhere around 70% to 95%. The machine is never perfect and will not give 100% correct results, so we compare the actual data with the predicted data to see how the machine did on the data it was given.

Now in our image, we have 4 boxes. Let me break them down:

  1. True Positive: This cell holds the number of records that are True in the actual data and that the machine also predicted as True, i.e., a correct prediction.
  2. False Positive: This cell holds the number of records that are False in the actual data but that the machine wrongly predicted as True.
  3. False Negative: This cell holds the number of records that are True in the actual data but that the machine wrongly predicted as False.
  4. True Negative: This cell holds the number of records that are False in the actual data and that the machine also predicted as False, i.e., a correct prediction.

We will work through an example, so don’t worry if this confuses you.

False Positive is Type 1 Error, whereas False Negative is Type 2 Error.
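As a minimal sketch of how the four cells are counted, the snippet below compares a list of actual labels with a list of predicted labels. The function name `confusion_counts` and the six sample labels are made up purely for illustration.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count the four confusion-matrix cells for True/False labels."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if a and p:
            counts["TP"] += 1  # actual True, predicted True
        elif not a and p:
            counts["FP"] += 1  # actual False, predicted True (Type 1 error)
        elif a and not p:
            counts["FN"] += 1  # actual True, predicted False (Type 2 error)
        else:
            counts["TN"] += 1  # actual False, predicted False
    return {cell: counts[cell] for cell in ("TP", "FP", "FN", "TN")}


# Illustrative labels only, not real traffic data.
actual = [True, True, False, False, True, False]
predicted = [True, False, True, False, True, False]
cells = confusion_counts(actual, predicted)
print(cells)  # {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 2}
```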

EXAMPLE

Now let’s relate this confusion matrix to a real-world example and see how it is helpful.

Consider a server that received 1000 packets of traffic in 1 hour (this is our scenario). As I mentioned, the machine will not be 100% correct, so let’s check how it did. For each packet or transmission, our model predicted whether it was dangerous to the server or not. We want to know whether each packet was good (True/1) or suspicious (False/0), so in this example "positive" means safe.

In the above image, our Machine Learning model predicted 750 packets as safe, and they really were safe (True Positives), which is a good thing. The model also said that 165 packets were suspicious and dangerous, and they were dangerous in actuality (True Negatives), so the machine gave us the correct information and we were able to deal with them in time. Next, 20 packets were predicted as dangerous even though they were actually safe (False Negatives). In this case, the model raised a false alarm: it called safe data unsafe and made the security team take a look. This is a Type 2 error, and here it is not very dangerous in the real world. Finally, we have 65 packets that were dangerous in actuality, but the machine predicted that they were good and safe (False Positives). These packets were actually False (dangerous), yet the model predicted them as True (safe), so they did not trigger any alarm or notify the security team as they passed into the server. This is a Type 1 error, and it is very dangerous for the server. It is like something bad happened, and we were told that everything is fine.

So this is how the confusion matrix helps in cyber attack monitoring. The team checks the matrix, evaluates the model, and tries to reduce the Type 1 error as much as possible.
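To see why the Type 1 error deserves that attention, here is a small sketch that turns the four counts from the scenario above into rates. The variable names are my own, and "positive" means safe (True/1), as in the example.

```python
# Counts from the scenario above; "positive" means safe (True/1).
tp = 750  # safe packets predicted safe
tn = 165  # dangerous packets predicted dangerous
fp = 65   # dangerous packets predicted safe (Type 1 error)
fn = 20   # safe packets predicted dangerous (Type 2 error)

total = tp + tn + fp + fn

accuracy = (tp + tn) / total   # share of all predictions that were correct
type1_rate = fp / (fp + tn)    # share of dangerous packets that slipped through
type2_rate = fn / (fn + tp)    # share of safe packets that raised a false alarm

print(f"total packets:   {total}")
print(f"accuracy:        {accuracy:.1%}")
print(f"Type 1 error rate: {type1_rate:.1%}")
print(f"Type 2 error rate: {type2_rate:.1%}")
```

The accuracy comes out at 91.5%, which looks good on its own, but roughly 28% of the dangerous packets (65 out of 230) were passed as safe. The confusion matrix makes that visible where a single accuracy number would hide it.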

CONCLUSION

We can say that Machine Learning is a very important part of the IT industry. It is used in many domains and is being developed day by day to meet the needs of the industry. We have also discussed how the confusion matrix works and how it helps with real-world problems.

I hope I have explained everything, and if you have any doubts or suggestions, you can comment on this blog or contact me on my LinkedIn.

Thank you for staying until the end of the blog, and please do suggest some ideas for improvement. Your suggestions will really motivate me.
