2021-08-31 02:20:34 +02:00


---
title: "Mapping Fail2ban's list of malicious IPv4 scanners"
date: 2021-08-30T23:47:11+02:00
draft: true
---

Some context

Back when I built my gitea server for the first time, I noticed something strange: it would work nicely, but only for so many hours at a time. Soon enough, it would just crash or stop responding without an apparent reason, leaving me scratching my head.

I had opened sshd's well-known port to the Internet under the naive impression that a server with no valid ssh login would be more than enough protection. What could possibly happen? Someone stealing my laptop and using my ssh key to push to my random dark-web repo to assert dominance?

Oh boy, was I wrong. Not even days after first exposing the ssh port to the Internet, the sheer amount of malicious traffic would make my server crash. The Chinese botnets neither knew nor cared that sshd wasn't accepting logins: they just kept trying to brute-force their way in. It is well known that there is a gigantic amount of IPv4 scanning going on, on the order of thousands of packets per day, but that is a mere fraction of what you get by exposing a well-known port. Before installing fail2ban, my server was receiving hundreds of login attempts per second.

Meeting the scanners

Fail2ban's method for keeping law and order is fairly straightforward: you give it a maximum number of failed attempts and a ban duration, and it adds temporary iptables rules whenever someone tries and fails to connect one too many times. I will be using its daily log to get a better grasp of where all the botting is coming from. Fail2ban's log files look something like this:

$ head /var/log/fail2ban.log
2021-08-29 00:00:34,222 fail2ban.server         [94]: INFO    rollover performed on /var/log/fail2ban.log
2021-08-29 00:01:21,092 fail2ban.actions        [94]: NOTICE  [sshd] Unban 186.3.164.76
2021-08-29 00:03:05,205 fail2ban.actions        [94]: NOTICE  [sshd] Unban 222.186.30.112
2021-08-29 00:09:41,049 fail2ban.filter         [94]: INFO    [sshd] Found 221.181.185.159 - 2021-08-29 00:09:40
2021-08-29 00:09:42,651 fail2ban.filter         [94]: INFO    [sshd] Found 221.181.185.159 - 2021-08-29 00:09:42
2021-08-29 00:09:45,665 fail2ban.filter         [94]: INFO    [sshd] Found 221.181.185.159 - 2021-08-29 00:09:45
2021-08-29 00:09:48,369 fail2ban.filter         [94]: INFO    [sshd] Found 221.181.185.159 - 2021-08-29 00:09:48
2021-08-29 00:09:51,574 fail2ban.filter         [94]: INFO    [sshd] Found 221.181.185.159 - 2021-08-29 00:09:51
2021-08-29 00:09:51,638 fail2ban.actions        [94]: NOTICE  [sshd] Ban 221.181.185.159
2021-08-29 00:09:53,229 fail2ban.filter         [94]: INFO    [sshd] Found 221.181.185.159 - 2021-08-29 00:09:53
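
The excerpt shows the rule in action: five "Found" lines for the same address, then a "Ban". Here is a minimal sketch of that counting logic in Python. It is not Fail2ban's actual implementation; `MAX_RETRY`, `FIND_TIME` and `find_bans` are hypothetical names, the threshold values are assumptions chosen to match the excerpt, and unbanning after the ban time expires is left out.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Assumed thresholds for illustration; the real ones live in the jail config.
MAX_RETRY = 5
FIND_TIME = timedelta(minutes=10)


def find_bans(failures, max_retry=MAX_RETRY, find_time=FIND_TIME):
    """Return (ip, time) pairs at which an IP crosses the failure threshold.

    `failures` is an iterable of (ip, datetime) pairs in chronological order.
    """
    recent = defaultdict(deque)  # ip -> timestamps of its recent failures
    banned = set()
    bans = []
    for ip, when in failures:
        if ip in banned:
            continue  # already banned, nothing more to count
        window = recent[ip]
        window.append(when)
        # Forget failures that have fallen out of the sliding window
        while when - window[0] > find_time:
            window.popleft()
        if len(window) >= max_retry:
            banned.add(ip)
            bans.append((ip, when))
    return bans


# The "Found" timestamps for 221.181.185.159 from the log excerpt above
stamps = ["00:09:40", "00:09:42", "00:09:45", "00:09:48", "00:09:51", "00:09:53"]
failures = [
    ("221.181.185.159", datetime.fromisoformat("2021-08-29 " + s)) for s in stamps
]
bans = find_bans(failures)
print(bans)
```

With these assumed values the ban lands on the fifth failure, at 00:09:51, which is also where the log reports it.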

The only data I am interested in are the banned IP addresses (and how many of them there are), so we trim the file accordingly, taking care to remove duplicates:

$ grep -E "\WBan" /var/log/fail2ban.log | awk '{ print $8 }' | sort --unique | tee banlog
1.116.211.170
1.117.214.250
1.15.106.44
1.15.151.58
1.15.183.51
1.15.21.246
1.179.137.10
1.226.12.132
1.53.89.181
1.85.216.176
[...]

Take care to use sort --unique rather than a bare uniq, which only detects adjacent duplicates and so needs sorted input anyway.
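
For anyone who would rather stay in Python, the same extraction can be sketched with a regular expression and a set. This is only an illustration: `BAN_RE` and `banned_ips` are hypothetical names, and the sample lines are copied from the excerpt above, with the Ban line repeated on purpose to show the de-duplication.

```python
import re

# "Ban" followed by an IPv4 address; "Unban" and "Found" lines do not match
BAN_RE = re.compile(r"\bBan\s+(\d{1,3}(?:\.\d{1,3}){3})\b")


def banned_ips(lines):
    """Return the sorted, de-duplicated IPv4 addresses found in Ban lines."""
    found = set()
    for line in lines:
        match = BAN_RE.search(line)
        if match:
            found.add(match.group(1))
    return sorted(found)


sample = [
    "2021-08-29 00:01:21,092 fail2ban.actions        [94]: NOTICE  [sshd] Unban 186.3.164.76",
    "2021-08-29 00:09:51,574 fail2ban.filter         [94]: INFO    [sshd] Found 221.181.185.159 - 2021-08-29 00:09:51",
    "2021-08-29 00:09:51,638 fail2ban.actions        [94]: NOTICE  [sshd] Ban 221.181.185.159",
    # Same Ban line again, repeated here only to show that duplicates collapse
    "2021-08-29 00:09:51,638 fail2ban.actions        [94]: NOTICE  [sshd] Ban 221.181.185.159",
]
print(banned_ips(sample))
```

To run it on the real log, pass an open file instead of the sample list, since a file object iterates line by line.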

Scanning the scanners

Now that we have their IPs, we can get a rough estimate of where the traffic is coming from. There are many online services you can use to get this data, but they won't let you run queries in bulk without charging you for some kind of database subscription. If someone knows a program that just works with batteries included, please tell me.

Anyhow, I ended up using IP2Location's BIN-format database along with its Python API. They require a free account to download their database files, but a burner email or [an alias]( {{< ref "/automating-aliases.md" >}}) will do just fine.

IP2Location's module can be installed in the usual fashion:

$ pip install IP2Location --user

After which we can get our hands dirty. I'm not much of a pythoner myself, so I decided to make a simple .py that outputs formatted lines so I can keep using my shiny UNIX tools:

#!/usr/bin/env python
import sys, IP2Location

def main():

    # Argument checking: the IP list comes first, then the database
    if len(sys.argv) < 3:
        print("Usage: ip_query.py <ips_file> <database_file>", file=sys.stderr)
        sys.exit(1)

    # Get a list of ips as trimmed strings, skipping blank lines
    with open(sys.argv[1], "r") as ips_file:
        ip_list = [line.strip() for line in ips_file if line.strip()]

    # Open connection to binary database
    database = IP2Location.IP2Location(sys.argv[2], "SHARED_MEMORY")

    # field delimiter
    d = "~"
    
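    for ip in ip_list:
        record = database.get_all(ip)
        # str() keeps the join working even if a field is not a string
        # (for example a numeric coordinate or a missing value)
        fields = (record.ip, record.country_short, record.country_long,
                  record.region, record.city, record.latitude,
                  record.longitude, record.zipcode, record.timezone)
        print(d.join(str(field) for field in fields))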

if __name__ == '__main__':
    main()

Depending on which database you chose, it may have more or fewer fields available. Later I will cut what I don't need, but for now I'm dumping everything. You can use whichever delimiter you want, but I don't recommend a comma or any other character that could appear in a country's name or timezone info.

$ chmod +x ip_query.py
$ ./ip_query.py banlog IP2LOCATION-LITE-DB11.BIN | tee ipstats
1.116.211.170~CN~China~Beijing~Beijing~39.907501~116.397232~100006~+08:00
1.117.214.250~CN~China~Beijing~Beijing~39.907501~116.397232~100006~+08:00
1.15.106.44~CN~China~Beijing~Beijing~39.907501~116.397232~100006~+08:00
1.15.151.58~CN~China~Beijing~Beijing~39.907501~116.397232~100006~+08:00
1.15.183.51~CN~China~Beijing~Beijing~39.907501~116.397232~100006~+08:00
1.15.21.246~CN~China~Beijing~Beijing~39.907501~116.397232~100006~+08:00
1.179.137.10~TH~Thailand~Krung Thep Maha Nakhon~Bangkok~13.750000~100.516670~10200~+07:00
[...]

Now, this is looking much better. I was curious about which countries were the biggest culprits, although it isn't much of a surprise:

$ cut -d'~' -f2,3 ipstats | sort | uniq -c | sort -rn | head
    239 CN~China
    118 US~United States of America
     45 IN~India
     33 ID~Indonesia
     26 VN~Viet Nam
     25 NL~Netherlands
     23 SG~Singapore
     22 KR~Korea (Republic of)
     22 DE~Germany
     21 RU~Russian Federation

We just made Our Very Own Top 10 Of Shame! And remember that this is just one day's worth of logs, from a server that barely half a dozen people use, and it doesn't even count repeated offenses from the same IP. Goes to show you how crazy IPv4 scanning has gotten.
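
The same tally can be sketched in Python with collections.Counter, in case the numbers need to go somewhere other than a terminal. This is an illustration and not part of the original script: `country_counts` is a hypothetical helper, and it assumes the `~`-delimited layout of ipstats shown above, with the country code and name in the second and third fields.

```python
from collections import Counter


def country_counts(lines, delimiter="~"):
    """Count lines per (country code, country name) pair."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        counts[(fields[1], fields[2])] += 1
    return counts


# Three lines taken from the ipstats output above
sample = [
    "1.116.211.170~CN~China~Beijing~Beijing~39.907501~116.397232~100006~+08:00",
    "1.117.214.250~CN~China~Beijing~Beijing~39.907501~116.397232~100006~+08:00",
    "1.179.137.10~TH~Thailand~Krung Thep Maha Nakhon~Bangkok~13.750000~100.516670~10200~+07:00",
]
for (code, name), n in country_counts(sample).most_common(10):
    print(f"{n:7d} {code}~{name}")
```

most_common(10) plays the role of sort plus head, and it sorts the counts as numbers.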

Mapping the scanners

To top it off, I would like to have some sort of graphical visualization of these heinous crimes. There are some great libraries out there to plot coordinate data onto a world map. I would consider something like folium if I were to do more with the Python side of this blogpost. But that's not what we're here for today. Today we're using crappy sites and copy-pasting.

$ cut -d'~' -f6,7 --output-delimiter=',' ipstats | xclip -selection clipboard

This does just what we want. --output-delimiter is a cool flag of GNU cut that replaces the input delimiter with a different one in the output. Most places that let you paste coordinates in bulk require comma-separated lines, and that is what we just copied to our clipboard.
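
Since the database resolves many addresses to the same city-level coordinates (see the Beijing rows in ipstats), the pasted list can get repetitive. Here is a small sketch that produces the same comma-separated pairs but keeps each point only once. `unique_coordinates` is a hypothetical helper, and it assumes latitude and longitude are the sixth and seventh `~`-delimited fields, as in the output above.

```python
def unique_coordinates(lines, delimiter="~"):
    """Return "latitude,longitude" strings, de-duplicated, in first-seen order."""
    points = []
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        if len(fields) >= 7:  # latitude and longitude are fields 6 and 7
            points.append(f"{fields[5]},{fields[6]}")
    # dict.fromkeys drops repeats while preserving insertion order
    return list(dict.fromkeys(points))


# Three lines taken from the ipstats output above
sample = [
    "1.116.211.170~CN~China~Beijing~Beijing~39.907501~116.397232~100006~+08:00",
    "1.117.214.250~CN~China~Beijing~Beijing~39.907501~116.397232~100006~+08:00",
    "1.179.137.10~TH~Thailand~Krung Thep Maha Nakhon~Bangkok~13.750000~100.516670~10200~+07:00",
]
print("\n".join(unique_coordinates(sample)))
```

Whether de-duplicating is desirable depends on the map: it loses the per-city counts, so it only suits a plot of where the scanners are, not how many there are.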

We can use a place like mapcustomizer for our very first, crappy data visualization:

![Image]({{< static "img/mapcustomizer.png" >}})