Nk_top_ips

Description

This Bash script generates a report of the top IPs accessing a particular domain within a specified time frame. For each IP, the report lists the number of requests, the GET/POST split, bandwidth, location, abuse score, and the most requested URI. The location and abuse score come from the AbuseIPDB API.

The script takes in three arguments: the domain name, the timestamp (in the format “dd/Mon/yyyy”), and an optional limit on the number of IPs to include in the report. If no limit is specified, it defaults to 10.
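The argument handling described above can be sketched as follows. This is a minimal illustration rather than the script's own code: `parse_args` is a hypothetical helper name, and the date pattern check is an added assumption about what a valid "dd/Mon/yyyy" timestamp looks like.

```shell
# Hypothetical helper: validate the arguments and print them back.
parse_args() {
    local domain="$1" timestamp="$2"
    local limit="${3:-10}"   # fall back to 10 when no limit is given

    if [ -z "$domain" ] || [ -z "$timestamp" ]; then
        echo "You must provide a domain and timestamp." >&2
        return 1
    fi

    # Assumed format check: two digits, a three-letter month, a four-digit year.
    case "$timestamp" in
        [0-9][0-9]/[A-Z][a-z][a-z]/[0-9][0-9][0-9][0-9]) ;;
        *)
            echo "Timestamp must look like 01/Apr/2023." >&2
            return 1
            ;;
    esac

    echo "domain=$domain timestamp=$timestamp limit=$limit"
}

parse_args nkern.net 01/Apr/2023      # prints: domain=nkern.net timestamp=01/Apr/2023 limit=10
parse_args nkern.net 01/Apr/2023 25   # prints: domain=nkern.net timestamp=01/Apr/2023 limit=25
```

Returning a non-zero status on bad input lets a caller detect the failure with `if` or `||`.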

The script works by first calling the nk_logs_from_timestamps() function to retrieve the log entries for the specified domain and time frame. It then generates a list of the top IPs accessing the domain using the gen_ip_list() function. For each IP address in the list, it calculates various metrics using the gen_body() function and prints out a row in the report.
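The nk_logs_from_timestamps() function is not shown on this page. As a rough idea of what that first step might involve, the sketch below filters combined-format access log lines by day. `logs_for_day` and the sample log lines are hypothetical, and the real helper may locate and filter the domain's logs quite differently.

```shell
# Hypothetical stand-in for the log-selection step: print the lines whose
# timestamp field starts with the given day, e.g. "[01/Apr/2023:".
logs_for_day() {
    local logfile="$1" day="$2"
    # -F treats the pattern as a fixed string, so "[" and "/" need no escaping.
    grep -F "[$day:" "$logfile"
}

# Made-up sample log in combined format.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
198.51.100.7 - - [01/Apr/2023:10:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "curl/8.0"
198.51.100.7 - - [02/Apr/2023:09:15:44 +0000] "GET /about HTTP/1.1" 200 2048 "-" "curl/8.0"
203.0.113.9 - - [01/Apr/2023:11:30:12 +0000] "POST /xmlrpc.php HTTP/1.1" 200 - "-" "curl/8.0"
EOF

logs_for_day "$sample" "01/Apr/2023"   # prints the first and third lines
rm -f "$sample"
```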

The gen_body() function calculates the total bandwidth and total number of requests for the domain. It then iterates through each IP address in the top IP list and calculates the number of requests, number of GET and POST requests, percent of total requests, bandwidth, percent of total bandwidth, location, abuse score, and top URI. These metrics are printed out in a formatted row in the report.
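The per-IP metrics can also be computed in a single pass over the log. The sketch below is one possible approach, not the script's implementation: `ip_metrics` is a hypothetical function, it assumes the combined log format (IP in field 1, quoted method in field 6, response size in field 10), and it compares the IP exactly against field 1 instead of searching the whole line.

```shell
# Hypothetical helper: print "ip requests (gets/posts) %-requests bytes %-bandwidth".
ip_metrics() {
    local ip="$1" logfile="$2"
    awk -v ip="$ip" '
        # Totals across every line; sizes logged as "-" are skipped.
        { total++; if ($10 ~ /^[0-9]+$/) total_bytes += $10 }
        # Counters for the requested IP only.
        $1 == ip {
            requests++
            if ($6 == "\"GET")  gets++
            if ($6 == "\"POST") posts++
            if ($10 ~ /^[0-9]+$/) bytes += $10
        }
        END {
            # Guard against division by zero on empty input.
            pct_req = total ? 100 * requests / total : 0
            pct_bw  = total_bytes ? 100 * bytes / total_bytes : 0
            printf "%s %d (%d/%d) %.2f%% %d %.2f%%\n",
                ip, requests, gets, posts, pct_req, bytes, pct_bw
        }' "$logfile"
}

# Made-up sample: note 198.51.100.70 must not be counted for 198.51.100.7.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
198.51.100.7 - - [01/Apr/2023:10:00:01 +0000] "GET / HTTP/1.1" 200 1000 "-" "curl/8.0"
198.51.100.7 - - [01/Apr/2023:10:00:02 +0000] "POST /xmlrpc.php HTTP/1.1" 200 - "-" "curl/8.0"
198.51.100.70 - - [01/Apr/2023:10:00:03 +0000] "GET /a HTTP/1.1" 200 3000 "-" "curl/8.0"
203.0.113.9 - - [01/Apr/2023:10:00:04 +0000] "GET /b HTTP/1.1" 200 4000 "-" "curl/8.0"
EOF

ip_metrics 198.51.100.7 "$sample"   # prints: 198.51.100.7 2 (1/1) 50.00% 1000 12.50%
rm -f "$sample"
```

The exact comparison matters because a plain substring search for 198.51.100.7 would also match 198.51.100.70.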

The script also uses various Unix utilities such as awk, sort, and uniq to manipulate the log entries and calculate the metrics.
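The counting idiom behind the top-IP list is `sort | uniq -c | sort -rn`. A small, self-contained illustration follows; `top_ips` and the sample addresses are made up for the example.

```shell
# Hypothetical helper: print the N most frequent values of field 1.
top_ips() {
    local logfile="$1" limit="${2:-10}"
    awk '{print $1}' "$logfile" |
        sort |              # group identical IPs together
        uniq -c |           # collapse each group into "count ip"
        sort -rn |          # highest count first
        head -n "$limit" |  # keep the top N
        awk '{print $2}'    # drop the count, keep the IP
}

sample="$(mktemp)"
printf '%s\n' \
    '198.51.100.7 - -' '203.0.113.9 - -' '198.51.100.7 - -' \
    '192.0.2.44 - -' '198.51.100.7 - -' '203.0.113.9 - -' > "$sample"

top_ips "$sample" 2
# prints:
# 198.51.100.7
# 203.0.113.9
rm -f "$sample"
```

The first `sort` is required because `uniq -c` only merges adjacent identical lines.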

Overall, this script is a useful tool for analyzing the top IPs accessing a domain within a specific time frame and can help identify potential security threats. Note that the AbuseIPDB API requires an API key, and the script makes one lookup per reported IP, so heavy use may run into the usage limits of the AbuseIPDB plan in use.
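Since the lookup needs an API key, one option is to read the key from the environment instead of embedding it in the script. The sketch below assumes a variable named `ABUSEIPDB_API_KEY` (a hypothetical name) and separates the network call from the parsing, so the parsing can be tried on a saved response. `abuse_lookup` and `parse_abuse` are illustrative names, and the sample JSON is a hand-written fragment containing only the two fields the script extracts, not real API output.

```shell
# Query AbuseIPDB for one IP. Requires ABUSEIPDB_API_KEY (assumed name) to be set.
abuse_lookup() {
    local ip="$1"
    if [ -z "${ABUSEIPDB_API_KEY:-}" ]; then
        echo "ABUSEIPDB_API_KEY is not set." >&2
        return 1
    fi
    curl -sG https://api.abuseipdb.com/api/v2/check \
        --data-urlencode "ipAddress=$ip" \
        -d maxAgeInDays=90 \
        -H "Key: $ABUSEIPDB_API_KEY" \
        -H "Accept: application/json"
}

# Extract "countryCode score" from a JSON response, using the same
# grep/awk style of filtering as the script.
parse_abuse() {
    local json="$1" country score
    country="$(printf '%s\n' "$json" | grep -Eo '"countryCode":"[A-Z]+"' | awk -F '"' '{print $4}')"
    score="$(printf '%s\n' "$json" | grep -Eo '"abuseConfidenceScore":[0-9]+' | awk -F ':' '{print $2}')"
    # Fall back to placeholders when a field is missing from the response.
    echo "${country:-unknown} ${score:-0}"
}

sample='{"data":{"ipAddress":"203.0.113.9","countryCode":"SG","abuseConfidenceScore":100}}'
parse_abuse "$sample"   # prints: SG 100
```

With the key exported in the shell, the two pieces combine as `parse_abuse "$(abuse_lookup 203.0.113.9)"`.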

Example

[root@cloudvpsserver ~]# nk_top_ips nkern.net 01/Apr/2023
IP               Requests  GET/POST  %-Requests  Bandwidth  %-Bandwidth  Location  Abuse-Score  Top-URI
103.216.223.204  667       (5/662)   3.16%       394KB      0.43%        SG        100          /xmlrpc.php
143.198.204.227  575       (25/550)  2.72%       309KB      0.33%        SG        56           /xmlrpc.php
139.162.31.42    569       (4/565)   2.69%       315KB      0.34%        SG        30           /xmlrpc.php
185.196.220.26   327       (327/0)   1.55%       4.2MB      4.66%        US        100          /wp-content/plugins/wp-upg/readme.txt
43.135.158.169   180       (180/0)   0.85%       987KB      1.08%        US        100          /wp-includes/Text/Diff/
178.159.37.25    179       (10/0)    0.84%       103KB      0.11%        UA        100          aam-media=2
103.130.219.197  156       (0/156)   0.73%       0B         0%           VN        100          /xmlrpc.php
65.108.0.71      110       (110/0)   0.52%       565KB      0.61%        FI        85           utm_source=dlvr.it&utm_medium=twitter
85.215.114.34    85        (85/0)    0.40%       3.1MB      3.46%        US        35           u
51.79.241.226    93        (0/93)    0.44%       0B         0%           SG        100          /xmlrpc.php

Code

nk_top_ips() {
# This is basically a wrapper for gen_body that formats its output.
# nk_top_ips expects both a domain and a timestamp.
# If either one is missing, print usage and return an error.
if [ "$1" = "" ] || [ "$2" = "" ]; then
    echo "You must provide a domain and timestamp."
    echo "Example: nk_top_ips domain.com 01/Apr/2023"
    return 1
fi

domain="$1"
timestamp="$2"
limit="$3"
# Limit is an optional input; if it wasn't provided, just assume 10.
if [ "$limit" = "" ]; then
    limit="10"
fi

# Okay, this is in need of a major overhaul.
# Let's just generate the log lines matching the timestamp once.
tmpfile="/root/.nk_top_ips_tmp"
nk_logs_from_timestamps "$domain" "$timestamp" "$limit" > "$tmpfile"


gen_ip_list () {
# Select the first field in tmpfile, which is the IP address, then keep the most frequent ones.
awk '{print $1}' "$tmpfile" | sort | uniq -c | sort -rn | head -n "$limit" | awk '{print $NF}'
}

get_bandwidth() {
# result is the sum of field 10 (response size) in the domlog for the matching IP.
# The IP is compared exactly against field 1, and non-numeric sizes such as "-" are skipped.
result="$(awk -v ip="$ip" '$1 == ip && $10 ~ /^[0-9]+$/ {sum+=$10} END {print sum}' "$tmpfile")"
# If the result is blank, make it 0 instead.
if [ "$result" = "" ]; then
    result="0"
fi
# Print out the result.
echo "$result"
}

gen_body() {
# Print out the header.
echo "IP Requests GET/POST %-Requests Bandwidth %-Bandwidth Location Abuse-Score Top-URI"

# Calculate total bandwidth and total requests.
bandwidth_total="$(awk '{print $10}' "$tmpfile" | grep -v "-" | awk '{sum+=$1} END {print sum}')"
# Read from stdin so wc prints only the count, without the file name.
total_requests="$(wc -l < "$tmpfile")"

# For each of the top IPs found by gen_ip_list:
for ip in $(gen_ip_list); do
    # Populate Variable.
    # requests is the number of times the ip has hit in the domlog.
    # gets is the number of requests that are GET
    # posts is the number of requests that are POST
    # gets_posts is just gets and posts formatted together.
    # percent_requests is what percent of total requests this IP is responsible for.
    # bandwidth is the amount of bandwidth for the IP. found with get_bandwidth
    # bandwidth_human is a pretty formatted version of bandwidth for putting in the table.
    # percent_bandwidth is the percent of total bandwidth the IP is responsible for.
    # abuseresult is the response from abuseipdb about the IP. I'll need to change the key later.
    # location is the IPs location, found out by filtering the contents of $abuseresult
    # abuse_score is the abuseipdb abuse score. It's found by filtering out the contents of $abuseresult.
    # top_uri is the top uri accessed by the ip.
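    # Match the IP exactly against field 1; grep "$ip" would also match longer IPs and other fields.
    requests="$(awk -v ip="$ip" '$1 == ip {n++} END {print n+0}' "$tmpfile")"
    gets="$(awk -v ip="$ip" '$1 == ip && $6 == "\"GET" {n++} END {print n+0}' "$tmpfile")"
    posts="$(awk -v ip="$ip" '$1 == ip && $6 == "\"POST" {n++} END {print n+0}' "$tmpfile")"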
    gets_posts="($gets/$posts)"
    percent_requests="$(nk_percent "$requests" "$total_requests")"
    bandwidth="$(get_bandwidth)"
    bandwidth_human="$(numfmt --to=iec --suffix=B "$bandwidth")"
    percent_bandwidth="$(nk_percent "$bandwidth" "$bandwidth_total")"
    # The API key is read from the ABUSEIPDB_API_KEY environment variable (an assumed name) instead of being hard-coded.
    abuseresult="$(
curl -sG https://api.abuseipdb.com/api/v2/check \
  --data-urlencode "ipAddress=$ip" \
  -d maxAgeInDays=90 \
  -H "Key: $ABUSEIPDB_API_KEY" \
  -H "Accept: application/json")"
    location="$(echo "$abuseresult" | grep -Eo "countryCode\":\"[A-Z]+\"" | awk -F ":" '{gsub("\"",""); print $NF}')"
    abuse_score="$(echo "$abuseresult" | grep -Eo "abuseConfidenceScore\":[0-9]+" | awk -F ":" '{print $NF}')"
    top_uri="$(awk -v ip="$ip" '$1 == ip {print $7}' "$tmpfile" | awk -F "?" '{print $NF}' | sort | uniq -c | sort -rn | head -n 1 | awk '{print $NF}')"

    # Now that those variables are all set, we can print them as a row of our table.
    echo "$ip $requests $gets_posts $percent_requests  $bandwidth_human $percent_bandwidth $location $abuse_score $top_uri"
done
# Clean up the tempfile.
rm -f "$tmpfile"
}

# Run gen_body and format the output.
gen_body | column -t
}

Author: Nichole Kernreicht

Created: 2023-04-09 Sun 21:41