
Antivirus Part 2

19 Aug 2019 | CPOL | 4 min read
An open-source antivirus engine that uses YARA and Locality Sensitive Hashing to detect malware

Introduction

I previously wrote an open-source antivirus engine.

That antivirus used the Message Digest 5 (MD5) algorithm to detect malware, and it was a failure. In this article, I will explain why it failed and how the new implementation overcomes its pitfalls in detecting malware.

Drawbacks of MD5 as a Malware Signature

The previous antivirus used MD5 as its primary mechanism for detecting malware: it computed a file's MD5 hash and compared it with a database of known malware MD5 hashes, notifying the user when a hash matched. This approach is simple, but it has major drawbacks, which are explained below.
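A minimal sketch of this exact-match lookup, assuming the known hashes are held in an in-memory set rather than the database the old engine used. `known`, `md5_hex`, `md5_of_file` and `is_known_malware` are hypothetical names for illustration only.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of `data` as 32 lowercase hex characters."""
    return hashlib.md5(data).hexdigest()

def md5_of_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def is_known_malware(data: bytes, known_hashes: set) -> bool:
    """Exact-match lookup: flags `data` only if its MD5 is already in the set."""
    return md5_hex(data) in known_hashes

# Hypothetical signature set standing in for the old engine's database.
known = {md5_hex(b"abc")}
print(is_known_malware(b"abc", known))  # True
print(is_known_malware(b"abd", known))  # False
```

The lookup is a single set membership test, which is fast, but it can only ever recognize byte-for-byte identical files.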

1. Replacing MD5 with TLSH

Since MD5 is a cryptographic hash function, even a small change to a file produces a completely different hash. As a result, slightly modified samples of the same malware have different signatures. This floods the database with near-duplicate signatures and can cause detection to fail when the new hash is not yet in the database.
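To illustrate why near-identical samples end up with unrelated signatures, the sketch below flips a single bit in a synthetic byte string (made up for this example, not a real sample) and compares the two MD5 digests. For a well-behaved hash, roughly half of the 128 digest bits change on average.

```python
import hashlib

def md5_bits(data: bytes) -> int:
    """MD5 digest as a 128-bit integer."""
    return int.from_bytes(hashlib.md5(data).digest(), "big")

def differing_bits(a: bytes, b: bytes) -> int:
    """Number of digest bits that differ between the MD5 hashes of a and b."""
    return bin(md5_bits(a) ^ md5_bits(b)).count("1")

original = b"MZ" + b"\x00" * 1000   # hypothetical stand-in for a file body
variant = bytearray(original)
variant[500] ^= 0x01                # flip one bit in the middle of the data
variant = bytes(variant)

print(hashlib.md5(original).hexdigest())
print(hashlib.md5(variant).hexdigest())
print(differing_bits(original, variant), "of 128 digest bits differ")
```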

Locality Sensitive Hashing (LSH) produces similar hashes for similar files. TLSH is a locality-sensitive hash developed by Trend Micro.

TLSH compares two hashes using a distance score: a score of 0 means the files are identical or nearly identical, and the score increases as the files become more different. We use a similarity distance threshold of 20, which keeps false positives low when comparing hashes. The TLSH hashes used here are 70 characters long. The image below shows that the new antivirus can remove duplicates from its signature database.
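The matching rule can be sketched as a threshold test on the distance. This sketch takes the distance function as a parameter `diff(hash_a, hash_b) -> int`; with the py-tlsh package that would be `tlsh.diff`, but passing it in keeps the example runnable without the library. `find_similar` is a hypothetical name, and treating a distance of exactly 20 as a match is an assumption.

```python
from typing import Callable, Iterable, Optional, Tuple

SIMILARITY_THRESHOLD = 20  # distance cut-off described in the article
TLSH_LENGTH = 70           # hash length the article's engine expects

def find_similar(
    query: str,
    signatures: Iterable[Tuple[str, str]],
    diff: Callable[[str, str], int],
    threshold: int = SIMILARITY_THRESHOLD,
) -> Optional[Tuple[str, int]]:
    """Return (name, distance) of the closest signature within `threshold`, else None.

    `signatures` yields (tlsh_hash, name) pairs; `diff` computes a TLSH
    distance (for example tlsh.diff from py-tlsh, assumed here).
    """
    if len(query) != TLSH_LENGTH:
        return None  # not a usable TLSH hash
    best = None
    for tlsh_hash, name in signatures:
        d = diff(query, tlsh_hash)
        if d <= threshold and (best is None or d < best[1]):
            best = (name, d)
    return best

def toy_diff(a: str, b: str) -> int:
    # NOT the TLSH distance: counts positions where the strings differ.
    return sum(x != y for x, y in zip(a, b))

db = [("A" * 70, "sample-1"), ("B" * 70, "sample-2")]
query = "A" * 65 + "C" * 5
print(find_similar(query, db, toy_diff))  # ('sample-1', 5)
```

Keeping the closest match, not just the first one under the threshold, makes the reported signature stable regardless of database order.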

Image 1

This solves the data-duplication problem faced by the previous version of the antivirus.

2. Using YARA Signatures

YARA is an open-source pattern-matching tool developed by VirusTotal. YARA rules can detect a wide variety of malicious software, and rules for the most common malware are readily available on platforms such as GitHub. Importantly, YARA is cross-platform: it works on Linux, Windows, and Mac OS X.

We built the YARA library from its GitHub repository and used a wrapper made by Avast, so our antivirus supports YARA rules.

3. Antivirus Server

Our antivirus follows a client-server model: the antivirus engine runs as an HTTP server on 127.0.0.1, port 5660, which means you can customize the frontend however you like.

Here are the most commonly used API calls for the server:

1. Requesting to Scan a File with YARA Signatures

You can ask the server to scan a specific file for matching YARA signatures. Below is an example request made with Python:

Python
import requests

# Ask the engine to scan one file against its compiled YARA rules.
r = requests.post("http://127.0.0.1:5660/scan_file_for_yara",
                  data={"file": "D:/test.eicar", "target": "windows"})
print(r.json())  # the server replies with JSON

The request above prints the following JSON response:

JSON
{
  "detections": [
    {
      "author": "Visweswaran",
      "description": "EICAR",
      "name": "bot"
    },
    {
      "author": "UNKNOWN AUTHOR",
      "description": "AV-TEST-FILE",
      "name": "example"
    }
  ],
  "message": true
}
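A client will usually want to turn this response into readable text. A minimal sketch that relies only on the field names shown above; `summarize_yara_response` is a hypothetical helper, and because the meaning of `message` is not documented here, the sketch ignores it. In a real client you would pass `r.json()` instead of the parsed literal.

```python
import json

def summarize_yara_response(payload: dict) -> list:
    """Turn a /scan_file_for_yara response into 'name (author): description' lines."""
    lines = []
    for hit in payload.get("detections", []):
        lines.append("{} ({}): {}".format(
            hit.get("name", "?"),
            hit.get("author", "UNKNOWN AUTHOR"),
            hit.get("description", ""),
        ))
    return lines

response = json.loads("""{
  "detections": [
    {"author": "Visweswaran", "description": "EICAR", "name": "bot"},
    {"author": "UNKNOWN AUTHOR", "description": "AV-TEST-FILE", "name": "example"}
  ],
  "message": true
}""")
for line in summarize_yara_response(response):
    print(line)
# bot (Visweswaran): EICAR
# example (UNKNOWN AUTHOR): AV-TEST-FILE
```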

2. Scanning for TLSH in Our Database

Python
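import os

import filetype  # third-party: guesses the MIME type from file headers
import requests

SERVER = "http://127.0.0.1:5660"

# Method of the GUI scanner class; self.detection_signal is the Qt signal
# the user interface listens on for detections.
def scan_for_tlsh(self, path):
    # Ask the engine to compute the TLSH hash of the file.
    r = requests.post(SERVER + "/get_tlsh", data={"file": path})
    tlsh_hash = r.json()["message"]
    # A valid TLSH hash is 70 characters long; anything else means hashing failed.
    if len(tlsh_hash) != 70:
        return None
    file_size = os.path.getsize(path)
    file_type = filetype.guess(path)
    if file_type is None:  # unknown file type: nothing to compare against
        return None
    # Compare only against samples of the same MIME type whose size is
    # within 10,000 bytes of this file (lower bound clamped at zero).
    r = requests.get(SERVER + "/check_threat_db", params={
        "tlsh": tlsh_hash,
        "min_size": max(0, file_size - 10000),
        "max_size": file_size + 10000,
        "type": file_type.mime,
    })
    # Any reply other than -1 means a similar known sample was found.
    if r.json()["message"] != -1:
        result = {
            "name": "Identified Threat",
            "author": "Undefined",
            "description": "This application matches a known sample "
                           "collected from VirusSign.",
            "path": path,
        }
        self.detection_signal.emit(result)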
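The server-side handler for `check_threat_db` is not shown in this article. Judging only from its query parameters, it narrows the search to signatures of the same MIME type within ±10,000 bytes of the scanned file, presumably so the TLSH distance is computed against fewer candidates. The sketch below models just that pre-filter on in-memory records using the `size` and `type` fields of the signature format described in the Updating section; `size_window` and `candidates` are hypothetical names, not the engine's code.

```python
from typing import Iterable, List, Tuple

SIZE_MARGIN = 10000  # bytes, the same window scan_for_tlsh sends

def size_window(file_size: int, margin: int = SIZE_MARGIN) -> Tuple[int, int]:
    """(min_size, max_size) for /check_threat_db, clamped at zero."""
    return max(0, file_size - margin), file_size + margin

def candidates(signatures: Iterable[dict], file_size: int, mime: str) -> List[dict]:
    """Keep only signatures with the same MIME type and a size inside the window."""
    lo, hi = size_window(file_size)
    return [s for s in signatures
            if s["type"] == mime and lo <= s["size"] <= hi]

db = [
    {"name": "VirusSign Sample", "size": 72704, "type": "application/x-msdownload"},
    {"name": "far too large", "size": 500000, "type": "application/x-msdownload"},
    {"name": "wrong type", "size": 72000, "type": "application/pdf"},
]
print([s["name"] for s in candidates(db, 70000, "application/x-msdownload")])
# ['VirusSign Sample']
```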

4. Updating

An antivirus must be updated continuously to detect new malware. Ours downloads its signatures from my open-source threat database.

The repository has two folders, ruleset and threat_db. The ruleset folder contains rules describing which signatures to download, and threat_db holds the signature files themselves.

Here is an example ruleset entry. Our program downloads every signature listed under the files key from the location given by the url key.

JSON
{
    "files": [
        "00262c8d0aadc3cb4be65879adc90ae0.json",
        "011bf863d4e663aa0c926b22e10de0a0.json",
        "01c330ac6e38efc825392935f5335970.json",
        "02b527d0d835551173d60165295e5760.json",
        "02e47b0abda68b4ec9ba744949b6db50.json",
        "049a75642e9f3e5de016cebb17684060.json",
        "04d26df3f7ab73593e7032e53c47e610.json",
        "052c5d51c394c0b23b705b9dba7d8610.json",
        "05a999c3d1c289dde03b646f4ecf6970.json",
        "066d8931cc808b434e70035252b38400.json",
        "069beb9929025aa2a06ccc92f92ec210.json",
        "07fa346c6e1b527c05956c380e7845f0.json",
        "08dab9ded3f84c38c5bd767ca3b59d70.json",
        "091cf848a11f19d538ae5523863ed610.json",
        "09d69bb9310fad02e61ce5846974ff40.json",
        "0a3955c6421d72c60032d988eb8355c0.json",
        "0a8af0c42c6d2599ce50a3025eaa1ba0.json",
        "0adb76cb764e40cd2a8f3439fa6e85a0.json",
        "0b51f2ee8fc6fb878fedb5965e18ad80.json",
        "0b54a0b4ef49bc43d1cc7e5fd79f9a00.json",
        "0b58dc29370adc52339b4caf8562ad70.json",
        "0ca48e853292a9b619cdb6b0ab1bd440.json",
        "0cdf02304ebfa49679f7db62037872e0.json",
        "0e4ab6462cce4ef7fa31fb4ac57c83c0.json",
        "0ec2c15fdeeb330cafc4a5fe2ffc3e70.json",
        "0eda094271709f4d5a9f19b57c186160.json",
        "0f22fa598186b77206257e48df59c1b0.json",
        "0f59b56edf6748b9f9bc1aa81e325d40.json",
        "0fe64eadbb6ca69ee787111be6438fb0.json",
        "101534723ab369c5ee0f73bedb2f3ae0.json",
        "10cb1a9ee2a44e1942b2b5c63f1d3810.json",
        "11bed948eb05efd18c08bf2eca2bbe30.json",
        "1222b9de991a3b8c400e0254be8b4320.json",
        "133bc9d70764d9aa2d4caa1be5298a70.json",
        "13aaeb6d9731d14b40ddf0dd82097e60.json",
        "13ff5b66db6708a1b401796be9b051c0.json",
        "15822673e074769eb74ae6af83f97bb0.json",
        "163017016b7510494b290f98e61139b0.json",
        "16a476f8b3042e0a3e860484c09747c0.json",
        "17d3069abe6c8851a6265c671e420eb0.json",
        "188b0472f0d29d870c1f7584d02f5140.json",
        "1973cbe3fa6b2dc027b1f8bc397a8410.json",
        "1a58fdc5111b02f7365d9d6fdfec60d0.json",
        "1b45ebd8e310870edd4f75b338856a80.json",
        "1d2e18ebeb166ea4cb46889813275c90.json",
        "1d884a7e4f6b46359d50dac2c5bc7bc0.json",
        "1e76e1805f5538b4eb9621a1447428b0.json",
        "1ea41d3da07ca12b2621669b53c3e0b0.json",
        "1eadef36e46316c7b121cb67b7c5a990.json",
        "1f75807fe3e230383e62bd1361879a40.json",
        "1fa7852cb5597765439cd0f6f4f4ed10.json",
        "1fb699cdf5491940cb18adf69324ce90.json",
        "203b252dad1c4e55737b8d2c6adca720.json",
        "20b44714f3d1bfabbbd8e919aa68f520.json",
        "ce5e29524b19188331c08291334d45c0.json",
        "cec5be08b011d1a64e4592867bd07b30.json",
        "f5f609657dd815f6403b7db5e5870f80.json",
        "f7aaa86433a314d021a88253855d3b80.json",
        "ff898fa4e7710feaf462cf6d3cfa19b0.json",
        "ffe2a48d87eb2611d278a0041d2da9c0.json"
    ],
    "url": "https://raw.githubusercontent.com/VISWESWARAN1998/open-threat-database/master/threat_db/"
}
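The update step amounts to joining the url value with each entry in files and fetching the result. A minimal sketch using only the Python standard library; `signature_urls` and `download_signatures` are hypothetical names, and whether the engine joins URLs in exactly this way is an assumption.

```python
import json
import urllib.request
from typing import List

def signature_urls(ruleset: dict) -> List[str]:
    """Join the ruleset's base `url` with each file name in `files`."""
    base = ruleset["url"]
    if not base.endswith("/"):
        base += "/"
    return [base + name for name in ruleset["files"]]

def download_signatures(ruleset: dict, timeout: float = 30.0) -> List[dict]:
    """Fetch and parse every signature file listed in the ruleset."""
    signatures = []
    for url in signature_urls(ruleset):
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            signatures.append(json.loads(resp.read().decode("utf-8")))
    return signatures

ruleset = {
    "files": ["00262c8d0aadc3cb4be65879adc90ae0.json"],
    "url": "https://raw.githubusercontent.com/VISWESWARAN1998/open-threat-database/master/threat_db/",
}
print(signature_urls(ruleset)[0])
# download_signatures(ruleset) would perform the actual network requests.
```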

Each signature file looks like this:

JSON
{
    "tlsh": "8D63E1CA9195EDD4FC5BB839000275FAFB66044C7AEB7201F854ABDDE0D4780D2EC68A",
    "name": "VirusSign Sample",
    "size": 72704,
    "type": "application/x-msdownload"
}
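Because every update pulls many of these small files, it helps to validate each record before adding it to the local database. A sketch assuming only the four fields shown above; the `Signature` class and its checks (a 70-character hexadecimal hash, a positive size) are illustrative assumptions, not the engine's own validation.

```python
import json
from dataclasses import dataclass
from typing import Optional

TLSH_LENGTH = 70
HEX_DIGITS = set("0123456789abcdefABCDEF")

@dataclass(frozen=True)
class Signature:
    tlsh: str
    name: str
    size: int
    type: str  # MIME type, e.g. "application/x-msdownload"

def parse_signature(text: str) -> Signature:
    """Parse one signature file; raise KeyError/TypeError/ValueError if malformed."""
    raw = json.loads(text)
    sig = Signature(tlsh=raw["tlsh"], name=raw["name"],
                    size=int(raw["size"]), type=raw["type"])
    if len(sig.tlsh) != TLSH_LENGTH:
        raise ValueError("expected a %d-character TLSH hash" % TLSH_LENGTH)
    if not set(sig.tlsh) <= HEX_DIGITS:
        raise ValueError("TLSH hash must be hexadecimal")
    if sig.size <= 0:
        raise ValueError("size must be positive")
    return sig

def try_parse_signature(text: str) -> Optional[Signature]:
    """Return None instead of raising, so one bad file does not abort an update."""
    try:
        return parse_signature(text)
    except (KeyError, TypeError, ValueError):
        return None

sample = """{
    "tlsh": "8D63E1CA9195EDD4FC5BB839000275FAFB66044C7AEB7201F854ABDDE0D4780D2EC68A",
    "name": "VirusSign Sample",
    "size": 72704,
    "type": "application/x-msdownload"
}"""
print(try_parse_signature(sample))
print(try_parse_signature('{"tlsh": "ABC", "name": "x", "size": 1, "type": "t"}'))  # None
```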

5. Adding YARA Rules

You can add your own YARA rule files to the yara folder; the antivirus engine includes a built-in error checker for YARA rules.

To check for compilation errors, visit http://127.0.0.1:5660/check_yara; the server removes any rule files that fail to compile. Visiting http://127.0.0.1:5660/refactor_threat_db removes duplicate TLSH hashes from the threat database.
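The article does not describe how refactor_threat_db decides what counts as a duplicate. One plausible approach, sketched here purely as an assumption, is greedy de-duplication with the same distance threshold of 20: keep a hash only if it is farther than the threshold from every hash already kept. `diff` stands in for a TLSH distance function such as py-tlsh's `tlsh.diff`; the toy function below only counts differing characters so the example runs without the library.

```python
from typing import Callable, List

def deduplicate(hashes: List[str], diff: Callable[[str, str], int],
                threshold: int = 20) -> List[str]:
    """Greedy de-duplication: keep a hash only if it is more than `threshold`
    away from every hash already kept. Needs O(n^2) distance computations."""
    kept: List[str] = []
    for h in hashes:
        if all(diff(h, k) > threshold for k in kept):
            kept.append(h)
    return kept

def toy_diff(a: str, b: str) -> int:
    # NOT the TLSH distance: counts positions where the strings differ.
    return sum(x != y for x, y in zip(a, b))

hashes = ["A" * 70, "A" * 69 + "B", "B" * 70]
print(len(deduplicate(hashes, toy_diff)))  # 2: the near-copy of "A" * 70 is dropped
```

The result depends on input order, since the first hash of each cluster is the one kept; that is an accepted trade-off of the greedy approach.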


Using the Antivirus

The antivirus ships with the server executable mrida.exe; run it first to start the server. Then run mrida-gui.exe, which launches a basic user interface that demonstrates how the antivirus works. Here is how it looks:
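Because the frontend talks to the engine over HTTP, it only works once mrida.exe is listening on port 5660. A small launcher sketch, assuming both executables are in the working directory or on the PATH; `wait_for_server` and `launch` are hypothetical helpers, not part of the shipped package. Note that it only checks that the port accepts TCP connections, not that the HTTP API is fully ready.

```python
import socket
import subprocess
import time

def wait_for_server(host: str = "127.0.0.1", port: int = 5660,
                    timeout: float = 10.0) -> bool:
    """Poll until something accepts TCP connections on host:port, or give up."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.2)  # not listening yet; retry shortly
    return False

def launch(server_exe: str = "mrida.exe", gui_exe: str = "mrida-gui.exe"):
    """Start the engine, wait for its port, then start the GUI."""
    server = subprocess.Popen([server_exe])
    if not wait_for_server():
        server.terminate()
        raise RuntimeError("engine did not start listening on port 5660")
    return server, subprocess.Popen([gui_exe])
```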

[Screenshot: mrida-gui.exe user interface]

Points of Interest

I have to admit that this is not a perfect antivirus implementation. It still has many drawbacks, and any free commercial engine would outperform its detection. Still, it can be used for educational purposes and will serve the community. Windows had only one open-source antivirus, ClamWin, but now we can add another one to the list, and this one works on other operating systems too.

History

  • 19th August, 2019: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)