Malware Analysis Pipeline in AWS (part 0)

2019-07-14

Lately I've been teaching myself AWS, as going “Cloud Native” has become a very popular strategy. In a few presentations, I've shared the benefits of maintaining a “Malware Zoo” for my team; one good example is my Malware Analysis on a Budget talk. The premise is relatively simple: any security team needs a repository for long-term, organized archiving of malware. Over time, teams adopt or develop many analysis tools that take malware samples as input and produce analysis results as output. I'd like to share some of my experiments in adapting this approach to an AWS “serverless” architecture.

Introduction

Some popular examples of tools helping the malware analysis pipeline are:

Yara
ExifTool
Strings
…and more…

It can quickly become unwieldy to marshal malware samples out of long-term storage and onto your local environment just to run these analyses. Additionally, keeping the malware storage separate from the derived analysis outputs often leads to a situation where the results for various samples are scattered across the local disks and home directories of individual analysts on the team. An elegant solution to this problem is to collect all of your offline analysis tools (those which can run to completion without user interaction) in a central analysis environment that also serves as the landing zone for your long-term malware archive. Users then only need a mechanism for the simplest of operations: uploading a malware sample to the store, along with some user-supplied metadata that gives useful context for the sample.
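To make the upload step a little more concrete, here is a minimal Python sketch of how a submitter-side helper might package a sample together with its metadata. The function name build_upload_request and the metadata fields are hypothetical. The returned dictionary mirrors the keyword arguments of boto3's put_object call (Bucket, Key, Body, Metadata), which is my assumption about tooling and not something configured in this post.

```python
import hashlib


def build_upload_request(bucket, filename, data, metadata):
    """Build keyword arguments suitable for an S3 put_object-style call.

    Hypothetical helper: nothing here talks to AWS, it only assembles
    the request so that the sample and its context travel together.
    """
    # S3 stores user-defined metadata keys in lowercase, and values are strings.
    clean = {str(key).lower(): str(value) for key, value in metadata.items()}
    # Record the sample's SHA-256 so later stages can identify it by content.
    clean["sha256"] = hashlib.sha256(data).hexdigest()
    return {"Bucket": bucket, "Key": filename, "Body": data, "Metadata": clean}


request = build_upload_request(
    "ckane-blog-new-malware",          # upload bucket name used in this post
    "sample.bin",                      # hypothetical file name
    b"not really malware",             # stand-in for the sample bytes
    {"Source": "phishing-email", "Analyst": "ckane"},  # hypothetical context
)
# With boto3 installed and credentials configured, this could be sent with:
#   boto3.client("s3").put_object(**request)
```

Carrying the hash along with the user-supplied fields is a design choice on my part: it lets any later stage confirm that it is looking at the same bytes the submitter uploaded.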

Architecture Overview

In this set of examples, I'll discuss how I set up a system using S3, Node.js on Lambda, and SNS, to provide the following:

A place for others to upload malware samples
A mechanism to tag malware samples with tags supplied by the submitter
Lambda code to collect the malware sample(s) and organize them into a long-term store
An SNS notification announcing each newly uploaded malware sample to any future listeners
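As a rough illustration of the last two items, the sketch below shows one way the organizing step could derive a storage key from a sample's SHA-256 hash and build the notification body. It is written in Python for brevity, even though the Lambda code in this series uses Node.js. The key layout (samples/<first two hex characters>/<hash>) and the message fields are assumptions made for the example, not the layout the series settles on.

```python
import hashlib
import json


def zoo_key(data):
    """Derive a content-addressed key for a sample (hypothetical layout)."""
    digest = hashlib.sha256(data).hexdigest()
    # Prefix with the first two hex characters to spread samples across prefixes.
    return f"samples/{digest[:2]}/{digest}"


def new_sample_message(zoo_bucket, key, tags):
    """Build a JSON notification body telling listeners where the sample lives."""
    body = {"bucket": zoo_bucket, "key": key, "tags": sorted(tags)}
    return json.dumps(body, sort_keys=True)


sample = b"not really malware"  # stand-in for the uploaded bytes
key = zoo_key(sample)
message = new_sample_message("ckane-blog-mwzoo", key, ["phishing", "doc"])
print(message)
```

Keying by hash means that uploading the same sample twice lands on the same object, so the zoo deduplicates itself without extra bookkeeping.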

A rough diagram depicting how these will work together:

![Architecture Diagram (pt0)](index/mwzoo-part0.dot.png "Arch. Diagram part0")

This will set up an ingestion architecture from which I'll build analyzers using open source tools and the AWS services.

We will begin by assuming that you're already familiar with AWS services, have an AWS console account, as well as AWS command-line credentials for interacting with the API. Some of my examples will use the Python API, so I recommend the following documentation on getting it set up and configured with your credentials:

Installing the AWS CLI (aws.amazon.com Documentation)

I'll begin by building the infrastructure: the two buckets, and the SNS topic.

After that, I will define an execution role for the Lambda code that will give it the following permissions:

Read access to the upload bucket
Write access to the mwzoo bucket
Permission to publish to the new SNS topic
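To preview what those three permissions might look like, here is a hedged Python sketch that assembles an IAM policy document as a plain dictionary. The function name lambda_policy is hypothetical, and the exact actions (s3:GetObject, s3:PutObject, sns:Publish) are my guess at the minimum needed; the policy actually used in this series is covered in a later post. The topic ARN uses the masked account ID 999999999999.

```python
import json


def lambda_policy(upload_bucket, zoo_bucket, topic_arn):
    """Return a least-privilege policy sketch for the ingestion Lambda."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Read samples from the upload bucket
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{upload_bucket}/*",
            },
            {   # Write organized samples into the zoo bucket
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{zoo_bucket}/*",
            },
            {   # Publish new-sample notifications to the one topic
                "Effect": "Allow",
                "Action": ["sns:Publish"],
                "Resource": topic_arn,
            },
        ],
    }


policy = lambda_policy(
    "ckane-blog-new-malware",
    "ckane-blog-mwzoo",
    "arn:aws:sns:us-east-1:999999999999:ckane-blog-mwzoo-new-sample",
)
print(json.dumps(policy, indent=2))
```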

Finally, I will create the new Lambda function, upload my code to S3, and configure the function to pull its code from the zipped implementation hosted in S3.

Create AWS Buckets

As the first step, you should create two new buckets. These will be used for (1) receiving new samples from external sources, and (2) organizing & storing malware samples as well as their respective analysis artifacts. To begin with, we aren't going to expose either of these publicly, so we can simply rely upon the default, limited access controls for these buckets.

Create the two buckets using the AWS CLI as follows (you will need to pick your own bucket names in place of ckane-blog-new-malware and ckane-blog-mwzoo):

aws s3api create-bucket --acl private --bucket "ckane-blog-new-malware"
aws s3api create-bucket --acl private --bucket "ckane-blog-mwzoo"

If everything worked, each of the commands above should print the corresponding result below:

{
    "Location": "/ckane-blog-new-malware"
}

{
    "Location": "/ckane-blog-mwzoo"
}

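You may opt to add the --region option if you wish to create the buckets in a region that isn't your default. For any region other than us-east-1, create-bucket also needs a matching --create-bucket-configuration LocationConstraint=<region> argument.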
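If you would rather script bucket creation from Python, the region handling can be sketched as below. The helper name create_bucket_kwargs is hypothetical; it only builds the arguments for boto3's create_bucket call, which I'm assuming as the client library here. The one quirk it encodes is that us-east-1 is the default location and is not passed as a LocationConstraint.

```python
def create_bucket_kwargs(name, region=None):
    """Build arguments for an S3 create_bucket call (hypothetical helper)."""
    kwargs = {"Bucket": name, "ACL": "private"}
    # us-east-1 is the default location; other regions need an explicit
    # LocationConstraint in the bucket configuration.
    if region and region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


for bucket in ("ckane-blog-new-malware", "ckane-blog-mwzoo"):
    print(create_bucket_kwargs(bucket, region="eu-west-1"))
    # With boto3 installed and credentials configured, this could be sent with:
    #   boto3.client("s3", region_name="eu-west-1").create_bucket(
    #       **create_bucket_kwargs(bucket, region="eu-west-1"))
```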

Create AWS SNS Topic for New Malware

Throughout this series, we will be creating a library of SNS topics that announce the completion of analysis phases. The first of these will be a topic that tells any listeners that a new malware sample has been stored in our malware zoo, and where to find it.

The following command will create our new SNS topic, so that Malware Zoo analyzers can subscribe to it and get notifications when new malware samples are catalogued into the Zoo:

aws sns create-topic --name "ckane-blog-mwzoo-new-sample"

If successful, the above command should output another JSON result, similar to the following, containing the ARN you'll use to reference this topic elsewhere. Mine looks like the below (though I have masked the account ID with 999999999999):

{
    "TopicArn": "arn:aws:sns:us-east-1:999999999999:ckane-blog-mwzoo-new-sample"
}

You will want to save the ARN you've just been given for ease of access later. If at any time you need to look up your topic ARN again, simply run aws sns list-topics.
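The same lookup can be scripted. The sketch below is one way to find a topic's ARN by name from Python, assuming a boto3-style SNS client whose list_topics call returns a "Topics" list and an optional "NextToken" for paging. The function name find_topic_arn is hypothetical, and the client is passed in as a parameter so the logic can be exercised without AWS access.

```python
def find_topic_arn(sns_client, topic_name):
    """Return the ARN of the topic named topic_name, or None if not found."""
    token = None
    while True:
        # Only pass NextToken once the service has handed one back.
        kwargs = {"NextToken": token} if token else {}
        page = sns_client.list_topics(**kwargs)
        for topic in page.get("Topics", []):
            arn = topic["TopicArn"]
            # The topic name is the final colon-separated field of the ARN.
            if arn.rsplit(":", 1)[-1] == topic_name:
                return arn
        token = page.get("NextToken")
        if not token:
            return None


# With boto3 installed and credentials configured, usage might look like:
#   import boto3
#   arn = find_topic_arn(boto3.client("sns"), "ckane-blog-mwzoo-new-sample")
```

Matching on the last ARN field rather than a substring avoids accidentally picking up a different topic whose name merely contains the one you want.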

End Part 0

With this in place, we have created some basic infrastructure for our AWS MalwareZoo, namely:

Long-term, organized storage
A notification bus which can be replicated and subscribed to
A rough architectural diagram

Initially, I had planned to build this in a single post. However, the documentation has become quite long, and I've decided to split the content across multiple blog posts. In the next blog post, I'll discuss how to think about & set up policies and roles for the infrastructure, so that minimum access is granted to the Lambda code execution.

Author: Coleman Kane
Permanent Link: https://blog.malware.re/2019/07/14/Malware-Analysis-Pipeline-in-AWS-part0/index.html
