Installation

Installation information for the different components of MultiScanner is provided below. To get an idea of how the system works without going through the full process of setting up the distributed architecture, refer to the section on Standalone Docker Installation.

The Docker standalone system is less scalable, robust, and feature-rich, but it makes it easy to stand up the web UI, the REST API, and an Elasticsearch node, allowing users to quickly see how the system works. The standalone container is intended as an introduction to the system and its capabilities, but is not designed for operational use.

System Requirements

Python 3.6 is recommended. Python 2.7+ and 3.4+ are supported but are not thoroughly maintained or tested. Please submit an issue or a pull request fixing any issues found with other versions of Python.

An installer script is included in the project (install.sh), which installs the prerequisites on most systems.

Currently, MultiScanner is deployed with Ansible, and we’re working to support distributed architecture deployment via Docker.

Installing Ansible

The installer script should install the required Python packages for users of RedHat- or Debian-based Linux distributions. Users of other distributions should refer to requirements.txt.

MultiScanner requires a configuration file to run. After cloning the repository, generate the MultiScanner default configuration by running python multiscanner.py init. The command can be used to rewrite the configuration file to its default state or, if new modules have been written, to add their configuration details to the configuration file.
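
For example, a typical first run looks something like the following (the repository URL is shown for illustration; use the location you cloned from):

$ git clone https://github.com/mitre/multiscanner.git
$ cd multiscanner
$ python multiscanner.py init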

Installing Analytic Machines

Default modules have the option of being run locally or via SSH. The development team runs MultiScanner on a Linux host and hosts the majority of analytical tools on a separate Windows machine. The SSH server used in this environment is freeSSHd.

A network share accessible to both the MultiScanner host and the analytic machines is required for the multi-machine setup. Once configured, the network share path must be identified in the configuration file, config.ini. To do this, set the copyfilesto option under [main] to the mount point on the system running MultiScanner. Modules can have a replacement path option, which is the network share mount point on the analytic machine.
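
As an illustrative sketch (all paths are hypothetical, and [SomeModule] stands in for any module that runs on the analytic machine), the relevant entries in config.ini might look like this, assuming the share is mounted at /mnt/malware_share on the MultiScanner host and at Z:\ on a Windows analytic machine:

[main]
copyfilesto = /mnt/malware_share

[SomeModule]
replacement path = Z:\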

Installing Elasticsearch

Starting with Elasticsearch 2.x, field names can no longer contain ‘.’ (dot) characters. Thus, the MultiScanner elasticsearch_storage module adds a pipeline called “dedot” with a processor to replace dots in field names with underscores.

Add the following to the elasticsearch.yml configuration file for the dedot processor to work:

script.painless.regex.enabled: true

To use the Multiscanner web UI, also add the following:

http.cors.enabled: true
http.cors.allow-origin: "<yourOrigin>"

Module Configuration

Modules are intended to be quickly written and incorporated into the framework. Note that:

  • A finished module must be placed in the modules folder before it can be used.
  • The configuration file does not need to be updated manually; running python multiscanner.py init adds the default configuration for any new modules.
  • Modules are configured within the configuration file, config.ini.

Parameters common to all modules are listed in the next section, and module-specific parameters (for core and analysis modules that have parameters) are listed in the subsequent sections. See Analysis Modules for information about all current modules.

Common Parameters

The parameters below may be used by all modules.

  • path: Location of the executable.
  • cmdline: An array of command line options to be passed to the executable.
  • host: The hostname, port, and username of the machine that will be SSH'd into to run the analytic if the executable is not present on the local machine.
  • key: The SSH key to be used to SSH into the host.
  • replacement path: If the main config is set to copy the scanned files, the file path passed to the module is replaced with this value. It should be the path at which the network share is mounted on the analytic machine.
  • ENABLED: When set to False, the module will not run.
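
For illustration only, a module section using several of these parameters might look like the following in config.ini (the module name, paths, and values are hypothetical; the host and key options follow the same key = value pattern, and the generated default configuration shows their exact format):

[ExampleModule]
ENABLED = True
path = C:\Program Files\ExampleTool\example.exe
cmdline = ['-a', '-v']
replacement path = X:\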

Parameters of Core Modules

[main] - the main configuration section for the MultiScanner framework itself.

  • copyfilesto: This is where the script will copy each file that is to be scanned. This can be removed or set to False to disable this feature.
  • group-types: This is the type of analytics to group into sections for the report. This can be removed or set to False to disable this feature.
  • storage-config: Path to the storage config file.
  • api-config: Path to the API config file.
  • web-config: Path to the Web UI config file.
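
A sketch of a [main] section (all paths and values are illustrative):

[main]
copyfilesto = /mnt/malware_share
group-types = ["Antivirus"]
storage-config = storage.ini
api-config = api_config.ini
web-config = web_config.ini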

Parameters of Analysis Modules

Analysis modules with additional parameters (or notes for installation) are given below in alphabetical order. See Analysis Modules for a list of all current analysis modules.

[Cuckoo] - submits a file to a Cuckoo Sandbox cluster for analysis.

  • API URL: The URL to the API server.
  • WEB URL: The URL to the Web server.
  • timeout: The maximum time a sample will run.
  • running timeout: An additional timeout; if a task is in the running state this many seconds past timeout, the task is considered failed.
  • delete tasks: When set to True, tasks will be deleted from Cuckoo after detonation. This is to prevent filling up the Cuckoo machine's disk with reports.
  • maec: When set to True, a MAEC JSON-based report is added to the Cuckoo JSON report. NOTE: Cuckoo needs MAEC reporting enabled to produce results.
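
A hedged example of a [Cuckoo] section (the URLs and timeout values are illustrative):

[Cuckoo]
ENABLED = True
API URL = http://cuckoo.example.local:8090/
WEB URL = http://cuckoo.example.local:8080/
timeout = 360
running timeout = 120
delete tasks = False
maec = False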

[ExifToolsScan] - scans the file with ExifTool and returns the results.

  • remove-entry: A Python list of ExifTool results that should not be included in the report. File system level attributes are not useful and are stripped out.
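
For example (the field names listed are only illustrative; SourceFile and Directory are file system level attributes emitted by ExifTool):

[ExifToolsScan]
ENABLED = True
remove-entry = ['SourceFile', 'Directory']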

[FireeyeAPI] - detonates the sample in FireEye AX via FireEye’s API. This “API” version replaces the “FireEye Scan” module.

  • API URL: The URL to the API server.
  • fireeye images: A Python list of the VM images on the FireEye appliance. These are used to determine where to copy the files.
  • username: Username on the FireEye AX.
  • password: Password for the FireEye AX.
  • info level: Options are concise, normal, and extended.
  • timeout: The maximum time a sample will run.
  • force: If set to True, the sample will be rescanned even if it matches a previous scan.
  • analysis type: 0 = sandbox, 1 = live.
  • application id: For AX Series appliances (7.7 and higher) and CM Series appliances that manage AX Series appliances (7.7 and higher), setting the application value to -1 allows the AX Series appliance to choose the application. For other appliances, setting the application value to 0 allows the AX Series appliance to choose the application.
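
For illustration only (every value below is hypothetical; consult your FireEye AX documentation for the correct URL and image names):

[FireeyeAPI]
ENABLED = True
API URL = https://fireeye-ax.example.local
fireeye images = ['winxp-sp3', 'win7-sp1']
username = api_analyst
password = changeme
info level = normal
timeout = 500
force = False
analysis type = 0
application id = 0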

[libmagic] - runs libmagic against the files.

  • magicfile: The path to the compiled magic file you wish to use. If None, it will use the default one.

[Metadefender] - runs Metadefender against the files.

  • timeout: The maximum time a sample will run.
  • running timeout: An additional timeout; if a task is in the running state this many seconds past timeout, the task is considered failed.
  • fetch delay seconds: The number of seconds for the module to wait between submitting all samples and polling for scan results. Increase this value if Metadefender is taking a long time to store the samples.
  • poll interval: The number of seconds between successive queries to Metadefender for scan results. Default is 5 seconds.
  • user agent: Metadefender user agent string; refer to your Metadefender server configuration for this value. Default is "user agent".

[NSRL] - looks up hashes in the NSRL database. These two parameters are automatically generated. Users must run the nsrl_parse.py tool in the utils/ directory before using this module.

  • hash_list: The path to the NSRL database on the local filesystem, containing the MD5 hash, SHA1 hash, and original file name.
  • offsets: A file that contains pointers into the hash_list file. This is necessary to speed up searching of the NSRL database file.

[PEFile] - extracts feature information from EXE files.

  • The module uses pefile, which is currently not available for Python 3.

[Tika] - extracts metadata from the file using Tika. For configuration of the module, see the tika-python documentation.

  • remove-entry: A Python list of Tika results that should not be included in the report.

[TrID] - runs TrID against a file.

  • The TrID definitions file must be in the same folder as the TrID executable.

[vtsearch] - searches VirusTotal for the file's hash and downloads the report, if available.

  • apikey: Public/private API key. This can optionally be a list of keys, in which case requests will be distributed across them. This is useful when two groups with private API keys want to share the load and reports.
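
For instance, a single key or a list of keys can be supplied (the keys below are placeholders):

[vtsearch]
ENABLED = True
apikey = ['YOUR-VT-API-KEY-1', 'YOUR-VT-API-KEY-2']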

[VxStream] - submits a file to a VxStream Sandbox cluster for analysis.

  • BASE URL: The base URL of the VxStream server.
  • API URL: The URL to the API server (include the /api/ in this URL).
  • API Key: The user's API key to the API server.
  • API Secret: The user's secret to the API server.
  • Environment ID: The environment in which to execute the sample (if you have different sandboxes configured).
  • Verify: Set to False to ignore TLS certificate errors when querying the VxStream server.
  • timeout: The maximum time a sample will run.
  • running timeout: An additional timeout; if a task is in the running state this many seconds past timeout, the task is considered failed.

[YaraScan] - scans the files with yara and returns the results; yara-python must be installed.

  • ruledir: The directory to look for rule files in.
  • fileextensions: A Python array of all valid rule file extensions. Files not ending in one of these will be ignored.
  • ignore-tags: A Python array of yara rule tags that will not be included in the report.
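
A minimal sketch (the rule directory, extensions, and tags are illustrative):

[YaraScan]
ENABLED = True
ruledir = /opt/multiscanner/etc/yarasigs
fileextensions = ['.yar', '.yara', '.sig']
ignore-tags = ['deprecated', 'private']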

Standalone Docker Installation

To introduce new users to the power of the MultiScanner framework, web UI, and REST API, we have built a standalone Docker application that is simple to run in new environments. Simply clone the repository and, from the top-level directory, run:

$ docker-compose up

This will build the three necessary containers (one for the web application, one for the REST API, and one for the Elasticsearch backend).

Running this command will generate a lot of output and take some time. The system is not ready until you see the following output in your terminal:

api_1      |  * Running on http://0.0.0.0:8080/ (Press CTRL+C to quit)

Note

THIS CONTAINER IS NOT DESIGNED FOR PRODUCTION USE. This is simply a primer for using MultiScanner’s web interface. The MultiScanner framework is highly scalable and distributed, but it requires a full install. Currently, we support installing the distributed system via Ansible. More information about that process can be found here: https://github.com/mitre/multiscanner-ansible.

Note

The latest versions of docker and docker-compose are assumed to be installed. Installation guides are here: https://docs.docker.com/engine/installation/ and here: https://docs.docker.com/compose/install/

Note

Because this Docker container runs two web applications and an Elasticsearch node, it requires a fair amount of memory (RAM). We recommend running this on a machine with at least 4GB of RAM.

Note

This container will only be reachable and functional on localhost.

Note

The docker-compose.yml file must be edited in four places if the system is installed behind a proxy. First, uncomment lines 18-20 and lines 35-37. Next, uncomment lines 25-28 and set the correct proxy variables. Finally, do the same thing in lines 42-45. The docker-compose.yml file has comments to make clear where to make these changes.
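
For reference, the sections to uncomment set ordinary Docker Compose proxy variables; once uncommented they typically look something like the following (the variable names and proxy address are illustrative; follow the comments in docker-compose.yml itself):

environment:
  - HTTP_PROXY=http://proxy.example.com:8080
  - HTTPS_PROXY=http://proxy.example.com:8080
  - NO_PROXY=localhost,127.0.0.1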