A brief overview of Nuclei Engine architecture. This document will be kept updated as the engine progresses.
## pkg/templates
### Template
Template is the basic unit of input to the engine which describes the requests to be made, matching to be done, data to extract, etc.
The template structure is described here. Template level attributes are defined here as well as convenience methods to validate, parse and compile templates creating executers.
`Parse` function is the main entry point which returns a template for a `filePath` and `executorOptions`. It compiles all the requests for the templates, all the workflows, as well as any self-contained request etc. It also caches the templates in an in-memory cache.
### Preprocessors
Preprocessors are also applied here which can do things at template level. They get data of the template which they can alter at will on runtime. This is used in the engine to do random string generation.
Custom processor can be used if they satisfy the following interface.
```go
type Preprocessor interface {
Process(data []byte) []byte
}
```
## pkg/model
Model package implements Information structure for Nuclei Templates. `Info` contains all major metadata information for the template. `Classification` structure can also be used to provide additional context to vulnerability data.
It also specifies a `WorkflowLoader` interface that is used during workflow loading in template compilation stage.
Many of these methods are similar across protocols while some are very protocol specific.
A brief overview of the methods is provided below -
- **Compile** - Compiles the request with provided options.
- **Requests** - Returns total requests made.
- **GetID** - Returns any ID for the request
- **Match** - Used to perform matching for patterns using matchers
- **Extract** - Used to perform extraction for patterns using extractors
- **ExecuteWithResults** - Request execution function for input.
- **MakeResultEventItem** - Creates a single result event for the intermediate `InternalWrappedEvent` output structure.
- **MakeResultEvent** - Returns a slice of results based on an `InternalWrappedEvent` internal output event.
- **GetCompiledOperators** - Returns the compiled operators for the request.
`MakeDefaultResultEvent` function can be used as a default for `MakeResultEvent` function when no protocol-specific features need to be implemented for result generation.
For reference protocol requests implementations, one can look at the below packages -
All these different requests interfaces are converted to an Executer which is also an interface defined in `pkg/protocols` which is used during final execution of the template.
```go
// Executer is an interface implemented any protocol based request executer.
The `ExecuteWithResults` function accepts a callback, which gets provided with results during execution in form of `*output.InternalWrappedEvent` structure.
The default executer is provided in `pkg/protocols/common/executer` . It takes a list of Requests and relevant `ExecuterOptions` and implements the Executer interface required for template execution. The executer during Template compilation process is created from this package and used as-is.
A different executer is the Clustered Requests executer which implements the Nuclei Request clustering functionality in `pkg/templates` We have a single HTTP request in cases where multiple templates can be clustered and multiple operator lists to match/extract. The first HTTP request is executed while all the template matcher/extractor are evaluated separately.
With this basic premise set, we can now start exploring the current runner implementation which will also walk us through the architecture of nuclei.
## internal/runner
### Template loading
The first process after all CLI specific initialisation is the loading of template/workflow paths that the user wants to run. This is done by the packages described below.
This package is used to get paths using mixed syntax. It takes a template directory and performs resolving for template paths both from provided template and current user directory.
The syntax is very versatile and can include filenames, glob patterns, directories, absolute paths, and relative-paths.
Next step is the initialisation of the reporting modules which is handled in `pkg/reporting`.
#### pkg/reporting
Reporting module contains exporters and trackers as well as a module for deduplication and a module for result formatting.
Exporters and Trackers are interfaces defined in pkg/reporting.
```go
// Tracker is an interface implemented by an issue tracker
type Tracker interface {
CreateIssue(event *output.ResultEvent) error
}
// Exporter is an interface implemented by an issue exporter
type Exporter interface {
Close() error
Export(event *output.ResultEvent) error
}
```
Exporters include `Elasticsearch`, `markdown`, `sarif` . Trackers include `GitHub` , `Gitlab` and `Jira`.
Each exporter and trackers implement their own configuration in YAML format and are very modular in nature, so adding new ones is easy.
After reading all the inputs from various sources and initialisation other miscellaneous options, the next bit is the output writing which is done using `pkg/output` module.
#### pkg/output
Output package implements the output writing functionality for Nuclei.
Output Writer implements the Writer interface which is called each time a result is found for nuclei.
```go
// Writer is an interface which writes output to somewhere for nuclei events.
ResultEvent structure is passed to the Nuclei Output Writer which contains the entire detail of a found result. Various intermediary types like `InternalWrappedEvent` and `InternalEvent` are used throughout nuclei protocols and matchers to describe results in various stages of execution.
Interactsh is also initialised if it is not explicitly disabled.
It uses two LRU caches, one for storing interactions for request URLs and one for storing requests for interaction URL. These both caches are used to correlated requests received to the Interactsh OOB server and Nuclei Instance. [Interactsh Client](https://github.com/projectdiscovery/interactsh/pkg/client) package does most of the heavy lifting of this module.
Polling for interactions and server registration only starts when a template uses the interactsh module and is executed by nuclei. After that no registration is required for the entire run.
### RunEnumeration
Next we arrive in the `RunEnumeration` function of the runner.
`HostErrorsCache` is initialised which is used throughout the run of Nuclei enumeration to keep track of errors per host and skip further requests if the errors are greater than the provided threshold. The functionality for the error tracking cache is defined in [hosterrorscache.go](https://github.com/projectdiscovery/nuclei/blob/master/v2/pkg/protocols/common/hosterrorscache/hosterrorscache.go) and is pretty simplistic in nature.
Next the `WorkflowLoader` is initialised which used to load workflows. It exists in `v2/pkg/parsers/workflow_loader.go`
The loader is initialised moving forward which is responsible for Using Catalog, Passed Tags, Filters, Paths, etc. to return compiled `Templates` and `Workflows`.
First the input passed by the user as paths is normalised to absolute paths which is done by the `pkg/catalog` module. Next the path filter module is used to remove the excluded template/workflows paths.
`pkg/parsers` module's `LoadTemplate`,`LoadWorkflow` functions are used to check if the templates pass the validation + are not excluded via tags/severity/etc. filters. If all checks are passed, then the template/workflow is parsed and returned in a compiled form by the `pkg/templates`'s `Parse` function.
`Parse` function performs compilation of all the requests in a template + creates Executers from them returning a runnable Template/Workflow structure.
Clustering module comes in next whose job is to cluster identical HTTP GET requests together (as a lot of the templates perform the same get requests many times, it's a good way to save many requests on large scans with lots of templates).
The core of this process is the Execute function which takes an input dictionary as well as a Match and Extract function and return a `Result` structure which is used later during nuclei execution to check for results.
```go
// Result is a result structure created from operators running on data.
The internal logics for matching and extracting for things like words, regexes, jq, paths, etc. is specified in `pkg/operators/matchers`, `pkg/operators/extractors`. Those packages should be investigated for further look into the topic.
`pkg/core` provides the engine mechanism which runs the templates/workflows on inputs. It exposes an `Execute` function which does the task of execution while also doing template clustering. The clustering can also be disabled optionally by the user.
A protocol must implement the `Protocol` and `Request` interfaces described above in `pkg/protocols`. We'll take the example of an existing protocol implementation - websocket for this short reference around Nuclei internals.
The code for the websocket protocol is contained in `pkg/protocols/others/websocket`.
Below a high level skeleton of the websocket implementation is provided with all the important parts present.
```go
package websocket
// Request is a request for the Websocket protocol
type Request struct {
// Operators for the current request go here.
operators.Operators `yaml:",inline,omitempty"`
CompiledOperators *operators.Operators `yaml:"-"`
// description: |
// Address contains address for the request
Address string `yaml:"address,omitempty" jsonschema:"title=address for the websocket request,description=Address contains address for the request"`
// declarations here
}
// Compile compiles the request generators preparing any requests possible.
Almost all of these protocols have boilerplate functions for which default implementations have been provided in the `providers` package. Examples are the implementation of `Match`, `Extract`, `MakeResultEvent`, GetCompiledOperators`, etc. which are almost same throughout Nuclei protocols code. It is enough to copy-paste them unless customization is required.
1. Add the protocol implementation in `pkg/protocols` directory. If it's a small protocol with fewer options, considering adding it to the `pkg/protocols/others` directory. Add the enum for the new protocol to `v2/pkg/templates/types/types.go`.
2. Add the protocol request structure to the `Template` structure fields. This is done in `pkg/templates/templates.go` with the corresponding import line.
// Template is a YAML input file which defines all the requests and
// other metadata for a template.
type Template struct {
...
// description: |
// Websocket contains the Websocket request to make in the template.
RequestsWebsocket []*websocket.Request `yaml:"websocket,omitempty" json:"websocket,omitempty" jsonschema:"title=websocket requests to make,description=Websocket requests to make for the template"`
...
}
```
Also add the protocol case to the `Type` function as well as the `TemplateTypes` array in the same `templates.go` file.
```go
// TemplateTypes is a list of accepted template types
That's it, you've added a new protocol to Nuclei. The next good step would be to write integration tests which are described in `integration-tests` and `cmd/integration-tests` directories.
## Project Structure
- [v2/pkg/reporting](./v2/pkg/reporting) - Reporting modules for nuclei.
- [v2/pkg/reporting/exporters/sarif](./v2/pkg/reporting/exporters/sarif) - Sarif Result Exporter
- [v2/pkg/reporting/exporters/markdown](./v2/pkg/reporting/exporters/markdown) - Markdown Result Exporter