Advanced configurations
Agent: healthbeat
Generic Settings
You can specify advanced settings in the healthbeat.yml configuration file to control the general behaviour of Healthbeat and the events it publishes to vuSmartMaps.
This includes:
- Healthbeat-specific global options (for example, startup delay between metricsets).
- Generic agent options, common to all vuSmartMaps Beats-based agents (name, tags, custom fields, processors, etc.).
Global Healthbeat configuration options
healthbeat.max_start_delay
The maximum random delay to apply to the startup of a metricset.
Random delays in the range [0, max_start_delay) are applied to reduce the “thundering herd” effect that can occur if a fleet of machines running Healthbeat are restarted at the same time.
- Type: duration (e.g. 10s, 1m)
- Default: 10s
- Value 0 disables the startup delay.
healthbeat.max_start_delay: 10s
timeseries.enabled
When this is enabled, Healthbeat adds a timeseries.instance field to all generated events. For a given metricset, this field is unique for every individual item being monitored (for example, per disk, per interface, per process).
- Type: boolean
- Default: false
timeseries.enabled: true
Agent: logbeat
Configure general settings
You configure Logbeat in logbeat.yml. Settings control:
- Global Logbeat behaviour (registry, shutdown).
- General Beat options like name, tags, and custom fields.
Global Logbeat configuration options
These options live under the filebeat.* namespace.
filebeat.registry.path
Root path of the Logbeat registry. Relative paths are resolved relative to path.data.
filebeat.registry.path: registry
- Default: ${path.data}/registry
- The registry is only updated when new events are flushed (not on a fixed timer).
filebeat.registry.file_permissions
Permissions mask for registry data files (Unix only).
filebeat.registry.file_permissions: 0600
- Default: 0600
- Most permissive: 0640
- Must be specified as octal.
filebeat.registry.flush
Controls how often registry changes are flushed to disk.
filebeat.registry.flush: 1s
- Default: 1s
- 0s → registry written after each successful batch publish.
filebeat.registry.migrate_file
Used when migrating from an older single-file registry to the new directory format.
filebeat.registry.path: ${path.data}/registry
filebeat.registry.migrate_file: /path/to/old/registry_file
Logbeat will migrate only if the new registry directory does not already exist.
filebeat.shutdown_timeout
Maximum time Logbeat waits on shutdown for the publisher to flush events.
filebeat.shutdown_timeout: 5s
- Default: disabled (no waiting; un-acked events may be resent on restart).
Configure inputs
You define inputs under filebeat.inputs to tell Logbeat which files to read and how to parse them.
filebeat.inputs:
  - type: filestream
    id: my-filestream-id
    paths:
      - /var/log/system.log
      - /var/log/wifi.log
Each input is a YAML list item (-), and you can define multiple inputs.
Filestream input (recommended)
Filestream is the improved replacement for the old log input.
Basic example:
filebeat.inputs:
  - type: filestream
    id: app-logs
    paths:
      - /var/log/app/*.log
Each filestream input must have a unique id to track file state correctly.
Key options (filestream), several of which are combined in the sketch after this list:
- paths: list of glob paths.
- exclude_lines / include_lines: regexp filters.
- buffer_size: read buffer size (bytes).
- message_max_bytes: max length of a single message.
- parsers: pipeline (multiline, ndjson, container, syslog, include_message).
- file_identity: fingerprint (default), native, path, inode_marker.
- close.*, clean_*, backoff.*, harvester_limit: lifecycle & performance tuning.
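A minimal sketch combining several of these options (the paths, patterns, and values are illustrative, not defaults; confirm option availability against your agent version):
filebeat.inputs:
  - type: filestream
    id: app-logs-tuned
    paths:
      - /var/log/app/*.log
    exclude_lines: ['^DEBUG']             # drop noisy lines before shipping
    harvester_limit: 100                  # cap the number of concurrently open files
    close.on_state_change.inactive: 5m    # release handles for idle files
    parsers:
      - ndjson:                           # parse one JSON object per line
          target: ""
          add_error_key: true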
Reading GZIP logs (beta)
filebeat.inputs:
  - type: filestream
    id: "gzip-filestream"
    paths:
      - /var/some-app/app.log*
    gzip_experimental: true
- Requires file_identity: fingerprint (default).
- Logs are decompressed in memory; ~100KB extra per harvester.
log input
The log input is deprecated. Use filestream instead.
Legacy example:
filebeat.inputs:
  - type: log
    paths:
      - /var/log/messages
      - /var/log/*.log
All the classic options (paths, encoding, exclude_lines, include_lines, ignore_older, close_*, clean_*, scan_frequency, tail_files, backoff, harvester_limit, file_identity, etc.) work similarly, but new configs should migrate to filestream.
Manage multiline messages
Use multiline settings to merge multi-line log events (stack traces, multi-line errors, custom blocks) into single events before sending to Kafka.
With filestream input
filebeat.inputs:
  - type: filestream
    id: java-traces
    paths:
      - /var/log/app/*.log
    parsers:
      - multiline:
          type: pattern
          pattern: '^\['
          negate: true
          match: after
With deprecated log input
filebeat.inputs:
  - type: log
    paths:
      - /var/log/app/*.log
    multiline.type: pattern
    multiline.pattern: '^\['
    multiline.negate: true
    multiline.match: after
Core multiline options
- multiline.type: pattern | count | while_pattern
- multiline.pattern: '<regexp>'
- multiline.negate: true | false
- multiline.match: after | before
- multiline.flush_pattern: '<regexp>'
- multiline.max_lines: 500 (default)
- multiline.timeout: 5s (default)
- multiline.count_lines (for type: count)
- multiline.skip_newline: true|false
Example – Java stack trace (indented lines)
multiline.type: pattern
multiline.pattern: '^[[:space:]]'
multiline.negate: false
multiline.match: after
Example – timestamped events (lines without timestamp belong to previous)
multiline.type: pattern
multiline.pattern: '^\[[0-9]{4}-[0-9]{2}-[0-9]{2}'
multiline.negate: true
multiline.match: after
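Example – explicit end-of-event marker (flush_pattern; the marker strings are illustrative)
multiline.type: pattern
multiline.pattern: 'Start new event'
multiline.negate: true
multiline.match: after
multiline.flush_pattern: 'End event'
Lines between the start and end markers are folded into one event; flush_pattern flushes the buffered event as soon as the end marker is seen, instead of waiting for the next start line or the timeout.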
Filestream – advanced behaviour
Some important filestream-specific features:
File identity (file_identity)
Controls how Logbeat distinguishes files:
- fingerprint (default, recommended)
  Identifies files by hashing content (first N bytes). Works well with log rotation and cloud / networked filesystems.
- native
  Uses inode + device id.
- path
  Uses path (not safe with rename-based rotation).
- inode_marker
  Uses inode + marker file for device identity.
Changing the file identity method improperly can cause mass re-ingestion or duplicates.
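A sketch that pins the default fingerprint method explicitly (the offset and length shown are the usual defaults; confirm option names against your agent version):
filebeat.inputs:
  - type: filestream
    id: app-logs
    paths:
      - /var/log/app/*.log
    file_identity.fingerprint: ~
    prospector.scanner.fingerprint.enabled: true
    prospector.scanner.fingerprint.offset: 0     # hash from the start of the file
    prospector.scanner.fingerprint.length: 1024  # hash the first 1024 bytes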
Closing harvesters (close.* / close.on_state_change.*)
Examples (filestream):
close.on_state_change.inactive: 5m
close.on_state_change.renamed: false
close.on_state_change.removed: true
close.reader.on_eof: false
close.reader.after_interval: 0
Cleaning registry (clean_*)
clean_inactive: -1 # disable automatic cleanup (recommended default)
clean_removed: true
Removing fully ingested files (optional)
delete.enabled: true
delete.grace_period: 30m
Logbeat will remove files once:
- Reader is closed,
- EOF is reached, and
- All events are acknowledged by the output.
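A sketch of these lifecycle options combined in one input (paths and durations are illustrative):
filebeat.inputs:
  - type: filestream
    id: batch-exports
    paths:
      - /var/exports/*.log
    clean_removed: true         # forget registry state for files deleted on disk
    delete.enabled: true        # remove files once fully ingested
    delete.grace_period: 30m    # wait after EOF + acknowledgement before deleting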
Agent configuration options (common to all Beats-based agents)
Generic
These options are not namespaced and are supported by all vuSmartMaps Beats-based agents (Healthbeat, Logbeat, etc.). They control how events are labeled and enriched before being sent to vuSmartMaps.
name
Logical name of the agent instance.
If this option is not set, Healthbeat uses the hostname of the server. The value is included as agent.name in each published event, and can be used to group or filter events per agent instance.
name: "payments-node-01"
tags
A list of tags that Healthbeat includes in the tags field of each event.
Tags make it easy to group servers or instances by logical properties (application, environment, tier, etc.), and to filter or build dashboards in vuSmartMaps.
tags: ["payments-service", "web-tier", "prod"]
fields
Optional custom fields to add additional information to the output.
- Supported types: scalar values, arrays, maps, or any nested combination.
- By default, these fields are grouped under a fields sub-object in the event.
fields:
  project: "payments"
  instance_id: "574734885120952459"
  owner_team: "SRE"
fields_under_root
If true, the custom fields are stored as top-level fields in the event instead of under the fields object.
If a custom field name conflicts with an existing field, the custom value overwrites the original.
fields_under_root: true
fields:
instance_id: "i-10a64379"
region: "us-east-1"
processors
A list of processors applied to the data generated by Healthbeat before it is sent to vuSmartMaps (for example, to drop fields, rename fields, add metadata, or filter events).
processors:
- add_host_metadata: ~
- add_cloud_metadata: ~
- drop_fields:
fields: ["host.hostname", "host.os.build"]
(You can reference the common “Processors” documentation for the full list of supported processors and usage patterns.)
max_procs
Sets the maximum number of CPUs that Healthbeat can use concurrently.
- Default: number of logical CPUs on the system.
- In most deployments this is left at the default; in constrained environments you can cap it explicitly.
max_procs: 2
The installer’s --cpulimit option also influences max_procs and related resource limits; this YAML option is the manual override at the configuration level.
timestamp.precision
Controls the precision of all timestamps added by Healthbeat.
- Default: millisecond
- Valid values:
  - millisecond
  - microsecond
  - nanosecond
timestamp.precision: microsecond
Kafka Output
Configure the Kafka output
The Kafka output sends events to vuSmartMaps.
Note on Kafka timestamps and retention
For Kafka 0.10.0.0+, the event timestamp is normally set by the producer (beat agent) to the original event time.
If your Kafka topic uses a time-based retention policy, an event created long ago but produced now might be dropped immediately (because its timestamp is already older than the retention window).
To avoid this, set the broker config:
log.message.timestamp.type=LogAppendTime
So Kafka uses the append time (arrival time) instead of the original event time for retention.
Example configuration
output.kafka:
  # Initial brokers used to fetch cluster metadata
  hosts: ["kafka1:9092", "kafka2:9092", "kafka3:9092"]
  # Dynamic topic selection
  topic: '%{[fields.log_topic]}'
  # Partitioning strategy
  partition.round_robin:
    reachable_only: false
  required_acks: 1
  compression: gzip
  max_message_bytes: 1000000
Events larger than max_message_bytes will be dropped.
Make sure the beat agent does not generate events larger than this limit (or increase the limit and broker message.max.bytes accordingly).
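If you raise the limit, keep the agent and the broker in sync; a sketch with illustrative values:
output.kafka:
  max_message_bytes: 2000000
# Kafka broker side (server.properties):
# message.max.bytes=2000000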
Port reminder (vuSmartMaps)
- Till 2.16 build: Kafka/collector endpoints are typically on 9092 (non-SSL) and 9094 (SSL).
- 3.0 onwards: data ingestion is via 443/TCP (SSL/HTTPS).
Confirm with your platform version and network design before finalising hosts.
Compatibility
The Beat agent’s Kafka output is compatible with:
- Kafka 0.8.2.0 and later (older versions may work but are not supported).
- When using Kafka 4.0+, set version to at least "2.1.0".
The version setting controls the protocol features used by the client; it does not prevent Healthbeat from talking to newer Kafka brokers.
Configuration options (output.kafka in *beat.yml)
All of the following options live under the output.kafka: section.
enabled
Enable or disable the Kafka output.
output.kafka:
  enabled: true
- Default: true.
- If false, Kafka output is disabled.
hosts
List of bootstrap broker addresses used to fetch Kafka cluster metadata (topics, partitions, leaders).
output.kafka:
  hosts: ["kafka1:9092", "kafka2:9092"]
version
Kafka protocol version Healthbeat should use:
output.kafka:
  version: "2.1.0"
- Defaults to "2.1.0".
- Valid values: from "0.8.2.0" up to "2.6.0".
- For Kafka 4.0+, use "2.1.0" or higher.
username / password
Credentials for SASL authentication (for example SASL/PLAIN or SCRAM):
output.kafka:
  username: "kafka_user"
  password: "kafka_password"
If you set a username, you must also set a password.
sasl.mechanism
SASL mechanism to use when username/password are configured:
output.kafka:
  sasl.mechanism: "SCRAM-SHA-512"
Supported values:
- PLAIN – SASL/PLAIN
- SCRAM-SHA-256
- SCRAM-SHA-512
If sasl.mechanism is not set:
- If username & password are present → defaults to PLAIN.
- Otherwise → SASL authentication is disabled.
To use Kerberos (GSSAPI), leave sasl.mechanism empty and use the kerberos configuration block instead.
topic
Kafka topic name (or format string) for produced events.
You can:
Set a fixed topic:
output.kafka:
  topic: "healthbeat"
Use a format string based on ECS fields:
output.kafka:
  topic: '%{[data_stream.type]}-%{[data_stream.dataset]}-%{[data_stream.namespace]}'
Use a custom field, for example, fields.log_topic:
output.kafka:
  topic: '%{[fields.log_topic]}'
To populate fields.log_topic, you can use an add_fields processor in your input/module config:
processors:
  - add_fields:
      target: ''
      fields:
        log_topic: '%{[data_stream.type]}-%{[data_stream.dataset]}-%{[data_stream.namespace]}'
topics
Advanced topic routing using a list of selector rules. Healthbeat applies the first matching rule; if no rule matches, the topic setting is used.
Each rule supports:
- topic – format string.
- mappings – map returned topic to new names.
- default – default name if no mapping matches.
- when – conditional (same syntax as processors).
Example (route CRITICAL/ERR messages to dedicated topics):
output.kafka:
  hosts: ["localhost:9092"]
  topic: "logs-%{[agent.version]}"
  topics:
    - topic: "critical-%{[agent.version]}"
      when.contains:
        message: "CRITICAL"
    - topic: "error-%{[agent.version]}"
      when.contains:
        message: "ERR"
Resulting topics: critical-<version>, error-<version>, logs-<version>.
key
Optional formatted string for the Kafka message key (used by brokers for partitioning):
output.kafka:
  key: '%{[host.name]}'
If not set, Kafka chooses the key according to its default behaviour.
partition
Partitioning strategy:
output.kafka:
  partition.hash:
    hash: ["host.name"]
    random: true
Supported strategies:
- random
- round_robin
- hash (default)
Additional tuning:
- random.group_events
- round_robin.group_events
- hash.hash (fields list)
- hash.random (fallback to random if no hash/key)
All partitioners can also set:
partition.round_robin:
  reachable_only: true
If reachable_only is true, events are sent only to available partitions (but may become unevenly distributed).
headers
Optional static headers to add to every produced Kafka message:
output.kafka:
  headers:
    - key: "environment"
      value: "prod"
    - key: "source"
      value: "healthbeat"
Values must be strings.
client_id
Client ID used in Kafka logs/metrics:
output.kafka:
  client_id: "server1"
Default is "beats".
codec
Controls how events are encoded before sending to Kafka. If omitted, events are JSON encoded.
output.kafka:
  codec.json:
    pretty: false
(Refer to your “output codec” section if you support additional formats.)
metadata
Controls how often Healthbeat refreshes Kafka metadata (brokers, topics, partitions, leaders):
output.kafka:
  metadata:
    refresh_frequency: 10m
    full: false
    retry.max: 3
    retry.backoff: 250ms
- refresh_frequency – how often to refresh metadata (default: 10m).
- full – true to fetch metadata for all topics, false for configured topics only (default: false).
- retry.max – number of metadata retries (default: 3).
- retry.backoff – backoff between metadata retries (default: 250ms).
max_retries
Number of retries for publishing events after a failure:
output.kafka:
  max_retries: 3
- < 0 → retry indefinitely until success.
- Default: 3.
backoff.init / backoff.max
Exponential backoff between retry attempts:
output.kafka:
  backoff.init: 1s
  backoff.max: 60s
- backoff.init – first delay (default: 1s).
- backoff.max – maximum delay (default: 60s).
bulk_max_size
Maximum number of events per Kafka batch:
output.kafka:
  bulk_max_size: 2048
Default: 2048.
bulk_flush_frequency
Time-based flush for partial batches:
output.kafka:
  bulk_flush_frequency: 0s
- 0 (default) – no time-based flush; flush is size-driven.
timeout
Timeout for responses from Kafka brokers:
output.kafka:
  timeout: 30s
Default: 30s.
broker_timeout
Maximum time the broker waits for the required ACKs:
output.kafka:
  broker_timeout: 10s
Default: 10s.
channel_buffer_size
Number of messages buffered per broker in the output pipeline:
output.kafka:
  channel_buffer_size: 256
Default: 256.
keep_alive
TCP keep-alive period:
output.kafka:
  keep_alive: 0s
- 0s (default) disables keep-alives.
compression / compression_level
Compression codec and level:
output.kafka:
  compression: gzip
  compression_level: 4
- compression – one of: none, snappy, lz4, gzip, zstd.
  - Default: gzip.
  - Azure Event Hub for Kafka: must be none.
- compression_level (for gzip):
  - 0 – disable compression.
  - 1–9 – speed vs compression tradeoff (default: 4).
max_message_bytes
Maximum size of the encoded message:
output.kafka:
  max_message_bytes: 1000000
- Default: 1000000 bytes.
- Must be ≤ broker message.max.bytes.
- Larger events are dropped.
required_acks
Reliability level:
output.kafka:
  required_acks: 1
Values:
- 0 – do not wait for any ACK (high risk of silent loss).
- 1 – wait for leader to commit (default).
- -1 – wait for all replicas to commit.
ssl
SSL/TLS settings for broker connections, for example:
output.kafka:
  ssl:
    enabled: true
    certificate_authorities: ["/path/to/ca.pem"]
    certificate: "/path/to/client.crt"
    key: "/path/to/client.key"
Ensure Kafka’s keystore was created with a key algorithm the Healthbeat Kafka client supports (typically keytool -keyalg RSA).
queue
Internal queue options for buffering events before they are sent to Kafka.
queue options can be set either at the top level in healthbeat.yml or under the output section – but not both.
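For example, a sketch of the memory queue nested under the Kafka output (supported only where the agent version allows queue settings under the output, as noted above; values illustrative):
output.kafka:
  hosts: ["kafka1:9092"]
  topic: "healthbeat"
  queue.mem:
    events: 4096
    flush.timeout: 5s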
Internal Queue
Configure the internal queue
Beat agents use an internal queue to store events before they are published to the configured output (Kafka, HTTPS, etc.). The queue:
- Buffers events in memory and/or disk.
- Groups events into batches.
- Hands batches to the output, which sends them with bulk operations.
You can configure the queue either:
- in the top-level queue section of *beat.yml, or
- under the output section (for some outputs),
but not both at the same time. Only one queue type can be active.
Example (memory queue with 4096 events):
queue.mem:
  events: 4096
Configure the memory queue (queue.mem)
The memory queue keeps all pending events in RAM.
- If the queue is full, Healthbeat cannot insert new events until the output acknowledges or drops some events.
- The memory queue is controlled by:
  - events
  - flush.min_events
  - flush.timeout
Batching behavior
- flush.min_events and flush.timeout together control how batches are filled.
- If the output also sets bulk_max_size, the actual batch size is min(bulk_max_size, flush.min_events).
flush.min_events is considered legacy for controlling batch size. New configurations should prefer setting batch size via the output’s bulk_max_size parameter, as in the sketch below.
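A sketch of that recommended pattern (values illustrative): the output drives batch size, and the queue stays time-driven.
queue.mem:
  events: 4096
  flush.timeout: 5s
output.kafka:
  hosts: ["kafka1:9092"]
  bulk_max_size: 1024   # batch size comes from here, not flush.min_events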
Synchronous vs asynchronous mode
- Synchronous mode
- Events are returned to the output as soon as they are available.
- Use when you want to minimize latency.
- Config:
- flush.timeout: 0
(or for backwards compatibility: flush.min_events: 0 or 1, which also caps batch size at half the queue capacity).
- flush.timeout: 0
- Asynchronous mode
- The queue waits up to flush.timeout to try to fill the requested batch.
- If flush.timeout expires, a partial batch with whatever is available is returned.
- Config: flush.timeout: <positive duration>, e.g. 5s.
Example – async mode with size + time triggers
queue.mem:
  events: 4096
  flush.min_events: 512
  flush.timeout: 5s
This configuration forwards events when:
- There are enough events to fill the output’s requested batch (usually driven by bulk_max_size, capped at 512 by flush.min_events), or
- Events have been waiting 5 seconds.
queue.mem options
All options are under queue.mem:
events
Number of events the queue can hold in memory.
queue.mem:
  events: 4096
- Default: 3200.
flush.min_events
Controls batch size and, in some cases, mode:
- If > 1:
  - Maximum number of events per batch.
  - The output waits until that many events accumulate, or flush.timeout elapses.
- If 0 or 1:
  - Batch size is set to half the queue capacity.
  - The queue switches to synchronous mode (equivalent to flush.timeout: 0).
queue.mem:
  flush.min_events: 512
- Default: 1600.
flush.timeout
Maximum time the queue will wait to fill a batch request from the output.
- If 0s → synchronous mode: events returned immediately.
- If > 0 → asynchronous mode: wait up to this duration, then return whatever is available.
queue.mem:
  flush.timeout: 5s
- Default: 10s.
Configure the disk queue (queue.disk)
The disk queue stores pending events on disk instead of (only) in main memory.
Characteristics:
- Can buffer far more events than the memory queue.
- Survives Healthbeat or host restarts (until events are sent).
- Slightly higher overhead because each event is written to and read from disk.
This is useful when:
- You need high reliability (don’t want to lose events during outages).
- Disk is not the main bottleneck.
To enable the disk queue with default behavior, set at least max_size:
queue.disk:
  max_size: 10GB
The queue:
- Uses up to max_size on disk, but only as much as needed.
- Deletes data from disk once the events are successfully sent.
queue.disk options
All options are under queue.disk:
path
Directory to store disk queue data files.
queue.disk:
  path: "/var/lib/healthbeat/diskqueue"
- Default: "${path.data}/diskqueue" (typically resolves under Healthbeat’s data directory).
max_size (required)
Maximum total size of the queue on disk.
queue.disk:
  max_size: 10GB
- Required.
- If the queue hits this limit:
- Inputs may pause or drop events, depending on their configuration.
- 0 = no enforced maximum:
- Queue can grow until the disk is full.
- Use with great caution, ideally only on a dedicated data partition.
- Default: 10GB.
segment_size
Size of each internal segment file.
queue.disk:
  segment_size: 1GB
- Default: max_size / 10.
- Smaller segments:
- More files.
- Faster deletion of old data.
- Larger segments:
- Fewer files.
- Some data may remain on disk longer before deletion.
In most cases, you can leave this at the default.
read_ahead
How many events to read from disk into memory while waiting for the output to request them.
queue.disk:
  read_ahead: 512
- Default: 512.
- Increase if outputs are slowed down because they cannot read enough events at once (at the cost of more memory).
write_ahead
How many events the queue can accept and hold in memory while waiting for them to be written to disk.
queue.disk:
  write_ahead: 2048
- Default: 2048.
- Decrease if the queue’s memory usage is too high because disk writes are too slow.
- Increase if inputs are blocked or dropping events because disk is the bottleneck (more memory usage, higher throughput).
retry_interval
Base interval between retries after disk-related errors (permissions, disk full, etc.).
queue.disk:
  retry_interval: 1s
- Default: 1s.
max_retry_interval
Maximum backoff between retries after consecutive disk errors.
queue.disk:
  max_retry_interval: 30s
- Default: 30s.
- Increase if you want fewer logs / less load when the disk is unavailable for a long time.
Where to configure the queue?
You can configure queue.mem or queue.disk:
- at the top level in *beat.yml, or
- under the output section (depending on output capabilities),
but not both places at the same time, and only one queue type (memory or disk) can be active.
Updating Java Agent’s Startup Options
UNIX systems
This is applicable to all the agents (vuHealthagent/vuAppagent/vuLogagent).
Changing startup options
AIX
On AIX, the agent is managed by init scripts under the agent home.
Both the startup script and supervisor.conf live in the same directory:
<AGENT_HOME>/etc/rc.d/init.d/
├── vulogagent # startup script
└── supervisor.conf # command-line args & resource caps
To modify vuLogAgent command-line arguments:
Edit:
<AGENT_HOME>/etc/rc.d/init.d/supervisor.conf
Solaris (SunOS)
On Solaris (SunOS), the layout is similar, but under etc/init.d:
<AGENT_HOME>/etc/init.d/
├── vulogagent # startup script
└── supervisor.conf # command-line args & resource caps
To modify vuLogAgent command-line arguments:
Edit:
<AGENT_HOME>/etc/init.d/supervisor.conf
HP-UX
On HP-UX, the layout is similar, but under sbin/init.d:
<AGENT_HOME>/sbin/init.d/
├── vulogagent # startup script
└── supervisor.conf # command-line args & resource caps
To modify vuLogAgent command-line arguments:
Edit:
<AGENT_HOME>/sbin/init.d/supervisor.conf
Windows and Linux (vuAppagent)
This section is applicable only for vuAppagent on Linux and Windows.
Linux
On Linux, vuAppagent is managed by a systemd startup wrapper under the agent home:
<AGENT_HOME>/etc/systemd/
└── vuappagent-start # startup script / wrapper with JVM + agent options
To modify vuAppagent command-line arguments:
Edit:
<AGENT_HOME>/etc/systemd/vuappagent-start
Windows
On Windows, vuAppagent startup options are controlled through an XML configuration file under the agent home:
<AGENT_HOME>\vuappagent\
└── vuappagent.xml # Windows service / wrapper configuration
To modify vuAppagent command-line arguments:
Edit:
<AGENT_HOME>\vuappagent\vuappagent.xml
Agent: vuLogagent
Command-line options (startup)
You can control vuLogAgent behaviour using the following command-line flags:
| Option | Description |
|---|---|
| -quiet | Run in quiet mode – only errors are logged. |
| -debug | Run in debug mode – verbose diagnostic logging. |
| -trace | Run in trace mode – very detailed, low-level logging (useful for deep troubleshooting). |
| -tail | Start reading new files from the end, instead of from the beginning (similar to tail -f). |
| -version | Print the agent version and exit. |
| -config <path/to/config/file> | Path to the configuration file to load (for example: -config /opt/vu/vulogagent/conf.d/vulogagent.json). |
| -idletimeout <seconds> | Time between file reads (idle polling interval) in seconds. Example: -idletimeout 5000. |
| -spoolsize <count> | Event count threshold for the internal spool/buffer. When this many events are queued, the agent forces a network flush. Example: -spoolsize 1024. |
| -signaturelength <bytes> | Maximum length of the file signature used for file identity / sincedb tracking. Default: 4096. |
| -logfile <path/to/logfile> | Path and file name for the agent log file. |
| -logfilesize <size> | Maximum size of each log file, e.g. 10MB. Default: 10MB. |
| -logfilenumber <count> | Number of rotated log files to keep. Default: 5. |
| -workers <count> | Number of worker threads used for event processing and shipping. Example: -workers 16. |
| -diagnosticsfrequency <minutes> | How often to emit diagnostic reports (health / stats) in minutes. Example: -diagnosticsfrequency 5. |
| -sincedb <filename> | Name/path of the sincedb state file used to remember file offsets. Example: -sincedb ".logstash-forwarder-java". |
| -workerBufferSize <count> | Event buffer size per worker – maximum number of events buffered per worker before backpressure kicks in. Default: 10000. |
| -workerQueueSize <count> | Task queue size per worker – maximum number of tasks queued per worker. Default: 1000. |
| -heartbeatport <port> | Port used for the agent heartbeat / health check listener (if enabled). |
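As a hypothetical illustration only (the launcher jar name and paths below are assumptions; on a real install the exact invocation lives in the startup script / supervisor.conf for your platform), several of these flags combined might look like:
java -jar vulogagent.jar -config /opt/vu/vulogagent/conf.d/vulogagent.json -logfile /var/log/vulogagent/agent.log -logfilesize 10MB -logfilenumber 5 -spoolsize 1024 -workers 4 -quiet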
Configuring vuLogAgent
After installation, vuLogAgent is configured via its JSON config file:
<VULOGAGENT_HOME>/conf.d/vulogagent.json
1. Configuration parameters
- Shipper / Target IP
  - The remote vuSmartMaps system where data must be sent.
  - Example: 10.10.10.5 or vu-smartmaps.example.com.
- PROTOCOL
  - Protocol used to send events: kafka or logstash (TCP / Beats-like).
- PORT
  - The TCP port on which vuSmartMaps is listening for this agent’s data.
  - Must match the configured listener in vuSmartMaps.
- TIMEOUT
  - Maximum time (in milliseconds) to wait for a network connection or response.
  - Example: 15000 (15 seconds).
- LOGPATH
  - The path(s) to logs that vuLogAgent should ship.
  - Can be:
    - A single file: /var/log/messages
    - A wildcard: /var/log/*.log
    - A wildcard directory: /var/log/httpd/httpd-*.log
- LOGTYPE
  - A logical type used to identify the log source (enriches events).
  - Typical values: "syslog", "apache", "application", etc.
  - This is usually mapped to fields.type in the config.
- MULTILINE PATTERN
  - Defines how multiline log messages (e.g. stack traces) are grouped.
  - The pattern is a regular expression that detects the “start” (or continuation) of a multiline message.
  - You can specify a single pattern (e.g. ^INFO) or multiple alternatives (e.g. ^INFO|^ERROR).
  - vuLogAgent then groups lines between these “pattern hits” into one logical message.
- NEGATE & WHAT (multiline strategy)
  These two options define how lines are combined:
  - negate: true or false
  - what: next or previous
| negate | what | Meaning (high-level) |
|---|---|---|
| false | previous | Lines that match the pattern are appended to the previous non-matching line. |
| false | next | Lines that match the pattern are prepended to the next non-matching line. |
| true | previous | Lines that do NOT match the pattern are appended to the previous match. |
| true | next | Lines that do NOT match the pattern are prepended to the next match. |
- DEAD TIME
  - dead time tells vuLogAgent to ignore log files that have not been modified within the specified duration.
  - Time suffixes: m – minutes (e.g. 30m), h – hours (e.g. 1h).
  - Any file older than this period (based on modification time) is not harvested.
- CLOSE TIMEOUT
  - close timeout is the maximum duration any file is allowed to remain open in vuLogAgent’s internal watch map.
  - After this time, vuLogAgent closes the file handle (even if the file remains present).
  - Same time suffixes as dead time, e.g. 30m, 1h.
- FILTER PATTERN & NEGATE (line filtering; see the JSON sketch after this list)
  vuLogAgent can include or exclude lines using a regex filter:
  - pattern – regular expression to match lines.
  - negate:
    - false → keep lines that match the pattern.
    - true → drop lines that match the pattern.
  Examples:
  - pattern: ^b, negate: false → export only lines starting with b.
  - pattern: ^b, negate: true → export all lines except those starting with b.
- TOPIC (Kafka only)
  - Kafka topic to which events are published when protocol is kafka.
  - Must match a topic configured on the Kafka cluster used by vuSmartMaps.
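A sketch of the filter example above in vulogagent.json form (block layout as described in section 2.1 below):
"filter": {
  "pattern": "^b",
  "negate": "true"
}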
2. Understanding vulogagent.json
This section explains the main blocks in the vuLogAgent configuration file, typically named:
<VULOGAGENT_HOME>/conf.d/vulogagent.json
2.1 Common blocks
network
Defines how vuLogAgent connects to the remote receiver:
"network": {
  "type": "kafka",
  "topic": "%{[topic_tag]}",
  "sslEnabled": true,
  "trustStore": "/path/to/client-truststore.jks",
  "trustStorePassword": "<TRUSTSTORE_PASSWORD>",
  "hostVerification": false,
  "servers": [
    "kafka-broker-1.example.com:9094",
    "kafka-broker-2.example.com:9094",
    "kafka-broker-3.example.com:9094"
  ],
  "timeout": "30",
  "ackType": 1,
  "retryConfig": 3
}
The fields are as follows:
type
Protocol/target type. For Kafka use:
"type": "kafka"
topic
Kafka topic name or format string (e.g. "%{[topic_tag]}" to derive the topic from the field topic_tag).
sslEnabled
true to enable SSL/TLS for Kafka connections; false for plain-text.
trustStore
Path to the Java truststore (JKS/PKCS12) containing the Kafka broker CA:
"trustStore": "/path/to/client-truststore.jks"
trustStorePassword
Password protecting the truststore:
"trustStorePassword": "<TRUSTSTORE_PASSWORD>"
In production, avoid hardcoding; use secure parameter / secret management.
hostVerification
Set to true to enforce hostname verification against the certificate’s CN/SAN, false to skip it.
servers
List of Kafka broker endpoints:
"servers": [
  "broker1:9094",
  "broker2:9094"
]
timeout
Network timeout in seconds (as string, e.g. "30").
ackType
Kafka acknowledgment level:
- 0 – no ACK (fast, but unsafe)
- 1 – leader-only ACK (default)
- -1 – all replicas must ACK (safest).
retryConfig
Number of retries on send failure (e.g. 3).
files
Defines which logs to read and how to treat them:
"files": [
  {
    "paths": [
      "/var/log/messages",
      "/var/log/*.log"
    ],
    "fields": {
      "type": "syslog"
    },
    "dead time": "12h",
    "close timeout": "30m"
  }
]
- paths
  - List of paths (files, globs, or wildcard directories).
- fields
  - Extra metadata attached to each event from these paths (e.g. "type": "syslog" or "type": "apache").
- dead time
  - Ignore files not modified within this timespan.
- close timeout
  - Maximum duration a file is allowed to stay open in vuLogAgent’s internal watch map.
filter
Optional block to include/exclude lines:
"filter": {
  "pattern": "INFO|ERROR",
  "negate": "false"
}
- pattern
  - Regular expression used to match lines.
- negate
  - "false" → include only matching lines.
  - "true" → drop matching lines.
3. Sample configurations
3.1 Single-line log monitoring – basic
Example 1 – multiple paths and log types
{
  "network": {
    "type": "kafka",
    "topic": "%{[topic_tag]}",
    "sslEnabled": true,
    "trustStore": "/path/to/client-truststore.jks",
    "trustStorePassword": "<TRUSTSTORE_PASSWORD>",
    "hostVerification": false,
    "servers": [
      "kafka-broker-1.example.com:9094",
      "kafka-broker-2.example.com:9094",
      "kafka-broker-3.example.com:9094"
    ],
    "timeout": "30",
    "ackType": 1,
    "retryConfig": 3
  },
  "files": [
    {
      "paths": [
        "/var/log/messages",
        "/var/log/*.log"
      ],
      "fields": {
        "type": "syslog"
      }
    },
    {
      "paths": [
        "/var/log/apache/httpd-*.log"
      ],
      "fields": {
        "type": "apache"
      },
      "dead time": "12h"
    }
  ]
}
What this does
- Sends all events over SSL to the configured Kafka brokers (port 9094), on the topic derived from topic_tag.
- Ships:
  - /var/log/messages and /var/log/*.log as "type": "syslog".
  - /var/log/apache/httpd-*.log as "type": "apache".
- Ignores Apache logs that haven’t been modified in the last 12 hours.
3.2 Single-line log monitoring – with filtering
Example 2 – filter by INFO/ERROR
{
  "network": {
    "type": "kafka",
    "topic": "%{[topic_tag]}",
    "sslEnabled": true,
    "trustStore": "/path/to/client-truststore.jks",
    "trustStorePassword": "<TRUSTSTORE_PASSWORD>",
    "hostVerification": false,
    "servers": [
      "kafka-broker-1.example.com:9094",
      "kafka-broker-2.example.com:9094",
      "kafka-broker-3.example.com:9094"
    ],
    "timeout": "30",
    "ackType": 1,
    "retryConfig": 3
  },
  "files": [
    {
      "paths": [
        "/var/log/sample.log"
      ],
      "fields": {
        "type": "syslog"
      },
      "dead time": "2h",
      "filter": {
        "pattern": "INFO|ERROR",
        "negate": "false"
      }
    }
  ]
}
What this does
- Ships only lines in /var/log/sample.log that contain INFO or ERROR.
- Ignores files not modified in the last 2 hours (dead time).
4. Multiline configuration
Use the multiline block to stitch together related lines (e.g. stack traces) into a single event.
4.1 Multiline block
"multiline": {
  "pattern": "A\\[",
  "negate": "true",
  "what": "previous"
}
Fields:
- pattern
  - Regex pattern to match. In the example: A\[
- negate
  - true → treat non-matching lines as continuation lines.
- what
  - previous → attach continuation lines to the previous matching line.
  - next → attach continuation lines to the next matching line.
4.2 Full multiline example
{
  "network": {
    "type": "kafka",
    "topic": "%{[topic_tag]}",
    "sslEnabled": true,
    "trustStore": "/path/to/client-truststore.jks",
    "trustStorePassword": "<TRUSTSTORE_PASSWORD>",
    "hostVerification": false,
    "servers": [
      "kafka-broker-1.example.com:9094",
      "kafka-broker-2.example.com:9094",
      "kafka-broker-3.example.com:9094"
    ],
    "timeout": "30",
    "ackType": 1,
    "retryConfig": 3
  },
  "files": [
    {
      "paths": [
        "/var/log/messages",
        "/var/log/*.log"
      ],
      "multiline": {
        "pattern": "A\\[",
        "negate": "true",
        "what": "previous"
      },
      "fields": {
        "type": "syslog"
      }
    }
  ]
}
What this does
- Watches /var/log/messages and /var/log/*.log.
- Groups lines until they hit a line matching A[:
- Line with A[ is treated as start.
- Non-matching lines before the next A[ are attached to the previous event.
