cloudfoundry-firehose-nozzle

Monitor Type: cloudfoundry-firehose-nozzle (Source)

Accepts Endpoints: No

Multiple Instances Allowed: Yes

Overview

This monitor is a Cloud Foundry firehose nozzle that connects to the Cloud Foundry Reverse Log Proxy (RLP) Gateway, which feeds metrics from the Loggregator. It uses the RLP Gateway model introduced in Pivotal Cloud Foundry (PCF) 2.4, so it will not work with older releases.

Pivotal has a helpful guide for Key Performance Indicators (KPIs) to monitor. Most of these metrics come through the firehose. Pivotal also has a guide for Key Capacity Scaling Indicators that helps determine when to scale your cluster up or down.

This monitor currently supports gauge and counter metrics. Firehose gauge metrics are converted to SignalFx gauges, and firehose counter metrics are converted to SignalFx cumulative counters. All of the tags in the firehose envelopes are converted to dimensions when sent to SignalFx.

To create a UAA user with the proper permissions to access the RLP Gateway, run the following:

$ uaac client add my-v2-nozzle \
    --name signalfx-nozzle \
    --secret <signalfx-nozzle client secret> \
    --authorized_grant_types client_credentials,refresh_token \
    --authorities logs.admin

Then set the uaaUser config value to signalfx-nozzle and the uaaPassword config value to the <signalfx-nozzle client secret> that you selected.

Configuration

To activate this monitor in the Smart Agent, add the following to your agent config:

monitors:  # All monitor config goes under this key
 - type: cloudfoundry-firehose-nozzle
   ...  # Additional config

For a list of monitor options that are common to all monitors, see Common Configuration.

| Config option | Required | Type | Description |
| --- | --- | --- | --- |
| rlpGatewayUrl | no | string | The base URL to the RLP Gateway server. This is quite often of the form https://log-stream. if using PCF 2.4+. |
| rlpGatewaySkipVerify | no | bool | Whether to skip SSL/TLS verification when using HTTPS to connect to the RLP Gateway (default: false) |
| uaaUser | no | string | The UAA username for a user that has the appropriate authority to fetch logs from the firehose (usually the logs.admin authority) |
| uaaPassword | no | string | The password for the above UAA user |
| uaaUrl | no | string | The URL to the UAA server. This monitor will obtain an access token from this server that it will use to authenticate with the RLP Gateway. |
| uaaSkipVerify | no | bool | Whether to skip SSL/TLS verification when using HTTPS to connect to the UAA server (default: false) |
| shardId | no | string | The nozzle's shard id. All nozzle instances with the same id will receive an exclusive subset of the data from the firehose. The default should suffice in the vast majority of use cases. (default: signalfx_nozzle) |
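
As an illustration of how the options above fit together, here is a minimal sketch of a complete monitor entry. The two hostnames are hypothetical placeholders (they assume a system domain of sys.example.com) and must be replaced with the values for your own deployment. The credentials are the ones created with the uaac command earlier in this document.

```yaml
monitors:
  - type: cloudfoundry-firehose-nozzle
    # Hypothetical URLs: substitute your deployment's RLP Gateway and UAA endpoints.
    rlpGatewayUrl: https://log-stream.sys.example.com
    uaaUrl: https://uaa.sys.example.com
    # Credentials created with the uaac command shown earlier.
    uaaUser: signalfx-nozzle
    uaaPassword: "<signalfx-nozzle client secret>"
    # Both default to false; shown here only to make the TLS behavior explicit.
    rlpGatewaySkipVerify: false
    uaaSkipVerify: false
    # Default value; instances sharing this id split the firehose data between them.
    shardId: signalfx_nozzle
```

The last three options are set to their documented defaults, so they can be omitted unless you need to change them.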

Metrics

These are the metrics available for this monitor. Metrics that are categorized as container/host (default) are in bold and italics in the list below.

  • auctioneer.AuctioneerFetchStatesDuration (gauge)
    Time in nanoseconds that the auctioneer took to fetch state from all the cells when running its auction. Emitted every 30 seconds during each auction.
  • auctioneer.AuctioneerLRPAuctionsFailed (cumulative)
    Cumulative number of LRP instances that the auctioneer failed to place on Diego cells. Emitted every 30 seconds during each auction.
  • auctioneer.AuctioneerLRPAuctionsStarted (cumulative)
    Cumulative number of LRP instances that the auctioneer successfully placed on Diego cells. Emitted every 30 seconds during each auction.
  • auctioneer.AuctioneerTaskAuctionsFailed (cumulative)
    Cumulative number of Tasks that the auctioneer failed to place on Diego cells. Emitted every 30 seconds during each auction.
  • auctioneer.AuctioneerTaskAuctionsStarted (cumulative)
    Cumulative number of Tasks that the auctioneer successfully placed on Diego cells. Emitted every 30 seconds during each auction.
  • auctioneer.LockHeld.v1-locks-auctioneer_lock (gauge)
    Whether an auctioneer holds the auctioneer lock: 1 means the lock is held, and 0 means the lock was lost. Emitted every 30 seconds by the active auctioneer.
  • auctioneer.LockHeldDuration.v1-locks-auctioneer_lock (gauge)
    Time in nanoseconds that the active auctioneer has held the auctioneer lock. Emitted every 30 seconds by the active auctioneer.
  • auctioneer.memoryStats.lastGCPauseTimeNS (gauge)
    Duration in nanoseconds of the last garbage collector pause.
  • auctioneer.memoryStats.numBytesAllocated (gauge)
    Instantaneous count of bytes allocated and still in use.
  • auctioneer.memoryStats.numBytesAllocatedHeap (gauge)
    Instantaneous count of bytes allocated on the main heap and still in use.
  • auctioneer.memoryStats.numBytesAllocatedStack (gauge)
    Instantaneous count of bytes used by the stack allocator.
  • auctioneer.memoryStats.numFrees (gauge)
    Lifetime number of memory deallocations.
  • auctioneer.memoryStats.numMallocs (gauge)
    Lifetime number of memory allocations.
  • auctioneer.numCPUS (gauge)
    Number of CPUs on the machine.
  • auctioneer.numGoRoutines (gauge)
    Instantaneous number of active goroutines in the process.
  • bbs.BBSMasterElected (gauge)
    Emitted once when the BBS is elected as master.
  • bbs.ConvergenceLRPDuration (gauge)
    Time in nanoseconds that the BBS took to run its LRP convergence pass. Emitted every 30 seconds when LRP convergence runs.
  • bbs.ConvergenceLRPPreProcessingActualLRPsDeleted (gauge)
    Cumulative number of times the BBS has detected and deleted a malformed ActualLRP in its LRP convergence pass. Emitted every 30 seconds.
  • bbs.ConvergenceLRPPreProcessingMalformedRunInfos (gauge)
    Cumulative number of times the BBS has detected a malformed DesiredLRP RunInfo in its LRP convergence pass. Emitted every 30 seconds.
  • bbs.ConvergenceLRPPreProcessingMalformedSchedulingInfos (gauge)
    Cumulative number of times the BBS has detected a malformed DesiredLRP SchedulingInfo in its LRP convergence pass. Emitted every 30 seconds.
  • bbs.ConvergenceLRPRuns (cumulative)
    Cumulative number of times BBS has run its LRP convergence pass. Emitted every 30 seconds.
  • bbs.ConvergenceTaskDuration (gauge)
    Time in nanoseconds that the BBS took to run its Task convergence pass. Emitted every 30 seconds when Task convergence runs.
  • bbs.ConvergenceTaskRuns (cumulative)
    Cumulative number of times the BBS has run its Task convergence pass. Emitted every 30 seconds.
  • bbs.ConvergenceTasksKicked (cumulative)
    Cumulative number of times the BBS has updated a Task during its Task convergence pass. Emitted every 30 seconds.
  • bbs.ConvergenceTasksPruned (cumulative)
    Cumulative number of times the BBS has deleted a malformed Task during its Task convergence pass. Emitted every 30 seconds.
  • bbs.CrashedActualLRPs (gauge)
    Total number of LRP instances that have crashed. Emitted every 30 seconds.
  • bbs.CrashingDesiredLRPs (gauge)
    Total number of DesiredLRPs that have at least one crashed instance. Emitted every 30 seconds.
  • bbs.Domain.cf-apps (gauge)
    Whether the ‘cf-apps’ domain is up-to-date, so that CF apps from CC have been synchronized with DesiredLRPs for Diego to run. 1 means the domain is up-to-date, no data means it is not. Emitted every 30 seconds.
  • bbs.Domain.cf-tasks (gauge)
    Whether the ‘cf-tasks’ domain is up-to-date, so that CF tasks from CC have been synchronized with tasks for Diego to run. 1 means the domain is up-to-date, no data means it is not. Emitted every 30 seconds.
  • bbs.ETCDLeader (gauge)
    Index of the leader node in the etcd cluster. Emitted every 30 seconds.
  • bbs.ETCDRaftTerm (gauge)
    Raft term of the etcd cluster. Emitted every 30 seconds.
  • bbs.ETCDReceivedBandwidthRate (gauge)
    Number of bytes per second received by the follower etcd node. Emitted every 30 seconds.
  • bbs.ETCDReceivedRequestRate (gauge)
    Number of requests per second received by the follower etcd node. Emitted every 30 seconds.
  • bbs.ETCDSentBandwidthRate (gauge)
    Number of bytes per second sent by the leader etcd node. Emitted every 30 seconds.
  • bbs.ETCDSentRequestRate (gauge)
    Number of requests per second sent by the leader etcd node. Emitted every 30 seconds.
  • bbs.ETCDWatchers (gauge)
    Number of watches set against the etcd cluster. Emitted every 30 seconds.
  • bbs.LRPsClaimed (gauge)
    Total number of LRP instances that have been claimed by some cell. Emitted every 30 seconds.
  • bbs.LRPsDesired (gauge)
    Total number of LRP instances desired across all LRPs. Emitted periodically.
  • bbs.LRPsExtra (gauge)
    Total number of LRP instances that are no longer desired but still have a BBS record. Emitted every 30 seconds.
  • bbs.LRPsMissing (gauge)
    Total number of LRP instances that are desired but have no record in the BBS. Emitted every 30 seconds.
  • bbs.LRPsRunning (gauge)
    Total number of LRP instances that are running on cells. Emitted every 30 seconds.
  • bbs.LRPsUnclaimed (gauge)
    Total number of LRP instances that have not yet been claimed by a cell. Emitted every 30 seconds.
  • bbs.LockHeld.v1-locks-bbs_lock (gauge)
    Whether a BBS holds the BBS lock: 1 means the lock is held, and 0 means the lock was lost. Emitted every 30 seconds by the active BBS server.
  • bbs.LockHeldDuration.v1-locks-bbs_lock (gauge)
    Time in nanoseconds that the active BBS has held the BBS lock. Emitted every 30 seconds by the active BBS server.
  • bbs.MetricsReportingDuration (gauge)
    Time in nanoseconds that the BBS took to emit metrics about etcd. Emitted every 30 seconds.
  • bbs.MigrationDuration (gauge)
    Time in nanoseconds that the BBS took to run migrations against its persistence store. Emitted each time a BBS becomes the active master.
  • bbs.RequestCount (cumulative)
    Cumulative number of requests the BBS has handled through its API. Emitted for each BBS request.
  • bbs.RequestLatency (gauge)
    Time in nanoseconds that the BBS took to handle requests to its API endpoints. Emitted when the BBS API handles requests.
  • bbs.TasksCompleted (gauge)
    Total number of Tasks that have completed. Emitted every 30 seconds.
  • bbs.TasksPending (gauge)
    Total number of Tasks that have not yet been placed on a cell. Emitted every 30 seconds.
  • bbs.TasksResolving (gauge)
    Total number of Tasks locked for deletion. Emitted every 30 seconds.
  • bbs.TasksRunning (gauge)
    Total number of Tasks running on cells. Emitted every 30 seconds.
  • bbs.memoryStats.lastGCPauseTimeNS (gauge)
    Duration in nanoseconds of the last garbage collector pause.
  • bbs.memoryStats.numBytesAllocated (gauge)
    Instantaneous count of bytes allocated and still in use.
  • bbs.memoryStats.numBytesAllocatedHeap (gauge)
    Instantaneous count of bytes allocated on the main heap and still in use.
  • bbs.memoryStats.numBytesAllocatedStack (gauge)
    Instantaneous count of bytes used by the stack allocator.
  • bbs.memoryStats.numFrees (gauge)
    Lifetime number of memory deallocations.
  • bbs.memoryStats.numMallocs (gauge)
    Lifetime number of memory allocations.
  • bbs.numCPUS (gauge)
    Number of CPUs on the machine.
  • bbs.numGoRoutines (gauge)
    Instantaneous number of active goroutines in the process.
  • bosh-system-metrics-forwarder.system.cpu.sys (gauge)
    CPU load consumed by the kernel.
  • bosh-system-metrics-forwarder.system.cpu.user (gauge)
    CPU load consumed by userspace.
  • bosh-system-metrics-forwarder.system.cpu.wait (gauge)
    Time CPU spent waiting for IO.
  • bosh-system-metrics-forwarder.system.disk.ephemeral.percent (gauge)
    Percentage of the ephemeral disk used.
  • bosh-system-metrics-forwarder.system.disk.system.percent (gauge)
    Percentage of the system disk used.
  • bosh-system-metrics-forwarder.system.healthy (gauge)
    Overall status of system health.
  • bosh-system-metrics-forwarder.system.mem.percent (gauge)
    Percentage of RAM used.
  • bosh-system-metrics-forwarder.system.swap.percent (gauge)
    Percentage of swap space used.
  • cc.failed_job_count.VM_NAME-VM_INDEX (cumulative)
    Number of failed jobs in the <VM_NAME>-<VM_INDEX> queue. This is the number of delayed jobs where the failed_at column is populated with the time of the most recently failed attempt at the job. The failed job count is not specific to the jobs run by the Cloud Controller worker. By default, Cloud Controller deletes failed jobs after 31 days. Emitted every 30 seconds per VM.
  • cc.failed_job_count.cc-generic (cumulative)
    Number of failed jobs in the cc-generic queue. By default, Cloud Controller deletes failed jobs after 31 days. Emitted every 30 seconds per VM.
  • cc.failed_job_count.total (gauge)
    Number of failed jobs in all queues. By default, Cloud Controller deletes failed jobs after 31 days. Emitted every 30 seconds per VM.
  • cc.http_status.1XX (cumulative)
    Number of HTTP response status codes of type 1xx (informational). This resets when the Cloud Controller process is restarted and is incremented at the end of each request cycle.
  • cc.http_status.2XX (cumulative)
    Number of HTTP response status codes of type 2xx (success). This resets when the Cloud Controller process is restarted and is incremented at the end of each request cycle. Emitted for each Cloud Controller request.
  • cc.http_status.3XX (cumulative)
    Number of HTTP response status codes of type 3xx (redirection). This resets when the Cloud Controller process is restarted and is incremented at the end of each request cycle. Emitted for each Cloud Controller request.
  • cc.http_status.4XX (cumulative)
    Number of HTTP response status codes of type 4xx (client error). This resets when the Cloud Controller process is restarted and is incremented at the end of each request cycle. Emitted for each Cloud Controller request.
  • cc.http_status.5XX (cumulative)
    Number of HTTP response status codes of type 5xx (server error). This resets when the Cloud Controller process is restarted and is incremented at the end of each request cycle.
  • cc.job_queue_length.cc-VM_NAME-VM_INDEX (gauge)
    Number of background jobs in the <VM_NAME>-<VM_INDEX> queue that have yet to run for the first time. Emitted every 30 seconds per VM.
  • cc.job_queue_length.cc-generic (gauge)
    Number of background jobs in the cc-generic queue that have yet to run for the first time. Emitted every 30 seconds per VM.
  • cc.job_queue_length.total (gauge)
    Total number of background jobs in the queues that have yet to run for the first time. Emitted every 30 seconds per VM.
  • cc.log_count.all (gauge)
    Total number of log messages, sum of messages of all severity levels. The count resets when the Cloud Controller process is restarted. Emitted every 30 seconds per VM.
  • cc.log_count.debug (gauge)
    Number of log messages of severity “debug.” The count resets when the Cloud Controller process is restarted. Emitted every 30 seconds per VM.
  • cc.log_count.debug1 (gauge)
    Not used.
  • cc.log_count.debug2 (gauge)
    Number of log messages of severity “debug2.” The count resets when the Cloud Controller process is restarted. Emitted every 30 seconds per VM.
  • cc.log_count.error (cumulative)
    Number of error log messages.
  • cc.log_count.fatal (cumulative)
    Number of fatal log messages.
  • cc.log_count.info (gauge)
    Number of log messages of severity “info.” Examples of info messages are droplet created, copying package, uploading package, access denied due to insufficient scope, job logging, blobstore actions, staging requests, and app running requests. The count resets when the Cloud Controller process is restarted. Emitted every 30 seconds per VM.
  • cc.log_count.off (gauge)
    Number of log messages of severity “off.” The count resets when the Cloud Controller process is restarted. Emitted every 30 seconds per VM.
  • cc.log_count.warn (cumulative)
    Number of warn log messages.
  • cc.requests.completed (cumulative)
    Number of Cloud Controller API requests completed.
  • cc.requests.outstanding (cumulative)
    Number of Cloud Controller requests made but not completed.
  • cc.tasks_running.count (gauge)
    Number of tasks currently running.
  • cc.tasks_running.memory_in_mb (gauge)
    Memory being consumed by all currently running tasks. Emitted every 30 seconds per VM. This metric is only seen in version 3 of the Cloud Foundry API.
  • cc.thread_info.event_machine.connection_count (gauge)
    Number of open connections to event machine. Emitted every 30 seconds per VM.
  • cc.thread_info.event_machine.resultqueue.num_waiting (gauge)
    Number of scheduled tasks in the result queue. Emitted every 30 seconds per VM.
  • cc.thread_info.event_machine.resultqueue.size (gauge)
    Number of unscheduled tasks in the result queue. Emitted every 30 seconds per VM.
  • cc.thread_info.event_machine.threadqueue.num_waiting (gauge)
    Number of scheduled tasks in the threadqueue. Emitted every 30 seconds per VM.
  • cc.thread_info.event_machine.threadqueue.size (gauge)
    Number of unscheduled tasks in the threadqueue. Emitted every 30 seconds per VM.
  • cc.thread_info.thread_count (gauge)
    Total number of threads that are either runnable or stopped. Emitted every 30 seconds per VM.
  • cc.total_users (gauge)
    Total number of users ever created, including inactive users. Emitted every 10 minutes per VM.
  • cc.vitals.cpu (gauge)
    Percentage of CPU used by the Cloud Controller process. Emitted every 30 seconds per VM.
  • cc.vitals.cpu_load_avg (gauge)
    System CPU load averaged over the last 1 minute according to the OS. Emitted every 30 seconds per VM.
  • cc.vitals.mem_bytes (gauge)
    The RSS bytes (resident set size) or real memory of the Cloud Controller process. Emitted every 30 seconds per VM.
  • cc.vitals.mem_free_bytes (gauge)
    Total memory available according to the OS. Emitted every 30 seconds per VM.
  • cc.vitals.mem_used_bytes (gauge)
    Total memory used (active + wired) according to the OS. Emitted every 30 seconds per VM.
  • cc.vitals.num_cores (gauge)
    The number of CPUs of a host machine. Emitted every 30 seconds per VM.
  • cc.vitals.uptime (gauge)
    The uptime of the Cloud Controller process in seconds. Emitted every 30 seconds per VM.
  • cc_uploader.memoryStats.lastGCPauseTimeNS (gauge)
    Duration in nanoseconds of the last garbage collector pause.
  • cc_uploader.memoryStats.numBytesAllocated (gauge)
    Instantaneous count of bytes allocated and still in use.
  • cc_uploader.memoryStats.numBytesAllocatedHeap (gauge)
    Instantaneous count of bytes allocated on the main heap and still in use.
  • cc_uploader.memoryStats.numBytesAllocatedStack (gauge)
    Instantaneous count of bytes used by the stack allocator.
  • cc_uploader.memoryStats.numFrees (gauge)
    Lifetime number of memory deallocations.
  • cc_uploader.memoryStats.numMallocs (gauge)
    Lifetime number of memory allocations.
  • cc_uploader.numCPUS (gauge)
    Number of CPUs on the machine.
  • cc_uploader.numGoRoutines (gauge)
    Instantaneous number of active goroutines in the process.
  • container.cpu_percentage (gauge)
    Percentage of CPU used by this container.
  • container.disk_bytes (gauge)
    Number of bytes of disk used by this container.
  • container.disk_bytes_quota (gauge)
    Number of bytes of disk allowed for this container.
  • container.memory_bytes (gauge)
    Number of bytes of RAM used by this container.
  • container.memory_bytes_quota (gauge)
    Number of bytes of RAM allocated to this container.
  • etcd.CompareAndDeleteFail (gauge)
    CompareAndDeleteFail operation count. Emitted every 30 seconds.
  • etcd.CompareAndDeleteSuccess (gauge)
    CompareAndDeleteSuccess operation count. Emitted every 30 seconds.
  • etcd.CompareAndSwapFail (gauge)
    CompareAndSwapFail operation count. Emitted every 30 seconds.
  • etcd.CompareAndSwapSuccess (gauge)
    CompareAndSwapSuccess operation count. Emitted every 30 seconds.
  • etcd.CreateFail (gauge)
    CreateFail operation count. Emitted every 30 seconds.
  • etcd.CreateSuccess (gauge)
    CreateSuccess operation count. Emitted every 30 seconds.
  • etcd.DeleteFail (gauge)
    DeleteFail operation count. Emitted every 30 seconds.
  • etcd.DeleteSuccess (gauge)
    DeleteSuccess operation count. Emitted every 30 seconds.
  • etcd.EtcdIndex (gauge)
    X-Etcd-Index value from the /stats/store endpoint. Emitted every 30 seconds.
  • etcd.ExpireCount (gauge)
    ExpireCount operation count. Emitted every 30 seconds.
  • etcd.Followers (gauge)
    Number of etcd followers. Emitted every 30 seconds.
  • etcd.GetsFail (gauge)
    GetsFail operation count. Emitted every 30 seconds.
  • etcd.GetsSuccess (gauge)
    GetsSuccess operation count. Emitted every 30 seconds.
  • etcd.IsLeader (gauge)
    1 if the current server is the leader, 0 if it is a follower. Emitted every 30 seconds.
  • etcd.Latency (gauge)
    Current latency in milliseconds from leader to a specific follower. Emitted every 30 seconds.
  • etcd.RaftIndex (gauge)
    X-Raft-Index value from the /stats/store endpoint. Emitted every 30 seconds.
  • etcd.RaftTerm (gauge)
    X-Raft-Term value from the /stats/store endpoint. Emitted every 30 seconds.
  • etcd.ReceivedAppendRequests (gauge)
    Number of append requests this node has processed. Emitted every 30 seconds.
  • etcd.ReceivingBandwidthRate (gauge)
    Number of bytes per second this node is receiving (follower only). Emitted every 30 seconds.
  • etcd.ReceivingRequestRate (gauge)
    Number of requests per second this node is receiving (follower only). Emitted every 30 seconds.
  • etcd.SendingBandwidthRate (gauge)
    Number of bytes per second this node is sending (leader only). This value is undefined on single member clusters. Emitted every 30 seconds.
  • etcd.SendingRequestRate (gauge)
    Number of requests per second this node is sending (leader only). This value is undefined on single member clusters. Emitted every 30 seconds.
  • etcd.SentAppendRequests (gauge)
    Number of requests that this node has sent. Emitted every 30 seconds.
  • etcd.SetsFail (gauge)
    SetsFail operation count. Emitted every 30 seconds.
  • etcd.SetsSuccess (gauge)
    SetsSuccess operation count. Emitted every 30 seconds.
  • etcd.UpdateFail (gauge)
    UpdateFail operation count. Emitted every 30 seconds.
  • etcd.UpdateSuccess (gauge)
    UpdateSuccess operation count. Emitted every 30 seconds.
  • etcd.Watchers (gauge)
    Watchers operation count. Emitted every 30 seconds.
  • file_server.memoryStats.lastGCPauseTimeNS (gauge)
    Duration in nanoseconds of the last garbage collector pause.
  • file_server.memoryStats.numBytesAllocated (gauge)
    Instantaneous count of bytes allocated and still in use.
  • file_server.memoryStats.numBytesAllocatedHeap (gauge)
    Instantaneous count of bytes allocated on the main heap and still in use.
  • file_server.memoryStats.numBytesAllocatedStack (gauge)
    Instantaneous count of bytes used by the stack allocator.
  • file_server.memoryStats.numFrees (gauge)
    Lifetime number of memory deallocations.
  • file_server.memoryStats.numMallocs (gauge)
    Lifetime number of memory allocations.
  • file_server.numCPUS (gauge)
    Number of CPUs on the machine.
  • file_server.numGoRoutines (gauge)
    Instantaneous number of active goroutines in the process.
  • garden_linux.BackingStores (gauge)
    Number of container backing store files. Emitted every 30 seconds.
  • garden_linux.DepotDirs (gauge)
    Number of directories in the Garden depot. Emitted every 30 seconds.
  • garden_linux.LoopDevices (gauge)
    Number of attached loop devices. Emitted every 30 seconds.
  • garden_linux.MetricsReporting (gauge)
    How long it took to emit the BackingStores, DepotDirs, and LoopDevices metrics. Emitted every 30 seconds.
  • garden_linux.memoryStats.lastGCPauseTimeNS (gauge)
    Duration in nanoseconds of the last garbage collector pause.
  • garden_linux.memoryStats.numBytesAllocated (gauge)
    Instantaneous count of bytes allocated and still in use.
  • garden_linux.memoryStats.numBytesAllocatedHeap (gauge)
    Instantaneous count of bytes allocated on the main heap and still in use.
  • garden_linux.memoryStats.numBytesAllocatedStack (gauge)
    Instantaneous count of bytes used by the stack allocator.
  • garden_linux.memoryStats.numFrees (gauge)
    Lifetime number of memory deallocations.
  • garden_linux.memoryStats.numMallocs (gauge)
    Lifetime number of memory allocations.
  • garden_linux.numCPUS (gauge)
    Number of CPUs on the machine.
  • garden_linux.numGoRoutines (gauge)
    Instantaneous number of active goroutines in the process.
  • gorouter.backend_exhausted_conns (cumulative)
  • gorouter.bad_gateways (cumulative)
    Number of bad gateway events.
  • gorouter.responses (cumulative)
    Number of router responses.
  • gorouter.total_requests (cumulative)
    Number of router requests received.
  • gorouter.total_routes (gauge)
    Number of registered routes.
  • nsync_bulker.DesiredLRPSyncDuration (gauge)
    Time in nanoseconds that the nsync-bulker took to synchronize CF apps and Diego DesiredLRPs. Emitted every 30 seconds.
  • nsync_bulker.LRPsDesired (gauge)
    Cumulative number of LRPs desired through the nsync API. Emitted on each request desiring a new LRP, every 30 seconds.
  • nsync_bulker.LockHeld.v1-locks-nsync_bulker_lock (gauge)
    Whether an nsync-bulker holds the nsync-bulker lock: 1 means the lock is held, and 0 means the lock was lost. Emitted every 30 seconds by the active nsync-bulker.
  • nsync_bulker.LockHeldDuration.v1-locks-nsync_bulker_lock (gauge)
    Time in nanoseconds that the active nsync-bulker has held the convergence lock. Emitted every 30 seconds by the active nsync-bulker.
  • nsync_bulker.NsyncInvalidDesiredLRPsFound (gauge)
    Number of invalid DesiredLRPs found during nsync-bulker periodic synchronization. Emitted every 30 seconds.
  • nsync_bulker.memoryStats.lastGCPauseTimeNS (gauge)
    Duration in nanoseconds of the last garbage collector pause.
  • nsync_bulker.memoryStats.numBytesAllocated (gauge)
    Instantaneous count of bytes allocated and still in use.
  • nsync_bulker.memoryStats.numBytesAllocatedHeap (gauge)
    Instantaneous count of bytes allocated on the main heap and still in use.
  • nsync_bulker.memoryStats.numBytesAllocatedStack (gauge)
    Instantaneous count of bytes used by the stack allocator.
  • nsync_bulker.memoryStats.numFrees (gauge)
    Lifetime number of memory deallocations.
  • nsync_bulker.memoryStats.numMallocs (gauge)
    Lifetime number of memory allocations.
  • nsync_bulker.numCPUS (gauge)
    Number of CPUs on the machine.
  • nsync_bulker.numGoRoutines (gauge)
    Instantaneous number of active goroutines in the process.
  • nsync_listener.memoryStats.lastGCPauseTimeNS (gauge)
    Duration in nanoseconds of the last garbage collector pause.
  • nsync_listener.memoryStats.numBytesAllocated (gauge)
    Instantaneous count of bytes allocated and still in use.
  • nsync_listener.memoryStats.numBytesAllocatedHeap (gauge)
    Instantaneous count of bytes allocated on the main heap and still in use.
  • nsync_listener.memoryStats.numBytesAllocatedStack (gauge)
    Instantaneous count of bytes used by the stack allocator.
  • nsync_listener.memoryStats.numFrees (gauge)
    Lifetime number of memory deallocations.
  • nsync_listener.memoryStats.numMallocs (gauge)
    Lifetime number of memory allocations.
  • nsync_listener.numCPUS (gauge)
    Number of CPUs on the machine.
  • nsync_listener.numGoRoutines (gauge)
    Instantaneous number of active goroutines in the process.
  • rep.CM (gauge)
    Emitted every 30 seconds.
  • rep.CapacityRemainingContainers (gauge)
    Remaining number of containers this cell can host. Emitted every 60 seconds.
  • rep.CapacityRemainingDisk (gauge)
    Amount of disk available to allocate in the cell, in megabytes.
  • rep.CapacityRemainingMemory (gauge)
    Amount of memory available to allocate in the cell, in megabytes.
  • rep.CapacityTotalContainers (gauge)
    Total number of containers this cell can host. Emitted every 60 seconds.
  • rep.CapacityTotalDisk (gauge)
    Total amount of disk in a cell, in megabytes.
  • rep.CapacityTotalMemory (gauge)
    Total amount of memory in a cell, in megabytes.
  • rep.ContainerCount (gauge)
    Number of Diego containers currently running.
  • rep.GardenContainerCreationDuration (gauge)
    Time in nanoseconds that the rep Garden backend took to create a container. Emitted after every successful container creation.
  • rep.LogMessage (gauge)
    Emitted every 30 seconds.
  • rep.RepBulkSyncDuration (gauge)
    Time in nanoseconds that the cell rep took to synchronize the ActualLRPs it has claimed with its actual garden containers. Emitted every 30 seconds by each rep.
  • rep.UnhealthyCell (gauge)
    Number of unhealthy Diego cells.
  • rep.logSenderTotalMessagesRead (cumulative)
    Count of application log messages sent by Diego Executor. Emitted every 30 seconds.
  • rep.memoryStats.lastGCPauseTimeNS (gauge)
    Duration in nanoseconds of the last garbage collector pause.
  • rep.memoryStats.numBytesAllocated (gauge)
    Instantaneous count of bytes allocated and still in use.
  • rep.memoryStats.numBytesAllocatedHeap (gauge)
    Instantaneous count of bytes allocated on the main heap and still in use.
  • rep.memoryStats.numBytesAllocatedStack (gauge)
    Instantaneous count of bytes used by the stack allocator.
  • rep.memoryStats.numFrees (gauge)
    Lifetime number of memory deallocations.
  • rep.memoryStats.numMallocs (gauge)
    Lifetime number of memory allocations.
  • rep.numCPUS (gauge)
    Number of CPUs on the machine.
  • rep.numGoRoutines (gauge)
    Instantaneous number of active goroutines in the process.
  • route_emitter.LockHeld.v1-locks-route_emitter_lock (gauge)
    Whether a route-emitter holds the route-emitter lock: 1 means the lock is held, and 0 means the lock was lost. Emitted every 30 seconds by the active route-emitter.
  • route_emitter.LockHeldDuration.v1-locks-route_emitter_lock (gauge)
    Time in nanoseconds that the active route-emitter has held the route-emitter lock. Emitted every 30 seconds by the active route-emitter.
  • route_emitter.MessagesEmitted (cumulative)
    The cumulative number of registration messages that this process has sent. Emitted every 30 seconds.
  • route_emitter.RouteEmitterSyncDuration (gauge)
    Time in nanoseconds that the active route-emitter took to perform its synchronization pass. Emitted every 60 seconds.
  • route_emitter.RoutesRegistered (cumulative)
    Cumulative number of route registrations emitted from the route-emitter as it reacts to changes to LRPs. Emitted every 30 seconds.
  • route_emitter.RoutesSynced (cumulative)
    Cumulative number of route registrations emitted from the route-emitter during its periodic route-table synchronization. Emitted every 30 seconds.
  • route_emitter.RoutesTotal (gauge)
    Number of routes in the route-emitter’s routing table. Emitted every 30 seconds.
  • route_emitter.RoutesUnregistered (cumulative)
    Cumulative number of route unregistrations emitted from the route-emitter as it reacts to changes to LRPs. Emitted every 30 seconds.
  • route_emitter.memoryStats.lastGCPauseTimeNS (gauge)
    Duration in nanoseconds of the last garbage collector pause.
  • route_emitter.memoryStats.numBytesAllocated (gauge)
    Instantaneous count of bytes allocated and still in use.
  • route_emitter.memoryStats.numBytesAllocatedHeap (gauge)
    Instantaneous count of bytes allocated on the main heap and still in use.
  • route_emitter.memoryStats.numBytesAllocatedStack (gauge)
    Instantaneous count of bytes used by the stack allocator.
  • route_emitter.memoryStats.numFrees (gauge)
    Lifetime number of memory deallocations.
  • route_emitter.memoryStats.numMallocs (gauge)
    Lifetime number of memory allocations.
  • route_emitter.numCPUS (gauge)
    Number of CPUs on the machine.
  • route_emitter.numGoRoutines (gauge)
    Instantaneous number of active goroutines in the process.
  • ssh_proxy.memoryStats.lastGCPauseTimeNS (gauge)
    Duration in nanoseconds of the last garbage collector pause.
  • ssh_proxy.memoryStats.numBytesAllocated (gauge)
    Instantaneous count of bytes allocated and still in use.
  • ssh_proxy.memoryStats.numBytesAllocatedHeap (gauge)
    Instantaneous count of bytes allocated on the main heap and still in use.
  • ssh_proxy.memoryStats.numBytesAllocatedStack (gauge)
    Instantaneous count of bytes used by the stack allocator.
  • ssh_proxy.memoryStats.numFrees (gauge)
    Lifetime number of memory deallocations.
  • ssh_proxy.memoryStats.numMallocs (gauge)
    Lifetime number of memory allocations.
  • ssh_proxy.numCPUS (gauge)
    Number of CPUs on the machine.
  • ssh_proxy.numGoRoutines (gauge)
    Instantaneous number of active goroutines in the process.
  • stager.StagingRequestFailedDuration (gauge)
    Time in nanoseconds that the failed staging task took to run. Emitted each time a staging task fails.
  • stager.StagingRequestSucceededDuration (gauge)
    Time in nanoseconds that the successful staging task took to run. Emitted each time a staging task completes successfully.
  • stager.StagingRequestsFailed (gauge)
    Cumulative number of failed staging tasks handled by each stager. Emitted every time a staging task fails.
  • stager.StagingRequestsSucceeded (gauge)
    Cumulative number of successful staging tasks handled by each stager. Emitted every time a staging task completes successfully.
  • stager.StagingStartRequestsReceived (gauge)
    Cumulative number of requests to start a staging task. Emitted by a stager each time it handles a request.
  • stager.memoryStats.lastGCPauseTimeNS (gauge)
    Duration in nanoseconds of the last garbage collector pause.
  • stager.memoryStats.numBytesAllocated (gauge)
    Instantaneous count of bytes allocated and still in use.
  • stager.memoryStats.numBytesAllocatedHeap (gauge)
    Instantaneous count of bytes allocated on the main heap and still in use.
  • stager.memoryStats.numBytesAllocatedStack (gauge)
    Instantaneous count of bytes used by the stack allocator.
  • stager.memoryStats.numFrees (gauge)
    Lifetime number of memory deallocations.
  • stager.memoryStats.numMallocs (gauge)
    Lifetime number of memory allocations.
  • stager.numCPUS (gauge)
    Number of CPUs on the machine.
  • stager.numGoRoutines (gauge)
    Instantaneous number of active goroutines in the process.
  • syslog_drain_binder.memoryStats.lastGCPauseTimeNS (gauge)
    Duration in nanoseconds of the last garbage collector pause.
  • syslog_drain_binder.memoryStats.numBytesAllocated (gauge)
    Instantaneous count of bytes allocated and still in use.
  • syslog_drain_binder.memoryStats.numBytesAllocatedHeap (gauge)
    Instantaneous count of bytes allocated on the main heap and still in use.
  • syslog_drain_binder.memoryStats.numBytesAllocatedStack (gauge)
    Instantaneous count of bytes used by the stack allocator.
  • syslog_drain_binder.memoryStats.numFrees (gauge)
    Lifetime number of memory deallocations.
  • syslog_drain_binder.memoryStats.numMallocs (gauge)
    Lifetime number of memory allocations.
  • syslog_drain_binder.numCPUS (gauge)
    Number of CPUs on the machine.
  • syslog_drain_binder.numGoRoutines (gauge)
    Instantaneous number of active goroutines in the process.
  • syslog_drain_binder.pollCount (cumulative)
    Number of times the syslog drain binder has polled the cloud controller for syslog drain bindings. Emitted every 30 seconds.
  • syslog_drain_binder.totalDrains (gauge)
    Number of syslog drains returned by cloud controller. Emitted every 30 seconds.
  • system_metrics_agent.system_cpu_core_idle (gauge)
  • system_metrics_agent.system_cpu_core_sys (gauge)
  • system_metrics_agent.system_cpu_core_user (gauge)
  • system_metrics_agent.system_cpu_core_wait (gauge)
  • system_metrics_agent.system_cpu_idle (gauge)
  • system_metrics_agent.system_cpu_sys (gauge)
  • system_metrics_agent.system_cpu_user (gauge)
  • system_metrics_agent.system_cpu_wait (gauge)
  • system_metrics_agent.system_disk_ephemeral_inode_percent (gauge)
  • system_metrics_agent.system_disk_ephemeral_io_time (cumulative)
  • system_metrics_agent.system_disk_ephemeral_percent (gauge)
  • system_metrics_agent.system_disk_ephemeral_read_bytes (cumulative)
  • system_metrics_agent.system_disk_ephemeral_read_time (cumulative)
  • system_metrics_agent.system_disk_ephemeral_write_bytes (cumulative)
  • system_metrics_agent.system_disk_ephemeral_write_time (cumulative)
  • system_metrics_agent.system_disk_persistent_inode_percent (gauge)
  • system_metrics_agent.system_disk_persistent_io_time (cumulative)
  • system_metrics_agent.system_disk_persistent_percent (gauge)
  • system_metrics_agent.system_disk_persistent_read_bytes (cumulative)
  • system_metrics_agent.system_disk_persistent_read_time (cumulative)
  • system_metrics_agent.system_disk_persistent_write_bytes (cumulative)
  • system_metrics_agent.system_disk_persistent_write_time (cumulative)
  • system_metrics_agent.system_disk_system_inode_percent (gauge)
  • system_metrics_agent.system_disk_system_io_time (cumulative)
  • system_metrics_agent.system_disk_system_percent (gauge)
  • system_metrics_agent.system_disk_system_read_bytes (cumulative)
  • system_metrics_agent.system_disk_system_read_time (cumulative)
  • system_metrics_agent.system_disk_system_write_bytes (cumulative)
  • system_metrics_agent.system_disk_system_write_time (cumulative)
  • system_metrics_agent.system_healthy (gauge)
  • system_metrics_agent.system_load_15m (gauge)
  • system_metrics_agent.system_load_1m (gauge)
  • system_metrics_agent.system_load_5m (gauge)
  • system_metrics_agent.system_mem_kb (gauge)
  • system_metrics_agent.system_mem_percent (gauge)
  • system_metrics_agent.system_network_bytes_received (cumulative)
  • system_metrics_agent.system_network_bytes_sent (cumulative)
  • system_metrics_agent.system_network_drop_in (cumulative)
  • system_metrics_agent.system_network_drop_out (cumulative)
  • system_metrics_agent.system_network_error_in (cumulative)
  • system_metrics_agent.system_network_error_out (cumulative)
  • system_metrics_agent.system_network_ip_forwarding (cumulative)
  • system_metrics_agent.system_network_packets_received (cumulative)
  • system_metrics_agent.system_network_packets_sent (cumulative)
  • system_metrics_agent.system_network_tcp_active_opens (cumulative)
  • system_metrics_agent.system_network_tcp_curr_estab (cumulative)
  • system_metrics_agent.system_network_tcp_retrans_segs (cumulative)
  • system_metrics_agent.system_network_udp_in_errors (cumulative)
  • system_metrics_agent.system_network_udp_lite_in_errors (cumulative)
  • system_metrics_agent.system_network_udp_no_ports (cumulative)
  • system_metrics_agent.system_swap_kb (gauge)
  • system_metrics_agent.system_swap_percent (gauge)
  • tps_listener.memoryStats.lastGCPauseTimeNS (gauge)
    Duration in nanoseconds of the last garbage collector pause.
  • tps_listener.memoryStats.numBytesAllocated (gauge)
    Instantaneous count of bytes allocated and still in use.
  • tps_listener.memoryStats.numBytesAllocatedHeap (gauge)
    Instantaneous count of bytes allocated on the main heap and still in use.
  • tps_listener.memoryStats.numBytesAllocatedStack (gauge)
    Instantaneous count of bytes used by the stack allocator.
  • tps_listener.memoryStats.numFrees (gauge)
    Lifetime number of memory deallocations.
  • tps_listener.memoryStats.numMallocs (gauge)
    Lifetime number of memory allocations.
  • tps_listener.numCPUS (gauge)
    Number of CPUs on the machine.
  • tps_listener.numGoRoutines (gauge)
    Instantaneous number of active goroutines in the process.
  • tps_watcher.LockHeld.v1-locks-tps_watcher_lock (gauge)
    Whether a tps-watcher holds the tps-watcher lock: 1 means the lock is held, and 0 means the lock was lost. Emitted every 30 seconds by the active tps-watcher.
  • tps_watcher.LockHeldDuration.v1-locks-tps_watcher_lock (gauge)
    Time in nanoseconds that the active tps-watcher has held the tps-watcher lock. Emitted every 30 seconds by the active tps-watcher.
  • tps_watcher.memoryStats.lastGCPauseTimeNS (gauge)
    Duration in nanoseconds of the last garbage collector pause.
  • tps_watcher.memoryStats.numBytesAllocated (gauge)
    Instantaneous count of bytes allocated and still in use.
  • tps_watcher.memoryStats.numBytesAllocatedHeap (gauge)
    Instantaneous count of bytes allocated on the main heap and still in use.
  • tps_watcher.memoryStats.numBytesAllocatedStack (gauge)
    Instantaneous count of bytes used by the stack allocator.
  • tps_watcher.memoryStats.numFrees (gauge)
    Lifetime number of memory deallocations.
  • tps_watcher.memoryStats.numMallocs (gauge)
    Lifetime number of memory allocations.
  • tps_watcher.numCPUS (gauge)
    Number of CPUs on the machine. Emitted every 30 seconds.
  • tps_watcher.numGoRoutines (gauge)
    Instantaneous number of active goroutines in the process.
  • uaa.audit_service.client_authentication_count (cumulative)
    Number of client authentication attempts.
  • uaa.audit_service.client_authentication_failure_count (cumulative)
    Number of failed client authentication attempts.
  • uaa.audit_service.principal_authentication_failure_count (cumulative)
    Number of failed principal authentication attempts.
  • uaa.audit_service.principal_not_found_count (cumulative)
    Number of times a non-user principal was not found.
  • uaa.audit_service.user_authentication_count (cumulative)
    Number of times a user has successfully authenticated.
  • uaa.audit_service.user_authentication_failure_count (cumulative)
    Number of failed user authentication attempts.
  • uaa.audit_service.user_not_found_count (cumulative)
    Number of times a user was not found.
  • uaa.audit_service.user_password_changes (cumulative)
    Number of times a user password has changed.
  • uaa.audit_service.user_password_failures (cumulative)
    Number of times a user password change has failed.

Group container

All of the following metrics are part of the container metric group. All of the non-default metrics below can be turned on by adding container to the monitor config option extraGroups:

  • rep.absolute_entitlement (cumulative)
    The total number of nanoseconds the container is entitled to spend using CPU.

  • rep.absolute_usage (cumulative)
    The total number of nanoseconds the container has used CPU.

  • rep.container_age (cumulative)
    The total number of nanoseconds the Diego-managed container has been alive.

  • rep.cpu (gauge)
    Percentage of time container spent using CPU.

  • rep.disk (gauge)
    Disk space (bytes) in use by this container.

  • rep.disk_quota (gauge)
    User-requested disk quota (bytes) set on the DesiredLRP for this container.

  • rep.memory (gauge)
    Memory in use by this container (bytes). If the per-instance proxy is enabled, memory usage is scaled based on the additional memory allocation for the proxy.

  • rep.memory_quota (gauge)
    User-requested memory quota (bytes) set on the DesiredLRP for this container.
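
As a minimal sketch of enabling this group, the monitor entry below adds container to extraGroups. The rlpGatewayUrl, uaaUsername, and uaaPassword values are placeholders for illustration, not values from a real deployment.

```yaml
monitors:
 - type: cloudfoundry-firehose-nozzle
   # Placeholder connection settings; substitute your own.
   rlpGatewayUrl: https://log-stream.sys.example.com
   uaaUsername: signalfx-nozzle
   uaaPassword: "<signalfx-nozzle client secret>"
   # Turn on the non-default metrics in the container group.
   extraGroups:
    - container
```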

Non-default metrics (version 4.7.0+)

To emit metrics that are not default, add them to the generic monitor-level extraMetrics config option. Metrics that are derived from specific configuration options and do not appear in the above list of metrics do not need to be added to extraMetrics.

To see a list of the metrics that will be emitted, run agent-status monitors after configuring this monitor in a running agent instance.
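
A minimal sketch of the extraMetrics option from this section follows. It assumes the two listed metrics are not already emitted by default in your agent version. The names are picked from the metric list above purely for illustration.

```yaml
monitors:
 - type: cloudfoundry-firehose-nozzle
   # Illustrative metric names taken from the list above; replace them
   # with the non-default metrics you actually need.
   extraMetrics:
    - system_metrics_agent.system_load_1m
    - uaa.audit_service.user_not_found_count
```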

Dimensions

The following dimensions may occur on metrics emitted by this monitor. Some dimensions may be specific to certain metrics.

| Name | Description |
| --- | --- |
| instance_id | The BOSH instance id that pertains to the metric, if any. |
| source_id | The source of the metric. |