Monitoring Compute Gateways

Basic overview in the Governance Portal

The Governance Portal shows the most important settings of your Compute Gateway(s) as well as their running status.

Compute Gateway view in the Governance Portal

Compute Gateway logs

Note

The logs of all Apheris components use the UTC timezone.

Log Event Ingestion

The Apheris Compute Gateway components emit logs in JSON Lines format to stdout/stderr. Because Kubernetes collects these container output streams, the logs integrate with any logging system designed for Kubernetes.

Note

No Apheris component maintains log files.

Log shipping, ingestion and indexing are out of scope for this guide, because the setup depends entirely on the logging system you use. Please refer to the documentation of your specific logging system for these steps.

If you have further questions, please contact your Apheris representative or reach out via support@apheris.com.

Log Event Format

All Compute Gateway components produce logs in JSON Lines format (one JSON document per log event, on a single line) and write them to the stdout of their container (and therefore of their pod).
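
Because every event is a complete JSON document on its own line, log output can be consumed one line at a time. The following is a minimal Python sketch, assuming the raw lines have already been collected (for example from `kubectl logs`); the function name `parse_log_lines` is hypothetical and not part of any Apheris tooling.

```python
import json
from typing import Iterable, Iterator


def parse_log_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per log event, skipping blank lines and non-JSON lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            # Defensive choice for this sketch: ignore lines that are not JSON,
            # e.g. output interleaved by other processes in the same stream.
            continue
        if isinstance(event, dict):
            yield event


if __name__ == "__main__":
    raw = [
        '{"level": "info", "ts": 1709735744.1490877, "msg": "request"}',
        "not a log event",
        "",
    ]
    for event in parse_log_lines(raw):
        print(event["level"], event["msg"])  # prints: info request
```

To process a live stream instead, the same function can be given `sys.stdin`, with the `kubectl logs` output piped into the script.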

The logs are leveled; the default level is info. The log levels can be set via the agent.logLevel and dal.logLevel Helm values.
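
When reviewing collected logs, it is often useful to keep only events at or above a given severity. A minimal Python sketch; the severity ordering below is an assumption (only `info` and `error` appear in this guide), and the helper name `at_least` is hypothetical.

```python
from typing import Iterable, Iterator

# Assumed severity ordering; this guide only documents "info" and "error".
LEVEL_ORDER = {"debug": 0, "info": 1, "warn": 2, "error": 3}


def at_least(events: Iterable[dict], min_level: str = "error") -> Iterator[dict]:
    """Yield events whose "level" is at or above min_level."""
    threshold = LEVEL_ORDER[min_level]
    for event in events:
        # Unknown level names rank above "error" so they are never silently dropped.
        rank = LEVEL_ORDER.get(event.get("level", ""), len(LEVEL_ORDER))
        if rank >= threshold:
            yield event


if __name__ == "__main__":
    sample = [{"level": "info", "msg": "request"}, {"level": "error", "msg": "heartbeat"}]
    print([e["msg"] for e in at_least(sample)])  # prints: ['heartbeat']
```

Filtering like this only changes what you look at; to change what the components actually emit, set the agent.logLevel and dal.logLevel Helm values.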

Fields

| Field | Description |
|---|---|
| level | Log level of the event |
| ts | Timestamp of the event as Unix epoch time in seconds (with fractional part) |
| msg | Main message |
| error | (optional) Error message, if present |
| stacktrace | (optional) Stack trace, if present |
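
In code, an event can be mapped onto these fields, with `ts` converted to a timezone-aware UTC datetime (all Apheris component logs use UTC). A minimal Python sketch; the `LogEvent` class is hypothetical, and treating an empty `error` string as "no error" is a choice made here because DAL request events log `"error": ""`.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

KNOWN_FIELDS = {"level", "ts", "msg", "error", "stacktrace"}


@dataclass
class LogEvent:
    level: str
    ts: float
    msg: str
    error: Optional[str] = None
    stacktrace: Optional[str] = None
    # Any other keys (e.g. "caller", "agent_id", "instance_id") are kept as-is.
    extra: dict = field(default_factory=dict)

    @property
    def timestamp(self) -> datetime:
        # "ts" is Unix epoch time in seconds; convert it to an aware UTC datetime.
        return datetime.fromtimestamp(self.ts, tz=timezone.utc)

    @classmethod
    def from_dict(cls, raw: dict) -> "LogEvent":
        return cls(
            level=raw["level"],
            ts=float(raw["ts"]),
            msg=raw["msg"],
            error=raw.get("error") or None,  # normalize "" to None
            stacktrace=raw.get("stacktrace"),
            extra={k: v for k, v in raw.items() if k not in KNOWN_FIELDS},
        )


if __name__ == "__main__":
    event = LogEvent.from_dict(
        {"level": "info", "ts": 1709735744.1490877, "msg": "request", "error": "", "http_status": 200}
    )
    print(event.timestamp.date(), event.error, event.extra)
    # prints: 2024-03-06 None {'http_status': 200}
```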

Gateway Agent Logs

The following examples are pretty-printed for readability; in the actual log stream, each event occupies a single line.

An example error event:

{
  "level": "error",
  "ts": 1686125489.6039624,
  "caller": "app/result_adapter.go:19",
  "msg": "receiving event",
  "agent_id": "35d1f1d5-318a-458e-9432-97d892c6c296",
  "error": "Get \"http://orchestrator/computations\": dial tcp: lookup orchestrator on 10.96.0.10:53: server misbehaving",
  "stacktrace": "main.resultAdapter.func1\n\t/go/src/app/result_adapter.go:19"
}
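
Once an event is parsed, the escaped `\n` and `\t` sequences in `stacktrace` become real line breaks and tabs, so an error event can be rendered as readable text. A minimal Python sketch; `format_error_event` is a hypothetical helper, and the output layout is an arbitrary choice.

```python
import json
from datetime import datetime, timezone


def format_error_event(line: str) -> str:
    """Render one error event (a single JSON line) as multi-line text."""
    event = json.loads(line)
    when = datetime.fromtimestamp(event["ts"], tz=timezone.utc).isoformat()
    header = f"{when} {event['level'].upper()} {event['msg']} ({event.get('caller', '?')})"
    out = [header]
    if event.get("error"):
        out.append(f"  error: {event['error']}")
    # After json.loads, "\n\t" in the stacktrace are real newlines and tabs.
    for frame in event.get("stacktrace", "").splitlines():
        out.append("  " + frame.replace("\t", "    "))
    return "\n".join(out)


if __name__ == "__main__":
    line = json.dumps({
        "level": "error",
        "ts": 1686125489.6039624,
        "caller": "app/result_adapter.go:19",
        "msg": "receiving event",
        "error": "dial tcp: lookup orchestrator on 10.96.0.10:53: server misbehaving",
        "stacktrace": "main.resultAdapter.func1\n\t/go/src/app/result_adapter.go:19",
    })
    print(format_error_event(line))
```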

An example computation request event:

{
  "level": "info",
  "ts": 1710169487.2425287,
  "caller": "agent/computation_pipeline.go:186",
  "msg": "computation request",
  "agent_id": "c4e84dc3-3248-44b2-890b-b4b6f0b472d0",
  "request": {
    "id": "a1f76a60-300c-43cd-af9a-f7f3cfec9e69",
    "resources": {
      "cpu": 0.5,
      "memory": 500
    },
    "authentication": {
      "userSession": "..."
    },
    "execution": {
      "image": "quay.io/apheris/statistics:0.3.0",
      "dataSources": [
        {
          "path": "s3://apheris-tutorials-data/whas/worcester/data.csv",
          "key": "whas1_gateway-1_org-1"
        }
      ],
      "Parameters": {
        "NvflareParameters": {
          "arguments": "-u -m nvflare.private.fed.app.client.client_train -m /workspace -s fed_client.json --set secure_train=true uid=f44f2052-659a-43fd-84f8-8942627d222c org=org_yJz0JV5nAkFTkyl9 config_folder=config",
          "deploymentID": "88aaf187-3ca2-4460-9271-359b1a4ef57d"
        }
      },
      "Statement": {
        "NvflareStatement": {
          "command": "/usr/local/bin/python3"
        }
      }
    },
    "replicas": 1
  }
}
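
The payload of a "computation request" event is nested, and when auditing requests usually only a few fields matter. A minimal Python sketch, assuming the payload shape shown in the example above (other requests may contain different or additional fields, so `.get()` is used throughout); `summarize_computation_request` is a hypothetical name. The session token under `authentication.userSession` is deliberately left out of the summary.

```python
def summarize_computation_request(event: dict) -> dict:
    """Pick out commonly inspected fields of a "computation request" event."""
    request = event.get("request", {})
    execution = request.get("execution", {})
    resources = request.get("resources", {})
    return {
        "agent_id": event.get("agent_id"),
        "request_id": request.get("id"),
        "image": execution.get("image"),
        # Each data source in the example carries a "path" and a "key".
        "data_paths": [ds.get("path") for ds in execution.get("dataSources", [])],
        "cpu": resources.get("cpu"),
        "memory": resources.get("memory"),
        "replicas": request.get("replicas"),
        # "authentication" is intentionally not copied (it holds a user session).
    }
```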

An example heartbeat error event:

{
  "level": "error",
  "ts": 1687431881.4542866,
  "caller": "app/main.go:179",
  "msg": "heartbeat",
  "agent_id": "972a5b9d-d67e-4474-a3fb-1240cbfedd67",
  "error": "error response from server: <html>\r\n<head><title>504 Gateway Time-out</title></head>\r\n<body>\r\n<center><h1>504 Gateway Time-out</h1></center>\r\n</body>\r\n</html>\r\n",
  "stacktrace": "main.main.func4\n\t/go/src/app/main.go:179\ngithub.com/apheris/node-agent/pkg/orchestrator.Client.GatewayHeartbeat.func1\n\t/go/src/app/pkg/orchestrator/orchestrator.go:155"
}

Notable events

| msg field | Level | When | Description |
|---|---|---|---|
| "configuration" | info | once at startup | agent configuration |
| "computation request" | info | for every computation request event | the entire payload of the computation request event |
| "heartbeat" | error | for every heartbeat error event | the error message and the stack trace of the heartbeat error event |
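
The notable `msg` values above can be used to triage Gateway Agent output, for example to spot agents whose heartbeats keep failing. A minimal Python sketch; `triage_agent_events` and the shape of its result are hypothetical, and it assumes events have already been parsed into dicts.

```python
from collections import Counter
from typing import Iterable


def triage_agent_events(events: Iterable[dict]) -> dict:
    """Group Gateway Agent events by the notable "msg" values."""
    configurations = []
    request_ids = []
    heartbeat_errors = Counter()  # agent_id -> number of failed heartbeats
    for event in events:
        msg = event.get("msg")
        if msg == "configuration":
            configurations.append(event)
        elif msg == "computation request":
            request_ids.append(event.get("request", {}).get("id"))
        elif msg == "heartbeat" and event.get("level") == "error":
            heartbeat_errors[event.get("agent_id")] += 1
    return {
        "configurations": configurations,
        "request_ids": request_ids,
        "heartbeat_errors": heartbeat_errors,
    }
```

A steadily rising heartbeat error count for one `agent_id` points at a connectivity problem between that agent and the orchestrator, like the 504 response in the example above.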

Data Access Layer (DAL) Logs

The following examples are pretty-printed for readability; in the actual log stream, each event occupies a single line.

An example data access log event:

{
  "level": "info",
  "ts": 1709735744.1490877,
  "caller": "dal/http_middleware.go:58",
  "msg": "request",
  "instance_id": "64646a19-62f0-43c1-9c6a-30844a31f749",
  "http_status": 200,
  "http_method": "GET",
  "url": "/datasets/s3://apheris-tutorials-data/whas/worcester/data.csv",
  "request_duration": 0.327430324,
  "error": ""
}
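
Because every dataset request produces one "request" event with its HTTP status and duration, these events can be aggregated to spot failing or slow data access. A minimal Python sketch; `summarize_dal_requests` is hypothetical, counting any status of 400 or above (or a non-empty `error`) as a failure is an assumption, and `request_duration` is compared as logged without assuming its unit.

```python
from collections import Counter
from typing import Iterable


def summarize_dal_requests(events: Iterable[dict]) -> dict:
    """Aggregate DAL "request" events: status codes, failures, slowest request."""
    statuses = Counter()
    failures = []
    slowest = None
    for event in events:
        if event.get("msg") != "request":
            continue
        status = event.get("http_status")
        statuses[status] += 1
        # Assumed failure rule: HTTP error status or a non-empty "error" field.
        if (isinstance(status, int) and status >= 400) or event.get("error"):
            failures.append(event)
        duration = event.get("request_duration")
        if duration is not None and (slowest is None or duration > slowest["request_duration"]):
            slowest = event
    return {"statuses": statuses, "failures": failures, "slowest": slowest}
```
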

Notable events

| msg field | Level | When | Description |
|---|---|---|---|
| "configuration" | info | once at startup | DAL configuration |
| "request" | info | for every dataset request that the DAL (Data Access Layer) serves | includes the URL of the requested dataset (url field) |
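
Since each "request" event carries the dataset URL, DAL logs can serve as a simple record of which datasets were read and how often. A minimal Python sketch; the helper names are hypothetical, and the assumption that URLs have the form `/datasets/<dataset location>` is based only on the single example above, so other URL shapes are passed through unchanged.

```python
from collections import Counter
from typing import Iterable

# Assumption drawn from the example event: "/datasets/<dataset location>".
DATASETS_PREFIX = "/datasets/"


def dataset_from_url(url: str) -> str:
    """Strip the assumed "/datasets/" prefix; leave other URLs unchanged."""
    return url[len(DATASETS_PREFIX):] if url.startswith(DATASETS_PREFIX) else url


def dataset_access_counts(events: Iterable[dict]) -> Counter:
    """Count successful (HTTP 200) dataset reads per dataset location."""
    counts = Counter()
    for event in events:
        if event.get("msg") == "request" and event.get("http_status") == 200:
            counts[dataset_from_url(event.get("url", ""))] += 1
    return counts
```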