Lex 01 - Designing an RMM from scratch

Designing a remote management and monitoring tool from scratch in Go

This is the start of what will hopefully become a full series on creating a remote monitoring and management (RMM) tool from scratch in Go. The idea came from my job/apprenticeship in an IT department, where we use an RMM tool to manage customer computers (for automation and patch management). As the name suggests, such a tool is used for monitoring and managing computers, and usually allows grouping them together.

Suffering from continuously-coming-up-with-new-projects-before-finishing-current-ones, I noted this down, since of course I could create an enterprise-class RMM tool. Or, more to the point, I simply can’t help doing what interests me in the moment. I’m currently focusing on specific projects, though, so this will be a nice change of pace, and it gets me to write down the journey so I can look back on it.

This post sadly won’t have any implementation code from the project yet, and will only go over the design.

Design

I haven’t always designed my projects up front, but I’ve gotten more used to doing so recently. The process starts by detailing the main components, in this case a server and an agent. Then comes a brainstorm of their respective high-level features, which ended up looking like this:

  • Use osquery to gather information?
    • Custom service that talks to osqueryd and schedules and executes queries
    • Able to push configurations to agents, e.g. based on groups/policies
  • Get data from agent
    • IP configuration
    • Connectivity
    • Health
      • Disk
    • Software
    • Logs?
  • Custom groups and policies
    • Group auto-join
  • Schedule scripts
    • Scripts
    • Packages composed of multiple scripts
  • Alerts
    • Based on data thresholds

Server

After the brainstorm of features, I broke down everything for each component further and got into a few more specifics.

  • REST API
  • Manage agents - CRUD
  • Manage groups and policies - CRUD
  • Schedule executions
  • Saving/updating data
    • MongoDB
    • SQLite
    • MariaDB/MySQL
    • Cache
      • Memory
      • Redis
  • Register new agents
  • Plugin library
    • Allow agents to request a certain plugin
    • Alert if requesting an unknown plugin
  • Alerts

The earlier idea of using osquery was scrapped, as it didn’t seem worth it and the tool doesn’t have equal support for every platform (mainly Windows and Linux). Performance would likely be better with its own implementation anyway.

CRUD services handle agents, groups, and policies. These will use interfaces for storage and cache; the cache will query storage if nothing is found.
An agent manager will handle schedules and the queuing of jobs to agents.
A register component handles new agents joining. Agents join using a unique token and are returned an agent token, which will be used for authorization.
A plugin library allows agents to download plugins hosted by the server.
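
To make the storage and cache idea a bit more concrete, here is a minimal sketch of a cache that falls back to storage on a miss. All names (Agent, AgentStore, MemoryStore, CachedStore) are assumptions made up for illustration, not the project's actual types, and the in-memory map simply stands in for whichever backend ends up being used.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Agent is a hypothetical record; the fields are placeholders.
type Agent struct {
	ID       string
	Hostname string
}

// ErrNotFound is returned when no agent exists for the given ID.
var ErrNotFound = errors.New("agent not found")

// AgentStore is an assumed interface shared by storage and cache backends.
type AgentStore interface {
	Get(id string) (Agent, error)
	Put(a Agent) error
	Delete(id string) error
}

// MemoryStore is a map-backed AgentStore, usable as a cache or as test storage.
type MemoryStore struct {
	mu     sync.RWMutex
	agents map[string]Agent
}

func NewMemoryStore() *MemoryStore {
	return &MemoryStore{agents: make(map[string]Agent)}
}

func (m *MemoryStore) Get(id string) (Agent, error) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	a, ok := m.agents[id]
	if !ok {
		return Agent{}, ErrNotFound
	}
	return a, nil
}

func (m *MemoryStore) Put(a Agent) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.agents[a.ID] = a
	return nil
}

func (m *MemoryStore) Delete(id string) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.agents, id)
	return nil
}

// CachedStore checks the cache first and queries storage when nothing is found.
type CachedStore struct {
	cache   AgentStore
	storage AgentStore
}

func (c *CachedStore) Get(id string) (Agent, error) {
	a, err := c.cache.Get(id)
	if err == nil {
		return a, nil
	}
	if !errors.Is(err, ErrNotFound) {
		return Agent{}, err
	}
	// Cache miss: fall back to storage and remember the result.
	a, err = c.storage.Get(id)
	if err != nil {
		return Agent{}, err
	}
	if err := c.cache.Put(a); err != nil {
		return Agent{}, err
	}
	return a, nil
}

// Put writes to storage first so the cache never holds unsaved data.
func (c *CachedStore) Put(a Agent) error {
	if err := c.storage.Put(a); err != nil {
		return err
	}
	return c.cache.Put(a)
}

func (c *CachedStore) Delete(id string) error {
	if err := c.storage.Delete(id); err != nil {
		return err
	}
	return c.cache.Delete(id)
}

func main() {
	storage := NewMemoryStore()
	_ = storage.Put(Agent{ID: "123", Hostname: "web-01"})
	store := &CachedStore{cache: NewMemoryStore(), storage: storage}
	a, err := store.Get("123") // first call misses the cache and hits storage
	fmt.Println(a.Hostname, err)
}
```

Because CachedStore itself satisfies AgentStore, a CRUD service written against the interface would not need to know whether a cache sits in front of the storage.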

Agent

  • Configuration
  • Gather information
    • Plugins for custom querying
  • Execute command
  • Install program

A collector handles loading plugins (downloading them from the server if missing) and queries plugins and built-in collectors for data, returning the results to the server.
An executor will execute the given task(s) and send the result back to the server.
An interface for collecting data will be created, so that both DataBuiltin and DataPlugin can easily be used.
An installer handles installing new software on the agent, though it might be replaced by the executor.

This setup will allow the most flexibility. An interface defines the built-in data collection, and each platform gets its own implementation for gathering the data. This way, different platforms can be supported more easily by implementing the interface.
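
As a rough illustration of the collection interface, the sketch below shows how a built-in collector and a plugin-backed collector could share one interface. DataBuiltin and DataPlugin are the names from the design above; the Collector interface, its methods, the Query field, and CollectAll are assumptions for the example, and the plugin is represented by a plain function rather than a real plugin call.

```go
package main

import (
	"fmt"
	"runtime"
)

// Collector is an assumed interface for anything that gathers data on an agent.
type Collector interface {
	Name() string
	Collect() (map[string]string, error)
}

// DataBuiltin is a built-in collector; a real one would have
// platform-specific implementations behind the same interface.
type DataBuiltin struct{}

func (DataBuiltin) Name() string { return "builtin" }

func (DataBuiltin) Collect() (map[string]string, error) {
	return map[string]string{"os": runtime.GOOS, "arch": runtime.GOARCH}, nil
}

// DataPlugin wraps a function that stands in for a call into an external plugin.
type DataPlugin struct {
	PluginName string
	Query      func() (map[string]string, error) // hypothetical plugin entry point
}

func (p DataPlugin) Name() string { return p.PluginName }

func (p DataPlugin) Collect() (map[string]string, error) {
	if p.Query == nil {
		return nil, fmt.Errorf("plugin %q has no query function", p.PluginName)
	}
	return p.Query()
}

// CollectAll runs every collector and keys the results by collector name.
// Failing collectors are reported but do not stop the others.
func CollectAll(collectors []Collector) (map[string]map[string]string, []error) {
	results := make(map[string]map[string]string, len(collectors))
	var errs []error
	for _, c := range collectors {
		data, err := c.Collect()
		if err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", c.Name(), err))
			continue
		}
		results[c.Name()] = data
	}
	return results, errs
}

func main() {
	disk := DataPlugin{
		PluginName: "disk",
		Query: func() (map[string]string, error) {
			return map[string]string{"free": "10GB"}, nil // made-up sample value
		},
	}
	results, errs := CollectAll([]Collector{DataBuiltin{}, disk})
	fmt.Println(results, errs)
}
```

The collector loop only depends on the interface, so adding a platform or a community plugin would mean adding another implementation rather than changing the loop.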

Since setups can be so diverse, a plugin system using HashiCorp’s plugin module will allow custom and community plugins to be implemented for gathering data on agents.

High-level overview

I created a diagram to show the interactions between the different internal components; it also helps with keeping an overview of the implementations. Blue indicates internal components, and yellow indicates interfaces that would be implemented.

Metrics

Having set up Grafana recently, I thought it would be great to integrate it with the ecosystem. So there’ll be a custom metrics endpoint to get data, which will also use an interface to support different implementations if needed. The idea is to have /metrics endpoints which, when requested, forward the request from the server to the agent and in turn return the agent’s response to Prometheus. It seems this can be done by using relabeling and maybe the multi-target exporter pattern.

This would then be combined with the HTTP service-discovery config so agents can be used as targets, returning the RESTful path on the server for each agent, e.g. http://lex.example.com/agent/123/metrics.
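
A minimal sketch of what such a discovery endpoint could return is shown below. Prometheus HTTP service discovery expects a JSON list of objects with targets and labels; the sketch assumes the per-agent path can be supplied through the __metrics_path__ label, which would be an alternative to doing it with relabeling. The function names, the agent_id label, and the way agent IDs are listed are all made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// TargetGroup mirrors one entry of the JSON list used by
// Prometheus HTTP service discovery.
type TargetGroup struct {
	Targets []string          `json:"targets"`
	Labels  map[string]string `json:"labels"`
}

// BuildTargets returns one target group per agent. Every group points at the
// server itself; the path label (an assumption of this sketch) selects the agent.
func BuildTargets(serverHost string, agentIDs []string) []TargetGroup {
	groups := make([]TargetGroup, 0, len(agentIDs))
	for _, id := range agentIDs {
		groups = append(groups, TargetGroup{
			Targets: []string{serverHost},
			Labels: map[string]string{
				"__metrics_path__": fmt.Sprintf("/agent/%s/metrics", id),
				"agent_id":         id, // hypothetical label to tell agents apart
			},
		})
	}
	return groups
}

// DiscoveryHandler serves the target list; listAgents is a hypothetical
// hook into the server's agent service.
func DiscoveryHandler(serverHost string, listAgents func() []string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		body, err := json.Marshal(BuildTargets(serverHost, listAgents()))
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		_, _ = w.Write(body)
	}
}

func main() {
	out, err := json.Marshal(BuildTargets("lex.example.com", []string{"123"}))
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(string(out))
}
```

Whether the path is set this way or through relabel rules, the server stays the only scrape target, and the agent behind each path is resolved when the request is forwarded.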

Conclusion

Those are the basic design outlines; hopefully my ideas made sense. The next post will likely cover implementing the API and CRUD actions, using some simple interfaces.

Licensed under CC BY-NC-SA 4.0