Service instrumentation
What statistics every library, service, or subsystem should provide.
Service types
Online serving system
Humans use it and expect an immediate response.
Offline processing
No one is actively waiting for the response.
Batch jobs
Offline processing may also be done in batch jobs, which do not run continuously.
Subsystems
Sub-parts of the service.
- Libraries (should provide instrumentation with no additional configuration required)
  - Track query count, errors, and latency when calling an outside resource.
  - Track internal errors, latency, and general statistics.
- Logging (every line of logging code should increment a counter somewhere)
- Failures (every failure should increment a counter)
- Threadpools (track queries, number of threads in use, and total number of threads)
- Caches (track queries, hits, misses, and overall latency)
- Collectors (export a gauge for how long the collection took, in seconds)
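
The cache, failure, and collector guidance above can be sketched in Python as follows. This is a minimal illustration using only the standard library, not a definitive implementation: the `Metrics` registry, the `InstrumentedCache` class, the `run_collector` helper, and all metric names are hypothetical, and a real service would most likely use a metrics client library instead of an in-memory dictionary.

```python
import time
from collections import defaultdict


class Metrics:
    """Hypothetical in-memory registry standing in for a real metrics client."""

    def __init__(self):
        self.counters = defaultdict(float)
        self.gauges = {}

    def inc(self, name, amount=1.0):
        self.counters[name] += amount

    def set_gauge(self, name, value):
        self.gauges[name] = value


class InstrumentedCache:
    """Dict-backed cache that tracks queries, hits, misses, and overall latency."""

    def __init__(self, metrics, loader):
        self._metrics = metrics
        self._loader = loader  # called on a miss to fetch the value
        self._data = {}

    def get(self, key):
        start = time.perf_counter()
        self._metrics.inc("cache_queries_total")
        try:
            if key in self._data:
                self._metrics.inc("cache_hits_total")
                return self._data[key]
            self._metrics.inc("cache_misses_total")
            try:
                value = self._loader(key)
            except Exception:
                # Every failure increments a counter before propagating.
                self._metrics.inc("cache_load_failures_total")
                raise
            self._data[key] = value
            return value
        finally:
            # Overall latency covers hits, misses, and failed loads alike.
            self._metrics.inc(
                "cache_latency_seconds_sum", time.perf_counter() - start
            )


def run_collector(metrics, collect):
    """Run a collection function and export its duration in seconds as a gauge."""
    start = time.perf_counter()
    try:
        return collect()
    finally:
        metrics.set_gauge(
            "collector_duration_seconds", time.perf_counter() - start
        )
```

For example, calling `cache.get("a")` twice on a fresh `InstrumentedCache(Metrics(), loader=str.upper)` records two queries, one miss, and one hit. Keeping the latency as a running sum next to the query count lets the average latency be derived later without storing every sample.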