Prometheus vs. Kindling vs. APM
# **Prometheus vs. Kindling**

A common question about Kindling is "What is the difference between Prometheus and Kindling?" Prometheus is the de facto monitoring tool in the cloud-native world, and Kindling is not trying to replace it. The goal of Kindling is to help developers understand application behavior from the kernel to the code.

## What is missing for Prometheus?

Prometheus is designed for metric collection and storage, and it does that job very well. But Prometheus alone is not enough for your apps in Kubernetes when a production issue arises.

The first missing part is tracing. Prometheus collects and stores metrics very well, but discrete metric data alone gives little guidance for troubleshooting. A trace is a good way of presenting data in an integrated way: all the related metric data can be grouped and correlated into a single trace for digging into an issue. That is why so many APM tools use traces for troubleshooting.

The second missing part is kernel metrics and the "Four Golden Metrics". Kindling is trying to help developers and operators understand app behavior from kernel to code. So the metrics missing from the kernel, such as DNS, throughput, TPS, latency, disk metrics, and [one trace file read bandwidth, one trace file write bandwidth](), are collected by Kindling and exported to Prometheus in the Prometheus way. Metrics that Prometheus already collects are not collected again. Kindling simply integrates with Prometheus to give the full picture of all the metrics from kernel to code.

# APM vs. Kindling

From our point of view, tracing is a good way of grouping all the metric data. APM tools and distributed tracing tools such as Zipkin do this job very well. But these tools work only at the code level: if the issue is in the code, it can be fixed quickly with them. But what if the root cause is not the code, and the issue is caused by a network or disk problem? These tools will only show the indirect effect from the code perspective.
For instance, if a disk issue occurs, the flush or write syscall takes much more time than it used to. The developer gets a report from the code perspective, and the code snippet that triggers the flush or write syscall will confuse the developer. It takes the developer a lot of time to find the root cause because the issue is not apparent from the code; the developer has to dig into logs for hints, and the hints are usually found only hours or days later. From the kernel perspective, the issue can be identified easily, especially when the data has been grouped into an RPC trace.

Kindling is trying to resolve production issues from kernel to code. Kindling first tries to identify which part is to blame for the production issue. If it is a code issue, the APM dashboard should be used to resolve it. If it is an infrastructure issue, the related kernel metrics are displayed for further digging.

Kindling can't do distributed tracing without span information. But tracing is a good way to identify issues quickly, so Kindling does a partial form of tracing called RPC trace. Kindling integrates all the metric information into one [RPC trace](). Usually [RPC trace]() and ServiceMap work together to locate the exact RPC call causing the problem in a microservice environment that is not very complicated.

# Conclusion

As stated above, Kindling, APM, and Prometheus can work together as a complete monitoring set to help developers understand app behavior from kernel to code. The infrastructure layer can no longer be regarded as static and stable in the cloud environment, because the infrastructure is software-defined, which means it is code too, and it evolves along with the app code.
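
The idea of grouping kernel-level metrics into one RPC trace and then deciding which part is to blame can be illustrated with a small sketch. This is not Kindling's data model or API: the `RpcTrace` class, its fields, the `blame` function, and the sample numbers are all hypothetical, chosen only to mirror the slow disk write example above.

```python
from dataclasses import dataclass, field


@dataclass
class RpcTrace:
    """Hypothetical record grouping kernel-level timings for one RPC call."""
    endpoint: str
    total_ms: float
    # Time attributed to kernel-visible waits, keyed by category
    # (for example "disk", "network", "dns"). Illustrative only.
    kernel_wait_ms: dict = field(default_factory=dict)

    def code_ms(self) -> float:
        # Whatever is not explained by kernel-visible waits is
        # attributed to application code in this sketch.
        return max(self.total_ms - sum(self.kernel_wait_ms.values()), 0.0)


def blame(trace: RpcTrace) -> str:
    """Return the category that accounts for the most time in the trace."""
    buckets = dict(trace.kernel_wait_ms)
    buckets["code"] = trace.code_ms()
    return max(buckets, key=buckets.get)


# A slow request where most of the time is spent waiting on disk writes.
slow_write = RpcTrace(
    endpoint="POST /orders",
    total_ms=900.0,
    kernel_wait_ms={"disk": 700.0, "network": 50.0},
)
print(blame(slow_write))  # disk
```

In this sketch, a result of `code` would point the developer to the APM dashboard, while any other category would point to the corresponding kernel metrics.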
xieyun
Oct. 31, 2022, 4:57 p.m.