It would be great if the logs could describe a bit what exactly one has to do to use this as an alternative to Grafana Loki.
How do I get my logs (e.g. local text files from disk like nginx logs, or files that need transformation like systemd journal logs) into ClickHouse in a way that's useful for Telescope?
What kind of indices do I have to configure so that queries are fast? Ideally with some examples.
How can I make full-text substring search queries fast (e.g. "unexpected error 123")? And when I filter with a regex, is that still fast / does it use indices?
From the docs it isn't quite clear to me how to configure the system so that I can just put a couple TB of logs into it and have queries be fast.
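To make the question concrete: something like this is the kind of snippet I'd hope the docs would show. It's my own guess pieced together from the ClickHouse docs (the table and column names are made up, and I don't know whether Telescope expects a particular schema):

```sql
-- Hypothetical log table. ORDER BY makes time-range filters cheap;
-- the tokenbf_v1/ngrambf_v1 skip indexes let word/substring searches
-- skip data granules instead of scanning everything linearly.
CREATE TABLE logs
(
    timestamp DateTime64(3),
    source    LowCardinality(String),
    message   String,
    INDEX idx_msg_tokens message TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 4,
    INDEX idx_msg_ngrams message TYPE ngrambf_v1(4, 32768, 3, 0) GRANULARITY 4
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY (source, timestamp);
```

Whether those index parameters (bloom filter size, n-gram length, granularity) are sensible for TB-scale logs is exactly the kind of thing I'd want the docs to spell out.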
Telescope is primarily focused on log visualization, not on log collection or on preparing ClickHouse for storage. The system does not currently provide (and I don't think it ever will) built-in mechanisms for ingesting logs from any source.
I will consider providing a how-to guide on setting up log storage in ClickHouse, but I’m afraid I won’t be able to cover all possible scenarios. This is a highly specific topic that depends on the infrastructure and needs of each organization.
If you’re looking for an all-in-one solution that can both collect and visualize logs, you might want to check out https://www.highlight.io or https://signoz.io or other similar projects.
And also, by the way, I’m not trying to create a "Grafana Loki killer" or a "killer" of any other tool. This is just an open source project - I simply want to build a great log viewer without worrying about how to attract users from Grafana Loki or Elastic or any other tool/product.
A lot of people who operate servers (including me) just want to view and search their logs -- fast and convenient. Your tool provides that. They don't care whether the backend uses ClickHouse or Postgres or whatever; that's just a pesky detail. They understand they may have to deal with it to some extent, but they don't want to have to become experts in it, or figure everything out by themselves, just to read their logs.
Also, those backends are general-purpose databases, so their docs don't tell the user how best to set them up so that your tool can produce results fast and conveniently. So currently, neither side helps the user with that.
That's why it's best if your tool's docs give some basic tips on how to achieve the most commonly desired goals: some basic way to get logs into the backend DB (if there's a standard way to do that for text log files and journald, it's probably fine to just link it), and docs on what indices Telescope needs to be faster than grep for typical log-search tasks (ideally with a quick snippet or link on how to set those up, for people who haven't used ClickHouse before).
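For the journald case, even a sketch like the following would go a long way. This is my own guess at a one-shot import, assuming a `logs` table with matching columns already exists and `jq` and `clickhouse-client` are installed; note that journald's `MESSAGE` field can be a byte array for binary entries, which this doesn't handle:

```shell
# One-shot import of today's systemd journal into a ClickHouse table.
# __REALTIME_TIMESTAMP is in microseconds, so divide to get seconds.
journalctl -o json --since "today" \
  | jq -c '{timestamp: (.__REALTIME_TIMESTAMP | tonumber / 1e6),
            source: (._SYSTEMD_UNIT // "unknown"),
            message: .MESSAGE}' \
  | clickhouse-client \
      --query "INSERT INTO logs (timestamp, source, message) FORMAT JSONEachRow"
```

A continuous pipeline (e.g. via a shipper like Vector or Fluent Bit) would be the realistic setup, but even linking to one known-good recipe would help.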
So overall, it's fine if the tool doesn't do everything. But it should say what it needs to work well.
As someone who has never worked anywhere that tried it out: what do you not like about Loki? I've been stuck in the very expensive Splunk and OpenSearch/Kibana mines for many years, and I find it an amazingly frustrating place to be. I honestly find that I can better debug via logs using grep than with either of those tools.
Loki works fine for what it does; the problem is what it lacks.
It doesn't do full-text search indices. So if you just search for some word across all your logs (to find e.g. when a rare error happened), it is very slow (it runs the equivalent of grep, at 500 MB/s on my machine). If you have a couple TB, that takes over half an hour!
As you say, even plain grep is usually faster for such plain linear search.
I want full-text indices so that such searches take milliseconds, or a couple seconds at most.
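For scale, the arithmetic behind those numbers is easy to sanity-check, taking the 500 MB/s linear-scan figure above:

```python
# How long a linear scan (grep/Loki-style) takes at a given throughput.
def scan_time_seconds(data_bytes: float, throughput_bytes_per_s: float) -> float:
    return data_bytes / throughput_bytes_per_s

TB = 10**12
MB = 10**6

for tb in (1, 2):
    t = scan_time_seconds(tb * TB, 500 * MB)
    print(f"{tb} TB at 500 MB/s: {t:.0f} s (~{t / 60:.0f} min)")
# 1 TB at 500 MB/s: 2000 s (~33 min)
# 2 TB at 500 MB/s: 4000 s (~67 min)
```

An inverted/full-text index turns that into a handful of index lookups, which is why the gap is minutes versus milliseconds.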
See, to me, having at one point been responsible for maintaining an ES instance for logs (and the exporters and all the other bits), the price you pay in engineering hours and hardware costs to maintain all those indexes while keeping ES from absolutely melting down is way too high.
I think grep is amazing, but yes, if you unleash it on 'all the logs' without first narrowing down to a time frame or some other taxonomy, it is going to be slow. This seems like a skill issue, frankly.
Also, full-text indexes for everything are of course FASTER, but seconds/milliseconds? How much hardware are you throwing at logs? Most people only go to logs in an emergency, during an incident and the like. How much are you paying just to index a bunch of shit that will probably never even be looked at, and how much are you paying for hardware to run queries on indexes that will be largely idle?
The problem with ES/Splunk for logs is that they were not designed for logs, so they are, in my view, both overkill AND underkill for the task. Full fuzzy text search is probably overkill; the UI for the task of dealing with log data is underkill. (The cloud bills are certainly overkill.)
I'm currently doing platform engineering at a company in the top half of the Fortune 500. Honestly, about 90-95% of the time when I'm helping a team troubleshoot their service on Kubernetes, I'm using the kubectl `stern` plugin (which shows log streams from all pods matching a label query) plus grep/sed/awk/jq if it's ongoing; it's just waaaaay more responsive. If it's a 'weird thing happened last night, investigate' task and I have to go to Kibana, it's just a much worse experience overall.
It should not take engineering time to have a database compute full-text indices. In sensible systems, you run "CREATE INDEX" and you're done.
To search multiple TBs of logs, you need a single $40/month server with an 8 TB SSD running sensible software and a sensible index algorithm.
I agree that ElasticSearch is bloated and needs undue engineering time. But it doesn't need to be that way.
For example Quickwit finds things subsecond.
It's a huge improvement when queries go from 10 minutes linear search to instant.
(Its index is still not perfect for me because it doesn't fully support simple exact prefix/infix search, but otherwise it does the job fast with few resources.)
> Full fuzzy text search is probably overkill
Yes, I think most people don't need fuzzy search for log search. They just need indexed grep.
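In ClickHouse terms, "indexed grep" might look like this. A sketch only: it assumes a `logs` table whose `message` column carries tokenbf_v1/ngrambf_v1 skip indexes, and the names are illustrative:

```sql
-- "Indexed grep": hasToken() can be served by a tokenbf_v1 skip index,
-- so granules without the token are skipped; the LIKE then verifies the
-- exact phrase on the few granules that remain.
SELECT timestamp, message
FROM logs
WHERE hasToken(message, 'unexpected')
  AND message LIKE '%unexpected error 123%'
ORDER BY timestamp DESC
LIMIT 100;
```

No fuzziness, no scoring, no relevance ranking; just "find the lines containing this string, fast."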
> I think grep is amazing, but yes, if you unleash it on 'all the logs' without first narrowing down to a time frame or some other taxonomy, it is going to be slow. This seems like a skill issue, frankly.
Right, grep is not the tool for the job. It neglects all the sensible algorithms that solve this problem. It's like saying "I don't use binary search, only linear search", and then spending human effort to pre-select the range so that it's fast enough.
When you're searching for rare bugs, you also can't just limit the time frame.
I was talking about what it takes to search through a couple TB of logs. I said that with grep and Loki it's slow due to the linear search, and that indexing systems make it much faster (from many minutes to subsecond).
That's independent of whether you have more than just a couple TB of logs. If you have more, you just get more servers. You'll still get the subsecond results that I find so beneficial.