<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Observability on Marcello Barnaba</title>
    <link>https://sindro.me/tags/observability/</link>
    <description>Recent content in Observability on Marcello Barnaba</description>
    <generator>Hugo</generator>
    <language>en-us</language>
    <lastBuildDate>Wed, 08 Apr 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://sindro.me/tags/observability/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Backfilling Two Years of Logs: Enterprise-Grade Observability on a Raspberry Pi</title>
      <link>https://sindro.me/posts/2026-04-08-backfilling-two-years-of-logs/</link>
      <pubDate>Wed, 08 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://sindro.me/posts/2026-04-08-backfilling-two-years-of-logs/</guid>
      <description>&lt;p&gt;&lt;img src=&#34;https://sindro.me/posts/2026-04-08-backfilling-two-years-of-logs/cover.jpg&#34; alt=&#34;BSD daemon with telemetry flowing through an enrichment pipeline into VictoriaLogs&#34;&gt;&lt;/p&gt;&#xA;&lt;p&gt;I have a FreeBSD server called m42 that&amp;rsquo;s been running for years. It handles email (Postfix + Dovecot + Rspamd), web (nginx with a dozen vhosts), firewall (pf), and all the usual suspects. It generates thousands of log entries per day across four distinct formats: BSD syslog, fail2ban, pf packet filter, and nginx access/error logs.&lt;/p&gt;&#xA;&lt;p&gt;I even wrote &lt;a href=&#34;https://sindro.me/posts/2023-08-17-pfasciilogd-link-pf-and-fail2ban/&#34;&gt;pfasciilogd&lt;/a&gt; back in 2023 to convert pf&amp;rsquo;s binary logs into ASCII text so fail2ban could parse them — a foundational piece that now feeds structured firewall telemetry into the whole pipeline.&lt;/p&gt;&#xA;&lt;p&gt;I also have two years of monthly backups sitting in &lt;a href=&#34;https://restic.net/&#34; target=&#34;_blank&#34;&gt;restic&lt;/a&gt; snapshots. That&amp;rsquo;s roughly 25 million log lines, just&amp;hellip; sitting there. A goldmine of security telemetry, traffic patterns, and attack data — completely unindexed and unsearchable.&lt;/p&gt;&#xA;&lt;p&gt;I built a full observability stack on a Raspberry Pi 5 at home — &lt;a href=&#34;https://docs.victoriametrics.com/victorialogs/&#34; target=&#34;_blank&#34;&gt;VictoriaLogs&lt;/a&gt; for storage, &lt;a href=&#34;https://www.influxdata.com/time-series-platform/telegraf/&#34; target=&#34;_blank&#34;&gt;Telegraf&lt;/a&gt; for processing, &lt;a href=&#34;https://grafana.com/&#34; target=&#34;_blank&#34;&gt;Grafana&lt;/a&gt; for visualization — and then I backfilled every single one of those 25 million entries through the exact same pipeline that processes live data. With full enrichment: GeoIP geolocation, ASN identification, and reverse DNS resolution for every IP address.&lt;/p&gt;&#xA;&lt;p&gt;This is enterprise-grade log management. Running on an $80 single-board computer. In my living room.&lt;/p&gt;&#xA;&lt;h2 id=&#34;why-backfills-are-hard-and-usually-skipped&#34;&gt;Why backfills are hard (and usually skipped)&lt;/h2&gt;&#xA;&lt;p&gt;Let&amp;rsquo;s be honest: nobody does backfills. They&amp;rsquo;re the broccoli of operations work. You know you should, but the effort-to-reward ratio feels terrible.&lt;/p&gt;&#xA;&lt;p&gt;The problem is that a backfill isn&amp;rsquo;t just &amp;ldquo;load old data.&amp;rdquo; Your pipeline has evolved. The parsing rules you wrote six months ago don&amp;rsquo;t match today&amp;rsquo;s processors. The enrichment you added last week doesn&amp;rsquo;t exist in your old backfill scripts. You end up with two classes of data: rich live entries and impoverished historical ones.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
