<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Timo Dechau</title><description>Deep dives on Product Analytics, Event data, Metric Trees and operational nerdiness.</description><link>https://timo.space/</link><language>en-us</language><item><title>The Wave — notes on a shifting analytics landscape</title><link>https://timo.space/blog/the-wave/</link><guid isPermaLink="true">https://timo.space/blog/the-wave/</guid><description>Why I decided not to build a product, and what happens when the data stack meets the wave.</description><pubDate>Sat, 28 Mar 2026 00:00:00 GMT</pubDate><content:encoded>In January I decided I won&apos;t build a commercial product.

Building is not the problem anymore. With what&apos;s available now you can spin up something functional over a weekend. I did it four times in the last few months. Four tools I actually use regularly, built in sessions I wouldn&apos;t even call serious. Coding was always my bottleneck, never my natural habitat. It isn&apos;t the bottleneck anymore.

So the blocker I always had is gone. And still I decided not to go ahead.

The real problem is distribution. Getting people to find it, trust it, pay for it, keep paying for it. That hasn&apos;t changed. It&apos;s still brutal, it still takes years, and it&apos;s still not fun — especially when consulting pays well and gives you faster, cleaner feedback. When you stack the two paths next to each other honestly, the product path becomes much harder.

But that&apos;s not the whole story. Underneath the distribution reason, something else was sitting. A question I kept coming back to: even if I built it and distributed it, will its place in the world still exist in two years?

That question is what this post is about.

## The Pattern

![The pattern of AI reception among data people on LinkedIn](/images/posts/the-wave/the-wave-ill-1.png)

If you follow data people on LinkedIn you&apos;ve watched something play out over the last six to twelve months. The arc is pretty consistent.

First came the confident skepticism. &quot;Sure, AI does interesting things, but I don&apos;t really see it in data engineering yet.&quot; Said with authority, sometimes with a little smugness. The reasoning was fair enough — try a one-shot prompt, hit the limitations fast, conclude the technology isn&apos;t there.

Then came the grudging acknowledgment. The results got harder to dismiss. People started posting Claude Code outputs that were genuinely impressive. The tone shifted from skepticism to qualification. &quot;Yes it can write the SQL, but can it design a semantic layer?&quot; &quot;Yes it builds the pipeline, but what about edge cases?&quot; &quot;Yes it works for simple setups, but real production environments are different.&quot;

The &quot;but&quot; is always real. The limitations exist. Nobody is claiming otherwise (AI bros excluded).

The problem is that the &quot;but&quot; keeps moving. What required heavy context engineering six months ago works reasonably out of the box now. What works out of the box now will probably require almost nothing in another six months. Take the delta between where things were and where they are, apply a conservative discount, project forward. The curve is still steep.

Most people engaging with this are testing the waters. One-shot prompts, quick experiments, surface-level conclusions. The people who go deeper — who actually invest in context, who push the tools past the obvious use cases — are finding both the limitations and the potential further out than the skeptics suggest.

The wave isn&apos;t theoretical anymore. The question is just how fast it&apos;s moving and what it hits first.

## Walking the Stack

![The data stack layers and where AI stands](/images/posts/the-wave/the-wave-ill-2.png)

Let me go through the data stack layer by layer. Not to be dramatic about it, but to be honest about where things actually stand.

### Ingestion

For the 90% case, this is largely done.

Give Claude Code a reasonably documented API, a principle file with some basic opinions, point it at DLT for the pipeline scaffolding and Dagster for orchestration, and you get something production-ready in a session&lt;sup&gt;&lt;a href=&quot;#fn4&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;. I have a setup running on a cheap VPS right now that took about an hour end to end. Is it the most bulletproof thing in the world? No. But it runs, it handles schema changes, and it does what it needs to do.

The 10% case — the nightmare APIs, the high-volume edge cases, the stuff that gives engineers bad dreams — is real. People write a lot of blog posts about it. It&apos;s also not what most teams deal with day to day. For the actual work, the common work, the wave has already arrived.
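For a sense of scale, the &quot;principle file&quot; mentioned above can be a handful of basic opinions the agent reads before writing anything. A hypothetical sketch (every rule here is a placeholder, not my actual file):

```markdown
# pipeline-principles.md (hypothetical example)

- Use DLT for extraction and loading, one pipeline per source system.
- Incremental loads by default; full refresh only when the API offers no cursor.
- Land raw data as-is; no renaming or type casting before the staging layer.
- Every pipeline gets a Dagster schedule and a failure alert.
- Let schema changes flow through: alert, don&apos;t block.
```

That&apos;s most of the context the agent needs to produce something you won&apos;t have to rewrite.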

### Transformation

This one requires some honesty about the bar.

Claude Code will build you a dbt setup that looks like a lot of dbt setups in the wild. Which is to say: not elegant, not deeply considered, functional enough. If you&apos;ve spent time in the intermediate layers of real production data models you know what lives there. Scary things. Half-finished logic, redundant CTEs, naming that made sense to one person once.

Claude Code produces roughly the same. The difference is it produces it in minutes.

Add some context (opinions about data modeling, a skill file with your preferred patterns) and it gets noticeably better. I have a handful of these now and the output quality jumps. The argument that transformation requires deep human expertise to get right is true. It&apos;s also true that most transformation layers in production weren&apos;t built with deep human expertise. The bar Claude Code has to clear is not the theoretical ideal. It&apos;s the actual average.

### Semantic layer and metrics

This is where people feel safest. And they&apos;re not entirely wrong, but the reason they&apos;re right is not the one they usually give.

The semantic layer is still hard. Metrics are still contested and messy. Business definitions still vary by team, by context, by who you ask on which day. AI doesn&apos;t solve this.

But here&apos;s the thing. Humans haven&apos;t solved this either. Every data team I&apos;ve worked with has struggled with metric alignment. Not because of tooling, not because of technical complexity — because getting different business teams to agree on a shared definition of revenue, or activation, or churn, requires ongoing organizational work that most companies don&apos;t do well. It requires someone with enough authority and persistence to drive it through.

That failure predates AI. AI doesn&apos;t fix it. But it also doesn&apos;t make it worse. The semantic layer is unsolved for the same reason it&apos;s always been unsolved. Which means it&apos;s not really high ground — it&apos;s just a different kind of stuck.

## The missing piece

![Data infrastructure flows to the decision point but context is missing](/images/posts/the-wave/the-wave-ill-3.png)

The whole point of a data setup, underneath all the ingestion, transformation, and dashboards, is decision preparation and sometimes execution. Giving people solid context to make a call about a business process, an investment, a strategic direction.

That&apos;s the job. Everything else is infrastructure for it.

And right now, that job is where AI genuinely struggles. Not because the models aren&apos;t capable enough. Because the context isn&apos;t there.

To prepare a good decision you don&apos;t just need the data. You need to know what the decision actually is. You need to understand the business strategy behind it. You need to know which teams are involved, how they work, what they&apos;re optimizing for, where the politics sit. You need the full picture: not just the numbers but the organizational layer that gives the numbers meaning.

A really good data person does exactly this. They don&apos;t just pull metrics. They go deep into how the business operates, climb up to the strategy layer, and hold both at the same time. That&apos;s what makes the difference between an analysis that gets used and one that gets filed away.

Right now if you open Claude and throw a business problem at it, you get something useful but incomplete. The model is working without the organizational context that a good analyst carries in their head. You can feel the gap.

But I don&apos;t think this gap stays open much longer.

The threshold that changes everything is organizational memory&lt;sup&gt;&lt;a href=&quot;#fn1&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. Right now Claude sessions are personal and isolated. Your context is yours. What&apos;s completely missing is a shared company memory — the accumulated knowledge of how an organization thinks, decides, and operates. There are tools trying to bridge this. They work as an intermediate step. I&apos;m not sure how long that step stays relevant.

Once the memory problem is solved at the organizational level, the next piece is the loop. The ability to refine, evaluate, and improve outputs over time without constant human steering. This is already working impressively at the personal level&lt;sup&gt;&lt;a href=&quot;#fn2&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; &lt;sup&gt;&lt;a href=&quot;#fn3&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;. The extrapolation to the organizational level is not a huge leap — it&apos;s more an engineering and architecture problem than a conceptual one.
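To make &quot;the loop&quot; concrete: stripped of everything agent-specific, it is just evaluate, refine, repeat. A minimal sketch in Python, where `evaluate` and `refine` are stand-ins for whatever scoring and revision mechanism a real agent uses (both are illustrative names, not any tool&apos;s API):

```python
def improvement_loop(candidate, evaluate, refine, max_iters=5, target=0.9):
    """Eval-driven refinement: score a candidate, revise it, repeat
    until it is good enough or the iteration budget runs out."""
    score = evaluate(candidate)
    for _ in range(max_iters):
        if score >= target:
            break
        candidate = refine(candidate, score)
        score = evaluate(candidate)
    return candidate, score

# Toy run: "refining" a number toward 1.0.
result, final_score = improvement_loop(
    candidate=0.2,
    evaluate=lambda c: c,                   # score is just the value itself
    refine=lambda c, s: min(1.0, c + 0.3),  # each pass improves it a bit
)
# result == 1.0 after three refinement passes
```

The hard part at the organizational level isn&apos;t this control flow. It&apos;s making `evaluate` reflect what the organization actually considers good, which is exactly the memory problem again.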

Put organizational memory and loops together, apply them to a company&apos;s data and strategic context, and decision preparation stops being a human-led process with data as an input. It becomes something closer to decision assistance. And from there, in some domains, decision making.

We are not there. But we are also not far.

## The Middle Collapses

![Applications squeezed down to managed services and infrastructure](/images/posts/the-wave/the-wave-ill-4.png)

So where does this leave the tools?

Infrastructure will stick around. The picks-and-shovels layer: compute, storage, the pipes that move data at scale. That has a clear value proposition that doesn&apos;t depend on what sits on top of it. Managed services will stick around too. Having someone else take responsibility for running and maintaining a system still has real value, and that value doesn&apos;t disappear just because building the system got cheaper. But cheaper building will likely increase the pricing pressure on them.

The danger zone is the middle.

Most data tooling lives in the middle. Tools that do one or two jobs well enough. A pipeline tool. A transformation layer. A dashboard product. A product analytics platform. Each one solving a specific problem, each one charging for that solution. That&apos;s the modern data stack spirit.

That value equation is getting squeezed from both ends simultaneously.

From below: anyone with an engineering mindset can now build a functional version of most of these tools for their own use case over a weekend. Not for Amazon scale. Not for every edge case. But for the actual problem they have, in their actual organization, without paying for a seat or negotiating a contract or waiting for a feature request.

From above: the AI platform layer is eating features steadily. Not all features, not immediately, but consistently. And the pace is not slowing down.

The tools caught in the middle have to answer a question that&apos;s getting harder to answer convincingly: what do you offer that justifies the distribution tax? The sales cycle, the onboarding, the ongoing cost, the organizational buy-in required to adopt and maintain a vendor relationship. That tax was always real. It was worth paying when the alternative was months of engineering work. When the alternative is a weekend session and/or a principle file, the math changes.

I&apos;m not saying every tool in the middle dies tomorrow. Distribution is still hard, switching costs are real, and organizations move slowly. There&apos;s runway. But the direction is clear enough.

## The Honest Ending

New things will emerge. They always do.

The history of technology is not a history of categories dying and leaving nothing behind. It&apos;s a history of categories transforming, compressing, and spawning things that weren&apos;t predictable from the previous vantage point. The wave doesn&apos;t leave an empty beach.

But I can&apos;t tell you what the new things are. And I&apos;m suspicious of anyone who can.

Right now there&apos;s too much in motion. The organizational memory problem isn&apos;t solved yet. The loops are early. The platform capabilities are improving faster than most people&apos;s mental models of them. Making confident predictions about what data work looks like in three years feels like reading a map of a coastline that&apos;s actively changing shape.

What I can say is this. If you are a data practitioner, the question worth sitting with is not &quot;will AI replace me.&quot; That&apos;s the wrong frame. The better question is: how much of what I do right now lives in the middle? How much of my value is tied to work that exists because building and maintaining things was hard? And how much of it is tied to the organizational layer: the strategy, the context, the human judgment about what decisions actually matter?

The first kind of value is getting compressed. The second kind is harder to compress. The gap between them is where the wave hits hardest.

I decided not to build a product because I could see the wave clearly enough. Not all of it. Not where it ends up. But clearly enough to focus on other things (my good old friend/foe consulting).

That&apos;s probably enough to work with.

---

&lt;small&gt;
&lt;p&gt;&lt;a href=&quot;#ref1&quot;&gt;1&lt;/a&gt; For an interesting deep dive into how memory systems for AI agents work, see &lt;a href=&quot;https://gaodalie.substack.com/p/i-studied-openclaw-memory-system&quot; target=&quot;_blank&quot;&gt;I Studied OpenClaw Memory System&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;#ref2&quot;&gt;2&lt;/a&gt; &lt;a href=&quot;https://github.com/snarktank/ralph&quot; target=&quot;_blank&quot;&gt;Ralph&lt;/a&gt; — an autonomous agent that runs eval-driven improvement loops.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;#ref3&quot;&gt;3&lt;/a&gt; &lt;a href=&quot;https://github.com/steveyegge/gastown&quot; target=&quot;_blank&quot;&gt;Gastown&lt;/a&gt; — Steve Yegge&apos;s experiment with self-improving AI agents.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;#ref4&quot;&gt;4&lt;/a&gt; &lt;a href=&quot;https://youtu.be/v7LWAPn6zeQ?si=Vdur1izTP32hBklG&quot; target=&quot;_blank&quot;&gt;Building a production data pipeline with Claude Code&lt;/a&gt; — a walkthrough showing how well this works in practice.&lt;/p&gt;
&lt;/small&gt;</content:encoded></item><item><title>It&apos;s about the strategy, stupid</title><link>https://timo.space/blog/its-about-the-strategy-stupid/</link><guid isPermaLink="true">https://timo.space/blog/its-about-the-strategy-stupid/</guid><description>&quot;What can you see in the data?&quot; The client asked me this while I was scanning through their analytics setup. Ten curious eyes were looking at me, waiting.</description><pubDate>Mon, 23 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&quot;What can you see in the data?&quot;

The client asked me this while I was scanning through their analytics setup. Ten curious eyes were looking at me, waiting.

I was actually looking for something completely different. I wasn&apos;t hunting for insights. I was just checking the shape of the data - what kind of events they&apos;re collecting, how things are structured. My usual &quot;let me get an overview&quot; check that I do at the start of every project.

But the group sitting around me was expecting something else entirely. They wanted answers. The data should tell us something about the business, right?

I closed my laptop.

This question has followed me since day one of working with analytics setups. And I can tell you - for the first two or three years, I always tried to give something. I&apos;d point at the screen and say things like &quot;the bounce rate here looks worth investigating&quot; or &quot;I&apos;m not sure about this landing page performance.&quot; I wanted to deliver. I didn&apos;t want to say what was actually true: I have no idea.

When you take a first look at a dataset, you have no idea. Pretending otherwise is actually the wrong thing to do.

It took me six or seven years to understand why I couldn&apos;t tell anything meaningful from that initial scan. Nowadays, I do it completely differently. I don&apos;t look at the data anymore. Not at the start.

When clients ask me in initial workshops if I want to see their analytics account or look at their dataset, they get confused when I say no. &quot;Why don&apos;t you want to see the data?&quot;

Because right now, I&apos;m not interested in your data. I&apos;m interested in understanding how your business works.

## The tactics trap

Analytics content is almost always tactical. Scroll through LinkedIn and you&apos;ll see it everywhere: &quot;We just moved the client to server-side tag management and tracking improved significantly.&quot; &quot;We&apos;re using this specific retention analysis and it gave us this insight.&quot; &quot;Here&apos;s how you should measure your data.&quot;

These aren&apos;t wrong. But they&apos;re usually at the end of something else. Hopefully, they&apos;re at the end of something else - and not just ticking off boxes from a catalog of what you can do.

Unfortunately, most of the work in this space is exactly that. We have backlogs filled with tactics that we deploy because they&apos;re on the list, not because they&apos;re the right thing for this specific situation.

![](/images/posts/its-about-the-strategy-stupid/image.png)

### The checklist approach

I can still remember my first four to five years of analytics projects. We always had templates for analytics audits. Over time, these templates became quite sophisticated. You gain experience, you find weird stuff, you add it to the list. Eventually, you have a comprehensive document that covers everything.

The process works like a car check. Someone drives their data setup into your garage. You take your checklist, go through all the different things, tick your boxes, write short summaries. &quot;This looks okay.&quot; &quot;This is maybe not so good.&quot; &quot;This needs attention.&quot;

This is still valuable work. There are enough cases where these audits discovered severe problems. Things that genuinely needed fixing.

But here&apos;s the issue: it&apos;s often presented as the one thing you have to do. The checklist becomes the strategy. You go through the list, you find things that don&apos;t match best practices, you recommend fixing them. And none of it is wrong - no one is faking these results. The audit will discover real problems in a setup. The recommendations will be technically valid.

![](/images/posts/its-about-the-strategy-stupid/image-1.png)

Take my favorite candidate: server-side tagging. People who know my content already know this example. Recommending server-side tagging isn&apos;t wrong. It won&apos;t produce bad data. But the question remains: are we actually doing the right thing? Or are we just playing the standard program? Or to make it worse, are we just selling hours?

### The consultant triangle

There&apos;s a classic pattern in analytics consulting. You do an audit to show people what they&apos;re missing. Then you sell an implementation project to fix the things you found. Then you hope they stick around on a retainer.

It can become a small, self-sustaining world. Nothing on these lists is wrong. The audits discover real problems. The implementations address real gaps. The retainer keeps things maintained. Everyone feels productive.

But productive isn&apos;t the same as strategic.

Let me give you a counter-example. Together with [Barbara](https://www.linkedin.com/in/barbara-galiza/), I run [FixMyTracking](https://fixmytracking.com), where we also do audits. We also go through measurement setups and identify parts that aren&apos;t implemented well. The output looks similar on the surface.

The difference is how we frame these projects.

First, we only analyze ad platform measurement - not general analytics tracking. Why? Because when you improve measurement for advertising platforms, you see the immediate impact. Technical fixes directly translate into performance improvements, and you can verify them quickly by checking your costs or performance metrics.

![](/images/posts/its-about-the-strategy-stupid/image-2.png)

Second, one of our most important criteria is ad spend. We check how much the client is actually investing in the channels we&apos;d be working on. All this tinkering and checkbox-ticking doesn&apos;t make sense if advertising plays a 10% role in their business. We want to make sure the channels we&apos;re optimizing are actually a driving force in acquisition.

Same type of work. Same tactics. But filtered through a strategic question: will this actually matter for this specific business?

And that&apos;s the bridge. That&apos;s what all these tactics are missing.

The strategy.

## So what is the strategy here?

If tactics are the problem, strategy is the answer. But what does that actually mean in practice? It&apos;s not about writing strategy documents or having off-site meetings. It starts much simpler: with the questions you ask before you touch any data.

### What will you do with this?

The simplest implementation of strategy is to ask what you&apos;re actually planning to do with the data.

When I work with product teams, one of the things I want to understand is what role data will actually play in their daily work. I want to understand how they make backlog decisions. I want to understand how they treat features.

Do they treat features as hypotheses - where they need fast feedback to decide how much more time to invest? Or do they treat features as building blocks that will be built and maintained anyway?

This makes a big difference.

If you build features for features&apos; sake, then data about how those features are used, how they got adopted, what role they play in retention or revenue - that data has a different weight. The feature will be built or maintained regardless. I&apos;m not saying data is useless here. But the impact will be smaller and slower.

Compare that to treating features as clear hypotheses. Here, you want feedback as fast as possible. This feedback is essential to decide: do we invest more, or do we stop? The data setup matters in a completely different way.

Or maybe you already know this feature is strategically important. You&apos;re committed. But you want to use data to make it work - to optimize, to find problems early, to prove impact.

Same data collection. Three very different levels of impact depending on how the team actually operates.

### The two levels of the question

There&apos;s a trick that gets posted on LinkedIn once or twice a month. When the business asks your data team for a specific report or metric - something urgent, something they need immediately - you ask a simple question back:

&quot;Okay, we&apos;ll calculate this metric. And let&apos;s say it drops by 20% from one week to the next. Or it increases by 30%. What actions will you start based on this insight?&quot;

This already surfaces a lot. Sometimes people realize they don&apos;t actually know what they&apos;d do. The urgency fades.

But this is only the first level of the strategy question. You&apos;re testing if there&apos;s any action attached to the insight.

The second level goes deeper: &quot;Can you show me where this insight sits in your current strategy?&quot;

This is a different question. Before we implement anything, I want to understand the bigger picture. Not just &quot;what would you do if the number moves&quot; but &quot;how does this connect to what you&apos;re actually trying to achieve?&quot;

Without this, you&apos;re just producing metrics. With it, you&apos;re producing something that matters.

### The uncomfortable truth

These conversations don&apos;t always end well.

I&apos;ve done product analytics workshops where we determined - together, during the workshop - that yes, we&apos;ll do the setup. It can still give some insights into how the product is performing. But given how this team currently operates, data won&apos;t play a significant role in their decisions.

![](/images/posts/its-about-the-strategy-stupid/image-3.png)

This doesn&apos;t make teams happy. They came to the workshop believing data would make their life easier. And now someone is telling them that the setup we&apos;re building probably won&apos;t have the impact they&apos;re dreaming of.

But it&apos;s an incomplete picture if we don&apos;t talk about this. And these projects are never about changing how a team works - that&apos;s a completely different kind of effort, a much longer process.

Here, it&apos;s more about setting the scene. We can build a very effective data setup. But its impact depends on the role it will play in the operations of this team. That&apos;s not a data problem. That&apos;s a strategy problem. And it is essential to be transparent about this.

## The window of opportunity

So strategy matters. But why does it matter so much for data and analytics teams specifically? Because of constraints. Every team has them, but data teams feel them acutely - and the typical response only makes things worse.

### The constraint reality

Data and analytics teams, like every other team, have natural constraints. You have a specific number of people who can work on things. You have a specific amount of data that&apos;s currently available. You have a specific level of data quality. And there are probably a handful more constraints that define your situation.

The team is usually well aware of these constraints. They&apos;re trying to do their best job possible. But these constraints allow only a specific type and amount of work to be delivered.

Every data team knows its backlog. The ever-expanding, not-so-fun-to-look-at backlog with 40 or 50 items in it, some of them already two years old.

And everyone who has worked with data teams knows the other side. You have a specific need for an insight. You follow the rules. You write a ticket. And you never hear back. Maybe once or twice you ask about it. &quot;Hey, what&apos;s happening with this issue?&quot; The answer: &quot;Yes, it&apos;s still in our backlog. We have some other priority items. Maybe it can make it to the sprint in a month or so.&quot;

The obvious solution is to hire more people. Unfortunately, in every setup I&apos;ve seen, this never solves the problem. It just makes it bigger.

![](/images/posts/its-about-the-strategy-stupid/image-4.png)

What you actually have is a small window of opportunity. After maintenance work, after the fundamental things that have to be done, there&apos;s an amount of hours left each month where you can work on real analyst work - producing insights for the business. This window is finite.

So the question becomes: what goes into that window?

In the standard approach, people drop everything into the backlog, and you pick the things from whoever shouts loudest in Slack channels or whoever is high enough in the hierarchy that ignoring them would be painful.

This is usually not the best use of your time.

### The city trip

Think about going on a trip to a really nice city. One approach: you check into your hotel, and every morning you just go outside and walk around. When you see something that might be interesting, you do it.

I&apos;m not saying this isn&apos;t a nice type of holiday. Sometimes it can be fun.

But let&apos;s say you&apos;re really interested in exploring the cool things this city has to offer. You have preferences. You have limited time. The random wandering approach won&apos;t give you an ideal output.

So what do you do instead?

You sit down. You look at what&apos;s actually possible in this city - what attractions exist. You map this to what you actually like to do. You check reviews to see if things are as good as they sound. You might ask someone you trust who&apos;s been there before, someone who thinks similarly to you. Or you ask an expert - someone who can listen to what you like and give you tailored recommendations.

Based on all this, you build an itinerary. One that fits your time, covers the things that matter to you, and makes the most of the trip.

Why don&apos;t we do the same thing for how we handle data work?

We have a small window of opportunity. We have constraints. And yet the standard approach is to wake up, walk outside, and react to whatever happens to cross our path - or whoever happens to shout the loudest.

There&apos;s a better way. And it starts with understanding where the organization is actually trying to go.

## Join the party

Planning a trip is one thing. But where do you actually look to figure out what matters? For data teams, the answer is closer than you think. It&apos;s already there - in how your organization operates.

### Strategy trickles down

Organizations have constraints too. An organization can usually only push in one specific direction at a time. If you have 200 people all working in different directions, you&apos;re usually pretty ineffective.

This is why organizations have leadership. And the way leadership presents the direction of the company is usually called strategy.

Strategy can take different forms. Sometimes it&apos;s annoyingly vague - so vague you wonder if you can even call it strategy. Other times, it&apos;s powerful because it gives a clear direction for where things need to move.

In an ideal setup, strategy trickles down. You have a specific strategic push that the organization is focusing on for the next 6 to 12 months. Marketing builds their own strategy around this push. Product does the same.

Here&apos;s the trick: when you work in the data and analytics team, you want to be part of this party.

Since we&apos;ve been talking about constraints - limited time, limited resources, needing the right data to do things - if you want to use all of this effectively, one thing that works really well is to align with the strategy the whole organization is currently thinking about.

Let me give you an example.

Say your organization is thinking seriously about how to add AI to what you&apos;re doing. Not just slapping an AI label on things, but genuinely asking: which parts of our business could be improved by current AI capabilities? How can we provide more value to our customers with this?

The organization identifies an opportunity and it becomes an organizational push. Product and development are tasked with rolling it out. Marketing is tasked with building awareness - small at first for early versions, bigger as confidence grows.

The data team now has an opportunity to join this. How can we support this strategic push?

You could develop adoption metrics to see early signals of whether this direction is working. You could create proxies to detect changes in retention patterns as people start using the new features. You could build dashboards that show early revenue signals, making transparent what&apos;s actually happening.
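As a toy illustration of the first idea, an adoption metric can start as simple as: of the accounts active in a period, what share used the new feature at least once? (The event log shape and event names here are made up for the example.)

```python
from datetime import date

# Hypothetical event log: (account_id, event_name, day)
events = [
    ("a1", "ai_feature_used", date(2026, 2, 3)),
    ("a1", "login", date(2026, 2, 4)),
    ("a2", "login", date(2026, 2, 3)),
    ("a3", "ai_feature_used", date(2026, 2, 5)),
    ("a4", "login", date(2026, 2, 6)),
]

def adoption_rate(events, feature_event):
    """Share of active accounts that used the feature at least once."""
    active = {account for account, _, _ in events}
    adopters = {account for account, name, _ in events if name == feature_event}
    return len(adopters) / len(active) if active else 0.0

print(adoption_rate(events, "ai_feature_used"))  # 0.5: two of four accounts
```

The metric itself is trivial. The strategic part is that it exists on day one of the rollout, tied to the push everyone cares about.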

![](/images/posts/its-about-the-strategy-stupid/image-5.png)

Maybe you even surface that revenue will take a dip because the new features improve customer experience, but the pricing model hasn&apos;t caught up yet. That&apos;s valuable. It makes things visible so the company can adapt - maybe introduce different pricing later. Data helps make this transparent.

When you focus on the strategic direction the company is taking, you&apos;re working on something where effort and resources are already flowing. You can make a difference. You can make an impact.

### The simple contrast

Imagine the company has been working on this new initiative for months. Everyone contributed. And then someone asks the data team: what did you do during this time?

&quot;We implemented server-side tagging. We can now track 3% more people on our website.&quot;

Everyone nods politely. Sounds good, technically. But you won&apos;t get applause for that.

Compare that to: &quot;We built adoption metrics for the new AI features. We created early signal dashboards for retention impact. We surfaced that revenue will dip initially and helped frame the pricing conversation. We made visible what was working and what needed attention.&quot;

Same data team. Same constraints. Same window of opportunity.

One played the standard program. The other aligned with what the organization actually cared about.

Let me return to that moment. Ten curious eyes. The closed laptop.

The reason I changed my approach so significantly is this: when I do these projects now, I don&apos;t look at any data. I don&apos;t analyze what&apos;s there. I&apos;ll look at what data is available - but much later. Only after we&apos;ve developed a plan for where we want to go, what we want to measure, and how we can support the business best.

Then I look at the data to see what&apos;s already available. What can we pick from there? Where&apos;s the gap between what we want to achieve and what&apos;s currently measured?

That&apos;s when the data becomes useful.

Where I spend most of my time now is at the beginning - understanding how the business works. When I talk with marketing, I want to know: where do you build awareness? How do you support first-attracted accounts in their discovery phase? How do you convert them into your onboarding? I want to understand the essential touchpoints. The levers.

And then: what&apos;s your strategy? What were you pushing for last year? How well did it work? Do you know why it worked or didn&apos;t? What&apos;s your strategic push for the next couple of months?

This gives me the picture I need. The one or two areas where improving the data setup can actually move things.

It&apos;s never a precise science. Sometimes we just know that we need to make things visible first before we can take the next step. That&apos;s fine. Because we know the direction.

Interestingly, in these setups, no one asks me anymore what I can see in the data.

Because we&apos;ve already shifted the question.

Not &quot;what can you see?&quot; but &quot;what do we actually want to see?&quot;</content:encoded></item><item><title>You only know when you try</title><link>https://timo.space/blog/you-only-know-when-you-try/</link><guid isPermaLink="true">https://timo.space/blog/you-only-know-when-you-try/</guid><description>I was listening to Ben Thompson and John Gruber discussing their VisionPro experience watching NBA games. They were frustrated.</description><pubDate>Sun, 25 Jan 2026 00:00:00 GMT</pubDate><content:encoded>I was listening to Ben Thompson and John Gruber discussing their VisionPro experience watching NBA games. They were frustrated. Apple keeps switching your perspective mid-game - you&apos;re sitting courtside, and suddenly you&apos;re teleported somewhere else. No choice. They just do it for you.

And my first thought was: couldn&apos;t Apple have figured this out before shipping? It feels so obvious. If I&apos;m watching a football match from a specific seat in the stadium, I don&apos;t want to be yanked to a different spot without asking. Of course, that&apos;s annoying.

But then I caught myself. Was it actually obvious? Or does it only feel obvious now because someone experienced it?

This triggered something I&apos;ve been wrestling with for years.

## The Frustration I Kept Having With Myself

Last year, I finally tried a subscription model for my content. The idea had been bouncing around in my head for almost two years. Maybe I could turn my newsletter into a paid subscription. Maybe that&apos;s the right business model for what I&apos;m building.

I talked to a lot of people about it. Everyone had opinions. Good idea because of this. Bad idea because of that. Classic pros and cons. I collected an extensive list. Once the AI models got good enough to brainstorm properly, I ran sessions with them too. Different angles, different scenarios, different considerations.

And then one of those AI sessions just said: &quot;You only know when you try.&quot;

I hated that answer. Why do I always have to try things to know if they work?

![](/images/posts/you-only-know-when-you-try/image-20.png)

Because this wasn&apos;t the first time. When I got started with YouTube, I had the same question spinning in my head. Would it be a good idea to do YouTube or not? I weighed the options. I thought about it. And then I just started - and only then did I actually find out what worked and what didn&apos;t.

Same with consulting projects. I&apos;d think: maybe this type of project is a good fit for me. Looks interesting on paper. And then I&apos;d start the project, and after a week I&apos;d know immediately - no, this type doesn&apos;t work. But by then, I was committed. Stuck with it until I delivered what I promised.

Every time, the same pattern. I&apos;d discover something that felt completely obvious in hindsight. And then I&apos;d have these moments with myself. Not quite self-hate, but a softer version of it. A voice that says: &quot;You could have known before. Wasn&apos;t it obvious enough? Everything you discovered was already on your list.&quot;

This kept happening. And I kept blaming myself for not seeing it earlier.

## The Dimension That&apos;s Missing

And then, while listening to that podcast episode, something clicked.

When you plan things out, you can sketch scenarios. You can list what might happen. You can imagine how it could play out. But because it&apos;s not actually happening, something is missing. A dimension.

Think of it as 2D versus 3D. Planning is flat. You see the shape of something - the outline, the structure. But you don&apos;t see the depth. You can&apos;t. It&apos;s not a limitation of your list or your thinking. It&apos;s a limitation of the medium.

We need the lived experience to actually see. That&apos;s how we&apos;re built as humans.

When I realized this, I found a kind of peace I&apos;d been looking for. It&apos;s not that I failed at planning. It&apos;s not that I should have been smarter or more thorough. The information on my list was never going to be enough - because information isn&apos;t experience. They&apos;re different dimensions.

But here&apos;s what&apos;s interesting: the jump from 2D to 3D isn&apos;t binary. There are steps in between.

### Talking and Writing: 2.5D

You start adding depth before you reach full reality.

I&apos;ve been working on a new approach to data modeling - using jobs-to-be-done as a structuring principle instead of traditional layers. I had the idea sketched out. It made sense on paper.

Then I started talking to people about it. And something interesting happened. While I was explaining the approach - even before they said anything, even just seeing their face on the camera - I started to see things I hadn&apos;t seen before. I was articulating something, and in that moment of articulation, I added new layers. I started to refine specific parts. I noticed gaps.

The same thing happened when I wrote about it on LinkedIn. Some comments were what I expected - yes, makes sense, I agree. Not really adding depth. But there were one or two that hit differently. A perspective I didn&apos;t have. An angle I hadn&apos;t considered.

![](/images/posts/you-only-know-when-you-try/image-21.png)

This is 2.5D. You&apos;re still not in full reality. You haven&apos;t tried it yet. But you&apos;re no longer flat either. The act of talking, explaining, writing - it forces you to rotate the idea. You see it from angles that pure thinking doesn&apos;t reach.

### AI: Also 2.5D

When AI tools first became useful for daily work, everyone talked about speed. You can do things so much faster now. The model just writes it for you.

That never matched my experience. I wrote about this in my 2025 reflections - yes, on some measures, things got quicker. But that&apos;s not what actually changed. What changed is that things got broader.

AI lets me explore angles. Lots of them, quickly.

Take that jobs-to-be-done data modeling idea. I can take a model and say: here&apos;s a business, here are the metrics they care about, here&apos;s what we want to implement. Now here&apos;s my new approach. Let&apos;s play it out. What happens in the typical scenarios that occur when you work on a data stack for two years? Let&apos;s brainstorm how these could look with this approach.

And the model will run through scenarios. It&apos;ll find edges. It&apos;ll surface things I might not have thought about in that moment. This is valuable. This adds depth.

![](/images/posts/you-only-know-when-you-try/image-22.png)

But is it 3D? No. It&apos;s still simulation.

I&apos;m a big science fiction reader. And in almost every book with virtual reality, there&apos;s this moment where the simulation feels too flat somewhere. Something is off. The depth isn&apos;t quite right. That&apos;s what AI feels like to me right now - incredibly useful for rotating an idea, seeing it from many sides. But not the same as actually being in it.

### What&apos;s In The Gap?

So what&apos;s actually in that space between 2.5D and 3D? What does reality provide that simulation can&apos;t?

I have no idea. But I think it&apos;s something about consequences. When you&apos;re planning, talking, writing, or even running scenarios with AI, nothing is at stake yet. You can rotate the idea endlessly. But you&apos;re not committed. There&apos;s no week two of a consulting project where you realize it&apos;s wrong and you still have to deliver.

Reality has friction. It pushes back. And apparently, we need that pushback to truly see.

The peace I found isn&apos;t that I can skip the trying. I can&apos;t. None of us can. The peace is that I can stop blaming myself for not knowing beforehand. The information was never going to be enough. It couldn&apos;t be.

![](/images/posts/you-only-know-when-you-try/image-23.png)

So now the question changes. Not: how can I figure everything out before I start? But: how can I get to 3D faster? How can I try sooner, smaller, cheaper - so I learn what only reality can teach?

Maybe that&apos;s also where AI becomes most useful. Not to replace the trying. But to get through the 2D and 2.5D faster, so you have more time for what actually matters: the real thing.</content:encoded></item><item><title>Data Model Layers and Jobs</title><link>https://timo.space/blog/data-model-layers-and-jobs/</link><guid isPermaLink="true">https://timo.space/blog/data-model-layers-and-jobs/</guid><description>I had a hard time getting into data modeling. When I started building my first models for e-commerce use cases, I kept asking the same question: how do we actually do this? What&apos;s a good structure?</description><pubDate>Mon, 19 Jan 2026 00:00:00 GMT</pubDate><content:encoded>I had a hard time getting into data modeling.

When I started building my first models for e-commerce use cases, I kept asking the same question: how do we actually do this? What&apos;s a good structure? Unfortunately, the people I worked with didn&apos;t have a clue either. We had some classic education - we knew what a relational model was, we understood normal forms - but it always felt like these concepts were too primitive. They helped to some degree, but what I was missing was the big picture.

So I did what you do: I picked up books. On Amazon, I found the Kimball references. I thought the publication date was a mistake - 1990-something couldn&apos;t be right. There had to be something newer. There wasn&apos;t, so I got my first Kimball book. I learned about the star schema approach, fact tables, and dimension tables. It made more sense, but it still wasn&apos;t the big picture I was looking for. Interestingly, I never came across Bill Inmon&apos;s work back then, which probably would have helped more with the architectural view. Bad Amazon search, I guess, or bad me.

This drove me a little crazy. I have a computer science background, and in CS you&apos;re taught to think in higher-level concepts. Object-oriented programming. Functional programming. Pure functions. You have frameworks that help you think through different approaches, understand how they differ, see where they shine, and where they struggle. I couldn&apos;t find the equivalent for data modeling.

Later, I talked to more experienced people, got better book recommendations, and finally gathered enough material to develop my own version of the big picture. And here&apos;s the thing I realized: this seems to be pretty common in data modeling. When you talk to practitioners, yes, there&apos;s some overlap - but everyone has their own flavor. My usual joke when clients ask if my data models follow a standard approach: if you asked 10 data modelers to build a model for your case, you&apos;d see maybe 30-40% overlap. The rest is variation. Far more than you&apos;d expect from a mature discipline.

![](/images/posts/data-model-layers-and-jobs/image-11.png)

So when I stumbled across a LinkedIn post about layers a few years back, something clicked.

## Layers - the organizing principle most of us adopted

The LinkedIn post - and I&apos;m sorry I can&apos;t remember who wrote it - presented layers as a way to shape data until it&apos;s ready for a specific use case. Think of it like filtering water through different materials. Data flows through each layer, getting refined along the way. Not a perfect analogy since you don&apos;t end up with pure data at the end - it&apos;s still messy - but you get the idea.

The concept stuck with me. And once you start looking, you see layers everywhere.

When dbt became popular, they established their own layer approach: staging, intermediate, and marts. Then Databricks brought us the medallion architecture - bronze, silver, gold. The medallion version is particularly popular, and I think I know why: it&apos;s sales-compatible. Salespeople love it because it fits perfectly into a story. &quot;You start with messy data in bronze, refine it to silver, and end up with gold.&quot; Gold sounds great. Who doesn&apos;t want gold?

But here&apos;s the thing - whether it&apos;s dbt&apos;s three layers or medallion&apos;s three layers, it&apos;s still just three loosely defined buckets. And when you ask people to explain what actually happens in each layer, the answers start to diverge.

### What layers actually do for you

At their simplest, layers act like folders. Directories. They structure your data model into distinctive areas.

Take dbt&apos;s approach. Staging is the welcome mat - you decide what raw data to pull in and apply some basic transformations. Intermediate is where stuff happens, usually business rules, though what exactly happens there is often hard to pin down. Marts is where you shape things for your end application - maybe a star schema for your BI tool, with fact tables and dimensions.

This structure helps in practical ways.

When your data model grows and you need to add a new source - say, someone introduces a new marketing tool - you know where to start. You don&apos;t jump straight to creating a fact table. You begin in staging.

![](/images/posts/data-model-layers-and-jobs/image-12.png)

When you notice repeating patterns across your reports - the same calculation happening in three different fact tables - you can centralize it. Maybe that logic belongs in intermediate, so you&apos;re not repeating yourself.

And there&apos;s cost optimization. Raw data from tools like Google Analytics 4 is notoriously wide. Dozens of fields you&apos;ll never use. Every query against that raw schema costs money. A staging layer can slim this down - you decide what actually matters for analysis and filter out the rest. For anyone querying GA4 raw data directly, building even a simple staging model is usually the first step toward controlling costs.

So layers make sense. They give structure, they enforce some discipline, they help teams orient themselves in a growing codebase.

The problem is what happens when you look closer.

### Where layers fall apart

In the last year, I spent a lot of time teaching data modeling to two very different audiences: analysts who hadn&apos;t worked with data models before, and Gen AI models. Both experiences exposed the same weakness in how we talk about layers.

The analysts picked up the basics fast. Fact tables and dimension tables clicked immediately - it maps to how they already think. Measures and dimensional values, that&apos;s their world. Wide pre-joined tables? They get it. Precalculated fields make their BI work easier.

But then you try to explain the staging layer, and things get fuzzy.

&quot;We standardize stuff there.&quot;

&quot;What do we standardize?&quot;

&quot;Well, we align timestamps. Everything gets the same format.&quot;

They can follow that. But then they ask: &quot;Why do we do it there? Why not just do it on the fact table?&quot; And you say, well, it&apos;s good practice to centralize it because we might use it somewhere else. And they look at your model and point out: &quot;But we don&apos;t use this timestamp anywhere else. It&apos;s only in this one fact table.&quot;

The deeper you go into the underlying layers, the harder it gets to explain why things are where they are.

Teaching Gen AI models made this painfully clear. I&apos;ve spent enough time with agentic approaches to know how to break down work - how to plan, how to chunk tasks. But even with all that applied, the first data models I generated were not good. I&apos;d look at what ended up in staging and think: no, that doesn&apos;t belong there. It didn&apos;t fit my flavor.

And that&apos;s the problem. Flavor.

![](/images/posts/data-model-layers-and-jobs/image-13.png)

I tried to write context for the models - clear definitions of what should happen in each layer. This is usually what makes or breaks Gen AI output. But I couldn&apos;t write a staging definition that worked everywhere. Because it depends.

Here&apos;s what I mean. A staging layer can do two very different jobs.

A light staging layer does basic transformation. You establish naming conventions - all timestamps become `ts`, you normalize types, you standardize wording. Simple stuff.

A strong staging layer does something heavier. In one project, I work with 20-30 different advertising platforms. The staging layer there acts like a data contract. We define exactly how campaign data should look - these identifiers, this metadata, these measures - and we map every source into that shape. Naming conventions, field mapping, all of it happens there.

But does it have to happen in staging? No. You could do a light staging layer and then add a separate mapping layer. Both approaches are valid. There&apos;s no decision framework that tells you which to choose.

This is the core issue. Layers are loosely coupled. They&apos;re an organizing principle, not a strongly typed system. You can&apos;t write clear rules for when something belongs in staging versus intermediate versus somewhere else. Which is exactly why, when you look at how different teams implement the same layer concepts, you find wildly different results wearing the same labels.

## Jobs as an alternative framing

Before I go further, a caveat: what follows is still a thought experiment. I&apos;ve been tinkering with this for three or four months. It&apos;s not a finished framework. But it&apos;s been useful enough that I think it&apos;s worth sharing - and I&apos;d love to hear from others who are exploring similar ideas.

If you&apos;ve followed my work, what comes next probably won&apos;t surprise you. I have a product background, and sometimes you come across a concept that works so well in one area that you start applying it everywhere. For me, that&apos;s Jobs to be Done.

JTBD was the framework that finally explained product work to me. And if you dig into it - I highly recommend everything Bob Moesta has written - you realize it&apos;s not really a product principle. It&apos;s a general principle you can apply to many domains.

So when I went back to the drawing board with my Gen AI data modeling experiments, trying to figure out what context to provide, I didn&apos;t consciously think &quot;I should frame these transformations as jobs.&quot; I just started writing definitions for what each transformation should accomplish. And when I stepped back and looked at what I&apos;d created, I realized: I&apos;d been defining jobs the whole time. It&apos;s my atomic unit. Everything falls back into it.

### From &quot;where does it live&quot; to &quot;what does it do&quot;

The basic principle of Jobs to be Done is a change in perspective. Usually, we start from the asset - the product, the data model - and define everything from there. JTBD flips it. You start from the progress you want to achieve and work backwards.

In data modeling terms: instead of asking &quot;which layer does this transformation belong in?&quot;, you ask &quot;what job does this transformation need to accomplish?&quot;

The difference sounds subtle, but it changes how you communicate about your data model.

Take timestamps. In a layers framing, you might say: &quot;We handle timestamp normalization in staging.&quot; Okay, but why staging? And what exactly happens there?

In a jobs framing, you define a format alignment job. This job takes timestamps in whatever form they arrive - proper datetime formats, exotic string formats that need custom regex, Unix epochs, whatever - and normalizes them into one standard type that your database handles well. You apply this job early because you don&apos;t want to think about timestamp formats ever again downstream.

When you show this to someone, they immediately understand. &quot;Oh, we always align timestamps like this.&quot; There&apos;s no ambiguity about what&apos;s happening or why.
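To make the format alignment job concrete, here&apos;s a minimal sketch in Python. The function name, the accepted input formats, and the milliseconds heuristic are all my illustration, not a spec - real sources will bring more exotic cases than this:

```python
from datetime import datetime, timezone

def align_timestamp(value):
    """Normalize a timestamp from common source formats into a UTC datetime.

    Handles ISO-8601 strings, Unix epochs (seconds or milliseconds),
    and datetime objects. Illustrative sketch - exotic string formats
    would need their own parsing branches.
    """
    if isinstance(value, datetime):
        dt = value
    elif isinstance(value, (int, float)):
        # Heuristic: epoch values above ~1e11 are milliseconds, not seconds.
        seconds = value / 1000 if value > 1e11 else value
        dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
    elif isinstance(value, str):
        dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    else:
        raise TypeError(f"Unsupported timestamp type: {type(value)!r}")
    # Treat naive datetimes as UTC so everything downstream compares cleanly.
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)

aligned = align_timestamp("2024-01-05T12:00:00Z")
```

The point is not this particular function - it&apos;s that the job has one clearly stated piece of progress: after it runs, nobody downstream thinks about timestamp formats again.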

Or take a heavier example: the advertising platforms I mentioned earlier. Instead of saying &quot;we do mapping in staging&quot; or debating whether mapping deserves its own layer, you define a data contract job. This job takes source data from any advertising platform and maps it into a standardized shape - these identifiers, this metadata, these measures. The job definition specifies what the output looks like, what validation happens, how edge cases get handled.

The job doesn&apos;t care which folder it lives in. It cares about the progress it delivers.
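Sketched in plain Python, a data contract job could look like the following. I&apos;m using stdlib dataclasses to keep the sketch dependency-free (Pydantic would work the same way with validation on top), and every field name on both sides is invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class CampaignPerformance:
    """The canonical shape every advertising platform gets mapped into."""
    platform: str
    campaign_id: str
    campaign_name: str
    date: str          # ISO date, already aligned by a formatting primitive
    spend: float       # always in account currency
    impressions: int
    clicks: int

def map_google_ads(row: dict) -> CampaignPerformance:
    """One mapping per source: translate that platform's naming into the contract.

    The input keys are hypothetical flattened report fields, not a real API schema.
    """
    return CampaignPerformance(
        platform="google_ads",
        campaign_id=str(row["campaign.id"]),
        campaign_name=row["campaign.name"],
        date=row["segments.date"],
        spend=row["metrics.cost_micros"] / 1_000_000,  # micros -> currency units
        impressions=int(row["metrics.impressions"]),
        clicks=int(row["metrics.clicks"]),
    )
```

Adding a twenty-first platform then means writing one more `map_*` function against a shape that never changes - the contract is the stable thing, not the folder it sits in.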

Here&apos;s a practical takeaway, even if you don&apos;t adopt this framing fully: take each layer in your current data model and try to describe what it actually does in six sentences. Not generic descriptions like &quot;business logic happens here.&quot; Specific descriptions of what progress that layer delivers. If you can do that clearly, you&apos;ve essentially defined the jobs - you just haven&apos;t called them that yet.

### Light jobs, heavy jobs, and chaining

Not all jobs are the same size.

The timestamp alignment job is light. It does one thing, does it consistently, and you&apos;re done. Same for other formatting primitives - standardizing boolean representations, normalizing currency codes, cleaning up string encodings.

A data contract/mapping job is heavy. You&apos;re defining expected schemas, mapping multiple sources into a canonical shape, handling edge cases, validating outputs. It&apos;s a lot more work and a lot more complexity.

And here&apos;s where it gets interesting: light jobs often become components inside heavy jobs.

That timestamp alignment job? It&apos;s probably a primitive that lives inside your data contract job. You&apos;re not going to define timestamp handling separately for every source mapping - you use the primitive. Same with other formatting jobs. They become building blocks.

This creates a natural hierarchy. You have primitive jobs that handle atomic transformations. You have combining jobs that orchestrate primitives into something more substantial. And you have high-level jobs that deliver complete, usable outputs.
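The hierarchy can be sketched in a few lines of Python. All the names and rules here are invented for illustration - the point is the composition, not the specific transformations:

```python
# Primitive jobs: atomic, reusable transformations.
def normalize_currency(code: str) -> str:
    return code.strip().upper()

def normalize_bool(value) -> bool:
    return str(value).strip().lower() in {"true", "1", "yes", "y"}

# A combining job orchestrates primitives into one record-level transformation.
def clean_record(raw: dict) -> dict:
    return {
        "currency": normalize_currency(raw["currency"]),
        "is_active": normalize_bool(raw["is_active"]),
    }

# A high-level job delivers a complete, usable output from many records.
def clean_dataset(rows: list) -> list:
    return [clean_record(r) for r in rows]
```

In plain functions the hierarchy is trivial to express - which is exactly the tension with SQL-centric tooling described below.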

![](/images/posts/data-model-layers-and-jobs/image-15.png)

The challenge is that this doesn&apos;t map cleanly to how most data model infrastructure works today.

In dbt, the closest analog to primitive jobs would be macros. And I&apos;ve used macros this way - packaging up reusable transformation logic. But macros have problems. Jinja templating makes them hard to read. I&apos;ve worked with macros where the original author understood them perfectly, but nobody else could parse what was happening. They&apos;re a weird construct that pushes against what the platform wants to do.

So there&apos;s tension here. The jobs framing makes conceptual sense - you can see the hierarchy, you can explain it clearly - but the tooling doesn&apos;t natively support thinking this way.

### The challenges (order, scale, early days)

I want to be honest about what&apos;s still unresolved.

Layers give you something for free: order. Staging happens first, then intermediate, then marts. When you need to work on foundational stuff, you know to look in staging. When you&apos;re building outputs, you&apos;re in marts. The sequence is baked into the structure.

Jobs don&apos;t give you that automatically.

Yes, some jobs are naturally earlier - format alignment happens before you build entities. But the job type itself doesn&apos;t tell you where it sits in the sequence. You have to solve for order separately.

There are options. You could use naming conventions - prefix early jobs with `1_` or `base_`, later jobs with `2_`. You could organize by job type and accept that order is implicit. In my current setup, I&apos;ve landed on three high-level jobs - source mapping, entities, analytics - and the flow between them is clear enough that order takes care of itself. But I don&apos;t have a universal answer here. It&apos;s too early.

Then there&apos;s scale. I mentioned light jobs and heavy jobs, but the spectrum is wide. A timestamp alignment job is trivial. A full attribution modeling job - handling multiple touch points, different attribution windows, various models for different teams - that&apos;s substantial. How do you organize a codebase where jobs range from five lines to five hundred? I&apos;m still figuring that out.

And the honest truth is: I&apos;ve only been working with this framing for a few months. I&apos;m using it in my personal data stack. I&apos;ve seen it work for certain use cases - source mapping emerged as a clear job type almost immediately, and I&apos;d already been thinking in terms of entity models and analytical outputs. But it hasn&apos;t faced enough reality yet.

Every model has to face reality at some point. That&apos;s when you learn what actually works and what looked good on paper but falls apart under pressure. I&apos;m not there yet with this. What I can say is that the framing has been useful enough to share - and I&apos;m curious whether others have been tinkering with similar ideas.

## My practical experiment

About two months ago, when the dbt acquisition news hit and people started asking what a post-dbt world might look like, I decided to take the question seriously. Not because I think dbt is going away tomorrow, but because the question itself is interesting: if you started from scratch, what would you build differently?

dbt is a transformation orchestrator. It&apos;s agnostic - it doesn&apos;t care what SQL you run. That flexibility is a feature, but it also means dbt never enforced a strong opinion about what a data model should look like. Which is how you end up with those 600 to 1,000 model setups that nobody can navigate.

I wanted to try the opposite. Start with a strong opinion about what the output data model should be, and let everything else follow from that.

The principle I started with: fewer transformations are better. Every transformation and materialization introduces risk. The best data quality you can have is when you run no transformations at all - you just use the source data as-is. Obviously that&apos;s not realistic, but the principle holds. The fewer transformations you apply, the less risk you introduce.

So I built myself a stack. I removed dbt entirely and defined everything using Pydantic models. And what emerged was three high-level jobs.

**Three high-level jobs: source mapping, entities, analytics**

The first job is source mapping. This is the data contract work I described earlier. You have data coming from different places - HubSpot, Amplitude, your production database, advertising platforms - and you need to map it into a canonical shape. The source mapping job defines what that shape looks like and handles the translation from each source.

The second job is entity definition. This is where I land on what the core business objects actually are. In my case, since I work mostly with marketing and product use cases, the dominant entity is the account. Accounts are the things that bring money to your company. Accounts are what you analyze when you look at revenue performance. Everything else orbits around that.

I build my entity models to be very close to the business. And I consolidate aggressively. Where a traditional dimensional model might give you 30 or 40 tables, I aim for five or six strong entities. Denormalized, yes - but deliberately so. Each entity should be a solid, resilient representation of something the business actually cares about.
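As a rough illustration of what one of these consolidated entities could look like - every field here is invented, and the point is the deliberate denormalization, folding marketing, product, and revenue context onto one object instead of joining dimension tables at query time:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Account:
    """One deliberately denormalized entity instead of dozens of tables."""
    account_id: str
    name: str
    plan: str
    # Marketing context, folded in rather than joined at query time.
    acquisition_channel: Optional[str] = None
    first_touch_campaign: Optional[str] = None
    # Product usage, pre-aggregated onto the entity.
    seats_active: int = 0
    weekly_active_users: int = 0
    # Revenue - the thing accounts ultimately bring.
    mrr: float = 0.0
    is_churned: bool = False
```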

![](/images/posts/data-model-layers-and-jobs/image-18.png)

The third job is analytical definition. This is where you take your entities and shape them for specific analytical use cases. If you&apos;re familiar with Maxime Beauchemin&apos;s entity-centric data modeling approach ([https://preset.io/blog/introducing-entity-centric-data-modeling-for-analytics/](https://preset.io/blog/introducing-entity-centric-data-modeling-for-analytics/)), this is similar territory. You&apos;re deriving analytical datasets from your core entities - adding retention windows, building in the dimensions you need, preparing the data for the questions people actually ask.

What I found interesting: when you work with Pydantic to define all this, you naturally start building primitives. Timestamp handling, field validation, common transformations - these become reusable components that the higher-level jobs consume. The hierarchy I described earlier - primitive jobs feeding into heavier jobs - emerged organically from the structure.

The source mapping job wasn&apos;t there on day one. It emerged quickly once I started connecting multiple data sources. The analytical model concept I&apos;d already been working with. The entity-centric approach has been my bias for years. So the three jobs aren&apos;t arbitrary - they&apos;re what fell out of trying to build something coherent.

### What&apos;s still open

I&apos;ve been using this approach for three or four months now. It&apos;s working for my personal data stack, and it&apos;s been useful when thinking through client problems. But that&apos;s not enough time to know what breaks.

The source mapping job proved itself quickly - as soon as you have multiple sources feeding the same entity, you need it. The entity-centric approach I&apos;ve believed in for years, so that wasn&apos;t new. The analytical layer was already how I thought about outputs. But the whole system together? It hasn&apos;t faced enough reality yet.

Some specific things I&apos;m still uncertain about:

How does this scale with a team? I&apos;ve been the only one working in this stack. What happens when three people need to collaborate? Do the job definitions stay clear or do they drift into the same flavor problem that layers have?

Is three the right number of high-level jobs? It&apos;s what emerged for my use cases, but that doesn&apos;t mean it&apos;s universal. Maybe some domains need four. Maybe two would be enough for simpler setups.

I am pretty close to DataKitchen&apos;s FITT architecture ([https://datakitchen.io/fitt-data-architecture/](https://datakitchen.io/fitt-data-architecture/)), but how can I get better at testing, which they propose as a core requirement?

How do you handle jobs that don&apos;t fit cleanly? I mentioned attribution as a business rules job - but attribution touches source mapping, entity enrichment, and analytical outputs. Where does it actually live? I have a first answer for my setup, but I&apos;m not confident it generalizes.

And honestly, is this actually better than well-defined layers, or just different?

Here&apos;s what I&apos;d suggest if any of this resonates: take your current data model and try to describe each layer in six sentences. Not generic descriptions. Specific statements about what progress that layer delivers, what jobs it accomplishes. If you can do that clearly, you&apos;ve already done the hard work - whether you call them layers or jobs doesn&apos;t matter much.

And if you&apos;ve been tinkering with similar ideas - different ways to think about data model structure, alternatives to the layer paradigm, experiments with Pydantic or other approaches - I&apos;d love to hear about it. This is early. The more people poking at the problem, the better.</content:encoded></item><item><title>2025 - Learnings &amp; Thoughts</title><link>https://timo.space/blog/2025-learnings-thoughts/</link><guid isPermaLink="true">https://timo.space/blog/2025-learnings-thoughts/</guid><description>I am a bit late to this category of posts, mostly because I had no real idea how to approach it.</description><pubDate>Fri, 02 Jan 2026 00:00:00 GMT</pubDate><content:encoded>I am a bit late to this category of posts, mostly because I had no real idea how to approach it. Usually, I would write an industry look back and prediction, but honestly, everything is in such a state of flux that it did not feel right.

For myself, I always do a year recap and next-year planning, but that contains a lot of personal stuff. This year felt transformational, though, so I decided to write a public personal version.

I usually don&apos;t write that kind of stuff, so be nice to me, and maybe you&apos;ll find something interesting.

## AI/agentic transformation of work

### The capability leap

Let me start with a concrete example. In December 2024, I created a game for my daughter to reveal her Christmas present. She had to play through an Asteroids clone to see what she was getting. Fun idea, worked as intended.

I built it using one of the early coding agents, Replit&apos;s automatic coding agent. And it was impressive. Half a year earlier, similar tools produced results that weren&apos;t really promising. This felt like a glimpse into what&apos;s possible.

But let&apos;s be honest. It was buggy. Getting those bugs out would have taken a long time. The game was good enough for a one-time Christmas reveal, not much more.

![](/images/posts/2025-learnings-thoughts/image.png)

Now I look at what I&apos;m doing with coding agents in December 2025, and it&apos;s an entirely different world. The progress in this space has been so significant that it defines a new chapter for how I work.

I know some people will flinch reading this. There&apos;s so much stupid stuff written about AI. So much ridiculous advice that you shouldn&apos;t follow. I get it. Take my perspective with a grain of salt as well. You have to find your own angle.

It&apos;s not a super weapon. But the capabilities have increased substantially in 2025. And my adoption of these capabilities has increased with it. When I compare how I work now to January 2025, about 80% has changed. This was a transition year in which I began relearning a different way of working.

I&apos;m curious to see if 2026 brings more fundamental shifts or if it&apos;s mostly iteration on what happened this year.

### Everything is a tool (the fundamental reframe)

This might sound obvious. But it was such a strong reinforcement that it changed how I approach things fundamentally.

Everything is a tool. And tools come with specific requirements.

The first requirement: you have to learn to use the tool. This was the essential thing with everything I did with AI this year. It was a complete relearning. Not incremental improvement. Learning a new tool from scratch. Understanding how to actually use it.

This relearning took much longer than I would have predicted.

I had to force myself into it. The way I did this was by taking something I can do pretty well, building a data stack with data infrastructure, something I&apos;ve done enough times to know how it should look. And then forcing myself to implement it purely using agentic mode with Claude Code. No manual intervention allowed. No &quot;let me check the code and adapt something.&quot; No, &quot;this is actually not right, let me fix that.&quot; Everything had to go through the agent.

![](/images/posts/2025-learnings-thoughts/image-2.png)

The first two implementations were painful. It took three times as long as doing it myself. Or even doing it in an AI copilot way. I rebuilt things so many times. It screwed up so many parts. At some point, it was really painful to watch.

But that was the learning process.

Along the way, I learned which kind of context I have to provide and in which way. How much I have to break down the things I ask for. What the process should look like when working on a specific kind of feature.

You might remember the early criticism. People saying &quot;I asked Claude to do XYZ and it totally sucked, so the whole thing sucks.&quot; Well, yes. You buy a new tool, you don&apos;t know how to use it, you try to drill a hole. Usually doesn&apos;t end well. Same here.

The second part of the tool framing: not every tool works for every job. Not every tool is the best for a specific job. This sounds obvious too. But it matters when you&apos;re figuring out where AI fits and where it doesn&apos;t.

### The real productivity gain isn&apos;t &quot;4x faster&quot;

When I figured out how to build a data stack mostly agentic, it didn&apos;t suddenly make me four times more productive. That&apos;s not what happened.

What it gave me was the possibility to go broad. To question things I wouldn&apos;t have questioned before because of resource constraints.

Here&apos;s the reality of classic consulting work. I have an agreed budget. The budget is based on my experience. So the project runs in a tight environment that doesn&apos;t allow me to say &quot;let&apos;s see if I can do this model differently.&quot;

This was always painful for me. I love iterations. Build something, look at it, play around with it, run it in production, see some flaws, and then say &quot;actually, this doesn&apos;t do the job well enough, why don&apos;t we change this part completely?&quot;

Not possible in a classic data setup project. It would require telling clients they have to pay three times the price because they&apos;re not just getting my first implementation. They&apos;re also getting my iterations to make it better. Hard sell. &quot;I know really well how to build it, but every setup needs some tweaks, so maybe you have to pay up.&quot; Not impossible to sell, but definitely harder.

Now with agentic implementation, I can actually do this. Same budget as before. But I can offer the same setup with more iterations on my end.

One project this year, I didn&apos;t have a blueprint. Had some ideas from similar projects, knew I wouldn&apos;t implement it the same way because I&apos;d seen the flaws. I implemented my first idea. Then I iterated. Took two significant implementation shifts along the way. All within the budget I agreed on.

Because I could go broad. I could question things.

![](/images/posts/2025-learnings-thoughts/image-3.png)

I could question parts of my data model from a tool perspective. &quot;Do we actually need this layer? Does this model approach actually do the job I want it to do?&quot; Turns out, a lot of times, no. It doesn&apos;t do the job well enough. So I refine it.

And it made me better at thinking about a job more intensively. What is the job of a staging model?

Before, someone would ask &quot;why do we have a staging model?&quot; and I&apos;d say &quot;it&apos;s a common approach, best practice.&quot; Now I can say &quot;let&apos;s run an experiment without it and see how it works.&quot;

This is the real gain. Everything becomes a flexible tool. Question its role. See if it&apos;s still the right use case. Or don&apos;t use it at all.

### Building is great again

I come from product. My first and true love is still product. Data is the second, but it could never match the feeling I have when I actually work on products.

A lot of things drove me away from product work. That&apos;s why I ended up in data and analytics. But here&apos;s the interesting thing: coding agents are taking away some of the parts that pushed me out.

In classic product setups, you have a very narrow window for solutions. Lean development, agile iterations, all that stuff is true in theory. In practice, it rarely happens. There&apos;s too much organizational pressure not to iterate. Investments in features have to pay off before you start working on them again.

It&apos;s super hard to actually take time to iterate until you have something really good. And I knew I couldn&apos;t be a person who finds something that works well for users in just two or three iterations. Almost no one can.

That always drove me crazy.

Now with coding agents, this changes. I can go into very different directions. I can experiment with approaches, not just ship and move on.

One thing I built this year, just for internal use so far: a different data stack. Not using dbt. Pydantic models under the hood. Everything is typed. It still has some layered data modeling, but it&apos;s more optional. We only do staging if we need staging. It looks more like function steps than &quot;another transformation.&quot;

The only thing that came fixed was my paradigm, what I wanted to achieve. Then I tested different ways to implement that paradigm. Eight weeks of playing around with different approaches. Huge fun.

What I actually built: you define entities with activities in a Pydantic model. The model has primitives for how it maps source data to these entities. Then I created what Pete nicely called a compiler. I wouldn&apos;t call it that because it&apos;s a poor compiler. But it compiles the Pydantic models into Ibis code. The Ibis code can then work on any kind of source or destination system. Very flexible.
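To give a feel for the compile step, here&apos;s a toy sketch - emphatically not the real thing. It compiles a typed entity spec into a SQL string instead of Ibis expressions, and all the names (`EntitySpec`, `compile_sql`, `raw.app_users`) are invented for illustration:

```python
from dataclasses import dataclass

# Hypothetical declarative spec: an entity plus how source columns map to its fields.
@dataclass
class EntitySpec:
    name: str
    source_table: str
    mapping: dict[str, str]  # entity field -> source column

# Toy "compiler": turns the spec into a SQL SELECT statement.
# The real version described here emits Ibis expressions instead,
# so the same spec can target many source and destination systems.
def compile_sql(spec: EntitySpec) -> str:
    cols = ", ".join(f"{src} AS {dst}" for dst, src in spec.mapping.items())
    return f"SELECT {cols} FROM {spec.source_table}"

users = EntitySpec(
    name="user",
    source_table="raw.app_users",
    mapping={"user_id": "id", "signed_up_at": "created_at"},
)
```

Because the spec is just typed data, it can be validated, tested, and regenerated for a different backend without touching the model definition.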

Two things make this great. First, it gives me a much more resilient data model to work with. Super easy to test. Second, agents love types. Everything is typed, so agents work really well with it.

This brought my personal joy back.

I&apos;m working on something I&apos;ll publish very soon. A product I always wanted to exist. Too niche for anyone else to build. But not too niche to be sustainable. So I&apos;m building it myself.

![](/images/posts/2025-learnings-thoughts/image-4.png)

Next to it, I&apos;m building some internal tools. A Mac application that&apos;s basically a crossover of Claude Code and n8n. Right now it generates my blog images and YouTube thumbnails. Helps me save interesting stuff I find on the internet. Already becoming a useful workbench.

The point is: I have fun building again. I now have the tools to build products how I always wanted to build them.

I suck at building products together with other people. I can make compromises in other areas of my life, but when I build stuff, I&apos;m very bad at making compromises. Now I have a setup that doesn&apos;t require me to.

I talk to other people who are doing similar things. This isn&apos;t an isolated event. It&apos;s a massive one. I&apos;m curious what comes out of this in the next few years.

### The brain adapts (but needs different things)

This one is more personal. Maybe I&apos;m just seeing it wrong, some weird self-awareness issue. But I noticed something when I started experimenting with agentic workflows.

I was working on four different projects at the same time. Not doing the manual implementation myself. Giving feedback, refining context, improving prompts, providing feedback for next steps.

I got exhausted much faster. My energy to work ran out after an hour and a half. That felt really weird. Usually I can go for longer time spans. I can work deeper for longer.

My brain wasn&apos;t used to this. I was doing stuff I wasn&apos;t used to doing.

Maybe the reason is that I always avoided delegation work. I was never good at it. For someone in a classic manager role who delegates a lot, this type of work might feel natural. For me, it didn&apos;t.

But it turned out to be an adjustment. After pushing through and keeping the same way of working, the brain started to adapt. Now I can more easily work this way.

Still, my way of working is changing significantly.

I have to take different breaks. I have to free up a lot more space to think about things.

The setup of constant information inflow that I had before - social media, news, other stuff, lots of information coming in while I still tried to find something new in it - doesn&apos;t work for me anymore. People who made this change earlier would say &quot;of course.&quot; For me, it needed this specific push to get there.

I need more space where I just think about things.

![](/images/posts/2025-learnings-thoughts/image-5.png)

My content approach is shifting too. Before, when I wrote a blog post or created a video, it was very hands-on. &quot;Here&apos;s how you actually solve something.&quot; I&apos;m still doing that. But now the full loop - thinking about a topic, distilling it down, writing it out or creating a video, presenting it to people, getting feedback, incorporating that feedback, doing another iteration - has much more value for me.

I&apos;m also starting to read more. Haven&apos;t done that for five or six years. I was an avid reader before, but I never felt the space for it. Couldn&apos;t slow down.

Why is this possible now? AI can do the implementation work that I usually had to speed up to get done. I can pass it on. &quot;You do the implementation, I&apos;ll check the results, we&apos;ll see if it works out.&quot; That gives me the space to step back and think more.

I&apos;m not a deep thinker. I&apos;m more a variation thinker. I love going into different variations of things. This setup gives me the possibility to do that again.

## Creating things (book and courses)

### Writing and publishing a book

I started my book in 2023. Wrote it throughout 2024. Finally published it in 2025.

You can get the PDF [here](https://timodechau.com/book/), and I am now releasing it as a free web version module by module: [https://timodechau.com/books/analytics-workbook/](https://timodechau.com/books/analytics-workbook/)

![](/images/posts/2025-learnings-thoughts/image-6.png)

One decision was essential to make it happen: pre-publishing. I made the book available when I had the first 100 pages. People could buy it early, get updates as I wrote more.

This was essential to actually finish. Without that pressure, I would have never completed it. I would have started, reached a point where I thought &quot;maybe this isn&apos;t the right thing to do,&quot; put it aside. Maybe pulled it out again later. But I&apos;m not sure I would have actually brought it out.

But I won&apos;t do it again.

Two major downsides.

First, awful user experience. People get a version they know isn&apos;t finished. They get updates after two weeks, maybe four weeks. I had planned a strict two-week rhythm. That turned out to be impossible. The time periods became irregular, which destroyed the experience.

People make notes in the PDF. New version comes out, they can&apos;t transfer their notes. Others missed updates entirely. &quot;You wrote about this?&quot; &quot;Yes, one of the latest updates.&quot; &quot;Sorry, didn&apos;t see it.&quot;

Second, it takes away the possibility to refine. I planned the book too big. The initial idea was too ambitious for one book. It ended up at almost 600 pages. One page text, one page illustration, so it doesn&apos;t feel like 600 pages. But still massive. It covers four different sections that each could have been their own book.

The book became a fundamental book even though I didn&apos;t want to write a fundamental book. If I hadn&apos;t pre-published, I would have made different decisions along the way. Probably split it into four books: theory, design, implementation, governance. That would have worked better.

But that&apos;s part of the process. I already have a clear idea for the next book and know how to do it differently.

Self-publishing is still my model. Having &quot;O&apos;Reilly author&quot; in your bio has lost some value. Lots of people have it now. It&apos;s not worth compromising on how I want to write. I build my products on my own. I&apos;m not good at building together with others.

The book sold over 300 copies. Not massive, but solid. I got really nice feedback from people who learned essential things. Someone consumed the whole book through NotebookLM, just chatting with the content. That gave me the idea to create tracking plans with the book. Worked, but didn&apos;t cause a massive run on sales.

One thing is clear: a book gives you more authority than any other format. I do YouTube videos, blog posts, LinkedIn posts, had a podcast. They all give you a different kind of authority. But a book gives you &quot;he&apos;s the one for this&quot; positioning. Nothing else does that.

If you&apos;re thinking about writing a book, I highly recommend it. Don&apos;t overthink it. Find a very narrow angle so you can keep it short. Then go on the journey.

### Creating and selling courses

This chapter isn&apos;t finished yet. Just some early learnings.

I went into 2025 with the idea to create three or four courses. Take a narrow topic, create something sophisticated, make it available. Stack them up. The more courses, the more it compounds into a nice passive income.

The idea is still valid. The problem was I got distracted.

I talked to some people, discovered others were interested in collaborating, and switched modes. Started creating courses in collaborations without recognizing the significant overhead that comes with it.

I learned there are two types of collaboration.

The first type: you both have some expertise or audience, you bring them together. I did one of these. The product was good enough to publish. But I can definitely do better. With this type, the sum of the parts is just the sum. Nothing more. Like those books where different authors write their perspective on a topic. Useful collection, but no magic.

The second type: you actually riff on each other&apos;s ideas. Extend them, bring them into different contexts. This can create something much better than the single parts. The collaboration I did with Barbara was like this. We expanded our knowledge because we did it together. The course we created was really good.

What paused the whole course thing was marketing.

My basic idea: create four or five assets, find a nice way to market them, generate around 100 sales a month. The more courses, the more it stacks up.

Turns out marketing for courses is much harder than I thought.

![](/images/posts/2025-learnings-thoughts/image-7.png)

I took a course about it. The main insight was &quot;if you have a massive audience, you can definitely sell courses.&quot; No shit. Obviously, I have a good audience, but it&apos;s a niche audience. That advice doesn&apos;t help.

I&apos;m still experimenting. Currently reading &quot;Simple Marketing for Smart People.&quot; What they describe matches my struggles pretty well. I&apos;ll try their approaches in 2026.

One good thing: because I&apos;ve created a lot of this type of content now, I have an efficient production workflow. Especially when I do it on my own. I still have one collaboration course I want to bring out in 2026. I know that collaboration will work well, like the one with Barbara. And we already have a better angle for it.

More to figure out. But I still believe there&apos;s real value in courses. I buy them myself when they fit a topic I&apos;m interested in. Marketing is just not easy. Even with a significant audience.

## Working with others

### Collaborations - finding what works

Maybe this doesn&apos;t apply to a lot of people, but it&apos;s essential for me.

I work alone. That&apos;s my default. One of the reasons I became self-employed was to test if I could do stuff on my own. It&apos;s not that I don&apos;t like working with others. But I definitely don&apos;t thrive in heavy team setups.

I tried different collaboration models throughout the years. Most didn&apos;t work out. The major mistake I always made: I wasn&apos;t good at picking the right people to work with. Not that these people were bad to work with. Our abilities just didn&apos;t match well. Same pattern when I had to hire in company roles. I could see I wasn&apos;t good at it.

So one thing I set for myself in 2025: be deliberate. Be careful. No jumping into things without being sure how they work out.

This approach worked a lot better.

I mostly did collaborations that were time-based. Always clear that we could say &quot;this works&quot; or &quot;this doesn&apos;t work.&quot; I could adjust when things weren&apos;t going well.

One collaboration that worked really well was the one with [Barbara](https://www.linkedin.com/in/barbara-galiza/). It started in 2025 and built up slowly.

We did a public course together. Did live workshops together. We have a weekly check-in where we discuss things we&apos;re working on. Often, these calls turn into general stuff where we exchange ideas about projects we&apos;re both struggling with.

You could see it adding up. Slowly but consistently. More and more layers.

![](/images/posts/2025-learnings-thoughts/image-8.png)

At some point we said &quot;maybe we want to do one or two projects together.&quot; Started doing them. Realized it actually works really well. We&apos;re very complementary in how we do things, but still have a lot of overlap in our views.

We&apos;ll do more in 2026. I&apos;m looking forward to it. You can see some of it already here: [https://fixmytracking.com/](https://fixmytracking.com/) - more of this to come.

After struggling to find the right model for years, here&apos;s my takeaway: working together with people can take time. That&apos;s okay. Don&apos;t rush into collaborations saying &quot;oh my god, such huge potential.&quot; Even when it looks like huge potential from the outside.

The real potential shows when you actually work together. Let it develop.

Also need to mention four other collabs here: with [Ergest](https://www.linkedin.com/in/ergestx/), we have a three-week cadence for our check-in calls. These calls always leave me with new ideas and refinements - and just happy.

[Juliana](https://www.linkedin.com/in/juliana-jackson/) and I had big plans for 2025 - building a community together. But reality got the better of us, which is fine. Still, I enjoy every minute we find time to chat about things. And we still have our course, which we will make happen in 2026.

And I am happy that [Pete](https://www.linkedin.com/in/petefein/) and I found more time together by the end of the year. These calls are super deep and extremely nerdy - and similar fuel for new ideas (like building a product).

And finally, every exchange with [Robert](https://www.linkedin.com/in/robertsahlin/) (I know that you are not such a call person) is great - about anything from agentic coding to data platforms.

And beyond that are the calls and chats with other people I had in 2025 - you all know who you are - thanks for these and your feedback and ideas.

### Having a coach

This started in 2024 but really expanded in 2025. Working with Stefan: [https://revolutioncoaching.de/home/en](https://revolutioncoaching.de/home/en)

One thought I have about this: I should have done this very early in my career. And then continuously. Always have one or two coaches.

I&apos;ve been working with Stefan for over a year now. It helped me significantly to work on some fundamentals.

Everything positive I describe here, getting better focus, getting clearer on priorities, knowing where I want to go. I think this was only possible because of the work we invested before.

The interesting part is how differently it turned out from what I expected.

When I started looking for a coach/mentor at the end of 2024, I was looking for business advice. I thought my major weakness was my entrepreneurial mindset. I&apos;m not the kind of person who is really good at going after the money. Making a lot of money. So I thought I needed someone to help with that.

It turned out completely differently.

We started the mentorship and almost immediately went in a different direction. We worked a lot more on fundamentals. How do I see myself? What kind of goals am I actually setting?

In the end, I learned that I&apos;m actually not that bad at the entrepreneurial stuff. I just had a very unusual way of doing things. The mentorship helped me look better at things I had already achieved. And then from there, find the next step.

If I could recommend this to my former self: start two or three years into working. Always have someone to check in with about where you&apos;re going.

I had informal versions of this before. Managers or other people within companies who acted like that to some degree. But it was never formalized. And especially when I became self-employed, that part was definitely missing.

I&apos;m happy I added it. I&apos;ll definitely keep it. I have some people I can check in with on different topics now. Getting different perspectives. Having someone help you take a step back.

## Personal operating system

Someone like me will never become a good focuser. It&apos;s not my personality. 2024 and 2025 were years of exploration. An idea came, I went deep on part of it, a new idea came across, I added it. Testing a lot of directions.

I have a stupid constraint: I have to actually do things to know if they work. I can&apos;t just look at something and predict how it plays out. I would love to be one of those people who can see something, visualize it, and already get a good idea of how it will develop. That doesn&apos;t work for me. I actually have to do the thing.

So I tested a subscription model for specific content. Learned what works well: the connection to people who subscribe. Learned what doesn&apos;t work well: the pressure to continuously deliver value. I expected it to be challenging. I needed to see how challenging. And if I could work around it or not.

The problem in 2025 was testing too many of these things. Creating too many isolated islands that don&apos;t play together.

You can test as long as everything moves in the same direction. But when I look at my content strategy, there&apos;s no strategy. I pushed out so many different types of content with no clear structure of where they should end up. I never really found that structure.

In the second half of 2025, I invested a lot more time in sessions with myself to think.

AI helps here. It&apos;s a good soundboard. You can throw things against it, see different angles. I created a specific setup that knows about my weaknesses and strengths.

Now I have the idea of what I want to achieve in the next few years. I&apos;m getting better at breaking it down. Every new idea, I can run it against the path I have in mind. Does it actually fit? Or does it create too much overhead? Does it move me away from what I actually want to achieve?

That was really helpful.

For 2026, I reduced it to three monetary focus areas. I would prefer two. That&apos;s something I&apos;ll work on. Maybe I can get rid of one.

Everything else feeds into these three outcomes: sign up for the product I&apos;m building, hire me and Barbara to build a better marketing data stack, or hire me for advisory work with your data team or product team to get a better product analytics setup.

At least I have clear outcomes now. I&apos;ll check along the year if these three should still be three. Two would be fine for me. I might end up with one. But one would mean not hedging bets. Giving up some freedom. I don&apos;t know if I want that.

Good to get clarity on this.

If you&apos;ve read this far, thanks for staying that long. I usually don&apos;t write personal posts, but from time to time, it helps to sort things. And maybe 1-2 things are helpful for you as well. Let me know (via email).

Have a great 2026.</content:encoded></item><item><title>Design-driven Analytics</title><link>https://timo.space/blog/design-driven-analytics/</link><guid isPermaLink="true">https://timo.space/blog/design-driven-analytics/</guid><description>When I talk to people about their data challenges, we usually get to the pain points pretty quickly.</description><pubDate>Tue, 09 Dec 2025 00:00:00 GMT</pubDate><content:encoded>## The symptoms aren&apos;t the problem

When I talk to people about their data challenges, we usually get to the pain points pretty quickly.

&quot;We have a real issue with data quality. It&apos;s just not good enough, and we&apos;re not making progress.&quot;

Then there&apos;s the trust problem. &quot;People don&apos;t trust our data.&quot; Sometimes this comes right after they&apos;ve fixed the quality issues - which makes it even more frustrating. You fix the thing, and people still don&apos;t believe in it.

And then you get to literacy. &quot;I think our real problem is that the company doesn&apos;t know how to work with data. We have all this stuff, but nobody actually uses it properly.&quot;

There&apos;s a whole collection of these. Data quality. Data trust. Data literacy. Data adoption. Each one gets positioned as _the_ reason why data isn&apos;t working.

My answer to all of this is usually the same: I think you have a design problem.

* * *

Short break:

I am running my final free workshop this year about building a roadmap for better product and growth analytics. How can you turn your current setup into something that shows you, with a few metrics, whether your product is converting new users into regular and happy users:

[Build Your 2026 Growth Intelligence Roadmap (Free Planning Workshop) · Zoom · Luma](https://luma.com/prux6etc) — Design your implementation plan before budgets lock in January. December is planning season. While others plan more dashboards nobody uses, you could plan…

* * *

What I mean is that the data setup itself has a root issue. It was designed with too many flaws from the start. The quality issues, the trust issues, the adoption issues - these are symptoms. They&apos;re what you feel. But they&apos;re not the cause.

The cause is that someone built the whole thing pointing in the wrong direction.

Most data setups start with a very generic goal. It usually sounds something like this:

&quot;We have so much data from all these different systems. We should use it to create better products. Or improve our marketing. Or make our operations more efficient.&quot;

There are two buckets here. On one side, there&apos;s all the data you have or could have - the potential. On the other side, there&apos;s all the things you do that could be done better - the opportunity.

And there are stories that connect these two worlds. Factory optimization after the war. Tech companies that seem to have figured it out. Some of these stories are true. There are definitely companies that found patterns in their data and used them to become more effective, to focus their resources better.

But here&apos;s the thing: the bridge between these two worlds is really hard to build. And most data projects don&apos;t even try to build it. They just build the left side.

&quot;Let&apos;s collect everything. Revenue data - pipe it in. Ad platform data - pipe it in. Web tracking - well, let&apos;s just track all pages, all button clicks, everything. Then we bring it together and figure out how to support the business.&quot;

It doesn&apos;t work. Because it starts on the wrong side of things.

The logic feels reasonable. You don&apos;t know exactly what you&apos;ll need, so you collect broadly. You build the infrastructure first. Then you connect it to the business later.

But &quot;later&quot; has a way of never arriving. Or when it does, you realize the data you collected doesn&apos;t actually map to the decisions you need to make.

You have pageviews, but you can&apos;t tell which users are getting value from the product. You have button clicks, but you can&apos;t connect them to revenue. You have data from five systems, but the identifiers don&apos;t match up.

The bridge between data and operations was never designed. It was just assumed it would appear once enough data was in place.

![](/images/posts/design-driven-analytics/image-1.png)

It doesn&apos;t.

## Two ways to fail

So if starting from data doesn&apos;t work, what does?

Before I get to that, I want to show you both extremes. Because it&apos;s not just the data-first approach that fails. The opposite fails too.

Customer data platforms are a perfect example of the data-first trap.

They had their peak maybe five to seven years ago, when they were one of the big new things. The promise was powerful but vague - finally, we can do something magic with all our customer data. What that magic actually meant was never quite clear. But the market is still around. I still do CDP projects.

The project always starts the same way. You identify all the places where customer data lives. One CRM. Maybe two. Actually, three. Then there&apos;s the email marketing tool - customer data there too. Web tracking, billing system, support tickets. It&apos;s everywhere.

So you start bringing it together. And then you discover all the nasty bits.

The data doesn&apos;t get along. Different systems, different structures. Identifiers are a mess - same customer shows up five different ways. You need an identity graph now. That&apos;s a project in itself. Data quality varies wildly across sources. One customer has twenty address entries, all slightly different. Which one do you pick? That needs rules. The rules need exceptions. The exceptions need documentation.
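What &quot;rules with exceptions&quot; means in practice can be sketched in a few lines. This is a minimal Python survivorship rule for the duplicate-address problem; the field names, source systems, and priority order are all invented for illustration, not from any real CDP:

```python
from datetime import datetime

def pick_address(candidates):
    """Pick one surviving address from duplicate entries.

    candidates: list of dicts with keys street, city, updated_at, source.
    Illustrative rule: prefer complete records, then the newest update,
    then the more trusted source. Every step here will grow exceptions.
    """
    complete = [c for c in candidates if c["street"] and c["city"]]
    pool = complete if complete else candidates
    trust = {"crm": 2, "billing": 1, "web": 0}  # higher wins ties
    return max(pool, key=lambda c: (c["updated_at"], trust.get(c["source"], -1)))
```

Twenty address entries compress into one line of policy - and that policy still needs exceptions and documentation, which is exactly how these projects absorb a year.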

You can keep a small team busy for a year with this. Easily. Always improving, enhancing, adding another source, fixing another edge case. There&apos;s always one more thing to clean up before it&apos;s ready.

And the work feels important. It is important, in a way. You&apos;re building foundations. You&apos;re creating a single source of truth. These are real things that matter.

But here&apos;s what&apos;s missing: none of this has been connected to any actual business process yet.

Then finally, after all that, you connect it to something.

You use it to send newsletters.

![](/images/posts/design-driven-analytics/image-2.png)

The thing is - you were already sending newsletters before. The customer data platform didn&apos;t unlock that. It just made it more expensive.

I don&apos;t think you need to be deep into math to see that spending a year of engineering time to send newsletters is not a great return on investment.

The platform was built without operations in mind. The design started on the wrong side.

Now let&apos;s look at the opposite extreme.

When you work with young startups - especially ones with a very active, very smart growth team - you see a completely different approach. These teams are hardcore operational. They run multiple initiatives at the same time. In a six-week period, they might test three or four different ways to get more users into the product, improve stickiness, maybe add referral mechanics.

Two or three people running two or three bigger operations each. Content initiatives, podcasts, paid campaigns, partnership experiments. High energy, high speed.

And each initiative builds its own data setup.

How much data they collect depends on the initiative and the person running it. Some do more, some do less. Some are comfortable building out tracking and dashboards, others less so. But in the end, every operation has its own little data world that supports it.

Here&apos;s the thing: this actually works. Often it works really well. I&apos;ve seen these setups deliver serious results. Fast feedback loops, quick iteration, real growth.

So what kills it?

At some point, the initial initiatives start to hit diminishing returns. The obvious wins are captured. And someone says: maybe we should align things. Bring it all together. Because right now, we&apos;re probably wasting time and money running these things in parallel. If we consolidate, we might see which initiatives actually matter. We might find patterns across them. We could be more effective.

So they start to build a unified data setup.

And that&apos;s when they realize they have twenty different data structures. Built by different people, for different purposes, with different assumptions. Identifiers don&apos;t match. Definitions don&apos;t match. Nothing was designed to connect.

![](/images/posts/design-driven-analytics/image-3.png)

Now you&apos;re on the opposite side of the problem. You have highly energized operations, but no foundations underneath. You&apos;re trying to build the bridge while the trains are already running.

Sometimes it works. Often it doesn&apos;t. Either way, it costs enormous energy and overhead. You&apos;re paying for the missing design work - just paying later, with interest.

Both approaches fail for the same reason.

Data-first gives you foundations without operations. You build infrastructure that never connects to the business.

Operations-first gives you operations without foundations. You build momentum that can&apos;t be consolidated.

Neither one designs the connection from the start. The bridge between data and operations is just assumed to appear at some point. It doesn&apos;t.

## The middle path

Now here&apos;s where I have to be honest.

What I&apos;m about to present is not some mind-blowing simple model that magically brings these two extremes together. I don&apos;t have all the answers yet. There is still much I need to work through.

But I do think there&apos;s something in the middle. And I think we should spend more time there instead of defaulting to one of the extremes.

The middle path is a combination of both approaches. Not one, then the other. Both at the same time, from the start.

Every initiative you run needs two things:

First, a clear operational focus. You have to identify a specific process within the company - something where improvement would have a significant impact. Not &quot;better marketing&quot; but a specific channel. Not &quot;improve the product&quot; but a specific area, a specific user journey, a job to be done.

Second, a fundamental architecture that can scale. You need a data model that doesn&apos;t collapse under its own weight as you add more use cases. Something that keeps complexity at bay while the scope grows.

Most projects only do one. Build the infrastructure first, connect to operations later. Or run the operations first, worry about foundations later.

Design-driven analytics means you do both in parallel. The operational use case shapes the data model. The data model is designed to support more than just this one use case.

![](/images/posts/design-driven-analytics/image-4.png)

The operational side means picking a candidate. A specific part of the business that you want to support with data.

This could be a specific marketing channel. A specific area within the product. A particular stage in the customer journey. An onboarding flow. A retention mechanism. A sales handoff process.

My personal favorite is user activation - it&apos;s still the biggest gap I see where companies are missing opportunities. Most teams track signups and they track active users, but the space in between is a black box. What happens between &quot;created account&quot; and &quot;getting real value&quot;? That&apos;s where you lose people. And that&apos;s where supporting the operation with good data can have an outsized impact.

The point is: it has to be specific. And it has to be something where improvement would actually move the needle. Not &quot;better marketing&quot; but a specific channel. Not &quot;improve the product&quot; but a specific user journey you can actually trace and measure.

This is already a challenge. How do you identify these high-impact operations?

I have some ideas that have worked for me. Look for operations where the team is already motivated but flying blind. Look for places where people are making decisions on gut feel because the data isn&apos;t there. Look for processes that touch revenue directly - acquisition, conversion, expansion, retention. Look for the thing that, if it improved by 20%, would change the trajectory of the company.

But I don&apos;t have a formula. It requires talking to people, understanding where the leverage is, getting a feel for what&apos;s blocked and what&apos;s possible. Every company is different.

![](/images/posts/design-driven-analytics/image-5.png)

What I can say is: without this step, you&apos;re back to building data infrastructure for its own sake. The operational focus is what keeps the whole thing honest.

The architecture side is where things get tricky.

Once you&apos;ve picked an operational use case, you need to think about the data model that supports it. What data sources do you need? What entities are involved? How do they connect?

Let&apos;s say you want to improve account activation. You&apos;ll have accounts as an entity. You&apos;ll have features or actions that indicate activation. Maybe you&apos;ll have a concept of milestones or success moments. These are your building blocks.

Now here&apos;s the critical question: if you build this model just for activation, and then next month you want to support another use case - say, retention analysis or expansion signals - what happens? Do you add one or two entities to extend the model? Or do you end up doubling everything because the first model was too narrow?

You&apos;re aiming for the first option. A model that grows by adding small pieces, not by multiplying them.
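To make &quot;growing by adding small pieces&quot; concrete, here is a minimal Python sketch. All entity, event, and milestone names are invented for illustration - this is one possible shape, not a prescribed schema:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Event:
    """One generic entity for user actions - specific enough to query,
    abstract enough that new use cases add rows, not new tables."""
    account_id: str
    name: str          # e.g. "created_project"
    occurred_at: datetime

# Milestones are derived, not collected: a named set of required events.
MILESTONE_RULES = {
    "activated": {"created_project", "invited_teammate"},
}

def reached_milestones(events, rules):
    """Return a dict mapping account_id to the set of milestones reached."""
    seen = {}
    for e in events:
        seen.setdefault(e.account_id, set()).add(e.name)
    return {acct: {m for m, req in rules.items() if req.issubset(names)}
            for acct, names in seen.items()}
```

Supporting retention next month then means adding a rule (say, a hypothetical &quot;week_2_return&quot;) or one small entity on top - the first option from above - rather than a second, parallel event model.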

This requires finding the right level of abstraction. Too abstract, and the model becomes generic to the point of uselessness. You end up with one entity called &quot;thing&quot; that requires twenty configuration options to make sense. Nobody can work with that. Too specific, and you end up with 200 entities that only apply to one use case each. Nobody can navigate that either.

The middle ground is hard to find. Here&apos;s what helps me:

Before you build, spend one or two extra days on design. Sketch it out on a whiteboard. Then do the zoom in/zoom out exercise. Take your current model and ask: what would this look like if I zoomed out two levels? What if I zoomed in two levels? Which version would make more sense as the foundation?

This is where AI is genuinely useful. You can move through these brainstorming cycles much faster. &quot;Show me this model more abstract.&quot; &quot;Now more specific.&quot; &quot;What if we merged these two entities?&quot; It speeds up the thinking.

![](/images/posts/design-driven-analytics/image-6.png)

The test I use: if a new person joined the team, could they understand how this data setup works within a week? If yes, you&apos;re in a reasonable place. If it would take them three months just to find where all the business rules are scattered - that&apos;s a sign the architecture has grown out of control.

I&apos;ve spent the last six months going deeper on this. I&apos;ve made progress, but I&apos;m not ready to publish the full approach yet. The principles are clear. The implementation details are still being refined.

So what&apos;s the takeaway?

Design-driven analytics means doing two things in parallel. You identify a high-impact operation that&apos;s worth supporting with data. And you design a data architecture that can scale beyond just that one use case.

Neither side drives alone. The operational use case keeps you honest - you&apos;re building something that connects to the business. The architecture keeps you sane - you&apos;re building something that won&apos;t collapse when the next use case arrives.

This takes two things. Experience helps - the more setups you&apos;ve seen, the faster you recognize patterns. But more importantly, it takes deliberate time. Time on the whiteboard. Time sketching different models. Time asking &quot;how can we make this simpler?&quot; over and over again.

That&apos;s my process. It&apos;s not a framework you can download. It&apos;s a way of thinking about the problem.

Stop starting from data. Stop hoping the bridge will appear. Design it.</content:encoded></item><item><title>Basic analytics skills in the dawn of agentic whatever</title><link>https://timo.space/blog/basic-analytics-skills-in-the-dawn-of-agentic-whatever/</link><guid isPermaLink="true">https://timo.space/blog/basic-analytics-skills-in-the-dawn-of-agentic-whatever/</guid><description>On my YouTube channel, the video &quot;How I would start as a Data Consultant - if I could press Restart&quot; is the most watched video with over 7,000 views (which is a lot for my channel).</description><pubDate>Mon, 10 Nov 2025 00:00:00 GMT</pubDate><content:encoded>On my YouTube channel, the video &quot;[How I would start as a Data Consultant - if I could press Restart](https://youtu.be/heirGGdGsOg)&quot; is the most watched video with over 7,000 views (which is a lot for my channel). I published it in October 2022 - three years ago. It does not mention AI in any way, because it did not matter then. But it does now. Time for an update. Even if it is a bit more meta than the first one.

A lot of people are trying to predict how AI will change data work. Will analysts disappear? Will data engineers become obsolete?

I don&apos;t have those answers. Nobody does.

* * *

Quick break.

Before we start, I am running a free workshop in one week:

**From Product Analytics to Growth Intelligence: The Metrics That Actually Explain Growth**

We still have 15 seats open. [Claim yours here](https://dsd.re/work-nov25). There will be a recording, so no need to make it to the live session.

* * *

But here&apos;s what I&apos;ve learned after 12 years in this field: you don&apos;t need to predict the future to prepare for it. You just need to invest time in skills that compound regardless of which scenario plays out. This is what this post is about.

Let me be direct about something. The real threat isn&apos;t AI replacing data analysts and engineers. The real threat is simpler: you can&apos;t articulate your value beyond execution.

We&apos;ve all feared that moment. You imagine walking into a meeting where leadership announces the data team is getting &quot;rightsized&quot; - five or six people cut because the company is &quot;investing in automation.&quot; You could think to yourself, &quot;Yeah, good luck with your data quality after that.&quot; But that won&apos;t save you. Management makes decisions based on incomplete information all the time.

The uncomfortable truth: if you can&apos;t explain why your work matters beyond building pipelines and writing queries, someone else decides your value for you (I know, the stupid v-word again).

So what actually makes you hard to replace, with this AI whatever dangling over our futures?

I&apos;m going to walk through two areas. Not predictions. Not guarantees. Just two investments where your time compounds no matter how the tools evolve. The first one has protected me through constant technology changes and the second made my data life already much more interesting.

## Investment #1: Strategic Thinking &amp; People Skills (The &quot;Why&quot; Skills)

* * *

People who can talk to people will always have more opportunities. Sorry.

If you&apos;re reading this, you probably got into data work because you like working with data more than people. You want to dig into datasets. Build pipelines. Solve analytical problems without sitting through meetings.

That&apos;s fine. That&apos;s why most of us got into this field.

But here&apos;s the problem: AI is getting good at the solo execution work. Writing queries, building pipelines, creating transformations. The work you can do alone is exactly the work that&apos;s easiest to automate.

What AI can&apos;t do - at least not yet and maybe never - is understand what people actually need. It can&apos;t navigate organizational politics. It can&apos;t take a vague request and trace it back through five layers of misinterpretation to figure out what problem someone is really trying to solve.

![](/images/posts/basic-analytics-skills-in-the-dawn-of-agentic-whatever/image.png)

That requires talking to people.

I&apos;m not saying you need to become a full-time stakeholder manager who spends all day in meetings.

You can still have periods where you focus on building. Where you work alone, dive deep into problems, do the technical work you enjoy.

But you need to be able to come out of that room when it matters.

You need to have the conversations that gather context. Ask the questions that uncover what people actually need. Understand the business problem behind the data request. Be curious to listen.

This is a capability, not a personality change. You don&apos;t need to love small talk or be naturally social. You just need to be willing to engage when the work requires it.

And the work requires it more now than it used to.

Let me show you what this looks like in practice.

Someone from marketing comes to you and says: &quot;We need a campaign dashboard for this new initiative we&apos;re running. Just show us all the campaign performance metrics so the team can analyze what&apos;s working.&quot;

**Path A - The Tactical Approach**

You build what they asked for. Maybe you&apos;ve done this before, so you pull up your blueprints from the last campaign dashboard. You implement it, ship it, move on to the next task.

This is fine. It&apos;s competent work. But it&apos;s also exactly the kind of work that AI is getting very good at.

**Path B - The Strategic Approach**

You ask: &quot;Can you tell me more about this campaign? What&apos;s the goal?&quot;

The marketing person gives you context: &quot;We&apos;re trying to reach a new audience segment. We want to see if we can get this specific group of people interested in our product.&quot;

You keep going: &quot;What would success look like for this campaign? When would you consider it very successful?&quot;

Now you&apos;re in a dialogue. You&apos;re adding context to the problem. You&apos;re understanding it more deeply.

![](/images/posts/basic-analytics-skills-in-the-dawn-of-agentic-whatever/image-1.png)

If this is a significant initiative, you go even broader: &quot;Are there other people working on this campaign I should talk to? I want to make sure I understand what kinds of insights everyone needs.&quot;

Over two or three conversations, you collect significantly more context than what was in the original request.

_Context is the key asset here - we will come back to it later._

**The Difference**

In Path A, you&apos;re executing. In Path B, you&apos;re planning.

With all this context, you might design something quite different from a standard campaign dashboard. Maybe you discover they actually need to track audience segment behavior over time. Maybe you learn there&apos;s a specific conversion event they&apos;re optimizing for that wasn&apos;t mentioned in the initial request. Maybe you find out three other teams are working on related initiatives and you can build something that serves all of them.

You haven&apos;t just built a better dashboard. You&apos;ve positioned yourself as someone who understands the business problem, not just the technical requirement.

And that&apos;s much harder to automate.

### The &quot;Why&quot; Practice (Going Deeper)

This isn&apos;t just about talking to stakeholders. It&apos;s about questioning your own work.

Practice asking yourself: &quot;Why am I building it this way?&quot;

You&apos;re designing a data warehouse in a specific way. Maybe based on experience. Maybe copying patterns you&apos;ve used before. Stop and challenge yourself: Why this approach? What assumptions am I making? What trade-offs am I accepting?

Take an hour to think through the decisions. Use an AI model as a thinking partner. Show it your design and ask it to challenge your choices.

![](/images/posts/basic-analytics-skills-in-the-dawn-of-agentic-whatever/image-2.png)

Say you&apos;re setting up event tracking in a particular format. Ask yourself: Why am I structuring events this way? Is this genuinely the best approach for how this company works, or is this just the pattern I learned three years ago? What would change if I designed this from scratch knowing what I know about this organization?

Sometimes you&apos;ll come out reinforced - yes, it&apos;s a solid pattern for this context. But most of the time, you&apos;ll find areas where you&apos;re running on autopilot, applying patterns without understanding whether they fit.

This makes you stronger at designing systems. Your decisions become intentional rather than habitual. When someone asks you why you did it that way, you have an answer, a good and solid one.

Once you&apos;re in the habit of asking &quot;why&quot; about your own decisions, extend it to the work itself.

You&apos;re building a pipeline for some new tracking requirement. If you trace it back far enough, you realize you&apos;re at the end of a chain of interpretations. Someone told someone who told someone who told you this needs to be built.

By the time it reaches you, it&apos;s been through five or six layers.

Go back to the source. Talk to the person who originally needed this. Understand their actual motivation.

Most of the time, the work is justified. But what you usually discover is that the original need is slightly different from what got translated down the chain. When you understand the real motivation, you can find a better approach. Something that actually solves the problem they&apos;re facing rather than the request that emerged from the telephone game.

Sure, there are the five whys - but they are simple to say. It takes practice to become good at them.

### The Compound Effect

Here&apos;s what happens when you consistently practice this kind of strategic thinking and questioning: over time, you become a different kind of data professional.

You become the person who actually understands how the company works.

You understand the processes that drive the business - not the documented ones in some wiki, but how things actually happen day-to-day. You know which teams depend on which data, who makes what decisions, where the bottlenecks are.

You can map where data has the highest impact versus where it has low impact but high potential. You&apos;re not guessing about what to prioritize - you actually know because you&apos;ve had the conversations and traced the work back to its source.

You understand the growth model. You know what levers actually move the business. When someone asks for a new metric or dashboard, you can quickly assess whether this is touching something that matters or if it&apos;s noise.

This is the compound effect of always asking &quot;why&quot; and always gathering context.

Your solutions start to look different. They&apos;re not generic blueprints copied from best practices blogs or your last company. They fit this specific organization - how it operates, what it cares about, where it&apos;s trying to go.

You&apos;re no longer just implementing requirements. You&apos;re designing for the business.

And this is exactly the kind of value that&apos;s very hard for AI to replicate - because it requires institutional knowledge, relationships, and judgment that accumulates over time through actual human interaction.

That&apos;s the first investment: strategic thinking and people skills. Understanding context. Asking why. Positioning yourself as someone who designs solutions, not just executes tasks.

But there&apos;s a second investment.

While you&apos;re building these capabilities, the execution of data work is changing fast. If you ignore that change, you&apos;ll spend all your time on tasks that could be automated. Which makes it harder to do the strategic work that actually protects you.

The second investment is learning to work with AI as a fundamentally new skill.

## Investment #2: Agentic Work as a New Skill

I need to set expectations correctly.

This is not about becoming 10x faster. You&apos;re not going to read this and suddenly ship projects in half the time.

Working with AI to build data systems is a new skill. Not a productivity hack. A genuinely new way of working that you have to learn from scratch.

The hype content will tell you AI makes you ridiculously faster. Technically, once you&apos;re good at it, maybe there are efficiency gains. But that&apos;s not the point, and it&apos;s not what you should optimize for right now.

![](/images/posts/basic-analytics-skills-in-the-dawn-of-agentic-whatever/image-3.png)

When you&apos;re learning this skill, you will be slower than doing it the old way. Significantly slower at first. You&apos;ll get frustrated. You&apos;ll be tempted to just write the SQL yourself because it would take five minutes instead of an hour of fighting with context and prompts.

That&apos;s normal. That&apos;s what learning a new skill feels like.

What changes isn&apos;t primarily speed. It&apos;s what you build. The shape of your work. The way you approach problems.

If you go into this expecting to immediately save time, you&apos;ll quit after two frustrating experiments and decide &quot;AI isn&apos;t ready yet.&quot; If you approach it as learning a new way to work - with all the awkwardness and time investment that requires - you&apos;ll develop a valuable capability.

Here&apos;s how to build this skill properly.

### The practice method: &quot;Tie your hand behind your back&quot;

Pick a small, local data project. Something real but low-stakes. Not production work - you need room to experiment and fail.

Now the critical rule: you&apos;re not allowed to write any Python code. You&apos;re not allowed to write any SQL.

Zero code from you.

![](/images/posts/basic-analytics-skills-in-the-dawn-of-agentic-whatever/image-4.png)

Not to build the pipeline. Not to fix a bug. Not to &quot;just quickly adjust this one thing.&quot;

You can look at the outputs. You can analyze whether the results make sense. You can review what was generated. But you cannot touch the code itself.

This feels ridiculous at first. You&apos;ll want to break this rule constantly. You&apos;ll see something that would take you 30 seconds to fix and spend an hour trying to get the AI to do it correctly.

Do it anyway.

What you&apos;re learning isn&apos;t how to build pipelines faster. You&apos;re learning how to provide context. How to break down work. How to communicate requirements in a way that produces results.

If you let yourself &quot;just fix this one thing,&quot; you&apos;re not learning that skill. You&apos;re just using AI as a slightly frustrating autocomplete.

The constraint forces you to work entirely through context provision. That&apos;s the skill that matters.

### The reality of starting out

When you first try this, it doesn&apos;t work very well.

The output is okay. Not terrible. But far from what you would have created if you&apos;d just written the code yourself.

You&apos;re working with someone who has technical skills but is completely missing the big picture. They can write code, but they don&apos;t understand the connecting pieces. They don&apos;t know your conventions. They don&apos;t have context about why you&apos;re building this or how it fits into the larger system.

You have to figure out how to give them that context.

You&apos;ll sit there thinking &quot;I could have built this in 20 minutes, and instead I&apos;ve spent two hours writing prompts and I still don&apos;t have what I need.&quot;

That frustration is the learning process.

You&apos;re discovering what context is missing. What assumptions you normally make that aren&apos;t being communicated. How much of your work relies on tacit knowledge you&apos;ve never had to articulate.

If your first few attempts work perfectly, you&apos;re probably working on tasks that are too simple. The struggle is where the learning happens.

### What you have to learn (Breaking down tasks)

The biggest thing I had to learn - and what made the difference between frustrating failures and actually useful outputs - was breaking down tasks into much smaller chunks and revisiting my understanding of how tasks are structured.

When I started, I was far too general. I&apos;d say something like &quot;build me a data pipeline for this source&quot; or &quot;create a staging layer for these tables.&quot;

That doesn&apos;t work.

The AI has skills, but it needs bounded problems. Clear scope. Specific expectations.

Let me give you a concrete example of what a well-broken-down task looks like.

![](/images/posts/basic-analytics-skills-in-the-dawn-of-agentic-whatever/image-5.png)

**The Staging Table Example**

Let&apos;s say you want to create a staging model. You already have a source table loaded from somewhere - an API, a database, whatever. Now you need to create the staging layer that brings this data into your data warehouse.

A staging table does very specific, limited things:

-   Takes the source data and aligns it to your naming conventions
-   Casts fields to the right data types (timestamps look like timestamps, strings are strings, etc.)
-   Applies basic, non-business-logic transformations
-   Prepares everything so downstream models have clean, consistent inputs

This is a straightforward, well-defined concept. It&apos;s bounded - you&apos;re not asking for the entire data model, just this one specific layer for one model.

When you break it down to this level, you can write much better context. You can say: &quot;Here&apos;s the source table. Here are our naming conventions. Here&apos;s how we handle timestamps. Create a staging model that follows these patterns.&quot;

That&apos;s specific enough to work with.
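As a sketch of what that bounded task can produce - assuming a pandas-based setup, with source column names and conventions made up for the example:

```python
import pandas as pd

# Illustrative conventions (assumed, not from the post):
# snake_case column names, an *_at suffix for timestamps, UTC everywhere.
RENAMES = {
    "AccountID": "account_id",
    "SignupDate": "signed_up_at",
    "PlanName": "plan_name",
}

def stg_accounts(source):
    """Staging model: rename to conventions, cast types, light cleanup.
    No business logic - that belongs in downstream models."""
    df = source.rename(columns=RENAMES)
    df["account_id"] = df["account_id"].astype(str)
    df["signed_up_at"] = pd.to_datetime(df["signed_up_at"], utc=True)
    df["plan_name"] = df["plan_name"].str.strip().str.lower()
    return df[["account_id", "signed_up_at", "plan_name"]]
```

The same boundary applies if your staging layer is SQL; the point is the scope: rename, cast, basic cleanup - nothing more. A task this small is one the AI can hit reliably, and one you can validate at a glance.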

**The Effect of Task Breakdown**

Once I started breaking work into these smaller, bounded chunks, everything got better.

I could write clearer initial prompts and better context because I knew exactly what I was asking for.

I could develop better testing and validation approaches - because I could define clear success criteria for each small piece.

I could iterate faster - if something wasn&apos;t right, I knew exactly which bounded task to refine rather than trying to debug an entire system.

This is the skill. Not prompting. Not &quot;talking to AI.&quot; Breaking down your work into well-defined, bounded tasks that can be tackled systematically.

### What actually changes

I cannot tell you I&apos;m now 6x or 7x faster at building data systems.

That&apos;s not how this works.

When I look at time spent on projects now versus before, there might be a small shift. Nothing dramatic. Nothing that would make you think &quot;AI just 10x&apos;d my productivity.&quot;

What changed is not primarily the speed. It&apos;s what I build.

When I build an analytics layer now, it looks different from what I would have created manually.

Documentation is often included by default. I go broader in some areas - exploring edge cases I would have skipped. The structure might be different because I&apos;m thinking about the problem differently when I have to articulate it clearly enough for AI to execute.

It&apos;s hard to compare the &quot;before&quot; and &quot;after&quot; versions because the approach is different.

If I tried to time myself building a core layer manually three years ago versus using an agentic approach now, the comparison would be meaningless. Different scope. Different outputs. Different level of documentation. Different considerations.

The work I produce now has a different character. More comprehensive in some ways. More systematically documented. Sometimes more robust because I&apos;m forced to think through edge cases when writing context rather than just handling them intuitively.

But also: requires more upfront thinking. More explicit planning. More articulation of things I used to just &quot;know.&quot;

Is that better? Depends on the project. Depends on what you&apos;re optimizing for.

You&apos;re not learning this to save time on the exact same work you&apos;re doing now. You&apos;re learning it because it&apos;s a different way to work, and that difference will matter as the tools evolve.

Again, we don&apos;t know what this will look like in just 6-12 months. But with these two methods, you can try to stay on top of things.

If you like, let me know how you approach this at the moment by replying to this email.</content:encoded></item><item><title>Pragmatic Orthodoxy - Data Signals #1 - 03.11.25</title><link>https://timo.space/blog/pragmatic-orthodoxy-data-signals-1-03-11-25/</link><guid isPermaLink="true">https://timo.space/blog/pragmatic-orthodoxy-data-signals-1-03-11-25/</guid><description>Hello, welcome to a new bi-weekly format of this newsletter. I have been thinking for a while about how I can share interesting content I come across.</description><pubDate>Mon, 03 Nov 2025 00:00:00 GMT</pubDate><content:encoded>Hello,

welcome to a new bi-weekly format of this newsletter. I have been thinking for a while about how I can share interesting content I come across. It shouldn&apos;t be just another &quot;here is a list of great articles I read&quot; type of post. Those are great, but there are already plenty of them.

So I developed this signal format. The basic idea: I highlight interesting ideas and thoughts I come across when reading a post, spend some time with an AI model riffing and brainstorming over them, and then compile a new edition featuring a specific thought or idea that you can simply read, or dive deeper into by reading the referenced source content. At the end I describe the whole workflow, so you get an idea of how this format is created.

* * *

Before we start, I am running a free workshop in two weeks:

**From Product Analytics to Growth Intelligence: The Metrics That Actually Explain Growth**

We still have 25 seats open. [Claim yours here](https://dsd.re/work-nov25).

* * *

## Signals #1: When the Old Orthodoxies Crumble

I like to read articles about data architecture and data modeling—stuff from people building actual systems, not just theorizing—and something keeps showing up. The orthodoxies we all learned in the 2000s and 2010s? They&apos;re being quietly replaced. Not with chaos or some flashy new paradigm, but with something more pragmatic.

The pattern I&apos;m seeing: start simple, add complexity only when reality demands it. And the interesting part is how this same idea shows up across completely different domains. Let me walk you through what I mean.

### Signal 1: Storage is Cheap, Your Time is Not

**Source:** Zach Wilson &amp;amp; Sahar Massachi - &quot;Stop Using Slowly-Changing Dimensions!&quot;  
**Link:** [https://blog.dataexpert.io/p/the-data-warehouse-setup-no-one-taught](https://blog.dataexpert.io/p/the-data-warehouse-setup-no-one-taught)

Look, we&apos;re still teaching Slowly-Changing Dimension Type 2 patterns like it&apos;s 2008. Zach and Sahar call this out directly. These patterns were designed for an era when storage was expensive, and we had to be clever about how we tracked changes over time. I am guilty of this as well. It&apos;s baked into muscle memory.

Their alternative? Just date-stamp your data snapshots. Embrace intentional duplication. Boom. The insight that landed for me: &quot;Storage is cheap. Your time is not.&quot;

Here&apos;s what they propose: a three-tier architecture (raw → silver → gold) (tbf - I don&apos;t like any of these medallion references, mine are raw, core, analytics - but we all like different names) where everything starts virtual. Just views. You only materialize tables when performance data actually proves a query path is hot. No premature optimization. No complex versioning schemes until you have evidence you need them.

**One takeaway:** The bottleneck shifted from storage costs to engineering team bandwidth. We should be optimizing for human time, not disk space.

Takeaway for me: I&apos;m switching a current local project to data snapshots to see how it goes and how it feels.
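For illustration, here is a minimal Python sketch of the date-stamped snapshot idea (the entity and column names are made up). Instead of tracking change records SCD2-style, you keep a full, intentionally duplicated copy per load date and pick the right one at query time:

```python
import bisect
from collections import defaultdict

# snapshot_date -> {customer_id: row}; every load stores a full copy.
snapshots = defaultdict(dict)

def load_snapshot(snapshot_date, rows):
    """Store a complete copy of the source, stamped with its load date."""
    snapshots[snapshot_date] = {r["customer_id"]: r for r in rows}

def as_of(target_date):
    """Table state at a date: the latest snapshot on or before that date."""
    dates = sorted(snapshots)
    i = bisect.bisect_right(dates, target_date)
    return snapshots[dates[i - 1]] if i else {}

load_snapshot("2025-01-01", [{"customer_id": 1, "plan": "free"}])
load_snapshot("2025-02-01", [{"customer_id": 1, "plan": "pro"}])
```

The duplication is the point: you trade cheap storage for never having to maintain effective-date ranges and current-row flags by hand.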

Now here&apos;s where it gets interesting. This &quot;virtual first, physicalize when proven&quot; thinking? It shows up in an unexpected place—data modeling patterns that span Kimball, Data Vault, and even graph databases.

### Signal 2: One Model, Many Projections

**Source:** Robert Anderson - &quot;One Model, Many Projections: Toggling Stars, Graph, and Data Vault&quot;  
**Link:** [https://medium.com/@rdo.anderson/one-model-many-projections-toggling-stars-graph-and-data-vault-9d9d937e0986](https://medium.com/@rdo.anderson/one-model-many-projections-toggling-stars-graph-and-data-vault-9d9d937e0986)

Robert&apos;s HOOK pattern does something clever. Define identity, time, and proof once in your semantic layer. Then render them as whatever shape you need: Kimball star schema for BI, property graphs for lineage, Data Vault for audit trails. All of it starts as virtual views.

The key insight for me: these aren&apos;t competing models that require separate implementations. They&apos;re different projections of the same semantic foundation. You create Kimball dimensions as views over your core layer. Need a hot fact table for heavy BI workload? Fine, materialize just that specific aggregation. Need graph traversals? Add adjacency caches only when deep traversals prove slow.

Think about what this means—you never remodel, you re-render. New use case? Add a projection. Don&apos;t rebuild the foundation.

**One takeaway:** Stop choosing between modeling approaches. Build the semantic foundation once, project it many ways.

I am doing this right now in my own current stack. My semantic layer here is a YAML model that defines the core entities, how they are built up, and how you use them for analytics work. Based on that, I can materialize them in any way.
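As a toy illustration of the projection idea (the entity spec and the SQL shapes are my simplifications, not the HOOK pattern verbatim): one definition of identity and relationships, rendered as either a star-schema view or graph edges.

```python
# One entity definition: keys, relationships to other entities, attributes.
ORDER = {
    "name": "order",
    "keys": {"order_id": "core.orders"},
    "refs": {"customer_id": "customer"},
    "attrs": ["order_total", "ordered_at"],
}

def project_star(entity):
    """Render a Kimball-style fact view over the core layer."""
    cols = list(entity["keys"]) + list(entity["refs"]) + entity["attrs"]
    table = list(entity["keys"].values())[0]
    return f"CREATE VIEW fct_{entity['name']} AS SELECT {', '.join(cols)} FROM {table}"

def project_graph(entity):
    """Render the same definition as graph edges for lineage or traversal."""
    return [(entity["name"], ref) for ref in entity["refs"].values()]
```

Same foundation, two renderings. A new use case means a new `project_*` function, not a remodel.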

The thing both Zach and Robert are assuming: you need semantic clarity first. You need to know what your entities are and how they relate before you can toggle between physical representations. That&apos;s where conceptual modeling stops being just tactical and becomes strategic.

### Signal 3: Discovery Path vs Design Path

**Source:** Juha Korpela - &quot;Conceptual Modeling: Thinking Beyond Solution Design&quot;  
**Link:** [https://commonsensedata.substack.com/p/conceptual-modeling-thinking-beyond](https://commonsensedata.substack.com/p/conceptual-modeling-thinking-beyond)

Juha&apos;s substack is quite new - make sure to subscribe.

Juha challenges something I think most of us take for granted: that data modeling exists only to design specific solutions. He proposes conceptual models actually operate on two paths. There&apos;s a DESIGN path—yeah, creating specific solutions. But there&apos;s also a DISCOVERY path—building enterprise semantic knowledge that accumulates over time.

The historical failure we&apos;ve all seen? Those massive Enterprise Data Models that tried to comprehensively model entire organizations top-down. They collapsed under their own scope. Every time.

Juha&apos;s alternative makes a lot more sense to me: domain-driven consolidation. Multiple models organized around business domains, built bottom-up from actual solution work. Not a monolith, but an &quot;interlinked collection.&quot; I do this in my projects as well.

Here&apos;s the payoff: semantic discoveries from solution-level work feed an enterprise repository, which then accelerates future solution designs. It&apos;s a self-reinforcing loop where tactical work builds strategic assets.

**One takeaway:** Don&apos;t try to model &quot;everything.&quot; Model domains, then connect them. Bottom-up emergence beats top-down declaration.

So what&apos;s the pattern underneath all of this?

Notice what ties these together. Zach says materialize only proven hot paths. Robert says create projections only for actual use cases. Juha says build domains from real solution work, not abstract enterprise schemas.

### Signal 4: Evidence-Based Complexity

What I&apos;m seeing across these signals is a philosophical shift I&apos;d call **evidence-based complexity**.

The old orthodoxy was prevention-based. Design for scale upfront. Normalize aggressively. Create comprehensive models before building anything. The implicit assumption: change is expensive, so get it right the first time.

The new orthodoxy is adaptation-based. Start with the simplest thing that works. Instrument it. Let reality show you where complexity is justified. The implicit assumption: change is cheap now—storage is cheap, compute is cheap, refactoring is manageable—so invest complexity only where it delivers proven value. We could call this agile, but that term was burned a long time ago.

This doesn&apos;t mean &quot;move fast and break things.&quot; It means &quot;move deliberately and let data guide investment.&quot; Zach&apos;s three-tier virtual architecture. Robert&apos;s toggle-to-physicalize pattern. Juha&apos;s bottom-up domain consolidation. All variations on the same theme.

**One takeaway:** Treat architectural complexity as a response to measured constraint, not a hedge against imagined future needs.

* * *

How this edition works:

I collect interesting posts in Readwise&apos;s Reader (I am using the Mac app). That is also where I follow the blogs I am interested in. So discovery usually happens in three areas: LinkedIn (far less than it used to be), Reader (blog RSS), and ChatGPT or Claude Research (when I have a topic I want to learn more about). I scan the posts or watch the videos at higher speed (my attention span for work topics is difficult), and I mark ideas, thoughts, and concepts that I find interesting. Then I extract them and distill them, together with my thoughts, with an LLM (Claude Code at the moment) into my Obsidian vault.

When it&apos;s time for a signals edition, I use the LLM again to discover potential topics. Then we find 2-3 sources in my vault and develop together a main thought that ties them together, plus a storyline to tell it. This goes down to very specific bullet points. I then let the model write the final version based on my writing style.

If you have any feedback - hit reply and let me know.</content:encoded></item><item><title>dbt blues — notes on the Fivetran merger</title><link>https://timo.space/blog/dbt-blues/</link><guid isPermaLink="true">https://timo.space/blog/dbt-blues/</guid><description>When dbt Labs announced their merger with Fivetran, the data community had feelings. A lot of them.</description><pubDate>Wed, 22 Oct 2025 00:00:00 GMT</pubDate><content:encoded>When dbt Labs announced their merger with Fivetran, the data community had feelings. A lot of them. The dbt fear index—Oliver from Lightdash&apos;s brilliant term for the spike in repository forks—told the story better than any LinkedIn post could.

[🚨 The dbt Fear Index just spiked 🚨 The number of of dbt forks are going crazy on rumors of a Fivetran acquisition and the timing couldn&apos;t be more wild with Coalesce right around the corner. In… | Oliver Laslett | 59 comments](https://www.linkedin.com/posts/olaslett_the-dbt-fear-index-just-spiked-the-activity-7379468972464816128-PpRq?utm_source=share&amp;amp;utm_medium=member_desktop&amp;amp;rcm=ACoAAAVmDYYBJYYRQhp6GZzC3jTQVw18yFDI2s4) — 🚨 The dbt Fear Index just spiked 🚨 The number of of dbt forks are going crazy on rumors of a Fivetran acquisition and the timing couldn’t be more wild with Coalesce right around the corner. In times of crisis: traders buy gold, bros buy crypto, and data people... fork dbt. Obviously 90% of us will never touch that fork again. But it’s a nice safety blanket for our emotions. But the signal is clear: when rumors shake the stack, engineers want control. dbt-core is the spine of modern data pipelines, it’s absolutely prolific. It’s definitely got people thinking about real platform risks that are out there. The good news is: the core technology is simple. It’s been proven twice already that a dbt-compatible interface can be built (SDF Labs - Tobiko - a Fivetran company). So I’m excited to see what happens next. I’m sat with my popcorn and watching my dbt-core logs fly by. Taking in this calm moment before the storm. 👉 What about you: - Did you fork already? Are you looking for alternatives? - Is this healthy paranoia, or just jitters? - Do you not care at all and just love how good my chart looks? | 59 comments on LinkedIn

I&apos;m not going to analyze the deal here. I&apos;m not a finance expert, and frankly, Ethan Aaron already wrote the best summary of what happened: it&apos;s mostly about money, investment structures, and valuations that never quite added up.

[As someone someone who worked in M&amp;amp;A before starting Portable, I’m quite fascinated by the Fivetran &amp;lt;&amp;gt; dbt Labs merger… 1. The customer overlap is crazy high — I.e. there isn’t much of a cross-sell… | Ethan Aaron | 25 comments](https://www.linkedin.com/posts/ethanaaron_as-someone-someone-who-worked-in-ma-before-activity-7383538864348360706-3QC-?utm_source=share&amp;amp;utm_medium=member_desktop&amp;amp;rcm=ACoAAAVmDYYBJYYRQhp6GZzC3jTQVw18yFDI2s4) — As someone someone who worked in M&amp;amp;A before starting Portable, I’m quite fascinated by the Fivetran &amp;lt;&amp;gt; dbt Labs merger… 1. The customer overlap is crazy high — I.e. there isn’t much of a cross-sell opportunity (they’ve both saturated the market) 2. Fivetran already had a transformations product that was similar to dbt cloud (it’s not a wildly new feature set) 3. Tobiko was effectively a pawn. I expect the combined company to go all in on cloud + enterprise (OSS is dead). When you have an enterprise solution, there’s no real benefit to spending a ton of time on a truly OSS solution 4. It’s killed the partnerships model in the ecosystem. Any transformation, orchestration, reverse ETL company (and honestly others too) need to find new partners (check out Portable if you need and ELT partner) __ Why do I believe is this happening: 1. A enterprise ready solution (new logo acquisition) 2. Power to increase prices on existing clients 3. Strategic postering for IPO or potential strategic acquisition __ Is it good for data teams? Doesn’t matter… that’s not why these deals happen. They happen for investors. (Personally I think this puts data teams in a bad spot if you don’t yet have a viable alternative you can actually switch to for the various components, and the ability to do so quickly) __ Run a data team and trying to hedge against crazy cost increases this renewal cycle? You need an actual alternative that meets your needs. I’d recommend demos with: 1. 
Portable for ingestion (1500+ integrations, predictable pricing) 2. Coalesce.io or Coginiti for transformation 3. Portable for reverse ETL (coming very very soon — happy to demo it for you) 4. Streamkap for streaming and change data capture sources 5. Orchestra for data orchestration If you don’t have a viable alternative that meets your needs, you have ZERO negotiating leverage and you will simply become a stat in Fivetran IPO docs (something along the lines of ‘revenue from existing clients grew 75% year over year’ which effectively equates to your team’s cost being jacked up for the same product year over year). Who else would you recommend checking out? __ Through all these acquisitions, very few new clients are being introduced for cross-sell (outside of trying to unlock faster enterprise deals). __ So, just remember: Your cost increases are Fivetran’s growth numbers. | 25 comments on LinkedIn

That high valuation from the peak investment years? It was always hanging around dbt&apos;s neck.

But here&apos;s what I am interested in: this moment is giving people a glimpse of something they maybe forgot—open source isn&apos;t forever, and convenience isn&apos;t the same as fundamental.

dbt did something remarkable. It unlocked &quot;modern&quot; data stacks that probably wouldn&apos;t exist in their current form without it. The convenience layer it added wasn&apos;t just nice-to-have; it was transformative. But it was also always solving the easy problem, not the hard one. And understanding that difference explains where dbt came from, why it could never evolve beyond what it was, and what comes next.

Not for &quot;the industry.&quot; For me. For the work I&apos;m doing now.

This isn&apos;t really the blues. It&apos;s the end of one chapter and the beginning of another.

## 1\. The Rise: dbt&apos;s convenience layer unlocked modern data stacks

Let me start with what dbt actually is, because I think people sometimes forget: at its core **dbt is an SQL orchestrator.**

![](/images/posts/dbt-blues/image-1.png)

That&apos;s it. When you&apos;re building a data model—or really, just writing a set of queries—you need them to run in order. Five queries? You can schedule them in your database. But the moment you get beyond that, when query B depends on query A finishing first, manual scheduling falls apart fast. By the time you have 200 transformations, you need something that understands dependencies and can run everything in the right sequence.
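The core of that dependency problem fits in a few lines. Here is a sketch using Python&apos;s stdlib topological sorter, with made-up model names (each model maps to the models it depends on):

```python
from graphlib import TopologicalSorter

# Each model lists its upstream dependencies.
deps = {
    "stg_orders": [],
    "stg_customers": [],
    "orders_enriched": ["stg_orders", "stg_customers"],
    "revenue_daily": ["orders_enriched"],
}

# static_order() yields every model only after its dependencies.
run_order = list(TopologicalSorter(deps).static_order())
```

Everything dbt layers on top of this (testing, docs, packages, macros) is convenience around that ordering problem.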

Before dbt came around, people were solving this problem with home-built systems. I worked on two or three projects with custom implementations—basically, pre-dbt versions cobbled together to do the same thing. Fishtown Analytics had the same problem everyone else did. But they did something smart: they standardized it. They created a framework that was easy to understand, easy to adapt, easy to execute locally, and easy to run in production. The building blocks were clean.

That&apos;s why dbt got massive adoption so fast. Possibly the fastest-growing data application in the last 10 years. It solved an immediate, obvious problem that people already understood and desperately wanted help with. It wasn&apos;t revolutionary—it was the right standardization at the right time.

### Peak 1: Product

From a product perspective, dbt peaked the moment it launched.

It nailed orchestration. The system was straightforward, easy to learn, reasonably easy to extend. People were thrilled. It was the missing puzzle piece.

And then... it never really went beyond that.

![](/images/posts/dbt-blues/image-2.png)

Don&apos;t get me wrong—they released features for dbt Core, and then there was dbt Cloud. But dbt Cloud was never an evolution of the product. It was a weird extension. A managed service that moved some things around: where jobs run, maybe a slightly smoother developer experience for teams working on the same codebase. But running jobs was never really a big problem for most companies. Setting up a build job in GitHub Actions is pretty straightforward. The developer experience? Sure, if you&apos;ve got a 50-person team, dbt Cloud might smooth out some configuration headaches. But the implementation never felt well-executed.

And here&apos;s the thing: analytics engineering is often tedious. Debugging can eat hours. There was—and still is—space to solve real problems in that process. dbt never did.

That&apos;s why I say they peaked at launch. They solved orchestration brilliantly and then plateaued. Maybe that explains why we&apos;re here now, with a merger instead of a Snowflake-level trajectory. They never created a product that became valuable enough that people would happily pay serious money for it.

### Peak 2: Community

The community peak might have been the most significant one. And I&apos;d guess it happened initially by accident.

![](/images/posts/dbt-blues/image-3.png)

There was this huge void in the data space. People doing data engineering, writing SQL, maintaining data models, running ad hoc queries—they had nowhere to ask questions, to come together. It&apos;s complex work. There are so many questions. And a lot of new people were starting to build data setups. And suddenly, dbt wasn&apos;t just a tool; the dbt community became _the place_ to ask those questions.

What the dbt team did early on was excellent: open office hours, the early Coalesce conferences, local dbt meetups, dbt blogs. It was all practitioner-driven. The vibe was good. The content quality was high. It created this backbone where people developed real affection—maybe even love—for the tool.

The local user groups are still good. Coalesce eventually shifted more enterprise, which is probably natural, but that early energy? That was special. They discovered the community and then facilitated it beautifully.

### Peak 3: The Standard

All of that led to the third peak: dbt became a standard.

When I talk to enterprise clients now, I don&apos;t have to explain what dbt is. They&apos;ve heard of it. Maybe they&apos;ve tried it. Maybe they&apos;re using it. Even if they&apos;re not, I don&apos;t have a hard sell when I say, &quot;We should run dbt here—orchestrating transformations is real work, and dbt makes it straightforward.&quot; That&apos;s an easy conversation in environments where things usually sell hard.

dbt became _a standard_. But what kind of standard?

Here&apos;s the question nobody can really answer. Fivetran was the ETL standard for a few years—especially for loading generic datasets. Then it diluted, and other tools came in. But dbt is still in stack definitions. Still showing up in conversations. People say things like, &quot;We use dbt for our data model.&quot;

Except... they don&apos;t. Not really. And this is where things get interesting.

### The convenience that unlocked everything

Some people say dbt is a standard for data modeling. It&apos;s not. It could never be.

The data model itself is a design. It happens on a whiteboard. It&apos;s decisions about structure, relationships, and logic. dbt can represent some of that structure—how you organize your queries, how you break down your transformations. But dbt is an **executor**, not the modeling layer.

You use dbt to physically execute or process the data model. It runs the transformations. It makes sure they happen in order. That&apos;s orchestration, not modeling.

But here&apos;s what dbt did do: it made orchestration convenient enough that teams could actually build and maintain sophisticated data operations. Without that convenience layer, most modern data stacks wouldn&apos;t exist in their current form. The barrier to entry would have been too high. Teams would have gotten stuck in the mess of custom solutions and given up, or they&apos;d have thrown their hands up and said, &quot;We&apos;ll just use scheduled queries and hope for the best.&quot;

dbt lowered the barrier. It standardized the approach. It made analytics engineering accessible at scale. That convenience wasn&apos;t a nice-to-have—it was transformative. It unlocked an entire wave of data work.

That impact was real. And we should recognize it.

## 2\. The Ceiling: Why dbt could never evolve beyond convenience

Let me explain why, in my view, dbt was never going to become Snowflake.

Snowflake, BigQuery, Redshift, Databricks—they solve a fundamental problem. You need to store massive amounts of data somewhere. You need to query that data in a cost-effective way. Without that capability, the entire data use case collapses (or changes in its form). Take those services away, and you&apos;ve taken the whole solution away.

That&apos;s what makes them fundamental.

![](/images/posts/dbt-blues/image-4.png)

Is dbt fundamental? Not really.

### You can always work around it

Technically, we&apos;re just running SQL queries. We&apos;ve been running SQL queries forever, in different ways. You can schedule them—yes, that doesn&apos;t work if you have 200 or 1,000 transformations, but it works for simpler setups. Or you can use a generic orchestrator. When dbt got started, people were already using tools like Jenkins. Airflow exists. You can run all your SQL in there.

dbt made it easier. The dependency management, the testing framework, the documentation—it was all more convenient. But there&apos;s no absolute _need_ for it.

If you took dbt away from every data team tomorrow, they&apos;d come up with a solution in four to six weeks. You could extract the compiled SQL. You could literally have an intern run it every morning if you wanted. Or—and maybe this is the better option—you&apos;d use it as an opportunity to redesign your data model, because maybe the reason it got so complex in the first place is that you kept layering transformations on top of transformations without stopping to ask if there was a simpler way.
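That workaround really is small. Here is a toy Python version against SQLite (the statements are illustrative; in practice you would run the extracted compiled SQL against your warehouse, in the order dbt already determined):

```python
import sqlite3

# Already-compiled SQL, in dependency order. Statements are made up.
compiled = [
    ("stg_orders", "CREATE TABLE stg_orders AS SELECT 1 AS order_id, 100 AS amount"),
    ("revenue", "CREATE TABLE revenue AS SELECT SUM(amount) AS total FROM stg_orders"),
]

def run_compiled(conn, statements):
    """Execute each compiled statement in order, then read a result back."""
    for name, sql in statements:
        conn.execute(sql)  # a real version would log progress and handle failures
    return conn.execute("SELECT total FROM revenue").fetchone()[0]

total = run_compiled(sqlite3.connect(":memory:"), compiled)
```

Not pretty, not robust, but it demonstrates the point: the execution part is replaceable.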

This is the part dbt never solved. Teams could always work around it. And if dbt became too expensive? They&apos;d find that workaround fast.

That&apos;s why dbt never became something people were willing to invest massive amounts of money in. It&apos;s convenience, not necessity.

### The problem dbt never solved

Everyone understood the benefit of dbt: orchestrating transformations. Dependencies handled. Tests in place. Documentation generated. Great.

But dbt never answered the harder question: **What transformations do I actually need to do?**

And that&apos;s the problem that actually matters.

When you have 1,000 transformations in your data model, maintaining it is hell. Ensuring data quality across that many moving pieces? It&apos;s a job you can only lose. Something will break. Something will drift. Some assumption someone made six months ago will turn out to be wrong, and you&apos;ll spend days tracking it down.

![](/images/posts/dbt-blues/image-5.png)

dbt tried to help with data quality tests. You could add tests to check for nulls, for uniqueness, for referential integrity. But tests are reactive. They tell you something broke. They don&apos;t prevent you from building an unmaintainable mess in the first place.

The core problem is the transformations themselves. How many do you actually need? How should they be structured? What&apos;s the right level of granularity? When should you merge logic, and when should you split it apart? These are design questions, and dbt doesn&apos;t help you answer them.

It can&apos;t. It&apos;s an executor. It runs what you tell it to run. (Snowflake, btw, doesn&apos;t answer these questions either, but it sits one layer down, too far away to think about them.)

People kept saying they were &quot;using dbt for data modeling,&quot; but they were really using dbt to execute a data model they&apos;d designed somewhere else—on a whiteboard, in their head, through trial and error. The modeling itself? That was still hard. dbt didn&apos;t make it easier.

### Why it could never evolve

This explains why dbt peaked at launch and never really went beyond it.

Orchestration was the &quot;easy&quot; problem. dbt nailed it. But once you&apos;ve nailed orchestration, where do you go? You can add features—testing, documentation, packages, macros—but those are all extensions of the same core capability. They don&apos;t change the fundamental value proposition.

To evolve beyond that, dbt would have needed to solve the hard problem: transformation design. How do you help teams build maintainable data models? How do you guide decisions about what transformations to create? How do you make the design process itself better?

That&apos;s a much harder product to build. It requires a different approach, different primitives, maybe even a different mental model. And dbt was never structured to do that. It was built to orchestrate SQL, and it did that well.

But &quot;orchestrate SQL well&quot; has a ceiling. You hit it pretty fast.

### The merger makes sense now

When you look at it this way, the Fivetran merger makes more sense. dbt couldn&apos;t become Snowflake-level valuable because it wasn&apos;t Snowflake-level fundamental. It was a convenience layer on top of something teams could always do another way. The same is true of Fivetran now.

That high valuation from the peak investment years? It was always a weight around their neck. How do you grow into a valuation that assumes you&apos;re solving a more fundamental problem than you actually are?

You don&apos;t. You find a different path.

The community loved dbt. The product worked. But love and functionality don&apos;t necessarily translate to the kind of revenue that justifies a massive valuation. Especially when your customers know, in the back of their minds, that they could work around you if they had to.

That&apos;s not a criticism of dbt. It&apos;s just the reality of what they were solving.

And maybe—just maybe—that&apos;s okay. Because the fact that dbt was a convenience layer, not a fundamental piece of infrastructure, means we&apos;re free to move on when something different comes along.

## 3\. My Next Iteration: Moving beyond orchestration to meta models

The core layer is shifting. Data engineers are working on the next iteration of how we store and query data. Turns out, even platforms like Snowflake and BigQuery have limitations when you&apos;re dealing with the scale they enabled. More use cases mean more data, more queries, more compute. At some point, the bills start growing faster than the business value, and engineers start looking for ways to handle even bigger datasets more effectively.

You can follow this rabbit hole if you want—experiments with table formats like Iceberg, what&apos;s happening with DuckDB and DuckLake. It&apos;s fascinating. The paradigm of how data gets stored and queried might change significantly over the next five years. It&apos;s already starting.

And just like dbt was only possible because Snowflake, BigQuery, and Redshift created new use cases, these new fundamentals will create space for new approaches. New tools. New ways of thinking about the problem.

But for me, the question isn&apos;t about what tools come next for everyone. It&apos;s about what comes next for the work I&apos;m doing.

### The problem dbt never touched

There&apos;s something dbt unlocked but never addressed: metadata management, meaning how we approach data modeling and transformation design at a higher level. And tbf, these approaches and tools existed before and alongside dbt, but only for a much smaller audience.

dbt let you write a lot of transformations. It chained them together. It ran them in the right order. It gave you some helpers—DRY principles through macros and packages—but anyone who&apos;s worked with macros knows that was never ideal. It was functional, but clunky.

Metadata models that generate data models aren&apos;t new. They existed before dbt. But they were never really accessible to most people—usually proprietary software, expensive, hard to work with.

And that&apos;s the problem: a 1,000-transformation model is hard to maintain. That&apos;s not a surprise. If you want to ensure data quality across 1,000 transformations, it&apos;s basically a job you can only lose. You can&apos;t win.

So the question becomes: how do you avoid building/maintaining 1,000 manual transformations in the first place?

### My current experiments: standardization and meta layers

I don&apos;t have a universal answer. I have an answer for myself - or at least one I want to test further.

I work with a very standardized approach to building data models. I do similar projects—mostly in the growth area, combining event data with product usage, marketing attribution, and subscription metrics. Over time, I&apos;ve developed a standardized data model for these setups. I know what the structure looks like. I know what the entities are. I know how they relate to each other.

![](/images/posts/dbt-blues/image-6.png)

What I&apos;m doing now is building a meta layer on top of that standardization. It makes it easier to define the inputs—what entities we&apos;re looking at, how they&apos;re configured, what properties matter. Then I let the meta model generate the output.

Right now, that output is dbt. I&apos;m basically putting a layer on top of dbt. But here&apos;s the thing: it doesn&apos;t actually matter if it&apos;s dbt or not. I could extend it to output Airflow configurations, or Dagster, or whatever. The meta layer is what I&apos;m managing. The orchestration is just... there.
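To make the idea concrete, here is a toy sketch of what such a meta layer could look like. The entity names, fields, and SQL template are made up for illustration, not my actual setup:

```python
# A toy meta layer: entity configs in, dbt-style SQL models out.
# Entity names, fields, and the template are illustrative only.

ENTITIES = {
    "user": {
        "source": "raw.users",
        "keys": ["user_id"],
        "properties": ["signup_channel", "country"],
    },
    "subscription": {
        "source": "raw.subscriptions",
        "keys": ["subscription_id"],
        "properties": ["plan", "mrr", "started_at"],
    },
}

def generate_model(name: str, cfg: dict) -> str:
    """Render one entity config into the SQL a dbt model file would hold."""
    schema, table = cfg["source"].split(".")
    columns = ",\n    ".join(cfg["keys"] + cfg["properties"])
    return (
        f"-- models/stg_{name}.sql (generated, do not edit by hand)\n"
        f"select\n    {columns}\n"
        f"from {{{{ source('{schema}', '{table}') }}}}\n"
    )

for name, cfg in ENTITIES.items():
    print(generate_model(name, cfg))
```

Swapping the output target means swapping `generate_model`, not the configs. That is the point: the meta layer is the asset, the generated project is disposable.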

And that&apos;s when I realized: I don&apos;t really need a transformation orchestrator anymore. I just need an orchestrator. Something that runs things in the right order. That&apos;s it.
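An orchestrator in that minimal sense can be surprisingly small. Here is a sketch using Python&apos;s standard-library topological sorter; the task names and dependency graph are invented for illustration:

```python
from graphlib import TopologicalSorter  # stdlib since Python 3.9

# A bare-bones orchestrator: run callables in an order that respects dependencies.
def run(tasks: dict, deps: dict) -> list:
    """deps maps a task name to the set of task names it depends on."""
    order = list(TopologicalSorter(deps).static_order())
    for name in order:
        tasks[name]()  # any callable: a generated SQL run, a script, an API call
    return order

ran = []
tasks = {n: (lambda n=n: ran.append(n)) for n in ["extract", "model", "report"]}
order = run(tasks, {"model": {"extract"}, "report": {"model"}})
print(order)  # -> ['extract', 'model', 'report']
```

Everything a transformation orchestrator adds on top of this—SQL awareness, model materialization, lineage—lives in the meta layer instead.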

### Agentic workflows and configuration-as-code

There&apos;s another piece to this that I&apos;m still exploring: agentic analytics engineering.

When you work with a meta model, you&apos;re working with typed entities. You define clearly how an entity should be transformed, how it should be built up. Everything is configured in code—not SQL you write by hand, but configuration that generates the SQL.

That makes it much easier to build agentic workflows. You can tell an agent, &quot;Here&apos;s the entity structure. Here&apos;s the configuration format. Generate the transformation for this new entity based on these parameters.&quot; The agent doesn&apos;t need to write perfect SQL from scratch. It needs to fill in a configuration template.
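As a sketch of what that could look like: the agent produces plain data that fills a typed template, and the template validates itself before anything gets generated. The field names and validation rules here are hypothetical:

```python
from dataclasses import dataclass, field

# A typed entity config an agent fills in, instead of writing raw SQL.
@dataclass
class EntityConfig:
    name: str
    source_table: str
    primary_key: str
    properties: list = field(default_factory=list)

    def validate(self) -> list:
        """Return a list of problems; an empty list means the config is usable."""
        errors = []
        if not self.name.isidentifier():
            errors.append(f"invalid entity name: {self.name!r}")
        if "." not in self.source_table:
            errors.append("source_table must be schema-qualified, e.g. 'raw.orders'")
        if self.primary_key in self.properties:
            errors.append("primary_key should not be repeated in properties")
        return errors

# The agent's output is just data, so it can be checked before generation runs.
agent_output = {"name": "order", "source_table": "raw.orders",
                "primary_key": "order_id", "properties": ["amount", "status"]}
cfg = EntityConfig(**agent_output)
print(cfg.validate())  # -> [] when the agent filled the template correctly
```

The guardrail is structural: a bad config fails validation loudly, instead of a subtly wrong SQL statement failing quietly in production.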

Simon has written about this in a really good way:

[Data Modeling for the Agentic Era: Semantics, Speed, and Stewardship](https://www.ssp.sh/blog/agentic-data-modeling/) — Master the three pillars of agentic data modeling: Metrics SQL for semantics, sub-second analytics for speed, and AI guardrails for trusted insights.

He comes to the same conclusion: when everything is configured in code, you don&apos;t need a transformation orchestrator.

### dbt got us here

None of this would be possible without dbt.

dbt made analytics engineering accessible. It created the space for people to think about these problems. It matured the practice enough that we can now see what the next level looks like.

I&apos;m not trying to predict &quot;the next wave&quot; for the industry. I&apos;m just describing my next iteration. What I&apos;m building for the work I&apos;m doing now. And I can only do it because dbt existed, because it unlocked this whole space, because it gave me a foundation to build on.

But I don&apos;t need dbt anymore. Not for what I&apos;m trying to do next.

### Not the blues—the beginning

So when I see people panicking about the dbt/Fivetran merger, when I see the dbt fear index spiking, I get it. Change is uncomfortable. The thing you relied on feels less stable.

But this isn&apos;t really something to mourn.

dbt did its job. It was the essential bridge that got us here. It unlocked modern data stacks. It created a community. It gave us the space to experiment and grow and figure out what we actually needed.

And now? Now we&apos;re ready for what comes next. For me, that&apos;s meta models and agentic workflows. For you, it might be something else. But we&apos;re all past the point where we need a transformation orchestrator to solve the hard problem.

The hard problem is transformation design - data models. And we&apos;re finally mature enough to start tackling it (ok, tbf - enough people were mature enough all along).

That&apos;s not the blues.</content:encoded></item><item><title>The End of Digital Analytics</title><link>https://timo.space/blog/the-end-of-digital-analytics/</link><guid isPermaLink="true">https://timo.space/blog/the-end-of-digital-analytics/</guid><description>as we know it. When Amplitude announced their new chief evangelist some days ago, most people saw a standard hire - congratulations comments galore.</description><pubDate>Mon, 15 Sep 2025 00:00:00 GMT</pubDate><content:encoded>as we know it.

When Amplitude announced their new chief evangelist some days ago, most people saw a standard hire - congratulations comments galore. I saw something different: a clear signal that digital analytics as we know it is fundamentally over.

This wasn&apos;t just any hire. They brought in someone who embodied everything that Google Analytics 4 represented—the old marketing analytics world that digital analytics had been built around for two decades. It&apos;s like watching a species evolve in real time, except the original habitat is disappearing. Amplitude is essentially saying &quot;we&apos;re the new Google Analytics,&quot; but for a world where Google Analytics no longer makes sense.

The choice feels deliberate. Amplitude is absolutely competing with Google Analytics, but they&apos;re going after the pro marketers—the ones with serious budgets who need more than GA4&apos;s confusing interface and limited capabilities can deliver.

But here&apos;s what makes this moment significant: it&apos;s not an isolated move. It&apos;s the clearest signal yet that the era of digital analytics—the one defined by marketing attribution, &quot;shedding light into dark spaces,&quot; and that persistent feeling that maybe we&apos;re all participating in an elaborate scam—is over (sorry, that might sound a bit harsher than intended).

I should know. I&apos;ve spent the last two years writing about how product analytics was changing, how attribution was crumbling, and watching my own client base shift from product teams to revenue people asking completely different questions. The Amplitude announcement just confirms what I&apos;ve been seeing in my projects.

[Leaving product analytics](https://hipster-data-show.ghost.io/leaving-product-analytics/) — an analysis of the current state of product analytics and beyond

The foundations that held up digital analytics for 20 years are cracking. What comes next is still taking shape.

![](/images/posts/the-end-of-digital-analytics/image.png)

In this post, I&apos;ll break down why digital analytics was always built on shaky ground—promising data-driven decision making but mostly delivering the feeling of being scientific without the actual business impact. I&apos;ll show you how the collapse of marketing attribution (the one thing that actually worked) combined with GA4&apos;s disaster created the perfect storm that&apos;s ending this era.

More importantly, I&apos;ll walk you through the two distinct paths emerging from this collapse: **operational customer experience optimization** for teams that need speed over sophistication, and **strategic revenue intelligence** that finally connects user behavior to business outcomes. Both represent fundamental shifts away from traditional analytics, and understanding them will help you navigate what&apos;s coming next.

* * *

A quick break before we dive in:

If you like, join [Juliana&apos;s](https://julianajackson.substack.com) and my new sanctuary for all minds in analytics and growth who love to call out BS and really want to do stuff that works and makes an impact: [ALT+](https://goalt.plus). We have thoughtful discussions like this one, and we run monthly deep-dive cohorts to learn together about fundamental and new concepts in growth, strategy, and operations.

Head over to [https://goalt.plus](https://goalt.plus) and join the waitlist - we are opening up by the end of September 2025, and we have limited the initial membership to 50 in the first month.

* * *

## **Part I: What Digital Analytics Actually Was**

Before I call out the end of something, I need to define what that something actually looked like. Because if we&apos;re honest about it, digital analytics was always built on a contradiction that most of us just learned to live with.

We told ourselves we were doing &quot;data-driven decision making.&quot; We built dashboards that showed visits, users, conversion rates. We created elaborate tracking setups, instrumented tag managers (even created a sub-genre for this - the tracking engineer) and celebrated when we could tell you exactly how many people clicked a specific button. But there was always this nagging question underneath it all: what does any of this actually mean for the business?

I felt this contradiction more acutely than most. In every project, I&apos;d set up these sophisticated analytics implementations, deliver insights about user behavior, and watch clients get excited about finally &quot;understanding their users.&quot; But I always had this uncomfortable feeling that I should probably mention that most of this data wouldn&apos;t actually change much for them. Because knowing that 200 people clicked a button doesn&apos;t tell you what to do about it.

![](/images/posts/the-end-of-digital-analytics/image-1.png)

That contradiction defined the entire era. And to understand why digital analytics is ending, we need to understand what it really was—not what we pretended it was.

### The Promise That Was Great at the Time but Never Delivered

Let me take you back to when digital analytics felt revolutionary. At least to me.

Google Analytics launched in 2005, and suddenly you could see exactly what was happening on your website. People were visiting your pages! They were clicking your buttons! You could track their entire journey from the first page view to checkout. For the first time, the black box of user behavior was cracked open.

The promise was intoxicating: build-measure-learn. Eric Ries was preaching the lean startup gospel, and analytics was supposed to be the &quot;measure&quot; part that made everything scientific. You&apos;d launch a feature, measure how people used it, learn from the data, and iterate. No more guessing. No more building things users didn&apos;t want. Pure, data-driven decision making. Trust me, I truly believed in that.

![](/images/posts/the-end-of-digital-analytics/image-2.png)

Product teams were told that if you weren&apos;t measuring, you weren&apos;t serious about building great products. Marketing teams were promised they could finally prove ROI and optimize every dollar spent. The analytics industry convinced everyone that the difference between successful companies and failures was whether they had proper tracking in place.

And the tools got more sophisticated. Amplitude and Mixpanel emerged with event tracking that made Google Analytics look like the entry level kid. Now you could track custom events, build complex funnels, analyze cohort retention. You could segment users by any behavior imaginable. The data got richer, the dashboards more beautiful, the possibilities seemingly endless.

But here&apos;s what actually happened in practice:

You&apos;d implement comprehensive tracking. You&apos;d build dashboards showing user flows, conversion rates, feature adoption. You&apos;d present insights to stakeholders who would nod appreciatively at the colorful charts. And then... not much would change.

Sure, you&apos;d learn that 15% of users clicked the blue button versus 12% who clicked the red one. But what were you supposed to do with that information? The difference could be noise. It could be meaningful. Most of the time, you couldn&apos;t tell.

You&apos;d discover that users dropped off at step 3 of your onboarding flow. But was it because step 3 was confusing? Because users weren&apos;t motivated enough to continue? Because they&apos;d already gotten what they needed? Because you were targeting the wrong audience? The data showed you _what_ was happening, but rarely _why._

![](/images/posts/the-end-of-digital-analytics/image-4.png)

The more sophisticated the tracking became, the more this gap became apparent. You could slice and dice user behavior in infinite ways, but the path from insight to action remained frustratingly unclear. Companies would spend months implementing detailed analytics setups only to find themselves asking the same question: &quot;This is all very interesting, but what should we actually do about it?&quot;

The lean startup promise of rapid iteration based on data feedback loops worked for simple cases—A/B testing button colors or headlines. But for the complex questions that actually mattered to businesses—why users weren&apos;t converting, what features to build next, how to improve retention—the data rarely provided clear answers.

Instead, what digital analytics delivered was the _feeling_ of being data-driven without the actual business impact. It gave teams something concrete to discuss in meetings, numbers to put in reports, charts to present to management. It made the messy, uncertain process of building products feel more scientific and controlled.

![](/images/posts/the-end-of-digital-analytics/image-5.png)

But underneath it all, most product and business decisions were still being made the same way they always had been: through intuition, customer feedback, market research, and educated guesswork. The analytics just provided a veneer of quantitative support for decisions that were fundamentally qualitative.

This wasn&apos;t entirely useless. Having some data was better than having no data. And occasionally, the insights were genuinely actionable. But the gap between what was promised and what was delivered was enormous—and almost everyone in the industry knew it, even if we didn&apos;t talk about it openly.

### The Two Things That Actually Worked

Despite all the overselling and unfulfilled promises, digital analytics wasn&apos;t completely worthless. When I look back honestly at what actually delivered business value, two things stand out.

**Marketing Attribution (The Real MVP)**

Marketing attribution was the killer app of digital analytics. Full stop.

Before Google Analytics, if you ran digital ads on multiple campaigns, you were basically flying blind. You&apos;d spend money on Google AdWords, some banner ads, maybe early Facebook campaigns. Each platform would claim credit for conversions, and you&apos;d end up with 150% attribution because everyone was taking credit for the same sale.

Google Analytics solved this by being the neutral referee. It sat on your website, tracked users from all sources, and told you which marketing touchpoints actually led to conversions. For the first time, marketers could confidently say &quot;this campaign drove 47 conversions, that one drove 12&quot; without having to trust the advertising platforms&apos; self-reported numbers.

![](/images/posts/the-end-of-digital-analytics/image-6.png)

This was genuinely revolutionary for marketing teams. It enabled proper budget allocation, campaign optimization, and ROI calculations. When your CFO asked why you needed that $50k marketing budget, you could show them exactly which channels were driving revenue. And the feedback was fast: just a few days of data collection.

The attribution models got more sophisticated over time—first click, last click, time decay, position-based. Tools like Amplitude and Mixpanel eventually added multi-touch attribution that could track the entire customer journey across months of touchpoints (well, in theory at least). This was real, actionable data that directly impacted how businesses spent money.

Marketing attribution worked because it solved a concrete business problem: &quot;Where should I spend my marketing budget?&quot; The data quality might not have been perfect, but it was infinitely better than the alternative of blindly trusting advertising platforms or making gut decisions.

But over time, multiple things happened: browsers blocked specific tracking, users did the same, then cookie consent banners added to it. What really &quot;destroyed&quot; marketing attribution with Google Analytics, though, was the way marketing works today: across multiple channels, including plenty that never create any immediate touchpoint on your website. A lot of direct and brand traffic, and the uneasy feeling that it&apos;s all triggered by things that were never tracked (LinkedIn posts, YouTube videos, ...).

**Shedding Light Into Dark Spaces (The Softer Win)**

The second thing that worked was much less dramatic but still valuable: making the invisible visible.

Before analytics, you literally had no idea what was happening on your website. How many people visited? Which pages were popular? Where did people get stuck? It was all a black box. Analytics opened that box and showed you patterns you never could have guessed.

You&apos;d discover that 40% of your traffic was going to a blog post you&apos;d forgotten about. Or that users from mobile were behaving completely differently than desktop users. Or that your carefully designed homepage flow was being bypassed by 70% of users who came in through search and landed on product pages.

![](/images/posts/the-end-of-digital-analytics/image-7.png)

This &quot;shedding light&quot; value was real, even if it rarely led to immediate action. It changed how teams thought about their products and websites. It made abstract concepts like &quot;user behavior&quot; concrete and discussable.

UX designers could see which parts of interfaces were being ignored. Product managers could identify features that nobody was using. Marketing teams could spot traffic patterns they&apos;d never noticed. Content creators could see which pieces resonated and which flopped.

The problem with this second category was that the insight-to-action gap was enormous. Learning that users spent an average of 2.3 minutes on your pricing page was interesting, but what were you supposed to do with that information? Was 2.3 minutes good or bad? Should you make the page shorter or longer? The data raised more questions than it answered.

But it still had value. It gave teams a shared vocabulary for discussing user behavior. It made conversations more specific and less speculative. Instead of arguing about whether users &quot;probably&quot; did something, you could look at the data and see what they actually did.

**The Harsh Reality**

Here&apos;s the uncomfortable truth: marketing attribution probably accounted for 80% of the actual business value that digital analytics delivered over the past 20 years. Everything else—all the sophisticated funnels, cohort analyses, user journey mapping, behavioral segmentation—was mostly the &quot;shedding light&quot; category. Interesting, sometimes useful, but rarely driving major business decisions.

Even in my own work, when clients pushed me to identify the specific business impact of our analytics implementations, it almost always came back to marketing attribution. That was where the rubber met the road, where data actually changed how money got spent.

This imbalance was the industry&apos;s dirty secret. We sold comprehensive analytics packages that promised to transform how companies understood their customers. But in practice, most of the value came from solving the much narrower problem of marketing measurement. The rest was elaborate window dressing that made the whole thing feel more important and scientific than it actually was.

## Part II: The Foundation Is Cracking

### The Collapse of Marketing Attribution

[Barbara](https://www.barbaragaliza.com) and I have been running [marketing attribution workshops](https://attributionmasterclass.com) for over a year and a half now. We decided to do this because we both had the feeling that something was shifting significantly with attribution, and there was a lot of confusion about it.

What we discovered in our research was sobering: the role and possibilities of click-based attribution—the foundation of marketing analytics for two decades—are decreasing every year. And they&apos;re decreasing for reasons that go far deeper than the obvious culprits everyone talks about.

![](/images/posts/the-end-of-digital-analytics/image-8.png)

**The Obvious Factors (That Everyone Mentions)**

Yes, consent requirements in Europe matter. When you have to ask users for permission to track them, you immediately limit your ability to connect marketing touchpoints to conversions. Apple&apos;s anti-tracking initiatives have been devastating for attribution, especially on mobile where they have dominant market share. And there&apos;s now a whole cottage industry of consultants promising to get you &quot;a little bit more data back&quot; through server-side tagging and other technical workarounds.

![](/images/posts/the-end-of-digital-analytics/image-9.png)

But these technical and regulatory changes are just symptoms of a much bigger shift.

**The Real Problem: Marketing Has Evolved Beyond What Attribution Can Handle**

The fundamental issue is that marketing itself has changed dramatically, while attribution tools are still built for a world that no longer exists.

In the early days of ecommerce, digital marketing was relatively simple. You ran most of your campaigns on Google Ads (AdWords at the time) because search was the dominant way people discovered products. Maybe you added Facebook when social advertising became viable. You had two channels, maybe three, and attribution could realistically capture 70% of the customer journey because there weren&apos;t that many touchpoints to track.

Today&apos;s marketing is exponentially more complex. You can&apos;t design a successful marketing strategy around just paid search and paid social anymore. Companies run campaigns across dozens of channels: influencer partnerships, podcast sponsorships, YouTube content, LinkedIn thought leadership, community building, email sequences, retargeting campaigns, affiliate programs, PR initiatives, and SEO content strategies.

![](/images/posts/the-end-of-digital-analytics/image-10.png)

The more channels you operate across, the harder attribution becomes. When someone watches an impactful YouTube video you created, gets retargeted on Instagram, reads your newsletter, sees your founder speak at a conference, and then finally converts—how do you attribute that sale? Traditional attribution systems can maybe capture two or three of those touchpoints if you&apos;re lucky.

**The Black Box Takeover**

Here&apos;s where it gets really problematic: advertising platforms have largely given up on deterministic attribution too. They&apos;ve moved to probabilistic models—black boxes that use machine learning to estimate conversions and optimize campaigns automatically.

Google Ads and Facebook Ads now essentially say &quot;trust us, our AI knows what&apos;s working&quot; rather than providing granular data about which specific touchpoints drove conversions. They&apos;ve shifted to broad targeting and automated bidding strategies where you hand over control to their algorithms and hope for the best.

This fundamentally changes the relationship between marketers and data. Instead of using attribution data to fine-tune campaigns, marketers are increasingly just setting broad parameters and letting platform AI handle the optimization. The detailed attribution analysis that justified tools like Google Analytics becomes less relevant when you&apos;re not making granular targeting decisions anymore.

**The Brutal Reality Check**

Here&apos;s an educated-guess number that should terrify anyone in the analytics industry: of all the analytics setups I know, maybe 10% are actually used to make good marketing analytics decisions.

Most companies still have Google Analytics or Amplitude running. They still generate monthly reports showing visits, conversions, and channel performance. But when you dig into how marketing budgets actually get allocated, how campaigns actually get optimized, and how strategic decisions actually get made, the attribution data plays a much smaller role than it used to (or too big a role, given the sample sizes).

The rest is just data theater—numbers that get presented in meetings to make discussions feel more scientific, but don&apos;t fundamentally change what anyone does.

Marketing attribution was the one part of digital analytics that consistently delivered business value. It was the foundation that justified the entire industry. And now it&apos;s crumbling, not because the technology failed, but because the marketing landscape evolved beyond what attribution systems can meaningfully measure.

When the main thing that actually worked stops working, everything else becomes much harder to justify.

### The Google Analytics 4 Disaster

GA4 remains a mystery to me. I made a video a long time ago trying to explain why GA4 ended up the way it did, but the gist is this: it looks like GA4 is a product that was trying to serve 2-3 completely different strategies, and that definitely didn&apos;t work out well as a product.

Maybe it&apos;s not even that complex. Maybe GA4 just reflects that Google Analytics doesn&apos;t play a significant role for Google anymore, and the decision was made that GA could be just a good entry product for getting people onto Google Cloud Platform and BigQuery. This would totally fit Google Analytics&apos; history, since it was always a bit of an entry product for a different offering. Initially, Google Analytics was the sidekick of Google Ads. Now maybe Google Analytics 4 is the sidekick of Google Cloud Platform.

**Abandoning the Core Audience**

But here&apos;s what&apos;s fascinating about what happened in the marketer space: Google basically abandoned their default audience.

For everyone who worked in marketing, Google Analytics was the default tool. It was the thing every marketer knew they had to invest time in understanding, at least the basics, to do their job properly. You&apos;d learn how to set up goals, understand acquisition reports, and work with the basic attribution models. It wasn&apos;t perfect, but it was predictable and learnable.

One thing you always have to keep in mind, which I think is often forgotten in the analytics space, is that marketers have a lot of things to do. Analytics and analyzing performance is just a small part of their job. They need tools that work out of the box and don&apos;t require extensive training to get basic insights.

Google Analytics Universal was exactly that. It had a very opinionated and strict model for doing things, but that&apos;s what made it work so well. The constraints actually helped—marketers knew where to find things, how to interpret the data, and what the numbers meant.

**The Migration Nightmare**

Then came the GA4 migration, which wasn&apos;t really a migration at all. Google forced everyone to implement completely new tracking code, learn an entirely different interface, and adapt to a fundamentally different data model. This was essentially a &quot;rip and replace&quot; project disguised as an upgrade.

For a tool that was supposed to be accessible to everyday marketers, this was devastating. The new interface was confusing. Basic reports that took two clicks in Universal now required navigating complex menus or building custom reports. Simple concepts like &quot;sessions&quot; were replaced with more abstract event-based models that required technical understanding to interpret correctly (tbh - I loved the move to an event-based, user-focused model, but for other reasons).

The timing made it worse. Google announced the Universal sunset with barely 18 months&apos; notice, forcing teams to scramble. Many companies spent months or years getting their Universal Analytics setup just right, only to be told they had to start over from scratch.

**A Product Without a Clear User**

The most damaging thing about GA4 is that it&apos;s unclear who it&apos;s actually built for. The interface is too complex for casual marketers but not powerful enough for serious analysts. The reporting is too limited for agencies but too overwhelming for small business owners. The data model is more flexible than Universal but harder to understand for non-technical users.

It feels like Google tried to build one product that could serve everyone and ended up serving no one particularly well. Universal Analytics succeeded because it had clear constraints and made specific trade-offs. GA4 feels like a compromise that satisfies no one&apos;s needs completely.

**The Opportunity This Created**

Google&apos;s misstep created a massive opportunity for tools like Amplitude and Mixpanel. Suddenly, there were millions of frustrated marketers who had to relearn analytics anyway. If you&apos;re going to invest months learning a new tool, why not learn one that actually solves your problems instead of creating new ones?

This is exactly the market opportunity that Amplitude is going after with their evangelist hire. They&apos;re positioning themselves as &quot;Google Analytics, but actually good&quot;—the tool for marketers who have outgrown GA4&apos;s limitations and have budget for something better.

The GA4 disaster proved that the market was ready for professional-grade marketing analytics tools. Google had essentially trained an entire generation of marketers to expect more from their analytics, then delivered a product that provided less. That gap is what Amplitude, Mixpanel, and others are trying to fill.

**The Final Crack in the Foundation**

The combination of attribution collapse and the GA4 disaster created a perfect storm. Just as the main value proposition of digital analytics (attribution) was becoming less reliable, the dominant platform that millions of teams depended on was imploding.

This left the entire digital analytics ecosystem in chaos. Teams that had been running on autopilot with Universal Analytics suddenly had to make active decisions about their analytics strategy. And when they looked honestly at what they were getting from their current tools, many realized they&apos;d been going through the motions for years without much actual business impact.

GA4 didn&apos;t just fail as a product—it forced an entire industry to question whether traditional digital analytics was still worth the effort (tbh - I am exaggerating here a bit - most teams swallowed the toad (not even sure if you say it like that in English - but that&apos;s a German saying) and just implemented GA4 and did nothing with it like before).

## Part III: Two Paths Forward

Digital analytics will still tag along for a long time. Things rarely disappear quickly—usually they just stick around because people have invested in them. But I&apos;m seeing two distinct directions where the real activity is moving.

![](/images/posts/the-end-of-digital-analytics/image-11.png)

Both have their origins in the old marketing and product analytics world, but they represent fundamental shifts in what we&apos;re trying to accomplish. One direction is highly operational and immediate. The other is more sophisticated and strategic. And interestingly, they&apos;re attracting completely different audiences than traditional analytics ever did.

This is where it gets interesting.

### Path 1: Customer Experience Optimization

**Marketing Teams Need Speed, Not Deep Analysis**

One characteristic of marketing teams that often gets overlooked: marketing is (or can be) highly operational and fast-moving. What I mean is this: marketing teams want to experiment constantly, implement quickly, and see results within days or weeks, not months.

When you look at the experimentation space, it&apos;s interesting how differently product teams and marketing teams approach testing. Product teams will run careful A/B tests on new features, analyzing user behavior over weeks or months to understand the impact. But they&apos;re often working with relatively low volumes—maybe a few thousand users see a new feature, and you need time to gather meaningful data.

Marketing experimentation is completely different. You can test new ad creative and get results within hours. You can launch a new email campaign to 50,000 subscribers and know by the end of the day whether it worked. You can experiment with different landing pages, messaging approaches, targeting parameters, and budget allocations across multiple channels simultaneously.

![](/images/posts/the-end-of-digital-analytics/image-12.png)

The volume and speed of feedback loops in marketing is extraordinary. The best digital marketing teams I&apos;ve worked with had high rates of experimentation built into their DNA. They were constantly testing new channels, new messaging, new campaign structures. They wanted to know immediately whether something was working so they could either scale it up or kill it and move on to the next test.

This creates a fundamentally different relationship with data than what traditional analytics provided. Marketing teams don&apos;t need sophisticated cohort analyses or complex user journey mapping. They need to know: did this campaign drive more conversions than that one? Which landing page performed better? They need local, tactical insights more than complex big-picture analysis.

What marketing teams have always needed is operational analytics—data that directly enables action within their day-to-day workflows. They want to create a segment of users who visited the pricing page but didn&apos;t convert, then immediately send those users a targeted email or show them specific ads. They want to identify which blog posts are driving the most qualified traffic and create more content like that. They want to see which campaigns are underperforming and pause them before they waste more budget.

_The AI Acceleration Factor_

With AI capabilities expanding, this operational speed is going to increase dramatically. Marketing teams will be able to generate multiple versions of ad creative, test different messaging approaches, and optimize campaigns in real-time with minimal human intervention.

This amplifies the need for analytics that can keep up with the pace of iteration. When you&apos;re testing dozens of variations simultaneously and making daily optimization decisions, you can&apos;t wait for weekly reports or manual analysis. You need systems that automatically surface insights and enable immediate action.

The gap between &quot;here&apos;s what the data shows&quot; and &quot;here&apos;s what you should do about it&quot; needs to shrink to zero. Marketing teams operating at AI speed won&apos;t have time for the traditional analytics workflow of data exploration, insight generation, and then separate implementation phases. They need integrated systems that combine analysis and action in the same interface.

This is why the future of marketing analytics looks nothing like the traditional digital analytics we&apos;ve known. It&apos;s not about better charts or more sophisticated analysis—it&apos;s about making data operationally useful within fast-moving workflows that are only getting faster.

**From Analytics to Action: The Amplitude Evolution**

When Amplitude started moving into the marketing space, they took the obvious first step: feature parity with Google Analytics. They added attribution capabilities, channel grouping, and ecommerce functionalities. With cart analytics introduced about three years ago, they were essentially saying &quot;we can do everything GA does, but better.&quot;

This made sense strategically. There were millions of frustrated GA4 users who needed to migrate anyway, so why not offer them a more powerful alternative? But achieving feature parity was just the entry ticket. The real evolution happened when they started thinking about what comes after analytics.

_The CDP Experiment_

Amplitude experimented with becoming a Customer Data Platform (CDP). The logic was sound: if you can identify interesting user segments in your analytics data, why not enable users to immediately act on those insights by sending targeted messages via SMS, email, or ad platforms?

They quickly learned that building CDPs is complex work. Anyone who&apos;s worked with identity matching and identity graphs knows this pain. You can see they&apos;ve scaled back from the full CDP vision—they still offer basic activation capabilities, but they&apos;ve strengthened their integrations with dedicated CDP tools like Segment rather than trying to replace them entirely.

But this experiment taught them something important: the gap between insight and action was where the real value lived.

_The AI Agent Approach_

Amplitude&apos;s second AI product launch was much more interesting than their first (which was the obvious &quot;chat with your data&quot; feature that everyone shipped). They introduced an AI agent that continuously analyzes your data, identifies improvement opportunities, and most importantly, connects those opportunities to specific actions you can take.

[Meet Amplitude AI Agents](https://amplitude.com/ai) — AI experts that analyze metrics, deliver insights, and enable intelligent action.

Instead of just telling you &quot;many users drop off during onboarding step 3,&quot; the AI suggests concrete changes you could make to that step. It might recommend specific messaging tweaks, identify friction points that could be streamlined, or suggest A/B tests that could improve conversion rates.

This is a fundamental shift from traditional analytics. You&apos;re not getting reports that require interpretation—you&apos;re getting actionable recommendations that are ready to implement. The analysis and the suggested action are bundled together.

_A thought experiment: Analytics + CMS + AI_

This approach could go much further than what Amplitude has built so far. Imagine connecting your analytics platform to your content management system with AI-generated messaging capabilities.

You could have a content pool designed for your main personas across different use cases. Based on real-time analytics data, the system identifies that a visitor appears to match persona A and seems interested in use case B. The platform then dynamically generates and serves content optimized for that specific combination.

Instead of static websites with fixed messaging, you&apos;d have adaptive experiences that evolve based on user behavior patterns, continuously optimized by AI, and informed by real-time analytics data. The feedback loop between data collection, analysis, and content optimization would happen automatically within minutes instead of months.

**The Hotjar Pattern: Why Simple Beats Sophisticated**

Hotjar was always a weird tool for me. I&apos;d encounter it constantly in client setups, running alongside Google Analytics, and I&apos;d think: &quot;Why do teams need this when GA already delivers so little value, and Hotjar feels even more limited?&quot;

But I was missing the point entirely.

_What Hotjar Got Right_

Hotjar solved one specific problem extremely well: it made user behavior immediately visible and actionable. You could see a click map overlaid on your landing page within minutes of implementation. You could watch session replays of actual users interacting with your site. You could set up simple surveys that appeared at key moments in the user journey.

![](/images/posts/the-end-of-digital-analytics/image-14.png)

The data wasn&apos;t sophisticated. The insights weren&apos;t groundbreaking. But the feedback loop from question to answer was incredibly fast. A UX designer could see exactly where users were clicking, identify obvious friction points, and make immediate changes. A marketer could watch recordings of users struggling with the checkout flow and spot conversion killers within hours.

While I was judging Hotjar for lacking analytical depth, UX designers were using it to make their interfaces better every day. They weren&apos;t doing complex behavioral analysis—they were seeing patterns that helped them solve specific problems quickly.

_The ContentSquare Acquisition_

Hotjar&apos;s acquisition by ContentSquare validates this approach on an enterprise scale. ContentSquare has been the leader in customer experience optimization for years, but primarily at the enterprise level with enterprise-level complexity and pricing.

The customer experience optimization space was always missing something below the enterprise tier. These enterprise solutions felt like classic sales-driven products—not attractive by default, only valuable after a sales team convinced a CMO they needed the &quot;magic&quot; capabilities.

ContentSquare acquiring Hotjar (and Heap after that) gives them access to the bottom-up market that Hotjar had built. Suddenly, customer experience optimization isn&apos;t just for enterprises with six-figure budgets and dedicated teams. It&apos;s accessible to any company that wants to understand how users actually interact with their interfaces.

_Why This Pattern Matters_

The Hotjar pattern reveals something important about the direction analytics is moving. Users increasingly prefer tools that are:

-   **Immediately actionable** rather than analytically sophisticated
-   **Workflow-integrated** rather than requiring separate analysis phases
-   **Problem-specific** rather than trying to serve every possible use case
-   **Built for fast feedback loops** rather than comprehensive data collection

Amplitude has clearly learned from this pattern. They&apos;re building features that feel more like Hotjar—click maps, auto-tracking capabilities, and immediate visual feedback—rather than just adding more analytical complexity.

The future of customer experience optimization isn&apos;t about more sophisticated analysis. It&apos;s about making user behavior visible and actionable for the people who can immediately improve the experience. Sometimes the simpler tool that solves one problem well beats the sophisticated platform that solves many problems adequately.

This is the operational direction that digital analytics is moving toward: less analytical depth, more immediate utility. Tools that help teams make better decisions faster, rather than tools that help analysts generate more comprehensive reports.

Admittedly, this makes the space less interesting for me personally. Which is totally fine.

### **Path 2: Revenue Intelligence**

**The Audience Shift: From Product Teams to Revenue People**

About 16 months ago, something changed in the type of projects I was getting approached for. I&apos;d spent years working with product teams on classic product analytics—helping them set up tracking, build funnels, understand cohort reports, and get insights from their behavioral data.

But I started getting contacted by a completely different profile: revenue people. CFOs, Chief Revenue Officers, heads of growth or the data teams working for them. And they were asking fundamentally different questions.

_&quot;We Only See Revenue as a Post-Fact&quot;_

Their problem was consistent across companies: &quot;Right now, we&apos;re looking at revenue as a post thing. We see the sales numbers for a specific week or month in our BI reports. We can break it down by product, maybe by marketing channel. But that&apos;s it. We don&apos;t have much to work with for planning or improving our revenue.&quot;

This was interesting to me because these were the people who actually controlled budgets and made strategic decisions about company growth. And they felt blind to the leading indicators that could help them understand how revenue gets generated.

They wanted to see beyond the final numbers. Questions like: &quot;How many accounts do we lose because we never activate them properly? How much revenue potential are we leaving on the table? When we see usage patterns declining, can we predict churn risk before it shows up in cancellations?&quot;

_The Demand for Early Intervention Signals_

What struck me was how different their needs were from traditional product analytics. Product teams wanted to understand user behavior to improve features. Revenue people wanted to understand the entire pipeline from first touch to recurring revenue, with the ability to intervene before problems became expensive.

They needed predictive signals, not just historical reports. They wanted to identify accounts at risk of churning before the churn happened. They wanted to spot revenue expansion opportunities while there was still time to act on them. They wanted to understand which early user behaviors actually predicted long-term value.

![](/images/posts/the-end-of-digital-analytics/image-15.png)

Most importantly, they wanted to combine different data sources that had never been connected before. They had rich behavioral data from their product, subscription data from their billing systems, and marketing data from their campaigns. But these lived in separate systems and were never analyzed together in a way that could predict business outcomes.

_Why Product Teams Weren&apos;t Enough_

This shift made sense when I thought about it. Product teams, despite all the talk about being data-driven, often struggle to connect their work directly to business outcomes. They can tell you that Feature X has 40% adoption, but translating that into revenue impact requires assumptions and indirect measurements.

Revenue people don&apos;t have that luxury. They&apos;re accountable for actual business results, not engagement metrics or feature adoption rates. They need to understand the complete customer journey from acquisition through expansion and renewal, with clear visibility into where revenue gets created or lost.

The audience shift I experienced reflected a broader recognition that behavioral data is most valuable when it&apos;s connected to business outcomes, not when it&apos;s analyzed in isolation. Product analytics was always a step removed from the business metrics that executives actually cared about. Revenue intelligence closes that gap.

_The Budget Reality_

There&apos;s also a practical element: revenue people control bigger budgets and have more urgency around their problems. A CFO who can&apos;t predict churn risk or identify expansion opportunities is dealing with million-dollar blind spots. A Chief Revenue Officer who can&apos;t see early indicators of pipeline health is flying blind on the most important part of their job.

Product teams often struggle to justify analytics investments because the ROI is indirect and hard to measure. Revenue teams can directly calculate the value of better forecasting, earlier churn prediction, and improved conversion tracking. When you can prevent one high-value customer from churning, the analytics investment pays for itself.

This audience shift signals something important about where the analytics industry is heading. The future belongs to tools that can directly connect user behavior to business outcomes, not tools that provide interesting insights about user behavior in isolation.

**Breaking Free from SDK Limitations**

The breakthrough moment for me came when I realized that what I wanted to achieve wasn&apos;t possible with SDK-based tracking systems anymore. This was the essential step to understanding how to bridge the gap between user behavior and business outcomes.

_The Fundamental Problems with SDK Tracking_

SDK-based tracking means sending events from the browser or server using tracking libraries. You implement code that fires events when users perform actions, and those events get sent to your analytics platform. This approach has severe limitations when you&apos;re trying to build serious business intelligence.

First, you never have guaranteed 100% data delivery. Networks fail, browsers crash, users navigate away before events finish sending. The worst projects I worked on were the ones where we tried comparing analytics numbers to actual business data—like comparing &quot;account created&quot; events in Amplitude to actual account records in the database. The numbers never matched because tracking isn&apos;t designed for guaranteed delivery.

Second, SDK tracking requires significant implementation and maintenance work from developers. And here&apos;s the problem: tracking isn&apos;t their main job. Their main job is building the product. So you constantly hit this tension where data quality depends on developer time and attention, but developers have other priorities. The tracking setup slowly degrades over time as features change and tracking code doesn&apos;t get updated.

_The Data Warehouse Approach_

The solution was to approach this like a classic data project instead of a tracking project. Treat different data sources as inputs, bring them together in a data warehouse, apply a well-designed data model, and then create the metrics and insights that different teams need.

![](/images/posts/the-end-of-digital-analytics/image-16.png)

This meant getting off the platforms and building everything in the data warehouse where I could control data quality, combine multiple sources, and ensure complete data capture.

In my current project, we track only 2-3 events with SDKs. Everything else—10-15 additional events—comes from other sources: external systems, application databases, webhook data. We identify the information we need from existing data sources and transform it into event data rather than trying to track everything through code.
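To make this concrete, here is a minimal sketch of what &quot;transforming database records into events&quot; can look like. The row schema and event names are hypothetical illustrations, not a prescribed model:

```python
def rows_to_events(subscription_rows):
    """Derive analytics events from application database rows.

    Each row is a dict like:
    {"account_id": "a1", "old_plan": "trial", "new_plan": "pro",
     "changed_at": "2025-06-01T12:00:00Z"}
    (hypothetical schema, for illustration only).
    """
    events = []
    for row in subscription_rows:
        # Classify the row change into an event name.
        if row["new_plan"] is None:
            event_name = "subscription_cancelled"
        elif row["old_plan"] == "trial":
            event_name = "subscription_started"
        else:
            event_name = "subscription_changed"
        events.append({
            "event": event_name,
            "account_id": row["account_id"],
            "timestamp": row["changed_at"],
            "properties": {
                "from_plan": row["old_plan"],
                "to_plan": row["new_plan"],
            },
        })
    return events
```

Because this runs over database records rather than client-side tracking calls, every plan change that ever happened is captured, and redefining an event just means rerunning the transformation over history.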

_The Technical Benefits_

This approach solves multiple problems simultaneously:

**Complete data capture**: When you pull events from database records, you get 100% coverage. Every account creation, subscription change, or product interaction gets captured because it has to be recorded in the database for the application to function.

**Better data quality control**: You can run data quality tests, monitor for anomalies, and fix issues retroactively. If you discover a problem with how you defined an event six months ago, you can reprocess the historical data. Try doing that with SDK-based tracking.

**Sophisticated identity resolution**: You can spend time properly stitching accounts across different systems. When you have HubSpot data and product usage data, you can determine exactly how well they match and what percentage of accounts you can connect. It becomes an engineering problem with measurable solutions, not something you just hope works.

**Retroactive changes**: You can create new events from historical data, add calculated properties to existing events, or fix data quality issues. This is impossible with SDK tracking but trivial in a data warehouse.
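As a toy illustration of making identity resolution measurable, a first-pass match rate between CRM and product accounts can be computed from normalized emails. Real identity graphs are far more involved; the email-matching heuristic here is just an assumption for the sketch:

```python
def match_rate(crm_emails, product_emails):
    """Fraction of CRM accounts that stitch to a product account
    by normalized email. A crude first-pass identity metric, not
    a full identity-resolution pipeline."""
    crm = {e.strip().lower() for e in crm_emails}
    prod = {e.strip().lower() for e in product_emails}
    matched = crm.intersection(prod)
    return len(matched) / len(crm)

print(match_rate(["A@x.com", "b@x.com", "c@x.com"],
                 ["a@x.com", "d@x.com"]))
# 0.3333333333333333 (1 of 3 CRM accounts matched)
```

The point is that &quot;how well do HubSpot and product data connect?&quot; becomes a number you can track and engineer against, instead of something you hope works.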

_Beyond Event Analytics_

But the bigger breakthrough is that this approach enables you to combine behavioral data with all the other business data that traditional analytics never touched. You can connect product usage patterns to subscription changes, marketing attribution to customer lifetime value, and support ticket volume to churn risk.

When you build everything in the data warehouse, you can create synthetic events that represent business outcomes rather than just user actions. You can generate events like &quot;account became at-risk&quot; or &quot;expansion opportunity identified&quot; based on complex logic that combines multiple data sources.
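A minimal sketch of such a synthetic event, assuming made-up field names and thresholds (a real at-risk definition would be tuned to your own data):

```python
def detect_at_risk(accounts):
    """Emit a synthetic 'account_became_at_risk' event when product
    usage has dropped sharply and support tickets are piling up.
    The 50% usage-drop rule and 2-ticket threshold are illustrative
    assumptions, not a recommended model."""
    events = []
    for acc in accounts:
        usage_dropped = (
            acc["sessions_prev_30d"] > 0
            and acc["sessions_prev_30d"] > 2 * acc["sessions_last_30d"]
        )
        if usage_dropped and acc["open_tickets"] >= 2:
            events.append({
                "event": "account_became_at_risk",
                "account_id": acc["account_id"],
            })
    return events
```

The input combines product usage and support data that live in different systems; only in the warehouse can a rule like this run over both at once.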

This is where revenue intelligence becomes possible. You&apos;re not limited to analyzing what users clicked or which pages they visited. You can analyze the complete customer journey from first marketing touch through renewal and expansion, with all the business context that makes behavioral data actually meaningful.

The SDK approach kept digital analytics trapped in a world of user actions without business context. Moving to the data warehouse approach finally enables you to connect user behavior to business outcomes in a rigorous, measurable way.

**Building the Assembly Line to Revenue**

This is where revenue intelligence becomes practical. Instead of looking at revenue as a mysterious black box that either goes up or down, you can map out the entire assembly line that produces it—and identify exactly where the breakdowns happen.

_The Pipeline That Actually Matters_

Let me give you a concrete example. You&apos;re a SaaS company that gets 1,000 new accounts this month. Without revenue intelligence, you&apos;d track this as &quot;1,000 new signups&quot; and maybe celebrate the growth. Three months later, you&apos;d see that very few of them converted to paid subscriptions, but you wouldn&apos;t know why.

With revenue intelligence, you track the complete pipeline: 1,000 new accounts, but only 100 reach an &quot;activated&quot; state—meaning they&apos;ve done something in your product that gives them a real sense of what value it can deliver. That means you&apos;re losing 900 accounts before they even understand what they signed up for.

![](/images/posts/the-end-of-digital-analytics/image-17.png)

This isn&apos;t just an interesting metric. It&apos;s a business emergency with a clear dollar value attached. You can calculate exactly how much revenue potential you&apos;re losing by failing to activate 90% of new accounts. More importantly, you can start investigating why activation is so low and test interventions to improve it.
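As a back-of-the-envelope sketch of that dollar value (the conversion rate and per-account revenue below are invented for illustration):

```python
def lost_revenue_potential(new_accounts, activated, trial_to_paid, arpa_annual):
    """Estimate annual revenue left on the table by failing to
    activate accounts. Assumes unactivated accounts would convert
    at the same trial-to-paid rate once activated, which is a
    simplification for illustration."""
    unactivated = new_accounts - activated
    return unactivated * trial_to_paid * arpa_annual

# 1,000 signups, 100 activated, 20% of activated accounts
# eventually pay $1,200/year (all numbers hypothetical).
print(lost_revenue_potential(1000, 100, 0.20, 1200))  # 216000.0
```

Even with conservative assumptions, the 900 lost accounts translate into a six-figure annual number, which is what turns &quot;low activation&quot; from a metric into an emergency.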

_Early Intervention Opportunities_

Traditional business intelligence tells you what happened after it&apos;s too late to fix it. Revenue intelligence gives you signals when you can still do something about it.

When you see an account&apos;s usage declining, you can flag them as at-risk before they actually churn. When you identify patterns that predict expansion opportunities, you can reach out while customers are still receptive. When you spot activation problems, you can fix the onboarding experience before you lose more potential customers.

This is fundamentally different from the &quot;post-fact&quot; reporting that revenue teams complained about. Instead of looking at last month&apos;s churn numbers and wondering what went wrong, you&apos;re getting alerts about accounts that might churn next month—while there&apos;s still time to save them.

_Connecting the Dots_

The real power comes from connecting behavioral patterns to business outcomes across the entire customer lifecycle. You can see that accounts who complete specific onboarding actions within their first week have 3x higher lifetime value. You can identify that accounts from certain marketing channels take longer to activate but have better retention once they do.

This kind of analysis was impossible with traditional analytics because you couldn&apos;t connect the behavioral data to the business outcomes data in a meaningful way. Revenue intelligence bridges that gap by building everything in the data warehouse where you can combine subscription data, product usage data, marketing attribution data, and support interaction data into a single model.

_The Metric Structure That Explains Growth_

What you end up with is a metric tree or growth model that explains how revenue gets generated rather than just measuring how much revenue you got. You can see the conversion rates at each stage of the actual customer journey: from visitor to account to activated user to trial subscriber to paying customer to expanded account.

When revenue growth slows down, you can immediately identify which part of the pipeline is broken. Are you getting fewer new accounts? Is activation declining? Are paying customers churning faster? Each problem has different solutions, but you can only fix what you can see.
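That diagnosis can be as simple as computing stage-to-stage conversion rates over the metric tree and flagging the weakest link. The stage names and counts below are illustrative:

```python
def diagnose_pipeline(stage_counts):
    """Given ordered (stage, count) pairs, compute stage-to-stage
    conversion rates and return the weakest transition."""
    rates = []
    for (name_a, n_a), (name_b, n_b) in zip(stage_counts, stage_counts[1:]):
        rates.append((f"{name_a} -> {name_b}", n_b / n_a))
    # The transition with the lowest conversion rate is the
    # most likely break in the assembly line.
    return min(rates, key=lambda pair: pair[1])

stages = [("visitor", 5000), ("account", 1000),
          ("activated", 100), ("paying", 40)]
print(diagnose_pipeline(stages))  # ('account -> activated', 0.1)
```

In this made-up funnel, activation is the broken stage: only 10% of new accounts ever reach it, so fixing onboarding would move revenue more than buying additional traffic.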

This diagnostic capability is what revenue teams have been missing. They could see the symptoms (revenue growth slowing) but not the underlying causes.

_Beyond Traditional Analytics_

This approach solves the fundamental problem that always limited digital analytics: the disconnect between user behavior and business impact. Traditional analytics could tell you that users clicked buttons and visited pages, but couldn&apos;t tell you which of those actions actually mattered for the business.

Revenue intelligence flips this around. It starts with the business outcomes that matter—revenue, retention, expansion—and works backward to identify which user behaviors actually predict those outcomes. Instead of measuring everything and hoping some of it is important, you measure the things that demonstrably drive business results.

The difference is profound. Revenue teams finally get the forward-looking insights they need to manage growth proactively. And for the first time, **analytics becomes genuinely strategic** rather than just informational—because it directly enables better decisions about the activities that drive business success.

* * *

Wow, that&apos;s roughly 7,500 words you&apos;ve read so far - I had a lot of thoughts, and I hope they were useful in some way.

Digital analytics as we knew it is over. The foundation it was built on—marketing attribution and the promise of data-driven decision making—has crumbled. GA4&apos;s disaster accelerated the collapse, but the underlying problems run much deeper.

Two directions are emerging from the wreckage:

**Customer Experience Optimization** represents the operational future. Tools that enable immediate action rather than deep analysis. AI agents that suggest specific improvements rather than generate reports. Systems built for marketing teams who need speed, not sophistication.

**Revenue Intelligence** represents the strategic future. Analytics that finally connects user behavior to business outcomes. Data warehouse approaches that combine behavioral patterns with subscription data, marketing attribution, and business metrics to predict and prevent revenue problems before they happen.

Both solve different pieces of the original promise that digital analytics never delivered. Neither looks much like the analytics we&apos;ve known for the past 20 years.

The era of collecting data and hoping it&apos;s useful is ending. What&apos;s beginning is the era of systems that either enable immediate operational improvements or directly predict business outcomes. Everything else is just data theater.

Which brings us back to Amplitude where this post started.

This is my opinion (without having any insight into their actual strategy): clinging to the old marketing analytics past will not get them into the future. Yes, it can be an entry point for marketing teams that still cling to GA, a way to get them to let go and start with Amplitude.

Their real future is in the operational customer experience space. They have all the ingredients; now they need to cook. And with that in mind, I can&apos;t understand why they picked a person from the &quot;old&quot; analytics world to spread the word. They tried that with Adam Greco, and he has since moved on to a modern CDP approach.

But Amplitude has always played the game on different levels at the same time. So they might be doing that here as well. We will see.

Interestingly, everyone I worked closely with at Amplitude has now moved on. That might mean something, or nothing.

Revenue intelligence won&apos;t be an opportunity for them. They tried to push the warehouse integration, but I suspect most companies don&apos;t have the data setup to go that way. And to be honest, Amplitude isn&apos;t flexible enough as a tool to support this use case.

But that is my view and obviously very opinionated. What are your thoughts? Connect with me on [LinkedIn](https://www.linkedin.com/in/timo-dechau/) and write me your opinion as direct message.

Or join the new sanctuary Juliana and I built for all minds in analytics and growth who love to call out BS and really want to do stuff that works and makes an impact: [ALT+](https://goalt.plus). We have thoughtful discussions like this one, and we run monthly deep-dive cohorts to learn together about fundamental and new concepts in growth, strategy, and operations.

Head over to [https://goalt.plus](https://goalt.plus) and join the waitlist - we are opening up by the end of September 2025, and we have limited the initial membership to 50 in the first month.</content:encoded></item><item><title>Vibe Analytics — when everyone becomes an analyst</title><link>https://timo.space/blog/vibe-analytics-when-everyone-becomes-an-analyst-and-analysts-become-everything-else/</link><guid isPermaLink="true">https://timo.space/blog/vibe-analytics-when-everyone-becomes-an-analyst-and-analysts-become-everything-else/</guid><description>Vibe Analytics: When Everyone Becomes an Analyst (And Analysts Become Everything Else)</description><pubDate>Wed, 04 Jun 2025 00:00:00 GMT</pubDate><content:encoded>## **1\. Introduction: The Vibe Shift**

So, vibe coding is a thing. I won&apos;t spend this post arguing why vibe coding is a thing, because that&apos;s not really the purpose here. To be honest, the whole thing is in a weird space right now. On one hand, you see the super enthusiastic people posting about how they built a million-dollar app over a weekend. Definitely not true. But on the other hand, you see very experienced developers pointing out how this whole thing cannot really work. And then in the middle, you have developers saying, well, it&apos;s really good for prototyping, but not much more.

Look, I&apos;m not a professional developer (although I studied computer science), so I don&apos;t have a strong opinion on this matter. But I&apos;ve been testing these tools for over a year now, and the speed of progress is crazy. When I compare the apps I was building a year ago to what I&apos;m building now - it&apos;s wild. And if I project this forward another 12 months, we&apos;ll be having a completely different discussion.

So I&apos;m not going to dive into where we are right now, because it&apos;s not really relevant. What I want to explore is this: software development might be the first area where we&apos;re seeing new paradigms evolving. There are several reasons why it&apos;s happening there, but the interesting thing is - if vibe coding is becoming a thing, what about vibe analytics?

And I want to collect some early thoughts about this. Nothing here is solid, nothing is properly tested. It&apos;s just an early thought experiment. So let&apos;s look into vibe analytics and see where this might take us.

The thing is, if we&apos;re honest about it, analytics has always been in a weird spot. We&apos;ve built these complex data stacks, created elaborate dashboards that no one really wants to look at, and spent countless hours modeling data that only a handful of people can actually understand. And now, with LLMs reshaping how we work with code, it&apos;s natural to ask - what happens to analytics?

I&apos;ve been thinking about this for a while now. Not in a &quot;will AI replace analysts&quot; kind of way - that&apos;s boring and probably the wrong question. But more like: if the barrier to building software is dropping dramatically, what happens when the barrier to doing analytics drops too? What happens when anyone can just dump their data somewhere and get insights? And more importantly, what happens to us, the people who&apos;ve been doing this analytically-minded work for years? Where do we fit in this new world where everyone can theoretically &quot;do analytics&quot;?

And here&apos;s what I think might happen: analytics, as we know it, is becoming invisible middleware. It&apos;s becoming this background layer that just works, like plumbing or electricity. You don&apos;t think about it, it&apos;s just there. The whole data modeling, the ETL pipelines, the carefully crafted metrics - all of that becomes infrastructure that hums along in the background. It still needs platform engineers, but far fewer people are doing the heavy lifting.

![](/images/posts/vibe-analytics-when-everyone-becomes-an-analyst-and-analysts-become-everything-else/image.png)

But here&apos;s the interesting part - while analytics becomes invisible, analysts themselves are evolving into something completely different. We&apos;re not going to be the people building complex data models or insights anymore. Instead, we&apos;re becoming operators and strategists. We&apos;re moving into the actual business, into growth, into product, into marketing. We&apos;re bringing our detective mindset, our pattern recognition skills, our ability to ask the right questions - but we&apos;re applying them directly to business problems, not data problems. And we have some superpowers that we can bring to the business.

## **2\. The Evolution of Analytics Enhancement**

### **Phase 1: Chat with Your Data** (Why this isn&apos;t vibe analytics)

You can already see the first attempts at enhancing analytics with LLMs. Most of them revolve around this cringy concept of &quot;chat with your data&quot; or &quot;have a conversation with your data.&quot; I mean, yeah, it&apos;s the obvious one. We see ChatGPT working, we know we have all this data that no one can really use, and we&apos;re stuck building these ridiculous dashboards that no one wants to look at. Plus, let&apos;s be honest, most people don&apos;t actually understand what the dashboards mean unless a really good analyst explains them.

![](/images/posts/vibe-analytics-when-everyone-becomes-an-analyst-and-analysts-become-everything-else/image-1.png)

So the obvious step is to say, okay, let&apos;s make it conversational. Let&apos;s let people chat with their data. And look, I&apos;ve tested some of the early tools. It&apos;s definitely cool to see how well they can return some initial results. But this is just enhanced analytics, not vibe analytics. It&apos;s like putting a chatbot on top of your existing mess and calling it innovation.

The biggest problem is - and this has always been the case - that the core challenge in analytics isn&apos;t getting answers. It was always about asking the right questions. Good analytics teams spend most of their time figuring out what questions actually matter, what questions will lead to real business impact. Once you have the right question, finding the answer is usually pretty straightforward. Most analytics initiatives fail not because we couldn&apos;t find the data or build the dashboard, but because the questions we were answering had no real connection to business outcomes.

So this whole &quot;chat with your data&quot; thing doesn&apos;t solve the fundamental problem. Sure, someone from marketing might have a specific question that&apos;s been bugging them for months, and they can finally get an answer without bothering the data team. That&apos;s nice. But they&apos;re still asking the same limited questions, just faster. They&apos;re not suddenly asking better questions. And that&apos;s why this isn&apos;t vibe analytics - it&apos;s just the same old analytics with a conversational interface slapped on top.

## **3\. What Vibe Analytics Could Actually Look Like**

### **The Marketing Campaign Example**

Let me give you a concrete example. Say the marketing team runs this big initiative - they&apos;ve created an industry report, spent good money on a survey, compiled everything into a shiny PDF. It&apos;s a classic lead magnet play. People download the report, give their email, and hopefully, these are decision-makers who might buy your software. The marketing team runs campaigns around it - some Google Ads, LinkedIn thought leadership pieces, the whole thing. It&apos;s a big thing.

In the traditional approach, you&apos;d have data scattered everywhere. Web analytics data about landing page visits, source information in GA4, leads in your CRM with form responses, maybe some enrichment data, campaign performance in various ad platforms. If you have a really good analytics team, they&apos;ll sit down with marketing, understand the whole initiative, figure out how to connect these data sources in the warehouse, maybe extend some models, and eventually produce a nice dashboard. The premium version is when an analyst manually compiles a report with insights and recommendations. Everyone hopes for the best.

But here&apos;s what vibe analytics could look like. Someone from marketing sits down, creates a new chat - maybe in Claude or ChatGPT - and just dumps everything in there. The initial strategy docs, all the campaign assets, the Google Ads data, LinkedIn Ads data, GA4 exports, HubSpot data, even qualitative feedback from sales calls. Everything. Then they ask the obvious question: &quot;I want to do a post-mortem on this marketing campaign. What went well, what didn&apos;t work, where can we improve?&quot; No mention of making an analysis of the data.

![](/images/posts/vibe-analytics-when-everyone-becomes-an-analyst-and-analysts-become-everything-else/image-2.png)

The key difference isn&apos;t just that it&apos;s faster or easier. It&apos;s that you&apos;re combining context with data. The LLM understands what you were trying to achieve, not just what happened. It can see patterns across disconnected data sources that would take weeks to model properly. And most importantly, it can help you ask better follow-up questions. You&apos;re not limited to the pre-built dashboard metrics anymore. You can explore, dig deeper, ask &quot;why&quot; and &quot;what if&quot; questions that would normally require a new analytics project. That&apos;s when analytics stops being about delivering answers and starts being about discovering insights.

Does it have flaws? Of course it does. Can it go wrong? Of course it can. But that was also possible in the old model. The major shift here is that data becomes just one ingredient in the mix, not the focus topic as in classic initiative reporting.

### **From Analytics to Operations**

And here&apos;s where it gets really interesting. When you start doing this, you realize vibe analytics very quickly becomes vibe fine-tuning. It&apos;s not just about analyzing what happened anymore - it&apos;s about constantly adjusting and optimizing in real-time. The line between analysis and action basically disappears.

Let me paint you a picture. You run initiatives - could be marketing, growth, product, whatever. Every two weeks, you do a check-in. You pull all the data - and data here means everything. Analytics data, sure, but also qualitative feedback, customer interviews, support tickets, strategy documents, asset performance, even Slack conversations about the initiative. You dump it all in, run a full analysis, develop next steps, and implement them over the next two weeks. Then you repeat.

This isn&apos;t traditional analytics anymore. It&apos;s operations. You&apos;re not building dashboards for other people to maybe look at and hopefully make decisions. You&apos;re directly involved in the fine-tuning process. You&apos;re asking questions, getting answers, and immediately turning those into actions. The whole thing becomes this continuous loop of sense-making and adjustment.

![](/images/posts/vibe-analytics-when-everyone-becomes-an-analyst-and-analysts-become-everything-else/image-3.png)

What this means is that everyone starts becoming either an operations manager or a strategist. If you&apos;re in the weeds, constantly fine-tuning based on data and context, you&apos;re doing operations. If you&apos;re zooming out, looking at the bigger patterns, thinking about where to place the next big bet, you&apos;re doing strategy. The traditional analyst role - the person who sits between the data and the business, translating one to the other - that role just evaporates. We all become operators or strategists, using data as naturally as we use Slack or email.

Ok, I think you get the idea: I am hopelessly optimistic. But why not dream about what could be possible?

## **4\. The Product Manager Parallel**

This reminds me of something from my product days. The product manager role is actually kind of weird when you think about it. It basically exists because there was (and is) this massive disconnect between engineers and the business side. You needed someone to translate business needs into technical requirements, to decide what should actually be built, to bridge these two completely different worlds. The PM became this necessary translator, this bridge between two groups that couldn&apos;t really understand each other.

But here&apos;s the thing - I&apos;ve worked with some companies that don&apos;t have product managers at all. They just work from first principles. Designers design features, developers build features (and sometimes vice versa), sales people suggest features based on customer feedback. Product development happens through this complex, organic system where everyone contributes. It&apos;s messy, but it works because everyone understands both the business context and has the tools to actually build things.

Now imagine where vibe coding takes us. When everyone in a company can potentially build a feature, when the sales person can prototype their idea over a weekend, when the customer success team can fix that annoying bug customers keep complaining about - what happens to specialized roles? The PM role might just dissolve because the bridge isn&apos;t needed anymore. Everyone speaks both languages. And I think the same thing can happen to analysts. When everyone can analyze, when everyone can pull insights from data as easily as they write an email, the specialized analyst role stops making sense. We don&apos;t need translators anymore - we need people who can think analytically while doing actual business work. Guess who that will be?

## **5\. The New Role of Analysts**

### **Guardrails and Metrics Systems**

But here&apos;s the thing - when everyone can build features, run campaigns, and make data-driven decisions, you need incredibly solid guardrails. Think of it as an insurance system for the whole company. You need a rock-solid metrics system that tells you if the company is heading into danger territory. This isn&apos;t about vanity metrics or nice-to-have dashboards (or any granular analysis no one really cares about - who clicked on this navbar item). This is about fundamental indicators that scream when something&apos;s going wrong, when someone&apos;s experiment is tanking core metrics, when the business is drifting off course.

![](/images/posts/vibe-analytics-when-everyone-becomes-an-analyst-and-analysts-become-everything-else/image-4.png)

This kind of foundational metrics work becomes even more critical in a vibe world. It&apos;s not sexy work - it&apos;s deep, careful thinking about what actually matters for the business, how different metrics relate to each other, what the real leading indicators are. And it requires constant fine-tuning as the business evolves. Someone needs to maintain this insurance system, to make sure it&apos;s actually catching problems before they become disasters. This might be where some analysts end up - not building reports, but building and maintaining the fundamental measurement infrastructure that keeps the whole vibe operation from going off the rails. It&apos;s like being the person who designs the traffic signals and guardrails on a highway where everyone just got their driver&apos;s license.

### **The Detective Mindset Advantage**

What analysts bring to the table is this weird detective mindset. We&apos;re really good at finding strange patterns and then obsessing over them until we understand what&apos;s actually happening. We see a weird spike in the data and we can&apos;t let it go. We need to know why it happened, what caused it, whether it&apos;s real or just noise. This investigative instinct, this pattern recognition ability - that doesn&apos;t go away just because analytics becomes invisible middleware. If anything, it becomes more valuable when applied directly to business problems.

Think about it - when analysts move into growth or marketing or product, they approach problems differently. A traditional marketer might see a campaign performing well and scale it up. An analyst-minded person sees the same thing and immediately starts asking: but why is it working? Which specific segment is driving this? What happens if we isolate this variable? Is this performance sustainable or are we just capturing low-hanging fruit?

We bring this systematic, investigative approach to everything. And in a world where everyone can run experiments and launch features, having people who can spot patterns, investigate anomalies, and really understand causation becomes incredibly powerful. We&apos;re not analyzing data anymore - we&apos;re analyzing the business itself, in real-time, with all the context, and with the ability to immediately act on what we find.

### **Data as Invisible Middleware**

So what happens to the actual data infrastructure? I think we&apos;ll still need core people working on what I&apos;d call platform engineering for data (Robert Sahlin has a great post about it: [https://robertsahlin.substack.com/p/the-golden-path-revolution](https://robertsahlin.substack.com/p/the-golden-path-revolution)).

Someone needs to make sure data flows into the right places, that it&apos;s accessible, and that it&apos;s reasonably clean. But this is truly foundational work: here you lay the foundation that makes vibing possible in the first place. You&apos;re not building complex data models (for reporting purposes only) or intricate business logic anymore. You&apos;re just ensuring the foundational data is there and shaped to meet people&apos;s needs.

The whole idea of complex business models in the data layer might just disappear. Think about it - we built these elaborate models because different teams needed different views of the data, and SQL was too hard for most people to use directly. But if an LLM can understand your business context and work with raw data to answer your specific question, why do we need these pre-built models? They were always a compromise anyway - they worked great for specific use cases but were useless for others. We will still need an excellent model for the source layer. The data warehouse becomes less like a carefully curated museum and more like a well-organized storage room.

## **6\. Pipedream or What**

To be honest, I have no idea whether it will play out like this. But we will see change, massive change. And it is a good idea to think about what this change could look like, running through scenarios.

And most importantly, test and experiment.

At least, this is what I will do next. Run the kind of scenarios I described before - combining marketing, product, and data, and coming up with operational and strategic insights. I will keep you posted.</content:encoded></item><item><title>The Double Three-Layer Framework for Tracking Setups</title><link>https://timo.space/blog/the-double-three-layer-framework-for-tracking-setups/</link><guid isPermaLink="true">https://timo.space/blog/the-double-three-layer-framework-for-tracking-setups/</guid><description>_It all started on a drizzly Tuesday, somewhere between the flat-pack sofas and the cafeteria (vegan) meatballs._</description><pubDate>Tue, 27 May 2025 00:00:00 GMT</pubDate><content:encoded>_It all started on a drizzly Tuesday, somewhere between the flat-pack sofas and the cafeteria (vegan) meatballs._

You just landed a fresh analyst role at IKEA, and your first orders were simple on paper: **figure out how shoppers use the in-store screens.** Sounds clear, right? But here’s the thing—“use” lives on several floors of the same house. Swipe to scroll? Tap to check stock? Stand there day-dreaming in front of the rug-chooser? Each layer tells a different story.

**Step 1 — Spot the many flavors of “use.” - This will be essential later**

Ask yourself:

-   Are we logging _any_ interaction, or only the ones that change a setting?
-   Do minutes of idle screen time matter?
-   What are the user&apos;s jobs here?
-   How will this data help the people who actually run the store tomorrow morning?

Most likely, ninety-five out of a hundred analysts would go for the first layer—interactions—and never make it to the top floor where the strategic insights hide.

### **A tale of two tracking plans**

Fast-forward to a past project of mine. I was part of a team of excellent analysts doing marketing &amp; product analytics for a giant publishing site, swapping one analytics tool for another. They brought our team in as external consultants, and the whole project ticked all the boxes:

1.  **Stakeholders on tap.** We interviewed editors, marketers, and product managers to nail their real questions.
2.  **A balanced tracking plan.** Enough detail to be useful, light enough to maintain. Yes, we did really well with the use of properties.
3.  **Developer buy-in.** Middlewares, abstractions, tidy code and high data quality—music to an analyst’s ears.
4.  **Training bootcamps.** We showed every team where to find numbers they cared about.

By the final demo, I was sure we’d built the gold standard project. I pictured dashboards lighting up every Monday stand-up.

### **Reality check, three months later**

I walked back in expecting a bustling data bazaar. Instead, the analysts were still crunching the same old reports, and everyone else had quietly moved on.

SEO folks shrugged: “I can see button clicks and scroll depth in the analytics tool, but my world is Google rankings. How do those connect?” In plain English, the data never hopped the fence into their daily workflow.

That little phrase—_connected to my work_—stuck in my head the whole train ride home. It became the seed for what I now call the **Double Three-Layer Framework**.

## Why Click-Counting Alone Lets You Down

You spent two weeks wiring up every essential click, submit, and scroll inside your app (no auto-tracking here)—it felt downright good watching those charts spike. Then the CEO walked over and asked a simple question: **“So…are we making people happier or just busier?”** Your screen lit up with heat-map confetti, yet you had no clue how any of these interactions tied back to real value.

### Why interactions don&apos;t get you to something valuable:

-   **They ignore intent.** A thousand “Save” clicks might mean love…or that the auto-save keeps failing.
-   **They disguise frustration.** Rapid-fire button mashing often signals “this isn’t working,” not “I’m engaged.”
-   **They don’t map to value.** Your business cares about engagement, churn, revenue, and referrals, not whether people changed the font size.

![](/images/posts/the-double-three-layer-framework-for-tracking-setups/image.png)

Think of click logs like monitoring how often a fridge door opens. You’ll know activity is happening, but you still have no idea if anyone actually ate a meal, let alone enjoyed it. Until you pair those door-swings with a meaningful outcome—cooked dinner, fed the kids, cleared out leftovers—you’re guessing at satisfaction.

**One way to handle complexity** is to swap the question from _“What did they click?”_ to _“What did they accomplish?”_ That single pivot turns a pile of UI trivia into signals of progress, adoption, and (yep) revenue.

## **Meet the Double Three-Layer Framework**

_Picture me six months after that “clicks-are-noise” epiphany, hunched over a whiteboard the size of a small planet._ I needed a model that kept the **business** front-and-center yet still gave designers their precious UI details. After plenty of walks, three truths bubbled up:

1.  We’re really watching **three different movies** at once.
2.  Each movie follows **three repeating scenes**.
3.  If we mix them up, the plot makes zero sense.

That’s the backbone of the Double Three-Layer Framework.

#### **The three perspectives (the “rows”)**

| **Layer** | **What it answers** | **Who cares most** |
| --- | --- | --- |
| **Customer** | “Where is the user in their journey?” | Execs, Growth, Success teams |
| **Product** | “Which feature or object did they touch?” | Product managers, Data teams |
| **Interaction** | “Exactly what did they click, view, or scroll?” | UX designers, Researchers |

The higher you go, the closer you get to revenue. The lower you go, the closer you get to pixels.

#### **The three building blocks (the “columns”)**

1.  **Entities** – the nouns of your product: Account, Board, Asset. Keep it to 3-7 or your schema becomes spaghetti.
2.  **Activities** – past-tense verbs that mark progress: _Board Created_, _Board Shared_, _Subscription Canceled_.
3.  **Properties** – the spices: board\_id, asset\_type, account\_plan. They add flavour without forcing new events.

Put the grid together and you get a tidy matrix—nine little boxes that map every question from “Why did churn spike?” down to “Which toolbar icon hides in plain sight?” 

![](/images/posts/the-double-three-layer-framework-for-tracking-setups/image-1.png)

### **A Walk-Through with Miro as the Guinea Pig**

Let’s put the framework to work on a concrete example: Miro. I picked it because many of us have dragged a sticky note around in there, and its feature set is deep enough to test whether a tracking plan holds up. Grab a (virtual) sticky, and we’ll walk through the basics.

#### **Identify the core entities**

First job: name the big nouns—the pieces customers actually care about.

| **Entity** | **One-liner** |
| --- | --- |
| **Account** | A single login with billing info and workspace settings. |
| **Board** | The blank canvas where ideas live. |
| **Asset** | Any object you slap on a board: sticky, shape, image, connector. |

My favorite exercise here is asking: what is the heartbeat entity? The one entity that is the core of everything; when it is gone, the product is gone. Here the Board is the clear heartbeat entity.

That’s it. Three buckets keep our schema lean; anything else (projects, teams, widgets, templates) can wait until v2.

#### **List the key activities**

Now bolt verbs onto those nouns. Stick to past tense so events read like a timeline. One trick: these activities in most cases describe the lifecycle within an entity.

**Account**

-   _Account Created_
-   _Account Updated_ (rarely useful—skip if it’s just email tweaks)
-   _Account Deleted_

**Board**

-   _Board Created_
-   _Board Viewed_
-   _Board Shared_
-   _Board Exported_
-   _Board Deleted_

**Asset**

-   _Asset Created_
-   _Asset Updated_
-   _Asset Deleted_

Notice what’s missing? “Clicked Sticker Tool,” “Hovered Zoom,” “Shook Mouse.” Those live in the Interaction layer, not here.

#### **Attach properties at the entity level**

Here’s the crucial bit: **design properties for the entity, then reuse them across every event that involves that entity.** That way you avoid the chaos of one-off properties tacked onto individual events.

| **Entity** | **Reusable properties (examples)** |
| --- | --- |
| **Account** | account\_id, account\_plan\_type, account\_signup\_source |
| **Board** | board\_id, board\_template\_used, board\_created\_from (web, mobile) |
| **Asset** | board\_id, asset\_type (sticky, image), asset\_size |

Each event simply inherits the property set for its entity:

-   _Board Created_ carries all **Board** properties.
-   _Board Shared_ carries the same **Board** properties plus something share-specific, e.g., share\_method.
-   _Asset Updated_ reuses the **Asset** properties without inventing new ones. And since every asset lives on a board, it can also carry the **Board** properties.

Design once, reuse everywhere—that’s the secret to a schema your data team can keep in their heads.
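The inheritance rule above can be sketched in a few lines of Python (the property names and the `build_event` helper are made up for illustration, not part of any real tracking SDK):

```python
# Entity-level property catalog: design the properties once per entity.
ENTITY_PROPERTIES = {
    "account": ["account_id", "account_plan_type", "account_signup_source"],
    "board": ["board_id", "board_template_used", "board_created_from"],
    "asset": ["board_id", "asset_type", "asset_size"],
}

def build_event(name, entities, context, extra=None):
    """Assemble an event that inherits every property of the entities it touches."""
    event = {"event": name}
    for entity in entities:
        for prop in ENTITY_PROPERTIES[entity]:
            if prop in context:
                event[prop] = context[prop]
    event.update(extra or {})  # event-specific additions like share_method
    return event

# "Board Shared" inherits all Board properties plus one share-specific extra.
ctx = {"board_id": "b-42", "board_template_used": "kanban", "board_created_from": "web"}
print(build_event("Board Shared", ["board"], ctx, {"share_method": "link"}))
```

Each new event only declares which entities it touches; the property catalog stays the single source of truth.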

#### **Run the value litmus test**

Ask of every event: _Does this prove or unlock user value?_

-   **Board Shared?** Yes—signals collaboration, a core Miro promise.
-   **Asset Updated?** Maybe—handy for power-user research but not a north-star metric.
-   **Account Updated?** Usually no—unless a plan change hides inside.

If an event flunks the test, demote it to a generic element\_clicked in the interaction layer and move on. Your dashboards—and developers—will thank you.

### **From Product Events to Customer-Journey States**

_Quick intro scene._ Last quarter our marketing lead pinged me: “We fired off a ‘We miss you’ email blast—any chance you can prove it moved the needle?” My event tables knew every click, but I still couldn’t tell who was genuinely slipping away versus folks just on vacation. That’s when the life-cycle lens comes into play.

#### **Five core states**

Think of each account walking a simple path. Define the steps once, revisit them quarterly.

| **State** | **Plain-English test you can run in Amplitude** |
| --- | --- |
| **New** | _Account Created_ in the last 30 days. |
| **Activated** | Has the **New** flag _and_ either _Board Shared_ at least once **or** _Asset Created_ ≥ 20 times in 30 days. |
| **Active** | At least one high-value event (share, export, bulk create) in any rolling 30-day window. |
| **At Risk** | Was **Active** 31-60 days ago, but not in the last 30. |
| **Dormant** | No high-value events for 61+ days. |

Feel free to add a **Power** tier—same logic, just bump the volume thresholds.
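For the code-minded, the state definitions above translate into a small classifier. A minimal sketch in Python, assuming hypothetical field names such as `last_high_value_event` and using the thresholds from the table:

```python
from datetime import date, timedelta

def classify(account, today):
    """Return the life-cycle state of an account on a given day."""
    days_created = (today - account["created_at"]).days
    last_hv = account.get("last_high_value_event")  # last share/export/bulk create

    if days_created > 30:
        # Past the New window: state depends on recent high-value activity.
        if last_hv is None or (today - last_hv).days >= 61:
            return "Dormant"
        if (today - last_hv).days > 30:
            return "At Risk"  # active 31-60 days ago, quiet since
        return "Active"

    activated = account.get("boards_shared", 0) >= 1 or account.get("assets_created_30d", 0) >= 20
    return "Activated" if activated else "New"

today = date(2025, 5, 1)
print(classify({"created_at": today - timedelta(days=90),
                "last_high_value_event": today - timedelta(days=45)}, today))  # At Risk
```

The same logic can live as a cohort definition in Amplitude or as a scheduled query; the point is that the five states are cheap to compute once the high-value events exist.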

![](/images/posts/the-double-three-layer-framework-for-tracking-setups/image-2.png)

You can read more about user states in this post:

[Introducing user states in product analytics](https://hipster-data-show.ghost.io/introducing-user-states-in-product/) — The simplest model to measure your product performance

#### **Why bother mapping states?**

1.  **Sharper retention math.** Conversion isn’t a one-and-done funnel; it’s a loop. Tracking the share of accounts in each bucket tells you whether you’re gaining stickiness or springing leaks.

2.  **Targeted nudges.** “At Risk” cohorts feed re-engagement email lists automatically—no SQL gymnastics each time marketing needs names.

3.  **Feature bets with teeth.** If a new “bulk export” button bumps more people from Activated → Active, you’ve got proof beyond vanity clicks.

#### **A lightweight setup recipe**

**Step A — Build the cohorts.** In Amplitude, create a cohort per state using the event definitions above (remember: events already carry their entity-level properties).

**Step B — Track movement weekly.** A simple bar chart of cohort sizes over time shows whether experiments push people forward or let them drift back.

**Step C — Close the loop.** Sync _At Risk_ and _Dormant_ cohorts to your email tool; cue a helpful “Need a fresh template?” message—no begging the data team.

## Moving from interaction-focus to business-focus

Let me recap here. For years my tracking approach was based on interactions. And it&apos;s a natural start. You look at the application and think: how do users use it? Using means where they click, doesn&apos;t it? That&apos;s why so many people start with tracking interactions.

But using also means a use case, and a use case goes far beyond interactions. Sharing a board with a team is a use case (technically it can be a button click, but the meta level is what matters). And that is what you want to know about.

So, did this approach change how analytics data is adopted?

Yes, it did. It did not do any magic; most analytics is still not used as it should be. But I saw more business people immediately get the data and start diving into it. That is enough for me right now.

Give it a try - creating an event data design just needs a piece of paper. You don&apos;t have to ask anyone for permission.

If you&apos;d like to learn more about the Double Three-Layer Framework, I wrote a whole book about it. Check it out here:

![](/images/posts/the-double-three-layer-framework-for-tracking-setups/Check-out-the-Book.png)</content:encoded></item><item><title>The dark side of growth metrics</title><link>https://timo.space/blog/the-dark-side-of-growth-metrics/</link><guid isPermaLink="true">https://timo.space/blog/the-dark-side-of-growth-metrics/</guid><description>This will help you to understand why your CFO doesn’t like your growth rate.</description><pubDate>Wed, 30 Apr 2025 00:00:00 GMT</pubDate><content:encoded>This will help you to understand why your CFO doesn’t like your growth rate.

Metrics by themselves aren&apos;t easy. They have these different levels that make them surprisingly complicated. On the surface, metrics look straightforward - you&apos;ve probably seen those LinkedIn posts promising &quot;the ultimate list of 200 SaaS metrics&quot; where you just pick what works for you and build your KPI dashboard.

The tricky part comes when you dig deeper. Actually applying and using metrics creates real frustration because metrics need proper context and clear definitions. That&apos;s one challenge that trips up many teams.

But today I want to explore something different - an area often missed in public metric discussions. I call it &quot;the dark side of growth metrics.&quot;

Before explaining what I mean by this, I need to walk you through some background so everything fits into place. This hidden aspect of metrics might change how you think about measurement in your marketing and product work.

## The shiny side of growth metrics and models

When we look at the available content about growth analytics, we can find some excellent resources. For example, there&apos;s Abhi&apos;s metric tree that covers the growth side for a subscription-based business in amazing detail. It provides lots of inspiration for how to build a metric tree for subscription businesses.

![](/images/posts/the-dark-side-of-growth-metrics/image-9.png)
*Check out the extensive Miro Board for the full metric tree*

Then there&apos;s the growth model that Duolingo posted, which has become really popular - probably one of the most shared resources I pass along to people.

![](/images/posts/the-dark-side-of-growth-metrics/image-11.png)

And I&apos;ve written about some metric trees myself, like this e-commerce example:

![](/images/posts/the-dark-side-of-growth-metrics/image-12.png)

These models are extremely helpful for understanding how to grow your business. But they&apos;re missing something important. These metric trees have this relative who never gets invited to parties - because he&apos;s very rational, no fun, and can instantly kill the mood. This uninvited relative is the unit economics of everything happening on the growth side.

In my experience, looking at unit economics and the cost side of things is often completely ignored in analytics. We live in this nice fantasy where as long as we create revenue, everything will be fine. But unfortunately, there are plenty of cases where businesses grow revenue significantly, increase their accounts dramatically, and still go out of business because they don&apos;t make money in the end.

This is where costs and unit economics come in.

The promise of this post isn&apos;t to fully explain unit economics and their impact. I am working on a course for that (you can join the waitlist here: [https://lp.timodechau.com/strategic\_analytics/](https://lp.timodechau.com/strategic_analytics/) if you want to understand it more deeply).

This post is mainly to make you aware that there&apos;s a dark side to every growth metric we happily ignore because it might end the party immediately.

I want to walk through different examples to show what it would look like if we didn&apos;t just focus on the positive side of a metric tree, but also designed the &quot;shadow branch&quot; that tells us whether this whole thing actually makes sense financially.

## **Developing the shadow branch**

For our first example we pick something most of us have come across: e-commerce revenue. Let&apos;s have a look at our e-commerce tree snapshot:

![](/images/posts/the-dark-side-of-growth-metrics/image-13.png)

So far, so good. We want to grow revenue from new customers, since our business wants to scale dramatically and we can&apos;t do that just by increasing returning-customer revenue. We get more new-customer revenue by increasing the total orders of new customers or by increasing the avg. order value (in the best case, both).

Let&apos;s say our marketing team did a great job: they increased orders by 20% and even lifted the AOV by 5%. Huge party during the Monday meeting.

But we should look at the shady branches of this part of the tree.

The first extension is a quite well-known one: we add the Return on Ad Spend (ROAS). ROAS itself deserves a post of its own, but that&apos;s something for the future (or you can read Juliana&apos;s post about it: [https://julianajackson.substack.com/p/not-all-revenue-is-created-equal](https://julianajackson.substack.com/p/not-all-revenue-is-created-equal)).

With ROAS we at least get better feedback on whether we acquired users on reasonable economic terms (one problem of ROAS is that it only covers the time window of the initial acquisition, not the long-term customer value - but we don&apos;t want to get started on CLV).

![](/images/posts/the-dark-side-of-growth-metrics/image-14.png)

At least here we see whether our new orders can generate profit once we deduct the direct marketing acquisition costs. But this is just part of the picture. Most agencies celebrate good ROAS for their campaigns. It&apos;s time to stop the music.

This part is usually the party crasher, and people love to ignore it for as long as possible:

![](/images/posts/the-dark-side-of-growth-metrics/image-15.png)

Every product we sell is not pure revenue. We have to acquire or produce the product and ship it to our hard-won new customers. So first we deduct the Cost of Goods Sold (COGS) for each product to get to the gross profit (we can also calculate the gross margin to make it easier to keep in mind what is actually left for profit contribution). But COGS are just the beginning. When we are serious about our business, we will calculate the different levels of profit:

| **Subtract from revenue** | **Resulting margin** |
| --- | --- |
| COGS | Gross margin |
| COGS + variable ops (pick/pack, payment fees) | Contribution margin |
| COGS + variable ops + marketing | Marketing contribution margin |
| All operating expenses | Operating margin |
| All expenses incl. taxes | Net margin |

With all this in place we can calculate a net ROAS, which, I can tell you from experience, can ruin a bright agency call pretty quickly. Again, ROAS is a short-term view - projecting it onto CLV will paint a different picture, but that is also a more complex topic.
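To make the difference concrete, here is a minimal Python sketch comparing headline ROAS with a net ROAS computed on contribution profit. All numbers are illustrative assumptions, and the exact net ROAS definition varies by team:

```python
# Illustrative sketch: headline ROAS vs. a contribution-based net ROAS.
# All figures below are made-up example numbers, not from the post.

def roas(revenue: float, ad_spend: float) -> float:
    """Classic ROAS: campaign revenue divided by ad spend."""
    return revenue / ad_spend

def net_roas(revenue: float, ad_spend: float,
             cogs: float, variable_ops: float) -> float:
    """One possible net ROAS: contribution profit (revenue minus COGS
    and variable ops like pick/pack and payment fees) per ad dollar."""
    return (revenue - cogs - variable_ops) / ad_spend

revenue = 10_000.0      # new-customer revenue from the campaign
ad_spend = 2_500.0      # direct acquisition cost
cogs = 6_000.0          # cost of goods sold for those orders
variable_ops = 1_000.0  # pick/pack and payment fees

print(roas(revenue, ad_spend))                          # 4.0
print(net_roas(revenue, ad_spend, cogs, variable_ops))  # 1.2
```

A 4x headline ROAS shrinks to 1.2x once COGS and variable ops are deducted - the kind of number that ends the agency call early.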

So the shady branches of all growth metric trees are the ones that cover the cost impact, and they often represent what I mentioned before: the unit economics.

So let&apos;s do a quick refresher.

## What are unit economics?

Unit economics zooms in on the profitability of a single item you sell and asks a simple question: _does each unit put more money in your pocket than it takes out?_

In our product example, the product sells for €40 while its cost of goods sold (COGS)—the materials and manufacturing—runs €25. That €15 difference is your **gross profit** per unit. Expressed as a percentage, it&apos;s a 37.5% **gross margin**, meaning that for every euro of revenue the product brings in, roughly 38 cents stay with you after covering production.

Why does that matter? Because gross margin is the first—and usually the largest—layer of cushion that has to absorb everything else: fulfillment fees, marketing spend, overhead, and eventually your desired profit.

A positive, healthy gross margin tells you the product is fundamentally sound; a razor-thin or negative one signals you&apos;ll bleed cash no matter how many units you sell. By grounding the rest of your P&amp;amp;L in this single-unit perspective, you can forecast scale-up scenarios, set discount floors, and decide how much customer-acquisition cost (CAC) you can afford without eroding profitability.
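The arithmetic above can be sketched in a few lines of Python, using the €40 price and €25 COGS from the example:

```python
# Gross profit and gross margin for the example product above:
# €40 selling price, €25 cost of goods sold.

def gross_profit(price: float, cogs: float) -> float:
    """Per-unit gross profit: what is left after production costs."""
    return price - cogs

def gross_margin(price: float, cogs: float) -> float:
    """Gross profit as a share of revenue."""
    return gross_profit(price, cogs) / price

print(gross_profit(40.0, 25.0))  # 15.0 euros per unit
print(gross_margin(40.0, 25.0))  # 0.375, i.e. a 37.5% gross margin
```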

## How to work with unit economics in your growth analysis

In my work I add the layer of unit economics usually when I have created a growth model. For some models the unit economics are quite straightforward.

Let&apos;s take my book ([https://timodechau.com/the-analytics-implementation-workbook/](https://hipster-data-show.ghost.io/the-analytics-implementation-workbook/)) as an example. I have to pay ~8% to my payment provider, and sometimes people use discounts to buy the book. The calculation of my gross profit looks like this:

| **Layer** | **Formula (per unit)** | **No discount** | **20% discount** | **50% discount** |
| --- | --- | --- | --- | --- |
| Sticker price | P₀ | $40.00 | $40.00 | $40.00 |
| Discount | P₀ × discount% | – $0.00 | – $8.00 | – $20.00 |
| Net selling price (cash in) | P₁ = P₀ × (1 – discount%) | $40.00 | $32.00 | $20.00 |
| Payment-processor fee (8%) | P₁ × 0.08 | – $3.20 | – $2.56 | – $1.60 |
| Gross profit ($/unit) | P₁ – fee | $36.80 | $29.44 | $18.40 |

In my growth model, for all organic traffic sources I could now use these values as unit costs to calculate my gross profit.

For any paid marketing I would need to extend it with the actual marketing costs for each channel. So I could define a $10 contribution margin that I want to make with my book. This would leave me room to afford up to $26.80 in customer acquisition costs (assuming no discount is used - also a good reminder that discounts can hurt your economics).

So, why is all this so important?

If you took the $40 revenue and defined a contribution margin of $10, you would calculate with a $30 CAC. If I really capped at $30 CAC, every book sold would lose $3.20. That would be no fun.
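The mistake is easy to reproduce in code. A quick sketch, using the book numbers from above ($40 price, 8% payment fee, $10 target contribution margin):

```python
# Why the CAC cap must come from gross profit, not revenue.
# Numbers from the book example: $40 price, 8% payment fee,
# $10 target contribution margin per sale.

price = 40.0
payment_fee = price * 0.08           # $3.20 to the payment provider
gross_profit = price - payment_fee   # $36.80 actually left

target_contribution = 10.0

cac_cap_correct = gross_profit - target_contribution  # based on gross profit
cac_cap_naive = price - target_contribution           # based on raw revenue

loss_per_sale = cac_cap_naive - cac_cap_correct       # the fee you forgot

print(round(cac_cap_correct, 2))  # 26.8
print(round(cac_cap_naive, 2))    # 30.0
print(round(loss_per_sale, 2))    # 3.2 lost per book at the naive cap
```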

## But we have raised money and need to grow

Well congrats.

Even in a classic startup growth scenario, where you have raised money to fuel your growth, the unit economics are essential to know, because they give you control over how much money you are bleeding with every new customer. They give you control over the growth curve.

Let&apos;s highlight this with another example:

Meet **“AICanvas” (Gen-AI Image &amp;amp; Design Platform)**

**Core offer:** “Unlimited images” plan at **$20 / mo** for creators.

**Average usage**: 80 % of users create **100 images/mo**; 20 % create **2 000 images/mo**.

**Variable COGS**: GPU inference **$0.03/image** (compute + storage). Hidden cost: model-fine-tuning amortization ≈ **$5 / user / mo**. (the investment)

**Customer acquisition**: Social + creator bounties: **$8 CAC**.

**Retention / churn**: 6-month average life for light users, 18 months for heavy users ⇒ blended **annual churn ≈ 80 %**. (it&apos;s AI craziness - we need to keep this in mind)

**Unit-economics reality**:

-   Light users: revenue $20, cost $3 + $5 = $8 ⇒ **60 % margin**. (PARTY)
-   Heavy users: revenue $20, cost $60 + $5 = $65 ⇒ **–225 % margin**. (WELL)

Weighted gross margin: (0.8 × $12 + 0.2 × –$45) / $20 ≈ **+3%**.

**A blended gross margin of barely 3% means the growth engine runs on fumes: after CAC, churn, and fixed costs, every new subscriber effectively increases cash burn, and optimizing CAC doesn&apos;t help when the gross margin is this close to zero.**
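Blended margins are easy to get wrong in a spreadsheet, so here is a small sketch that recomputes the segment and blended gross margins from the stated AICanvas inputs:

```python
# Recomputing AICanvas gross margins from the stated inputs:
# $20/mo price, $0.03 per image GPU cost, ~$5/user/mo tuning amortization.

price = 20.0
gpu_cost_per_image = 0.03
tuning_cost_per_user = 5.0

segments = [
    # (share of users, images created per month)
    (0.8, 100),    # light users
    (0.2, 2_000),  # heavy users
]

margins = []
blended_profit = 0.0
for share, images in segments:
    cost = images * gpu_cost_per_image + tuning_cost_per_user
    profit = price - cost
    margins.append(round(profit / price, 2))
    blended_profit += share * profit

print(margins)                           # [0.6, -2.25]
print(round(blended_profit / price, 2))  # 0.03, i.e. ~3% blended
```

Per these inputs the blended margin is barely positive before CAC, churn, and fixed costs, and it is entirely hostage to the heavy-user share.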

**Red analytics flags:**

-   Usage distribution is a **power law**: 20 % “power users” drive 80 % of GPU cost. 
-   Model-tuning spend scales with user count, not usage, so even light users are under-priced. 
-   Cash burn per incremental user is **increasing** despite headline “hyper-growth.”

The cash burn can be acceptable because it was priced into the last investment round, but the power-user problem should be addressed in future product or pricing strategy.

Three analyses would surface the problem:

1.  **Usage-based margin histogram**: reveals heavy-user losses instantly.
2.  **Scenario model**: raise price, cap images, or introduce token-based overage fees and re-run the margin curve.
3.  **Churn-segmented LTV**: shows that even light users are unprofitable after model-tuning cost, prompting a free-tier/trial + pay-as-you-go design instead of “unlimited.”

## Do we need to build more complex growth models and metric trees now?

No. But you should add line items in your growth models for each marketing channel to calculate the marketing contribution margin, and evaluate marketing initiatives based on that instead of revenue.

And keep your unit economics up to date. This is especially true for SaaS models, where it feels like there are zero marginal costs - but that&apos;s not true.

And for the metric trees - the more I work with them, the more I see them as snapshot visualizations. So I use them as I did here: to showcase a scenario, like how to calculate a net ROAS instead of the usual ROAS.

Again - if you would like to learn more about unit economics, join the waitlist for our course, where we will spend plenty of time on this topic:

[https://lp.timodechau.com/strategic\_analytics/](https://lp.timodechau.com/strategic_analytics/)</content:encoded></item><item><title>Product Analytics in a feature factory</title><link>https://timo.space/blog/product-analytics-in-a-feature-factory/</link><guid isPermaLink="true">https://timo.space/blog/product-analytics-in-a-feature-factory/</guid><description>There were some heated discussions in the product world some months ago when two product realities clashed significantly.</description><pubDate>Fri, 11 Apr 2025 00:00:00 GMT</pubDate><content:encoded>There were some heated discussions in the product world some months ago when two product realities clashed significantly.

On the one hand, you have the ideal picture of product management: a bunch of really smart people who take a lot of care in product discovery and research (Lenny&apos;s podcast with Marty was a good example covering this: [https://www.lennysnewsletter.com/p/product-management-theater-marty](https://www.lennysnewsletter.com/p/product-management-theater-marty)).

They excel at prioritizing different initiatives based on their findings. Then, they test features with a small set of users, supported by a well-defined measuring system. Once released, they measure performance, collect feedback, and either iterate on features or develop adjacent ones. It&apos;s this ideal product-building machine that creates minimal waste while making significant progress.

But on the other hand, you have the feature factories. These are the dirty little secrets of product management. In theory, you think, &quot;Oh, I work in product, so that&apos;s great—I can shape things.&quot; But the truth is that there are often other forces in your company telling you what to build. I even know product teams without anyone directing them that are still feature factories, because the head of product doesn&apos;t really care, or the whole team doesn&apos;t care. They&apos;re just randomly building stuff. Or, as Marty calls it, &quot;Product Management Theater&quot;.

![](/images/posts/product-analytics-in-a-feature-factory/image-5.png)

The heartbeat of a product team is a feature release. A product team starts getting questioned when they can&apos;t produce features anymore. &quot;Hey, why don&apos;t you just build a new feature?&quot;

* * *

**Quick service announcement:**

I have finished my Analytics Implementation Workbook, and I am launching a Reviewer Program. So, if you would like to get a free copy of the book and review it, click the link below. Seats are limited each month - first come, first served.

[Fill out the form here.](https://tally.so/r/mV462E)

![](/images/posts/product-analytics-in-a-feature-factory/image-8.png)

* * *

So there was this heated discussion between the purists or idealists on one side and the pragmatists on the other. Where the idealists say, &quot;Hey, if you&apos;re running a feature factory, you&apos;re not doing product management.&quot;

While I definitely have some sentiment for that view (and would count myself in this idealistic camp), you cannot bend reality. If you end up in a position where you are a feature factory, you just build the stuff and do your job.

What I want to explore today is what this actually means for product analytics. You could make the case that when you&apos;re just a feature factory and your only driving force is to create features, it doesn&apos;t matter if you measure them. I mean, you just build features. So far, no one has really asked what kind of features you should prioritize based on anything other than people telling you how to prioritize them. So why should we measure them? Why does it even matter?

Truth be told, there&apos;s always this weird situation I&apos;ve seen plenty of times: you have a feature factory setup, but you still have to do product reporting because people expect it. Even when, in the end, the reports are like the feature factory itself—no one actually cares about them. You just have to produce the report, and everyone is happy when it lands in their inbox. Obviously, don&apos;t expect that anyone would actually check the report.

It&apos;s a strange space. So, I want to take some time to look into the different options you might have when you&apos;re in this scenario.

## Why do Product analytics at all?

Even when it sounds a little bit ridiculous, it&apos;s a fair question that goes much deeper into the value product analytics can bring.

The idealistic approach would suggest that measurement is an essential part of a good product development setup. However, in a lot of setups that would call themselves good product management, the implementation and utilization of product analytics are surprisingly poor. These setups often consist of either very high-level product metrics or granular interaction measures that don&apos;t provide useful insights for product features.

Product analytics is not easy to implement. It&apos;s challenging to find the right level of detail that provides meaningful insights to improve product experience. Because of that, especially for a &quot;feature factory,&quot; there might be a reasonable argument to leave out product analytics altogether.

![](/images/posts/product-analytics-in-a-feature-factory/image-6.png)

Instead of creating vanity reports that will never have any impact, it might be more honest to simply acknowledge that you don&apos;t care about data. You could potentially replace quantitative data with qualitative methods like interviews, tests, and demos, avoiding quantitative measurements entirely.

There is no imperative that you must run your product with quantitative measurements. It&apos;s worth critically reviewing whether you truly need quantitative data for your specific product context.

## Doing Product Analytics for yourself

But let&apos;s say you are highly motivated and find yourself as an analyst in a feature factory. You got hired, and the craziest thing is that people are often brought into the role of a product analyst without a clear idea of the benefits they can provide.

In the classic idealistic product development flow, product analytics is a driving force in the build-measure-learn cycle. However, in many feature factories, the process is reduced to simply &quot;measuring&quot; without truly learning. As a product analyst in such an environment, you might feel frustrated, but there are still opportunities to approach your work idealistically.

Even when features are built through management decisions without user-centric input, you can still measure them in a meaningful way. You can track feature adoption, measure the impact on overall product performance, and develop a comprehensive understanding of product mechanics. This includes identifying core and supporting use cases and expressing them through relevant metrics.

This approach might seem like you&apos;re creating a personal utopia, but it&apos;s more about muscle training. By building an idealistic system, you develop a deep understanding of the product that might not yet exist in your organization. When the right moment comes—perhaps when new people join and start asking different questions—you&apos;ll be prepared with insights.

In my experience, working in feature factories can be unfulfilling, but occasionally, things change. Someone new might arrive and start asking probing questions. Without your prepared insights, these inquiries might quickly fade away. However, with your comprehensive analysis, you can provide valuable information and potentially find allies who become interested in understanding product performance.

For instance, you might discover that out of 10 features rolled out last year, only two gained significant traction. By investigating why these two features resonated more with users, you can identify small but meaningful improvements. You might find that a particular rollout strategy or presentation method worked better and can suggest incremental changes to the product team.

![](/images/posts/product-analytics-in-a-feature-factory/image-7.png)

While this approach won&apos;t create a revolution in feature factory environments—where systemic issues run deep—it offers opportunities for gradual, meaningful improvement. By creating small, consistent &quot;dripping drops&quot; of insight, you can slowly shape how products are built and understood.

## Better ingredients for Product Analytics in a feature factory

So, what are the ingredients if you really want to do a good job in a feature factory?

The first ingredient is getting the right kind of data. You need to assess your current position and available resources. In some environments, you might already have some data and the possibility to obtain missing information. For example, you could introduce measurement requirements early on in JIRA tickets for new features. You can create a very lean approach for a tracking plan using this awesome book:

[The Analytics Implementation Workbook](https://hipster-data-show.ghost.io/the-analytics-implementation-workbook/) — the book that tells you everything you need to know about analytics implementations.

If you face resource constraints, there are several strategies to collect data. One approach is to ask developers to implement explicit auto-tracking (something better than what the tools offer), such as a generic event tracker for all button and link clicks. You capture detailed context in properties like button text, position, and destination. The data can be dumped into a data warehouse, where you can later make sense of it and transform it into meaningful events.
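As a sketch of that warehouse step, here is how generic click rows could be mapped into meaningful events. The raw schema, rules, and event names are hypothetical examples, not a specific tool&apos;s format:

```python
# Sketch: turning generic auto-tracked click rows into semantic events.
# The raw schema (button_text, destination, page) and the event names
# are hypothetical; adapt them to whatever your tracker captures.

RAW_CLICKS = [
    {"button_text": "Start free trial", "destination": "/signup", "page": "/pricing"},
    {"button_text": "Read more", "destination": "/blog/post-1", "page": "/"},
]

# Mapping rules the analyst maintains in the warehouse transformation,
# without touching the tracking code.
RULES = [
    (lambda c: c["destination"] == "/signup", "trial_signup_clicked"),
    (lambda c: c["destination"].startswith("/blog"), "content_opened"),
]

def to_semantic_event(click: dict) -> dict:
    for matches, event_name in RULES:
        if matches(click):
            return {"event": event_name, **click}
    return {"event": "unmapped_click", **click}  # keep the long tail visible

events = [to_semantic_event(c) for c in RAW_CLICKS]
print([e["event"] for e in events])  # ['trial_signup_clicked', 'content_opened']
```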

Another method is to request read access to the application&apos;s database or create a read replica. By extracting information directly from the database, you can build a complete product analytics setup without using an SDK. This approach provides more flexibility in analyzing data. I describe this approach in more detail in a video on the topic.



The second ingredient is knowing what to do when no one is listening. First, find allies within the company who are also frustrated with the feature factory model. Group together and share insights that might interest them. You can also create a public channel for your insights, such as a newsletter highlighting feature adoption and performance.

Be strategic about how you share information. Some insights might be best shared directly with the product team, while others can be broadcast more widely. The goal is to create awareness gradually. Over time, people may become more interested in your analysis and seek your input.

The key is to do the best job possible with the resources available. Even in a feature factory, you can support the team and gradually build credibility. While the situation might not be ideal, you can still make meaningful contributions and potentially influence the organization&apos;s approach to product development and analytics.

What are your experiences working as a product analyst in a feature factory? Feel free to drop me a DM on LinkedIn: [https://www.linkedin.com/in/timo-dechau/](https://www.linkedin.com/in/timo-dechau/)</content:encoded></item><item><title>How to refactor your tracking design?</title><link>https://timo.space/blog/how-to-refactor-your-tracking-design/</link><guid isPermaLink="true">https://timo.space/blog/how-to-refactor-your-tracking-design/</guid><description>Imagine the not-so-unlikely situation of starting a new job as an analyst, where behavioral data analysis (aka marketing &amp; product analytics) is essential.</description><pubDate>Wed, 02 Apr 2025 00:00:00 GMT</pubDate><content:encoded>Imagine the not-so-unlikely situation of starting a new job as an analyst, where behavioral data analysis (aka marketing &amp;amp; product analytics) is essential. This involves tracking activities on the marketing website and within the product, and numerous questions quickly arise. People want to understand aspects like the onboarding process and the effectiveness of recently released features.

As a new employee, you&apos;re highly motivated. However, when you begin to investigate the data, you discover it&apos;s not in good shape. The documentation is outdated, and when reviewing the analytics data, only one or two events make sense. Many events appear cryptic or duplicated.

Given your experience, creating a new structure is not a big thing. You develop a two-week plan that includes:

-   Conducting interviews
-   Performing event storming sessions with marketing, product, and sales teams
-   Mapping customer and user journeys
-   Creating an event data design to address emerging questions

This approach is inspired by a lovely [book on event data design](https://hipster-data-show.ghost.io/the-analytics-implementation-workbook/). Everything seems promising until you discuss the plan with the CTO during lunch.

When you explain that you&apos;ve created a new design that just needs implementation, the CTO is surprised. With sprints already planned for the next four to five months and limited resources for tracking implementation, your ambitious plan faces significant challenges (&quot;We did a whole sprint on tracking implementation some months ago - that&apos;s all we can do&quot;).

The reality is that discussing better tracking design is easy. Talking and sketching ideas on a whiteboard costs nothing. While it&apos;s valuable to understand the ideal state, you must also confront practical limitations.

You&apos;re facing two big problems. First, how do you get the resources to implement anything? Second, what do you do with all the existing events while you transition? Let&apos;s focus on how to actually refactor your tracking setup with these constraints in mind.

![](/images/posts/how-to-refactor-your-tracking-design/image.png)

Let me show you four different strategies for tracking event refactoring.

## Strategy 1: The Greenfield

The Greenfield strategy is quite a rare approach, typically occurring in specific circumstances. It&apos;s most common when working with young organizations or startups that already have a temporary setup in place.

In this scenario, the organization is aware and comfortable with the understanding that their current system will be replaced. This allows for a complete rebuild from scratch, which involves several radical steps.

![](/images/posts/how-to-refactor-your-tracking-design/image-1.png)

First, you&apos;ll need to throw away existing systems entirely. This means removing all tracking code, eliminating integrations, and shutting down the current analytics tools you&apos;re using.

Next, you&apos;ll shut down or remove all previously collected data. This can be difficult psychologically for organizations that have been collecting data for any length of time, but it&apos;s necessary for a true fresh start.

Finally, you&apos;ll start entirely anew with proper planning, documentation, and implementation. This gives you the opportunity to avoid all the mistakes and technical debt that accumulated in the previous system.

The Greenfield strategy often emerges from a place of desperation, usually after multiple unsuccessful attempts to make the existing system work. It becomes viable in several specific situations.

When the previous setup lacks real value, there&apos;s little reason to maintain it. If nobody is using the dashboards or the insights aren&apos;t driving decisions, the cost of starting over is minimal.

If the data is questionable or unusable due to implementation errors or poor design, continuing to build on this foundation only extends the problems.

The beauty of Greenfield is the freedom it gives you. No legacy constraints means you can implement current best practices from day one. You can design your event taxonomy properly, set up clear naming conventions, and build documentation that actually makes sense. But it is like a unicorn.

## Strategy 2: New features first

When facing reality, we cannot simply build everything from scratch. Most of the time, there is already something in place that cannot be completely discarded. Core reports use existing data with a few events, and people have become accustomed to this information over time.

You cannot abruptly remove everything and tell people that their previous metrics were wrong. Such a confrontational approach is unproductive and not worth pursuing.

![](/images/posts/how-to-refactor-your-tracking-design/image-2.png)

You might call this the &quot;golden handcuffs&quot; of analytics - you&apos;re locked into a system that&apos;s deeply flawed, but it&apos;s also deeply embedded in your organization&apos;s decision-making processes. Breaking free requires finesse, not force.

When you can&apos;t replace existing systems, there are two core strategies. The first strategy is to create a good new example of how things could look by doing it for new things.

When the product team is developing new features, you have an opportunity to implement an improved event data structure. By doing this, you set an example of how good event data design can look.

But most people won&apos;t immediately recognize the benefits of a better naming convention or clever property usage. The real value of your new event data design will become visible when you create feature reports and analyses. Get involved early in the process, design events, and create feature dashboards before launch. Do an excellent job from event data design through feature reporting, analysis, and recommendations.

If you execute this approach well, people will be impressed by the new analytics approach. You&apos;ll build trust, and when stakeholders become uncertain about existing reporting, they&apos;ll be more open to your proposed changes. The benefit of this method is that you can add new features without adapting existing analytics tools, unless you want to reuse or slightly modify old events.

## Strategy 3: Core events renovation

Focusing on core events is another strategic starting point. If you enter a new setup and notice that existing metrics are not trusted, this presents an opportunity to re-implement key events with best practices.

Consider selecting four to five core events and applying rigorous implementation techniques. For example, you might move these events to the server side to immediately improve data quality and eliminate potential browser-related data drops (this is also a strong argument to get resources for this).

![](/images/posts/how-to-refactor-your-tracking-design/image-3.png)

The implementation is straightforward, but managing event name changes can be tricky. Different analytics tools handle event naming and changes differently.

The old Google Analytics had significant limitations with event renaming. You couldn&apos;t just change names without losing historical data, forcing many teams to maintain outdated naming conventions far longer than they should have (some analytics platforms still have this issue).

GA4 now allows [event renaming](https://support.google.com/analytics/answer/10085872?hl=en), which is a welcome improvement. This gives you more flexibility to clean up your taxonomy while preserving your historical data.

Platforms like Amplitude, Posthog and Mixpanel offer more flexible options for handling event transitions. You can rename the display of existing events without changing the underlying data. You can also merge old and new event names to maintain reporting continuity, which is extremely valuable during a transition.

When changing event names, you should first check your analytics tool&apos;s capabilities to understand what&apos;s possible. Then verify if event name changes or merging are supported in your specific implementation. Finally, ensure that your approach maintains long-term reporting consistency so stakeholders don&apos;t lose access to historical trends.
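A tool-agnostic way to think about merging is a canonical-name map applied at analysis time. A minimal sketch, with entirely hypothetical event names:

```python
# Sketch: merging legacy event names into a canonical name so reporting
# stays continuous across a rename. All names are hypothetical.

RENAMES = {
    "btnClick_signup": "signup_completed",  # old client-side name
    "Sign Up Done": "signup_completed",     # a duplicated legacy variant
}

def canonical_name(event_name: str) -> str:
    """Map legacy names onto the new taxonomy; pass new names through."""
    return RENAMES.get(event_name, event_name)

history = ["btnClick_signup", "Sign Up Done", "signup_completed"]
print([canonical_name(e) for e in history])
# all three rows now report as 'signup_completed'
```

Tools like Amplitude or Mixpanel apply this kind of mapping for you via their rename/merge features; in a warehouse setup you own the map yourself.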

This approach allows you to gradually improve your analytics implementation while maintaining historical data integrity. It&apos;s like renovating a house room by room while still living in it - more challenging than building from scratch, but often the only practical option.

This can also be combined with strategy two.

## Strategy 4: The Data Warehouse

When it comes to event data design (or model), I prefer an approach that involves managing most of the event data processing in the data warehouse. With this method, we track only about 5% of events in the front end, sourcing the majority of events from the application database and external services like Stripe.

In this setup, you prepare your entire event data model in the data warehouse and then make it available to analytics platforms such as Amplitude, Mixpanel, or PostHog. Alternatively, you can use tools like Mitzu to visualize event data directly from the warehouse without copying data to another system.
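To illustrate the idea, here is a tiny sketch that derives behavioral events from application database rows instead of front-end tracking. The table and column names are hypothetical:

```python
# Sketch: deriving events from an application table (e.g. a read
# replica of an `orders` table) instead of an SDK. Schema is hypothetical.

ORDERS = [
    {"order_id": 1, "user_id": "u1", "status": "completed",
     "total": 59.0, "completed_at": "2025-04-01T10:00:00Z"},
    {"order_id": 2, "user_id": "u2", "status": "cancelled",
     "total": 19.0, "completed_at": None},
]

def orders_to_events(rows):
    """Emit one order_completed event per completed order."""
    for row in rows:
        if row["status"] == "completed":
            yield {
                "event": "order_completed",
                "user_id": row["user_id"],
                "timestamp": row["completed_at"],
                "properties": {"order_id": row["order_id"],
                               "revenue": row["total"]},
            }

events = list(orders_to_events(ORDERS))
print(len(events), events[0]["event"])  # 1 order_completed
```

In practice this transformation would live in your warehouse (e.g. as a dbt model) and the resulting event table would be synced to the analytics tool.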

![](/images/posts/how-to-refactor-your-tracking-design/image-4.png)

The first major benefit is full control over event definition and naming. You can standardize naming conventions across your entire organization, regardless of where the event originated.

You also gain complete flexibility to rename or remodel events. Changed your mind about your taxonomy? No problem - just update your data model in the warehouse.

Perhaps most powerfully, you can define how long changes are applied retroactively. Want to fix a calculation error that affected the last six months of data? You can do that without losing historical information.

When syncing data back to analytics platforms, it&apos;s crucial to understand their sync modes. Mixpanel, for example, supports applying edits and deletions to historical data through its Mirror feature. Amplitude currently supports this only for specific data warehouses. PostHog directly queries your data in your warehouse, so any change will be applied immediately.

This approach is most suitable when you already have a data warehouse and your data team is already pulling information for BI reporting. It&apos;s perfect if you want to consolidate data and unlock better data quality by applying consistent transformation rules. It really shines when you aim to combine events from different platforms into a unified view of customer behavior.

I&apos;ve seen this work particularly well for companies with complex products where user actions span across multiple platforms or services. For instance, a SaaS company might want to analyze how users move between their web app, mobile app, and integration with third-party tools.

While not definitively the future of product analytics, this method is increasingly gaining traction due to its flexibility and enhanced analytical capabilities. The days of siloed event data are numbered, and a warehouse-first approach puts you ahead of the curve.

Let me know if these strategies would work for you when you refactor your event data setup.</content:encoded></item><item><title>European Analytics — a regional perspective</title><link>https://timo.space/blog/european-analytics/</link><guid isPermaLink="true">https://timo.space/blog/european-analytics/</guid><description>No worries. This will not become any kind of polarized post. But if you want to go and pick your analytics and any kind of data software providers not from the US and you really want to go for…</description><pubDate>Tue, 25 Mar 2025 00:00:00 GMT</pubDate><content:encoded>No worries. This will not become any kind of polarized post. But if you want to pick your analytics and data software providers not from the US and you really want to go for European options, here is a collection of options, but also a good reminder of what would be missing.

Obviously, if you&apos;ve followed all the Google Analytics alternative discussions, where your data is saved definitely has an impact. Data storage is already something you need to take into account today. Now we can add another dimension: the somewhat uncertain future of the relationship between the US and Europe. I don&apos;t know if you can add tariffs on software-as-a-service products, but maybe you can 🤔 😁.

This list is just off the top of my head, so if you know other tools that would fit in here, please let me know. I&apos;m setting up a directory of European data and analytics solutions and will add them (link at the bottom of the post).

Now, what qualifies as a European analytics and data solution? This already makes it a little harder, because there will definitely be edge cases. The first decision I made is to include UK-based companies. Yes, the UK left the EU, but for me, they&apos;re still part of Europe, and in our hearts they still kind of belong to the EU, even when technically they don&apos;t anymore. But if you&apos;re optimizing for data storage within the EU, you have to take this into account: if the data center is in London, it&apos;s not in the EU anymore.

[Support my work and subscribe to this newsletter](#/portal/signup/free)

Let&apos;s start the list.

## Classic Digital Analytics

When we talk about digital analytics, we often refer to tools like **Google Analytics 4**. This category encompasses a wide range of analytics solutions, from simple tracking tools to more advanced platforms. However, classic digital analytics solutions are generally **not specialized for highly professional use cases**—instead, they work well for most websites and applications to gather behavioral data.

Below, I&apos;ll walk you through some of the European alternatives in this space.

### Piwik PRO (PL)

![](/images/posts/european-analytics/image-6.png)

The most obvious entry in this category is Piwik PRO. It started as a fork of the original Piwik project, which later evolved into Matomo—but I&apos;ll get to that shortly.

Unlike Matomo, Piwik PRO is not open source and never was. The company is based in Poland, with additional offices across Europe and beyond. If you choose their European hosting, your data remains within the EU, making it a strong alternative to Google Analytics for businesses concerned with data privacy.

For those with experience in Google Analytics Universal, Piwik PRO feels like a natural fit. It shares many similarities with GA while adding enhanced privacy controls. You get more granular configuration options for a privacy-focused analytics setup, along with a built-in tag manager—similar to Google Tag Manager, but seamlessly integrated within the platform.

A major advantage of Piwik PRO is its consent management feature, which works smoothly with its tag manager and analytics tools. This makes it significantly easier to manage consent settings compared to the often cumbersome setup required with Google Tag Manager.

If you&apos;re looking for a no-brainer alternative to Google Analytics 4, Piwik PRO is an excellent choice.

### Plausible Analytics (EE)

![](/images/posts/european-analytics/image-7.png)

Plausible Analytics is another strong contender, though it has fewer features than Piwik PRO. It&apos;s a great choice for those who only need core metrics to understand their website&apos;s performance.

Unlike Piwik PRO, Plausible does not include a tag manager or consent management, so you&apos;ll need to handle those separately. However, it offers a unique advantage: you can configure it to avoid tracking individual users altogether.

For example, if metrics like returning users aren&apos;t critical to you, Plausible lets you run an analytics setup that collects anonymous, privacy-friendly data.

Developed by two indie developers in Europe, Plausible is a lean and well-designed solution. It&apos;s also affordable, making it a great choice if you want a privacy-friendly alternative without unnecessary complexity.

### Simple Analytics (NL)

![](/images/posts/european-analytics/image-8.png)

Simple Analytics falls into a similar category as Plausible. It doesn&apos;t have the extensive features of Piwik PRO or Google Analytics, but for many use cases, it&apos;s 100% sufficient.

Before implementing an analytics solution, it&apos;s always worth asking: What features do I actually need? If your requirements are straightforward, Simple Analytics can be a perfect fit.

This tool is developed by an indie creator in the Netherlands, making it another European-based solution. If you value supporting small, independent teams, this is a great product to consider.

### Matomo (NZ)

![](/images/posts/european-analytics/image-9.png)

Matomo originated as Piwik before evolving into its current form. Today, it&apos;s mostly known through Matomo Cloud, the managed service operated by InnoCraft, a company based in New Zealand.

However, Matomo&apos;s founder, Matthieu, started the project in France, and the tool retains strong European roots.

What sets Matomo apart?

-   It&apos;s still open source – you can self-host it on your own servers if needed.
-   It&apos;s the go-to choice for privacy-conscious organizations – if you need an analytics tool that minimizes privacy discussions and concerns, Matomo is a safe bet. It can be configured to collect almost no personal data, making it one of the most privacy-friendly solutions available.

The old IBM quote adapts nicely here:

_&quot;No one gets fired for choosing Matomo when privacy concerns are on the table.&quot;_

For organizations prioritizing compliance and data protection, Matomo is a solid addition to this list.

Classic digital analytics tools provide behavioral insights for websites and applications, but not all solutions are created equal.

-   If you need a GA4 alternative with a built-in tag manager and consent management, go for Piwik PRO.
-   If you want simpler analytics with a strong privacy focus, check out Plausible or Simple Analytics.
-   If privacy compliance is your biggest concern, Matomo is your best bet.

Ultimately, the best tool depends on your specific needs—so before choosing a solution, ask yourself: How much complexity do I really need?

## Product Analytics

While classic digital analytics often includes marketing analytics, product analytics is a different category of tools altogether.

Product analytics requires a specific event schema, which tools like Piwik PRO and Matomo don&apos;t provide. It also needs specialized reports that go deep into user behavior, such as funnel analytics, cohort analytics, and advanced segmentation. These are the hallmarks of product analytics, and you typically find them in tools like Amplitude or Mixpanel. But what are the European alternatives to these US-based solutions?

### PostHog (~UK)

![](/images/posts/european-analytics/image-10.png)

PostHog is a bit of a tricky case. Technically, it&apos;s a US-based company—likely due to funding reasons—but it originally started in the UK. The founding team was based there, and PostHog still has a strong developer presence in Europe. So, while it&apos;s not a purely European solution, it at least comes with an asterisk.

PostHog has a strong product analytics offering and has expanded beyond that. It now also includes classic web analytics features and data warehouse integrations, allowing you to bring event data from your warehouse into PostHog for analysis.

PostHog still has an open-source core, so you can check out the software yourself. However, most users rely on PostHog&apos;s managed service, which can be hosted on European servers.

If you&apos;re looking for the closest alternative to Mixpanel or Amplitude, PostHog is the way to go.

### Mitzu (HU)

![](/images/posts/european-analytics/image-11.png)

Mitzu is similar to PostHog in some ways but also quite different.

Mitzu is a Hungarian software company and a relatively young player in the space. I&apos;ve used it in several projects, and what makes it stand out—especially compared to Amplitude and Mixpanel—is that Mitzu does not have its own SDKs. Instead, Mitzu works directly on your data warehouse.

This means you don&apos;t send events directly to Mitzu. Instead, you store event data in your warehouse first and then use Mitzu on top of it—similar to how a BI tool works.

Why is this powerful? Once connected to your event data table, it enables funnel reporting, cohort analysis, segmentation, and more.

Because Mitzu sits on top of your data warehouse, you have much more control over your data. You can prepare, clean, and transform event data before sending it to Mitzu, and you can refactor data easily—something that&apos;s a nightmare in traditional product analytics tools.

In SDK-based tools like Amplitude and Mixpanel, once an event is tracked incorrectly, it&apos;s permanently stored. You can&apos;t easily rename properties or modify event structures without complex workarounds. With Mitzu, you can adjust your data model, run the new model, refresh Mitzu—and you&apos;re done. This flexibility is a major advantage.
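That refactoring workflow can be sketched in a few lines. This is a minimal illustration, with sqlite3 standing in for the warehouse and hypothetical event names (Mitzu itself is not involved here): the raw events stay untouched, the analytics tool reads a modelled view, so fixing historical naming mistakes is just redefining the view.

```python
import sqlite3

# sqlite3 stands in for the warehouse; the idea is the same on BigQuery
# or Snowflake, where a warehouse-native tool reads your modelled tables.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE raw_events (
        user_id  TEXT,
        evt_name TEXT,   -- tracked inconsistently in the past
        evt_ts   TEXT
    )
""")
con.executemany(
    "INSERT INTO raw_events VALUES (?, ?, ?)",
    [
        ("u1", "SignUp",   "2025-01-01"),
        ("u2", "sign_up",  "2025-01-02"),  # same event, old naming
        ("u1", "purchase", "2025-01-05"),
    ],
)

# The "refactor": a modelled view that cleans up names. The analytics
# tool queries this view, so history is fixed by redefining the model.
con.execute("""
    CREATE VIEW events AS
    SELECT
        user_id,
        CASE WHEN evt_name IN ('SignUp', 'sign_up') THEN 'sign_up'
             ELSE lower(evt_name) END AS event_name,
        evt_ts AS event_time
    FROM raw_events
""")

rows = con.execute(
    "SELECT event_name, COUNT(*) FROM events "
    "GROUP BY event_name ORDER BY event_name"
).fetchall()
print(rows)  # both historical spellings now count as one event
```

In an SDK-based tool, the two spellings would live on as two separate events forever; here the fix is one model change and a refresh.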

That said, Mitzu is still young and not as feature-complete as PostHog. It doesn&apos;t yet have all the report types you&apos;d find in a mature tool. But if you already store event data in your warehouse, Mitzu is a strong option to consider.

[Support my work and subscribe to this newsletter](#/portal/signup/free)

## Tag Managers

### Client-Side Tag Managers

For client-side solutions, we can revisit some of the tools already mentioned in the classic analytics category.

### Piwik PRO Tag Manager (PL)

Piwik PRO includes a Tag Manager that is feature-wise quite close to Google Tag Manager. It works in a similar way, making it easy for those transitioning from GTM to quickly adapt.

One important point is that you don&apos;t have to use Piwik PRO as your analytics tool. You can use Google Analytics or any other system with it.

The main difference compared to GTM is how consent mode is handled. It&apos;s not built-in the same way, so you&apos;ll need to configure a custom solution. However, there are guides available to help with this.

### Matomo Tag Manager (NZ)

Matomo also offers a Tag Manager, which functions similarly to Google Tag Manager.

If you&apos;re looking for a self-hosted alternative to GTM, Matomo&apos;s Tag Manager is a solid option.

### Server-Side Tag Managers

Server-side Tag Management has become quite popular, though in many cases, it&apos;s overhyped.

While there are some good use cases, I still believe that for most companies, it&apos;s oversold.

### European Server-Side Tag Management

There are some European companies offering Google Tag Manager hosting, such as Stape.

However, these don&apos;t really count as full-fledged European alternatives, since they still rely on Google&apos;s solution.

The only European server-side tag management solution I am aware of is

### Jentis (AT)

![](/images/posts/european-analytics/image-3.png)

Jentis has developed a unique approach to the omnipresent sGTM. They follow a different technological approach by mirroring the client-side journeys on the server-side. Their approach to using a pool of IDs (synthetic users) to remove personal identification from ad identifiers, such as the GCLID, while still retaining campaign information, is very interesting.

I hope that at some point, a strong open-source project will emerge in Europe to provide a real alternative—one that allows companies to deploy server-side Tag Management in a way similar to Google Tag Manager.

For now, though, a managed European alternative doesn&apos;t exist.

## Event data pipelines (CDI)

Event data pipelines, not quite CDPs. This category is challenging to define, because categorizing Segment and RudderStack as CDPs (as they self-identify) would expand this into an unwieldy category deserving its own post. Therefore, I&apos;m focusing specifically on their event pipeline functionality rather than their CDP aspects.

The core functionality of Segment and RudderStack involves collecting events from diverse sources, including front-end, server-side, and webhooks. They process this data—transforming it as needed—then proxy it to various destinations, typically marketing platforms. This remains a common and widely implemented use case.

### Snowplow (UK)

![](/images/posts/european-analytics/image-12.png)

The European alternative in this space is Snowplow, which effectively covers similar functionality but with greater strength in data collection. Having existed for quite some time, Snowplow offers arguably the best trackers for various platforms and an excellent event data model for collection purposes.

Recent developments include Snowbridge, which enables forwarding data to other destinations—a capability that wasn&apos;t previously core to their offering. This expands Snowplow from primarily event data collection to supporting real-time data distribution when needed.

A significant limitation is that Snowplow is no longer open source. They also discontinued their cloud product, which briefly offered smaller businesses self-service event data pipeline functionality similar to Segment and Rudderstack. Currently, Snowplow operates as an enterprise solution, leaving a gap in alternatives. While some open source packages offer comparable features, no clear dominant solution has emerged to fill this space.

## Business Intelligence

The BI market is still largely dominated by Tableau and Power BI for various reasons. But what are the European alternatives? Let&apos;s take a look.

### Lightdash (UK)

![](/images/posts/european-analytics/image-13.png)

Lightdash is an open-source and managed BI tool that integrates seamlessly with dbt.

If you already have a well-structured dbt workflow, adding Lightdash can create a highly analytics-engineering-friendly setup.

Lightdash stands out because you can define metrics in dbt, and they automatically populate in Lightdash. Everything can be run via the CLI, and they now support dashboards as code, making it even more developer-friendly.

This makes Lightdash a great BI tool for teams that follow modern data engineering practices.

### Count (UK)

![](/images/posts/european-analytics/image-14.png)

Count was one of the first BI tools to introduce a free-form, whiteboard-like canvas for data visualization and modeling.

What makes Count unique is that you work on a blank canvas where you can perform calculations, create modeling steps, and build essential business visualizations in the same place.

This is important because you can work visually on your data models (like directly manipulating a lineage graph). It&apos;s fantastic for metric trees—offering flexibility in designing and presenting metrics.

If you like working visually, Count is an excellent option.

### Steep (SE)

![](/images/posts/european-analytics/image-15.png)

Steep is a newer-generation BI tool that follows a metric-first approach.

I haven&apos;t tested it yet, but it&apos;s high on my list because metrics are first-class citizens—you define them first, and then build BI reports around them. This approach could make metric management and reporting much easier.

From everything I&apos;ve seen and heard, Steep looks extremely promising as a European BI alternative.

### Supersimple (EE)

![](/images/posts/european-analytics/image-16.png)

Supersimple is another of the newer BI tools. Besides classic BI functionality, what sets them apart is their strong metric definition capability. You basically have a semantic layer within the tool.

They also put a real emphasis on AI assistance to help business users get to specific insights much quicker.

## Cloud Data Warehouses

European cloud data warehouse options present an interesting challenge as there is no obvious option. When discussing cloud data warehouses, I&apos;m referring to products like Google&apos;s BigQuery, Snowflake, Databricks, and Azure Synapse—analytical databases that leverage cloud capabilities to scale beyond on-premise or single-node compute solutions.

No clear European alternative exists in this space. I researched European cloud providers to investigate their data offerings, as they initially focused primarily on compute services similar to AWS&apos;s early days. There are some promising developments: Scaleway, a prominent French cloud provider, has data warehouses on their roadmap, though still in the discovery phase. OVH Cloud has a public roadmap commitment to investigate data warehouse solutions, though details remain limited.

Exasol, a German company with an established data warehouse product, has been around for some time. I worked on a project using Exasol several years ago, but we migrated away due to prohibitive costs. This situation may have changed, but Exasol still appears positioned as an enterprise product without self-service access options.

When extending the European definition to include Israel, Firebolt emerges as a compelling option comparable to BigQuery or Snowflake in capabilities.

For organizations with modest data requirements, deploying a PostgreSQL instance on European cloud providers remains a good option. Despite my tendency to default to BigQuery even for small projects due to its straightforward setup process, PostgreSQL would suffice for many smaller implementations. One disadvantage of PostgreSQL is the continuous instance operation cost versus consumption and storage-based pricing models of Snowflake or BigQuery.

Another potential avenue involves leveraging Iceberg with the S3-compatible storage offered by most European cloud providers. This approach would allow storing CSV or Parquet files and potentially establishing an Iceberg instance on top, creating a data warehouse alternative—but honestly I don&apos;t know enough about this kind of setup to give a good judgement here.

For smaller data setups, DuckDB has become central to experimentation in recent years. While typically a single-node solution, some practitioners have deployed DuckDB in cloud environments through creative implementations on compute functions or Lambda services. MotherDuck offers cloud capabilities but, being American, falls outside our European focus.

The complexity of these alternatives indicates the lack of a straightforward European cloud data warehouse solution. So, I am really hoping this is something that will come to the European cloud platforms sooner rather than later.

Please let me know about other alternatives I may have overlooked.

## Data Integration and Orchestration

Data integration and orchestration—I&apos;ve just bundled these two together here. While they could technically fall into separate categories, they&apos;re often so closely linked that it makes sense to have them side by side.

In the data integration and orchestration space, there are dominant U.S.-based players like Fivetran or Airbyte, and orchestrators like Airflow/Astronomer. You also have newer options like Dagster or Prefect as well. But what about European alternatives?

### Keboola (CZ)

![](/images/posts/european-analytics/image-17.png)

One alternative I frequently use is Keboola, which comes from the Czech Republic. Keboola effectively merges data integration and orchestration capabilities. They offer various connectors to easily load your data. Once loaded, you can transform the data right within Keboola and then push it back out to other platforms. It&apos;s a solid European alternative that I like for its simplicity, while it remains open to custom integrations.

### Kestra (FR)

![](/images/posts/european-analytics/image-18.png)

In the category of emerging new orchestrators there is Kestra, founded in France, which got a lot of traction and attention in the last year. Like Airflow and Dagster, it is open source. Its special flavour is a YAML-first approach: pipelines are defined as YAML files, which makes it potentially more approachable for people with less of a development background.

### Weld (DK)

![](/images/posts/european-analytics/image-19.png)

Then there&apos;s Weld, based right next to me in Copenhagen, Denmark. Weld has a setup similar to Keboola&apos;s, combining integration, transformation, and orchestration. In comparison, I would say they focus more on simplicity and offer fewer options for special configurations. On the other hand, they integrate AI-assistant features more heavily.

### Funnel (SE)

![](/images/posts/european-analytics/image-20.png)

A classic in this category is Funnel.io. Funnel.io has been around for quite a while, initially focusing solely on marketing data integration. Over time, they&apos;ve expanded significantly to offer what is essentially a comprehensive marketing analytics platform. You can still integrate numerous marketing sources, carry out transformations directly within the platform, and then push your data into various destinations for reporting and visualization.

### Supermetrics (FI)

![](/images/posts/european-analytics/image-21.png)

Another classic in this category. Supermetrics started as the love child of every technical marketer who wanted a good way to bring Google Ads data into Google Sheets. Today it offers integrations, custom storage, and destinations to load your data into.

Each of these European solutions brings something unique to the table, providing strong alternatives to their American counterparts.

## Data Transformation

Data transformation can be a vast field with countless approaches. Here, though, I&apos;m focusing on the kind I work with most: SQL transformation.

In this space, dbt is the dominant player, and I&apos;m convinced it will stay on top for a long time thanks to its massive distribution and strong brand. Even with newcomers like SQLMesh entering the field, it seems unlikely they&apos;ll dislodge dbt unless someone brings something entirely different to the table. At the moment, I just don&apos;t see that happening.

Both are open source, yet their companies are firmly rooted in the US market. As far as I&apos;m aware, there isn&apos;t a direct alternative coming out of Europe. Sure, there&apos;s Dataform—a UK-based company that was acquired by Google and is now part of GCP—but beyond that, a distinctly European option seems to be missing.

## Final thoughts

Okay, so this was my attempt to come up with a list of European analytics alternatives. As already said in the beginning, there&apos;s no intention to recommend anything to you. There&apos;s no intention to introduce a morality category here to say we definitely have to go all-in on European solutions.

I think there are different aspects. First of all, it&apos;s always good to know about the alternatives, because something smaller, or something with a different angle, might just fit your current setup better.

The second thing is something that I learned in Denmark. I grew up in Germany where &quot;made in Germany&quot; was something quite present in my childhood. You could always say, &quot;okay, when it&apos;s made in Germany, it must be really high quality.&quot; This changed over time just because these standards changed a little bit.

When we moved to Denmark, I was introduced to something slightly different. In Denmark, people emphasize buying locally. The first step is to buy something from the region where you live. The next step is to buy something Danish, nationally. And the step after that is to buy something from Europe.

In Denmark they make it very visible. You will often see which region a Danish product actually comes from. You will see Danish flags everywhere indicating that something comes from Denmark. And now even one of the big grocery store chains has introduced a new label showing which products come from the EU, so that consumers can make a decision based on that.

I really started to appreciate this approach when we moved here. When you support your local economy, you support the place you live. Supporting European data vendors helps them hire more people who will live here and pay taxes here. In the end, you strengthen your ecosystem and your environment to a small degree.

Again, take the list with a grain of salt. When I had the idea for this post, I knew I would definitely miss plenty of other solutions. I have no intention of blowing up this post and making it unreadable, so I will keep it as it is.

I will set up a new directory called &quot;European Analytics.&quot; I know there are other directories for European alternatives which are broader (eg. [https://european-alternatives.eu](https://european-alternatives.eu)).

But this one is really just about data and analytics use cases. When you have a tool and you&apos;re headquartered in Europe, just let me know so I can add it. I might send you a survey to fill out some information. In the end, we can have a directory where we can check what&apos;s up in the European analytics and data space.</content:encoded></item><item><title>The business questions scale, but your analytics setup doesn&apos;t</title><link>https://timo.space/blog/the-business-questions-scale-but-your-analytics-setup-doesnt/</link><guid isPermaLink="true">https://timo.space/blog/the-business-questions-scale-but-your-analytics-setup-doesnt/</guid><description>When implementing data analytics, there&apos;s this tricky inflection point that often catches everyone by surprise.</description><pubDate>Wed, 19 Mar 2025 00:00:00 GMT</pubDate><content:encoded>When implementing data analytics, there&apos;s this tricky inflection point that often catches everyone by surprise. It&apos;s that moment when your initial dashboards finally deliver value, people start using them regularly, and then - inevitably - the follow-up questions start rolling in.

## How Data Projects Usually Get Started

Let&apos;s say we start in a very simplified way. The first step, once you have some experience, is to establish a metric system or build a metric tree. This will cover the 10 to 20 core metrics that are important for the business.

With this foundation, you take care of data integration. This depends a little bit on where the data is coming from. Some of it is straightforward - like ad platform data. And then there&apos;s the complicated data integration - any kind of weird system you have to pull data from.

With data integration in place, you develop your data model. It doesn&apos;t really matter what kind of approach you take, but at least you take some data modeling approach. In the end, you&apos;ll have data ready for analysis that you can hook up with your analytics or BI tool to present the first insights and metrics in dashboards and reports.
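A metric tree like the one mentioned above can be sketched as a simple data structure. This is a hypothetical illustration (the metric names are mine, not from any specific setup): each metric lists the inputs it decomposes into, and a small helper finds the leaf metrics you actually need to source data for.

```python
# Hypothetical metric tree: each metric maps to the inputs it
# decomposes into; leaves (empty lists) are directly measured.
metric_tree = {
    "revenue": ["orders", "avg_order_value"],
    "orders": ["sessions", "conversion_rate"],
    "sessions": [],
    "conversion_rate": [],
    "avg_order_value": [],
}

def leaf_inputs(metric, tree):
    """Return the leaf metrics a top-level metric ultimately depends on."""
    children = tree[metric]
    if not children:
        return [metric]
    leaves = []
    for child in children:
        leaves.extend(leaf_inputs(child, tree))
    return leaves

print(leaf_inputs("revenue", metric_tree))
# ['sessions', 'conversion_rate', 'avg_order_value']
```

The useful part is the direction: the tree tells you which raw inputs the integration step has to deliver before any dashboard exists.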

![](/images/posts/the-business-questions-scale-but-your-analytics-setup-doesnt/image-1.png)

## The Crucial Moment

And now comes the crucial part. If you&apos;ve done this right, the stakeholders that work with the dashboards will start to deep dive into it. They&apos;ll use it in their weekly meetings. You&apos;ve basically handed over a data answer to their important questions.

This is the make-or-break moment. If everything works out, the first set of insights will generate a lot of follow-up questions. The tricky thing with these follow-up questions is they become much more complex - and they become complex easily. Because asking questions is easy to do.

So in the end, we show &quot;okay, here is the core conversion funnel&quot; and then someone else says &quot;hmm, that&apos;s interesting. Can we maybe break down this conversion funnel by x, or y, or z?&quot; And x, y, and z will be super plausible.

It could be simply &quot;hey maybe new and returning users&quot; or &quot;users that have been in touch with this kind of campaign&quot; or &quot;can we see this for users that have been in touch with our customer service?&quot;

For every business person, this sounds like the next logical step. And it is! I would ask the same kind of question if I were in this position.

![](/images/posts/the-business-questions-scale-but-your-analytics-setup-doesnt/image-2.png)

## When Things Get Complicated

The problem is, from a data perspective, this can cause extreme headaches.

Some breakdowns might be straightforward. Like in the example I used before - breaking down by new versus returning customers might be quite close to your current implementation. But everything else is not.

When they ask for the conversion funnel broken down by customer service interaction, we&apos;re leaving the realm of just e-commerce data. Now we have to combine it with our behavioral data from the website. And then they&apos;ll ask: &quot;Why doesn&apos;t this data match up with what we see on the website?&quot; And you&apos;ll have to explain the tracking differences.
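As a rough sketch of what that combination means technically, here is a minimal, hypothetical example joining commerce data with support data (sqlite3 stands in for whatever database holds both; table and column names are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders  (user_id TEXT, order_id TEXT);
    CREATE TABLE tickets (user_id TEXT, ticket_id TEXT);
    INSERT INTO orders  VALUES ('u1','o1'), ('u2','o2'), ('u3','o3');
    INSERT INTO tickets VALUES ('u1','t1');
""")

# Break orders down by whether the buyer ever contacted support.
rows = con.execute("""
    SELECT
        CASE WHEN t.user_id IS NULL THEN 'no_support_contact'
             ELSE 'contacted_support' END AS segment,
        COUNT(DISTINCT o.order_id) AS orders
    FROM orders o
    LEFT JOIN (SELECT DISTINCT user_id FROM tickets) t
        ON t.user_id = o.user_id
    GROUP BY segment
    ORDER BY segment
""").fetchall()
print(rows)  # [('contacted_support', 1), ('no_support_contact', 2)]
```

The join itself is simple; the headaches come from identity (is the support system&apos;s user the same user?) and from tracking differences between the two sources.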

Every follow-up question opens up new complexity. And I think this is where data projects start to get derailed.

It&apos;s scary because you&apos;re seeing first adoption. The reason people are asking these follow-up questions is that they get the value of the whole setup. You have first product-market fit with your data stack! The problem is the next version of your product becomes much more complex.

![](/images/posts/the-business-questions-scale-but-your-analytics-setup-doesnt/image-3.png)

## How Do You Handle These Situations?

I don&apos;t really have an answer that works 100% because I haven&apos;t spent enough time in this area in the last years. But I see two main approaches:

### Turn Questions into Business Initiatives

Let&apos;s assume implementing a request would take four to six weeks. That&apos;s a significant investment. When we treat it like an investment decision, this means when the question is raised, it needs at least a 30-minute or one-hour session with the stakeholder to investigate the purpose and feasibility.

First, let&apos;s look at purpose. We just achieved data product-market fit - people are asking questions. We don&apos;t want to shut them down immediately by saying &quot;great question, we can&apos;t answer, next please.&quot; But saying yes and taking six weeks is basically doing the same thing in a passive-aggressive way. We need to be proactive.

When discussing purpose, we need to understand what these people are planning to do with the data. It sounds complicated, but it&apos;s not. We say, &quot;We need to understand more about what you&apos;re planning to do with this dataset. Does this question come from an observation that makes you believe, for example, that customers keep buying because our customer service is great?&quot;

![](/images/posts/the-business-questions-scale-but-your-analytics-setup-doesnt/image-4.png)

Maybe their business case is understanding the return on investment for customer service because &quot;our customer service is great, we train people right, we invest time and money, and we want to see indicators that it&apos;s paying off.&quot;

If you get this context, the whole situation changes. It becomes an important business case - investigating if a significant investment makes sense for the company. In that case, a four-week extension of the model is totally reasonable.

If their answer is &quot;no, it was just an idea, would be interesting to see the impact of customer service&quot; - that&apos;s valid but doesn&apos;t motivate going deeper. But I wouldn&apos;t end there.

I&apos;d ask: &quot;Let&apos;s say we figure out people who&apos;ve been in touch with customer service have a lower CLV, they come back less frequently, buy less. What would you do with that?&quot; The important thing is to see if there&apos;s a clear path leading from potential insights to actionable items that could impact the business.

If the business team says, &quot;First, we want to understand if this is the case. Second, we want to make a business case. We want to see how many people come to customer service and don&apos;t come back - that&apos;s lost revenue. And maybe there&apos;s an even higher group who don&apos;t even reach out to customer service and just don&apos;t come back. We have a problem somewhere, and we want to use this data as a first step to investigate&quot; - then it becomes a business initiative.

This approach prevents us from just producing a dataset that takes time, makes the data model more complex, but then nothing is done with it. When you turn research into a business initiative, you work hand-in-hand to solve a potential business problem.

It&apos;s also a great filter mechanism. Many complicated data setups happen because ideas weren&apos;t well thought out or contained thinking mistakes. When two or three people think about it together, you might realize the question doesn&apos;t change much. Having this open dialogue can channel complex questions in a different way.

### Broaden Your Data Modeling Approach

The second option is broadening your data modeling approach. Different applications require different approaches, and you can mix them. There&apos;s no one truth that rules them all.

Let me share a specific example. I worked with a company that went through the stage I described. They had covered all the core metrics and insights, and then people started asking different questions - a lot about sequences. &quot;What happened before revenue happened?&quot;

They reached out because these questions were increasing, and they understood their current data model couldn&apos;t easily answer them. They saw my work on activity schema and thought it might be an option.

![](/images/posts/the-business-questions-scale-but-your-analytics-setup-doesnt/image-5.png)

We extended their existing data model - we didn&apos;t invent something new, just added a new layer where we created an event stream. With this, they unlocked a new class of questions they could answer about what happened before certain events. They served a new version of their data product to customers who could now get new insights.

Event data modeling is great for answering questions about sequences. You make revenue somewhere and want to know what happened before or after. The &quot;what happened before&quot; can be extremely different - marketing touchpoints (classic attribution question), specific discovery behavior - the sky&apos;s the limit.

In my current project, I&apos;m combining both approaches. When I saw questions becoming more complex, I got involved again to understand the motivation. We had built the data model in different ways with an event data model in place, but now I&apos;m creating a new layer on top because the questions call for it. I&apos;m investing about two weeks to extend the model so we can answer this class of questions going forward - not just one question but different similar ones.

## Two Key Takeaways

1. Turn data questions into business initiatives when possible. Even when it sounds bold, I&apos;ve seen great things start this way.

2. Think about your data model like a product. A product doesn&apos;t stay the same all the time - you extend features and possibilities. Apply this thinking to your data model as well.</content:encoded></item><item><title>Metric Trees for Digital Analysts</title><link>https://timo.space/blog/metric-trees-for-digital-analysts/</link><guid isPermaLink="true">https://timo.space/blog/metric-trees-for-digital-analysts/</guid><description>Content in digital analytics tends to be pretty predictable. Juliana pointed this out a while ago - most content in our field focuses on implementation, rarely touching on actual data analysis or…</description><pubDate>Mon, 24 Feb 2025 00:00:00 GMT</pubDate><content:encoded>Content in digital analytics tends to be pretty predictable. [Juliana](https://julianajackson.substack.com) pointed this out a while ago - most content in our field focuses on implementation, rarely touching on actual data analysis or how to work with data meaningfully. I see similar patterns.

![](/images/posts/metric-trees-for-digital-analysts/Screenshot-2025-02-20-at-11.38.33.png)
*Juliana&apos;s presentation from 2022*

If you look at what people publish or check out talks at typical conferences, it&apos;s heavily weighted toward implementation topics and technical hacks. These days, you&apos;ll see tons of content about working with GA4 data in BigQuery - it&apos;s the hot new thing, but again, just another implementation topic. On the other end of the spectrum, you get these philosophical, high-level &quot;let me tell you how the world works&quot; wisdom talks from industry veterans.

![](/images/posts/metric-trees-for-digital-analysts/image-12.png)
*The big gap in the middle*

But there&apos;s this huge gap in the middle: how do we actually apply analytics in practice? I understand why this gap exists. It&apos;s challenging to write about real analytics work for two main reasons.

First, when you&apos;re doing actual analytical work for your company or clients, you usually can&apos;t just share it openly. Sure, you can do those high-level celebration case studies (though we all know real projects never work out that neatly). But you can&apos;t really dig in and say &quot;here&apos;s the specific problem my client faced, here&apos;s how we dug through different datasets, here&apos;s what we found&quot; - you don&apos;t want to put your company or client in an awkward position.

![](/images/posts/metric-trees-for-digital-analysts/image-13.png)

The second reason, I think, is that it&apos;s just genuinely difficult. Implementation is essentially an engineering problem - predictable and solvable with enough debugging time. Once you figure it out, you can share the solution and others can apply it. But applying data to business outcomes is significantly harder, unless you&apos;re approaching it from that high-level philosophical view where people can interpret it like a horoscope and take what they want from it.

I don&apos;t have an easy answer for this problem. However, I&apos;ve found something interesting in the broader data space - where data engineers and analytics engineers hang out - that might help bridge this gap. Something that could help us bring a better business perspective to our work.

We need to talk about **metrics**.

## Why metrics matter. Really matter.

I have a complicated history with metrics. They&apos;ve always made sense to me - kind of like how breathing air makes sense, but not something I spent much time thinking about. I used to try to ignore them, or at least avoid them. When clients would ask &quot;what metrics should we track?&quot;, I&apos;d give them the standard list of e-commerce metrics. But I was never really comfortable with that approach.

[Support my work and subscribe to this newsletter](#/portal/signup/free)

I was good at critiquing specific metrics. If someone said &quot;we want to optimize our bounce rate&quot;, I could easily explain why that wouldn&apos;t really help them - bounce rate is a tricky metric. What I was missing, though I didn&apos;t realize it at the time, was a system for thinking about metrics.

A metric by itself is pretty meaningless, even when you wrap it in a container and call it a KPI (key performance indicator). Having a single KPI didn&apos;t really make sense to me. I wasn&apos;t particularly inspired by the concept of North Star metrics either, for two main reasons. First, people interpret North Star metrics very differently, and most interpretations, in my opinion, don&apos;t make much sense. When someone suggests a revenue-related North Star metric, I struggle with it. Sure, if you&apos;re running a for-profit business, some of your lagging or output metrics will involve revenue. But saying &quot;profit margins are our North Star metric&quot; is like stating the obvious - we need air to breathe.

![](/images/posts/metric-trees-for-digital-analysts/image-15.png)

There are better resources about North Star metrics out there. John Cutler created an excellent ebook with Amplitude explaining North Star metrics in a way that makes sense to me. But it was still a single-purpose framework - useful, but not enough to warm me up to metrics in general.

![](/images/posts/metric-trees-for-digital-analysts/image-16.png)
*You can get the playbook with just one email.*

So why do metrics actually matter? In analytics, we often struggle to translate the output of an analytical system into meaningful business impact. A conversion rate in a funnel comes close - we can say &quot;if we improve this conversion rate by 10%, we potentially make 10% more revenue.&quot; But beyond that, it gets tricky. This is where metrics can build a powerful bridge between analytical insights and actionable business outcomes.

Why does this work? I&apos;m going to try to explain in this post. I&apos;ll also introduce a concept that helped me establish a much better relationship with metrics:

**the metric tree.**

## How can we find love for metrics again?

I started to rediscover my appreciation for metrics when I came across Abhi&apos;s [work with metric trees](https://www.leverslabs.com/article/introducing-metric-trees). This wasn&apos;t actually a new concept - I&apos;d learned about it in university under a different name: the DuPont schema. I loved it back then, which makes it interesting to reflect on why I forgot about it (honestly, no idea - I guess you simply forget things).

![](/images/posts/metric-trees-for-digital-analysts/image-17.png)
*There is even a Wikipedia page about it.*

When I started working deeply in analytics and data, I lost sight of these tools because I was so focused on the technical engineering side. But seeing Abhi&apos;s work and talking with him brought metrics back into focus. I became curious to test this approach in my current work - could it solve the problems I&apos;d had with metrics? Turns out, it did.



Why did it work where other approaches had failed? I think it&apos;s because metric trees add an essential ingredient that, when missing, makes metrics pretty useless: relationships between metrics. As I mentioned earlier, a metric standing alone often struggles to articulate its place and meaning. Metrics become powerful when they relate to each other.

This is what I loved about the DuPont schema in business school - you could see how essential company metrics related to each other. One branch explained how costs worked, another showed how revenue developed. This helped you understand both how to improve revenue and how to increase cost efficiency. These relationships were what was missing in most metric setups I&apos;d worked with in analytics.

The closest thing to a metric tree I&apos;d used in digital analytics was the classic funnel. A funnel is essentially a snapshot of a metric tree - a specific view that I&apos;ll showcase later. What makes a funnel powerful is the relationship between its metrics. You have an input volume (people starting the funnel) and percentages showing how many people you lose at each step. These metrics are interconnected - if you do better at moving people from step one to step two, you have more opportunities to move people from step two to step three. When you improve the first conversion rate, you might see an impact on the funnel&apos;s final output, often tied to revenue.
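
To make these relationships concrete, here is a toy funnel in Python. The visitor count and conversion rates are invented for illustration; the point is that lifting the first rate propagates straight through to the final output.

```python
# Hypothetical two-step funnel: the metrics are multiplied together,
# so improving an upstream rate moves the downstream output.
visitors = 10_000
cr_step1, cr_step2 = 0.40, 0.25  # step-to-step conversion rates

baseline = visitors * cr_step1 * cr_step2            # buyers today
improved = visitors * (cr_step1 * 1.10) * cr_step2   # lift step 1 by 10%

print(round(baseline), round(improved))  # 1000 1100
```

A 10% improvement at the first step shows up as a 10% improvement at the end - exactly the interconnection the funnel view makes visible.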

![](/images/posts/metric-trees-for-digital-analysts/image-18.png)

This system finally made sense to me. Rediscovering these relationships between metrics was the essential ingredient I needed to find my love for metrics again. So if you&apos;re reading this and you&apos;ve lost some love for metrics too, maybe I can help you rediscover it. Let&apos;s continue.

![](/images/posts/metric-trees-for-digital-analysts/image-30.png)

## The power of metric trees

What changes when you work with a metric tree? It doesn&apos;t make all your problems disappear - you might even discover new ones. But after running metric tree workshops for over a year and incorporating them into product analytics setups, I&apos;ve noticed some significant benefits.

Metric trees immediately solve one crucial problem: they make data setups accessible to all business teams. Metrics become a universal language, especially when expressed in a tree format. Everyone can see how your business or product is structured - what building blocks you&apos;re using to show how your business actually works. This resonates with all different business teams.

![](/images/posts/metric-trees-for-digital-analysts/image-19.png)

I&apos;ve run metric tree workshops with groups of 15+ people from various departments. Everyone grasped the concept and could contribute meaningful feedback. For the first time in my experience discussing data topics with different business teams, we had genuinely constructive discussions. I watched sales, marketing, and product teams align on specific business aspects. The metrics made it transparent, showing how it all fits together. Everyone could contribute thoughtfully about metric definitions and which metrics made sense.

It&apos;s both a powerful communication tool and mapping tool. For data and analytics teams, it&apos;s a way to create a map of how they think the business works. You can show this to different business teams and get their input on needed adjustments. When you get requests for insights, you can pull out the metric tree as a map and ask, &quot;Where does this fit? What part of the tree will this analysis impact?&quot; This immediately provides context for the analysis.

For example, if we want to analyze the conversion rate between two steps, we can instantly see how it influences downstream metrics and how far it is from direct revenue impact. This helps us suggest better analysis candidates, saying things like, &quot;That looks interesting, but what if we looked at this area instead? We&apos;re seeing a much bigger drop-off here.&quot;

![](/images/posts/metric-trees-for-digital-analysts/image-20.png)

The metric tree is remarkably effective as a communication tool, even without numbers attached. I&apos;ve seen its impact in workshops before we&apos;d calculated anything. Of course, it becomes even more powerful when you can visualize the complete tree with actual numbers.

There are various ways to implement this. You can create it manually in any whiteboard tool, or use specialized tools like [DoubleLoop](https://doubleloop.app) or [Count](https://count.co) for metric tree visualization. [Lightdash](https://www.lightdash.com/blogpost/metric-trees-how-top-data-teams-impact-growth) now supports this too. More tools are moving toward providing blank canvas capabilities where you can easily build out metric tree visualizations.

While the metric tree serves as an optimization tool and works great for root cause analyses, its primary value for me is as a communication bridge between analytics and business teams, ensuring we&apos;re all talking about the same things.

[Don&apos;t miss the next post about Metric Trees - sign up for the free Newsletter](#/portal/signup/free)

## What is a metric tree?

A metric tree is technically an upside-down tree. It typically has one output metric at the top - usually your main business metric, either profit-related or revenue-related if you&apos;re focusing just on the revenue side. From there, it branches downward. The tree can branch out extensively or remain quite lean.

If you&apos;re just getting started with metric trees, I recommend keeping them simple. It&apos;s easy to make them more complex later, but the real magic happens when you can bring a complex tree back to a simpler version.

![](/images/posts/metric-trees-for-digital-analysts/image-21.png)
*A complex one*

![](/images/posts/metric-trees-for-digital-analysts/image-22.png)
*And a simpler version of the same one*

There are two types of metric trees. The first is what I call a deterministic metric tree, which is what I usually use. In this version, you can represent the entire tree in one equation - all the metrics add up, making the whole tree one calculation.

&gt; Total video clicks = Total video click rate \* ((New YouTube videos \* Avg. new video views) + (Existing YouTube videos \* Avg. existing video views))
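
The equation above can be sanity-checked in a few lines of Python. The numbers are invented purely to show that the tree rolls up into one calculation:

```python
# The whole deterministic tree as a single function (illustrative only).
def total_video_clicks(click_rate, new_videos, avg_new_views,
                       existing_videos, avg_existing_views):
    total_views = (new_videos * avg_new_views
                   + existing_videos * avg_existing_views)
    return click_rate * total_views

# 4 new videos averaging 500 views, 120 existing videos averaging
# 40 views, and a 2% click rate on views.
print(round(total_video_clicks(0.02, 4, 500, 120, 40)))  # 136
```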

The second type shows relationships that indicate how metrics influence each other, but without precise formulas explaining these influences (more of a probabilistic model). This version makes sense in certain scenarios. For example, when examining how product behavior impacts subscriptions, it&apos;s hard to create an exact equation. You might know that using a specific feature affects whether someone starts a subscription, but quantifying that impact is difficult.

![](/images/posts/metric-trees-for-digital-analysts/image-23.png)

While there are models to get numeric insights, they all have limitations. That&apos;s why I recommend starting with deterministic trees - they&apos;re easier to test and build since you can verify if the equations make sense. However, remember you can always express relationships in a metric tree without equations. You can simply indicate that one metric might influence another, even if you don&apos;t yet know the exact relationship.

Let me share two design aspects I often use in metric trees. First, they can have different levels of detail. You might start with a high-level tree representing your business, then zoom in on specific areas like customer acquisition with more detail. You can stack metric trees on top of each other, but remember - more stacking means more complexity.

![](/images/posts/metric-trees-for-digital-analysts/image-24.png)
*My sub tree for book checkout*

In my design process, I often create an extensive model first. But the next step is usually simplifying it because you have to work with it practically. A tree with 300 metrics means you need systems to calculate, visualize, and act on all 300 metrics. While possible, this requires a sophisticated organizational setup. That&apos;s why I recommend simplifying when possible.

Metric trees can also include dimensions, which work like filters. While I wouldn&apos;t recommend this for beginners, it&apos;s possible to include dimensions like campaign sources to filter your tree based on where users come from. This is straightforward in design but can be challenging in practice - some metrics might have certain dimensions while others don&apos;t, leading to attribution problems that are often difficult to solve. Until you can represent your tree with actual numbers, I&apos;d suggest leaving out these design options. But it&apos;s good to know they&apos;re possible.

![](/images/posts/metric-trees-for-digital-analysts/image-25.png)
*Dimension breakdown for new blog post views*

This gives us a good foundation to look at some examples. I&apos;ve chosen three use cases you might encounter as a digital analyst, to show how you can apply this in your own projects.

## **Examples**

B2B Software as a service

![](/images/posts/metric-trees-for-digital-analysts/image-26.png)
*This is already quite an extensive model*

High-level e-commerce tree with focus on customer segments

![](/images/posts/metric-trees-for-digital-analysts/image-27.png)

B2B marketing website - a very simple lead conversion tree:

![](/images/posts/metric-trees-for-digital-analysts/image-28.png)

[Support me and sign up for the free newsletter](#/portal/signup/free)

## How to build your own?

Getting started with metric trees is simple - you don&apos;t need special approval or tools. All you need is a whiteboard or piece of paper. You can start sketching right away.

Want a more structured approach? Start with what you already know. Take the examples from this blog post and design something similar for your business case. Remember two key principles: make sure it creates an equation (all steps should add up), and resist the temptation to add too many elements.

Once you have a first draft, meet with different teams. Talk to marketing about the acquisition flow - how do we get new accounts? What do we invest in? What matters most to them? Use their input to create the acquisition branch. Repeat this process with other business areas until you have a high-level tree that explains how your online business generates revenue (or leads, or whatever your digital business aims for).

Then you can dive deeper. If the product team wants to understand their place in this picture, work with them to construct their branch. One warning though: creating metric trees for product teams isn&apos;t easy. I&apos;ll write a future post about this approach because product metrics can be extremely noisy. When working with feature-level metrics, you need to find better abstractions for your tree.

The key to success is talking with others about your metric tree. Since metrics are something everyone understands, it&apos;s a great format for discussion.

Remember, you can start with basic tools - paper, whiteboard, or software like Miro or Excalidraw. You don&apos;t need specialized software yet. That might come later if you want to visualize numbers, but for design, you already have everything you need.

Once you have your first metric tree, you can use it to understand growth levers. Some areas of the tree have more potential for impact than others. Interestingly, the lower you go in the tree, the more operational and directly influenceable the metrics become. Top-level output metrics like revenue can&apos;t be influenced directly, but as you break things down, you find more actionable areas.

Take a subscription business, for example. Starting with revenue at the top, you might branch down to new subscriptions, then to the acquisition process. Eventually, you reach metrics marketing can directly influence, like visits from paid ads. These highly influential metrics can be affected quickly by tactical changes.

![](/images/posts/metric-trees-for-digital-analysts/image-29.png)

Look for metrics that can be influenced directly and have meaningful downstream impact. For instance, marketing might be great at generating ad visits, but the demo conversion rate might be low. If you know from your metric tree that people who watch demos have a good chance of becoming subscribers, there&apos;s an opportunity. Improving demo conversion for ad traffic could significantly impact the business.

Marketing might have already thought of this, but your metric tree can visualize the potential impact: &quot;If we increase ad traffic conversion by 10%, we could see a 3-4% increase in new subscriptions, significantly impacting revenue.&quot;
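
As a sketch of how such an estimate falls out of the tree (all numbers are invented, and I am assuming ad-driven demos are just one of several branches feeding new subscriptions):

```python
# Toy tree branch: ad visits -> demos -> subscriptions, plus other
# channels. A 10% lift in one branch moves the total by less.
ad_visits, other_subs = 50_000, 1_300
demo_rate, demo_to_sub = 0.02, 0.70  # visit->demo, demo->subscription

def new_subscriptions(rate):
    return ad_visits * rate * demo_to_sub + other_subs

base = new_subscriptions(demo_rate)           # 2000 subscriptions
lifted = new_subscriptions(demo_rate * 1.10)  # 2070 subscriptions
print(round((lifted / base - 1) * 100, 1))    # 3.5 (% overall lift)
```

Because the ad branch only contributes part of total subscriptions, a 10% branch improvement turns into roughly a 3-4% overall lift - which is exactly the kind of statement the tree lets you make.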

The metric tree makes it easier to find areas for focus and have data-driven conversations. It&apos;s a great first use case - you can have productive conversations with marketing about improvements, provide data support, help implement initiatives, and track impact over time.

I hope this has gotten you excited about metrics again. Even a simple metric tree can transform how you approach analytics projects and communicate with the business about their goals and how analytics can support them.

If you have questions, let me know in the comments, send me an email, or reach out - I&apos;m happy to help!

![](/images/posts/metric-trees-for-digital-analysts/Copy-of-Check-out-the-Book.png)
*Don&apos;t forget to get my book.*</content:encoded></item><item><title>Product Evolution via Core Features — Miro Analytics</title><link>https://timo.space/blog/measuring-product-evolution-through-core-features-miro-analytics-setup/</link><guid isPermaLink="true">https://timo.space/blog/measuring-product-evolution-through-core-features-miro-analytics-setup/</guid><description>Measuring Product Evolution Through Core Features - Miro Analytics Setup</description><pubDate>Tue, 04 Feb 2025 00:00:00 GMT</pubDate><content:encoded>Imagine spending months meticulously tracking every feature of your product, collecting data on every user interaction, only to realize you&apos;re missing something crucial: how your core product is transforming. When a product grows by adding new features - like Asana adding goals or Slack adding huddles - measuring success is relatively straightforward. But what happens when your product evolves by transforming its core offering?

This is where traditional analytics often falls short. Using Miro as our example, where a simple whiteboard has the potential to become an innovation workspace, we&apos;ll explore the fascinating challenge of measuring not just feature adoption, but fundamental product transformation.

In this content series (season 1), I create a tracking plan for a typical start-up tool every day for four weeks (taking weekends off), so 20 in total. This is the 7th one: **Miro**.

## The Evolution Challenge: Why Measuring Product Transformation Is Different Than Feature Adoption

### The Hidden Complexity of Core Feature Evolution

When we talk about product evolution, the story usually goes like this: you start with core features, then add new ones over time. Think Asana adding goals and workflows, or Slack introducing huddles. These are easy to track - you measure adoption, usage patterns, and impact. Simple enough, right?

But sometimes, products take a different path. Instead of adding new features, they transform their core. This is Miro&apos;s journey - taking their fundamental concept of a &quot;board&quot; and expanding what it means.

As I explained in the video: &quot;What Miro did, or what Miro&apos;s doing right now with their product direction, is they&apos;re taking their core entity, the board, and making it more powerful. They extended the capabilities of the board significantly over time.&quot;

This creates fascinating challenges:

-   You&apos;re not just tracking adoption of something new
-   The same feature needs to support both simple and complex use cases
-   Users might transition gradually without clear &quot;moments&quot; to track - because they don&apos;t recognize the transition
-   Success looks different for different types of users

Think about it like this: When Asana adds a new goals feature, it&apos;s clear when someone starts using it. But how do you measure when a Miro board transforms from a simple whiteboard into a true innovation workspace? When does a board cross that threshold from brainstorming tool to becoming the central hub for a team&apos;s entire project?

This is the hidden complexity of core feature evolution - measuring not just what users do, but how their entire way of working with your product transforms over time.

This work is based on the chapters about event data design in my book [**Analytics Implementation Workbook**](https://hipster-data-show.ghost.io/the-analytics-implementation-workbook/). There, you can read more details about the D3L framework.

![](/images/posts/measuring-product-evolution-through-core-features-miro-analytics-setup/Check-out-the-Book.png)

### The Limits of Traditional Feature Analytics

Traditional analytics loves clean, countable moments. A feature gets launched, users try it, some adopt it - metrics go up or down. But when you&apos;re transforming a core feature like Miro&apos;s board, these clean moments disappear. You can&apos;t just count how many people &quot;activated&quot; the innovation workspace feature because it&apos;s not a switch that gets flipped.

&quot;I think we still have to find what actually defines this,&quot; as I mentioned in the video. &quot;We don&apos;t know yet which kind of patterns we can put into activities. This is where I want to get to.&quot;

The usual analytics approaches fall short in several ways:

-   Click tracking becomes meaningless (who cares if someone clicked a button when you&apos;re trying to measure workspace transformation?)
-   Feature adoption metrics don&apos;t capture evolution of use
-   Simple counts (like number of assets on a board) create too much noise without signal

Here&apos;s a real example: Let&apos;s say we track that a team added documents, tasks, and comments to their Miro board. Traditional analytics would just show us increased feature usage. But what we really want to know is: Has this board become their primary workspace? Are they actually collaborating here instead of jumping between multiple tools?

Even more challenging: success isn&apos;t always about doing more. A team effectively using Miro as an innovation workspace might actually have fewer individual interactions because they&apos;re working more efficiently. Traditional metrics might flag this as decreased engagement when it&apos;s actually a sign of success.

This is why we need to move beyond traditional feature analytics. When your core feature is transforming, you need analytics that can capture not just what users do, but how their entire workflow is evolving.



### Finding Signals in Transformation

When you can&apos;t measure transformation directly, you need to look for signals that indicate it&apos;s happening. This isn&apos;t about finding one perfect metric - it&apos;s about combining multiple signals that together tell the story of how a board is evolving from a simple whiteboard to an innovation workspace.

&quot;We have to track a lot of different properties, play around with the analyses to see if they indicate and give us a signal that something is moving into an innovation workspace,&quot; as I explained in the video. &quot;This is what I mean by experiments.&quot;

Here&apos;s what these signals might look like:

Time and Engagement

-   Implementing heartbeat events to measure true time spent on boards
-   Looking for sustained engagement rather than just brief interactions
-   Tracking patterns of return visits and session length

Collaboration Patterns

-   Number of active contributors on a board
-   Mix of different activity types (documents, tasks, comments)
-   Cross-team participation and interaction

Quality of Use

-   Types of assets being created and used
-   Evidence of structured work (like task management)
-   Signs of long-term project organization

The key is combining these quantitative signals with qualitative insights from your product team. When they conduct user interviews or testing sessions, these become gold mines for identifying new patterns to track. Listen for how teams describe their evolving use of the product - these descriptions often hint at measurable signals you can track.

Remember: We&apos;re not looking for perfect measurements, but for strong indicators that together tell us how usage is transforming. Sometimes the best signals come from unexpected places, which is why we need to stay flexible and experimental in our approach.
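
To make the idea of combining signals tangible, here is a toy sketch - my own illustration, not Miro&apos;s actual logic, and every field name and threshold in it is an assumption:

```python
# Combine several weak signals into one indicator; no single metric
# decides alone. Field names and thresholds are made up.
def looks_like_innovation_workspace(board):
    signals = [
        board["active_contributors"] >= 3,   # real collaboration
        board["asset_types"] >= 3,           # mix of docs, tasks, comments
        board["minutes_last_30d"] >= 120,    # sustained engagement
    ]
    return sum(signals) >= 2

board = {"active_contributors": 4, "asset_types": 2, "minutes_last_30d": 300}
print(looks_like_innovation_workspace(board))  # True
```

The exact cutoffs matter less than the shape: several imperfect signals, evaluated together, with room to swap them out as experiments teach you which ones actually indicate transformation.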

## From Simple Board to Innovation Hub: Building Analytics That Grow With Your Product

### Choosing Your Heartbeat Entity: Why the Board Beats the Asset

When building analytics for product transformation, your first crucial decision is choosing your heartbeat entity. In Miro&apos;s case, we face an interesting choice: should it be the asset (like sticky notes, shapes, or documents) or the board itself?

&quot;If we would take this into account for Miro, it would be the asset because we are constantly adding assets along the way. But the asset, I think, is too weak to be the heartbeat,&quot; as I explained in the video. &quot;The real value for me just comes when someone shares a board or presents a board.&quot;

Think about it like this:

**Value Generation:**

-   Assets alone don&apos;t tell us if someone got value
-   A board being shared or presented shows clear value delivery
-   Success comes from collaboration, not just creation

**Scale Consideration:**

-   Assets are too granular - you might have hundreds per board
-   Boards represent complete units of work
-   Easier to connect boards to business outcomes

Most importantly, the board is where transformation happens. A single sticky note doesn&apos;t evolve into an innovation workspace - but a board can grow from simple brainstorming to becoming a team&apos;s central collaboration hub.

![](/images/posts/measuring-product-evolution-through-core-features-miro-analytics-setup/image-11.png)

This is why your heartbeat entity choice should align with where value actually happens in your product. In Miro&apos;s case, the board isn&apos;t just a container for assets - it&apos;s the space where collaboration and innovation come to life.

### Building Your Property Foundation

Once you&apos;ve established your heartbeat entity, the next step is designing properties that can capture transformation without creating analytical chaos. The key is finding the right balance between detail and usability.

&quot;Don&apos;t do properties with too many variations. These are usually not really useful,&quot; I emphasized in the video. &quot;You should always have properties without so many variations. This is important.&quot;

Properties work well as lightweight sensors: they gather data that can unlock new insights and surface signals of the product transformation.

Here&apos;s how to approach it:

**Smart Property Design:**

-   Use clustering instead of raw numbers (e.g., &quot;50-100 assets&quot; vs exact counts)
-   Create clear categories that won&apos;t explode with variations
-   Focus on properties that indicate workspace evolution

For a Miro board, **key properties for the transformation** might include:

-   Number of active contributors
-   Types of advanced assets being used (documents, tasks, etc.)
-   Collaboration patterns (like cross-team involvement)
-   Activity density (how many edits in recent periods)

The trick is avoiding the temptation to track everything. Instead of creating a property for every possible board characteristic, focus on those that actually indicate transformation. For instance, knowing a board has exactly 247 assets isn&apos;t as valuable as knowing it has a healthy mix of documents, tasks, and collaborative elements.
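To make the clustering idea concrete, here is a minimal sketch of how raw asset counts could be bucketed into low-cardinality property values before sending an event. The bucket edges and the function name are my own illustration, not something from the tracking plan itself:

```python
def asset_count_bucket(count: int) -> str:
    """Cluster a raw asset count into a low-cardinality property value.

    The bucket edges here are illustrative; pick ranges that match
    your product's actual distribution.
    """
    if count <= 10:
        return "1-10"
    if count <= 50:
        return "11-50"
    if count <= 100:
        return "51-100"
    return "100+"

# Instead of sending asset_count=247 on the event, send the bucket:
print(asset_count_bucket(247))  # → "100+"
```

The payoff shows up at analysis time: a breakdown by four buckets is readable at a glance, while a breakdown by exact counts is not.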

Remember: Properties should make analysis easier, not harder. If you find yourself struggling to make sense of a property in your analysis, it might be too complex or not actually meaningful for measuring transformation.

### Smart Extension Through Experimentation

When measuring product transformation, you need a systematic way to extend your analytics that maintains clarity while discovering new insights. This is where smart experimentation with tracking comes in.

&quot;It&apos;s not really like a structured experiment,&quot; I explained in the video. &quot;It&apos;s more like you come up with specific properties to derive specific proxy metrics. Then you see if they stick and work, if they actually create something meaningful.&quot;

The heartbeat event is a perfect example of smart extension:

-   Send an event every 10-30 seconds while a board is active
-   Track duration through a simple property (heartbeat\_in\_s)
-   Calculate a simple metric: avg. board view time
-   Keep the implementation light but the insights powerful
-   Use it to understand true engagement patterns
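The aggregation behind the heartbeat approach can be sketched in a few lines. The event shape below (a dict with `board_id` and a `heartbeat_in_s` property) is an assumption for illustration, mirroring the property named above:

```python
from collections import defaultdict

def avg_board_view_time(events: list[dict]) -> float:
    """Sum heartbeat seconds per board, then average across boards.

    Each event is assumed to look like
    {"board_id": "...", "heartbeat_in_s": 15} — a hypothetical shape
    matching the heartbeat_in_s property described above.
    """
    seconds_per_board = defaultdict(int)
    for event in events:
        seconds_per_board[event["board_id"]] += event["heartbeat_in_s"]
    if not seconds_per_board:
        return 0.0
    return sum(seconds_per_board.values()) / len(seconds_per_board)

events = [
    {"board_id": "b1", "heartbeat_in_s": 15},
    {"board_id": "b1", "heartbeat_in_s": 15},
    {"board_id": "b2", "heartbeat_in_s": 30},
]
print(avg_board_view_time(events))  # → 30.0
```

Because each heartbeat carries its own duration, the metric stays correct even if you later change the 10-30 second interval.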

Your experimentation approach should follow these principles:

Start Broad, Then Focus:

-   Begin with wider property tracking to discover patterns (don&apos;t do this for entities and activities)
-   Keep what proves meaningful, remove what doesn&apos;t
-   Let real usage guide your analytics evolution

Combine Multiple Signals:

-   Mix heartbeat data with activity patterns
-   Look at both individual and team behaviors
-   Connect quantitative data with qualitative insights from user interviews

The goal isn&apos;t to track everything perfectly from day one. Instead, you&apos;re building a flexible system that can evolve as you learn more about how users are transforming their use of your product. Some experiments will fail - that&apos;s fine. The key is maintaining a clear structure while remaining open to discovering new signals of transformation.

You can check out the complete design on the Miro Board:

![](/images/posts/measuring-product-evolution-through-core-features-miro-analytics-setup/Check-out-the-Miro-Board.png)

If you want the nerdy version, you can check out the JSON schemas here:

![](/images/posts/measuring-product-evolution-through-core-features-miro-analytics-setup/Check-out-the-Repository---Notion.png)

## Beyond Metrics: Creating Analytics That Support Product Vision

### Balancing Discovery and Definition

When measuring product transformation, you need to balance two distinct phases: initial discovery of what matters, and defining clear metrics once you understand the patterns. This isn&apos;t a linear process - it&apos;s an evolution that requires patience and flexibility.

&quot;We need to combine quantitative data and qualitative data,&quot; I emphasized in the video. &quot;To really understand how a use case or jobs-to-be-done works, I always come up with clear metrics. You can always start out with exploring, but at some point, you have to come up with clear metrics.&quot;

**The Discovery Phase:**

-   Implement broader tracking through properties (not entities or activities)
-   Gather data without strict definitions
-   Stay open to unexpected patterns
-   Work closely with product teams to understand user behavior

**The Definition Phase:**

-   Move from noise to clear signals
-   Define concrete success metrics
-   Remove or refine properties that weren&apos;t useful
-   Create focused measurements of transformation

The key is knowing when to transition between these phases. Start broad enough to catch important signals, but don&apos;t stay in discovery mode forever. Once you see clear patterns emerging - like specific ways teams use boards as innovation workspaces - it&apos;s time to define more concrete metrics.

Remember: Good analytics start with curiosity but end with clarity (after that you can move to curiosity again). Your goal is to move from &quot;what might indicate transformation?&quot; to &quot;these are our clear signals of success.&quot;

### Bridging Analytics and Product Strategy

Analytics isn&apos;t just about tracking what happens - it&apos;s about supporting your product&apos;s strategic evolution. For Miro, this means helping product teams understand if their vision of the innovation workspace is becoming reality.

&quot;Therefore, we need to track if there are actually some indications from the event data that people pick this up,&quot; I noted in the video. &quot;This is why I talk so much about proxies, how we can actually measure this.&quot;

A successful bridge between analytics and strategy requires:

**Working With Product Teams:**

-   Listen to user interviews for patterns
-   Understand the qualitative signs of transformation
-   Identify behaviors that could be measured quantitatively
-   Create metrics that reflect strategic goals

**Looking for Strategic Signals:**

-   Track transition from basic whiteboard to workspace hub
-   Measure depth of team collaboration
-   Monitor evolution of board usage patterns
-   Identify signs of broader workspace adoption

Your analytics should tell product teams not just what users are doing, but whether the product is evolving as intended. For instance, increased view time combined with diverse asset types might indicate a board has become a true workspace rather than just a whiteboard.

The goal isn&apos;t to prove your strategy is working, but to provide clear signals about how users are actually adopting your product&apos;s evolution. Sometimes these signals will support your plan, sometimes they&apos;ll suggest new directions - both are valuable insights.

### Maintaining Clarity as You Scale

The hardest part of measuring product transformation isn&apos;t setting up the initial tracking - it&apos;s maintaining clarity as your analytics grow and evolve. You need a system that can scale without becoming overwhelming.

&quot;You have to clean up properties over time,&quot; I explained in the video. &quot;If we feel that some properties are not really useful in analysis, we have to get rid of them or redo them.&quot;

Keep your analytics clear with these principles:

Regular Maintenance:

-   Review and remove unused properties
-   Consolidate similar metrics
-   Update property structures as needs change
-   Keep documentation current

Smart Property Management:

-   Use clustering instead of endless variations
-   Keep property values manageable
-   Focus on meaningful breakdowns
-   Avoid redundant tracking

Most importantly, remember that not every product change needs new tracking. When Miro adds a new export format or asset type, that might just be a new property value rather than a whole new event to track.

Think of it like gardening - regular pruning keeps things healthy and manageable. Don&apos;t be afraid to remove tracking that isn&apos;t providing value. It&apos;s better to have fewer, clearer metrics that everyone understands than a complex system that nobody uses effectively.

The goal is sustainable analytics that grow with your product. By maintaining clarity and focus, you ensure your analytics continue to provide valuable insights about your product&apos;s transformation, even as the product itself evolves.

Measuring product transformation isn&apos;t about tracking more - it&apos;s about tracking smarter. When your core feature evolves, like Miro&apos;s board expanding from simple whiteboard to innovation workspace, you need analytics that can grow and adapt with your product. By focusing on meaningful signals, building flexible but maintainable tracking systems, and connecting analytics to product vision, you create insights that truly matter.

The goal isn&apos;t to track every possible interaction, but to understand how your product is transforming in the hands of your users. After all, the most powerful analytics aren&apos;t the ones with the most events or properties - they&apos;re the ones that help you understand and support your product&apos;s evolution.

This was part 8 in our series &quot;One tracking plan a day&quot; Season 1 - startup tools. Make sure you visit all other parts of the series:

-   [Notion](https://hipster-data-show.ghost.io/building-notions-analytics-foundation-a-product-first-tracking-plan/) - 27.01.25
-   [Slack](https://hipster-data-show.ghost.io/making-smart-tradeoffs-in-analytics-a-slack-tracking-plan-journey/) - 28.01.25
-   [Superhuman](https://hipster-data-show.ghost.io/build-a-tracking-plan-around-one-core-feature-inbox-zero-in-superhuman/) - 29.01.25
-   [Vimcal](https://hipster-data-show.ghost.io/create-a-tracking-plan-for-a-tool-that-you-want-to-use-as-short-as-possible-vimcal-tracking-plan/) - 30.01.25
-   [Asana](https://hipster-data-show.ghost.io/combining-your-product-strategy-with-your-analytics-implementation-tracking-plan-for-asana/) - 31.01.25
-   [Canva](https://hipster-data-show.ghost.io/designing-analytics-for-creative-tools-finding-clarity-in-complex-user-journeys-canva-tracking-plan/) - 03.02.25
-   [Loom](https://hipster-data-show.ghost.io/measuring-ai-productivity-features-analytics-setup-for-loom/) - 04.02.25
-   Grammarly - 06.02.25
-   Replit - 07.02.25
-   Hubspot - 10.02.25
-   Stripe - 11.02.25
-   Zoom - 12.02.25
-   Ghost - 13.02.25
-   Amplitude - 17.02.25
-   GSheets - 18.02.25
-   Lightdash - 19.02.25
-   Claude - 20.02.25
-   Reconfigured - 21.02.25

If you&apos;d like to generate your own tracking plans using my book with Claude AI, get your copy here:

This work is based on the chapters about event data design in my book [**Analytics Implementation Workbook**](https://hipster-data-show.ghost.io/the-analytics-implementation-workbook/). There, you can read more details about the D3L framework.

![](/images/posts/measuring-product-evolution-through-core-features-miro-analytics-setup/Check-out-the-Book.png)</content:encoded></item><item><title>Measure your product strategy in analytics setups</title><link>https://timo.space/blog/measure-your-product-strategy-a-new-learning-for-analytics-setups/</link><guid isPermaLink="true">https://timo.space/blog/measure-your-product-strategy-a-new-learning-for-analytics-setups/</guid><description>I decided to take on an interesting challenge: creating one tracking plan every day for a week, with plans to continue for three more weeks.</description><pubDate>Mon, 03 Feb 2025 00:00:00 GMT</pubDate><content:encoded>## An interesting discovery

I decided to take on an interesting challenge: creating one tracking plan every day for a week, with plans to continue for three more weeks. The goal was to create 20 tracking plans for common startup tools. I chose these tools because people are generally familiar with them, and they cover different product directions. This means if your product overlaps with any of these tools (like Notion), you can find useful inspiration.

One of my biggest worries was that it would get monotonous quickly. I didn&apos;t do extensive prep work - while I had a list of tools I wanted to cover, I didn&apos;t map out specific approaches for each one like &quot;for Notion I&apos;ll do X, for Slack I&apos;ll do Y.&quot; Instead, my process was simpler: each morning I&apos;d set up my recording, open the tool&apos;s website, and think through the core use cases I wanted to focus on. Then I&apos;d create a first draft using Claude.ai and my book as context, and iterate from there.

I chose this approach because I wanted to give real insight into my thought process and design decisions when building analytics setups. Sure, I could show you hundreds of examples of initial analytics setups - which events to track, how to structure properties - but that would miss the interesting part: the decision-making process. We&apos;re talking about design here, not math. There&apos;s rarely a clear right or wrong answer; it&apos;s about balancing tradeoffs and making thoughtful choices.

After recording the first two (Notion and Slack), I worried I might end up repeating myself. With both tools, I focused heavily on core features. This worked well for Notion since it has so many different directions - focusing on core features made it approachable. I took a similar approach with Slack.

But something interesting happened when I got to Superhuman. Since it&apos;s just an email tool, its core functionality is quite focused. So I started thinking differently: what if we built analytics around Superhuman&apos;s original core promise of achieving &quot;inbox zero&quot;? While Superhuman has evolved since then, using this historical angle created an interesting perspective.

This led me to develop a two-bucket approach to analytics setups:

1.  Core Functionality: Cover how people actually use the product
2.  Strategic Direction: Track metrics that reflect the product&apos;s strategic goals and unique value proposition

![](/images/posts/measure-your-product-strategy-a-new-learning-for-analytics-setups/image-9.png)

If product strategy is crucial (which it is), it should be reflected in your data. You need the right metrics to show whether your strategy is working.

I continued evolving this approach through my final video of the week on Asana. I examined their current positioning beyond just task/project management and built an analytics setup that covered both core functionality and strategic direction.

This was a valuable discovery for me. While I&apos;d sometimes done this implicitly before, making it explicit improved my approach significantly. I plan to keep using this method and wanted to share it because I think it could be valuable for others working on analytics setups too.

## Core is great, but ...

But what do we really mean by &quot;core features&quot;? Let me use Notion as an example, since it&apos;s what inspired this whole journey.

At first glance, Notion seems simple - just a page where you write stuff. But dig deeper and you&apos;ll find it&apos;s a massive tool. You can add databases, tables, charts, link pages together to create content hubs, even build entire websites. The possibilities are endless.

If you tried to track everything, you&apos;d end up with an overwhelming tracking plan - maybe 60-80 events spread across 10-15 different entities. And when everything sits side by side with equal weight, it becomes hard to see what really matters.

One of my biggest breakthroughs in working with event data was developing proper clustering. That&apos;s why I introduced the concept of product layer, customer layer, and interaction layer as the double three-layer framework.

![](/images/posts/measure-your-product-strategy-a-new-learning-for-analytics-setups/image-15.png)

This hierarchy isn&apos;t just organizational - it guides you on how to use each layer. It makes the whole system more approachable and practical.

When you&apos;re faced with something like Notion&apos;s 20 different entities all competing for attention, it becomes really hard to manage. My approach? Take a few steps back and look at the bigger picture. What are the absolute essentials?

For Notion, it boils down to just three things:

-   Workspaces
-   Documents within workspaces
-   Accounts

These are the fundamental building blocks. And here&apos;s the key insight: **the document is Notion&apos;s heartbeat**. As long as people are creating and expanding documents, the product is healthy and being used. If you had to pick just one thing to track in Notion, that would be it.

![](/images/posts/measure-your-product-strategy-a-new-learning-for-analytics-setups/image-10.png)

This concept of finding your product&apos;s &quot;heartbeat&quot; - which emerged over the week - is really powerful. Every product has one:

-   Notion: documents
-   Slack: messages
-   Superhuman: emails
-   Vimcal: events
-   Asana: tasks

As long as these keep flowing, your product is alive and well.

Sure, you can (and should) look at variations and patterns for deeper insights. You can add different angles and metrics. But when you&apos;re starting out with analytics, especially if you&apos;re feeling overwhelmed, start with identifying your product&apos;s heartbeat. Get that foundation solid, then layer your product strategy on top of it.

## The product strategy

Once you&apos;ve defined the core, you can explore product strategy. Let&apos;s take Superhuman as an example. Their core foundation is simple: emails, accounts, and email accounts. If you just want to track basic usage, that&apos;s all you need.

But the interesting part is understanding if you&apos;re delivering your unique value proposition. Superhuman did something bold - they launched a subscription-based email client. When they started, this was practically unheard of. They were essentially saying, &quot;We add so much value to your workday that you&apos;ll happily pay $30 monthly.&quot; That&apos;s a strong claim that requires delivering real value.

Their early strategy (which has evolved since) was laser-focused: &quot;We&apos;ll build the best email client for achieving inbox zero.&quot; Everything was optimized around this goal. I actually went through their onboarding process back then - it was mandatory and personal, which I usually hate, but it made sense because they needed to demonstrate their unique approach to how to get to inbox zero fast.

They built specific features to support this strategy:

-   Heavy emphasis on keyboard shortcuts (perfect for power users like me)
-   Reimagined email triage - instead of the traditional three-column layout, they simplified it to just a list view for quick decision-making
-   Smart features like email snoozing - letting you temporarily remove emails from your inbox without losing track of them

When designing the analytics setup for Superhuman, I made sure to track these strategic features. Beyond basic email activities, I added specific tracking for things like &quot;inbox zero achieved&quot; - a single activity that measures strategic success. This lets you analyze fascinating patterns:

-   How many inbox zero streaks do users maintain?
-   How quickly do they reach inbox zero each day?
-   How often do they achieve it?
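The streak question is easy to answer once the &quot;inbox zero achieved&quot; activity exists. Here is a minimal sketch, assuming the input is simply the set of dates on which the event fired (a hypothetical shape for illustration):

```python
from datetime import date, timedelta

def longest_inbox_zero_streak(days_achieved: set[date]) -> int:
    """Length of the longest run of consecutive days on which the
    'inbox zero achieved' event fired."""
    best = 0
    for day in days_achieved:
        # Only start counting at the beginning of a streak.
        if day - timedelta(days=1) in days_achieved:
            continue
        length = 1
        while day + timedelta(days=length) in days_achieved:
            length += 1
        best = max(best, length)
    return best

achieved = {date(2025, 1, 6), date(2025, 1, 7), date(2025, 1, 8), date(2025, 1, 10)}
print(longest_inbox_zero_streak(achieved))  # → 3
```

The point is that a single well-chosen strategic event unlocks several metrics (streaks, time-to-zero, frequency) without any additional tracking.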

I applied this same strategic thinking to other products too. Vimcal was trickier - they position themselves as &quot;the calendar for people with lots of meetings.&quot; I translated this into analytics by focusing on efficiency features like their booking link system, which eliminates manual calendar entry overhead.

For Asana, beyond basic task management, they emphasize connecting work to goals and automating workflows. So I built these strategic elements - goals and workflows - directly into the analytics structure to ensure we&apos;re measuring what truly matters for their strategy.

## How to include it in your analytics setup

So how can you apply this approach to your own analytics? There are two key steps.

First, simplify by identifying your product&apos;s core essentials:

-   Find your product&apos;s heartbeat - that one critical entity that, if removed, would break everything. Not just damage the product, but fundamentally break it
-   Identify the adjacent entities - usually things like accounts/users, or structural elements like workspaces that contain and organize your heartbeat entity
-   Focus on the truly fundamental building blocks of your product

I use two simple tests for validating core entities:

1.  If you remove it, does the product still function?
2.  Does it have enough meaningful activities associated with it?

Strategic entities often fail the first test. Take Asana&apos;s goals feature - remove it and Asana still works as a task management tool. Or look at search in Superhuman - it only has one activity, but it&apos;s strategically crucial because of their inbox zero approach. When you&apos;re not organizing emails into folders, search becomes essential.

Once you&apos;ve nailed down your core, take the second step: sit down with your product and product management teams. Have them articulate what makes your product different from competitors in your category. These conversations typically reveal one or two major strategic directions that you should build into your analytics setup.

## Looking at the tracking plans for the last week

If you want to see this approach in action, I&apos;ve put together a comprehensive set of resources from last week&apos;s work. I created five tracking plans, each with its own video and accompanying blog post.

The blog posts aren&apos;t just written versions of the videos - they go deeper, exploring four key themes that came up in each video. I wanted to use the written format to really dig into concepts that deserved more attention.

For those interested in the technical details:

-   Check out the Miro Board for complete tracking plans for all tools
-   Visit the GitHub repository to see JSON schemas for each tracking setup

### Notion - core, core, core

If you want to see a perfect example of focusing on core entities, check out the Notion post. The video and blog post walk through creating a lean, focused tracking setup with just the essential events. It&apos;s a great reference for how to build a solid analytics foundation without overcomplicating things.

[Building Notion’s Analytics Foundation: A Product-First Tracking Plan](https://hipster-data-show.ghost.io/building-notions-analytics-foundation-a-product-first-tracking-plan/) — In this content series - season 1, I create a tracking plan for a typical start-up tool every day for four weeks (I take a break on the weekend), so 20 in total. This is the first one for the omnipresent Notion tool. You can apply the principles in this

### Slack - core and no notifications

The Slack episode continues the core-focused approach, but it raises an interesting question about what qualifies as &quot;core.&quot; You can watch my thought process as I wrestle with whether to include notifications in the tracking plan.

Notifications are a fascinating case study. When Slack launched, they were arguably a strategic differentiator. While their strategic importance may have diminished over time, they were once a key part of Slack&apos;s value proposition. Despite this, I ultimately decided to exclude them from the core tracking plan.

Check out the video to follow my back-and-forth reasoning and see why I ultimately left notifications out of the core feature set.

[Making Smart Tradeoffs in Analytics: A Slack Tracking Plan Journey](https://hipster-data-show.ghost.io/making-smart-tradeoffs-in-analytics-a-slack-tracking-plan-journey/) — In this content series - season 1, I create a tracking plan for a typical start-up tool every day for four weeks (I take a break on the weekend), so 20 in total. This is the second one: Slack. Here is the season overview: Every click, every interaction, every moment

### Superhuman - the early strategy

The Superhuman episode marks a turning point - where I start going beyond core features. This shift happened partly because Superhuman&apos;s core functionality is actually quite simple and, honestly, a bit boring on its own.

Watch this episode to see two things:

1.  How to build an extremely lean core tracking structure
2.  How I expand beyond that to incorporate Superhuman&apos;s &quot;Inbox Zero&quot; strategy into the analytics

It&apos;s a great example of how to create metrics that align with and support your product&apos;s strategic goals.

[Build a tracking plan around one core feature - Inbox Zero in Superhuman](https://hipster-data-show.ghost.io/build-a-tracking-plan-around-one-core-feature-inbox-zero-in-superhuman/) — In this content series - season 1, I create a tracking plan for a typical start-up tool every day for four weeks (I take a break on the weekend), so 20 in total. This is the second one: Superhuman. Here is the season overview: Imagine spending months meticulously tracking every

### Vimcal - the tricky strategy

Vimcal presents an interesting parallel to Superhuman. Like Superhuman, its core functionality is relatively straightforward. But translating its strategic value into analytics was more challenging.

With Superhuman, we had &quot;inbox zero&quot; - an established productivity concept they could build around. But calendar management doesn&apos;t have such a clear philosophy. In fact, conventional wisdom often suggests that having too many meetings is a problem to solve.

Vimcal took the opposite approach. They specifically target people who have lots of meetings and need help managing them effectively. But you can&apos;t just track &quot;number of meetings&quot; as a success metric - that would be counterproductive and miss the real value.

The key insight (at least from my perspective - Vimcal might see it differently) was focusing on reducing meeting management overhead. When you have lots of meetings, efficiency becomes crucial. This became the foundation for my analytics setup - measuring how effectively users can manage their busy meeting schedules.

[Create a tracking plan for a tool that you want to use as short as possible - Vimcal tracking plan](https://hipster-data-show.ghost.io/create-a-tracking-plan-for-a-tool-that-you-want-to-use-as-short-as-possible-vimcal-tracking-plan/) — You’ve probably been there: staring at your analytics dashboard, drowning in a sea of click events, page views, and user actions, yet somehow still unable to answer the simple question “are people actually using our product?” It’s a common trap in product analytics - tracking everything but understanding nothing. This

### Asana - Going beyond core

With Asana, I really leaned into blending core and strategic entities. After checking their website, I identified goals and workflows as key strategic elements.

I approached this in two ways:

1.  Made goals and workflows full entities in the tracking plan
2.  Added strategic properties across other entities to measure their impact

For example, I built in ways to track workflow adoption across projects and organizations. In the video, I discuss practical applications of this data - like calculating an &quot;automation score&quot; for teams and projects. You could measure what percentage of tasks come from workflows versus manual creation, then give teams actionable feedback: &quot;Your automation score is 20%. Here&apos;s how you could get to 50%.&quot; This kind of metric helps teams both improve efficiency and maintain quality.
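The automation score described above boils down to a simple ratio. The `created_via` property and its values are my own assumption for the sake of the sketch, not Asana&apos;s actual schema:

```python
def automation_score(tasks: list[dict]) -> float:
    """Percentage of tasks created by workflows rather than manually.

    Assumes each task carries a created_via property with values like
    "workflow" or "manual" — a hypothetical schema for illustration.
    """
    if not tasks:
        return 0.0
    automated = sum(1 for t in tasks if t.get("created_via") == "workflow")
    return round(100 * automated / len(tasks), 1)

tasks = [{"created_via": "workflow"}] * 2 + [{"created_via": "manual"}] * 8
print(automation_score(tasks))  # → 20.0
```

Computed per team or per project, this one number is enough to power the &quot;you&apos;re at 20%, here&apos;s how to get to 50%&quot; feedback loop described above.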

It turned out to be one of the most interesting episodes, showing how strategic metrics can create practical value for users.

[Combining your Product Strategy with your Analytics Implementation - Tracking Plan for Asana](https://hipster-data-show.ghost.io/combining-your-product-strategy-with-your-analytics-implementation-tracking-plan-for-asana/) — Analytics can feel deceptively simple when you start with a product like Asana. Track some tasks, measure completion rates, count active users - done, right? But as your product evolves beyond its core features, your analytics need to evolve too. The real challenge isn’t tracking basic metrics; it’s building an</content:encoded></item><item><title>Measuring AI productivity features - Analytics setup for Loom</title><link>https://timo.space/blog/measuring-ai-productivity-features-analytics-setup-for-loom/</link><guid isPermaLink="true">https://timo.space/blog/measuring-ai-productivity-features-analytics-setup-for-loom/</guid><description>Imagine spending months perfecting your product&apos;s analytics, meticulously tracking every user action, only to have AI features completely upend your measurement strategy.</description><pubDate>Mon, 03 Feb 2025 00:00:00 GMT</pubDate><content:encoded>Imagine spending months perfecting your product&apos;s analytics, meticulously tracking every user action, only to have AI features completely upend your measurement strategy.

That&apos;s the challenge many product teams face today. When Loom added AI-generated titles, summaries, and chapters to their video platform, they didn&apos;t just add new features - they fundamentally changed how users interact with their product. How do you measure the success of actions that no longer happen? How do you track productivity gains from tasks users never needed to perform? Traditional analytics approaches fall short when measuring AI features, but there&apos;s a way forward. Using Loom&apos;s journey as our guide, let&apos;s explore how to adapt our analytics strategy for the age of AI-enhanced products.

This post is part of our series exploring analytics implementations for different types of products. While previous posts focused on measuring core promises and strategic features, today we tackle a unique challenge: how to track features that often leave no trace in our traditional analytics systems.

Our journey through Loom&apos;s implementation will reveal practical approaches for measuring AI&apos;s impact on your product.

In this content series - season 1, I create a tracking plan for a typical start-up tool every day for four weeks (I take a break on the weekend), so 20 in total. This is the 7th one: **Loom**. Here is the season overview:

![](/images/posts/measuring-ai-productivity-features-analytics-setup-for-loom/a-Tracking-Plan-a-Day---loom.png)

## The AI Analytics Paradox

### The Disappearing Action Problem

When building analytics for traditional features, we track what users do - clicks, submissions, completions. But AI features often work by eliminating actions rather than adding them. This creates a fascinating paradox: how do you measure the success of something not happening?

Take Loom&apos;s auto-title feature. Previously, users would:

-   Finish recording their video
-   Think about an appropriate title
-   Type it in manually
-   Maybe revise it a few times

Now with AI, the title appears automatically. As I noted in our discussion: &quot;The tricky thing is that AI features are often just an extension of an existing feature... Before, you had to type in the headline yourself. This is a micro action. It&apos;s not really something significant happening here.&quot;

This pattern repeats across AI features:

-   Auto-generated chapters replace manual timestamp creation
-   Smart summaries eliminate note-taking
-   Suggested responses reduce typing time

The challenge isn&apos;t just technical - it&apos;s conceptual. Traditional analytics might track &quot;title edited&quot; as a sign of engagement. But with AI features, fewer edits could actually signal success. We need to flip our thinking from measuring actions to measuring outcomes.

This shifts our focus to new types of metrics:

-   Acceptance rates of AI suggestions
-   Time saved between recording and sharing
-   Usage patterns of AI-generated content

The key is understanding that less interaction can mean more value. When measuring AI features, we&apos;re often tracking what didn&apos;t need to happen rather than what did. This requires a fundamental rethink of how we approach product analytics.
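An acceptance-rate metric makes this flip from actions to outcomes concrete: an unedited AI title is a success signal, not a missing event. The boolean flags below are hypothetical property names I chose for illustration:

```python
def ai_title_acceptance_rate(videos: list[dict]) -> float:
    """Share of AI-generated titles that were kept without edits.

    Assumes each record carries two boolean flags,
    ai_title_generated and title_edited — hypothetical property
    names for illustration.
    """
    generated = [v for v in videos if v.get("ai_title_generated")]
    if not generated:
        return 0.0
    accepted = sum(1 for v in generated if not v.get("title_edited"))
    return round(accepted / len(generated), 2)

videos = [
    {"ai_title_generated": True, "title_edited": False},
    {"ai_title_generated": True, "title_edited": True},
    {"ai_title_generated": True, "title_edited": False},
    {"ai_title_generated": False, "title_edited": True},
]
print(ai_title_acceptance_rate(videos))  # → 0.67
```

Note the inversion: in a traditional setup, `title_edited` would count as engagement; here its absence is the success case.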

This disappearing action pattern will only become more common as AI features proliferate across products. The analytics challenge isn&apos;t just adapting our tools - it&apos;s adapting our entire mental model of what constitutes successful product usage.



### The Context Shift Challenge

When AI features take over tasks, they don&apos;t just eliminate actions - they fundamentally shift where value happens in the product. Instead of focusing on users&apos; interactions, we need to measure what they receive.

Consider Loom&apos;s AI-generated summaries. The value isn&apos;t in the generation process (a single click) but in the quality and usefulness of the summary itself. This creates a new analytics challenge: how do we measure quality at scale?

Traditional metrics fall short here:  
• Click events don&apos;t capture content quality  
• Usage counts miss the effectiveness dimension  
• Time-based metrics might misinterpret quick AI generations as less valuable

&quot;From an event perspective, it&apos;s pretty boring. It&apos;s just like &apos;message sent&apos; in the case of chats. There&apos;s not really so much happening because the interesting stuff is happening in the context. And I think this makes it so tricky with tracking AI features.&quot;

To address this shift, we need new measurement approaches:

-   Content classification systems
-   Quality indicators through user behaviors
-   Downstream usage metrics

For example, with Loom&apos;s summaries we might track:

-   Whether users share videos with AI summaries more often
-   If viewers spend more time watching videos with AI summaries
-   How often summaries are used in other tools (like documents or tickets)

The key insight is that AI features force us to think beyond the &quot;when&quot; and &quot;how&quot; of user actions to measure the &quot;what&quot; and &quot;why&quot; of generated content. This requires expanding our analytics toolkit to include content analysis alongside traditional event tracking.

This context shift isn&apos;t just a technical challenge - it&apos;s about understanding that value now often lies in the output rather than the process.

This work is based on the chapters about event data design in my book [**Analytics Implementation Workbook**](https://hipster-data-show.ghost.io/the-analytics-implementation-workbook/). There, you can read more details about the D3L framework.

![](/images/posts/measuring-ai-productivity-features-analytics-setup-for-loom/Check-out-the-Book.png)

### The Productivity Measurement Gap

Measuring productivity gains from AI features presents a unique challenge: we can&apos;t measure what didn&apos;t happen. When a Loom user avoids three meetings by sending quick video explanations instead, how do we quantify that time saved?

&quot;I think there&apos;s one use case where it&apos;s pretty obvious. When a client has to set up something for you... I can just send off the links saying &apos;Hey, I need these kinds of permissions from you. Here are five Loom videos.&apos; This saves me significant time.&quot;

The challenge comes in two forms:

-   Direct time savings (fewer meetings, faster content creation)
-   Expanded capabilities (doing more with the same time)

Traditional metrics struggle here because:  
• We can&apos;t track meetings that never happened  
• Time saved varies by user and context  
• Productivity gains often compound over time

Instead of direct measurement, we need to look for proxy indicators:

-   Video reuse rates (same content serving multiple purposes)
-   Viewer engagement across multiple watches
-   Time between recording and sharing with AI features vs without

One concrete approach is focusing on outcomes rather than time saved:

-   Number of people reached per video
-   Engagement rates with AI-enhanced content
-   Subscription conversion rates for AI features

The key is accepting that we can&apos;t perfectly measure productivity gains, but we can track signals that indicate whether users are getting more value from the same effort.

![](/images/posts/measuring-ai-productivity-features-analytics-setup-for-loom/image-7.png)

## The &quot;Make It Visible First&quot; Approach

### Starting with Basic Monitoring

When faced with new AI features, the temptation is to track everything. But with AI&apos;s complexity, sometimes the best approach is starting simple and learning as you go.

&quot;When we don&apos;t know how to measure or how to work with it, then at least monitoring is the best solution to tackle it.&quot;

For Loom&apos;s AI features, this means starting with a single high-level event:

```
AI_productivity_used
  type: &apos;auto_title&apos; | &apos;summary&apos; | &apos;chapters&apos;
  status: &apos;accepted&apos; | &apos;modified&apos; | &apos;rejected&apos;
```

![](/images/posts/measuring-ai-productivity-features-analytics-setup-for-loom/image-8.png)

This basic structure gives us:

-   Which AI features users try
-   How often they&apos;re used
-   Initial acceptance rates - to be fair, these can be tricky to detect: if a suggestion is edited, that&apos;s easy to see, but if it is rejected, i.e. replaced with something entirely different, it&apos;s harder.

Keep the implementation lightweight:  
• One event type instead of multiple  
• Essential properties only  
• Focus on usage patterns rather than detailed interactions

The goal isn&apos;t perfect measurement yet - it&apos;s understanding patterns. Are users gravitating toward certain AI features? Do they consistently accept AI suggestions? Which features see repeated use versus one-time trials?

This approach allows us to:

-   Gather initial usage data quickly
-   Identify which features deserve deeper tracking
-   Adjust our analytics strategy based on real usage patterns

Think of it as laying a foundation. We can always add more sophisticated tracking later, but first we need to understand the basics of how users interact with AI features.

### Properties Over Events


When tracking AI features, properties provide flexibility that standalone events can&apos;t match. Instead of creating new events for every AI interaction, we can use properties to add rich context to existing product events.

In Loom&apos;s case, when a recording is shared, we can attach AI-related properties:

```
recording_shared
  ai_title_used: &apos;original&apos; | &apos;edited&apos; | &apos;not_used&apos;
  ai_summary_used: boolean
  ai_chapters_used: boolean
```

This approach offers several advantages:

-   Keeps the event model clean and manageable
-   Makes it easier to analyze AI feature impact on core actions
-   Allows for flexible evolution as AI features change

&quot;Properties usually are not so expensive to implement... this is the first good test ground. And I might remove properties again because I really have the feeling that they don&apos;t provide value.&quot;

We can extend properties to capture more nuanced information:

-   Quality indicators for AI-generated content
-   User modification patterns
-   Content classification results

The key is starting with basic properties and expanding based on actual analysis needs. For instance, if we notice users frequently modifying AI-generated titles, we might add properties to track modification patterns.

Remember: It&apos;s easier to add or remove properties than to restructure your entire event model. This flexibility is particularly valuable with rapidly evolving AI features.
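Once these properties flow with `recording_shared`, even a trivial breakdown starts answering questions. A hypothetical sketch, with made-up values:

```python
# Hypothetical sketch: AI-title adoption from recording_shared events.
# Property names mirror the sketch above; the data is illustrative.
shares = [
    {"ai_title_used": "original", "ai_summary_used": True},
    {"ai_title_used": "edited", "ai_summary_used": False},
    {"ai_title_used": "original", "ai_summary_used": True},
    {"ai_title_used": "not_used", "ai_summary_used": False},
]

kept = sum(1 for s in shares if s["ai_title_used"] == "original")
print(f"AI title kept unedited on {kept / len(shares):.0%} of shares")  # 50%
```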

### Evolving Based on Patterns

Analytics for AI features should evolve based on actual usage patterns, not assumptions. The initial monitoring phase reveals which features deserve deeper tracking and possibly their own entity status.

&quot;I would implement this AI productivity use because there I would monitor if there is a significant uptake in usage of documents and issues... Then I can just count and see if a lot more people use this.&quot;

Key signals to watch for:

-   High adoption rates among power users
-   Consistent usage patterns over time
-   Strong correlation with subscription retention
-   Feature becoming central to user workflows

For example, if Loom&apos;s document generation feature shows high engagement:

```
AI_productivity_used
  type: &apos;document&apos;
  -&gt; frequency: increasing
  -&gt; user_segment: &apos;power_users&apos;
```

This might signal it&apos;s time to promote &apos;document&apos; to its own entity with dedicated tracking.

Evolution triggers include:

-   Feature usage exceeding certain thresholds
-   Clear patterns in user segments
-   Strong impact on core metrics
-   Emerging new workflows

Remember: The goal isn&apos;t to track everything, but to identify and deeply measure what matters. Let usage patterns guide your analytics evolution rather than trying to predict what&apos;s important upfront.

You can check out the complete design on the Miro Board:

![](/images/posts/measuring-ai-productivity-features-analytics-setup-for-loom/Check-out-the-Miro-Board.png)

If you want the nerdy version, you can check out the JSON schemas here:

![](/images/posts/measuring-ai-productivity-features-analytics-setup-for-loom/Check-out-the-Repository---Notion.png)

## Finding Signals in the Noise

### The Delta Approach

When measuring AI feature impact, the time between events can tell us more than the events themselves. For Loom, we can analyze the delta between recording completion and sharing to understand if AI features speed up the workflow.

&quot;We could use the delta between both to try to measure if there&apos;s an improvement when AI features are used. It&apos;s a bit tricky because sharing maybe doesn&apos;t happen automatically.&quot;

Key metrics to track:

-   Time from recording to sharing
-   Duration between title generation and acceptance
-   Speed of content reuse

Statistical considerations:  
• Remove outliers (like overnight gaps)  
• Account for natural usage patterns  
• Compare similar content types

The delta approach helps identify:

-   Workflow acceleration patterns
-   AI feature adoption impact
-   Productivity improvements

Make sure to focus on meaningful time gaps - not every delta matters. The key is identifying which time spans actually indicate improved workflows versus normal usage variation.
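The delta computation with outlier removal can be sketched in a few lines. The four-hour cutoff and the timestamps below are illustrative assumptions, not a recommendation:

```python
# Sketch: median recording-to-share delta, dropping overnight outliers.
# The 4-hour cutoff is an illustrative assumption to tune per product.
from datetime import datetime, timedelta
from statistics import median

def share_delta_minutes(pairs, cutoff=timedelta(hours=4)):
    """Median minutes between recording end and first share,
    ignoring gaps longer than `cutoff` (e.g. overnight)."""
    deltas = [
        shared - recorded
        for recorded, shared in pairs
        if timedelta(0) <= (shared - recorded) <= cutoff
    ]
    if not deltas:
        return None
    return median(d.total_seconds() / 60 for d in deltas)

pairs = [
    (datetime(2025, 2, 5, 9, 0), datetime(2025, 2, 5, 9, 6)),
    (datetime(2025, 2, 5, 10, 0), datetime(2025, 2, 5, 10, 2)),
    (datetime(2025, 2, 5, 17, 0), datetime(2025, 2, 6, 8, 0)),  # overnight, dropped
]
print(share_delta_minutes(pairs))  # 4.0
```

Comparing this median for recordings with and without AI features is the actual signal.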

### Quality Through Usage Patterns

Instead of measuring direct quality metrics for AI-generated content, we can look at how the content gets used over time. Usage patterns often reveal more about value than traditional engagement metrics.

&quot;We can calculate how many shares of a video are happening. So in the end, we could have a property, let&apos;s say &apos;recording\_shares,&apos; which could give us an indicator of a heavily shared recording.&quot;

Key usage indicators:

-   Multiple shares of the same content
-   Repeat viewers for videos
-   Content reuse across different contexts
-   Addition to libraries/collections

What to track:

```
recording_shared
  share_count: number
  unique_viewers: number
  reuse_instances: number
  saved_to_library: boolean
```

These patterns help identify high-value content:

-   Videos frequently shared with new team members
-   Documentation content reused across projects
-   Recordings that drive subscription conversions

Creating segments based on usage:  
• High-reuse content creators  
• Viral internal content  
• Training material generators

This approach focuses on actual value delivery rather than just feature usage. When users consistently reuse and share AI-enhanced content, it signals genuine utility rather than novelty adoption.
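Turning those usage properties into a high-value flag could look like this sketch; the thresholds are illustrative and should be tuned against your own distribution:

```python
# Hypothetical sketch: flag high-value recordings from usage properties.
# Thresholds (5 shares, 20 viewers) are illustrative assumptions.
recordings = [
    {"id": "r1", "share_count": 7, "unique_viewers": 40, "saved_to_library": True},
    {"id": "r2", "share_count": 1, "unique_viewers": 3, "saved_to_library": False},
]

def is_high_value(rec, min_shares=5, min_viewers=20):
    return rec["share_count"] >= min_shares or (
        rec["unique_viewers"] >= min_viewers and rec["saved_to_library"]
    )

high_value = [r["id"] for r in recordings if is_high_value(r)]
print(high_value)  # ['r1']
```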

## Beyond Click Tracking: Rethinking Analytics Layers

### Merging Interaction and Product Layers

Traditional analytics separates user interface actions (clicks, views) from product activities (creating, sharing). But AI features are forcing us to rethink this separation as the boundaries between interaction and product layers blur.

&quot;Since AI is happening a lot in context and is producing this context automatically... there&apos;s not so much really happening. So in the end, it could be that with AI features, the interaction and the product layer kind of merge together.&quot;

The traditional layers:

-   Interaction Layer: Click events, UI engagement
-   Product Layer: Core product activities
-   Customer Layer: Journey and success metrics

With AI features, this structure shifts:  
• Less UI interaction to track  
• More automated product activities  
• Blurred lines between user and system actions

Take Loom&apos;s auto-title feature:

```
Traditional Flow:
click → type → edit → save

AI Flow:
system generates → user accepts/modifies
```

This merging means we need to:

-   Focus more on outcomes than actions
-   Track system-initiated activities
-   Use smarter properties instead of click events

The key insight is that with AI features, the &quot;how&quot; becomes less important than the &quot;what.&quot; Instead of tracking interface interactions, we focus on measuring value delivery - regardless of how that value was created.

### Tracking Value Journeys

When AI reduces user interactions, we need to shift focus from tracking actions to mapping value paths. This means understanding how users progress from initial need to achieved value, regardless of the specific steps taken.

&quot;We should focus on how the value journey looks like. What does someone have to input to achieve specific kind of value, and then really figure out how we can discover these inputs and track them.&quot;

Key value journey components for Loom:

-   Initial content creation
-   AI enhancement steps
-   Content distribution
-   Audience engagement
-   Content reuse patterns

Instead of tracking individual actions, focus on value states:

```
Value States:
1. Content Created
2. AI Enhanced
3. Initially Shared
4. Actively Reused
5. Team Adopted
```

This approach helps identify:

-   Common paths to value
-   Adoption patterns
-   Success indicators
-   Areas for optimization

The focus shifts from &quot;what did the user click?&quot; to &quot;did they achieve their goal?&quot; This might mean:  
• Fewer events tracked  
• Richer context properties  
• More emphasis on outcome metrics

Remember: The value journey doesn&apos;t always follow a linear path. With AI features, users might skip steps or achieve value in unexpected ways. Your analytics should be flexible enough to capture these varying paths to success.

Building analytics for AI features requires us to rethink our fundamental approaches to measurement. As we&apos;ve seen through Loom&apos;s example, success isn&apos;t just about tracking what users do - it&apos;s about understanding value delivered when AI removes the need for action entirely.

Whether you&apos;re starting with basic monitoring today or evolving your analytics strategy for future AI features, remember: the goal isn&apos;t to track everything possible, but to measure what matters. Focus on finding meaningful signals of value delivery, even when those signals look very different from traditional product metrics.

This was part 7 in our series &quot;One tracking plan a day&quot;, Season 1 - startup tools. Make sure to visit the other parts of the series:

-   [Notion - 27.01.25](https://hipster-data-show.ghost.io/building-notions-analytics-foundation-a-product-first-tracking-plan/)
-   [Slack - 28.01.25](https://hipster-data-show.ghost.io/making-smart-tradeoffs-in-analytics-a-slack-tracking-plan-journey/)
-   [Superhuman](https://hipster-data-show.ghost.io/build-a-tracking-plan-around-one-core-feature-inbox-zero-in-superhuman/) - 29.01.25
-   [Vimcal](https://hipster-data-show.ghost.io/create-a-tracking-plan-for-a-tool-that-you-want-to-use-as-short-as-possible-vimcal-tracking-plan/) - 30.01.25
-   [Asana](https://hipster-data-show.ghost.io/combining-your-product-strategy-with-your-analytics-implementation-tracking-plan-for-asana/) - 31.01.25
-   [Canva](https://hipster-data-show.ghost.io/designing-analytics-for-creative-tools-finding-clarity-in-complex-user-journeys-canva-tracking-plan/) - 03.02.25
-   Miro - 05.02.25
-   Grammarly - 06.02.25
-   Replit - 07.02.25
-   Hubspot - 10.02.25
-   Stripe - 11.02.25
-   Zoom - 12.02.25
-   Ghost - 13.02.25
-   Amplitude - 17.02.25
-   GSheets - 18.02.25
-   Lightdash - 19.02.25
-   Claude - 20.02.25
-   Reconfigured - 21.02.25

If you&apos;d like to generate your own tracking plans using my book with Claude AI, get your copy here:

This work is based on the chapters about event data design in my book [**Analytics Implementation Workbook**](https://hipster-data-show.ghost.io/the-analytics-implementation-workbook/). There, you can read more details about the D3L framework.

![](/images/posts/measuring-ai-productivity-features-analytics-setup-for-loom/Check-out-the-Book.png)</content:encoded></item><item><title>Analytics for Creative Tools — Canva Tracking Plan</title><link>https://timo.space/blog/designing-analytics-for-creative-tools-finding-clarity-in-complex-user-journeys-canva-tracking-plan/</link><guid isPermaLink="true">https://timo.space/blog/designing-analytics-for-creative-tools-finding-clarity-in-complex-user-journeys-canva-tracking-plan/</guid><description>Designing Analytics for Creative Tools: Finding Clarity in Complex User Journeys - Canva Tracking Plan</description><pubDate>Sun, 02 Feb 2025 00:00:00 GMT</pubDate><content:encoded>Creating analytics for creative tools like Canva seems like a paradox at first. How do you measure success in a process that&apos;s inherently iterative, personal, and often non-linear? When every color change, text adjustment, or element placement could be tracked, how do you avoid drowning in data while still capturing what truly matters? As someone who has implemented analytics for various products, I faced this exact challenge when approaching Canva&apos;s tracking plan.

The solution wasn&apos;t in tracking more - it was in zooming out to find clarity. This post will show you how to build analytics that capture meaningful insights for creative tools, using Canva as our example. We&apos;ll explore how to identify true success moments, align tracking with strategic goals, and structure your analytics in a way that brings order to creative chaos.

In this content series - season 1, I create a tracking plan for a typical start-up tool every day for four weeks (I take a break on the weekend), so 20 in total. This is the 6th one: **Canva**. Here is the season overview:

![](/images/posts/designing-analytics-for-creative-tools-finding-clarity-in-complex-user-journeys-canva-tracking-plan/a-Tracking-Plan-a-Day---canva.png)

## How to Find Success Moments in Creative Processes: Moving Beyond Click Tracking to Meaningful Outcomes

### Why Traditional Click Tracking Fails Creative Tools

When I first approached creating a tracking plan for Canva, I had the same hesitation many analysts face: How do you measure success in a tool where every creative decision could be tracked? The temptation to track every color change, text edit, and element placement is strong - but it&apos;s a path that leads to analytical chaos.

Think about a typical design process in Canva. A user might:

-   Add and remove multiple text elements
-   Try different colors and fonts
-   Move elements around dozens of times
-   Switch between multiple templates
-   Make countless minor adjustments

The problem isn&apos;t just the volume of data. It&apos;s that granular tracking of creative actions tells us very little about whether users are actually successful with the tool. A user who makes fifty adjustments to their design isn&apos;t necessarily more successful than one who makes five - they might just be struggling to achieve their desired outcome.

This is where traditional click tracking fundamentally misses the mark for creative tools. When I work with clients implementing analytics, I often have to push back against the desire to track everything. You don&apos;t need to know every time someone clicks the color picker or adjusts an element&apos;s position. These interactions create noise that obscures the signal we&apos;re really looking for.

Instead, we need to ask: What actually indicates that a user has successfully achieved their goal? In creative tools, success isn&apos;t about the number of actions taken - it&apos;s about reaching a point where the user feels their creation is worth sharing or using. This is why we need to zoom out from click-level tracking and find more meaningful indicators of success.

### The Art of Zooming Out: Identifying True Success Indicators

When creating analytics for creative tools, success moments aren&apos;t always obvious. You have to zoom out from the creative process itself and look for clear signals that indicate users have achieved something valuable.

In Canva&apos;s case, there are three unmistakable moments that signal success:

-   When a user exports their design
-   When they share it with others
-   When they send it to print

![](/images/posts/designing-analytics-for-creative-tools-finding-clarity-in-complex-user-journeys-canva-tracking-plan/image-3.png)

&quot;In Canva&apos;s case, it&apos;s when I download something, when I share something, or when I give something out for printing. This means I&apos;m happy with the result,&quot; as I explained in the video. These actions tell us something crucial: the user believes their creation is good enough to use or share.

This pattern of &quot;zooming out&quot; works across different creative tools. Take Miro, for example - while users might spend hours moving sticky notes and drawing connections on their whiteboard, the real success moment comes when they share their board with teammates. The sharing action signals that the creative process has reached a meaningful milestone.

Finding these indicators requires asking two key questions:

1.  At what point do users demonstrate confidence in their creation?
2.  When does the creative process translate into actual value?

The beauty of this approach is that it cuts through the complexity of creative processes. Instead of trying to measure whether someone is using the tool &quot;correctly,&quot; we&apos;re measuring when they&apos;re using it successfully. This distinction is crucial for creative tools where there&apos;s no single &quot;right way&quot; to achieve an outcome.

Remember: Success indicators should be clear and unambiguous. If you find yourself debating whether something counts as success, you probably need to zoom out further. The goal is to find those moments where user intent is crystal clear - where their actions tell you, without any doubt, that they&apos;ve achieved something valuable.

### Turning Success Moments into Actionable Metrics

Once you&apos;ve identified your success moments, the next step is turning them into metrics that can drive product decisions. In Canva&apos;s case, we can combine our success events (export, share, print) into what I call &quot;design value achieved&quot; - a key metric that tells us when users are getting real value from the product.

But identifying the metric is just the start. The real power comes from how you use it:

-   For onboarding effectiveness:
    -   Track time to first &quot;design value achieved&quot;
    -   Set targets like achieving first success within 7 days
    -   Measure what percentage of new users reach this milestone
-   For ongoing engagement:
    -   Define active users based on success frequency
    -   Monitor users achieving design value at least once every 30 days
    -   Track the distribution of success moments across different user segments

&quot;The other thing we can use it for is to really define active user. An active user for us is like when in 30 days, they at least have two design values achieved,&quot; I explained in the video. This approach gives us a much stronger definition of activity than simple login or page view metrics.
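That active-user rule is easy to express in code. A minimal sketch, assuming three illustrative success events standing in for export, share, and print:

```python
# Sketch of the active-user rule quoted above: at least two
# "design value achieved" events in a rolling 30-day window.
# Event names and shapes are illustrative assumptions.
from datetime import datetime, timedelta

SUCCESS_EVENTS = {"design_exported", "design_shared", "design_printed"}

def is_active(user_events, as_of, window_days=30, min_successes=2):
    window_start = as_of - timedelta(days=window_days)
    successes = [
        e for e in user_events
        if e["event"] in SUCCESS_EVENTS and window_start <= e["ts"] <= as_of
    ]
    return len(successes) >= min_successes

events = [
    {"event": "design_exported", "ts": datetime(2025, 1, 20)},
    {"event": "design_shared", "ts": datetime(2025, 2, 1)},
]
print(is_active(events, as_of=datetime(2025, 2, 2)))  # True
```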

One of the most powerful applications is identifying users at risk. When someone who regularly achieved design value stops doing so, it&apos;s an early warning signal. As I noted in the video, this creates immediately actionable insights - customer success teams can reach out to these users, or growth teams can create targeted re-engagement campaigns.

The key is to build metrics that connect directly to business outcomes. For a tool like Canva, regular achievement of design value likely correlates with subscription retention and expansion. By tracking these success moments, we&apos;re not just measuring feature usage - we&apos;re measuring the actual value users get from the product.



## Structuring Analytics Around Strategic Goals: Why Brand Kits and Templates Matter More Than Button Clicks

### Understanding Your Product&apos;s Strategic Layer

Product analytics often starts with tracking core functionality, but the real power comes from understanding and measuring your product&apos;s strategic direction. The challenge is finding these strategic elements - they&apos;re not always obvious in your day-to-day analytics.

For Canva, I found clear strategic signals right on their homepage:

-   &quot;Easy to create&quot; - indicating a focus on accessibility
-   &quot;Professional designs&quot; - showing their quality aspirations
-   &quot;Share or print&quot; - highlighting their end-to-end solution

&quot;The sub headline is a good indicator. We can pick easy, create, professional designs. Easy to create, then professional designs. We are creating professional designs and it shouldn&apos;t take us long to do this,&quot; I noted in the video. These aren&apos;t just marketing messages - they&apos;re strategic priorities that need to be measured.

Finding your product&apos;s strategic layer typically comes from two sources:

-   External messaging: Homepage headlines, product marketing, and sales materials
-   Internal direction: Company strategy, go-to-market plans, and revenue models

The key is connecting these strategic elements to measurable outcomes. For instance, Canva&apos;s push into enterprise markets isn&apos;t just about having enterprise features - it&apos;s about measuring how those features drive business transformation. Their Brand Kit functionality isn&apos;t just another feature; it&apos;s a strategic tool enabling corporate identity management.

![](/images/posts/designing-analytics-for-creative-tools-finding-clarity-in-complex-user-journeys-canva-tracking-plan/image-4.png)

When incorporating strategic direction into your analytics, ask yourself:

-   What differentiates your product in the market?
-   Which features or capabilities support your long-term vision?
-   How do these strategic elements contribute to revenue growth?

Remember: Your analytics setup needs to grow beyond tracking basic product usage. It should help you understand whether your strategic initiatives are actually working and driving the business forward.

This work is based on the chapters about event data design in my book [**Analytics Implementation Workbook**](https://hipster-data-show.ghost.io/the-analytics-implementation-workbook/). There, you can read more details about the D3L framework.

![](/images/posts/designing-analytics-for-creative-tools-finding-clarity-in-complex-user-journeys-canva-tracking-plan/Check-out-the-Book.png)

### Elevating Strategic Features in Your Analytics

When designing analytics for strategic features, sometimes you need to break conventional tracking patterns. A perfect example is how we handle Canva&apos;s Brand Kit and Templates - by elevating them to full entity status rather than treating them as simple activities or properties in other entities.

&quot;By making it an entity, we&apos;re making it a strategic item. It makes it easier then for people, when they go into the analytics setup, to say, &apos;hey, I want to understand how our brand kit initiatives are doing,&apos; so they will immediately find the right kind of events for it,&quot; I explained in the video. This isn&apos;t just a technical decision - it&apos;s about making strategic initiatives visible and measurable.

For templates, this elevated status means tracking the complete lifecycle:

-   Creation by designers
-   Approval process
-   User adoption
-   Usage patterns
-   Performance metrics

This approach serves multiple stakeholders:

-   Design teams can track template performance
-   Product teams can measure template adoption
-   Business teams can analyze template impact on conversions

![](/images/posts/designing-analytics-for-creative-tools-finding-clarity-in-complex-user-journeys-canva-tracking-plan/image-5.png)

The key is making strategic features discoverable in your analytics. When a feature is buried as a property or merged into general usage metrics, it&apos;s harder for teams to track and optimize their strategic initiatives. By elevating these features to entity status, you&apos;re signaling their importance and making their impact more measurable.

This doesn&apos;t mean every important feature needs to be an entity. The decision comes down to two factors:

1.  Is this feature core to your strategic direction?
2.  Does separate tracking provide valuable insights for multiple teams?

When the answer to both is yes, consider elevating the feature in your analytics structure. This makes it easier for teams to find, measure, and improve these strategic elements without getting lost in general product metrics.

### Connecting Strategic Success to Business Growth

Strategic features aren&apos;t just product enhancements - they&apos;re business growth drivers. When we track them correctly, we can directly connect product usage patterns to business outcomes. In Canva&apos;s case, their Brand Kit feature is a perfect example of how strategic analytics can inform business growth.

Here&apos;s how this connection works in practice:

-   Track adoption rates across organization sizes
-   Monitor the number of brand assets being created and shared
-   Measure how brand kit usage correlates with subscription upgrades
-   Identify which teams are becoming power users

&quot;I add a brand kit as an entity because I also wanted to cover an interesting case that I haven&apos;t covered so far. It&apos;s also like an interesting spin because Canva was always a very B2C like tool, but now to see them, how they basically create this loop to also include B2B use cases,&quot; I explained in the video.

This strategic view helps multiple teams:

-   Sales teams can identify accounts ready for expansion
-   Customer success can spot accounts needing activation
-   Product teams can measure enterprise feature adoption
-   Marketing can demonstrate ROI for premium features

One crucial aspect is adding the right properties to your tracking. For example, adding a &quot;pro\_feature&quot; property to usage events lets you:

-   Calculate the percentage of pro feature usage in free accounts
-   Identify accounts likely to convert to paid plans
-   Measure the actual value premium features provide

![](/images/posts/designing-analytics-for-creative-tools-finding-clarity-in-complex-user-journeys-canva-tracking-plan/image-6.png)
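The first of those calculations is a one-liner once the property is in place. A hypothetical sketch with made-up data:

```python
# Sketch: share of pro-feature usage inside free accounts, using the
# pro_feature property idea above. Field names and data are illustrative.
usage_events = [
    {"account_plan": "free", "pro_feature": True},
    {"account_plan": "free", "pro_feature": False},
    {"account_plan": "free", "pro_feature": True},
    {"account_plan": "pro", "pro_feature": True},
]

free = [e for e in usage_events if e["account_plan"] == "free"]
pro_share = sum(e["pro_feature"] for e in free) / len(free)
print(f"{pro_share:.0%} of free-account usage touches pro features")  # 67%
```

Free accounts with a high `pro_share` are natural candidates for upgrade campaigns.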

The goal is creating a clear line from strategic feature usage to business metrics. When you can show that accounts using Brand Kit features are more likely to expand their subscriptions or that template usage correlates with long-term retention, you&apos;re proving the business value of your strategic initiatives.

Remember: The best analytics setup isn&apos;t just about tracking what users do - it&apos;s about understanding how those actions drive business growth.

## From Complexity to Clarity: Building a Three-Layer Analytics Framework for Creative Tools

### The Product Layer: Building Your Analytics Foundation

The product layer forms the backbone of your analytics framework. For creative tools like Canva, this means identifying the core entities that drive user value without getting lost in the endless possibilities of creative actions.

Start with your foundational entities:

-   Account &amp; User: Essential for tracking both individual and team usage
-   Design: The core creative output that drives value
-   Strategic entities: Like Templates and Brand Kit that support business goals

&quot;We always have this one entity, which I always call the heartbeat entity, which is driving the whole thing,&quot; I explained in the video. &quot;Yesterday I had Asana, obviously there it is the task. Before I had Vimcal, there it is the event. Then I had Superhuman, there it is the email. You always have this entity that basically if this keeps tracking for an account or for user, you&apos;re in good shape.&quot;

For Canva the heartbeat entity is the Design. Everything exists because of it and everything is built around it.

![](/images/posts/designing-analytics-for-creative-tools-finding-clarity-in-complex-user-journeys-canva-tracking-plan/image.png)

For each entity, define key activities that signal progress and success:

-   Creation moments
-   Meaningful updates
-   Success indicators (like export, share, print)
-   Strategic actions (like template usage)

Then add properties that provide crucial context:

-   Pro vs. free features
-   Usage environment
-   User role or permissions
-   Content type or category

Remember: The goal isn&apos;t to track everything possible, but to create a foundation that captures meaningful product usage. You can always add more tracking later, but starting with a clean, focused product layer makes future analysis much more effective.

The key is finding the right balance - enough detail to understand product usage patterns, but not so much that you create noise in your analytics. When in doubt, ask yourself: &quot;Will tracking this help us make better product decisions?&quot;

### The Customer Layer: Measuring Progress and Growth

While the product layer tracks what users do, the customer layer measures how they progress and grow. This is where we translate individual actions into meaningful patterns of success and engagement.

![](/images/posts/designing-analytics-for-creative-tools-finding-clarity-in-complex-user-journeys-canva-tracking-plan/image-1.png)

For creative tools, the customer journey typically follows these stages:

-   New users getting started
-   First success moments
-   Regular usage patterns
-   Power user behavior
-   Risk signals

&quot;I love this, this is my kind of favorite customer activity or customer profile. Because it gives us something where we can immediately put it into action,&quot; I noted in the video when discussing at-risk users. This is where analytics becomes truly actionable.

Instead of just tracking individual design creations, we want to identify key transitions:

-   When does someone move from experimenting to regular usage?
-   How quickly do they achieve their first successful design?
-   What patterns indicate they&apos;re becoming a power user?
-   Which behavior changes might signal they&apos;re at risk?

For Canva, we might define these states as:

-   Regular designer: Creates and shares designs consistently
-   Team collaborator: Uses collaboration features actively
-   Brand manager: Utilizes brand kit features extensively
-   At risk: Previously active but showing declining engagement
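These states don&apos;t need complex machinery to start with. Here is a minimal sketch of state assignment from two activity counters; the thresholds and state names are assumptions for illustration, not Canva&apos;s definitions:

```python
# Minimal customer-state assignment from two activity counters.
# Thresholds and state names are assumptions, not Canva's definitions.
def customer_state(designs_last_30d, designs_prior_30d):
    # Activity dropped to under half of the previous period: at risk
    if designs_prior_30d > 0 and designs_prior_30d > 2 * designs_last_30d:
        return "at_risk"
    if designs_last_30d >= 8:
        return "regular_designer"
    if designs_last_30d > 0:
        return "occasional"
    return "inactive"
```

The at-risk rule comes first on purpose: a previously active user who goes quiet should be flagged even if their lifetime numbers still look healthy.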

The power of this layer is its ability to drive action. When you identify users at risk, customer success teams can intervene. When you spot potential power users, sales teams can explore expansion opportunities. These insights drive business outcomes, not just analytics reports.

Remember: The customer layer isn&apos;t about individual events - it&apos;s about patterns that indicate progress or problems. By understanding these patterns, you can proactively support users throughout their journey.

### The Interaction Layer: Keeping Click Tracking in its Place

Yes, sometimes you need to track clicks - but let&apos;s be smart about it. While your product and customer layers focus on meaningful outcomes, the interaction layer handles those occasional needs for detailed UI analytics without overwhelming your tracking plan.

Instead of creating separate events for every button or interface element, use a single standardized approach:

-   One event type: &quot;element\_clicked&quot;
-   Rich properties for context
-   Clear naming conventions
-   Focused implementation

&quot;Sometimes you need clicks, as I said, sometimes the UX designer really want to know, &apos;okay, we have this new background removal feature and I really want to know which kind of options are people actually using,&apos;&quot; I explained in the video. This is a valid need - but we can handle it without creating analytics chaos.

![](/images/posts/designing-analytics-for-creative-tools-finding-clarity-in-complex-user-journeys-canva-tracking-plan/image-2.png)

Structure your click events with consistent properties:

-   element\_type: &quot;button&quot;, &quot;link&quot;, &quot;menu\_item&quot;
-   element\_text: The visible text or label
-   element\_location: &quot;top\_toolbar&quot;, &quot;side\_panel&quot;
-   element\_target: The action or destination
-   element\_container: The feature or section context
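A small helper makes the convention concrete. This is an illustrative sketch, not a real SDK call; the property names come straight from the list above, and the values are invented:

```python
# One standardized click event instead of one event per element.
# The helper and its values are illustrative, not a real SDK.
def element_clicked(element_type, element_text, element_location,
                    element_target, element_container):
    return {
        "event": "element_clicked",
        "properties": {
            "element_type": element_type,           # "button", "link", "menu_item"
            "element_text": element_text,           # visible text or label
            "element_location": element_location,   # "top_toolbar", "side_panel"
            "element_target": element_target,       # action or destination
            "element_container": element_container, # feature or section context
        },
    }

# The background-removal case from the quote above, as one event:
event = element_clicked("button", "Remove background", "side_panel",
                        "apply_background_removal", "background_removal")
```

Because every click shares one event name, the UX team can filter by `element_container` when they need detail, and everyone else can ignore the whole event.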

This approach gives you flexibility without complexity:

-   UX teams can analyze specific interface elements
-   Designers can track new feature adoption
-   Product teams can investigate user patterns
-   All without cluttering your main analytics

Remember: Click tracking should support your analytics, not dominate it. Keep it contained in the interaction layer, and make sure it&apos;s serving specific, valuable purposes. For deep UX insights, complement this data with session recordings and user interviews - they&apos;ll tell you more about user behavior than click tracking ever could.

You can check out the complete design on the Miro Board:

![](/images/posts/designing-analytics-for-creative-tools-finding-clarity-in-complex-user-journeys-canva-tracking-plan/Check-out-the-Miro-Board.png)

If you want the nerdy version, you can check out the JSON schemas here:

![](/images/posts/designing-analytics-for-creative-tools-finding-clarity-in-complex-user-journeys-canva-tracking-plan/Check-out-the-Repository---Notion.png)

When measuring success in creative tools, the temptation to track everything is strong. But as we&apos;ve seen through Canva&apos;s example, the power lies in zooming out to find meaningful signals amidst the creative chaos. By focusing on true success moments, elevating strategic features, and structuring analytics in clear layers, we can build tracking systems that actually inform product decisions.

The goal isn&apos;t to capture every creative decision, but to understand when and how users achieve value. Whether you&apos;re working with design tools, digital workspaces, or any product with complex user journeys, this approach helps you find clarity in complexity. The best analytics aren&apos;t about tracking everything - they&apos;re about tracking what matters.

This was part 6 in our series &quot;One tracking plan a day&quot; Season 1 - startup tools. Make sure you visit all other parts of the series:

-   [Notion](https://hipster-data-show.ghost.io/building-notions-analytics-foundation-a-product-first-tracking-plan/) - 27.01.25
-   [Slack](https://hipster-data-show.ghost.io/making-smart-tradeoffs-in-analytics-a-slack-tracking-plan-journey/) - 28.01.25
-   [Superhuman](https://hipster-data-show.ghost.io/build-a-tracking-plan-around-one-core-feature-inbox-zero-in-superhuman/) - 29.01.25
-   [Vimcal](https://hipster-data-show.ghost.io/create-a-tracking-plan-for-a-tool-that-you-want-to-use-as-short-as-possible-vimcal-tracking-plan/) - 30.01.25
-   [Asana](https://hipster-data-show.ghost.io/combining-your-product-strategy-with-your-analytics-implementation-tracking-plan-for-asana/) - 31.01.25

If you&apos;d like to generate your own tracking plans using my book with Claude AI, get your copy here:

This work is based on the chapters about event data design in my book [**Analytics Implementation Workbook**](https://hipster-data-show.ghost.io/the-analytics-implementation-workbook/). There, you can read more details about the D3L framework.

![](/images/posts/designing-analytics-for-creative-tools-finding-clarity-in-complex-user-journeys-canva-tracking-plan/Check-out-the-Book.png)</content:encoded></item><item><title>Product Strategy meets Analytics — Asana Tracking Plan</title><link>https://timo.space/blog/combining-your-product-strategy-with-your-analytics-implementation-tracking-plan-for-asana/</link><guid isPermaLink="true">https://timo.space/blog/combining-your-product-strategy-with-your-analytics-implementation-tracking-plan-for-asana/</guid><description>Combining your Product Strategy with your Analytics Implementation - Tracking Plan for Asana</description><pubDate>Thu, 30 Jan 2025 00:00:00 GMT</pubDate><content:encoded>Analytics can feel deceptively simple when you start with a product like Asana. Track some tasks, measure completion rates, count active users - done, right? But as your product evolves beyond its core features, your analytics need to evolve too. The real challenge isn&apos;t tracking basic metrics; it&apos;s building an analytics framework that captures both your product&apos;s essential heartbeat and its strategic evolution. While simple task management metrics might tell you if users are active, they won&apos;t reveal if your product is truly delivering on its strategic promise.

In this post, we&apos;ll explore how to build an analytics framework that grows with your product, using Asana as our example. We&apos;ll look at how to move from basic task tracking to measuring strategic initiatives, how to turn these insights into product features, and how to track success across both individual users and entire organizations. Whether you&apos;re building a task management tool or any other SaaS product, these principles will help you create analytics that drive product growth.

In this content series - season 1, I create a tracking plan for a typical start-up tool every day for four weeks (I take a break on the weekend), so 20 in total. This is the 5th one: **Asana**. Here is the season overview:

![](/images/posts/combining-your-product-strategy-with-your-analytics-implementation-tracking-plan-for-asana/a-Tracking-Plan-a-Day---asana.png)

## From Basic to Strategic: How to Evolve Your Analytics as Your Product Grows

### Start with Your Core: Identifying Essential Product Activities

Every product has a heartbeat - that core activity that drives everything else. For Slack, it&apos;s messages. For Superhuman, it&apos;s emails. For Asana, it&apos;s tasks. When building your analytics foundation, start by identifying this pulse.

In its simplest form, a task management tool like Asana needs just two fundamental entities:

-   Tasks: The core unit of work
-   Users: The people doing the work

That&apos;s it. You could build a working task management analytics setup with just these two entities and a handful of key activities:

-   Task created
-   Task completed
-   Task assigned
-   Task commented
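Even this minimal setup answers real questions. A sketch with invented events shows how the four activities already yield a completion rate (all names and data here are illustrative):

```python
# Hypothetical event stream for the minimal two-entity setup.
events = [
    {"event": "task_created", "user_id": "u1"},
    {"event": "task_created", "user_id": "u2"},
    {"event": "task_completed", "user_id": "u1"},
    {"event": "task_assigned", "user_id": "u1"},
]

created = sum(e["event"] == "task_created" for e in events)
completed = sum(e["event"] == "task_completed" for e in events)
print(f"completion rate: {completed / created:.0%}")  # success vs. adoption
```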

![](/images/posts/combining-your-product-strategy-with-your-analytics-implementation-tracking-plan-for-asana/image-31.png)

&quot;If you really want to get started, create a tracking plan for a task management for just a to-do app. The to-do app already has a very simple data model by itself. And therefore the analytics model is also pretty simple.&quot;

But even with this minimal setup, you&apos;re capturing the essential rhythm of your product. You can see if tasks are being created (adoption), completed (success), assigned (collaboration), and discussed (engagement).

The power of starting with core activities goes beyond simplicity. It gives you a stable foundation for future expansion. As your product evolves, you might add projects to group tasks or teams to organize users. But these additions build upon your core tracking rather than replacing it.

Think of activities, not interactions. While it might be tempting to track every click and view, focus instead on meaningful activities that represent user progress or value achieved. A task being completed tells you more about product success than how many times someone viewed the task details.

By keeping your initial tracking focused on these essential product activities, you create clarity. Your analytics will tell a clear story about how people use your product&apos;s core functionality - a story that becomes your baseline for measuring all future additions.



### Identify New Strategic Features: Aligning Analytics with Product Direction

Once you&apos;ve established your core tracking, it&apos;s time to look ahead. Your product isn&apos;t standing still - it&apos;s evolving to stay competitive and deliver more value. For Asana, this means expanding beyond basic task management into areas like goals and automated workflows.

Where do you find these strategic signals? Start with your product&apos;s current direction:

-   Marketing messages (what sets you apart?)
-   Recent major feature launches
-   Areas of significant investment

Looking at Asana&apos;s homepage, we see: &quot;Connect work to goals and automate workflows with AI as your teammate.&quot; This isn&apos;t just marketing speak - it&apos;s a signal about where the product is headed. Basic task management isn&apos;t enough anymore; they&apos;re betting on enterprise orgs, goals and automation as key differentiators.

![](/images/posts/combining-your-product-strategy-with-your-analytics-implementation-tracking-plan-for-asana/image-32.png)

But not every new feature deserves its own tracking setup. Ask yourself:

-   Is this feature central to our product strategy?
-   Will it fundamentally change how users get value?
-   Do we need detailed insights about its adoption?

For Asana, goals and workflows aren&apos;t just nice-to-have features - they represent a strategic shift toward enterprise functionality. They deserve their own entities in our tracking plan, with dedicated activities like:

-   Goal created
-   Goal linked to project
-   Workflow created
-   Workflow step completed

&quot;When we measure how many projects have workflows, it can enable us to create an index or percentage of &apos;You have 20% projects with automation.&apos; This can be reported to organization admins to show how many teams have adopted automation, and the potential for time savings or quality improvements.&quot;

The key is balance. You want to capture strategic initiatives without overwhelming your analytics setup. Focus on activities that indicate strategic success - not just feature usage, but value achieved through these new capabilities.

This work is based on the chapters about event data design in my book [**Analytics Implementation Workbook**](https://hipster-data-show.ghost.io/the-analytics-implementation-workbook/). There, you can read more details about the D3L framework.

![](/images/posts/combining-your-product-strategy-with-your-analytics-implementation-tracking-plan-for-asana/Check-out-the-Book.png)

### Bridge the Gap: Connecting Core Features with Strategic Initiatives

The real magic happens when you connect your core tracking with strategic features. It&apos;s not enough to track tasks separately from goals, or workflows in isolation. You need to understand how these elements work together to drive user success.

Consider three key connection points:

-   Properties that link core and strategic features
-   Activities that span multiple features
-   Success metrics that combine both levels

For Asana, this means tracking properties like:

-   Was this task created by a workflow?
-   Is this project connected to a goal?
-   How many automated tasks exist in this project?

![](/images/posts/combining-your-product-strategy-with-your-analytics-implementation-tracking-plan-for-asana/image-33.png)

These connections enable powerful insights. Instead of just seeing &quot;1000 tasks created this month,&quot; you can understand that &quot;30% of tasks are now created automatically through workflows.&quot; This isn&apos;t just a vanity metric - it indicates real efficiency gains and strategic feature adoption.
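That kind of insight falls straight out of the linking properties. A sketch (the field names are assumptions, not Asana&apos;s schema):

```python
# Tasks carrying a linking property that connects core usage
# to the strategic workflow feature. Field names are illustrative.
tasks = [
    {"task_id": 1, "created_by_workflow": True},
    {"task_id": 2, "created_by_workflow": False},
    {"task_id": 3, "created_by_workflow": True},
    {"task_id": 4, "created_by_workflow": False},
    {"task_id": 5, "created_by_workflow": False},
]

automated = sum(t["created_by_workflow"] for t in tasks)
rate = automated / len(tasks)
print(f"{rate:.0%} of tasks are created automatically through workflows")
```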

Think about progressive disclosure in your analytics. A user&apos;s journey might start with basic task management, but you want to identify moments when they&apos;re ready for more:

-   When they&apos;re managing multiple recurring tasks (workflow opportunity)
-   When projects start connecting to broader initiatives (goals opportunity)
-   When manual work becomes repetitive (automation opportunity)

The goal isn&apos;t just to measure feature adoption - it&apos;s to understand how strategic features enhance your core product value. Are teams with automated workflows completing more tasks? Are projects linked to goals showing better completion rates?

Remember: Your core features are still the foundation. Strategic features should enhance, not replace, this foundation. By connecting both levels in your analytics, you create a complete picture of how users progress from basic functionality to advanced capabilities - and the value they gain along the way.

You can check out the complete design on the Miro Board:

![](/images/posts/combining-your-product-strategy-with-your-analytics-implementation-tracking-plan-for-asana/Check-out-the-Miro-Board.png)

If you want the nerdy version, you can check out the JSON schemas here:

![](/images/posts/combining-your-product-strategy-with-your-analytics-implementation-tracking-plan-for-asana/Check-out-the-Repository---Notion.png)

## Turn Analytics into Action: Creating Product Features from Data Insights

### Find the Signal: Identifying Opportunities in Usage Data

Behavioral data tells stories - if you know how to listen. The key is identifying patterns that point to opportunities for product improvement, not just tracking metrics for their own sake.

Let&apos;s look at Asana&apos;s task data. Beyond basic counts of tasks created or completed, look for patterns that suggest deeper needs:

-   Copy task creation with similar structures
-   Common sequences of task assignments
-   Projects that follow consistent templates
-   Tasks that always have the same descriptions

These patterns are gold mines for product opportunities. When you see users manually recreating the same task structures over and over, that&apos;s not just usage data - it&apos;s a signal that automation could add value.

The real insights often come from combining different data points. Don&apos;t just look at what users do, but the context around their actions:

-   Time spent on repetitive actions
-   Common sequences of activities
-   Variations in task quality
-   Patterns across different teams

For example, if you notice that teams with structured task templates consistently complete projects faster and with fewer revisions, that&apos;s a signal. It suggests that helping other teams implement similar structures could improve their efficiency.

Look for friction points where users are working around limitations:

-   Manual copy-pasting of task descriptions
-   Recurring task creation
-   Repetitive task assignments
-   Similar project setups across teams

Each of these friction points is an opportunity. The goal isn&apos;t just to collect data about these patterns - it&apos;s to identify where you can remove barriers and streamline workflows. The best product improvements often come from watching what users are already trying to accomplish, then making it easier for them to do it.

### Design Data-Driven Features: From Insight to Implementation

Once you&apos;ve identified patterns in your data, the next step is turning those insights into concrete features. This isn&apos;t just about building what users ask for - it&apos;s about solving problems they might not even realize they have.

Take Asana&apos;s workflow automation opportunity. The data might show:

-   Teams manually creating similar tasks repeatedly (by duplicating them)
-   Inconsistent task descriptions across projects
-   Time spent on routine task management
-   Varying quality in task documentation

From these insights, you can design features that not only save time but improve quality. For instance, an automation system that:

-   Creates standardized tasks automatically
-   Maintains consistent documentation
-   Ensures proper task assignment
-   Preserves best practices

&quot;Creating tasks automatically is a huge win on saving time, but what is often missed is improving qualities and standards. Because you define one time and one central place how this task should look like... When you run this workflow one time, you figure out, &apos;ah, we should have a link to this kind of documentation.&apos; You just update the task template and the next time the task has been created, it has the better version already.&quot;

But don&apos;t stop at just building the feature. Design it to provide insights back to users:

-   Show time saved through automation
-   Track quality improvements
-   Measure consistency gains
-   Highlight adoption opportunities

This is where concepts like an &quot;automation score&quot; become powerful. It&apos;s not just a metric - it&apos;s a feature that helps users understand their progress and identifies opportunities for further improvement. Each team can see their automation rate and get suggestions for additional workflows they could implement.

![](/images/posts/combining-your-product-strategy-with-your-analytics-implementation-tracking-plan-for-asana/image-34.png)

Remember: The best data-driven features don&apos;t just solve problems - they help users understand the value they&apos;re getting and guide them toward even better practices.

### Close the Loop: Using Features to Drive Better Data

When you build features informed by data, you create opportunities to collect even better data. It&apos;s a virtuous cycle that drives product improvement and user success.

Think about Asana&apos;s workflow automation. When teams start using automated workflows, you can collect richer data about:

-   Which tasks are automated vs. manual
-   Time saved through automation
-   Quality improvements in task documentation
-   Patterns of workflow adoption across teams

This enhanced data collection becomes possible because the feature itself creates more structured, measurable interactions. Instead of trying to interpret messy manual processes, you get clear signals about how work gets done.

The key is using this data to drive further engagement:

-   Show teams their automation progress
-   Highlight successful workflow patterns
-   Identify opportunities for more automation
-   Surface best practices from power users

Each insight can trigger new actions. For example:

-   A team sees their 20% automation rate
-   They discover other similar teams are at 40%
-   They explore those teams&apos; workflow patterns
-   They implement new automated processes
-   Their automation rate improves
-   The data gets even richer

This creates a self-reinforcing cycle. Better data leads to better features, which lead to better usage patterns, which generate even better data. Each loop helps users become more successful while giving you deeper insights into how your product delivers value.

The goal isn&apos;t just to collect data or build features - it&apos;s to create a continuous improvement cycle where data and features work together to drive user success. When done right, users don&apos;t even think about the data they&apos;re generating; they just experience a product that keeps getting better at helping them work efficiently.

## Tracking Success at Scale: Measuring Progress Across Users and Organizations

### Define Progress States: Creating Clear Success Milestones

Success in a product like Asana isn&apos;t binary - it&apos;s a progression through different states of maturity and value. Think of it like levels in a game, where users and organizations develop new capabilities over time.

Start by mapping out clear progress milestones. For Asana, these might include:

-   Beginner: Created first tasks and projects
-   Activated: Regular task management and collaboration
-   Workflow Adopter: Using automated processes
-   Goal-Driven: Connecting tasks to strategic goals
-   Power User: Combining all features effectively

![](/images/posts/combining-your-product-strategy-with-your-analytics-implementation-tracking-plan-for-asana/image-35.png)

Each state should have clear, measurable criteria. For instance, a &quot;Workflow Adopter&quot; might be defined by:

-   Created at least one workflow
-   Automated task creation in multiple projects
-   Regular use of workflow features
-   Positive automation impact metrics
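Criteria like these can be expressed as a small classification function. This is a sketch; the thresholds and field names are assumptions, not Asana&apos;s definitions:

```python
# Sketch of progress-state classification from usage counters.
# Thresholds and field names are assumptions, not Asana's definitions.
def user_state(p):
    if p["workflows_created"] >= 1 and p["projects_with_automation"] >= 2:
        return "workflow_adopter"
    if p["tasks_completed"] >= 10:
        return "activated"
    if p["tasks_created"] >= 1:
        return "beginner"
    return "new"

state = user_state({"workflows_created": 1, "projects_with_automation": 2,
                    "tasks_completed": 40, "tasks_created": 120})
```

Because the criteria are explicit in one place, the function doubles as documentation of the state definitions and keeps every report consistent.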

Don&apos;t just track where users are - track how they move between states. This progression tells you:

-   Common paths to success
-   Typical timeframes for advancement
-   Potential stuck points
-   Opportunities for intervention

&quot;Moving from new users into activated users often presents the most significant improvement potential. It&apos;s typically where you&apos;ll encounter your most significant losses. I advocate for a specific state to truly understand what constitutes an activated user.&quot;

Keep your state definitions simple at first. You can always add complexity later, but starting with clear, understandable milestones makes it easier to:

-   Communicate progress to stakeholders
-   Guide user advancement
-   Identify improvement opportunities
-   Measure success over time

Remember: States should reflect real value milestones, not just feature usage. They should tell you not just what users are doing, but how successfully they&apos;re achieving their goals with your product.

### Track Organization Maturity: Beyond Individual Usage

Organization-level success isn&apos;t just the sum of individual user activities. It&apos;s about how deeply your product&apos;s strategic features are woven into the organization&apos;s workflows and processes.

For Asana, organization maturity reveals itself through patterns like:

-   Percentage of teams using workflows
-   Goal adoption across projects
-   Automation rates in different departments
-   Cross-team collaboration metrics
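The first two of these reduce to simple shares over team-level aggregates. A sketch with invented data (team names and fields are illustrative):

```python
# Organization-level maturity from team aggregates; names are illustrative.
teams = [
    {"team": "marketing", "uses_workflows": True, "goals_linked": True},
    {"team": "sales", "uses_workflows": False, "goals_linked": True},
    {"team": "support", "uses_workflows": True, "goals_linked": False},
    {"team": "finance", "uses_workflows": False, "goals_linked": False},
]

workflow_share = sum(t["uses_workflows"] for t in teams) / len(teams)
goal_share = sum(t["goals_linked"] for t in teams) / len(teams)
print(f"workflow adoption: {workflow_share:.0%}, goal adoption: {goal_share:.0%}")
```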

Think beyond simple usage metrics. Look for signals that show strategic feature adoption:

-   Are workflows being shared across teams?
-   Do goals connect multiple departments?
-   Is automation knowledge spreading organically?
-   Are best practices being replicated?

&quot;When we look on the customer level, we actually look on the organization because they are in the end who&apos;s paying the money. So they are in the end who&apos;s deciding should we keep the subscription? How many users should we actually add to the tool?&quot;

Create clear indicators of organization-level maturity:

-   Workflow Value Achieved: Organization has implemented successful automation
-   Goal Framework Established: Teams are aligning work with objectives
-   Cross-Team Collaboration: Projects and goals connect different teams
-   Best Practice Adoption: Successful patterns spread across teams

These indicators help you understand not just if an organization is using your product, but how successfully they&apos;re implementing its strategic features. This understanding is crucial for:

-   Predicting renewal likelihood
-   Identifying expansion opportunities
-   Guiding customer success efforts
-   Targeting growth initiatives

Remember: Organization maturity isn&apos;t just about depth of usage - it&apos;s about breadth of adoption and strategic alignment. An organization might have power users but still be immature if strategic features aren&apos;t widely adopted across teams.

### Connect Individual and Organization Success: Finding Growth Patterns

The most powerful insights often emerge when you connect individual user success with organization-level adoption. This multi-level view reveals patterns that neither perspective alone can show.

Key questions to explore at this intersection:

-   Are successful users clustered in specific teams?
-   How does individual workflow adoption spread?
-   Which teams lead strategic feature adoption?
-   Where do power users emerge?

&quot;The same thing that we apply on the organization or customer level, we could also apply on the user. So we could, for example, say, does this user have unlocked workflows yet? And then this is an interesting case to analyze later: how has an organization adopted workflows? This is one view. The second view is how many users actually have adopted it?&quot;

Look for growth patterns that connect both levels:

-   Power users emerging in specific departments
-   Workflow adoption spreading team by team
-   Goal alignment cascading through organizations
-   Automation practices being shared

These patterns help identify:

-   Natural expansion paths
-   Internal champions
-   Adoption barriers
-   Growth opportunities

The real value comes from using these insights to drive growth. For example:

-   When you spot a team successfully using workflows, guide them to share their practices
-   When you find power users, help them become internal advocates
-   When you see departments lagging, provide targeted support
-   When you identify successful patterns, create playbooks for other teams

Remember: Individual and organizational success reinforce each other. Power users drive organization adoption, while organization-wide practices create more successful users. Understanding this relationship helps you create strategies that work at both levels simultaneously.

Creating an analytics framework isn&apos;t a one-time task - it&apos;s an evolution that mirrors your product&apos;s growth. We started with Asana&apos;s core task management features, expanded to include strategic initiatives like goals and workflows, and showed how analytics can drive product development while measuring success across different scales.

But the most powerful insight might be the simplest: good analytics grow with your product. They help you identify opportunities, validate strategic directions, and measure success in ways that matter to both individual users and entire organizations. Whether you&apos;re building your first tracking plan or evolving an existing one, remember to balance the essential with the strategic, turn insights into action, and measure success across all levels of scale.

This was part 5 in our series &quot;One tracking plan a day&quot; Season 1 - startup tools. Make sure you visit all other parts of the series:

-   [Notion](https://hipster-data-show.ghost.io/building-notions-analytics-foundation-a-product-first-tracking-plan/) - 27.01.25
-   [Slack](https://hipster-data-show.ghost.io/making-smart-tradeoffs-in-analytics-a-slack-tracking-plan-journey/) - 28.01.25
-   [Superhuman](https://hipster-data-show.ghost.io/build-a-tracking-plan-around-one-core-feature-inbox-zero-in-superhuman/) - 29.01.25
-   [Vimcal](https://hipster-data-show.ghost.io/create-a-tracking-plan-for-a-tool-that-you-want-to-use-as-short-as-possible-vimcal-tracking-plan/) - 30.01.25

If you&apos;d like to generate your own tracking plans using my book with Claude AI, get your copy here:

This work is based on the chapters about event data design in my book [**Analytics Implementation Workbook**](https://hipster-data-show.ghost.io/the-analytics-implementation-workbook/). There, you can read more details about the D3L framework.

![](/images/posts/combining-your-product-strategy-with-your-analytics-implementation-tracking-plan-for-asana/Check-out-the-Book.png)</content:encoded></item><item><title>Tracking plan around one core feature — Inbox Zero</title><link>https://timo.space/blog/build-a-tracking-plan-around-one-core-feature-inbox-zero-in-superhuman/</link><guid isPermaLink="true">https://timo.space/blog/build-a-tracking-plan-around-one-core-feature-inbox-zero-in-superhuman/</guid><description>Build a tracking plan around one core feature - Inbox Zero in Superhuman</description><pubDate>Wed, 29 Jan 2025 00:00:00 GMT</pubDate><content:encoded>In this content series - season 1, I create a tracking plan for a typical start-up tool every day for four weeks (I take a break on the weekend), so 20 in total. This is the second one: **Superhuman**. Here is the season overview:

![](/images/posts/build-a-tracking-plan-around-one-core-feature-inbox-zero-in-superhuman/a-Tracking-Plan-a-Day---superhuman.png)

Imagine spending months meticulously tracking every click, scroll, and interaction in your product, only to realize you&apos;ve missed the one metric that truly matters. This is exactly what happens when teams build tracking plans without anchoring them to their product&apos;s core promise. For **Superhuman**, that promise is the Zero Inbox - yet many analytics implementations would treat this as just another feature rather than the foundation of the entire user experience.

In this post, we&apos;ll explore how to build a tracking plan that puts your product&apos;s core promise at the center, using Superhuman&apos;s Zero Inbox as our guide. Whether you&apos;re a product manager, analytics engineer, or growth professional, you&apos;ll learn how to move beyond surface-level metrics to create a tracking plan that truly measures product success.

## From Product Promise to Event Design: Why Zero Inbox Shapes Superhuman&apos;s Tracking Strategy

### Understanding Zero Inbox as More Than a Feature

When building a tracking plan for Superhuman, the first instinct might be to track every click, every folder interaction, and every email processed. But this misses the fundamental promise that makes Superhuman different from traditional email clients.



As I mentioned in the video: &quot;The whole promise of Superhuman is to get you to Inbox Zero. At least it was when they got started. I think now it basically got a little bit pushed aside with other things. But when they got started, the most important thing is like - this tool is the perfect one to really get you to inbox zero.&quot;

This core promise shapes everything about how users interact with the product:

-   The keyboard shortcuts aren&apos;t just for efficiency - they&apos;re designed to help process emails faster toward Zero Inbox
-   Snoozing emails isn&apos;t merely a convenience feature - it&apos;s a strategic tool to maintain inbox zero while handling emails that need attention later
-   Even the UI design, showing a clean list of emails without preview panes, supports quick decision-making about each email

Traditional email analytics often focus on metrics that don&apos;t really matter for measuring success:

-   Email open rates (when every email needs to be processed anyway)
-   Folder organization (which actually becomes less important with Zero Inbox)
-   Time spent in the application (when the goal is to process emails quickly)

Instead, we need to think about tracking that aligns with the user&apos;s journey toward inbox mastery. This means recognizing Zero Inbox not as a single achievement but as an ongoing state that users strive to maintain. It&apos;s the difference between tracking how people use email versus tracking how successfully they manage email.

This shift in perspective fundamentally changes how we approach our tracking plan. Rather than creating events for every possible interaction, we focus on events that tell us whether users are progressing toward and maintaining the Zero Inbox state - the true measure of product success.

![](/images/posts/build-a-tracking-plan-around-one-core-feature-inbox-zero-in-superhuman/image-25.png)
*The nice image you get when you hit Inbox Zero - I haven&apos;t seen it in 1.5 years*

This work is based on the chapters about event data design in my book [**Analytics Implementation Workbook**](https://hipster-data-show.ghost.io/the-analytics-implementation-workbook/). There, you can read more details about the D3L framework.

![](/images/posts/build-a-tracking-plan-around-one-core-feature-inbox-zero-in-superhuman/Check-out-the-Book.png)

### Identifying Key Moments in the Zero Inbox Journey

When designing a tracking plan around Zero Inbox, we need to be selective about which moments truly matter. As I mentioned in the video: &quot;We want to have an explicit event that is fired once the inbox is on zero because I can then use this to track my customer activities.&quot;

The key moments fall into two categories - the achievement itself and the actions that contribute to it:

**The Core Achievement:**

-   &quot;Inbox Zero Achieved&quot; - our primary indicator of success. We need to decide how to implement it: technically, the moment Inbox Zero is reached (emails in inbox = 0), or later in the data warehouse, by modeling the context data and deriving this event from it.
-   Tracks both first-time achievement and repeated successes
-   Helps identify patterns in how users reach this state

**Supporting Actions:**

-   Email marked as &quot;done&quot; (archived)
-   Emails snoozed for later
-   Emails read and replied to

What&apos;s particularly interesting is how these actions work together to tell the complete Zero Inbox story. For instance, the ratio between emails marked as done versus snoozed gives us insight into different user strategies for maintaining inbox zero.

We also need to think about the context around these moments. For snoozing emails, we track not just the action but also the duration - this tells us how users are planning their future email processing. For &quot;done&quot; emails, we can analyze what percentage of emails are marked done without being read, indicating efficient email triage.

The power of this approach is that every tracked moment connects directly to the Zero Inbox journey. We&apos;re not just collecting data - we&apos;re tracking a story of how users progress toward and maintain their ideal inbox state. This focused approach makes our analytics more meaningful and actionable, whether we&apos;re measuring individual user success or overall product effectiveness.

### Translating Product Philosophy into Technical Implementation

When it comes to implementing Inbox Zero tracking, we have two primary approaches.

**The Real-Time Approach:**

-   Track inbox count changes as emails are processed
-   Emit the &quot;Inbox Zero Achieved&quot; event when count hits zero
-   Can be implemented either front-end or back-end
-   Provides immediate feedback for user experience

**The Retroactive Approach:**

-   Include inbox count with every email-related event
-   Calculate Zero Inbox achievements after the fact
-   More suitable when working directly in the data warehouse
-   Easier to implement but less immediate

While both approaches work, I lean toward real-time implementation. It&apos;s not just about collecting data - it&apos;s about supporting the core product experience. When we know exactly when a user hits Zero Inbox, we can provide immediate feedback and celebration moments, connecting analytics data directly to the user experience.
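
A minimal sketch of the real-time approach, assuming illustrative event names (inbox_zero_achieved, email_processed) rather than an actual Superhuman schema:

```python
# Sketch of the real-time approach: keep a running inbox count and
# emit an explicit achievement event the moment it hits zero.
# Event and property names are illustrative assumptions.

class InboxTracker:
    def __init__(self, track):
        self.track = track          # analytics sink, e.g. a track() callable
        self.count = 0
        self.processed_since_zero = 0

    def email_arrived(self):
        self.count += 1

    def email_processed(self, action):
        """action: done, snoozed, replied, ..."""
        if self.count == 0:
            return
        self.count -= 1
        self.processed_since_zero += 1
        self.track("email_processed", {"action": action})
        if self.count == 0:
            # the core achievement, fired exactly at the moment of success
            self.track("inbox_zero_achieved",
                       {"emails_processed": self.processed_since_zero})
            self.processed_since_zero = 0
```

The retroactive variant would instead attach the current inbox count as a property on every email event and derive the achievement later in the warehouse.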

The technical implementation needs to consider edge cases too. What happens when:

-   Emails arrive while processing others
-   Multiple devices are syncing
-   Network connectivity issues occur

The key is to remember that we&apos;re not just building a tracking system - we&apos;re supporting a product philosophy. The technical implementation should feel as seamless as the Zero Inbox experience itself. This means carefully choosing where to implement the tracking logic and ensuring it&apos;s robust enough to handle real-world usage patterns.

Remember: The goal isn&apos;t just to count emails - it&apos;s to measure success in helping users achieve and maintain Zero Inbox. Your technical implementation should reflect this higher purpose.

You can check out the complete design on the Miro Board:

![](/images/posts/build-a-tracking-plan-around-one-core-feature-inbox-zero-in-superhuman/Check-out-the-Miro-Board.png)

If you want the nerdy version, you can check out the JSON schemas here:

![](/images/posts/build-a-tracking-plan-around-one-core-feature-inbox-zero-in-superhuman/Check-out-the-Repository---Notion.png)

## Building the Event Hierarchy: Connecting Product Actions to Customer Success

### The Double Three-Layer Event Framework

When designing a tracking plan, it&apos;s crucial to organize events in a way that makes sense both for implementation and analysis. As I mentioned in the video: &quot;We have three different layers. We have the product layer, we have the customer layer, and we have the interaction layer.&quot;

![](/images/posts/build-a-tracking-plan-around-one-core-feature-inbox-zero-in-superhuman/image-15.png)

The framework breaks down like this:

**Product Layer (Core Foundation):**

-   Entities (Account, Email Account, Email, Search)
-   Activities (what happens with these entities)
-   Properties (additional context for each activity)

For Superhuman, this means tracking core activities like achieving inbox zero, processing emails, and managing email accounts. Each activity carries properties that help us understand the context - like how many emails were processed or when inbox zero was achieved.
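
As a sketch, a product-layer event built from this entity/activity/property structure might look like the following (names are illustrative, not a fixed schema):

```python
# Sketch of the product layer: events are named entity + activity,
# with properties carrying the context. All names are assumptions.

def product_event(entity, activity, properties=None):
    """Build a product-layer event such as "Email Processed"."""
    return {
        "event": f"{entity} {activity}",
        "layer": "product",
        "properties": properties or {},
    }

evt = product_event("Email", "Processed",
                    {"action": "archived", "emails_in_inbox": 3})
```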

**Customer Layer (Built on Product Layer):**

-   First value experience
-   Value repeated
-   At-risk identification

From the video: &quot;The customer layer sits on top of the product layer. It&apos;s usually built by using the product activities in a different kind of context.&quot; For example, we combine email processing activities with inbox zero achievements to identify if a user is getting consistent value from Superhuman.

**Interaction Layer (Supporting Details):**

-   Sits below the product layer
-   Tracks specific user interactions
-   Example: keyboard shortcut usage

The power of this framework is that it separates what matters (product and customer activities) from what&apos;s just interesting (interaction data). This helps us focus our analysis on meaningful insights rather than getting lost in clicks and views.

Each layer serves a specific purpose, but they work together to give us a complete picture of how users are succeeding (or struggling) with the product.

### From Individual Actions to Customer Journey

Individual events like marking an email as done or snoozing it for later might seem simple, but their real power emerges when we combine them to understand the customer journey. As I explained in the video: &quot;We can have a customer activity, which is first value experience. And so this can be constructed by someone has connected an email account and the first email process, like read, archived, or replied, and then the first inbox zero achieved.&quot;

Creating meaningful combinations requires thinking about:

**Time Windows:**

-   First 7 days for initial value experience
-   Rolling 7-day windows for ongoing engagement
-   6-8 week periods for identifying dormant users

![](/images/posts/build-a-tracking-plan-around-one-core-feature-inbox-zero-in-superhuman/image-26.png)

**Success Patterns:**

-   Achieving inbox zero multiple times
-   Processing a healthy volume of emails
-   Using advanced features regularly

Instead of looking at these actions in isolation, we create rule sets that tell us about the user&apos;s journey. For example, a successful first-week experience might include:

-   Email account connected
-   At least 20 emails processed
-   First inbox zero achieved
-   At least one advanced feature used
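
That first-week rule set might be expressed as a simple predicate over a user&apos;s first-7-day events - a sketch where the event names and the 20-email threshold mirror the list above:

```python
# First-value check: did this user hit all four first-week criteria?
# Event names are illustrative assumptions mirroring the list above.

def first_value_achieved(events):
    """events: list of (event_name, count) tuples from the first 7 days."""
    counts = {}
    for name, n in events:
        counts[name] = counts.get(name, 0) + n
    return (
        counts.get("email_account_connected", 0) >= 1
        and counts.get("email_processed", 0) >= 20
        and counts.get("inbox_zero_achieved", 0) >= 1
        and counts.get("advanced_feature_used", 0) >= 1
    )
```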

These combinations help us understand not just what users are doing, but whether they&apos;re progressing toward mastery of the product. By looking at patterns over time, we can identify:

-   Users who are building strong habits
-   Those who might need additional support
-   Early warning signs of declining engagement

The key is to focus on combinations that indicate progress toward the product&apos;s core value proposition. For Superhuman, this means tracking not just email processing actions, but how these actions contribute to maintaining a consistently clean inbox.

### Defining Success Through Combined Metrics

Once we understand how individual actions combine into patterns, we can define clear metrics for user success. From the video: &quot;Value repeated, we do the same thing. Here I would say this is more like an or combination. Either I maintain my inbox zero, or I process so that many emails per day, or I use advanced features regularly.&quot;

Success metrics fall into three key categories:

**First Value Achievement:**

-   Clear indicators that users &quot;get it&quot;
-   Usually happens in first 7 days
-   Combines basic actions with first inbox zero

**Sustained Value:**

-   Regular inbox zero achievements
-   Consistent email processing volume
-   Ongoing use of advanced features like snooze

**Risk Indicators:**

-   Declining frequency of inbox zero
-   Reduced email processing activity
-   Changes in established usage patterns

What makes these metrics powerful is their flexibility. A user might be successful by maintaining perfect inbox zero, or by processing high email volumes efficiently, or through a combination of both. This matches the reality that different users have different working styles.

We can use these combined metrics to create specific user segments:

-   Active power users
-   Developing users
-   At-risk users needing intervention
-   Dormant users who&apos;ve stopped engaging

The real power comes when we connect these metrics to business outcomes. For instance, dormant users who still have an active subscription are likely to churn soon. This insight allows customer success teams to intervene before subscriptions are canceled, turning metrics into actionable business intelligence.

## Turning Data into Action: Using Zero Inbox Metrics to Drive Customer Engagement

### Creating Actionable User Segments

Collecting Zero Inbox metrics is just the beginning - the real value comes from turning these insights into actionable segments. As I explained in the video: &quot;We can create a segment of users where you have two sets of definitions. Set number one looks back for 100 days, and if this user had a period where they achieved inbox zero five or more times or answered and replied to 250 or more emails, that&apos;s criteria number one.&quot;
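
That two-part segment definition might be sketched as follows; the 100-day lookback and its thresholds follow the quote, the two-week recency window follows the At-Risk criteria below, and the labels are simplified for illustration:

```python
# Sketch of the segment rule: set 1 is activity in a 100-day lookback
# (5 or more inbox zeros OR 250 or more replies); set 2 checks recent
# activity. Labels and thresholds are simplified assumptions.

def classify(zeros_100d, replies_100d, zeros_14d, has_subscription):
    was_active = zeros_100d >= 5 or replies_100d >= 250
    recently_active = zeros_14d >= 1
    if was_active and not recently_active and has_subscription:
        return "at_risk"          # paying, but the habit has lapsed
    if was_active and recently_active:
        return "power_user"
    return "developing"
```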

Key Segments to Monitor:

**Power Users:**

-   Consistently achieve inbox zero (5+ times per week)
-   Process high email volumes efficiently
-   Actively use advanced features like snooze

**Developing Users:**

-   Recently achieved first inbox zero
-   Showing increasing usage patterns
-   Beginning to explore advanced features

**At-Risk Users:**

-   Previously active but showing declining engagement
-   No inbox zero achievements in last 2 weeks
-   Still maintaining active subscription

The power of these segments comes from combining usage patterns with subscription status. For example, a previously active user who still pays for Superhuman but hasn&apos;t achieved inbox zero recently represents a critical intervention opportunity.

![](/images/posts/build-a-tracking-plan-around-one-core-feature-inbox-zero-in-superhuman/image-27.png)

Building these segments requires thinking about:

-   Time windows that make sense for your product
-   Combination of metrics that truly indicate success
-   Clear triggers for customer success teams

The goal isn&apos;t just to create segments - it&apos;s to make them actionable. Each segment should have clear next steps for customer success teams, whether that&apos;s celebrating with power users, encouraging developing users, or reaching out to those at risk.

### Designing Data-Driven Engagement Strategies

Once we have our user segments, we can create targeted engagement strategies based on Zero Inbox patterns. From the transcript: &quot;These users have been active in the near past but have become inactive for a period of six weeks (excluding holidays or other factors). When we have this group of dormant users, we can compare them if they&apos;re still in a subscription.&quot;

Each segment needs its own engagement approach:

**For At-Risk Users:**

-   Proactive outreach before they become dormant
-   Tips based on their specific usage patterns
-   Focus on features they haven&apos;t yet adopted
-   Timing interventions based on their last inbox zero

**For Developing Users:**

-   Celebrate first inbox zero achievements
-   Introduce advanced features at the right moment
-   Share personalized productivity tips
-   Regular check-ins during critical first weeks

**For Power Users:**

-   Share advanced keyboard shortcuts
-   Early access to new features
-   Community building opportunities
-   Recognition of their success patterns

The key is to match the intervention to the user&apos;s journey. For instance, someone who consistently achieved inbox zero but recently stopped might need different support than someone who never quite got there.

Timing these interventions is crucial:

-   Reach out while users are still engaged
-   Align with their typical usage patterns
-   Consider time zones and work schedules
-   Factor in natural usage fluctuations

Remember: The goal isn&apos;t just to drive engagement for its own sake - it&apos;s to help users maintain the Zero Inbox state that makes them successful with the product. Every intervention should tie back to this core value proposition.

### Measuring Intervention Success

Once we&apos;ve implemented our engagement strategies, we need to measure whether they&apos;re actually helping users achieve and maintain Zero Inbox (and whether Zero Inbox really has a strong enough impact on retention and subscription revenue). This creates a feedback loop that helps us refine our approach and maximize impact.

Key Success Metrics:

**Immediate Impact:**

-   Return to inbox zero within 48 hours of intervention
-   Increased email processing activity
-   Adoption of suggested features
-   Response to outreach attempts

**Long-term Effectiveness:**

-   Sustained inbox zero achievements
-   Reduced time between zero inbox states
-   Movement between user segments
-   Subscription retention rates

Each intervention type needs its own success criteria. For example, when reaching out to at-risk users, we might look for:

-   Percentage who return to active status
-   Time to re-engagement
-   Duration of renewed engagement
-   Prevention of subscription cancellations
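
The 48-hour check from the &quot;Immediate Impact&quot; list might be measured like this - a sketch where timestamps are Unix seconds and the function names are assumptions:

```python
# Did a user return to inbox zero within 48 hours of an intervention,
# and what share of interventions succeeded? Names are illustrative.

HOURS_48 = 48 * 3600

def returned_within_48h(intervention_ts, inbox_zero_timestamps):
    """Timestamps in Unix seconds."""
    for ts in inbox_zero_timestamps:
        if ts >= intervention_ts and HOURS_48 >= ts - intervention_ts:
            return True
    return False

def reengagement_rate(interventions):
    """interventions: list of (intervention_ts, [inbox_zero_ts, ...])."""
    if not interventions:
        return 0.0
    hits = sum(1 for t, zeros in interventions
               if returned_within_48h(t, zeros))
    return hits / len(interventions)
```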

The real power comes from combining these metrics to understand what works. We can analyze:

-   Which interventions drive the best results
-   Optimal timing for different user segments
-   Most effective communication channels
-   Impact on overall customer lifetime value

This isn&apos;t just about tracking numbers - it&apos;s about understanding what truly helps users succeed with the product. By continuously measuring and refining our intervention strategies, we create a virtuous cycle where better data leads to more effective engagement, which in turn leads to more successful users.

Remember: Success isn&apos;t just getting users back to inbox zero once - it&apos;s about helping them maintain this state consistently over time. Our measurements should reflect this long-term goal.

This was part 3 of our series &quot;One tracking plan a day&quot;, Season 1: startup tools. Make sure to visit the other parts of the series:

-   [Notion - 27.01.25](https://hipster-data-show.ghost.io/building-notions-analytics-foundation-a-product-first-tracking-plan/)
-   [Slack - 28.01.25](https://hipster-data-show.ghost.io/making-smart-tradeoffs-in-analytics-a-slack-tracking-plan-journey/)
-   Vimcal - 30.01.25
-   Asana - 31.01.25
-   Canva - 03.02.25
-   Loom - 04.02.25
-   Miro - 05.02.25
-   Grammarly - 06.02.25
-   Replit - 07.02.25
-   Hubspot - 10.02.25
-   Stripe - 11.02.25
-   Zoom - 12.02.25
-   Ghost - 13.02.25
-   Amplitude - 17.02.25
-   GSheets - 18.02.25
-   Lightdash - 19.02.25
-   Claude - 20.02.25
-   Reconfigured - 21.02.25

If you&apos;d like to generate your own tracking plans using my book with Claude AI, get your copy here:

This work is based on the chapters about event data design in my book [**Analytics Implementation Workbook**](https://hipster-data-show.ghost.io/the-analytics-implementation-workbook/). There, you can read more details about the D3L framework.

![](/images/posts/build-a-tracking-plan-around-one-core-feature-inbox-zero-in-superhuman/Check-out-the-Book.png)</content:encoded></item><item><title>Short-as-possible tracking plan — Vimcal</title><link>https://timo.space/blog/create-a-tracking-plan-for-a-tool-that-you-want-to-use-as-short-as-possible-vimcal-tracking-plan/</link><guid isPermaLink="true">https://timo.space/blog/create-a-tracking-plan-for-a-tool-that-you-want-to-use-as-short-as-possible-vimcal-tracking-plan/</guid><description>Create a tracking plan for a tool that you want to use as short as possible - Vimcal tracking plan</description><pubDate>Wed, 29 Jan 2025 00:00:00 GMT</pubDate><content:encoded>You&apos;ve probably been there: staring at your analytics dashboard, drowning in a sea of click events, page views, and user actions, yet somehow still unable to answer the simple question &quot;are people actually using our product?&quot; It&apos;s a common trap in product analytics - tracking everything but understanding nothing. This was exactly where I found myself years ago, watching teams build elaborate tracking plans that collected every possible interaction but failed to tell the product&apos;s story.

Today, I&apos;ll show you a different approach to analytics design - one that puts humans first, focuses on meaningful insights, and evolves naturally with your product. Using Vimcal, a calendar app for people with too many meetings, as our example, we&apos;ll explore how to build analytics that tell your product&apos;s story instead of just logging its clicks.

In this content series - season 1, I create a tracking plan for a typical start-up tool every day for four weeks (I take a break on the weekend), so 20 in total. This is the fourth one: **Vimcal**. Here is the season overview:

![](/images/posts/create-a-tracking-plan-for-a-tool-that-you-want-to-use-as-short-as-possible-vimcal-tracking-plan/a-Tracking-Plan-a-Day-vimcal.png)

## From Clicks to Customer Journey: Why Traditional Event Tracking Falls Short

### The Problem with Click-Based Analytics

When teams first dive into product analytics, they often start in the most obvious place: tracking clicks. It&apos;s a natural instinct - open up the application, identify where users can click or interact, and start collecting that data. After all, isn&apos;t that what users do? They click through our carefully designed interfaces.

But here&apos;s the trap: clicking isn&apos;t the same as using.

Let&apos;s look at Vimcal, a calendar app, as an example. When a user creates a new calendar event, they might:

-   Click the &quot;+&quot; button in the interface
-   Use a keyboard shortcut (Vimcal is big on keyboard shortcuts)
-   Accept an invitation from someone else
-   Have an event created automatically through a booking link



Four different interaction paths, one actual product usage: creating an event. If we only tracked clicks, we&apos;d miss most of the story.
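
Those four interaction paths might collapse into a single product-layer event like this (a sketch; the source names are assumptions based on the list above):

```python
# Four interaction paths, one product event: whatever the trigger,
# the product layer records a single "event_created". Source names
# are illustrative assumptions.

INTERACTION_SOURCES = {"plus_button", "keyboard_shortcut",
                       "invitation_accepted", "booking_link"}

def record_event_created(track, source):
    if source not in INTERACTION_SOURCES:
        raise ValueError(f"unknown source: {source}")
    # the interaction detail is demoted to a property, not its own event
    track("event_created", {"source": source})
```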

&quot;Clicking is part of using to some degree, but using a product is two or three abstraction layers up. We want to look at using a product from a high level,&quot; is how I explain it to teams when discussing analytics implementation.

The problems with click-based analytics compound quickly:

-   Data becomes noisy and hard to analyze
-   Different paths to the same outcome create fragmented data
-   Teams struggle to understand actual product usage
-   Reports become focused on UI optimization instead of product value

This becomes particularly painful when trying to answer fundamental questions about product adoption. When a product manager asks &quot;How many users are successfully using our calendar features?&quot; they don&apos;t want to piece together a puzzle of click events - they want to understand product usage at a higher level.

The challenge, then, isn&apos;t to track more clicks - it&apos;s to track meaningful product usage. This requires a different approach to analytics design, one that starts with understanding what &quot;using the product&quot; actually means.

This work is based on the chapters about event data design in my book [**Analytics Implementation Workbook**](https://hipster-data-show.ghost.io/the-analytics-implementation-workbook/). There, you can read more details about the D3L framework.

![](/images/posts/create-a-tracking-plan-for-a-tool-that-you-want-to-use-as-short-as-possible-vimcal-tracking-plan/Check-out-the-Book.png)

### The Power of Layered Thinking

Moving beyond clicks requires a structured way to think about analytics. This is where the three-layer framework comes in - a way to separate different types of events based on their level of abstraction and business value.

![](/images/posts/create-a-tracking-plan-for-a-tool-that-you-want-to-use-as-short-as-possible-vimcal-tracking-plan/image-15.png)

Let&apos;s break down these layers using our calendar app example:

**Interaction Layer**

-   Captures raw interactions (clicks, keyboard shortcuts, etc.)
-   Lives at the bottom of the stack
-   Perfect for UI/UX analysis but kept separate from product analytics
-   Example: tracking when someone uses &quot;cmd+e&quot; to create an event

**Product Layer**

-   Records actual product usage
-   Focuses on core features and jobs to be done
-   Combines multiple interaction paths into meaningful events
-   Example: &quot;event\_created&quot; regardless of how it happened

**Customer Layer**

-   Connects product usage to customer success
-   Maps user journeys from first value to power usage
-   Helps identify patterns that lead to long-term adoption
-   Example: detecting when someone becomes an &quot;active user&quot; based on their event creation patterns

&quot;We track specific activities from the product layer, and when they combine, we can say, &apos;Okay, this user was active (based on these four different product events) in the last seven days.&apos; This is how the layers work together to tell the full story.&quot;

The magic happens in how these layers interact. A single customer-layer event like &quot;user\_activated&quot; might combine multiple product-layer events: creating their first calendar event, sending their first invitation, and setting up their first booking link. Each of those product events might happen through various interactions.
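
The &quot;user\_activated&quot; example might be derived like this - a sketch assuming three illustrative product-layer event names:

```python
# Customer-layer event derived from product-layer events: a user is
# "activated" once all three first-value product events have occurred.
# Event names are illustrative assumptions.

REQUIRED = {"event_created", "invitation_sent", "booking_link_created"}

def user_activated(product_events):
    """product_events: iterable of product-layer event names."""
    return REQUIRED.issubset(set(product_events))
```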

This layered approach solves multiple problems:

-   Keeps interaction data available but separate
-   Makes product usage patterns clear and analyzable
-   Enables meaningful customer journey tracking
-   Creates a clear hierarchy for organizing analytics

The result? Analytics that tell the story of product usage rather than just recording clicks.

You can check out the complete design on the Miro Board:

![](/images/posts/create-a-tracking-plan-for-a-tool-that-you-want-to-use-as-short-as-possible-vimcal-tracking-plan/Check-out-the-Miro-Board.png)

If you want the nerdy version, you can check out the JSON schemas here:

![](/images/posts/create-a-tracking-plan-for-a-tool-that-you-want-to-use-as-short-as-possible-vimcal-tracking-plan/Check-out-the-Repository---Notion.png)

### Building a Foundation for Insights

To make analytics truly valuable, we need to organize our events in a way that makes sense not just for data collection, but for the people who&apos;ll actually use this data. This is where the entity-activity model comes in.

For Vimcal, our core entities are the building blocks of the product:

-   Calendar (the container for everything)
-   Event (the actual meetings/appointments)
-   Account (the user&apos;s home base)
-   Booking Link (the viral growth feature)

![](/images/posts/create-a-tracking-plan-for-a-tool-that-you-want-to-use-as-short-as-possible-vimcal-tracking-plan/image-28.png)

&quot;An entity for me should be a core building block of a product. My first test is usually: when I remove this entity, is the product still usable?&quot; This simple test helps prevent overcomplicating your analytics structure.

Each entity has activities - created, deleted, shared, etc. But here&apos;s where many teams go wrong: they track too much. Not every update or change needs to be an event. When a product manager checks the analytics tool twice a month, scrolling through 70 different events isn&apos;t a good experience.

Instead, focus on activities that indicate meaningful product usage:

-   First value moments (creating and sending first calendar invite)
-   Adoption signals (creating five events in seven days)
-   Power user behavior (regular booking link usage)
-   Risk indicators (no activity in last 30 days)

The goal isn&apos;t to track everything - it&apos;s to track the right things. Properties can fill in the details without cluttering your main event structure. For instance, instead of creating separate events for Zoom vs Google Meet calendar events, use a &quot;video\_provider&quot; property.
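
The &quot;video\_provider&quot; suggestion might look like this in practice - one event, with the provider demoted to a property (a sketch with assumed names):

```python
# Instead of separate Zoom and Google Meet events, one event with a
# video_provider property keeps the event list short. Names assumed.

def event_created(track, video_provider=None):
    props = {}
    if video_provider is not None:
        props["video_provider"] = video_provider  # e.g. "zoom", "google_meet"
    track("event_created", props)
```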

This foundation makes it possible to answer real questions about your product: Are users getting value? How quickly do they adopt key features? When do they become power users? Most importantly, it makes these answers accessible to everyone in your organization, not just the analytics team.

## Designing Analytics for Humans: The Art of Less is More

### The Once-a-Month User Problem

Here&apos;s a reality check: most people in your company aren&apos;t analytics experts. While your data team lives and breathes these tools daily, product managers, marketers, and other stakeholders might check your analytics setup once or twice a month.

Think about that for a moment. These occasional users:

-   Don&apos;t remember where things are located
-   Have forgotten what events mean
-   Need to relearn the interface each time
-   Are often trying to answer urgent questions quickly

&quot;When I build an analytics setup for a data team, it&apos;s often the team that creates the setup for themselves, and it works pretty okay because they define it how they understand things. But everyone else checks an analytics tool maybe twice a month. Their core job is something totally different. They live in totally different tools.&quot;

This creates a fundamental challenge. The people who build analytics systems (data teams) have fundamentally different needs from the people who use them occasionally (everyone else). It&apos;s like building a professional video editing suite when most people just want to trim a clip.

Consider a product manager trying to understand adoption of a new feature. They don&apos;t want to:

-   Scroll through 70 different events
-   Remember complex naming conventions
-   Piece together multiple data points
-   Decode technical properties

They want clear, accessible insights. If they can&apos;t find what they need in 5-10 minutes, they&apos;ll either give up or, worse, make decisions without data.

This is why designing for the once-a-month user is crucial. Every additional event, every complex naming scheme, every nested property adds cognitive load for these occasional users. The solution isn&apos;t to dumb things down - it&apos;s to make them intelligently accessible.

### Creating a Clean Analytics Experience

Building a clean analytics experience starts with a counterintuitive approach: saying no to tracking more things. Instead of tracking everything possible, we need to focus on what matters and how to organize it in a way that makes immediate sense to anyone opening the analytics tool.

Let&apos;s look at how this played out in Vimcal&apos;s case. Initially, we considered tracking integrations (Zoom, Google Meet) as their own entity. But we asked ourselves: does this deserve top-level visibility in our analytics? The answer led us to move it to properties instead.

Three key principles for keeping analytics clean:

**Start with Clear Entities**

-   Use names that match how people think about your product
-   Keep the number of entities small (4-6 is often enough)
-   Make them memorable and immediately understandable
-   Example: Calendar, Event, Account, Booking Link

![](/images/posts/create-a-tracking-plan-for-a-tool-that-you-want-to-use-as-short-as-possible-vimcal-tracking-plan/image-30.png)
*Entities are the foundations for the product layer events and help to introduce a clear structure*

**Make Events Tell a Story**

-   Name events from a user&apos;s perspective
-   Avoid technical terms like &quot;updated&quot; or &quot;modified&quot;
-   Focus on meaningful actions that indicate product usage
-   Keep your total event count manageable

**Use Properties for Details**

-   Hide complexity in properties rather than creating new events
-   Add context without cluttering the main interface
-   Make properties discoverable when needed
-   Example: Adding &quot;video\_provider&quot; instead of separate Zoom/Meet events
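To make the last principle concrete, here is a minimal sketch of what the single-event-plus-property shape looks like at the instrumentation level. The `track` helper is a hypothetical stand-in, not any specific vendor SDK, and the property names are illustrative:

```python
def track(event_name, properties):
    """Stand-in for an analytics SDK call; returns the payload it would send."""
    return {"event": event_name, "properties": properties}

# One event carries the integration as a property, instead of
# separate "Zoom Meeting Created" / "Meet Meeting Created" events.
payload = track("Calendar Event Created", {
    "video_provider": "zoom",      # property instead of a dedicated event
    "has_attendees": True,
    "source": "keyboard_shortcut",
})
```

Adding a new provider later means adding a new property value, not a new event, so the event list stays stable.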

&quot;A good data user experience would make it pretty easy for them to immediately find the things they&apos;re looking for during those two times they visit per month.&quot;

Think of it like organizing a kitchen. The most-used items should be easily accessible, while specialty tools can live in drawers. Everything has its place, but not everything needs to be on the counter.

### Balancing Depth and Accessibility

The secret to great analytics design isn&apos;t choosing between deep insights and accessibility - it&apos;s knowing where each piece belongs. Think of it like an iceberg: the most important data points sit visible above the water, with deeper complexity available below when needed.

This is where our layered approach really shines:

**Surface Level (Highly Accessible)**

-   Core product metrics everyone needs
-   Well-named entities and events
-   Clear customer journey states
-   Example: &quot;Active Users&quot; instead of &quot;Users who performed 5+ actions in 7 days&quot;

**Middle Layer (Moderate Complexity)**

-   Properties that add context to events
-   Filtered views for specific use cases
-   Combined metrics for deeper understanding
-   Example: Breaking down calendar events by source or type

**Deep Dive Layer (Power Users)**

-   Interaction events for detailed analysis
-   Raw data for custom exploration
-   Technical properties for debugging
-   Example: Tracking every keyboard shortcut usage

&quot;The implementation effort for properties is usually less than everything else. Also, I don&apos;t decrease the user experience because properties are usually bound to an event. When I go to an analytics tool, I pick an event, then I see the properties.&quot;

This approach allows your analytics to evolve naturally:

-   Start with the essentials everyone needs
-   Add properties to existing events rather than creating new ones
-   Keep interaction tracking separate from product analytics
-   Let power users dig deeper when needed

Remember: Just because you can track something doesn&apos;t mean it belongs in your main analytics interface. The goal is to make data accessible while keeping the path to deeper insights clear and uncluttered.

The best analytics implementations grow like a tree - strong trunk of core metrics, with branches of additional insight available to those who need to climb higher.

## Beyond Implementation: Making Analytics Tell Your Product&apos;s Story

### Connecting Analytics to Product Promise

When Vimcal positions itself as &quot;the calendar for people with too many meetings,&quot; it&apos;s not just a marketing tagline - it should be the north star for our analytics design. But here&apos;s the twist: measuring this isn&apos;t as simple as counting meetings.

The real promise isn&apos;t about handling many meetings - it&apos;s about making meeting management efficient. This subtle but crucial distinction shapes our entire analytics approach.

Consider what efficiency looks like in calendar management:

-   Meetings getting scheduled without manual intervention
-   Quick creation of events through shortcuts
-   Automated handling of video conferencing links
-   Minimal back-and-forth for scheduling

&quot;If 70% of the events get booked by using a booking link, then the person organizing the calendar doesn&apos;t have to touch the tool at all. They just provide the link and the meetings get booked by themselves. By analyzing that, we can see that this person has a lot of meetings, but 80% are managed automatically.&quot;

Your analytics should measure your product&apos;s core promise, not just its features. For Vimcal, this means tracking:

-   Ratio of self-scheduled vs manually created meetings
-   Time saved through booking links
-   Usage of efficiency features (keyboard shortcuts, templates)
-   Automation adoption rates
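The ratio in the quote above is simple to compute. A minimal sketch (function and field names are assumptions for illustration):

```python
def automation_rate(booked_via_link, created_manually):
    """Share of meetings that required no manual scheduling by the organizer."""
    total = booked_via_link + created_manually
    return booked_via_link / total if total else 0.0

# A user with 80 link-booked and 20 manual meetings is highly automated
# even though their raw meeting count is high.
rate = automation_rate(80, 20)  # 0.8
```

A high meeting count with a high automation rate is a success signal; a high meeting count with a low rate is the &quot;Needs Attention&quot; case discussed below.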

This changes how we think about success metrics. Instead of celebrating when users create more calendar events (which could actually indicate inefficiency), we celebrate when they spend less time managing their calendar while handling the same number of meetings.

The key is to align your tracking with your promise. Don&apos;t just measure what users do - measure how well you&apos;re delivering on your core value proposition.

### Defining Success States

Raw event data tells you what happened, but success states tell you what it means. Instead of drowning in individual events, we need to define clear states that represent meaningful moments in the customer journey.

For Vimcal, success states flow naturally from initial engagement to power usage:

**First Value Moment**

-   Created first calendar event AND
-   Sent first invitation with attendees OR
-   Created first booking link
-   Must happen within first 7 days

**Adoption Success**

-   Five events created in seven days
-   Booking links actively shared (not just created)
-   Regular use of efficiency features
-   Consistent weekly engagement

**Power User Status**

-   Managing majority of meetings through booking links
-   High usage of keyboard shortcuts
-   Multiple calendars actively managed
-   Steady weekly active days
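The First Value Moment definition above mixes AND and OR, which is easy to get wrong in an implementation. Here is one reading of that grouping as a sketch - created an event AND (sent an invite with attendees OR created a booking link), all within the first 7 days; the exact grouping is my interpretation, not a spec:

```python
def reached_first_value(days_since_signup, created_event,
                        sent_invite_with_attendees, created_booking_link):
    """First Value Moment: event created AND (invite with attendees OR
    booking link created), within the first 7 days."""
    if days_since_signup > 7:
        return False
    return created_event and (sent_invite_with_attendees or created_booking_link)
```

Writing the state down as code like this forces the team to resolve the ambiguity once, instead of every analyst resolving it differently.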

&quot;The nice thing about calendars is they can use the old Hotmail trick - where you can put a small &apos;created by Vimcal&apos; note into the invite. So if you share an invite with other people, it&apos;s an opportunity for Vimcal to trigger someone to check out what this tool is.&quot;

Rather than using vague terms like &quot;churn prediction,&quot; we opt for clear, actionable states:

-   At Risk: Active in past but no engagement in last 2-3 weeks
-   Dormant: No meaningful activity in 6-8 weeks
-   Needs Attention: High meeting count but low automation usage
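These states translate directly into a classifier. A minimal sketch - the 3- and 6-week cutoffs are illustrative picks from the 2-3 and 6-8 week ranges above, and the `never_activated` label is my addition for users with no prior activity:

```python
def lifecycle_state(weeks_since_last_activity, was_active_before):
    """Map inactivity to the states above. Cutoffs are assumptions:
    3 weeks for At Risk, 6 weeks for Dormant."""
    if not was_active_before:
        return "never_activated"  # not one of the states above; assumed label
    if weeks_since_last_activity >= 6:
        return "dormant"
    if weeks_since_last_activity >= 3:
        return "at_risk"
    return "active"
```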

The key is choosing time windows that make sense for your product. For a calendar tool, weekly patterns matter more than daily ones. For other products, you might need different windows. Let your product&apos;s natural usage patterns guide these decisions.

These states become your compass for product decisions, customer success interventions, and growth strategies.

### Evolution, Not Revolution

Analytics implementation isn&apos;t a one-and-done project - it&apos;s a living system that grows with your product. The trick is evolving it thoughtfully without breaking what works or overwhelming your users.

Start lean and grow methodically:

-   Begin with core entities and essential events
-   Add properties to existing events before creating new ones
-   Keep your interaction layer separate from product analytics
-   Let real questions drive your expansion

&quot;Interestingly, the longer I work on a setup, I usually tend to add more properties. This is because I get ideas about specific kinds of analyses we could do. The implementation effort for properties is usually less than everything else.&quot;

Think about Vimcal&apos;s video conferencing feature. Instead of creating new events when users integrate with Zoom or Google Meet, we added a &quot;video\_provider&quot; property to existing calendar events. This gives us the data without cluttering our event list.

**When to expand your analytics:**

-   New core product features launch
-   Teams identify missing insights
-   Usage patterns change significantly
-   Success metrics need refinement

**When to refine instead:**

-   Edge cases emerge
-   Teams need more context
-   Existing events need clarification
-   Analysis reveals gaps

Remember: Every addition creates maintenance overhead and potential cognitive load for users. The goal isn&apos;t to track everything - it&apos;s to track the right things in the right way.

Think of your analytics like a garden. You start with strong foundational plants (core entities), add complementary elements (properties) as needed, and occasionally prune what&apos;s not working. It&apos;s about steady growth, not constant upheaval.

Building meaningful analytics isn&apos;t about tracking more - it&apos;s about tracking smarter. Through our journey with Vimcal, we&apos;ve seen how moving beyond click-tracking to a layered approach helps us tell better product stories. Remember: your analytics should be as thoughtfully designed as your product itself. Start with your product&apos;s core promise, build a foundation that occasional users can actually use, and let your implementation evolve naturally as you learn.

Most importantly, resist the urge to track everything. The best analytics aren&apos;t the ones with the most events - they&apos;re the ones that help teams make better decisions. Whether you&apos;re building a calendar app or any other product, focus on making your analytics tell a story that everyone in your organization can understand and act upon.

This was part 4 of our series &quot;One tracking plan a day,&quot; Season 1 - startup tools. Make sure to visit the other parts of the series:

-   [Notion - 27.01.25](https://hipster-data-show.ghost.io/building-notions-analytics-foundation-a-product-first-tracking-plan/)
-   [Slack - 28.01.25](https://hipster-data-show.ghost.io/making-smart-tradeoffs-in-analytics-a-slack-tracking-plan-journey/)
-   [Superhuman](https://hipster-data-show.ghost.io/build-a-tracking-plan-around-one-core-feature-inbox-zero-in-superhuman/) - 29.01.25
-   Asana - 31.01.25
-   Canva - 03.02.25
-   Loom - 04.02.25
-   Miro - 05.02.25
-   Grammarly - 06.02.25
-   Replit - 07.02.25
-   Hubspot - 10.02.25
-   Stripe - 11.02.25
-   Zoom - 12.02.25
-   Ghost - 13.02.25
-   Amplitude - 17.02.25
-   GSheets - 18.02.25
-   Lightdash - 19.02.25
-   Claude - 20.02.25
-   Reconfigured - 21.02.25

If you&apos;d like to generate your own tracking plans using my book with Claude AI, get your copy here:

This work is based on the chapters about event data design in my book [**Analytics Implementation Workbook**](https://hipster-data-show.ghost.io/the-analytics-implementation-workbook/). There, you can read more details about the D3L framework.

![](/images/posts/create-a-tracking-plan-for-a-tool-that-you-want-to-use-as-short-as-possible-vimcal-tracking-plan/Check-out-the-Book.png)</content:encoded></item><item><title>Smart Tradeoffs in Analytics — Slack Tracking Plan</title><link>https://timo.space/blog/making-smart-tradeoffs-in-analytics-a-slack-tracking-plan-journey/</link><guid isPermaLink="true">https://timo.space/blog/making-smart-tradeoffs-in-analytics-a-slack-tracking-plan-journey/</guid><description>In this content series - season 1, I create a tracking plan for a typical start-up tool every day for four weeks (I take a break on the weekend), so 20 in total. This is the second one: Slack.</description><pubDate>Tue, 28 Jan 2025 00:00:00 GMT</pubDate><content:encoded>In this content series - season 1, I create a tracking plan for a typical start-up tool every day for four weeks (I take a break on the weekend), so 20 in total. This is the second one: Slack. Here is the season overview:

![](/images/posts/making-smart-tradeoffs-in-analytics-a-slack-tracking-plan-journey/image-22.png)

Every click, every interaction, every moment of user engagement whispers a story. But what happens when those whispers become a deafening roar of data? Most teams believe that more tracking means more insights. The truth is brutally simple: tracking everything often means understanding nothing.

In this comprehensive guide, we&apos;ll deconstruct the mythology of &quot;track everything&quot; and reveal a strategic approach to analytics that prioritizes clarity, impact, and actionable insights. Drawing from real-world experience implementing tracking plans for complex products like **Slack**, we&apos;ll show you how thoughtful, minimalist tracking can unlock deeper understanding of user behavior than mountains of disconnected data points.

Don&apos;t miss the upcoming episodes of this series and sign up for free to my newsletter (and support my work):

[Get all future episodes of this series](#/portal/signup/free)

## Why Fewer Events Lead to Better Analytics: The Core vs. Nice-to-Have Decision

### The Paradox of Event Tracking

&quot;Let&apos;s track everything - we might need it later.&quot;

If you&apos;ve ever been involved in setting up analytics, you&apos;ve probably heard this phrase. It seems logical at first: data storage is cheap, and you never know what insights you might want to uncover in the future. Why not capture every click, hover, and interaction just in case?

But here&apos;s the paradox: tracking everything often leads to understanding nothing.

Let me illustrate this with a real scenario I encountered.

A product team had diligently tracked every possible interaction in their app - from button hovers to modal closes, resulting in over 200 unique events. When they wanted to understand how users were progressing through their core workflow, they spent days just trying to identify which events were relevant. Some events had similar names but tracked slightly different things. Others were deprecated but still collecting data. The signal was lost in the noise.

This &quot;track everything&quot; approach creates what I call analytics debt. Similar to technical debt, it accumulates silently but impacts your ability to move quickly and make decisions. Here&apos;s how:

1.  **Data Quality Suffers**  
    When you track everything, you spread your quality assurance thin. Instead of thoroughly validating a core set of events, you end up with hundreds of events that might be tracking incorrectly. I&apos;ve seen teams spend weeks debugging inconsistencies between similar events, only to realize they could have used a single, well-designed event instead.
2.  **Analysis Becomes Harder, Not Easier**  
    More data doesn&apos;t automatically mean better insights. When an analyst needs to choose between &quot;MessageSent&quot;, &quot;MessageCreated&quot;, &quot;MessagePublished&quot;, and &quot;MessagePosted&quot;, they waste time figuring out which one to use. Even worse, different team members might choose different events for the same analysis, leading to inconsistent results.
3.  **Implementation Costs Stack Up**  
    Every event you track needs to be maintained. When your frontend changes, someone needs to update the tracking. When your analytics platform changes, someone needs to migrate the events. When new team members join, someone needs to document what each event means. These costs multiply with every event you add.
4.  **Actual Money Gets Spent**  
    Most analytics platforms charge based on event volume or have limits on events per user. When you&apos;re tracking everything, costs can spiral quickly. I&apos;ve seen teams forced to sample their data or switch platforms because their event volume became unsustainable - all while only actively using a small fraction of the events they were tracking.

The solution isn&apos;t to track less for the sake of tracking less. Instead, it&apos;s about being intentional with what you track. In the next section, we&apos;ll look at a framework for making these decisions, using Slack as our example. You&apos;ll see how we can get more value from fewer, well-chosen events than from tracking everything that moves.

Remember: The goal of analytics isn&apos;t to collect data - it&apos;s to drive insights and decisions. Sometimes, less data, thoughtfully collected, gets you there faster.

This work is based on the chapters about event data design in my book [**Analytics Implementation Workbook**](https://hipster-data-show.ghost.io/the-analytics-implementation-workbook/). There, you can read more details about the D3L framework.

![](/images/posts/making-smart-tradeoffs-in-analytics-a-slack-tracking-plan-journey/Check-out-the-Book.png)

### The Core Entity Test

When designing a tracking plan, the first and most crucial decision is determining what actually matters to your product. Just as architects need to identify load-bearing walls before any renovation, we need to identify our core entities before building our analytics.

Watch the video if you&apos;d like to see my decision process in real time:



The Core Entity Test is deceptively simple: &quot;If I remove this entity, would the product still fulfill its core purpose?&quot;

Let&apos;s apply this to **Slack** to see how it works in practice:

**Workspace Entity**

-   Question: Could Slack function without workspaces?
-   Answer: No. Workspaces are where all communication happens.
-   Decision: Core entity ✅
-   Why: Without workspaces, there&apos;s no container for channels, messages, or user organization.

**Messages Entity**

-   Question: Could Slack work without messages?
-   Answer: Absolutely not. Messages are the primary form of communication.
-   Decision: Core entity ✅
-   Why: Messages represent the fundamental unit of value in Slack.

**Notifications Entity**

-   Question: Could Slack work without notifications?
-   Answer: Yes, but with reduced utility.
-   Decision: Nice-to-have ❌
-   Caveat: This was a close call. While notifications are important for engagement, the core communication function remains intact without them.

![](/images/posts/making-smart-tradeoffs-in-analytics-a-slack-tracking-plan-journey/image-23.png)

**Tools/Apps Entity**

-   Question: Could Slack work without third-party integrations?
-   Answer: Yes, the core communication would be unaffected.
-   Decision: Nice-to-have ❌
-   Why: While tools add value, they&apos;re clearly an extension of the core product.

But the Core Entity Test isn&apos;t just a yes/no question. Sometimes entities live in a gray area, and that&apos;s where additional criteria come into play:

1.  Strategic Importance  
    Even if something isn&apos;t strictly core, it might be strategically crucial. For Slack, integrations might fall into this category if they&apos;re a key differentiator from competitors.
2.  Current Focus  
    If your team is spending the next six months improving notifications, it might make sense to include them as a core entity despite failing the strict test.
3.  Analysis Requirements  
    Sometimes non-core entities need tracking because they significantly impact core metrics. If notification engagement strongly correlates with workspace activity, you might want to track them anyway.

**The Power of Saying No**  
The real value of the Core Entity Test isn&apos;t just in what it includes, but in what it excludes. Every entity you decide not to track:

-   Reduces implementation complexity
-   Improves data clarity
-   Lowers maintenance burden
-   Makes analysis more straightforward

The key is to start with a solid foundation of core entities and expand thoughtfully when needed. It&apos;s much easier to add tracking later than to remove it once it&apos;s embedded in your analytics culture.

Remember: Just because something is valuable doesn&apos;t mean it needs to be a core entity. Often, these features can be tracked as properties of core entities or as specific events without elevating them to entity status.

In the next section, we&apos;ll look at how to handle important features that don&apos;t make the cut as core entities, ensuring we don&apos;t lose valuable insights while maintaining our clean, focused tracking plan.

Don&apos;t miss the upcoming episodes of this series and sign up for free to my newsletter (and support my work):

[Get all future episodes of this series](#/portal/signup/free)

### Converting Events to Properties

One of the most powerful ways to reduce event sprawl is to recognize when an event should actually be a property. Here&apos;s the key principle: If something is a variation or attribute of an existing event, it&apos;s probably a property.

Let&apos;s look at some real examples from our Slack tracking plan:

Instead of separate events:

```
Message Sent
Message Edited
Message WithAttachment
Message WithMention
```

We use a single event with properties:

```
Message Sent
  properties:
    wasEdited: boolean
    hasAttachment: boolean
    hasMention: boolean
```

Why this works better:

-   Analysis becomes simpler (&quot;Show me all messages where wasEdited = true&quot;)
-   Event volume stays manageable
-   New variations can be added without creating new events
-   Easier to maintain and document

![](/images/posts/making-smart-tradeoffs-in-analytics-a-slack-tracking-plan-journey/image-13.png)

A simple rule of thumb: If you find yourself creating events that are variations of the same action (like MessageSent), stop and ask if they should be properties instead.

This approach isn&apos;t just about cleaning up your tracking plan - it fundamentally improves your ability to analyze user behavior by making patterns easier to spot and compare.

### Real-World Decision Examples

Let&apos;s dive into two of our most challenging decisions from the Slack tracking plan. These examples demonstrate how theoretical principles meet real-world constraints and requirements.

**The Notifications Decision**

```
Initial instinct: Core entity
Final decision: Nice-to-have
Key metrics impacted: User engagement, response time, workspace activity
```

This was our closest call. Here&apos;s why:

For notifications:

-   Critical for user engagement
-   Drive return visits
-   Key feature for power users
-   Important for workspace activity

Against notifications as entity:

-   High event volume (cost implications)
-   Core messaging works without them
-   Can be tracked through other means
-   Complex implementation across platforms

The Solution: Instead of creating a notifications entity, we track notification impact through:

-   Message read rates
-   Response times
-   Return visit patterns
-   User engagement metrics

This gives us the insights we need without the complexity of a full notification tracking system. We might still add them later.

**The Tools/Apps Decision**

```
Initial instinct: Separate entity
Final decision: Property of workspace
Key metrics impacted: Workspace value, user engagement
```

This decision was clearer but offers important lessons:

Initial arguments for tools as entity:

-   Revenue stream for Slack
-   Differentiator from competitors
-   Important for workspace stickiness
-   Complex usage patterns

Why we changed course:

-   Tools extend workspaces, they don&apos;t exist independently
-   Most tool metrics tie back to workspace activity
-   Simpler to analyze as workspace properties
-   Easier to maintain and scale

The Solution:

```
WorkspaceUpdated
  properties:
    installedTools: string[]
    activeTools: string[]
    totalTools: number
```

Plus tool-specific properties on relevant message events:

```
MessageSent
  properties:
    sentByTool: boolean
    toolName: string
```

Key Takeaway: Both decisions prioritized simplicity and maintainability while preserving our ability to answer key business questions. They demonstrate that good tracking plan design isn&apos;t about tracking everything possible, but about tracking the right things in the right way.

![](/images/posts/making-smart-tradeoffs-in-analytics-a-slack-tracking-plan-journey/image-24.png)

Remember: The goal isn&apos;t to capture every possible data point, but to create a sustainable system that provides actionable insights while remaining manageable and cost-effective.

Don&apos;t miss the upcoming episodes of this series and sign up for free to my newsletter (and support my work):

[Get all future episodes of this series](#/portal/signup/free)

### Benefits of Starting Small

The natural instinct in analytics is to cast a wide net. But after years of implementing tracking plans, I&apos;ve consistently seen better outcomes from teams that start small and grow thoughtfully.

**Better Data Quality**

-   Fewer events mean more attention per event
-   Higher confidence in analysis results
-   Less time spent debugging tracking issues

**Faster Time to Insight**

```
Instead of:
Searching through 150+ events → Asking developers about differences → Finally starting analysis

You get:
Find relevant event from 30 clear options → Begin analysis immediately
```

**Easier Team Adoption**

-   New team members understand the system quickly
-   Clear, manageable documentation
-   Consistent analysis across the team

![](/images/posts/making-smart-tradeoffs-in-analytics-a-slack-tracking-plan-journey/image-14.png)

**The Compound Effect**  
Starting small doesn&apos;t mean staying small forever. It means building a solid foundation that you can confidently build upon. When you do need to expand, you&apos;ll do so with clear understanding and proven needs, not theoretical possibilities.

Remember: You can always add more tracking later, but removing or fixing problematic tracking is much harder. Start with what you know you need, not what you think you might need.

## The Final Blueprint: A Production-Ready Tracking Plan for Slack

### Understanding the Double Three-Layer Framework

Analytics isn&apos;t just about collecting data—it&apos;s about collecting the right data that tells a meaningful story about your product&apos;s success. The double three-layer framework is a strategic approach to tracking that transforms raw data into actionable insights.

**What is the Double Three-Layer Framework?**

![](/images/posts/making-smart-tradeoffs-in-analytics-a-slack-tracking-plan-journey/image-15.png)

Think of this framework as a carefully designed lens for understanding your product&apos;s performance. It consists of three distinct layers:

1.  **Product Layer**: The foundational level that captures the core mechanics of your product
2.  **Customer Layer**: A strategic view of how users progress through your product
3.  **Interaction Layer**: A flexible catch-all for granular user interactions

**Why Layers Matter**

Traditional analytics often fall into a common trap: tracking everything possible. The result? A complex, noisy dataset that obscures more than it reveals. The three-layer framework solves this by focusing on meaningful, strategic tracking.

**Product Layer: The Building Blocks** Here, you define the core entities and activities that make your product unique. In Slack&apos;s case, this means tracking workspaces, channels, messages, and users—but only in ways that truly matter to understanding product usage.

**Customer Layer: User Journeys** This layer transforms product events into meaningful user progression. Instead of raw clicks, you&apos;re measuring meaningful milestones like &quot;initial value achieved&quot; or &quot;risk of churn.&quot;

**Interaction Layer: Flexible Insights** A safety valve for those specific UX questions without cluttering your primary tracking. Want to know how often users click a specific icon? The interaction layer provides a clean, minimal way to capture these insights.

**The Philosophy Behind the Framework**

The goal isn&apos;t to track everything, but to track what matters. Each layer serves a specific purpose:

-   Understand product mechanics
-   Map user success
-   Provide flexible insights

By treating your tracking plan as a thoughtful design process rather than a technical checklist, you create an analytics infrastructure that truly serves your business strategy.

### Product Layer Deep Dive

The product layer is where we translate product mechanics into trackable, meaningful data. Here&apos;s a comprehensive look at how we approach tracking Slack&apos;s core product entities.

**Workspace: The Foundation**

![](/images/posts/making-smart-tradeoffs-in-analytics-a-slack-tracking-plan-journey/image-16.png)

A workspace isn&apos;t just a container—it&apos;s the primary unit of collaboration. Key tracking considerations:

-   **Essential Properties**:
    -   Workspace ID (always track)
    -   Creation date
    -   Status (active/inactive)
    -   Domain (for account-based insights)
-   **Lifecycle Activities**:
    -   Workspace created
    -   Workspace deleted

The critical decision: Skip &quot;workspace updated&quot; unless there&apos;s a compelling reason. Sometimes, the absence of an event is a strategic choice.

**Channels: Collaboration Containers**

![](/images/posts/making-smart-tradeoffs-in-analytics-a-slack-tracking-plan-journey/image-17.png)

Channels represent specific collaboration spaces within a workspace:

-   **Key Properties**:
    -   Channel ID
    -   Type (public/private)
    -   Associated workspace ID
-   **Lifecycle Tracking**:
    -   Channel created
    -   Channel joined
    -   Channel left
    -   Channel archived

**Messages: The Heartbeat of Communication**

![](/images/posts/making-smart-tradeoffs-in-analytics-a-slack-tracking-plan-journey/image-18.png)

Messages are the most voluminous event—choose wisely:

-   **Critical Properties**:
    -   Message ID
    -   Sender ID
    -   Channel ID
    -   Workspace ID
    -   Message type (regular/thread/reply)
-   **Strategic Activities**:
    -   Message sent
    -   Message read (with caution)
    -   Message replied
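Put together, a single tracked message event would look roughly like this. The field names are assumptions for illustration, not Slack&apos;s actual schema - the point is that all the critical properties from the list above travel on one event:

```python
# Illustrative "Message sent" payload carrying the critical properties.
message_sent = {
    "event": "Message sent",
    "properties": {
        "message_id": "msg_123",
        "sender_id": "usr_42",
        "channel_id": "ch_7",
        "workspace_id": "ws_1",
        "message_type": "thread_reply",  # regular / thread / reply
    },
}
```

Because channel and workspace IDs ride along on every message, volume-heavy analyses can roll up to either container without extra events.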

**User: Beyond Simple Identification**

![](/images/posts/making-smart-tradeoffs-in-analytics-a-slack-tracking-plan-journey/image-19.png)

Users are more than just accounts:

-   **Meaningful Properties**:
    -   User ID
    -   Email domain
    -   Last active timestamp
    -   Number of workspaces joined
-   **Key Activities**:
    -   User created
    -   User invited
    -   User joined
    -   User removed

**Subscription: The Revenue Perspective**

![](/images/posts/making-smart-tradeoffs-in-analytics-a-slack-tracking-plan-journey/image-20.png)

Track what matters for business insights:

-   **Core Properties**:
    -   Subscription ID
    -   Type
    -   Member count
-   **Critical Activities**:
    -   Subscription created
    -   Subscription renewed
    -   Subscription cancelled

**The Tracking Philosophy**

Remember: Less is more. Each event and property should answer a specific business question. Avoid tracking for tracking&apos;s sake.

Key principles:

-   Prioritize business impact
-   Minimize event volume
-   Ensure data usability
-   Plan for future iterations

By carefully selecting what to track, you create an analytics foundation that grows with your product—not against it.

You can check out the complete design on the Miro Board:

![](/images/posts/making-smart-tradeoffs-in-analytics-a-slack-tracking-plan-journey/Check-out-the-Miro-Board.png)

If you want the nerdy version, you can check out the JSON schemas here:

![](/images/posts/making-smart-tradeoffs-in-analytics-a-slack-tracking-plan-journey/Check-out-the-Repository---Notion.png)

### Customer Layer: Mapping User Journeys

The customer layer transforms raw product events into meaningful narratives about user progression. It&apos;s where we move beyond technical interactions to understand how users truly experience and derive value from the product.

At its core, the customer layer sits atop the product layer, synthesizing product events into strategic insights about user behavior and product success. Unlike the granular product layer, this layer focuses on broader user journeys and meaningful milestones.

**First Workspace Experience**

The initial interaction is critical. For Slack, this could mean:

-   Creating a new workspace
-   Inviting the first team members
-   Sending initial messages
-   Achieving first team collaboration

The key is capturing that moment when a user transitions from a curious newcomer to an engaged participant.

**Defining &quot;Initial Value Achieved&quot;**

Initial value isn&apos;t a single, universal metric. In Slack&apos;s context, it might look like:

-   Setting up a workspace
-   Inviting at least 3 team members
-   Sending 20+ messages within the first seven days
-   Receiving responses from multiple team members

This multifaceted approach ensures we&apos;re measuring meaningful engagement, not just surface-level activity.
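A sketch of how such a multi-criteria definition could be computed. The thresholds follow the list above; the field names on the user record are hypothetical:

```python
# Multi-criteria activation check. Field names on the user record are
# hypothetical; the thresholds follow the definition above.
def initial_value_achieved(user):
    return all([
        user.get("workspace_set_up", False),
        user.get("members_invited", 0) >= 3,
        user.get("messages_first_7_days", 0) >= 20,
        user.get("responders", 0) >= 2,
    ])

user = {"workspace_set_up": True, "members_invited": 4,
        "messages_first_7_days": 31, "responders": 3}
print(initial_value_achieved(user))  # True
```

A user who sends 50 messages but never invites anyone does not count - exactly the surface-level activity we want to screen out.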

**Value Repetition and User Progression**

Sustained value is about consistent engagement. We track this by looking at:

-   Repeated activities within a specific time window (e.g., 7 days)
-   Consistent message volume
-   Cross-channel interactions
-   Sustained team collaboration

**Churn and Risk Detection**

The customer layer also helps identify potential dropoff points:

-   Tracking periods of inactivity
-   Monitoring workspace engagement trends
-   Identifying users at risk of disengagement

By understanding these patterns, teams can proactively address user retention challenges.

**The Strategic Perspective**

The customer layer isn&apos;t just about tracking—it&apos;s about understanding user success. It provides a holistic view of how users progress, derive value, and potentially churn, transforming raw data into actionable insights.

### Interaction Layer: The Catch-All Solution

The interaction layer is the Swiss Army knife of tracking—a flexible, lightweight approach to capturing granular user interactions without overwhelming your analytics infrastructure.

Product managers and UX designers often crave detailed insights into user behavior. How do users interact with specific buttons? Which icons do they click most? These questions can quickly lead to event tracking bloat if not managed carefully.

![](/images/posts/making-smart-tradeoffs-in-analytics-a-slack-tracking-plan-journey/image-21.png)

**The Challenge of Granular Tracking**

Traditional tracking approaches might create an event for every possible interaction:

-   &quot;Message Options Icon Clicked&quot;
-   &quot;Emoji Button Pressed&quot;
-   &quot;Thread Reply Button Touched&quot;

The result? A massive, unmanageable collection of events that obscures more than it reveals.

**The Elegant Solution**

Enter the interaction layer—a single, flexible event type with rich properties:

-   **Event Name**: &quot;Element Clicked&quot;
-   **Key Properties**:
    -   Element ID
    -   Element Type
    -   Location
    -   Context Tags

This approach allows for comprehensive tracking with minimal complexity.
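As a rough sketch, the whole layer can be a single event builder. The property names mirror the list above and are assumptions, not a fixed schema:

```python
# One generic "Element Clicked" event replaces dozens of specific ones.
# Property names (element_id, element_type, location, context_tags) are
# illustrative, following the list above.
def element_clicked(element_id, element_type, location, context_tags=None):
    return {
        "event": "Element Clicked",
        "properties": {
            "element_id": element_id,
            "element_type": element_type,
            "location": location,
            "context_tags": context_tags or [],
        },
    }

# The same event covers an emoji button and a thread-reply icon:
emoji = element_clicked("emoji_btn", "icon", "message_toolbar", ["reactions"])
reply = element_clicked("thread_reply", "icon", "message_toolbar")
```

Filtering on element\_type or location then answers most ad-hoc UX questions without any new event definitions.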

**How It Works in Practice**

Imagine a UX designer wants to understand icon usage in Slack. Instead of creating dozens of specific events, they can now:

-   Track all icon interactions through a single event
-   Add contextual properties
-   Filter and analyze with surgical precision

**The Philosophy of Restraint**

The interaction layer embodies a core tracking principle: capture insights without creating noise. It provides flexibility without sacrificing data clarity.

**Strategic Flexibility**

By treating interactions as a catch-all layer with intelligent properties, teams can:

-   Maintain a clean, minimal tracking plan
-   Support ad-hoc analysis requests
-   Keep the primary tracking layers focused on core product insights

The interaction layer isn&apos;t about tracking everything—it&apos;s about tracking what matters, when it matters.

Don&apos;t miss the upcoming episodes of this series and sign up for free to my newsletter (and support my work):

[Get all future episodes of this series](#/portal/signup/free)

## Building a Sustainable Foundation: Cost, Scale, and Evolution

### Cost Considerations in Analytics

Analytics isn&apos;t free. Beyond the obvious subscription fees, there&apos;s a hidden economy of data tracking costs that can escalate quickly if you&apos;re not strategic.

Most teams underestimate the financial implications of their tracking approach. Event volume isn&apos;t just a technical challenge—it&apos;s a direct cost driver that can significantly impact your analytics budget.

**The Hidden Price of Tracking**

Consider the common culprits of ballooning analytics costs:

-   Heartbeat events tracking user time in-app
-   Automatic page view tracking
-   Granular interaction events
-   High-frequency data collection

A simple heartbeat event sent every 10 seconds can generate thousands of events per user monthly, dramatically increasing your tracking expenses.
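The math is quick to check. With an assumed two hours of daily in-app time per user:

```python
# Back-of-the-envelope heartbeat volume. The usage figures are
# illustrative assumptions, not measured numbers.
HEARTBEAT_SECONDS = 10
events_per_hour = 3600 // HEARTBEAT_SECONDS   # 360 events per hour
hours_per_day = 2                             # assumed active time per user
days_per_month = 30
events_per_month = events_per_hour * hours_per_day * days_per_month
print(events_per_month)  # 21600
```

Over 20,000 events per user per month, from a single event type that rarely answers a business question.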

**Understanding Pricing Models**

Analytics platforms typically charge based on:

-   Monthly tracked users
-   Total event volume
-   Event quota limitations
-   Data retention periods

The complexity lies in the nuanced ways these platforms meter and bill your usage.

**Strategic Tracking Cost Management**

Smart teams don&apos;t just track—they track strategically:

-   Prioritize high-value events
-   Implement sampling for high-frequency interactions
-   Create explicit, meaningful events instead of generic tracking
-   Regularly audit and prune unnecessary event collection

**The Cost-Insight Balance**

The goal isn&apos;t to minimize tracking, but to maximize meaningful insights while controlling infrastructure expenses. Every event should answer a specific business question.

**Practical Considerations**

Before implementing any tracking, consider:

-   Projected event volume
-   Potential cost implications
-   Business value of collected insights
-   Scalability of the tracking approach

By treating analytics as a strategic investment rather than an unlimited resource, teams can create powerful, cost-effective tracking infrastructures.

### Scaling Your Tracking Plan

As products grow and evolve, tracking plans must become living, adaptable systems. What works for a startup of 10 users won&apos;t necessarily scale to an organization of 10,000.

Scaling isn&apos;t just about handling more data—it&apos;s about maintaining clarity, performance, and strategic insight as your product becomes more complex.

**The Challenges of Growth**

Product complexity increases exponentially. New features, user roles, and interaction patterns emerge, threatening to overwhelm your carefully designed tracking infrastructure.

The risk? Uncontrolled event proliferation that creates more confusion than insight. Each new feature can potentially multiply your tracking complexity geometrically.
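A toy calculation shows why per-interaction tracking grows so fast while an entity/activity model stays bounded (all counts are invented for illustration):

```python
# Hypothetical counts, purely for illustration.
features, actions_per_feature, ui_variants = 20, 5, 3
per_interaction_events = features * actions_per_feature * ui_variants
# Entity + activity model: variation lives in properties, not event names.
entities, activities_per_entity = 6, 5
layered_events = entities * activities_per_entity
print(per_interaction_events, layered_events)  # 300 30
```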

**Designing for Flexibility**

A robust tracking plan anticipates growth by:

-   Creating extensible entity frameworks (only add new entities after serious consideration)
-   Implementing consistent property patterns (when you need more data, think about properties first)
-   Maintaining a clear, minimal core tracking approach

The goal is a tracking plan that can accommodate innovation without becoming a tangled mess of events.

**Avoiding Tracking Debt**

Technical debt has a tracking equivalent. Each hastily added event, each poorly defined property creates future complexity that becomes increasingly difficult to untangle.

Sustainable scaling requires discipline:

-   Regular tracking plan reviews - this also means checking for events that can be sunset
-   Clear governance processes
-   Cross-functional collaboration

**The Long-Term Perspective**

Scaling isn&apos;t about tracking everything—it&apos;s about tracking what matters, in a way that remains coherent and valuable as your product transforms.

By treating your tracking plan as a strategic asset rather than a technical checklist, you create an analytics foundation that grows alongside your product&apos;s ambitions.

Analytics isn&apos;t a destination—it&apos;s a continuous journey of discovery. The tracking plan you build today is just the first step in understanding your product&apos;s true potential. By embracing a strategic, intentional approach to data collection, you transform analytics from a technical requirement into a powerful lens for product innovation.

Remember, the most valuable insights often come not from tracking everything, but from tracking what truly matters. Your data should tell a story, not create noise. Start small, think strategically, and let your tracking plan evolve as thoughtfully as your product itself.

The path to meaningful analytics begins with the courage to say no—to complexity, to unnecessary events, and to the myth that more data automatically means better insights.

This was part 1 in our series &quot;One tracking plan a day&quot; Season 1 - startup tools. Make sure you visit all other parts of the series:

-   [Notion - 27.01.25](https://hipster-data-show.ghost.io/building-notions-analytics-foundation-a-product-first-tracking-plan/)
-   Superhuman - 29.01.25
-   VimCal - 30.01.25
-   Asana - 31.01.25
-   Canva - 03.02.25
-   Loom - 04.02.25
-   Miro - 05.02.25
-   Grammarly - 06.02.25
-   Replit - 07.02.25
-   Hubspot - 10.02.25
-   Stripe - 11.02.25
-   Zoom - 12.02.25
-   Ghost - 13.02.25
-   Amplitude - 17.02.25
-   GSheets - 18.02.25
-   Lightdash - 19.02.25
-   Claude - 20.02.25
-   Reconfigured - 21.02.25

If you&apos;d like to generate your own tracking plans using my book with Claude AI, get your copy here:

This work is based on the chapters about event data design in my book [**Analytics Implementation Workbook**](https://hipster-data-show.ghost.io/the-analytics-implementation-workbook/). There, you can read more details about the D3L framework.

![](/images/posts/making-smart-tradeoffs-in-analytics-a-slack-tracking-plan-journey/Check-out-the-Book.png)</content:encoded></item><item><title>Notion&apos;s Analytics: A Product-First Tracking Plan</title><link>https://timo.space/blog/building-notions-analytics-foundation-a-product-first-tracking-plan/</link><guid isPermaLink="true">https://timo.space/blog/building-notions-analytics-foundation-a-product-first-tracking-plan/</guid><description>In this content series - season 1, I create a tracking plan for a typical start-up tool every day for four weeks (I take a break on the weekend), so 20 in total.</description><pubDate>Mon, 27 Jan 2025 00:00:00 GMT</pubDate><content:encoded>In this content series - season 1, I create a tracking plan for a typical start-up tool every day for four weeks (I take a break on the weekend), so 20 in total. This is the first one for the omnipresent Notion tool. You can apply the principles in this one to any tool that has workspaces and documents. Here is the season overview:

![](/images/posts/building-notions-analytics-foundation-a-product-first-tracking-plan/image-12.png)

&quot;Every product analytics implementation starts with good intentions and ends in event chaos.&quot; This was my experience at every company until I developed a structured approach to building tracking plans. Today, we&apos;ll explore how to create a foundational analytics framework for Notion using AI assistance - a process that helps avoid common pitfalls while capturing meaningful user behavior. Instead of tracking every click, we&apos;ll focus on what really matters: how users progress from exploration to active usage, and ultimately, to product success.

The approach combines systematic event design with practical decision-making, demonstrating how thoughtful analytics architecture can scale with your product. Whether you&apos;re building Notion, another collaboration tool, or any SaaS product, these principles will help you create analytics that drives genuine product insights.

This is **part 1** of my new content series, where I will create one tracking plan each day for the next four weeks (I will pause on the weekends).

## From Raw Events to Customer Journey: The Double Three-Layer Framework

When building analytics for a product like Notion, it&apos;s tempting to track everything. Every click, every view, every interaction. Yet the most valuable insights often come from understanding user behavior at a higher level. This requires a different approach to event tracking – one that focuses on meaningful activities rather than raw interactions.

This work is based on the chapters about event data design in my book [**Analytics Implementation Workbook**](https://hipster-data-show.ghost.io/the-analytics-implementation-workbook/). There, you can read more details about the D3L framework.

![](/images/posts/building-notions-analytics-foundation-a-product-first-tracking-plan/Check-out-the-Book.png)

But follow along for now to see how we use it in practice.

![](/images/posts/building-notions-analytics-foundation-a-product-first-tracking-plan/image-6.png)

### Moving Beyond Individual Events

Most analytics implementations start with tracking clicks, page views, and button interactions. While this data captures user behavior, it often creates a deluge of disconnected events that fail to answer crucial business questions: Are users getting value? How do they progress through the product? Which features drive retention?

The problem isn&apos;t just volume – it&apos;s structure. When tracking focuses on interactions, you end up with hundreds of unique events that describe how users navigate your interface rather than how they achieve their goals. This makes it nearly impossible to identify patterns that matter for product decisions.

Consider Notion&apos;s &quot;Share&quot; functionality. A traditional approach might track:

-   &quot;share\_button\_clicked&quot;
-   &quot;share\_modal\_opened&quot;
-   &quot;email\_input\_changed&quot;
-   &quot;share\_confirmed&quot;

But what really matters is that a user successfully shared content with others – a key value moment that indicates collaboration.
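One way to picture that collapse: keep the interface stream, but derive a single product event from it. The reduction rule here is a sketch - only a confirmed share becomes a product event:

```python
# Collapse a stream of interface-level share interactions into one
# product event. Illustrative sketch: only share_confirmed counts.
def to_product_events(interaction_events):
    product_events = []
    for evt in interaction_events:
        if evt["name"] == "share_confirmed":
            product_events.append({"name": "page_shared",
                                   "page_id": evt.get("page_id")})
    return product_events

stream = [
    {"name": "share_button_clicked", "page_id": "p1"},
    {"name": "share_modal_opened", "page_id": "p1"},
    {"name": "email_input_changed", "page_id": "p1"},
    {"name": "share_confirmed", "page_id": "p1"},
]
print(to_product_events(stream))  # [{'name': 'page_shared', 'page_id': 'p1'}]
```

Four interface events in, one value moment out.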

This is why we need to move beyond individual events to a structured framework that captures product usage at different levels of abstraction. The Double Three-Layer Framework provides this structure, organizing events into meaningful layers that connect user actions to business outcomes.

If you prefer to watch, here is my YouTube video where I describe everything:



### The First Three Layers: Event Types

The Double Three-Layer Framework organizes events into three distinct types, each serving a specific purpose in understanding product usage:

![](/images/posts/building-notions-analytics-foundation-a-product-first-tracking-plan/image-7.png)

Product Events capture core product functionality at an abstract level. For Notion, these include activities like &quot;workspace\_created&quot;, &quot;page\_shared&quot;, or &quot;block\_added&quot;. They represent meaningful product interactions without getting lost in interface details. These events form the backbone of your analytics, tracking how users utilize key product capabilities.

Customer Events track progression through the product lifecycle. Instead of individual actions, they capture important states like &quot;first\_value\_achieved&quot; (created workspace, added pages), &quot;became\_active\_user&quot; (regular content creation), or &quot;at\_risk&quot; (declining engagement). These events help measure activation, engagement, and potential churn by focusing on the user&apos;s journey rather than their clicks.

Interaction Events serve as a catch-all for interface-level tracking needs. Rather than creating unique events for every button, we use generic events like &quot;element\_clicked&quot; with properties describing what was clicked. This provides interface insights when needed while keeping the event schema clean and maintainable.

This layered approach ensures we capture both high-level product adoption and detailed usage patterns without mixing different types of user behavior.

### The Second Dimension: Components

The framework&apos;s second dimension breaks down each event type into three key components:

Entities represent standalone objects in your product that users interact with. For Notion, core entities include workspaces, pages, accounts, and teams. Each entity should be significant enough to warrant tracking independently. The subscription entity, for example, merits separate tracking because it bridges product usage and business metrics.

Activities describe what happens to entities. They capture meaningful state changes like &quot;created&quot;, &quot;shared&quot;, or &quot;deleted&quot;. When designing activities, less is more – each should represent a distinct, valuable action. For instance, rather than tracking every page edit, we might only track significant changes through a &quot;last\_modified&quot; property.

Properties provide context for entities and activities. They include identifiers (workspace\_id), characteristics (page\_type), and calculated metrics (blocks\_count). Properties help segment and analyze behavior without creating new events for every variation. They&apos;re also easier to add later without restructuring your tracking implementation.

These components create a structured vocabulary for describing product usage, making analytics both powerful and maintainable.

### How The Layers Work Together

Let&apos;s see how these layers work together by following a single user action in Notion:

A user creates a new page in their workspace. At the interaction level, this might involve clicking &quot;New Page&quot; and typing a title. Rather than tracking these individual clicks, we capture a single product event: &quot;page\_created&quot;. This event includes properties like page\_id, workspace\_id, and page\_type.

This same action contributes to customer events tracking the user&apos;s journey. Combined with other actions like adding content and sharing, it might trigger &quot;first\_value\_achieved&quot; or contribute to &quot;became\_active\_user&quot; status. The framework connects individual product usage to meaningful progression.

The three layers provide different lenses for analysis:

-   Product events show what users do
-   Customer events reveal how they progress
-   Interaction events explain how they navigate

By structuring events this way, we can answer both tactical questions (&quot;How are users creating pages?&quot;) and strategic ones (&quot;Are users becoming successful?&quot;) without drowning in event data.

## Building the Foundation: Core Entities and Activities for Notion

### Core Entities for a Foundational Plan

When building Notion&apos;s foundational tracking plan, we focus on five core entities that form the backbone of user activity:

![](/images/posts/building-notions-analytics-foundation-a-product-first-tracking-plan/image-8.png)

**Workspace** serves as the primary container entity. It&apos;s where collaboration happens and content lives. Tracking workspace creation, sharing, and overall usage provides insights into adoption and team collaboration patterns.

**Pages** represent the main content entity. Users create, organize and share pages to capture knowledge. Page-level tracking helps measure content creation and collaboration frequency.

**Account** tracks individual user identity and settings. This entity connects user actions across workspaces and enables cohort analysis based on user properties.

**Team** enables tracking of collaborative usage. Team metrics like member count and activity levels indicate product stickiness and expansion opportunities.

**Subscription** bridges product usage with business metrics. Tracking subscription states (created, renewed, canceled) connects user behavior to revenue outcomes.

Each entity requires a unique identifier and timestamps for created/modified dates. This foundation enables tracking key product flows while maintaining flexibility for future expansion.

### Defining Key Activities

The key activities for each entity create a map of essential user behaviors in Notion:

**Workspace Activities**

-   Created: Initial workspace setup
-   Shared: Opening collaboration
-   Updated: Activity heartbeat
-   Deleted: Workspace removal

**Page Activities**

-   Created: New content addition
-   Shared: Content distribution
-   Updated: Content changes
-   Deleted: Content removal

**Account Activities**

-   Created: User onboarding
-   Subscription\_changed: Plan modifications

**Team Activities**

-   Created: Team formation
-   Member\_added: Team growth
-   Deleted: Team dissolution

**Subscription Activities**

-   Created: Initial subscription
-   Renewed: Continued engagement
-   Cancelled: Churn signal
-   Ended: Revenue impact

Each activity represents a meaningful state change rather than interface interactions. This focused set of activities enables tracking core product usage while avoiding event proliferation.

### Essential Properties

Properties provide essential context for entities and activities in Notion&apos;s analytics:

**Identifiers and Relationships**

-   {entity}\_id: Unique identifiers for each entity
-   parent\_id: Links pages to workspaces
-   user\_id: Connects actions to accounts
-   team\_id: Enables team-level analysis

**Calculated Metrics**

-   members\_count: Team/workspace size
-   blocks\_count: Content volume
-   pages\_count: Workspace activity
-   shared\_count: Collaboration level

**Status Indicators**

-   page\_type: Document vs database
-   subscription\_status: Active/canceled
-   sharing\_level: Private/team/public
-   last\_modified: Recent activity

These properties enable segmentation and analysis without requiring new events for each variation. They can be enriched over time as analytics needs evolve.
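Because properties do so much work, a lightweight consistency check pays off. A hand-rolled sketch, with required keys taken from the identifier and status lists above:

```python
# Minimal validation that a page event carries the expected properties.
# The required keys follow the lists above and are illustrative only.
REQUIRED_PAGE_PROPERTIES = {"page_id", "parent_id", "user_id", "page_type"}

def missing_properties(event_properties):
    return REQUIRED_PAGE_PROPERTIES - set(event_properties)

props = {"page_id": "p1", "parent_id": "w1", "user_id": "u1"}
print(missing_properties(props))  # {'page_type'}
```

A check like this could run in CI against the tracking plan to catch missing identifiers before they reach production data.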

You can check out the complete design on the Miro Board:

![](/images/posts/building-notions-analytics-foundation-a-product-first-tracking-plan/Check-out-the-Miro-Board.png)

### Implementation Priorities

Starting with fundamental flows ensures successful implementation while maintaining flexibility for growth:

**Priority 1: Core Creation &amp; Sharing**

-   Workspace and page creation tracking
-   Basic sharing functionality
-   Team member additions
-   Account creation and subscription starts

**Priority 2: Value Indicators**

-   Content creation frequency
-   Collaboration patterns
-   Team engagement levels
-   Basic retention metrics

**Priority 3: Growth Foundations**

-   Subscription tracking
-   Team expansion monitoring
-   Workspace growth patterns
-   Member activation rates

This tiered approach ensures essential metrics are captured first while laying groundwork for more sophisticated analysis. Each tier builds upon previous implementation, allowing validation of tracking quality before expanding scope.

Key success metrics can be tracked from launch, with additional granularity added as usage patterns emerge.

If you want the nerdy version, you can check out the JSON schemas here:

![](/images/posts/building-notions-analytics-foundation-a-product-first-tracking-plan/Check-out-the-Repository---Notion.png)

### Evolution Path

The tracking plan should evolve systematically as product usage grows:

**Adding New Entities**

-   When feature sets become complex enough to warrant independent tracking
-   If new collaboration models emerge
-   When monitoring new business dimensions becomes critical
-   But beware - entities are holy; only add a new one if it really deserves to be one

**Expanding Activities**

-   Add activities when usage patterns reveal new success indicators
-   Introduce granular tracking for heavily-used features (carefully)
-   Track new conversion points as user journeys evolve

**Property Enrichment**

-   Add calculated metrics as analysis needs mature
-   Introduce new segmentation dimensions
-   Enhance context based on observed usage patterns
-   When you want to change something, always think first about adding a property; turn to activities only if properties don&apos;t work

**Feature Expansion Guidelines**

-   Maintain clean event taxonomy
-   Evaluate whether new features warrant entity status
-   Consider property additions before creating new events
-   Validate data quality before expanding scope

This evolution maintains analytics clarity while supporting product growth.

Rule #1: Every newly introduced event (entity + activity) causes more work (implementation, monitoring, documentation, data user experience, potential confusion).

If you prefer a more course-like setting with helpful context, and would like a free preview of my upcoming course, **Analytics Implementation Deep Dive**, check out this link:

## Design in Practice: Three Critical Event Tracking Decisions

### Decision 1: Managing State Changes

The &quot;Updated&quot; event represents one of the most crucial design decisions in event tracking. For Notion, this manifests in three key scenarios:

**Workspace/Page Updates**

-   Initial inclination: Track every content change with an update activity
-   Better approach: Use last\_modified timestamp
-   Rationale: Reduces event volume while maintaining activity insight

![](/images/posts/building-notions-analytics-foundation-a-product-first-tracking-plan/image-9.png)

**State Changes vs Properties**

-   Example: Page sharing levels
-   Decision: Track &quot;page\_shared&quot; event with sharing\_level property
-   Alternative rejected: Separate events for each sharing state
-   Benefit: Maintains clean event schema while preserving analysis flexibility

**Activity Heartbeat**

-   Challenge: Tracking ongoing engagement
-   Solution: Use calculated activity metrics based on last\_modified
-   Implementation: Update timestamp on significant changes only
-   Advantage: Enables activity monitoring without event spam

This approach balances granular tracking needs with system maintainability.

### Decision 2: Subscription Tracking Approaches

The subscription tracking decision balances technical implementation with business insights:

![](/images/posts/building-notions-analytics-foundation-a-product-first-tracking-plan/image-10.png)

**Standalone Entity Approach (Selected)**

-   Distinct subscription events (created, renewed, canceled, ended)
-   Properties for plan type, amount, billing period
-   Enables clear subscription lifecycle tracking
-   Supports revenue analysis and cohort tracking

**Account Property Alternative (Rejected)**

-   Track as account state changes
-   Single &quot;subscription\_changed&quot; event
-   Limited visibility into subscription patterns
-   Harder to analyze retention metrics

**Implementation Trade-offs**

-   More setup work for standalone entity
-   Cleaner data model long-term
-   Better supports pricing changes
-   Enables subscription-specific metrics

This approach prioritizes analytical flexibility over initial simplicity.

### Decision 3: Activity Calculation Methods

When tracking key activities in Notion, we face three core calculation approaches:

**Real-time Events**

-   Track as actions occur
-   Immediate data availability
-   Higher implementation complexity
-   Example: page\_created, workspace\_shared
-   Could also be implemented via a server-side approach

**Calculated Properties**

-   Derived from other events/properties
-   Reduced event volume
-   More flexible for definition changes
-   Example: the workspace\_active property based on last\_modified property

**Technical considerations guide timing:**

-   Calculated properties should be pre-calculated on the backend, not computed on the fly
-   Query performance impacts - talk to the devs about whether any events or properties will cause performance issues; moving things to the server side can be a solution
-   Implementation complexity - there is always a trade-off to make; if specific events or properties are causing problems, determine whether they are worth the effort
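As a sketch, the pre-calculated workspace\_active property could be derived in a backend job like this (the 14-day window is an assumption for illustration):

```python
from datetime import datetime, timedelta

# Backend job deriving the calculated workspace_active property from
# last_modified. The 14-day window is an assumed threshold.
ACTIVITY_WINDOW = timedelta(days=14)

def workspace_active(last_modified, now):
    """True while the workspace was modified within the activity window."""
    return ACTIVITY_WINDOW >= now - last_modified

now = datetime(2025, 2, 1)
print(workspace_active(datetime(2025, 1, 25), now))  # True
print(workspace_active(datetime(2024, 12, 1), now))  # False
```

No event is emitted at all - the property is recomputed from a timestamp we already track.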

### Lessons from These Decisions

Three key principles emerge from our tracking design decisions at Notion. First, favoring properties over events proves valuable when state changes happen frequently or when analysis needs remain exploratory. Rather than creating unique events for every state change, using properties provides analytical flexibility while maintaining a clean event structure. This approach particularly shines when tracking features that may evolve frequently, like sharing permissions or content types.

Balancing precision with maintenance forms our second key lesson. While comprehensive tracking might seem ideal, it often creates long-term maintenance challenges. By leveraging timestamps and calculated metrics, we can derive insights without overwhelming our tracking implementation. This approach enables us to answer complex questions about user behavior without requiring engineers to instrument every possible interaction.

Future-proofing our implementation represents our final lesson. Starting with core entities and using properties for variation creates a foundation that can grow with the product. This approach supports multiple analysis methods without requiring fundamental schema changes. As Notion&apos;s feature set expands, we can add new properties or calculated metrics without disrupting existing analytics. This flexibility proves invaluable as product analytics needs evolve.

## Beyond Features: Measuring Success Through User States and Activities

### User States in Product Analytics

In Notion&apos;s analytics framework, we define five key user states that track progression through the product lifecycle:

**New Users**: Accounts created within the current period. This state captures users starting their journey, typically lasting 30 days from signup.

**Activated Users**: Those who&apos;ve achieved initial success markers:

-   Created first workspace
-   Added 3+ pages
-   Created 10+ content blocks

These criteria indicate basic product understanding.

**Active Users**: Demonstrate ongoing engagement through:

-   Content creation in 3+ days per week
-   Regular sharing activity
-   Team collaboration

Time windows adjust based on expected usage patterns.

**At Risk Users**: Show declining engagement:

-   No activity in last 14 days
-   Reduced team collaboration
-   Dropping below active criteria

Early warning system for potential churn.

**Dormant Users**: Effectively churned from product usage:

-   No activity in 30+ days
-   Team members inactive
-   No content changes

Distinguished from subscription churn.

Users can move between states based on activity patterns, creating measurable progression through the product lifecycle.
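The five states can be expressed as one ordered classifier. The inactivity thresholds follow the definitions above; the input fields are simplifying assumptions:

```python
# Ordered user-state classifier. Dormancy wins over at-risk, which wins
# over active, mirroring the definitions above. Inputs are simplified.
def user_state(days_since_signup, days_inactive, activated, is_active):
    if days_inactive >= 30:
        return "dormant"
    if days_inactive >= 14:
        return "at_risk"
    if is_active:
        return "active"
    if activated:
        return "activated"
    return "new"

print(user_state(5, 1, False, False))    # new
print(user_state(40, 20, True, False))   # at_risk
print(user_state(10, 2, True, True))     # active
```

Because the checks are ordered, a user always lands in exactly one state, which keeps state-transition counts additive.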

### Mapping Product Activities to States

Activities in Notion map directly to user states, creating clear progression signals:

**Activation Signals**

-   First workspace creation
-   Initial page structuring
-   Team member invitations
-   First content share

These early activities predict long-term engagement.

**Active Usage Indicators**

-   Regular page creation
-   Consistent content updates
-   Cross-page linking
-   Team collaboration

Frequency matters more than volume.

**Risk Signals**

-   Declining creation frequency
-   Reduced sharing activity
-   Team member inactivity
-   Fewer page updates

Two-week window provides intervention opportunity.

**Churn Predictors**

-   No workspace activity
-   Team communication drop
-   Static content
-   Missing key features

Early detection enables retention efforts.

Each state transition requires specific activity combinations within defined timeframes, enabling automated state tracking and proactive engagement strategies.

### Creating Actionable Metrics

User state metrics reveal Notion&apos;s product health and growth trajectory:

**State Transition Rates**  
Monitor key conversion points:

-   New to Activated: Target 40%+ in first 30 days
-   Activated to Active: Weekly progression rates
-   Active to At Risk: Alert when exceeding 20%
-   Recovery from At Risk: Intervention success rate

Time windows matter. Most users activate within 7 days or not at all. Active state duration predicts long-term retention.
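Monitoring these thresholds takes very little code. The cohort counts below are invented for illustration; the 40% and 20% thresholds follow the targets above:

```python
# Transition-rate monitoring sketch. Counts are invented; the 40% and
# 20% thresholds follow the targets above.
def transition_rate(moved, cohort_size):
    return moved / cohort_size if cohort_size else 0.0

new_users, activated_in_30_days = 500, 230
rate = transition_rate(activated_in_30_days, new_users)
print(f"{rate:.0%}")                      # 46%
print(rate >= 0.40)                       # activation target met: True

active_users, at_risk_users = 800, 190
print(transition_rate(at_risk_users, active_users) > 0.20)  # alert: True
```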

**Retention Analysis**  
Track retention through state transitions:

-   Active retention by signup cohort
-   Team size correlation
-   Feature adoption impact
-   Usage pattern differences

Growth indicators emerge from state progressions. Workspace expansion follows activation. Team collaboration depth predicts retention strength.

These metrics connect daily product usage to quarterly business outcomes. Weekly monitoring enables rapid intervention when metrics decline.

### Using States for Product Decisions

User states drive product development priorities by revealing key opportunities:

**Friction Points**

Early state transitions expose friction. Low activation rates point to onboarding challenges. Team invitation drop-offs signal collaboration barriers.

**Activation Optimization**

Monitor first value achievement:

-   Simplify workspace creation
-   Guide initial content structure
-   Accelerate team connection

Track improvement impact through activation rates.

**Retention Focus**

State patterns guide feature priorities:

-   At-risk prevention features
-   Active user engagement tools
-   Reactivation capabilities

Product roadmap aligns with state progression needs. New features target specific state transitions. Success metrics tie directly to state improvements.

Track feature impact through state transition changes. Strong features move users toward active states. Weak ones show minimal state progression impact.

Building analytics isn&apos;t just about collecting data – it&apos;s about creating a foundation for product understanding. We enable insights that drive product decisions by structuring our tracking around user states and meaningful activities rather than interface interactions. The framework we&apos;ve explored for Notion demonstrates how thoughtful analytics design can scale with your product while maintaining clarity and purpose. Remember: the best analytics implementation isn&apos;t the one that tracks everything, but the one that captures what matters most for your users&apos; success.

This was part 1 of our series &quot;One tracking plan a day&quot;, Season 1 - startup tools. Make sure you visit all the other parts of the series:

-   Slack - 28.01.25
-   Superhuman - 29.01.25
-   VimCal - 30.01.25
-   Asana - 31.01.25
-   Canva - 03.02.25
-   Loom - 04.02.25
-   Miro - 05.02.25
-   Grammarly - 06.02.25
-   Replit - 07.02.25
-   Hubspot - 10.02.25
-   Stripe - 11.02.25
-   Zoom - 12.02.25
-   Ghost - 13.02.25
-   Amplitude - 17.02.25
-   GSheets - 18.02.25
-   Lightdash - 19.02.25
-   Claude - 20.02.25
-   Reconfigured - 21.02.25

If you&apos;d like to generate your own tracking plans using my book with Claude AI, get your copy here:

![](/images/posts/building-notions-analytics-foundation-a-product-first-tracking-plan/Check-out-the-Book.png)</content:encoded></item><item><title>From SQL to Slack: Automating Data Workflows with Big Functions</title><link>https://timo.space/blog/from-sql-to-slack-automating-data-workflows-with-big-functions/</link><guid isPermaLink="true">https://timo.space/blog/from-sql-to-slack-automating-data-workflows-with-big-functions/</guid><description>Every data analyst knows the feeling: you&apos;ve uncovered an important insight, but turning that finding into action requires an engineering ticket, multiple meetings, and weeks of waiting.</description><pubDate>Thu, 23 Jan 2025 00:00:00 GMT</pubDate><content:encoded>Every data analyst knows the feeling: you&apos;ve uncovered an important insight, but turning that finding into action requires an engineering ticket, multiple meetings, and weeks of waiting. What if you could write a SQL query and have it automatically notify your team on Slack? Or enrich your customer data with third-party information without building a complex pipeline?

BigFunctions transforms BigQuery from a data warehouse into an automation engine, letting analysts trigger actions directly from their SQL queries. In just five minutes, you can deploy functions that connect your data to external services - no infrastructure management required.

This isn&apos;t just about saving engineering time. It&apos;s about empowering analysts to complete the full cycle of data work, from insight to action, without dependencies. Let&apos;s explore how BigFunctions is bridging this gap and improving how data teams deliver value.

## Why BigFunctions Bridges the Gap Between Data Analysis and Action

Data teams face a constant challenge: turning insights into action. While analysts excel at uncovering valuable patterns in data through SQL, implementing these findings often requires engineering support. A simple task like automatically sharing key metrics via Slack can turn into a week-long project, requiring infrastructure setup, API integration, and deployment processes. This creates bottlenecks and delays the delivery of valuable insights to stakeholders.

BigFunctions fundamentally changes this dynamic by allowing analysts to trigger actions directly from SQL queries. Instead of building and maintaining separate infrastructure for each integration, analysts can leverage pre-built functions or create custom ones that connect directly to external services. This removes the traditional dependency on engineering teams for implementation while maintaining security and scalability.

The applications are immediately practical. Analysts can push daily metrics to Slack channels, enrich customer data with third-party APIs, or standardize complex calculations across the organization - all from within their SQL workflows. For example, a single query can analyze product usage patterns and automatically notify relevant teams when specific thresholds are met, a process that previously required multiple systems and team handoffs.

This capability shift represents more than just technical convenience; it&apos;s about empowering analysts to complete the full cycle of data work. By bridging the gap between analysis and action, BigFunctions enables faster decision-making and more agile data operations. Teams can experiment with new metrics and automation without lengthy implementation cycles, leading to more innovative uses of their data infrastructure.

Here is a 40-minute introduction and hands-on demo I recorded with [Paul](https://www.linkedin.com/in/paul-marcombes/), the creator of the BigFunctions framework:



## From 5-Day Projects to 5-Minute Solutions: Real-World BigFunctions Examples

### Slack Notifications Pipeline (5-minute implementation)

Most data teams have experienced this scenario: stakeholders want regular updates about key metrics, but setting up automated notifications traditionally requires multiple components - a scheduled script, API integration, error handling, and monitoring. What could be a simple notification often becomes a multi-day engineering project.

Enter BigFunctions. Here&apos;s how a weekly course popularity notification goes from concept to production in five minutes:

1.  **Write a simple SQL query to get your metrics:**

```sql
SELECT course_name, COUNT(*) as starts
FROM course_starts 
WHERE start_date &gt;= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
GROUP BY course_name
ORDER BY starts DESC
LIMIT 1
```

2.  **Add the BigFunctions Slack integration:**

```sql
SELECT bigfunctions.eu.send_slack_message(
  &apos;Most popular course this week: &apos; || course_name || &apos; with &apos; || CAST(starts as STRING) || &apos; new students&apos;,
  &apos;your-webhook-url&apos;
)
FROM (
  SELECT course_name, COUNT(*) as starts
  FROM course_starts
  WHERE start_date &gt;= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
  GROUP BY course_name ORDER BY starts DESC LIMIT 1
)
```

3.  **Schedule the query in BigQuery&apos;s native interface.**

That&apos;s it - no infrastructure to maintain, no API credentials to rotate, no deployment pipeline to manage. The same approach that would typically require multiple services and ongoing maintenance now runs as a simple scheduled query.

The impact extends beyond time savings. Analysts can now experiment with different metrics and notification patterns without engineering support. When a team wants to track a new KPI, it&apos;s a matter of minutes to implement the notification, fostering a more agile, data-informed culture.

[Subscribe to my newsletter](#/portal/signup/free)

### Data Enrichment Without Engineering (30-minute setup)

Understanding website traffic sources often requires context beyond basic referral URLs. Traditional approaches to enriching this data might involve building ETL pipelines or maintaining separate services. BigFunctions transforms this into a straightforward SQL operation.

Here&apos;s a practical implementation for enriching referral data:

```sql
SELECT 
  referrer_url,
  bigfunctions.eu.get_webpage_metadata(referrer_url) as metadata
FROM website_traffic
WHERE DATE(visit_timestamp) = CURRENT_DATE()
```

The function returns structured metadata including site descriptions, titles, and languages - valuable context for marketing analysis. To make this cost-effective and scalable:

1.  Process new URLs only:

```sql
SELECT referrer_url
FROM website_traffic t
LEFT JOIN referral_metadata m USING(referrer_url)
WHERE m.referrer_url IS NULL
```

2.  Add quota monitoring:

```sql
INSERT INTO monitoring.api_usage
SELECT COUNT(*) as daily_enrichment_count
FROM website_traffic
WHERE DATE(visit_timestamp) = CURRENT_DATE()
```

This approach turns what would typically require a dedicated data pipeline into a simple SQL workflow. Marketing teams can immediately access enriched data for analysis, while data teams maintain control over API usage and costs through standard SQL patterns.

The same pattern extends to other enrichment sources - company information APIs, weather data, or AI-powered text analysis, all accessible through SQL queries rather than separate infrastructure.

### AI-Powered Analysis Integration (1-hour project)

App store reviews contain valuable customer feedback, but manually analyzing thousands of comments is impractical. BigFunctions enables automated sentiment analysis and categorization directly in BigQuery using AI models.

Here&apos;s the implementation approach:

```sql
SELECT 
  review_text,
  bigfunctions.eu.sentiment_score(review_text) as sentiment_score, 
  -- A higher score indicates more positive sentiment, while a lower score indicates more negative sentiment
  bigfunctions.eu.ask_ai(
  &apos;&apos;&apos;
  Question: Which main feature is this review about?
  Answer: return one value from these options: user interface, pricing, performance
  Review: &apos;&apos;&apos; || review_text
  , &apos;gemini-pro&apos;) as category
FROM app_store_reviews
WHERE DATE(review_date) = CURRENT_DATE()
```

To ensure reliable results:

-   Run analysis in batches of 1000 reviews
-   Store results in a separate table for cost efficiency
-   Add validation checks for AI outputs

This transforms unstructured feedback into actionable data that product teams can immediately use. The entire pipeline, from raw reviews to categorized insights, runs within BigQuery without additional infrastructure.

Key benefits:

-   Immediate access to AI capabilities
-   Consistent analysis across all reviews
-   Easy integration with existing dashboards and reports

[Subscribe to my newsletter](#/portal/signup/free)

## Getting Started: Implementing BigFunctions in Your Data Stack

### Initial Setup (5 minutes)

Setting up BigFunctions requires minimal configuration. First, install the CLI:

```bash
pip install bigfunctions
```

**Create your first function:**

From the [docs](https://unytics.io/bigfunctions/framework/#5-bigfun-cli):

&quot;Functions are defined as yaml files under `bigfunctions` folder. To create your first function locally, the easiest is to download an existing yaml file of unytics/bigfunctions Github repo.

For instance to download `is_email_valid.yaml` into bigfunctions folder, do:&quot;

```bash
bigfun get is_email_valid
```

**Deploy the function**

Make sure to check all requirements in the [docs](https://unytics.io/bigfunctions/framework/#52-use-bigfun).

```bash
bigfun deploy is_email_valid
```

The function becomes available in your specified dataset within minutes, ready to use in queries. This minimal setup provides immediate access to both pre-built functions and the framework for custom development.

**Integration with Existing Workflows**

BigFunctions integrates seamlessly with existing dbt workflows by adding function calls directly to your models. This allows you to:

-   Add notification logic to your presentation layer models
-   Enrich data as part of your transformation pipeline
-   Maintain version control of function usage
-   Document function dependencies alongside models

Beyond dbt, BigFunctions works with any tool that generates BigQuery SQL, including:

-   Scheduled queries in BigQuery
-   BI tools (when they don&apos;t cache the datasets)
-   Custom data applications

Best practices for integration:

-   Store credentials securely using your existing patterns
-   Monitor usage through standard BigQuery logging
-   Include function tests in your CI/CD pipeline
-   Document function dependencies for team visibility

The key advantage is maintaining your existing workflow while adding powerful integration capabilities without additional infrastructure.

[Check out my Analytics Implementation workbook](https://hipster-data-show.ghost.io/the-analytics-implementation-workbook/)

BigFunctions represents an interesting shift in how data teams can deliver value. By eliminating the gap between engineering, analysis and action, it enables analysts to build powerful data workflows directly in BigQuery without engineering support.

From sending automated Slack notifications to enriching data with AI insights, what once took days of engineering work now requires just a few lines of SQL. The simplicity of setup combined with seamless integration into existing tools like dbt makes it an accessible solution for teams of any size.

As data teams continue to face pressure to deliver insights faster, BigFunctions offers a practical path to more agile, automated data operations. Whether you&apos;re looking to streamline communications, enrich your data, or experiment with AI integrations, BigFunctions provides the tools to transform your BigQuery instance into a comprehensive data enablement platform.

Start with a simple Slack notification - you might be surprised how quickly your team discovers new ways to bridge the gap between insight and action.</content:encoded></item><item><title>How I reconfigure atm</title><link>https://timo.space/blog/how-i-reconfigure-atm/</link><guid isPermaLink="true">https://timo.space/blog/how-i-reconfigure-atm/</guid><description>Reconfigured is a tool I&apos;ve been using for almost three months now. It helps me capture everything I encounter when working on data tasks.</description><pubDate>Wed, 15 Jan 2025 00:00:00 GMT</pubDate><content:encoded>Reconfigured is a tool I&apos;ve been using for almost three months now. It helps me capture everything I encounter when working on data tasks. One unique aspect of data work is how often you discover quite a lot of bits of information that become extremely relevant later.

![](/images/posts/how-i-reconfigure-atm/image-1.png)
*Adding a quick task reminder in reconfigured*

These insights come in different forms. While some are specific discoveries from analysis, I find most are a mix of decisions, questions, and tasks. For example, when working on a data model and implementing specific logic, I make decisions along the way. This is fairly normal, but in data modeling these decisions are often crucial because they affect how we handle business logic. Some decisions need to be documented since others might need to understand our reasoning in the future. In the past, I sometimes added these as comments in the data model, which wasn&apos;t ideal.

Questions are possibly the biggest component. When working with a dataset, I often have questions for my client about missing context. I might need to ask others who&apos;ve worked on it before to understand their approach. Sometimes I even have questions for myself to investigate when I have more time, so I need to note those down too.

Tasks naturally emerge from all this. During implementation, I generate many follow-up tasks. For instance, I might need to test specific things that I&apos;m ignoring while focusing on implementation, but I want to remember what needs testing later.

There are several ways to handle this information flow. I could try keeping it all in my head, but I know that doesn&apos;t work - I typically forget about 50% of things if I don&apos;t write them down. Another option is keeping a paper notebook nearby. While this works initially, it doesn&apos;t scale well. The pages fill up quickly, and I still need to transfer everything to a digital system.

When I started testing Reconfigured, I wasn&apos;t sure how I&apos;d use it. I initially thought it would be a task management tool for data work, but after a week, I realized it&apos;s actually perfect for journaling.

![](/images/posts/how-i-reconfigure-atm/image-4.png)

[reconfigured | the analysts’ journal for tracking path to insight.](https://reconfigured.io) — easily add context on the go, see what you were thinking yesterday and what you’re investigating today.

What makes it perfect is its accessibility - a single shortcut brings the application to the foreground, and the same shortcut hides it again. I mostly use the quick note-taking mode: hit a keyboard shortcut, type in a popup window, press command+enter, and the entry is saved. It fits seamlessly into my workflow because when I&apos;m developing a data model, I don&apos;t want to switch between applications to take notes. I need it to be quick and unobtrusive, which Reconfigured handles beautifully.

Over the last 4-5 weeks, I&apos;ve developed a clear pattern in how I use Reconfigured. I want to share this approach, though it might evolve since the tool is still new.

[Sign up for my newsletter](#/portal/signup/free)

To make my notes easier to review later, I always start entries with their type. If it&apos;s a decision note, I start with &quot;Decisions:&quot;. For questions, I start with &quot;Questions:&quot;. For tasks, &quot;Tasks:&quot;. Then I add everything as bullet points. This makes it simple to scan through the note stream and quickly find all related tasks or questions.

![](/images/posts/how-i-reconfigure-atm/image-2.png)

This system is especially helpful when meeting with clients - I can immediately pull up all relevant questions. In the future, as Reconfigured adds more AI capabilities, it could potentially sort these automatically into separate sections for questions, notes, and decisions. But even now, it works really well.
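A nice side effect of the prefix convention is that the notes become trivially machine-readable even without AI. A small sketch with invented entries, grouping them by type:

```python
from collections import defaultdict

# Invented example entries following the prefix convention.
entries = [
    "Decisions: model sessions with a 30-minute timeout",
    "Questions: why do refunds appear twice in the orders table?",
    "Tasks: add tests for the currency conversion logic",
    "Questions: who owns the marketing attribution model?",
]

def group_by_type(notes):
    """Bucket entries by their leading type prefix."""
    grouped = defaultdict(list)
    for note in notes:
        prefix, _, body = note.partition(":")
        grouped[prefix].append(body.strip())
    return dict(grouped)

for note_type, items in group_by_type(entries).items():
    print(note_type, items)
```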

For complex projects, I&apos;ve added another practice: at the end of each day, I make a quick voice note summarizing what I&apos;ve done and what needs attention next. I add this to Reconfigured, sometimes running it through Claude to get bulleted summaries. This helps enormously when picking up work the next day or even the next week - I can quickly review my notes and know exactly where I left off.

That&apos;s my current workflow with Reconfigured. If you&apos;d like to try it yourself, I&apos;ve included a link. It&apos;s quite different from other tools you might have used, and you may find entirely different use cases for it. But for me, it&apos;s invaluable for journaling my data work because it maintains context that proves valuable for anyone who later works with what I&apos;ve built.

Do you like these kinds of posts?

[Sign up for my newsletter](#/portal/signup/free)</content:encoded></item><item><title>The easiest tracking setup in the world</title><link>https://timo.space/blog/the-easiest-tracking-setup-in-the/</link><guid isPermaLink="true">https://timo.space/blog/the-easiest-tracking-setup-in-the/</guid><description>Wait is this about the AI tracking solution - NO, it is not... keep reading</description><pubDate>Wed, 18 Dec 2024 00:00:00 GMT</pubDate><content:encoded>About a year and a half ago, I had a LinkedIn post go viral. It was my second viral post and still holds the record for most views and engagement in my LinkedIn history. What kind of post was it? A meme.

I stopped doing meme posts after that one – they didn&apos;t really align with my usual content. But this particular meme captured an important truth, which is exactly why I&apos;m writing this post today.


The meme showed a simple scene: when asked who wants to grow, improve, or measure their product, everyone&apos;s hands shoot up. But when asked who wants to implement the tracking? Not a single hand. This perfectly captures half the story.

![](/images/posts/the-easiest-tracking-setup-in-the/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fb354dca2-76b3-4d85-9b09-719b626f2c81_1394x770-png.jpg)

Look, I know most companies have some tracking in place to understand their product and marketing performance. The real issue is that few are willing to invest in creating a tracking setup that actually makes a difference. There&apos;s a massive gap between basic tracking and having a system that can truly help you grow your business, improve marketing, and understand how people are really using your product.

## The easiest tracking setup in the world

The simplest tracking setup you can do is just implementing the standard tracking SDK that comes with most analytics tools. It tracks page loads, and the smarter ones even work with single-page applications – where technically there aren&apos;t new page loads, but the SDK still catches when users navigate to new pages and sends those events to the system.

But what does this basic setup actually tell us about how our business or product is performing? Sometimes, it might be enough. Take a blog, for instance. If I just want to know what content people are reading, this simple tracking would do the job perfectly. It lines up exactly with my business goal – people reading what I write – and I can set it up in half an hour.

![](/images/posts/the-easiest-tracking-setup-in-the/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f4b718f75-ec3c-42d1-b16a-80cf7d45eff0_1394x606-png.jpg)

But let&apos;s say I expand my blog by adding email subscriptions as a way to keep readers coming back. Now my business goal has changed, and my tracking setup needs to change with it.

## There is no easy business and product anymore

In the early days of digital business, everything was simpler because we were all figuring it out. Take e-commerce – it was mostly small sites with straightforward operations. People would visit a product page, add to cart, checkout, done.

Back then, we weren&apos;t too worried about measuring customer retention because marketing wasn&apos;t as complex as it is now. Understanding customer lifetime value wasn&apos;t as critical. Being digital was new, so everything was naturally less complicated than traditional businesses.

But today&apos;s digital business? Totally different story. Look at e-commerce again – now we&apos;ve added subscriptions, complex discount systems, and we&apos;re pushing for account creation and email collection way earlier. We&apos;re laser-focused on building customer loyalty because we have to be. Our businesses have grown, competition is fierce, and we&apos;ve had to add layer upon layer of complexity to keep up.

Digital products are even trickier. Take a software-as-a-service product – the complexity is mind-boggling. Your acquisition process might take forever before someone actually uses the product. If you offer free accounts, you&apos;re in product-led growth territory, where free users are both marketing channel and product users. You&apos;ve got to analyze them from both angles.

![](/images/posts/the-easiest-tracking-setup-in-the/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f77a3a994-fea6-4cdb-a8b2-ce1b7b387ff9_1394x790-png.jpg)

Then you need to figure out how long it takes people to start paying. And don&apos;t get me started on subscription management and churn. You&apos;ve got product churn (people stop using your product) and revenue churn (people cancel subscriptions) – they&apos;re related but different beasts.

We keep piling on these layers, making everything more complex. You can&apos;t just throw a tracking pixel or SDK at it and expect the data to magically tell you how to improve your marketing, grow your product, boost adoption, or increase subscriptions. So here&apos;s the real question: do we need more data, or do we need better data?

## More complexity, more data?

Over the last 5-10 years, as our businesses and products got more complex, the typical answer was &quot;we need more data.&quot; In most setups I work with, both marketing and product teams have dramatically increased what they track.

Marketing teams launched customer data platform initiatives to capture every possible touchpoint between people and the brand across all platforms. Product teams started measuring every core interaction to track user behavior at an incredibly granular level.

The result? Tracking setups with over 200 unique events. Remember, an event is supposed to be when someone does something. So we&apos;re tracking 200 different scenarios of people interacting with our product or marketing. And it often goes way beyond 200. When I talk to these companies, they insist all these events are essential for their data teams and analysts.

![](/images/posts/the-easiest-tracking-setup-in-the/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f53d597a0-c61c-442a-8353-40b363a464f4_1392x694-png.jpg)

But here&apos;s the thing: when we run workshops and take a big step back to look at things from a business and product perspective, we usually end up needing just 20-30 events. We can streamline everything dramatically using one simple trick (which I&apos;ll get to in a moment).

The key insight is this: when your business model and product get more complex, the answer isn&apos;t to collect more data points. The answer is to collect better, more focused data. We need to identify and measure the essential points in our product and business model that actually drive growth and success.

I&apos;ll explain the difference between this approach and tracking every app interaction in the next section.

## From interactions to product and business outcomes

I&apos;ll admit it – I&apos;m guilty of this myself. For over five years, I set up product tracking systems that obsessed over how people interact with products. Sure, I learned some tricks along the way, like using one &quot;CTA clicked&quot; event with different properties instead of creating separate events for every CTA in the product. This reduced the number of events, but the data still wasn&apos;t really telling us how the product and business were performing.

Eventually, I had to completely rethink my approach to product and marketing analytics. The realization hit me: tracking how users interact with my applications only tells me... how users interact with my applications. But the questions from product, marketing, and business teams are usually much bigger: Is our product successful? Are our marketing campaigns working? Is our business growing?

Just knowing how someone clicks around an application doesn&apos;t tell us if the product is successful. Think about an ATM – if I tracked every button press, I still wouldn&apos;t know if people actually accomplished what they came to do. An ATM session is successful when someone gets their money out. That&apos;s it. So really, I only need two events: ATM session started (card inserted) and money withdrawn, assuming that&apos;s the only use case.

![](/images/posts/the-easiest-tracking-setup-in-the/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f70530a68-a061-4a0c-a65d-e0a03bad9da8_1396x792-png.jpg)

This brings us to use cases – a much better level of analysis than tracking specific buttons or features. When people visit your website or open your app, they have a specific goal in mind. They want to accomplish something.

Take email programs. One use case might be checking for important new messages. This isn&apos;t the easiest thing to measure – though we can look at patterns, like if someone checks 3-4 times daily, we&apos;re probably serving this need well.

Another use case is replying to emails. Someone reads an email, hits reply, writes something, sends it. That&apos;s a distinct use case we can measure. And here&apos;s the key: it doesn&apos;t matter if they hit return, clicked a button, or used some other method to send it. Those details matter for UX designers, but they can measure that differently.

From a product perspective, I care about whether people are archiving emails, responding to them, or creating new ones. These actions tell me how people are actually using my product. So the first step in moving from tracking interactions (clicks) to measuring product and business outcomes is to focus on use cases. We need to create a use case map for our product to understand what jobs people are trying to get done.
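One lightweight way to start such a use case map is to tie each job to a single outcome event and ignore the clicks in between. The names below are invented for the email example:

```python
# Hypothetical use-case map for an email product: each job-to-be-done
# is tied to one outcome event, not to the interactions that lead there.
use_case_map = {
    "check for important messages": "inbox_checked",
    "reply to an email": "email_replied",
    "archive handled email": "email_archived",
    "write a new email": "email_sent",
}

# The tracking plan is then just the distinct outcome events.
tracking_plan = sorted(set(use_case_map.values()))
print(tracking_plan)
```

Four events instead of dozens of click events - and each one answers a product question directly.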

## What makes products and marketing successful

Let&apos;s talk about success moments – the key moments when our product or marketing delivers real value. For marketing, it can be straightforward, but often isn&apos;t.

Take a simple example: if we run a campaign to book demos, and that&apos;s its only goal, success is easy to measure. We just count how many people from that campaign booked demos. Done.

But reality is usually messier. We often run broader campaigns where success could mean booking a demo, creating an account, or something else entirely. That&apos;s why it&apos;s crucial to define what success looks like for each campaign before we launch it. We need clear metrics to measure against.

Consider podcast appearances. We might put our company leaders on different podcasts to boost visibility. Here, we need different success metrics. We might ask podcasts about download numbers to measure reach. Or we could get creative – maybe offer a special discount code for listeners, or create a podcast-exclusive ebook, then track how many people use that code or download that resource. The key is deciding upfront what success looks like.

![](/images/posts/the-easiest-tracking-setup-in-the/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f7ddde642-5c48-415e-b8fa-7c9284eb6864_1394x388-png.jpg)

The same goes for product success, and this is where user research becomes crucial. When we take user research seriously – doing lots of interviews and surveys – we discover why people really use our product and what makes them stick around. We learn what makes us so valuable to their daily lives that they won&apos;t switch to alternatives.

Once we understand these situations and outcomes, we can identify or create success moments in our product and map out the use cases that lead to them. Often, we&apos;ll end up with just 5-6 core use cases that really matter. These are what we need to measure.

![](/images/posts/the-easiest-tracking-setup-in-the/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f9bc65f89-f225-4788-9348-6f82bdace2ea_1392x424-png.jpg)

Ultimately, we need to track two things: how many people reach these success moments, and how often they repeat them. This tells us if our product is performing well. Are we effectively guiding people to these valuable moments? Because let&apos;s face it – most products aren&apos;t intuitive enough for users to find value immediately.

So when measuring our product, we need to track both how well we guide people to their first success moment and how well we encourage them to repeat it. When users keep hitting these success moments, they stick around and happily pay for our product. That&apos;s the essential foundation of product measurement.
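In event terms, those two numbers fall straight out of the event stream. A minimal sketch with an invented success event and users:

```python
from collections import Counter

# Invented event stream: (user_id, event_name).
events = [
    ("u1", "task_completed"), ("u1", "task_completed"),
    ("u2", "task_completed"),
    ("u3", "page_viewed"),
]
all_users = {"u1", "u2", "u3"}
SUCCESS_EVENT = "task_completed"

counts = Counter(u for u, e in events if e == SUCCESS_EVENT)
reached = len(counts)                                  # users who hit the success moment
repeated = sum(1 for c in counts.values() if c >= 2)   # users who came back for it

print(f"reached first success: {reached}/{len(all_users)}")  # 2/3
print(f"repeated it: {repeated}/{reached}")                  # 1/2
```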

## Were we not talking about the easiest analytics setup?

Let me be clear: the easiest tracking setup isn&apos;t about fancy auto-tracking or AI magic that turns messy data into brilliant insights. We&apos;re not there yet, and honestly, we might never be.

The secret to the easiest tracking setup is focus. We need to identify the metrics that actually tell us how our product, business, and marketing are performing. Let me show you with an example, using a product since product tracking is usually trickier.

![](/images/posts/the-easiest-tracking-setup-in-the/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f957a3e18-3a22-41c6-be33-619d60dd3bf4_1390x514-png.jpg)

Let&apos;s take a task management tool. New ones pop up weekly – it&apos;s a crowded market. The wrong approach would be tracking every possible interaction in the tool. That just gives us a pile of data that doesn&apos;t tell us whether we&apos;re actually making an impact or building the loyal user base we need (especially since people hop between task management tools every couple of months).

Instead, we need to focus on growth levers – specifically, success moments. These moments map to specific growth stages. When we can move people from stage A to B to C, we&apos;re more likely to convert them to subscribers.

Here&apos;s how I&apos;d build an eight-event tracking setup for a task management product:

First, we need &quot;account created&quot; – our starting point. This is our baseline, our opportunity pool. Every new account is a potential subscription.

![](/images/posts/the-easiest-tracking-setup-in-the/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fea5f1931-9565-42e8-8005-b7c6b49bd004_1396x472-png.jpg)

At the other end, we need &quot;subscription created,&quot; &quot;subscription retained,&quot; and &quot;subscription churned.&quot; These three events tell us about our revenue health and let us calculate basic MRR metrics.

![](/images/posts/the-easiest-tracking-setup-in-the/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fc29bc7a3-f273-4aa2-bbd3-da54d054e03d_1394x358-png.jpg)
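Just to make that concrete: from these three events alone we can already derive basic MRR movement. Here&apos;s a rough sketch in Python – the event and property names are my illustrative choices, not any tool&apos;s API:

```python
def mrr_summary(month_events):
    """Basic MRR movement for one month, derived from the three
    subscription events. Assumes each event carries the
    subscription's monthly amount as an 'mrr' property."""
    new = sum(e["mrr"] for e in month_events if e["name"] == "subscription created")
    retained = sum(e["mrr"] for e in month_events if e["name"] == "subscription retained")
    churned = sum(e["mrr"] for e in month_events if e["name"] == "subscription churned")
    return {"new": new, "retained": retained, "churned": churned,
            "ending_mrr": new + retained}

january = [
    {"name": "subscription created",  "mrr": 29},
    {"name": "subscription retained", "mrr": 29},
    {"name": "subscription retained", "mrr": 99},
    {"name": "subscription churned",  "mrr": 29},
]
print(mrr_summary(january))
# {'new': 29, 'retained': 128, 'churned': 29, 'ending_mrr': 157}
```

That&apos;s the whole point of putting the amount on the event: the revenue picture falls out of the same stream as the product picture.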

Now for the tricky part: what happens in between? We&apos;ve used four events to track the start and revenue – so let&apos;s choose the next four carefully.

We need to capture how users evolve, which looks different depending on your company&apos;s stage. Let&apos;s assume this is a new tool. Here&apos;s where we can blend qualitative research (like user interviews) with our quantitative data. Your product team might already have this research – if not, it&apos;s worth doing.

Here&apos;s how it typically plays out: When someone creates an account, they usually start by adding tasks to a list. So we track &quot;tasks created&quot; with properties for different list types.

![](/images/posts/the-easiest-tracking-setup-in-the/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f7c3fac42-663e-4be6-9b33-3baf104b72f6_1392x566-png.jpg)

We might add &quot;tasks completed,&quot; but it&apos;s actually optional. As long as people keep creating tasks, we can assume they&apos;re adopting the product. &quot;Tasks completed&quot; is nice to have but not essential right now.

![](/images/posts/the-easiest-tracking-setup-in-the/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fb3e4e74a-1507-4c56-b632-812d50b08c79_1396x682-png.jpg)

The next evolution is usually creating a project (basically a specialized list). We&apos;ll track &quot;project created&quot; or &quot;list created.&quot; This shows someone&apos;s moved up a level – they&apos;ve found task management useful and want to organize their work better.

![](/images/posts/the-easiest-tracking-setup-in-the/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f4596afa7-109e-4dc9-bda7-fadfe717bcbd_1396x540-png.jpg)

Then comes the big one: inviting team members. Assuming this is a team-focused tool, we track &quot;team member invited.&quot; This is huge – it means someone finds the tool valuable enough to bring in their colleagues.

![](/images/posts/the-easiest-tracking-setup-in-the/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fe9629497-ca6a-4c4e-a60c-3d12ae3a2a26_1394x444-png.jpg)

So that&apos;s just three core events (plus one optional) that tell us how people adopt our platform. But what about stickiness? For this, we use these same events but look at them over time.

To define an active user, we might look for someone who, in the last 30 days, either creates 3+ tasks, creates a project, or invites a team member. This time dimension helps us measure retention – and we can build it using our existing events and the segmentation features most analytics tools offer.

![](/images/posts/the-easiest-tracking-setup-in-the/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f82907da6-f1e8-41d9-810c-8802f75c4c19_1392x336-png.jpg)
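That active-user rule translates almost one-to-one into code. A minimal sketch – the event names match the setup above, but the record shape is an illustrative assumption:

```python
from collections import Counter
from datetime import datetime, timedelta

def is_active(user_events, now, window_days=30):
    """Active = within the window: 3+ 'task created' events,
    or any 'project created' / 'team member invited' event."""
    cutoff = now - timedelta(days=window_days)
    recent = Counter(e["name"] for e in user_events if e["ts"] >= cutoff)
    return (
        recent["task created"] >= 3
        or recent["project created"] >= 1
        or recent["team member invited"] >= 1
    )

now = datetime(2025, 2, 1)
events = [
    {"name": "task created", "ts": datetime(2025, 1, 20)},
    {"name": "task created", "ts": datetime(2025, 1, 22)},
    {"name": "project created", "ts": datetime(2025, 1, 25)},
]
print(is_active(events, now))  # True: a project was created in the window
```

Most analytics tools let you express the same rule with their segmentation UI – the code is just to show there&apos;s no magic behind &quot;active user.&quot;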

## To recap, what does our easy tracking setup look like, and what can we achieve with it?

What makes a tracking setup &quot;easy,&quot; again, is focus. We need to understand exactly what metrics will help us improve the user and product experience to drive growth.

A growing product needs two basic things: new users coming in to build our user base, and existing users sticking around (avoiding product churn). That&apos;s the core.

But of course, we also need to make money. So we need to look at both product metrics and revenue metrics. Four key pillars: product account creation, product retention, subscription creation, and subscription retention. Understanding these gives us a clear picture of where to invest our efforts to boost product usage, stickiness, and revenue.

For all this, we just need eight events: account created, task created, task completed (optional), project created, team member invited, subscription created, subscription retained, and subscription churned. Let me show you how to use these to measure product performance.
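Here&apos;s the whole plan at a glance as a tiny schema sketch. The property names are illustrative assumptions; the point is that the complete plan fits on one screen:

```python
# The complete eight-event tracking plan. Keeping it this small is
# the point: every event maps to a growth stage or a revenue state.
TRACKING_PLAN = {
    "account created":       {"plan_type"},
    "task created":          {"list_type"},   # properties per list type
    "task completed":        {"list_type"},   # the optional one
    "project created":       {"project_type"},
    "team member invited":   {"invitee_role"},
    "subscription created":  {"plan", "mrr"},
    "subscription retained": {"plan", "mrr"},
    "subscription churned":  {"plan", "mrr"},
}

def validate(event_name, properties):
    """Reject events outside the plan and unknown properties."""
    if event_name not in TRACKING_PLAN:
        raise ValueError(f"'{event_name}' is not in the tracking plan")
    unknown = set(properties) - TRACKING_PLAN[event_name]
    if unknown:
        raise ValueError(f"unknown properties for '{event_name}': {unknown}")
    return True

print(validate("task created", {"list_type": "inbox"}))  # True
```

In practice a check like this runs in review or CI against your instrumentation, so the eight-event plan stays an eight-event plan instead of drifting toward fifty.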

First analysis? Always create a global customer journey funnel from account creation to subscription. This gives us the big picture of how we&apos;re converting people over time. I usually look at this over a long period to understand totals, but it&apos;s also helpful to track conversion rates over time in a line chart.

![](/images/posts/the-easiest-tracking-setup-in-the/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f60aac4b3-59df-4f54-a809-0b778adc3be8_1396x586-png.jpg)
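Computing that global funnel from raw events is straightforward: per user, count how far along the step sequence they got. A simplified sketch – it checks chained presence of steps, while real funnel tools also enforce time ordering:

```python
FUNNEL = ["account created", "task created", "project created",
          "team member invited", "subscription created"]

def funnel_counts(events_by_user):
    """For each funnel step, count users who reached it after
    reaching every earlier step. A user 'drops out' at the first
    step missing from their event history."""
    counts = [0] * len(FUNNEL)
    for user_events in events_by_user.values():
        seen = set(user_events)
        for i, step in enumerate(FUNNEL):
            if step in seen:
                counts[i] += 1
            else:
                break  # dropped out at this step
    return counts

users = {
    "u1": ["account created", "task created", "project created",
           "team member invited", "subscription created"],
    "u2": ["account created", "task created"],
    "u3": ["account created"],
}
print(funnel_counts(users))  # [3, 2, 1, 1, 1]
```

Dividing adjacent counts gives the step-to-step conversion rates you&apos;d put on the line chart.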

Next, I create user cohorts to track evolution. I start with new users (accounts created in last 30 days). Then activated users – new users who&apos;ve created at least three tasks, maybe a project, possibly invited a team member.

We can even create different activation levels: initially activated (basic tasks), extensively activated (created a project), and committed (invited teammates). The key is showing progression from new user to actually using and understanding the product. We might need to adjust thresholds – maybe three tasks isn&apos;t enough to show real engagement.

The most interesting cohort is active users. We might use the same criteria as activated users (tasks, projects, team invites), but we look at the last 30 days for ALL users, not just new ones. If 30 days is too long, we can switch to weekly analysis.

Then we create risk indicators. An &quot;at-risk&quot; cohort might be users who were active in the past 90 days but inactive for the last 30. &quot;Dormant&quot; users might be those active in the last 300 days but inactive for 60 days – they&apos;re likely lost, but some might come back.
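Those recency buckets reduce to a single function over each user&apos;s last qualifying event. A sketch using the thresholds from above – tune them for your product:

```python
from datetime import datetime

def classify(last_active, now):
    """Bucket a user by how many days ago their last
    qualifying event happened (thresholds from the text)."""
    days = (now - last_active).days
    if days > 300:
        return "churned"
    if days > 90:
        return "dormant"    # likely lost, but some come back
    if days > 30:
        return "at risk"    # active in past 90 days, not last 30
    return "active"

now = datetime(2025, 3, 1)
print(classify(datetime(2025, 2, 20), now))  # 'active'  (9 days ago)
print(classify(datetime(2025, 1, 10), now))  # 'at risk' (50 days ago)
print(classify(datetime(2024, 9, 1), now))   # 'dormant' (181 days ago)
```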

The real power comes from combining these cohorts with subscription data. Active free users are subscription candidates. Active subscribers are our business backbone. And at-risk subscribers? They need immediate attention.

With these cohorts set up, we can run deeper analyses. A revenue retention analysis using subscription events shows how well we keep paying customers. A product usage retention analysis (using active user definitions) shows product stickiness. Comparing these reveals fascinating patterns.

For dashboarding, focus on cohort movement. Track new accounts per month to measure acquisition health. Watch conversion rates: new users to activated, activated to active, and active to churned.

![](/images/posts/the-easiest-tracking-setup-in-the/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2faf640b12-b473-4eed-8cdd-7d0c20ce31a0_1054x472-png.jpg)

These metrics tell powerful stories. If active-to-churned rates spike, we need to investigate – maybe competition is pulling users away. If new user numbers grow but activation rates drop, maybe we need to pause acquisition and fix onboarding.

That&apos;s the beauty of this setup – just eight events giving us deep insights into business performance, product growth, and risk factors. Minimal tracking, maximum impact.

## What comes after that?

Now, everything beyond these eight events is optional – but this basic setup can serve you well for a long time. It&apos;ll consistently tell you how your product and growth are performing. When you do expand, it&apos;s usually to understand specific steps in more detail.

For example, you might see you&apos;re getting lots of new users but only 8% become activated users. What&apos;s happening with the other 92%? This is where qualitative research shines – user interviews can reveal insights that numbers alone can&apos;t show.

On the quantitative side, you might want to track your onboarding process. That might mean adding just three more events: &quot;onboarding started,&quot; &quot;onboarding steps submitted,&quot; and &quot;onboarding finished.&quot; This helps you see if people are actually completing the onboarding or dropping off.

The magic happens when you combine measurement with onboarding design. Your onboarding could collect valuable context – like asking users what problems they&apos;re trying to solve. This helps predict their likely evolution and lets you measure if they&apos;re actually achieving their goals.

Ultimately, you&apos;re gathering context to better understand how to move people from new accounts to activated users. Yes, you&apos;ll need a few more events, but we&apos;re talking about three additional ones – not fifty. The key is careful, purposeful expansion.

I&apos;d love to hear your thoughts in the comments. Could this setup work for you? Skeptical about anything? Questions? Drop them below or reach out to me directly.

Thanks for reading Hipster Data Lab! Subscribe for free to receive new posts and support my work.</content:encoded></item><item><title>Sessions v Users — picking the right metric</title><link>https://timo.space/blog/sessions-v-users/</link><guid isPermaLink="true">https://timo.space/blog/sessions-v-users/</guid><description>So here&apos;s a small story about making things easy. The other day I tried to make what I thought would be a simple video about sessions versus users in analytics.</description><pubDate>Tue, 03 Dec 2024 00:00:00 GMT</pubDate><content:encoded>So here&apos;s a small story about making things easy.

![](/images/posts/sessions-v-users/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fbdd2bdaf-3698-4797-9948-0d2d8cf5666a_1448x510.png)

The other day I tried to make what I thought would be a simple video about sessions versus users in analytics. You know, just explaining when to use each one and if one&apos;s better than the other. I figured it would be easy - no notes needed, just me and my whiteboard. Well, I was wrong. Three times wrong, actually.

Each time I started explaining, I&apos;d get partway through and realize &quot;Nope, this makes no sense&quot; and have to start over. Pretty frustrating!

What makes this whole thing interesting is why I was even making this video. In my workshops, I keep getting this question that I thought wasn&apos;t really relevant anymore: &quot;How do we compare sessions versus users?&quot; It usually comes from people who&apos;ve used Google Analytics forever and are switching to tools like Amplitude, which are all about tracking users instead of sessions. Here&apos;s the twist - even Google Analytics 4 is mostly about users now. They actually removed a bunch of session metrics when they launched it. People weren&apos;t happy about that, so they had to bring some of them back.

![](/images/posts/sessions-v-users/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fba22a7e1-c067-4d73-b678-9b5b55f135e6_1472x660.png)

I&apos;ll be honest - I moved away from Google Analytics years ago to more user-focused tools, and I work mostly in product analytics where it&apos;s all about the user journey. So I got pretty dismissive about sessions. When people in workshops would ask, &quot;When should we use sessions?&quot; I&apos;d just say &quot;Oh, sessions are complicated - better to avoid them if you can.&quot; But then they pushed back: &quot;Look, we&apos;ve been reporting on sessions forever. We can&apos;t just change everything without explaining why. We need to understand this properly.&quot;

And that&apos;s when it hit me - one of those moments in my data career where I realized I was taking the easy way out instead of really diving into the problem. That&apos;s what led to this whole thing - me trying (and finally succeeding!) to properly explain the difference between sessions and users.

Let me tell you what I figured out...

## We need to talk about success first

We need to talk about what success means for your business before we get into sessions and users. I know it might seem obvious, but trust me, this matters.

![](/images/posts/sessions-v-users/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f3e96a226-5ba5-4c46-bbdd-e621b453fa36_1688x730.png)

Think about your website or app. Maybe it&apos;s just one small part of what your company does, or maybe it&apos;s the whole show. Either way, you need to know what &quot;winning&quot; looks like online.

Take some common examples:

If you&apos;re running an online store, success is pretty clear - someone buys something. For a SaaS company, it might be getting people to sign up for a demo, or create an account, or even better, start a subscription. And if you&apos;ve got a big corporate website, maybe success is having people check out your products or apply for jobs.

Whatever it is, you need to know what counts as a win. (If you don&apos;t know this yet, that&apos;s your homework - figure out why you have a website in the first place!)

This matters for our sessions versus users discussion because both sessions and users can be &quot;successful.&quot; And to really understand the difference between them, we need to look at these success moments. I&apos;m going to use these successful events as our guide to explain how sessions and users work.

## A successful session

Before we talk about successful sessions, let&apos;s break down what a session really is. I find it helps to use a real-world example.

Think about going to a supermarket. You walk in through the front door - boom, that&apos;s when your &quot;supermarket session&quot; starts. You walk around, grab what you need, maybe find some stuff you weren&apos;t looking for, check out, and leave. Session over.

But wait - you get to your car and realize you forgot milk. You go back in - and technically, that&apos;s a new session. Simple enough in the real world, right?

Online it gets trickier because we don&apos;t have actual doors. So we need to create rules about what counts as a session. Let&apos;s say someone visits your online store - how do we know if it&apos;s a new session or part of their last visit from 5 minutes ago?

![](/images/posts/sessions-v-users/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fe56caa9e-2e03-43e2-8591-4525b1f12589_2286x516.png)

Most analytics tools solve this with time. They look for gaps in activity. If someone&apos;s clicking around your site and there&apos;s less than 30 minutes between clicks, it&apos;s usually counted as one session. But if they leave for an hour and come back? New session.

Here&apos;s the important bit: a session is just a model we made up. Different tools might define it differently. The old Google Analytics even started a new session if someone came back through a different marketing link, or if it hit midnight. They did this to make their marketing reports work better.
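Since a session is just a model, it&apos;s worth seeing how little code the model actually is. Here&apos;s a minimal 30-minute-gap sessionizer – a sketch, not any specific tool&apos;s logic:

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # the common default; tools differ

def sessionize(timestamps):
    """Group one visitor's event timestamps into sessions:
    a new session starts whenever the gap since the previous
    event exceeds SESSION_TIMEOUT."""
    sessions = []
    for ts in sorted(timestamps):
        if not sessions or ts - sessions[-1][-1] > SESSION_TIMEOUT:
            sessions.append([ts])      # first event or gap too large: new session
        else:
            sessions[-1].append(ts)    # within the timeout: same session
    return sessions

events = [
    datetime(2024, 12, 3, 9, 0),
    datetime(2024, 12, 3, 9, 10),   # 10-minute gap: same session
    datetime(2024, 12, 3, 11, 0),   # 110-minute gap: new session
]
print(len(sessionize(events)))  # 2
```

Change the timeout, or add the old Google Analytics rules (new campaign, midnight), and you get different session counts from identical raw data – which is exactly why it&apos;s a model.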

Now, what makes a session successful? In an online store, the simple answer would be &quot;someone bought something.&quot; We use this to calculate conversion rates.

But there&apos;s a catch - not everyone who visits your site is ready to buy. That&apos;s why conversion rates aren&apos;t usually 70-80%. Not every visit is a shopping visit.

We could get fancy and split this up better. Maybe we could separate &quot;shopping sessions&quot; (when someone adds stuff to their cart) from &quot;browsing sessions&quot; (when they&apos;re just looking around). Then we could measure success differently for each type.

Session metrics help us with two main things:

1.  Did the customer have a good experience?
2.  Which marketing source brought them in?

But remember - sessions only show us a small piece of the picture. People often visit multiple times before buying something. To understand the whole story, we need to look at user-based metrics too.

## A successful user aka customer

Let&apos;s talk about two ways to track users online.

The basic way is using cookies or device IDs. It works, but it&apos;s not great - people clear their cookies, use different devices, or share devices. Maybe your teenager borrows your iPad, or you check the store from both your phone and laptop. Suddenly the data shows two or three &quot;users&quot; when it&apos;s just you. You end up with messy data.

The better way? Track actual user accounts. In an online store, this happens when someone orders something and creates an account. Now we can see everything they do as one customer, not random separate visits. If they browse on their phone during lunch break and finish the purchase on their laptop at home - we know it&apos;s the same person.

Now, what makes a user successful? Let&apos;s call them customers because that&apos;s what they really are in an online store. Unlike sessions (where success is pretty simple - did they buy something?), customer success is trickier.

Sure, getting someone to buy for the first time is great. But no online store can survive on just new customers. Think about your favorite stores - you probably buy from them regularly, right? That&apos;s what every store wants. We need more than just that first purchase.

![](/images/posts/sessions-v-users/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f771aff3b-f255-404f-a17d-20223221ff1f_2494x402.png)

What we really want is for customers to stick around and spend more over time. We want them to come back again and again. Think about it as different levels:

**Level 1: New Customer**

-   They buy something for the first time
-   Great start, but just the beginning
-   Maybe they&apos;re testing us out with a small purchase
-   We need to prove ourselves here

**Level 2: Returning Customer**

-   They come back for second, third, fourth orders
-   Shows they like what we&apos;re selling
-   Starting to build a habit
-   They trust us enough to come back
-   Usually spend more than first-time buyers

**Level 3: VIP Customer**

-   Maybe they&apos;ve spent $2,000 in the last six months
-   These are our rock stars
-   They keep buying even when we don&apos;t advertise
-   They tell their friends about us
-   They&apos;re the backbone of our business
-   Might be only 20% of customers but could make up 80% of sales
-   When they do leave, it hurts more

So measuring customer success is way more complex than sessions. I always suggest at least splitting it into two parts:

1.  How good are we at getting new customers?
2.  How good are we at turning them into returning customers?

You could add more levels too - VIP conversions, different customer segments, whatever makes sense for your business. Maybe you have &quot;bronze,&quot; &quot;silver,&quot; and &quot;gold&quot; customers. Or maybe you split them by what they buy - fashion buyers versus home decor buyers. The key is looking at each group separately because they&apos;re totally different situations.

And here&apos;s the really interesting part - these different levels need different approaches. Getting someone to make their first purchase might need big discounts or ads. But turning them into a regular customer? That&apos;s more about good service, quality products, and maybe a loyalty program. And keeping your VIPs happy? That might mean special treatment, early access to sales, or personal shopping assistance.

This is why user-based analytics gives us a much richer picture than just looking at sessions. It helps us understand the whole customer journey, not just single visits.

## So what is better now: Sessions or Users?

After all this, you can probably guess the answer - neither is better. They&apos;re just different ways of looking at your data. Let me break this down.

**Session Metrics: Website Performance**

Sessions tell us how well our website converts visitors right now. Remember though, it&apos;s just a model - don&apos;t read too much into it unless you&apos;ve really fine-tuned how you track sessions.

Want to make session tracking more meaningful? In an online store, you might only count sessions when someone starts checking out. That makes sense, right? They&apos;re showing clear buying intent, so now we can really see if our checkout process works.

Session metrics are great for things like:

-   How many visits we get
-   How many of those turn into sales
-   What our conversion rate looks like
-   Running A/B tests to improve conversion

One really useful thing here is funnel analysis - tracking how many people move from viewing a product, to adding it to cart, to actually buying it. Sessions help us spot where people drop off.

**User (Customer) Metrics: Business Health**

Now this is where it gets interesting. If you really want to know how healthy your business is, you need to look at customer metrics. They tell you:

-   How good you are at getting new customers
-   How well you turn them into repeat buyers
-   How many become VIP customers
-   How well you keep those VIPs around

Here&apos;s a trap to watch out for: Your total revenue might look great, but what if it&apos;s all from existing customers? Maybe you&apos;re terrible at getting new ones. You&apos;d never spot this just looking at session data - you need customer metrics.

Quick tip: Unless your site requires an account for everything, get your customer data from your backend systems (like order history) rather than frontend tracking.

**Different Tools for Different Jobs**

It all comes back to how you define success. For an online store, maybe it&apos;s just purchases. For a SaaS company, success might mean:

-   Getting someone to book a demo
-   Having them create an account
-   Starting a subscription

Sessions help track one-time wins - like someone requesting a demo. User metrics show the bigger journey - from trying a demo, to creating an account, to becoming a subscriber.

**The Real Answer**

Use both! They show different parts of your business performance. The trick is knowing when to use which one and being able to explain to stakeholders what they&apos;re actually looking at.

Think of it like this: Sessions are like looking at today&apos;s weather, while user metrics are like looking at the climate. You need both to get the full picture.

Still have questions? Drop them in the comments - I&apos;d love to help explain more!

If you&apos;d like to watch the whole case, I&apos;ve also recorded a video about it:

Thanks for reading Hipster Data Lab! Subscribe for free to receive new posts and support my work.</content:encoded></item><item><title>The next evolution of Product Analytics</title><link>https://timo.space/blog/sunsetting-product-analytics/</link><guid isPermaLink="true">https://timo.space/blog/sunsetting-product-analytics/</guid><description>This is the first time I changed a post&apos;s title and minor parts. But some days after I published it, I felt I got the initial angle wrong.</description><pubDate>Tue, 08 Oct 2024 00:00:00 GMT</pubDate><content:encoded>This is the first time I changed a post&apos;s title and minor parts. But some days after I published it, I felt I got the initial angle wrong. This is not about sunsetting product analytics (which was the original title) but more about evolution.

When I began this post, I intended to discuss Optimizely&apos;s acquisition of NetSpring and explore its implications for the product category. However, I quickly realized this would evolve into a more fundamental examination of product analytics&apos; current state and whether it makes sense as a distinct category.

A year ago, I wrote a more observational piece, “Leaving Product Analytics.” In it, I examined various trends within the category and pointed out signs that product analytics as a standalone field was beginning to fade.

[Leaving product analytics](https://hipster-data-show.ghost.io/leaving-product-analytics/) — The current situation: Amplitude, Mixpanel, and Heap are setting out to new offerings

This post essentially closes that chapter. Product analytics started with a promise it could never fully deliver on. I&apos;ll explain why fulfilling that promise wasn&apos;t possible and explore where things might be heading now. After all, the need for product insights hasn&apos;t disappeared—it&apos;s just evolving.

## Optimizely acquires Netspring

It was quite a surprise when I heard yesterday that Optimizely had acquired Netspring. Oddly enough, the news didn&apos;t make big waves in the data world. But it makes sense - we&apos;re talking about a niche event in the product and customer analytics space. I live in this data niche, so this was big news for me.

![](/images/posts/sunsetting-product-analytics/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f65fc8882-26ea-413a-9298-4bca8d9da1c9_1716x532.png)

I don&apos;t want to dive too deep into what this acquisition means, mostly because I don&apos;t have many details. Instead, I want to use this opportunity to take another look at the product analytics category and maybe finally close this chapter.

## Leaving product analytics - part 1

In that previous post, I mentioned that all the vendors who created and defined the product analytics category were already on their way out. Amplitude was moving towards including marketing analytics capabilities, aiming to become a &quot;customer experience platform&quot;. Mixpanel was following suit. Heap was acquired by ContentSquare, so they&apos;re now a content experience platform.

![](/images/posts/sunsetting-product-analytics/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fed36eaeb-2f9c-4beb-9efb-c6977ec3950b_1454x652.png)

But a year ago, new platforms like Netspring, Mitzu, Houseware, or Kubit were entering the category with a new paradigm. They worked directly on top of cloud warehouse data that companies already had. This meant you weren&apos;t relying on SDK data anymore. You could use your existing data, model it correctly, and put Netspring on top to do classic product analytics use cases like cohort analysis, funnel analysis, etc.

I was pretty optimistic about this move, and it impacted the category. Amplitude and Mixpanel quickly followed suit. Mixpanel introduced enhanced synchronization. Then, this summer, Amplitude announced the public availability of their native Snowflake connector, allowing you to run Amplitude directly on your Snowflake data.

![](/images/posts/sunsetting-product-analytics/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f3705b944-61de-47de-b167-22d61b779ed5_1418x384.png)

So I thought, &quot;Wow, we have a new paradigm. Things are moving in a new direction.&quot;

But now Netspring has been acquired. Does this give us any feedback about what I assumed was a new movement and horizon?

Netspring is one of two companies that have left the cloud-native warehouse product analytics space. Houseware, which came around a bit later than Netspring, also announced it was making its product analytics product free—which essentially means it&apos;s giving up on this category.

It&apos;s worth noting that these are all very early and young companies. As with any young company, especially in the current economic environment, getting things off the ground is difficult. These new companies in this space were trying to achieve something challenging.

And that&apos;s what this post is about - exploring these difficulties and closing the chapter on product analytics.

## What makes it so complicated?

Product analytics has always been a tough nut to crack. In my consulting work, I&apos;ve seen firsthand how challenging this space can be. If I had focused solely on product analytics projects when I started my data and analytics consulting career, I might not have made it past the first year. The demand for such projects has consistently been low over the years. Even with a profile that showcases my ability to bridge product and data (having worked in both fields), the interest in product analytics remained minimal.

The complexity of product analytics stems mainly from its primary target group: product managers. This group is not inherently averse to data or uninterested in analytics. Rather, their role is so multifaceted and resource-intensive that data analysis often takes a back seat.

Let&apos;s draw a comparison with marketing professionals. For marketers, campaign analysis is a natural, recurring task in their weekly and monthly workflows. It&apos;s an essential step they learn early in their online marketing careers. Transitioning this work to an analytical platform for cross-campaign analysis isn&apos;t a giant leap - it&apos;s still part of their core responsibilities.

![](/images/posts/sunsetting-product-analytics/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f5851f99e-95ad-481c-9b39-f62e9d572a88_1374x742.png)

Product managers, on the other hand, face a different reality. They rarely have the luxury of time to analyze their features or overall product performance. And even when they find the time, there&apos;s no straightforward approach to product performance analysis.

Try this: Go on LinkedIn and ask people how they measure product performance. You&apos;ll get a variety of responses but no clear consensus.

Marketing metrics are comparatively simple: you have a website funnel, campaigns driving traffic to it, and you analyze how traffic from different campaign sources performs within this funnel. Conversion rates along this funnel give you a solid idea of marketing campaign performance.

![](/images/posts/sunsetting-product-analytics/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fe4aac9ba-945f-4064-99b7-439c4227c94f_1122x916.png)

In the product world, it&apos;s far more complex. Users sign up, use various features differently, and eventually convert to a subscription. The myriad paths they can take make it challenging to pinpoint why they subscribe. It&apos;s tough to gauge the impact of specific feature developments on conversion rates - and remember, feature development is to product managers what campaigns are to marketers.

So, we&apos;re dealing with a complex problem that lacks easy answers, coupled with limited time for product managers to dig into analytics. Add to this a third challenge: the sheer volume of data points. Marketing analytics might require tracking five touchpoints, often derivable from page views. Product analytics, even with a conservative approach, might need 20-30 events tracked. These must be instrumented initially and maintained consistently to ensure high-quality data for meaningful analysis.

This trifecta - complex analyses, limited time, and resource-intensive instrumentation - makes widespread adoption of product analytics an uphill battle.

Thanks for reading Hipster Data Stack! Subscribe for free to receive new posts and support my work.

## Why did Amplitude and Mixpanel succeed, though?

After writing the previous paragraph, I had to ask myself: If product analytics is so complex, requiring significant time and good instrumentation, how did companies like Amplitude and Mixpanel succeed in the first place? Good question!

Amplitude is a public company with substantial revenue and a large customer base. I&apos;ve worked on numerous Amplitude projects as a consultant, so there&apos;s definitely market adoption. How is this possible when everything seems so complicated?

In reality, there&apos;s always an ideal state and a current state. Product (and, by that, product analytics as well) has experienced a tremendous hype phase over the past decade. I&apos;m probably not the right person to write about why product got into this hype phase. But certain factors contributed to it.

There were numerous writings about the power of product and its success criteria. The lean startup movement, for instance, focused heavily on product development - how to start with a lean product, incrementally improve it, and get constant feedback. This created a hype cycle around the power of product and what it could bring to a business.

![](/images/posts/sunsetting-product-analytics/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2ff46c5c2f-c62c-4801-a817-28ed8cb28072_714x284.png)

Truth be told, when you have digital software, the product obviously plays a crucial role - without it, there would be no business. But the interesting question is: what kind of role does it play? That&apos;s not easy to answer.

Returning to the product hype cycle, as more product teams emerged and companies invested in them, there was a natural curiosity to understand how these products actually work.

You don&apos;t see anything in a digital product if you don&apos;t track it. Unlike old software, where you&apos;d send a DVD to a customer and only get feedback through license renewals, one promise of digitally hosted cloud products or software-as-a-service was earlier feedback loops to determine if the product makes sense.

I think this early eagerness drove people to invest in product analytics - to get some glimpses into product usage. In most cases today, these are still just glimpses. Most product analytics setups I know of provide small insights into how people use the product - how many people log in during a month, for example. Some go further to understand core activities within the product. However, it&apos;s still limited due to complexity, time constraints, and instrumentation challenges.

![](/images/posts/sunsetting-product-analytics/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f5eabc3e3-6e38-45c7-aaf0-00751885571f_1012x678.png)

What can explain Amplitude and Mixpanel&apos;s success is that product plays a vital role in our digital world today, and there&apos;s at least a basic need for people to understand how a product works. This is what Amplitude and Mixpanel deliver to their customers at a minimum.

However, it also creates a big gap between expectations and what the software can actually deliver. I have many conversations with people using product analytics who are disappointed and wondering why they don&apos;t get more out of it. Is it really so hard? First, I tell them, yes, it is hard. It takes work.

Because of this, product analytics itself always has a hard time creating a hype cycle. The returns aren&apos;t guaranteed. Marketing analytics has an easier job here—a campaign manager can immediately get feedback on campaign performance, creating a value cycle.

This feedback loop was always missing for product analytics - the loop where you can say, &quot;Wow, we got this kind of value back. Therefore, product analytics totally makes sense.&quot;

Additionally, the one thing that always helped product analytics was the success stories of companies that successfully implemented it. All the Uber and Airbnb stories showcasing how they used product usage data to build better features - this was the small hype cycle you could get in product analytics. But when people tried to implement it similarly, they failed because, again, it&apos;s not easy. These companies had a lot of resources to achieve that.

![](/images/posts/sunsetting-product-analytics/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f45e93b31-4e4d-4fc6-b65f-40dfb300995b_1428x592.png)

Amplitude and Mixpanel created the product analytics category (they were app analytics before). What made them successful was that they gave the first glimpses into how people actually use a product. But going beyond that was super hard for them and their users. This sets the scene for why it might be time to leave product analytics behind.

## Leaving product analytics - part 2

It&apos;s time to move beyond traditional product analytics—the kind we&apos;ve been using for the past decade—those superficial glimpses into how people use our products. The problem, as I&apos;ve mentioned before, is that today&apos;s product analytics is too granular. It&apos;s disconnected from how products deliver value and businesses genuinely operate.

We&apos;ve been working with the wrong incentives.

When we ask, &quot;How do people use our product?&quot;, we&apos;re setting ourselves up for misleading answers. The question itself is flawed. Instead, we should be asking:

-   &quot;What value do people get from our product?&quot;
-   &quot;How quickly can they achieve this value?&quot;
-   &quot;How consistently do we deliver this value?&quot;

![](/images/posts/sunsetting-product-analytics/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f1004113b-f558-4365-9b23-2abd543918d0_1220x330.png)

And to bridge the gap to business metrics:

-   &quot;How does the product contribute to our revenue?&quot;
-   &quot;How does it generate initial revenue when we have a free plan and the product serves as an acquisition channel?&quot;
-   &quot;How does the product contribute to ongoing revenue in a subscription model?&quot;

![](/images/posts/sunsetting-product-analytics/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fd4fc3968-5b92-4d53-95bb-fc71c43103a3_1950x254.png)

These questions lead us to a fundamentally different approach to product analytics. They introduce a higher-level perspective on how we analyze things. This approach also better incorporates the multifaceted role that products play in modern software setups.

In a typical subscription-based, web-hosted software model, products serve various functions. With a free plan, the product becomes an acquisition channel - a phase in the customer journey where users experience the product and determine if it solves their problem. If there&apos;s a successful match, they decide to pay for it. In this phase, the product takes on a completely different role compared to the retention role later.

Once a customer signs up for a subscription, the product becomes crucial in maintaining the customer relationship. If the product consistently delivers value over time, customers will happily continue paying for it. Customers will switch if it fails to create value or if a competitor offers more value.

![](/images/posts/sunsetting-product-analytics/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f27792db9-f05b-419e-a578-7a35b0a0cd72_1136x412.png)

The scenario becomes more complex in enterprise setups where product usage isn&apos;t directly linked to contract signings. In cases where sales teams negotiate with CIOs for large license deals, product usage may play a secondary role. However, ensuring a positive user experience remains crucial. It can influence decisions within the company to stick with the product and create a ripple effect when users change jobs and recommend the product to new employers.

Deriving value from a product is paramount, but we need to ensure our analytics setups focus on analyzing the product&apos;s value chain, not just how people interact with it. This shift in perspective allows us to align our product analytics more closely with business outcomes and customer success.

## What do we call this now?

It&apos;s clear that product analytics as a term doesn&apos;t quite fit anymore. But categories, especially in tech, tend to stick around - people have these terms ingrained when thinking about specific jobs or problems. So, what&apos;s the next evolution of product analytics? And what should we call it?

I see different trends in where classic product analytics vendors are heading. Some are moving towards what they call &quot;customer analytics.&quot; I toyed with this term for a while, thinking it might be the answer. But I&apos;ve come to realize that customer analytics falls into the same trap as product analytics - it&apos;s too granular.

When we focus on customer-level analysis, tracking every touchpoint over their lifetime, we encounter the same issues as with product analytics. We end up with a wealth of granular data, which is a good foundation, but it&apos;s not where our analysis should end. We need to abstract two or three levels, focusing on value journeys and high-level customer paths. If we stay too granular, we miss the insights that truly create better customer experiences.

This is why I&apos;m wary of the term &quot;customer analytics.&quot; I fear it won&apos;t deliver the results people are after, much like what we see in the current CDP market (hello CDPs).

So what should we call it? It&apos;s something simple, but I understand why vendors shy away from it: business analytics—or good old business intelligence, even if it doesn&apos;t have the same buzz. Ultimately, any analytics work should analyze business impact.

## Business product analytics

When we look at how people derive value from a product, we&apos;re always connecting it to lifetime value. We&apos;re always making that link to see how it impacts revenue. Even if we&apos;re looking at product usage dropping off, we&apos;re really looking at its eventual impact on subscription renewals and revenue.

Interestingly, I&apos;m seeing glimpses of this trend, even in classic business analysis. Financial departments are starting to realize that the metrics they&apos;ve been working with are purely output metrics - things like revenue, which are the result of many preceding factors.

While it&apos;s crucial to have an overview of output metrics, they&apos;re not the right level for operationalizing insights. You can&apos;t walk into a room and simply say, &quot;We need to make more revenue.&quot; You need to break it down to an operational level.

This is where business analytics in general is evolving. We&apos;ll still have the output layer showing overall business performance, but we&apos;re introducing layers beyond that. We&apos;re venturing into analyzing product values, customer movement through product stages, product adoption, marketing journeys, account discovery, and more.

In essence, business analytics is expanding. What we used to call product analytics is becoming an essential part of a larger business analytics picture. We&apos;re finally in a position to connect the dots between what&apos;s happening in product, sales, customer success, and marketing to see the revenue outcomes.

What I&apos;m hoping for is a next generation of business analytics tools based on event data. Why? Working in product analytics, I&apos;ve seen the power of event data structures. Unlike classic BI data, event data gives you a full sequence of an account or customer&apos;s lifetime. This allows us to build analyses that truly understand value generation.

![](/images/posts/sunsetting-product-analytics/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f02c16f15-89fb-4bdd-b443-8ac21400e630_1686x496.png)

Event data enables datasets that can answer more complex questions, helping us understand our business better. Current product or event analytics tools can help to an extent, but I see a next generation of business analytics tools based on event data that goes beyond this, answering questions we&apos;ve always wanted to address but never had the right setup for.

This is why I&apos;m concerned about what happened to NetSpring, which I&apos;d consider a next-generation business analytics tool. Whether they were too early or something else didn&apos;t work out, I still hope to see other BI tools move in a similar direction.

Finally, let&apos;s spend at least two sentences on AI. If we can create a business analytics setup using metric trees to show product and business mechanics from output to input metrics, underpinned by an event data model with clear semantics, we could enable some fascinating AI applications. These could identify interesting outliers, segmentations, and developments much more effectively than throwing AI at the snapshot data we currently use in BI applications.

There&apos;s an exciting future ahead, but I don&apos;t think it&apos;s called product analytics anymore. Let&apos;s see how it unfolds.

</content:encoded></item><item><title>Introducing user states in product analytics</title><link>https://timo.space/blog/introducing-user-states-in-product/</link><guid isPermaLink="true">https://timo.space/blog/introducing-user-states-in-product/</guid><description>It all started about a year ago. I was recording a video with the awesome Juliana Jackson about e-commerce analytics setups. At some point, we got into discussing user journeys in e-commerce.</description><pubDate>Wed, 18 Sep 2024 00:00:00 GMT</pubDate><content:encoded>It all started about a year ago. I was recording a video with the awesome [Juliana Jackson](https://www.linkedin.com/in/juliana-jackson/) about e-commerce analytics setups. At some point, we got into discussing user journeys in e-commerce. Juliana made this fascinating observation: in e-commerce analytics setups, we usually only look at the last user journey—the final buying conversion—and we optimize everything for that.

You&apos;ll always see one funnel when you look at an analytics setup for an e-commerce shop. It typically starts somewhere on a product detail page, goes to checkout, and ends up in an order. But here&apos;s the thing—there are plenty more user journeys before that final purchase.

![](/images/posts/introducing-user-states-in-product/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f3f16dca2-9ee1-4f84-a333-7b70970beba8_1718x642.png)

Sure, there are a few things we buy right away—maybe a toothbrush because we need one. For those, the single funnel makes sense. But for most other stuff we buy, it&apos;s a whole different story.

Think about the last expensive thing you bought. My buying journey can sometimes take months. I go through all these different stages: from &quot;That could be interesting&quot; to &quot;I forgot about it&quot; to &quot;Oh, that&apos;s interesting again.&quot; Then it&apos;s &quot;let&apos;s do some research&quot; to &quot;Okay, I&apos;ve narrowed down the options&quot; to &quot;I have a favorite&quot; to &quot;Should I buy that?&quot; This can sometimes take weeks, even months, until I finally hit &quot;okay, I&apos;m ready. I&apos;m buying it now.&quot;

This is a great example of a user journey that we assume is pretty straightforward. But if we take two steps back and examine it, it&apos;s far more complex than we think.


## The complex user journeys in Software as a Service or why product analytics is so complicated

I started with an e-commerce example because, in analytics, it&apos;s the most straightforward setup. But as I&apos;ve pointed out, it&apos;s not. Now, let&apos;s switch to a business model that feels way more complex by default: software as a service (SaaS).

A typical SaaS customer journey can stretch over months. It might start with people just checking out the software - browsing marketing materials, maybe catching a demo, attending a webinar, or even chatting with someone from the company. Then, if possible, they might create an account.

![](/images/posts/introducing-user-states-in-product/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fba08c301-ff1d-43b8-acb9-bc118c284e20_1546x632.png)

Next, we&apos;re all hoping they&apos;ll start using the product. They might invest an hour figuring out what they can do. It could be successful, but maybe not. If it works out, they might come back and try something else. Hopefully, with each return, they start developing a habit and realizing, &quot;Hey, this tool is beneficial for what I do.&quot; They keep using it, and at some point, it becomes their go-to tool.

Then, depending on how the SaaS is designed, they might hit a wall. The product team builds this wall somewhere to say, &quot;Looks like you&apos;re getting a lot of value from this tool. Time to pay us for it.&quot; Product teams experiment a lot with when and how to do this. But we assume this person has figured out the software is super helpful, so they&apos;re happy to hand over their credit card details and get a subscription.

And that&apos;s when the real story kicks off. The subscription runs in the background, renewing every month. You&apos;re still hoping people keep using your product in a way that gives them value over time. But this relationship can fade. Maybe their use case disappears, they find things that bug them, or, worst case, they discover an alternative that does the job better. They might start using it less often, then stop altogether. Obviously, they&apos;ll cancel the subscription at some point because who wants to pay for something they don&apos;t use?

This whole journey is already complex. Tracking it is a huge undertaking. And we&apos;re not even touching on the fact that there are different scenarios for product usage. Every product has five to ten dominant use cases or jobs to be done that people tackle with this tool.

When we factor all this in, it explains nicely why so many people struggle with product analytics. It&apos;s significantly hard because we&apos;re looking at very complex user journeys that can happen over a long period and involve loads of different features within the product.

So, there&apos;s no easy way to answer, &quot;Hey, which feature is the most important to convert people to?&quot; There&apos;s no simple answer to which usage pattern can predict that someone will subscribe. These are important questions, but they&apos;re extremely tough to crack. We need to find a different way to approach this.

## Managing complexity

One way to handle complexity is to find a simpler model that can still explain patterns within the mess but is easy enough to work with and still valuable. The classic checkout funnel is a good example. Sure, people can take different routes to check out, maybe even do steps in between, but you usually take a simplified version. You define five steps and analyze those, because everything that happens in between can have an influence, but you don&apos;t necessarily need to bring it into the model.

![](/images/posts/introducing-user-states-in-product/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f4eca9864-cea8-472c-a629-e6249caad43e_1400x814.png)

So, the challenge we&apos;re facing in product analytics is this: We need to find a model that makes it easier to really grasp what&apos;s going on in user journeys. At the same time, it needs to be robust enough that when we draw conclusions from this model and apply them to our products, it&apos;ll create the impact we&apos;re hoping for. We can&apos;t make the model too simple or end up with useless data points from all the data we collect.

It took me a long time, eight to ten years, to develop different ideas for this. I tested all these ideas over time, and some worked well, but it was never something where I&apos;d say, &quot;This is it. I&apos;d deploy this everywhere.&quot;

My newest iteration of all these thoughts, and the most promising so far, is the idea of user states. Let me explain what user states mean.

## What are user states?

This concept might feel more familiar if you&apos;ve got some experience with games, be it computer games or role-playing games. In a classic role-playing game, a player starts as a novice or a beginner. They&apos;ve got some basic skills, and they level up over time by going on different adventures. They change their skills, improve them, and develop different characteristics. Most games have this concept called levels. It&apos;s usually a motivation for people to keep playing, like, &quot;Hey, what happens at level 45?&quot; Or it unlocks things you can only do at level 45.

![](/images/posts/introducing-user-states-in-product/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fd4667908-c9f5-45f1-bbaf-24c6ff80d904_768x439-png.jpg)

We can apply this same idea to user journeys or usage within our products. When someone signs up for your product and creates an account, they&apos;re fresh. They&apos;re starting with nothing. They&apos;re the novices, the beginners who must develop skills to use your product.

If you&apos;re doing an excellent job with activation, like having a perfect way to get people to one or two first success moments using your tool, they develop some skills. So they move up one or two levels on their journey. They&apos;ve moved from a beginner to an activated user because we now believe they&apos;ve got the basics of the product down.

At a high level, the next step for us would be getting this user to come back after two or three days, doing similar steps, or even expanding their knowledge within the tool, maybe trying one or two other use cases, becoming more proficient. When they keep this up over a long period, say eight weeks of coming back in the data, we can move them to a level we might call active user, current user, or pro user, whatever you want to call it.

Then, the flip side can happen, like I described before, where they come back infrequently. They don&apos;t use us anymore. We can track this with a different state, too. For example, we might flag these users after two or three weeks as &quot;at risk&quot; users. At some point, we can also say these users have churned from our product perspective, even if they might still have a subscription running. But from a product view, we can see they don&apos;t use us anymore, so we flag them as churned.

This is a very high-level user state model, but it already makes it much easier to understand how people are progressing and journeying through your product. Obviously, we want to move lots of people to active users or pro users, or whatever we want to call them. And hopefully, we don&apos;t want to see them move to &quot;at risk&quot; or to churn.

![](/images/posts/introducing-user-states-in-product/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2ff5a7aafa-7a0b-42a4-b79d-ee5df4417b7d_1512x708.png)

## The five core user states

This is our starting point. This is the highest level of user states I&apos;d kick off with. It&apos;s like the bird&apos;s-eye view of the user journey in a product. Now, there are deeper levels to this. You could have different stages of activation or various levels of activity for your current or active users. But start here. This is why I call them the five core user states. It&apos;s a solid foundation to build on. You can get this nailed down first and then dive into the nitty-gritty if necessary.

### New users

These individuals signed up within a specific period. If you&apos;re analyzing data daily, these are the people who signed up during that day. For monthly analysis, it&apos;s those who signed up during that month.

The definition of &quot;signing up&quot; can vary depending on your product. If you have an app without formal accounts, people might have installed the app. For products with a standard account system, it is when someone has created an account.

### Activated users

I also prefer to include activated users, as this is the first essential step. In my experience, transitioning from new to activated users often presents the most significant improvement potential. It&apos;s typically where you&apos;ll encounter your most significant losses. I advocate for a specific state to truly understand what constitutes an activated user.

One aspect to consider is whether activated users can only come from new users or if people can become activated later. You&apos;ll need to investigate this in your data. From my experience, there&apos;s usually a very short period between being a new user and becoming an activated user. However, this might vary depending on the nature of your product.

### Active users

I&apos;ve used the term &quot;active users&quot; for the next stage despite some challenges with this label. The term is often used rather generically, which can lead to misunderstandings. We must develop a precise, granular definition of an active user in our context. This involves the various interactions that users engage in, and it&apos;s a definition that requires ongoing refinement.

The &quot;active user&quot; metric you see in most analytics tools lacks this refinement level, making it somewhat risky to use. However, I&apos;ve decided to stick with this term because it&apos;s widely understood and used in the industry. By using familiar terminology, I hope to make the concept more accessible and easier for people to grasp.

Active users are our ideal state, the one we would love to keep people in forever. If you go deeper and break up active users into different segments, it is also potentially the state with the most depth.

### At Risk users

At-risk users were active in a previous period but have not been active in the most recent period we&apos;re analyzing. For example, when looking at last month&apos;s numbers, we&apos;d consider users who were active in earlier months but not during last month itself as at-risk.

This concept can be applied to various periods. If your product expects or aims for more frequent user engagement, such as daily or weekly, you can adjust these periods to define at-risk users. The key is identifying the gap between a user&apos;s last activity and the current period, which signals they might be at risk of disengagement.

### Dormant Users

These users have essentially churned from the product, but I prefer &quot;dormant&quot; as it&apos;s more straightforward. While we typically consider users churned when they cancel their subscriptions, users often stop using our product well before that point. This can occur significantly earlier than a subscription cancellation, making it a valuable indicator of subscriptions at high risk.

The term &quot;dormant users&quot; more accurately describes this state. These users were active at some point but haven&apos;t been active for a defined recent period. The specific time frames depend on your reporting periods and product expectations.

For example, dormant users might be those not active in the last 90 days but active in the 180 days before. This definition lets you identify users who have disengaged from your product before they officially churn, giving you a chance to re-engage them.

You can map out and analyze your product&apos;s high-level performance with these five states. By examining the transitions between these states, you&apos;ll quickly see if you&apos;re losing too many users or doing well in activating or successfully retaining active users. This provides early signals if something isn&apos;t working with your product, offering an efficient way to monitor product performance over time.
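The five states above can be condensed into a single classification rule. Here is a minimal sketch in Python; the thresholds (a 30-day analysis period, a 90-day dormancy cutoff, roughly eight weeks of sustained usage for &quot;active&quot;) and the `is_activated` flag are illustrative assumptions, not definitions from any tool - your own numbers will depend on your product&apos;s cadence:

```python
from datetime import date

# Hypothetical thresholds - tune them to your product's expected usage cadence.
PERIOD_DAYS = 30         # length of the "recent" analysis period
DORMANT_AFTER_DAYS = 90  # no activity for this long => dormant
ACTIVE_SPAN_DAYS = 56    # roughly eight weeks of sustained usage => active

def classify_user(signup: date, active_days: list[date],
                  is_activated: bool, today: date) -> str:
    """Assign one of the five core user states as of `today`."""
    last_active = max(active_days) if active_days else signup
    idle_days = (today - last_active).days

    if idle_days > DORMANT_AFTER_DAYS:
        return "dormant"    # gone from the product, even if still subscribed
    if idle_days > PERIOD_DAYS:
        return "at_risk"    # active before, but not in the current period
    if not is_activated:
        return "new"        # signed up, no first success moment yet
    first_active = min(active_days) if active_days else signup
    if (last_active - first_active).days >= ACTIVE_SPAN_DAYS:
        return "active"     # keeps coming back over a long period
    return "activated"      # got the basics down, habit not yet formed
```

For instance, `classify_user(date(2024, 6, 1), [date(2024, 6, 2), date(2024, 8, 20)], True, date(2024, 9, 1))` returns `"active"` because the activity spans more than eight weeks and the user was seen within the last 30 days.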

Your critical task is defining what constitutes an activated and active user. Once you&apos;ve established these, the other states become more easily defined. However, this definition process takes time, requires data analysis, and will likely need refinement to work optimally.

It&apos;s important to note that updating your definition of an active user will influence your metrics. For instance, if you start with a broad, simple definition and later decide to raise the bar, your metrics will likely decrease after applying the new definition. This is fine, but it&apos;s crucial to communicate these changes effectively so stakeholders understand the metrics shift.

![](/images/posts/introducing-user-states-in-product/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f15b37fad-c78d-45eb-ab98-37aa3eb7fc29_1616x796.png)

However, even this simple model of user states gives us much more control over investigating and researching.

For example, why are we only moving 20 percent of new users, beginners, into activated users? There is a massive gap of 80 percent where we&apos;re not doing a good job. This translates into very operational product or customer success work, like talking to people and asking them what&apos;s missing and why they&apos;re not getting it. Then, we have to work on different kinds of onboarding scenarios.

And then we have two metrics where we can immediately see, &quot;Hey, we&apos;re improving. We&apos;re now getting 30, 40 percent of people into the activated state.&quot; So we&apos;re making progress here. You can do the same with all the other transitions between the different states.
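The transition check described above boils down to a small calculation. A sketch, with made-up state labels and sample data:

```python
# Share of users in `from_state` last period who reached `to_state` this period.
def transition_rate(prev: dict[str, str], curr: dict[str, str],
                    from_state: str, to_state: str) -> float:
    cohort = [uid for uid, state in prev.items() if state == from_state]
    if not cohort:
        return 0.0
    moved = sum(1 for uid in cohort if curr.get(uid) == to_state)
    return moved / len(cohort)

# Hypothetical snapshots: user_id -> state for two consecutive periods.
prev = {"u1": "new", "u2": "new", "u3": "new", "u4": "new", "u5": "active"}
curr = {"u1": "activated", "u2": "new", "u3": "new", "u4": "new", "u5": "active"}
rate = transition_rate(prev, curr, "new", "activated")  # 1 of 4 new users -> 0.25
```

The same function works for every other pair of states, such as active to at_risk, which is what makes the state model cheap to monitor once the snapshots exist.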

This makes states extremely interesting as a straightforward model to understand how your product is performing.


## The user states in practice

[Duolingo&apos;s growth model](https://blog.duolingo.com/growth-model-duolingo/):

![](/images/posts/introducing-user-states-in-product/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fba5ce47f-86a7-4562-9064-da5704c8d3b0_2000x1107-png.jpg)

This whole thing isn&apos;t new. I know quite a few companies that use this model in their product analytics setup to understand how users progress through their products. Duolingo, for example, published one of the best posts I&apos;ve read on product analytics about a year ago. They described their growth model, which is what I just talked about—how people move through different states of activity in their product.

They use terms like new users, current users, reactivated users, resurrected users, and then at-risk users (they have two different levels of at-risk), and dormant users. They analyze how people transition between these states. This is the foundation for all their work to keep as many people as possible in the current user state. I highly recommend reading this post.

### How to implement it

A good approach is to sit down with all the people involved with the product. That means the product team, obviously, but also customer success if you have it, customer service, development (remember them!), potentially sales, and maybe marketing who&apos;ve done some research. You can invite a big group.

Then, start discussing what everyone thinks makes an activated user and what makes an active or current user. You&apos;ll come up with different versions. Your first task will be to make the definition as simple as possible. You haven&apos;t done a good job if you end up with a rule set for activated users that&apos;s 20 lines long. It should be simple to start with.

![](/images/posts/introducing-user-states-in-product/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fd048b785-bcc3-4c5a-9689-8a00474bc563_956x548.png)

#### The technical implementation

Now, for the technical implementation part. I&apos;ve got some bad news for you—it&apos;s not easy. The obvious choice would be to use product analytics or event analytics tools because, well, they&apos;re built for this. But setting these things up isn&apos;t straightforward.

**In Amplitude or Mixpanel**

The usual approach in a tool like Amplitude or Mixpanel would be cohorts or segmentation. In both tools, you can set up something like, &quot;I want to include people in a group who were new (so they have a new account in the last 30 days) and who then did these two events three times within the last 30 days.&quot; You&apos;d call this “activated users.”

It&apos;s possible to build something like that in there. But it gets more complex with intricate rule sets, and it gets trickier when you work with different time periods.

For example, an at-risk user might be someone who was active 30 to 60 days ago but has not been active in the last 30 days. This is the part that complicates things in product analytics tools. It&apos;s also super hard to test whether your definition works. I did a video about how you can set these things up, but I&apos;m still not happy with how the tools support this. But it&apos;s a start. You can begin there, and I know it&apos;s possible to build it. It just takes a while, plus some testing and experimenting.
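The window logic that makes this tricky can be stated compactly. A toy classifier, with the 30/60-day thresholds as assumptions:

```python
from datetime import date

def classify_user(last_active: date, today: date) -> str:
    """Toy version of the window logic above; the 30/60-day thresholds are assumptions."""
    days_inactive = (today - last_active).days
    if days_inactive > 60:
        return "dormant"
    if days_inactive > 30:
        return "at_risk"   # active 30-60 days ago, but not in the last 30
    return "active"

today = date(2024, 6, 1)
print(classify_user(date(2024, 5, 20), today))  # active
print(classify_user(date(2024, 4, 10), today))  # at_risk (52 days inactive)
print(classify_user(date(2024, 1, 1), today))   # dormant
```

Writing the rule down like this also gives you something to test your cohort definitions against.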

**In your warehouse**

The other way to implement this, obviously, is in your data warehouse with a data model. You&apos;d have to create a model that describes exactly what I&apos;ve been talking about. One approach, and this is something I want to invest more time in and write about in the future, is to have constant user state tables that you generate. These would also develop a history over time.

In the end, for each user ID, you&apos;d have a history of which states they&apos;ve been in over time, over the months. When you can develop this, it makes it pretty easy for you later to say, &quot;Hey, we have this many people now in active, this many in churned or dormant,&quot; and so on. But this is what you&apos;d have to create in your data warehouse.
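One way to sketch such a state-history model, assuming a monthly grain and a per-month activity count as input (all names and thresholds here are hypothetical):

```python
def monthly_state_history(signup_month, events_per_month):
    """Build a per-month state history for one user (thresholds are assumptions).

    signup_month: index of the month the account was created.
    events_per_month: activity counts, one entry per month.
    Returns a list of (month, state) tuples you could materialize as a table.
    """
    history = []
    for month in range(signup_month, len(events_per_month)):
        events = events_per_month[month]
        if month == signup_month:
            state = "new"
        elif events >= 3:
            state = "active"
        elif events > 0:
            state = "at_risk"
        else:
            state = "dormant"
        history.append((month, state))
    return history

# A user who signs up, is active for two months, fades out, and goes dormant:
print(monthly_state_history(0, [1, 5, 4, 1, 0]))
# [(0, 'new'), (1, 'active'), (2, 'active'), (3, 'at_risk'), (4, 'dormant')]
```

In a warehouse you would compute the same thing per user ID with window functions over the event table; counting states per month then becomes a simple GROUP BY.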

## What&apos;s next

First off, I really hope more and more product teams or companies will implement at least this simple model. That way, they&apos;ll have a baseline and foundation to understand product performance. This is kind of my mission - to make this approach more visible to more people so we can see more implementations.

I hope that we see better tool support for user states. I know about at least one new product analytics product that is built around this concept but is not available yet (follow [Marko on LinkedIn](https://www.linkedin.com/in/marko-j-6b77471a/) to see updates). So there is hope.

Now, the next steps can go pretty broad. You might pick specific user states and dive one level deeper. Let&apos;s say in an active user scenario, as I mentioned before, your product usually has different jobs to be done. So the next obvious step would be to analyze, &quot;Okay, we have this set of active users, but what jobs are they actually doing?&quot; This gives you a good indication of which parts of your product are really important and better answers the question of what features matter. But you need to approach it from a jobs-to-be-done perspective.

At some point - and this is something I&apos;m still tinkering with - maybe we can even introduce a leveling system in our products. It would be super nerdy and super interesting, but maybe also a bit over-engineered. I don&apos;t know yet. I&apos;ll have to do some experimenting with that.

But for me, I hope the major takeaway you get from this post is that you sit down and say, &quot;Okay, we really want to implement these five core user states.&quot; That&apos;s the big win here.

</content:encoded></item><item><title>Combine Product analytics with Subscription data - Part 2</title><link>https://timo.space/blog/combine-product-analytics-with-subscription-4f1/</link><guid isPermaLink="true">https://timo.space/blog/combine-product-analytics-with-subscription-4f1/</guid><description>Even after ten years of working in this field, one magic moment is always when it all comes together.</description><pubDate>Tue, 14 May 2024 00:00:00 GMT</pubDate><content:encoded>Even after ten years of working in this field, one magic moment is always when it all comes together.

In this case, we have added new event data that shows up and modeled new events in our data warehouse, which is ready to be synced in our product analytics setup.

Now, you log into your product analytics tool, and everything is in front of you, just waiting for the insights to be unlocked. This is a magic moment but also sometimes a scary one. The obvious question is, &quot;What should we do next with all this new data?&quot;

This post shows how I approach this as a first step before doing a deeper analysis.

This is part two of a series, and I recommend you start with part one to get the full picture of what we are building here.

[Combine Product analytics with Subscription data - Part 1](https://hipster-data-show.ghost.io/combine-product-analytics-with-subscription/) — When working in product analytics, I had one painful experience. Occasionally, I chatted with the people from BI, and they told me about this magic thing called Data Warehouse. It sounded like a paradise—the vast amount of information and data from all these systems that were out of reach for me.

Unlike former posts, this one is tool-specific. I want to show an end-to-end implementation with all the steps, and for that I need a product analytics tool. I currently collaborate closely with Mixpanel, especially on their warehouse integration (this has been the most important product analytics feature for me in the last five years). So this implementation uses Mixpanel, since that is where I know all the steps best. But everything I do here can be implemented with any other product analytics tool (in slightly different ways).

Thanks for reading timo&apos;s substack! Subscribe for free to receive new posts and support my work.

## Sync Data Warehouse data to Mixpanel

You might know Mixpanel classically. You used an SDK to send events directly into Mixpanel from a client or server environment. Since last year, we have had another option, which is even more powerful than the old way—a sync from the data warehouse.

There are two ways to sync the data into Mixpanel: a normal sync and the new mirror function.

**The normal sync**

The normal sync works like most reverse ETL products. You define a table with your event data or with user properties (more about the two a bit later), set this up in Mixpanel, and define how often you want to sync it:

![](/images/posts/combine-product-analytics-with-subscription-4f1/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f8aa005a4-c4c5-4965-89c1-55af41963a24_2618x1302.png)

What is important to know: this sync only works for new records. The updates are usually timestamp-based, so the job looks for new event data based on the timestamps and loads it. That is fine as long as the structure of your events never needs to change for historical data. It means that when you introduce a new property, it will only be available on new events.
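The behavior described above, a timestamp-based incremental load, can be sketched in a few lines (this is an illustration, not Mixpanel&apos;s actual sync implementation):

```python
def incremental_sync(table, last_synced_ts):
    """Timestamp-based incremental load: only rows newer than the last
    synced watermark are picked up (a sketch, not Mixpanel's actual sync)."""
    new_rows = [row for row in table if row["timestamp"] > last_synced_ts]
    watermark = max((row["timestamp"] for row in new_rows), default=last_synced_ts)
    return new_rows, watermark

events = [
    {"timestamp": 100, "event": "board created"},
    {"timestamp": 250, "event": "board created"},
]
rows, watermark = incremental_sync(events, 150)
print(len(rows), watermark)  # 1 250
```

Note how a row whose properties changed but whose timestamp stayed at 100 would never be picked up again; that is exactly the limitation the Mirror sync addresses.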

**Recommendation**: Use one table per event. This enables you to introduce new events later (because you modeled them), and you can still sync the full history into Mixpanel.

**The mirror sync**

Mirror was introduced about [five months ago](https://mixpanel.com/blog/mirror-product-analytics-data-warehouse-sync/). The big difference is how the data is synced into Mixpanel. Instead of syncing based on timestamps, Mirror detects which records have changed in the source table and then syncs those changes into Mixpanel. So it works a bit like change data capture.

This also enables you to change event tables for all historical data. For example, you could introduce a new calculated property like account\_num\_orders that you want to use for all historical events. With the normal sync, this would only apply to new events. But with Mirror, you can update all events.

And we can do something even crazier: delete events or overwrite property values. Remember those data points where revenue peaked, not because of your awesome marketing team but because someone shipped a bug in the revenue property? You can fix that now.

Mirror is still in beta. If you would like to test it, contact your Mixpanel Account manager.

### What data can we sync?

**Event data**

All event data can be synced into Mixpanel as long as we have a timestamp, an identifier, and an event name, and even better when we also have some property values. This also means you can sync historical event data from other tools like Segment or Google Analytics 4.

**User properties**

We can sync any user properties into Mixpanel. The only requirement is to have the user ID matching the Mixpanel one and a JSON field with the properties. This is an extremely powerful and easy way to enrich user profiles with data from additional sources like your production database, CRM, or customer support tool.
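A user-property row for such a sync could be as simple as this (the field names are my assumption, not Mixpanel&apos;s exact contract):

```python
import json

# One row per user: an ID that matches the Mixpanel identity,
# plus a JSON blob with whatever properties we want to attach.
row = {
    "user_id": "u_42",
    "properties": json.dumps({
        "plan": "pro",                 # from the production database
        "crm_segment": "enterprise",   # from the CRM
        "open_support_tickets": 2,     # from the support tool
    }),
}
print(json.loads(row["properties"])["plan"])  # pro
```

The JSON field is what makes this flexible: adding a new property is just adding a key, with no schema change on the sync.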

**Lookup tables**

We might have sensitive data in our data warehouse that we would never expose with a frontend SDK - like product margins in an online shop. Another use case for look-up tables is calculated metrics, like the number of users in an account.

**Ad spend data**

This is a powerful way to extend the existing campaign data with cost information for all marketing analyses. The obvious candidates are Google Ads or Meta ad spend, but this can also include costs like influencer fees or spend on SEO initiatives.

Combining all these cost items, you can support essential marketing metrics like Return on Ad Spend (ROAS).

But ad spend data already foreshadows what is possible beyond that: you can provide any aggregated data to Mixpanel, which lets you combine sequence-based metrics with metrics calculated from aggregated data.

For our setup, we sync the events we have created before. Creating another user property table could also be interesting, but it is out of the scope of this post.

## Creating a metrics dashboard in Mixpanel

Before I started working on this series, I talked with Abhi Sivasailam about metric trees. You can watch it here:

[https://youtube.com/live/SLnvoOZ2vk0?feature=share](https://youtube.com/live/SLnvoOZ2vk0?feature=share)

Naturally, Abhi is pretty inspirational, so before I got started with this project, I created this metric tree:

![](/images/posts/combine-product-analytics-with-subscription-4f1/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fd30be3fc-bb97-4cb6-b463-4e75aac7e5a9_1954x1322.png)

At the top, we have the classic MRR bridge (in a minimal version here). This can be built with the new Stripe data we sync into Mixpanel.

Below, we have the product usage data already collected in Mixpanel in this scenario.

As you can see, combining both data as event data enables us to calculate subscriptions at risk metrics based on the product usage data. This is where we want to get to.

But in step one, we want to create a dashboard tree.

A dashboard tree enables us to provide our company with a report that starts at a high-level, business-relevant view (the MRR) and then drills down the metric tree to show the leading metrics behind each input metric.

This is extremely helpful in review and planning meetings, where you can perform these drill-downs to identify the areas where work should be invested next.

It is also helpful for conducting a root cause analysis when a metric&apos;s performance changes. Ergest wrote a great article about that:

[Debugging Your Business with Data](https://sqlpatterns.com/p/debugging-your-business-with-data) — I have a treat for you this week! This is one of my longest and most in-depth pieces but it still barely scratches the surface. So strap in, grab a cup of your favorite hot beverage and enjoy. When I was a data analyst there was one task I absolutely dreaded having to do: “Hey Ergest, sales are down 5% what’s going on?”

Our dashboard tree version 1 can look like this:

![](/images/posts/combine-product-analytics-with-subscription-4f1/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f63c4876e-3e39-4925-9b23-d4a138b24e0e_1212x510.png)

Let&apos;s get to work.

We build the **new MRR** first.

We use the &quot;Subscription created&quot; event to build this, which we are now syncing into Mixpanel. But for the MRR, we need to use the subscription\_amount property as an aggregated SUM:

![](/images/posts/combine-product-analytics-with-subscription-4f1/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f1975e28d-d848-4bba-9adb-24e3244c38ee_1626x1024.png)

We do the same for retained MRR and canceled MRR.

With that, we can build the Total MRR metric:

![](/images/posts/combine-product-analytics-with-subscription-4f1/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f80c562bc-e1b6-47de-8a23-9010d752249d_1656x890.png)

As you can see, here we use the formula function to combine the MRR metrics we have built before to calculate the total MRR.
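Spelled out, the formula approach amounts to simple arithmetic (the numbers are invented, and the exact formulation depends on how you define retained MRR):

```python
# Each component is a SUM over the subscription_amount property of the
# corresponding synced event, aggregated for the period.
new_mrr = 4_000        # SUM over "Subscription created" events
retained_mrr = 20_000  # SUM over retained/renewed subscriptions
canceled_mrr = 1_500   # SUM over canceled subscriptions

# One common formulation of the bridge for the period:
total_mrr = new_mrr + retained_mrr - canceled_mrr
print(total_mrr)  # 22500
```

Mixpanel&apos;s formula function plays exactly this role: it combines the three component metrics into one derived metric.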

Nice; with all this, we can build the first level of our dashboard tree:

![](/images/posts/combine-product-analytics-with-subscription-4f1/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f57f3df46-80b3-4164-8589-b27ce668578b_2380x1352.png)

With Mixpanel&apos;s Dashboard in Dashboard function, we can now build and link the next level:

![](/images/posts/combine-product-analytics-with-subscription-4f1/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fb8c57500-e75a-4a67-893d-82b17376a7f9_2140x890-png.jpg)

Going one level down to the Accounts:

![](/images/posts/combine-product-analytics-with-subscription-4f1/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f255b88f4-ee06-444f-ae16-73954ad756db_1816x1030.png)

The dashboard tree is a great addition to our current setup, and it is great that we can now cover all metrics in Mixpanel. But we have not achieved anything really new yet. We could have achieved a similar output by loading the Mixpanel data into our data warehouse, combining it with the Stripe data, and then building the dashboard tree in a BI tool.

So, we need to go a step further.

## Do event analytics with product and subscription data

As mentioned earlier, we combine both data sources in Mixpanel to analyze the sequences of accounts/users across product usage and subscription events.

We can start with a simple but powerful funnel. Because now we can actually map the whole customer journey and not just some parts.

![](/images/posts/combine-product-analytics-with-subscription-4f1/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fc86c490f-23b3-4260-9a1f-577e08b81156_1844x816-png.jpg)

Here, we have the first version of the customer journey funnel, starting at board creation and ending (so far) at subscription creation. As we can see, we have a serious monetization problem (luckily, it is just a demo dataset). Over time, we can extend this to include subscription renewals or other user journey stages.

The next powerful insight is a report we can prepare for the marketing team, checking funnel performance based on the initial campaign that brought the user into our app:

![](/images/posts/combine-product-analytics-with-subscription-4f1/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f4bfd6997-4243-40c9-b995-6c9b7949e476_1796x750-png.jpg)

Here, marketing can investigate which initial initiatives drive users into the application and have a higher chance of converting them into subscription accounts.

Since we have subscriptions, we don&apos;t want to lose them easily. Therefore, we want to create something that combines product and subscription data and immediately helps the customer success team.

We create a new cohort:

![](/images/posts/combine-product-analytics-with-subscription-4f1/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f6b0838d5-884a-4287-8fac-d2eed7ec6bf7_1364x800-png.jpg)

This cohort now includes all users with an active subscription but with no essential board activity in the last 60 days. These are our churn risk users. We need to get these to our customer success team:

![](/images/posts/combine-product-analytics-with-subscription-4f1/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fb06e2374-a58b-4c33-bc7d-fab536bde44e_1980x1072-png.jpg)

For this, we use Mixpanel&apos;s cohort sync to send these users&apos; emails to, for example, Braze, where our customer success team can create an outreach email campaign to learn more about what the users are missing.

This brings us to the point where we can now analyze the customer journey and optimize the customer experience in the next step.

What I showed here is just a small glimpse, but the approach is the essence of the new way to do analytics: bring all customer touchpoint data into one place, use analytics to surface new patterns, and then act on those insights to directly enable different communication and experiences.

In the comments, let me know what kind of follow-up questions you would like to get covered in a future post about this topic.

This post is sponsored by Mixpanel. For me, the simple reason is that I use Mixpanel&apos;s Data Warehouse sync in most of my projects. As described above, I want to combine data warehouse data with product data (and I even now sync the product events from the data warehouse), but I like my usual analyst experience. This is the reason why I use this combination so often.

You can learn more here ([https://link.timodechau.com/mixpanel-s](https://link.timodechau.com/mixpanel-s))

![](/images/posts/combine-product-analytics-with-subscription-4f1/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2ff569ecc5-52ed-465b-a38d-cf3d19782cc7_2200x1238-png.jpg)</content:encoded></item><item><title>Combine Product analytics with Subscription data - Part 1</title><link>https://timo.space/blog/combine-product-analytics-with-subscription/</link><guid isPermaLink="true">https://timo.space/blog/combine-product-analytics-with-subscription/</guid><description>When working in product analytics, I had one painful experience. Occasionally, I chatted with the people from BI, and they told me about this magic thing called Data Warehouse.</description><pubDate>Wed, 01 May 2024 00:00:00 GMT</pubDate><content:encoded>When working in product analytics, I had one painful experience.

Occasionally, I chatted with the people from BI, and they told me about this magic thing called Data Warehouse. It sounded like a paradise—the vast amount of information and data from all these systems that were out of reach for me.

&quot;Well, we do a lot of reporting with the subscription data, you know. The management loves these MRR dashboards.&quot;

To be honest, I was deeply jealous of this. I also wanted to create something that management loved. Instead, they only asked me to tell them which features are helpful.

That was one of the reasons why I was always on the lookout for ways to enhance product analytics data. I tried them all: server-side SDKs, proxying webhooks. It got me enriched data, but it was tedious and not really a controllable way to move forward.

My goal was clear: I wanted the Data Warehouse experience for product analytics. So, I went for extreme measures and even switched to the BI side, building modern data stacks. Here, I tried to find ways to do event data modeling (which was already complicated) and finally found a way to do efficient sequence analysis and discovery (something nastily slow with SQL).

It took four years until the light was at the end of the tunnel. I had a formalized way to handle event data modeling (activity schema), and finally, the first product analytics tools in the data warehouse arrived on the scene. And the big solutions like Mixpanel followed with their data warehouse offering. Finally, I could do the cool things the BI people did before.

Obviously, I wanted to start with subscription data (I wanted some love).

## What do we want to achieve?

In this project, we combine behavioral product data of a whiteboard application with the subscription data we have already pulled in our data warehouse using Fivetran (I just picked it because I assume this is the most common integration - but it has issues; more on this later).

![](/images/posts/combine-product-analytics-with-subscription/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fcdc4a102-d10f-4c0b-8795-023323737f5e_2042x1252.png)

The behavioral data is already in Mixpanel. The subscription data is stored as snapshots in our data warehouse. Our task is to bring both of them together in Mixpanel. Why Mixpanel? I&apos;ll explain in a second.

Finally, we want to create our core metrics and investigate the impact of behavior on subscription-related activities like subscriptions created or canceled.

![](/images/posts/combine-product-analytics-with-subscription/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f9b950e61-02aa-4e78-8790-490459b3e5e5_1586x264.png)

## All data in the Data Warehouse or Mixpanel?

This topic alone could be a substack post. But we keep it shorter here for the sake of readability.

First of all, both ways work.

The natural first instinct would be to pull everything into the data warehouse. This is the classic BI-driven modern data stack approach: all data sources come together in one place to be transformed and modeled.

If we were only interested in reporting metrics and some dimensions, this would be no problem. We would use the subscription data to calculate all subscription-relevant metrics and the Mixpanel data to calculate all product-related metrics.

But I want to go a step further. I want to analyze the sequences that include behavioral and subscription events because first, I want to understand better what activities signal a subscription activity (like created or canceled). Second, based on that, I can support the customer success and growth teams with the right audiences for specific communication tests.

And yes, I could do this with SQL as well, but to be honest, I can spend my time far better. This is a typical SQL query for just a simple funnel analysis:

![](/images/posts/combine-product-analytics-with-subscription/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f36a423c0-4f2c-4eca-a4e8-2bfdc8b4574c_950x1420-png.jpg)

[https://medium.com/cube-dev/sql-queries-for-funnel-analysis-35d5e456371d](https://medium.com/cube-dev/sql-queries-for-funnel-analysis-35d5e456371d)

That&apos;s why we combine everything here in Mixpanel since it unlocks more possibilities for analysis. In this way, we can show the MRR metrics and investigate how product behavior influences subscription revenue. In Mixpanel, we have the opportunity to explore funnel, retention, and cohorts to identify the group of users that have a significant impact on MRR growth.

And before we can do this, we need the data to be in event shape. However, not all data comes that way, so we need to eventify our Stripe data first.

## Eventify Stripe data

To be fair, there is a way to get Stripe event data. Stripe offers an event API endpoint where you can get an extensive log of all activities within Stripe:

![](/images/posts/combine-product-analytics-with-subscription/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f62414e49-5cc6-48dc-9d96-180d259dba0b_1502x540.png)

[https://docs.stripe.com/api/events](https://docs.stripe.com/api/events)

But if you use Fivetran to load Stripe data into your data warehouse, you won&apos;t get the event data (if you use Airbyte, you can). So, we picked the Fivetran example here, since it is a good exercise in learning how to transform snapshot data into event data, a process I call Eventify and have described here:

[Eventify everything - Data modeling for event data](https://hipster-data-show.ghost.io/eventify-everything-data-modeling/) — I started out with data modeling by supporting classic e-commerce marketing use cases. On a high level, this was all pretty straightforward. We usually had one core event - the order. And we build everything around that: sessions and marketing attribution. So we could end up with 1-2 fact tables just for the orders and some more dimension tables.

In a nutshell, we look for timestamps and identifiers. Everything that has a timestamp is a candidate for an event. So, this can be a created\_at column in the production database table combined with a user and/or customer ID.
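As a toy illustration of that rule: any row that has a timestamp column and an identifier column can be reshaped into an event (all column names here are hypothetical):

```python
def eventify(rows, event_name, ts_col, id_col):
    """Turn snapshot-style rows into events: anything with a timestamp
    and an identifier is an event candidate."""
    events = []
    for row in rows:
        if not row.get(ts_col) or not row.get(id_col):
            continue  # no timestamp or identifier, so no event
        events.append({
            "event": event_name,
            "timestamp": row[ts_col],
            "customer_id": row[id_col],
            # everything else becomes event properties
            "properties": {k: v for k, v in row.items() if k not in (ts_col, id_col)},
        })
    return events

rows = [{"created_at": "2024-01-03", "customer": "cus_123", "plan": "pro"}]
events = eventify(rows, "subscription created", "created_at", "customer")
print(events[0]["event"], events[0]["customer_id"])  # subscription created cus_123
```

In practice this reshaping happens in SQL in the warehouse, but the shape of the output is the same: event name, timestamp, identifier, properties.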

So what do we get? The starting point for all of this is naturally the Fivetran docs about the Stripe integration:

[https://fivetran.com/docs/connectors/applications/stripe](https://fivetran.com/docs/connectors/applications/stripe)

From there, I want to check the schema:

![](/images/posts/combine-product-analytics-with-subscription/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f90b905c7-41a3-40f4-bd52-d9892749965c_1946x1124.png)

[https://docs.google.com/presentation/d/1zyxgbaOjgBt3NsY0OfsiGsWDIefcBc-R1lHWlMltCYU/edit#slide=id.g25ac7999fa1\_0\_0](https://docs.google.com/presentation/d/1zyxgbaOjgBt3NsY0OfsiGsWDIefcBc-R1lHWlMltCYU/edit#slide=id.g25ac7999fa1_0_0)

This helps me to look for potential event candidates.

Fivetran, for example, uses history tables for some integrations. History tables are a great start to finding event data since they materialize the activities&apos; results. Here we have a subscription history table:

![](/images/posts/combine-product-analytics-with-subscription/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f38785097-d537-436b-81dc-a9cab77c789f_614x834.png)

So, we keep this in mind for our event sourcing.

Invoices are another good candidate since they are usually created when a subscription is renewed, which leads us to another event.

In the end, we look for occurrences of timestamps and identifiers.

![](/images/posts/combine-product-analytics-with-subscription/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f41ede312-f215-4857-9691-22e0c3075e16_1822x466.png)

### Let&apos;s start eventifying

First **&quot;subscription created&quot;** - the essential event we can use to calculate the New MRR metric.

We have two places to get it: the subscription history and the invoices table. Both should hold the information about when a subscription has been created. I pick subscription history since it is closer to the object.

Looking at the subscription history table, I can find the \`start\_date\` and \`created\` fields. That looks good. Now, we need to understand the difference. Luckily, Stripe has one of the best-documented APIs around:

![](/images/posts/combine-product-analytics-with-subscription/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fb12d7301-95c4-4d76-8556-efd5c86b626f_992x162-png.jpg)

![](/images/posts/combine-product-analytics-with-subscription/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2faf00b85d-5d1b-4ed0-80a6-a5ad0b15629b_1056x158.png)

It comes down to how we have handled subscriptions so far. We could not use created if we had backfilled subscriptions when we migrated to Stripe. In general, start\_date looks like the better pick.

Now we need an identifier, and the best candidate is naturally \`customer\`. But this is Stripe&apos;s customer ID, which is likely not what you use in your product. So, you might need to join/map it with your application account/user ID.

To create this event, our query would look like this:

![](/images/posts/combine-product-analytics-with-subscription/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f354b3cad-e5e8-417d-8b55-58a15454a958_1068x378-png.jpg)

Here, we get the information from the history table that is relevant for us, and we build a row number rank because we are just interested in the first instances of all subscriptions.

![](/images/posts/combine-product-analytics-with-subscription/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f7cc36425-80c4-44be-95c2-fe74a7a0408c_1210x454-png.jpg)

In the next step, we get more context information by joining subscription\_item and plan tables. We can use this context later to calculate the MRR and break it down according to the different plans. Here, we also make sure to pick only the first item.

![](/images/posts/combine-product-analytics-with-subscription/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fc8c513dc-2062-460e-9e7c-becda4df63d0_1908x378-png.jpg)

In this final step, we bring it all into the ActivitySchema format. This helps us model all events in the same way and easily combine them in one table if needed.
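The row-number rank from the steps above can be paraphrased outside of SQL as well: partition the history rows by subscription, order by start date, and keep only the first row per subscription (the schema here is an assumption based on the Fivetran tables described above):

```python
from itertools import groupby
from operator import itemgetter

def first_instances(history_rows):
    """Keep only the earliest history row per subscription, mirroring
    ROW_NUMBER() OVER (PARTITION BY id ORDER BY start_date) = 1."""
    rows = sorted(history_rows, key=itemgetter("id", "start_date"))
    return [next(group) for _, group in groupby(rows, key=itemgetter("id"))]

history = [
    {"id": "sub_1", "start_date": "2024-01-01", "status": "active"},
    {"id": "sub_1", "start_date": "2024-02-01", "status": "active"},
    {"id": "sub_2", "start_date": "2024-03-15", "status": "active"},
]
print([r["start_date"] for r in first_instances(history)])  # ['2024-01-01', '2024-03-15']
```

Each surviving row then becomes one "subscription created" event.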

**\`Subscription renewed\`** should be triggered whenever the subscription cycle ends and the subscription is (automatically) renewed.

It might also be possible to get this information from the subscription history table, since the current period start and end are noted there and a new entry is created with each change. But here I went for the invoice table, assuming every subscription renewal creates a new invoice. This needs to be checked against your data; I look at different samples to get a good idea of the data and then decide which table to use.

The model looks like this:

![](/images/posts/combine-product-analytics-with-subscription/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f842cf609-6c0c-423b-a34d-ad39327dc02a_1344x422-png.jpg)

Again, we build a row\_number to exclude the first invoice generated when the subscription was created.

Here, we assume that all invoices are related to the subscription. This can be different in your setup.

![](/images/posts/combine-product-analytics-with-subscription/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2ff736010a-14a0-4a81-957b-77db01bf22c1_714x356-png.jpg)

The final step is again to bring everything into ActivitySchema.
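The invoice logic can be paraphrased in plain code: sort the invoices per subscription and drop the first one, since that one belongs to the creation event; every later invoice is a renewal (the schema is an assumption):

```python
from collections import defaultdict

def renewal_events(invoices):
    """Drop the first invoice per subscription (that one belongs to
    'subscription created'); every later invoice is a renewal."""
    by_sub = defaultdict(list)
    for inv in sorted(invoices, key=lambda i: i["created"]):
        by_sub[inv["subscription"]].append(inv)
    return [inv for invs in by_sub.values() for inv in invs[1:]]

invoices = [
    {"subscription": "sub_1", "created": "2024-01-01"},
    {"subscription": "sub_1", "created": "2024-02-01"},
    {"subscription": "sub_1", "created": "2024-03-01"},
]
print(len(renewal_events(invoices)))  # 2
```

As noted above, this rests on the assumption that every invoice belongs to a subscription; check that against your own Stripe setup first.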

You can find the entire repository with the data model here: [https://github.com/deepskydata/demo\_stripe\_activities](https://github.com/deepskydata/demo_stripe_activities)

With our first events created, it&apos;s time to combine the data with our product data in Mixpanel.

We will continue this in Part 2 (I know, brutal cliffhanger).

This post is sponsored by Mixpanel. For me, the simple reason is that I use Mixpanel&apos;s Data Warehouse sync in most of my projects. As described above, I want to combine data warehouse data with product data (and I even now sync the product events from the data warehouse), but I like my usual analyst experience. This is the reason why I use this combination so often.

You can learn more here ([https://link.timodechau.com/mixpanel-s](https://link.timodechau.com/mixpanel-s))

![](/images/posts/combine-product-analytics-with-subscription/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2ff569ecc5-52ed-465b-a38d-cf3d19782cc7_2200x1238-png-1.jpg)

Thanks for reading timo&apos;s substack! Subscribe for free to receive new posts and support my work.</content:encoded></item><item><title>How to measure a data platform?</title><link>https://timo.space/blog/how-to-measure-a-data-platform/</link><guid isPermaLink="true">https://timo.space/blog/how-to-measure-a-data-platform/</guid><description>I have to be honest. You can easily trigger me. Just write something: &quot;How would you measure a \[add your business model\] from a product perspective?&quot; - this immediately sets my brain in motion,…</description><pubDate>Tue, 05 Mar 2024 00:00:00 GMT</pubDate><content:encoded>I have to be honest. You can easily trigger me.

Just write something like: &quot;How would you measure a \[add your business model\] from a product perspective?&quot; - this immediately sets my brain in motion, and I usually can&apos;t stop it. It starts thinking about metrics, potential north stars, and core activities. It creates graphs; it creates schemas. It is unstoppable. So don&apos;t trigger me (or please do).

[Richa Verma](https://www.linkedin.com/in/richa-verma-berkeley/) did not trigger me directly or intentionally, but she wrote a really good piece about how to do Product analytics for a data platform.

[Product Analytics for Platform Products](https://thedataproductmanager.substack.com/p/product-analytics-for-platform-products) — Hey Data PMs! Welcome back to another edition of the Data PM Gazette. Things have been busy at work — hence the late post. However, this edition is special, it combines and talks about two different things that I like: product and analytics. I got into product because I love doing analytics and telling stories from data that otherwise wasn’t obvious or p…

She had my attention immediately. And her post shows a good way to measure a data platform from a technical perspective. There are a bunch of things you can take away.

But it left me wondering - what would it look like from a classic product perspective focusing more on the user and business value?

And here, my brain kicked in.

I read the post while waiting for our flight at the airport with my family. We had a 90-minute wait, and obviously, there was not much to do. The kids were busy with kids’ things, so I had some headspace to let my mind wander and explore how I would approach this question.

The first thing that came up was what a metric tree or map would look like.

Thanks for reading timo&apos;s substack! Subscribe for free to receive new posts and support my work.

## The metric tree for a data platform

![](/images/posts/how-to-measure-a-data-platform/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2ff7c69cfe-d97c-43b0-a433-ffe81b8a76d4_2084x1364.png)

[Link to the Miro Board](https://miro.com/app/board/uXjVNk6DF9w=/?moveToWidget=3458764581307710132&amp;amp;cot=14)

Data platforms have slightly different business models. But we often see consumption-based pricing, usually based on the job runtimes, and subscription-based pricing, which could include several free run seconds. Quite frequently, they are even combined.

Even when you offer a consumption-based model, tracking a free subscription, aka active account, can also make sense since it is a good baseline for account engagement.

I don&apos;t explicitly handle free job runs in my initial metric tree. Offering free runs or seconds for runs before you get charged is pretty common. As a first implementation, I would cover the free tier as dimensional values. So, a job executed can have a billable dimension to indicate if it was billable or free. But a v2 could also explicitly map the free-run metrics.

Let&apos;s break it down.

I decided on Total Revenue / Month as the primary output metric. It is monetary and indicates how the business is developing over time. We could extend that in the next version to introduce a profit metric by incorporating costs (platform, marketing, sales costs). But we want to start simple.

![](/images/posts/how-to-measure-a-data-platform/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f8f2138bc-5a78-4e37-aa8d-9999a378b053_774x240.png)

We go into two tracks from the total revenue, one for the compute (consumption) and the second for the subscription. Revenue from both defines the total revenue.

Let&apos;s look at the compute first since published metric trees often do not cover this.

The compute revenue is built from the total compute seconds and the average price per second. The avg. price per second is calculated by dividing the total compute revenue by the total compute seconds in this month. The revenue for each compute run can be added as a dimensional value and then used for sum aggregation.

![](/images/posts/how-to-measure-a-data-platform/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f9ca34e45-a338-457a-b159-4d74ee7cadb9_1096x394.png)

The total compute seconds are calculated from the total job runs and the average seconds per run. As before, the avg. seconds are calculated by dividing the total seconds by the number of runs, with the seconds provided as a dimension on each run.
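As a sanity check, both decompositions can be verified with a small calculation (all names and numbers are illustrative, with revenue kept in cents):

```python
# Illustrative per-run records: each job run carries its seconds and its
# revenue (in cents) as dimensional values. All numbers are made up.
runs = [
    {"seconds": 120, "revenue_cents": 60},
    {"seconds": 300, "revenue_cents": 150},
    {"seconds": 180, "revenue_cents": 90},
]

total_job_runs = len(runs)
total_compute_seconds = sum(r["seconds"] for r in runs)
total_compute_revenue = sum(r["revenue_cents"] for r in runs)

# Derived averages, as described in the tree:
avg_seconds_per_run = total_compute_seconds / total_job_runs
avg_price_per_second = total_compute_revenue / total_compute_seconds

# The decompositions hold by construction:
assert total_compute_revenue == total_compute_seconds * avg_price_per_second
assert total_compute_seconds == total_job_runs * avg_seconds_per_run
print(total_compute_seconds, avg_price_per_second)  # 600 0.5
```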

![](/images/posts/how-to-measure-a-data-platform/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fc67fa615-a78f-4cfa-9d7d-28c999805638_1092x162.png)

On the next level, we look at scheduled and triggered job runs. This is already a dimensional breakdown, which we could also offer as a filter option in our BI tool. But I like to break it out here already, since this information helps product and customer success enable more scheduled runs, which have a more positive impact on the compute seconds and are also a good indicator of received value and trust in the platform.

![](/images/posts/how-to-measure-a-data-platform/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fed89b67b-c7a1-4ff4-9807-7fac7d1db7da_1156x194.png)

Below, we look at the job development during this month. And again, we are explicit about scheduled and unscheduled here as well. Scheduled jobs are something we are aiming for.

The pattern here should be familiar if you have worked with subscription metrics. We want to show how the total at the end of the month came to be. So we check how many scheduled jobs have been newly created, how many have been unscheduled, how many have been deleted, and how many were retained - the ones we had at the start of the month and kept.
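A minimal sketch of this bridge, assuming we already have the scheduling events and the state at the start of the month (all names are made up):

```python
# Illustrative scheduling events for one month; the starting state is
# carried over from the previous month. All names are made up.
events = [
    {"job_id": "j1", "type": "job scheduled"},
    {"job_id": "j2", "type": "job scheduled"},
    {"job_id": "j3", "type": "job unscheduled"},
    {"job_id": "j4", "type": "job deleted"},
]
scheduled_at_month_start = {"j3", "j4", "j5"}

new = {e["job_id"] for e in events if e["type"] == "job scheduled"}
unscheduled = {e["job_id"] for e in events if e["type"] == "job unscheduled"}
deleted = {e["job_id"] for e in events if e["type"] == "job deleted"}

# Retained = scheduled at the start of the month and still scheduled at the end.
retained = scheduled_at_month_start - unscheduled - deleted
total_end_of_month = retained | new

print(sorted(retained), sorted(total_end_of_month))  # ['j5'] ['j1', 'j2', 'j5']
```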

![](/images/posts/how-to-measure-a-data-platform/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fb4ca4974-6806-4894-8be5-bd5277b73f15_1146x400.png)

Additionally, we can do the same for unscheduled jobs. This depends on how many there are. If they are just a side aspect, I would not add this breakdown now.

At the bottom of the first version, we have the account block. Here, we check for new, churned, and retained accounts. This is the essential baseline for our tree, from which we could also go deeper by linking marketing metrics to generate new accounts.

![](/images/posts/how-to-measure-a-data-platform/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fd433ed5d-961b-4602-81d1-61dfd75d5efb_1620x328.png)

The subscription part is quite familiar; the details depend on the subscription model. We have the classic MRR bridge with total MRR as output metrics. Then, new MRR, expansion MRR, contraction MRR, retained MRR, and churned MRR are related metrics. These are then connected to the account metrics since we need an account to create a subscription.

![](/images/posts/how-to-measure-a-data-platform/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f60d7b09e-1ee4-48ca-8b8e-63709537e717_1510x548.png)

As written before, this is a v1. This tree or graph can quickly expand to include the specific work of all the different teams, such as marketing, customer success, development, and product.

But this early version already enables us to understand our revenue mechanics better. It especially gives us a view into the compute part which is (most likely) the main driver of our growth. Getting as many details here as possible is important since it is also a tricky growth driver. Indeed, the more compute people use on our platform, the more revenue we make.

But we also increase the incentive for cost savings. At a specific threshold, people will become cost-aware since our platform might now be one of the most significant cost items in their budget. The product team&apos;s job is to ensure optimization functions are in place to manage costs. Therefore, we must develop metric indicators informing us about these risks.

The first simple step is the block about failed compute seconds, based on failed jobs and the average seconds. This is an excellent first indicator of an area where our product does not provide value. The failure may not be our fault, but it is a good opportunity to intervene and support from the product side (for example, pausing failed jobs after the Xth run).

![](/images/posts/how-to-measure-a-data-platform/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2ffe1e2cc3-9b59-4513-84c9-5ed365b5694f_1004x352.png)

This first metric tree gives us a good setup to include more data in our decision-making. But before we can do this, we need to get the data for the metrics.

In the next version (v1.2), I would definitely add the cost information. The costs of job runs and, therefore, the margin are the main output metrics.

## From metrics to activities

There are naturally different ways to get data for metrics. The classic way would be to search source tables or existing metrics tables for metrics that are already covered.

But I like to show an alternative way here. Not a big surprise to you if you have read other posts by me; I will do this by using event data. We will pick each metric and break down which entity + activity = event we would need to measure the metric and also look a tiny bit into which properties, aka dimensions, could also be helpful for us.

[Link to the event schema.](https://miro.com/app/board/uXjVNk6DF9w=/?moveToWidget=3458764581310444459&amp;amp;cot=14)

Let&apos;s start at the bottom:

### Account metrics

For the account metrics, we need &quot;account created&quot; to get new accounts and &quot;account deleted&quot; to get churned accounts (though we might come up with a different definition of churned - for example, marking accounts inactive after a period of inactivity). Retained accounts are calculated as total new minus total deleted accounts compared to the period before.

We should add good properties, like industry, account\_size (number of jobs), and account\_age.
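A rough sketch of how the account metrics fall out of these two events (the field names and the month-as-timestamp simplification are mine, not a prescribed schema):

```python
# Illustrative "account created" / "account deleted" events with some of the
# suggested properties; the month field stands in for the event timestamp.
events = [
    {"event": "account created", "account_id": "a1", "month": "2024-02", "industry": "saas"},
    {"event": "account created", "account_id": "a2", "month": "2024-02", "industry": "retail"},
    {"event": "account created", "account_id": "a3", "month": "2024-03", "industry": "saas"},
    {"event": "account deleted", "account_id": "a2", "month": "2024-03"},
]

def account_metrics(month, active_before):
    new = {e["account_id"] for e in events
           if e["event"] == "account created" and e["month"] == month}
    churned = {e["account_id"] for e in events
               if e["event"] == "account deleted" and e["month"] == month}
    retained = active_before - churned
    return new, churned, retained

feb_new, _, _ = account_metrics("2024-02", set())
new, churned, retained = account_metrics("2024-03", feb_new)
print(len(new), len(churned), sorted(retained))  # 1 1 ['a1']
```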

![](/images/posts/how-to-measure-a-data-platform/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f65259873-dff5-4102-a0aa-0424d51ac07e_814x468-png.jpg)

### Job metrics

We need &quot;job created&quot; for new jobs and &quot;job deleted&quot;. Additionally, we need &quot;job scheduled&quot; and &quot;job unscheduled&quot;—the retained metric we can calculate based on these.

Now we need &quot;job started&quot;, &quot;job finished&quot; and &quot;job failed&quot;. We could also have just &quot;job finished&quot; and handle success and failure in the properties, but since failure is so important to us, I want &quot;job failed&quot; as a separate activity.

As properties, we need job type (especially when you have different jobs, like loading, transforming, and storing) and potentially more details. So, if you have connectors, we would add job\_connector.

But the most crucial property is job\_runtime, which we need to calculate the seconds used. We also need the costs of a job for our clients: job\_costs. This might be something we enrich later, since these calculations are often batch jobs (to incorporate those crazy discounts that sales teams love to negotiate). To enable a later enrichment, we need &quot;job\_id&quot;.
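A minimal sketch of that later enrichment via job\_id (all names and numbers are hypothetical):

```python
# A hypothetical "job finished" event, tracked before the costs are known.
job_events = [
    {"event": "job finished", "job_id": "run_42", "job_type": "loading",
     "job_connector": "hubspot", "job_runtime": 185, "job_costs": None},
]

# The costs arrive later from a batch job (after discounts are applied);
# the job_id lets us join them back onto the historical event.
cost_batch = {"run_42": 0.93}

for e in job_events:
    if e["job_costs"] is None and e["job_id"] in cost_batch:
        e["job_costs"] = cost_batch[e["job_id"]]

print(job_events[0]["job_costs"])  # 0.93
```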

![](/images/posts/how-to-measure-a-data-platform/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f7f939d7b-1370-4e02-a8fc-6e06edbcf1d8_1332x464.png)

### Subscription metrics

We need &quot;subscription created&quot;, &quot;subscription renewed&quot;, &quot;subscription cancelled&quot;, &quot;subscription expanded&quot;, &quot;subscription contracted&quot; and &quot;subscription ended&quot;.

With these, we can calculate all MRR metrics.

But to get the MRR, we need at least a subscription\_mrr property. This might be similar to the job costs in that the information is not present when you send the event. So you can enrich it later.

![](/images/posts/how-to-measure-a-data-platform/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fae2fabe5-27e3-4835-8352-7fe83c82efbb_1538x478-png.jpg)

But do you notice something? We defined only 15 events we need to implement. That sounds doable. With these, we can calculate all the metrics we defined before. We can get from no metric tree to a v1 metric tree in just 1-2 weeks.

## Analyzing customer lifetime

The metrics are great for seeing how our business is developing, where our current growth is blocked, and where we need new initiatives to unlock it.

But I would like to add an additional layer (which, in the end, can also become part of the metrics layer). The angle is slightly different, though.

We are looking at the customer journey. And such a journey, when we are honest, is never linear like a funnel. It is a journey where customers can be in specific states or, translated to analytics, in particular segments. They can move out of these segments after some time, so they often switch their state during their journey. And they can even be in multiple states at the same time (like &quot;have a subscription&quot; and &quot;power account&quot;).

Here are some initial states:

**Account created (last 30 days)** - the state that kicks everything off and is our new accounts baseline (it is a dynamic window, so it can match with the &quot;new account&quot; metric when we look at the static window of one month).

**First Job created (7 days after account creation)** - we assume that a new account will add a first job if we do a good job. So we want to move people from Account created &amp;gt; First Job created and measure the success of this (something that can then go into the metric tree).

**First Job scheduled (14 days after account creation)** - it might take a bit longer for an account to have a successful job and then schedule it to run constantly.

![](/images/posts/how-to-measure-a-data-platform/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fae995d1f-0911-4ecf-803b-45ae25176441_1800x252.png)

**Have 5+ scheduled jobs running successfully (for l30d)** - this can already become our &quot;returning value&quot; state. People who successfully run many scheduled jobs get value from the platform.

**Did not add new jobs (last 30 days)** - an interesting segment, especially for customer success communication. This might be totally normal (people are just happy with what they have), but it can also be an indicator that they lost interest or track.

**Unscheduled 30% of their scheduled jobs (l30d)** - also very interesting for customer success. Again, this can be normal but an indicator of a churn process.

![](/images/posts/how-to-measure-a-data-platform/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f64c5e5e6-e17b-4e15-bc8a-3ceab8174cf6_1132x540.png)
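A sketch of how such overlapping states could be assigned; the thresholds and field names are illustrative, following the states above, and an account can hold several states at once:

```python
from datetime import date

TODAY = date(2024, 3, 31)  # illustrative reference date

def states(account):
    """Assign all matching states; an account can hold several at once.
    Field names and thresholds are illustrative, following the states above."""
    s = set()
    age_days = (TODAY - account["created_at"]).days
    if 30 >= age_days:
        s.add("account created (l30d)")
    if account.get("first_job_at") and \
            7 >= (account["first_job_at"] - account["created_at"]).days:
        s.add("first job created (7d)")
    if account["scheduled_jobs_ok_l30d"] >= 5:
        s.add("returning value")
    if account["jobs_added_l30d"] == 0:
        s.add("no new jobs (l30d)")
    if account["scheduled_at_start"] and \
            account["unscheduled_l30d"] / account["scheduled_at_start"] >= 0.3:
        s.add("unscheduled 30% of jobs (l30d)")
    return s

acct = {"created_at": date(2024, 1, 10), "first_job_at": date(2024, 1, 12),
        "scheduled_jobs_ok_l30d": 6, "jobs_added_l30d": 0,
        "scheduled_at_start": 10, "unscheduled_l30d": 1}
print(sorted(states(acct)))
# ['first job created (7d)', 'no new jobs (l30d)', 'returning value']
```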

## What about measuring features?

Feature analytics is extremely powerful and an essential step for product work.

But we already have a really good set of events that describes the features well. And they are pretty high-level, which is the best way to start with feature definition (instead of being too granular).

Let&apos;s look at jobs. This is pretty high-level since, depending on the platform, a job can be extracting and loading source data, transforming, and reverse-loading it. It might be tempting to define each of these as separate features. But as a first step, we can use Job here since, in the end, all of them will define a job.

For the job, we already have all the essential events to define a typical job lifetime.

The same goes for accounts and subscriptions.

More important is that the product team looks at feature improvements from a data angle. So, let&apos;s take an example. The product team may work on ten new connectors to extract and load data. They interviewed and surveyed the users; these ten sources were the most requested.

With the roll-out, the product team has to decide how they will measure the improvement. The obvious choice would be new jobs with these connectors (the connector would be a property value and can be used as a dimension for filtering), scheduled jobs, and compute seconds with these connectors. All of this is already covered by the existing events and metrics. So, the new connectors can be rolled out without new events and metrics.

It might be that a new feature improvement will surface metrics and events that are missing. Then, it is a good time to review and extend the existing setup.

Thanks for reading timo&apos;s substack! Subscribe for free to receive new posts and support my work.

The described product analytics setup is powerful and relatively easy to implement. We need to track 15 events, which can be sourced from existing backend jobs or queues.

Therefore, we can have a good setup in 2-3 weeks and a baseline with the metrics tree for planning and reviewing initiatives.

We also get the first set of valuable segments that the growth and customer success teams can use to run targeted communications.

Feel free to let me know if you have any questions or things you think I missed.

![](/images/posts/how-to-measure-a-data-platform/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fd1c65aae-09f3-4328-ac2c-329f3382ae1a_1080x1080.png)</content:encoded></item><item><title>Eventify everything - Data modeling for event data</title><link>https://timo.space/blog/eventify-everything-data-modeling/</link><guid isPermaLink="true">https://timo.space/blog/eventify-everything-data-modeling/</guid><description>I started out with data modeling by supporting classic e-commerce marketing use cases. On a high level, this was all pretty straightforward. We usually had one core event - the order.</description><pubDate>Tue, 06 Feb 2024 00:00:00 GMT</pubDate><content:encoded>I started out with data modeling by supporting classic e-commerce marketing use cases. On a high level, this was all pretty straightforward. We usually had one core event - the order. And we built everything around that: sessions and marketing attribution. So we could end up with 1-2 fact tables just for the orders and some more dimension tables.

There were two tricky things here:

-   Handle all the different kinds of source data so we can be sure not to limit ourselves in the future (and make sure we cover these slowly changing dimensions)
-   Make sure we can support all the different kinds of analysis use cases that the analysts were dreaming of

All this naturally ended in quite complex models, which today would translate into an 80-150 model dbt setup.

We managed it and provided the essential campaign reporting dashboards the marketing teams were eager to get.

All this became much more complicated when I tried to help my dearest friends: the product teams.

The source data for them is significantly different.

To improve user flows and progression, you need behavioral data: event data that can be analyzed as a sequence based on an identifier like a user id.

The same kind of data today is super useful for growth teams and sales teams to understand an account&apos;s progress and potential for renewals and upgrades.

But how do I model these? Put them in 40 different fact tables with 40 dimension tables? No, that does not scale.

It took me years and plenty of conversations to develop an approach to event data modeling, which I can write down.

The approach now works really well for me in five setups, but like all approaches, it lives and evolves. This is the 2024 version of it.

## The layers of Event Data Modeling

On an abstract level, we can understand the process of data modeling as a journey through different layers. Each layer serves a specific purpose (scalable storage, applying business logic, enabling reporting or applications), making a data model easier to grasp and extend.

Rogier Werschkull described his approach to layers in this LinkedIn post, which I have bookmarked and would like to share with other people when they ask.

![](/images/posts/eventify-everything-data-modeling/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f95a3fc2c-f9cc-4b95-a1fc-92fb8dd81ec9_1092x552-png.jpg)

[https://www.linkedin.com/posts/rogierwerschkull\_datamodeling-analytics-physical-activity-6991149277917892610-PCo4](https://www.linkedin.com/posts/rogierwerschkull_datamodeling-analytics-physical-activity-6991149277917892610-PCo4?utm_source=share&amp;amp;utm_medium=member_desktop)

He breaks it down like this:

-   Layer 1: Raw source
-   Layer 2: Read layer
-   Layer 3: Ensemble-based data modeling - simplifying future data integration
-   Layer 4: Dimensional modeling - for user-friendly presentation layer
-   Layer 5: Analytical consumption

These layers show how you go from source data to a consumable version of the data. All layers have their own purpose and potential complexity. They help to modularize your data model.

Therefore, I use a similar layer approach for event data.

### Disclaimer: But we already have a data model

First of all, that is excellent. Congratulations - this is a perfect foundation.

Every event data model that I have built so far has always been built on top of the existing model. You can describe it as a plugin, addon, or application. I like the application layer terminology.

It&apos;s basically similar to the Layer 5 mentioned above or the presentation layer if you use this term. At this point, the data is finally ready for consumption.

So, when we look at the layers in the next step, understand them as sub-layers of this specific application.

Ultimately, the event data model does not care about the data model you use in general. It works fine on top of each of them. The first important part is where we can find the events.

Therefore, let&apos;s move on to our first sub-layer.

Thanks for reading timo&apos;s substack! Subscribe for free to receive new posts and support my work.

### Sub-Layer 1: Raw event data

All raw event data is stored in its original form in this layer. The original form is mostly determined by the source and by the approach used to load the event data into your data stack.

It&apos;s optional whether you physically build this layer out. Sometimes, it might be enough to reference the original event source. I like to build it out because it gives me a point of entry and more control over the data. But it comes with additional compute and storage costs (depending on the materialization strategy).

#### Event pipelines

![](/images/posts/eventify-everything-data-modeling/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f2ad78b5a-f940-4fe1-83fd-403d861e3e1e_1404x788.png)

The most common form is data from event pipelines. These can be specific analytical event data pipelines like Snowplow, Segment, Rudderstack, or even Google Analytics 4 (not a recommendation for event data pipelines, but it is often available), and general event pipelines like streams (Kafka,..).

These event data pipelines get events ingested using either SDKs (wrappers around an API) or an API endpoint directly. They are built for high-volume ingestions, transport, and loading into a source system.

Most of these pipelines have qualification steps in between to enrich, filter, or qualify event data before it loads in the source system.

They also take care of the loading itself, and here, it is mostly about schema changes. Schema changes in event data are pretty common since you often extend the tracked context and, therefore, change the schema. But when you use the out-of-the-box pipelines, you are covered.

They all solve schema evolution in different ways. One approach is reducing complexity by loading each event type in one table (Segment or Rudderstack). Or by using a powerful schema evolution process and defined schemas (Snowplow). Or by living an engineer&apos;s wet dream and showing off by using proprietary database functions like nesting (Google Analytics).

#### Eventification

![](/images/posts/eventify-everything-data-modeling/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f202f987c-1f34-4b1f-bf14-6161ed18bbf3_1332x756-png.jpg)

Now it is getting interesting.

We put on our Indiana Jones hats and go hunting for event artifacts in the various tables.

Here is what we are looking for:

-   Timestamps
-   Unique identifiers

Let&apos;s pick a schema of some Fivetran data that is loaded into our warehouse. We can use the Hubspot one, since this is often our marketing team’s favorite - here is the marketing data model:

![](/images/posts/eventify-everything-data-modeling/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fc184fdc2-a2a5-4f2d-81d2-0f6b5809137b_2428x1474-png.jpg)

Contact would be an interesting dataset for us since we can construct events like &quot;contact created&quot;. The contact table does not have a created\_at field (which would be our usual candidate). But Fivetran is giving us a history table. History tables are a great source for derived event data.

In the contact\_property\_history table, we can look for the first instance for the contact\_id and can use this as the &quot;contact created&quot; event.

It is also worth scanning through the history table to check for specific property changes that could indicate other events.
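A sketch of this derivation (using SQLite for illustration; the column names are simplified, not Fivetran&apos;s exact schema):

```python
import sqlite3

# Simplified stand-in for the Fivetran contact_property_history table;
# the column names are illustrative, not the exact Fivetran schema.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE contact_property_history
               (contact_id TEXT, property TEXT, value TEXT, ts TEXT)""")
con.executemany("INSERT INTO contact_property_history VALUES (?, ?, ?, ?)", [
    ("c1", "email", "a@example.com", "2024-01-03T10:00:00"),
    ("c1", "lifecycle_stage", "lead", "2024-02-01T09:00:00"),
    ("c2", "email", "b@example.com", "2024-01-20T12:00:00"),
])

# The first history entry per contact_id becomes the "contact created" event.
created = con.execute("""
    SELECT contact_id, MIN(ts) AS event_ts, 'contact created' AS event
    FROM contact_property_history
    GROUP BY contact_id
    ORDER BY contact_id
""").fetchall()

print(created)
# [('c1', '2024-01-03T10:00:00', 'contact created'),
#  ('c2', '2024-01-20T12:00:00', 'contact created')]
```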

The email event table, on the other hand, makes it even easier for us to derive events directly.

One important exercise: you need to make sure to find the best identifier. In the Hubspot model, it is most likely the contact\_id, but when we want to join this data with other data (behavioral data, for example), we need an id that is present in both sources. It is usually good practice to also add an account\_id or user\_id to Hubspot. If you have done that, you should use this id as the unique identifier.

When we look at the Sales &amp;amp; CRM Model:

![](/images/posts/eventify-everything-data-modeling/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f7716eb33-70d3-4391-b3b5-2cac54fb75d3_2694x1516-png.jpg)

The Deals table is definitely interesting. It can be mapped back to a contact and gives us insights into the contact and deal journey. We again have the deal property history table, which can give us the &quot;deal created&quot; event and other potential candidates.

But Deal\_Pipeline\_Stages is even more interesting since we can get the defined pipeline steps and model the stages from there. How to work with this table is described in the Fivetran Hubspot Documentation: [https://fivetran.com/docs/applications/hubspot#dealstagecalculations](https://fivetran.com/docs/applications/hubspot#dealstagecalculations)

With some initial work, we can already surface: &quot;contact created,&quot; &quot;deal created,&quot; and &quot;deal phase finished&quot; with a unique identifier and valuable properties (like deal\_value) as new events.

Again, this kind of work is like a treasure hunt; I have to say, this is usually the part of the project I enjoy most. Getting event data without any tracking implementation is still pure joy.

#### Webhooks

Another often overlooked way is to consume webhooks. Any webhook request is, by definition, already an event, and most 3rd-party tools support webhooks, often for a wide range of activities.

To receive and store webhook data, you often just need a very simple service that accepts any event and writes it to your data warehouse. For example, I am using a very simple Flask app that runs on Cloud Run, and I am using dlt to free me from all these nasty schema changes.
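My setup uses Flask plus dlt, but the handler logic itself is tiny; here is a framework-agnostic sketch (the list is just a stand-in for the warehouse write, and the function name is mine):

```python
import json
from datetime import datetime, timezone

# A list standing in for the raw-events table in the warehouse.
raw_events = []

def handle_webhook(body: bytes) -> dict:
    """Accept any webhook payload, stamp it, and append it to the raw store.
    Schema handling (what dlt does for me) happens downstream."""
    payload = json.loads(body)
    raw_events.append({
        "received_at": datetime.now(timezone.utc).isoformat(),
        "payload": payload,  # keep the raw payload untouched
    })
    return {"status": "ok"}

resp = handle_webhook(b'{"event": "contact update", "status": "create"}')
print(resp, raw_events[0]["payload"]["event"])
```

The point is to accept everything and defer interpretation: the raw layer stays a faithful record, and qualification happens in the later sub-layers.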

With these three ways, you can already surface plenty of essential events and activities that define user and customer journeys. The best thing about it is that all these events are of better quality than anything that comes from a frontend SDK.

As written before, you can introduce a physical layer where you bring all these different events together in one place and then go to the next sub-layer. Or you simply reference the origin from the next sub-layer. I prefer the first option, but pick the second one if you get headaches about computing or storage costs.

### Sub-Layer 2: Qualify events to activities

This layer is not necessary, but it can help a lot. The layer aims to decide which events will be prepared for later use. This can be a simple selection of events or just renaming, but it can also include more complex operations like filtering, merging, rule-based selection, or calculations of events.

With these operations, this layer can be very important for your setup. Here, you can ensure you deliver a superb data user experience. By ensuring that naming conventions are used, bad data is filtered out, and too granular events are merged or held back, you can design the later data user experience before the data goes into analytics tools like Mixpanel or Amplitude.

After the qualification, I like to call the events activities, to make a clear separation between the two. I like the analogy of iron and steel: both have the same core material, but steel has been enhanced for later usage.

You can use different approaches to achieve that, but I like to use the [ActivitySchema](https://www.activityschema.com) model for this since it has a very clear and simple structure. Since I use it for all setups, it is very easy for me to work in these different setups since all things are in the same place and happening in the same way.

**As an example:**

From a webhook, I might get a &quot;contact update&quot; event, and when I check the status, I can see if a contact has been created. I can now add this in this layer by renaming the event to &quot;contact created&quot; and adding a where statement: &quot;where status = &apos;create&apos;&quot;. By that, I make the event easier to understand and work with, and I make an intentional decision that I want to use this event as an activity.

![](/images/posts/eventify-everything-data-modeling/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f65024a2b-8745-4115-ab53-9d8114f9ff47_1790x330.png)
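A sketch of this qualification step (SQLite for illustration; the activity stream below is a simplified subset of the ActivitySchema columns):

```python
import sqlite3

# Raw events in, activities out; the activity stream is a simplified
# subset of the ActivitySchema columns.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE raw_events (entity_id TEXT, event TEXT, status TEXT, ts TEXT)")
con.executemany("INSERT INTO raw_events VALUES (?, ?, ?, ?)", [
    ("c1", "contact update", "create", "2024-01-03"),
    ("c1", "contact update", "edit", "2024-01-05"),
    ("c2", "contact update", "create", "2024-01-04"),
])

# Qualification: rename the event and keep only the rows where a contact
# was actually created.
con.execute("""
    CREATE TABLE activity_stream AS
    SELECT ts, entity_id AS customer, 'contact created' AS activity
    FROM raw_events
    WHERE event = 'contact update' AND status = 'create'
""")

rows = con.execute(
    "SELECT customer, activity FROM activity_stream ORDER BY ts").fetchall()
print(rows)  # [('c1', 'contact created'), ('c2', 'contact created')]
```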

In my setups, I define all activities by hand using this approach. This can become a tiring job. But that is intentional. I want to avoid making it too easy. From my experience, you decrease the adoption rate significantly when you introduce too many unique activities for people to work with in analytics tools.

People who use the tool twice a month and then need to scroll through 100 events to find the right one for their analysis are scared away by the effort. I want to keep it simple for them. And when I define all activities explicitly, I make sure that it stays simple.

### Sub-Layer 3: Make the event data accessible for your use cases

My major use case is to analyze the event data in event analytics tools like Mixpanel or Amplitude or in nerdy tools like Motif Analytics.

I have not worked so much with ML use cases based on event data. Therefore, I don&apos;t cover them here, but I might do a special post about them in the future.

For event analytics tools, you now have two ways to get the data into their systems. The classic way is to sync the data up to the tools. You don&apos;t need reverse ETL for that; both Mixpanel and Amplitude now support this sync natively. You point them to the right tables (in Mixpanel&apos;s case) or write a query to define the table and the data (in Amplitude&apos;s case). Both syncs work really well.

But you have a problem with changes to historical data.

One of the big benefits of the DWH event data approach is that we model the data and can therefore also make changes to historical events. Maybe we can enrich a specific event with new source data and introduce a new property; it would be great to have it for all historical data as well. But this does not work with the classic sync modes.

One workaround is to use one table per event type, like Segment and Rudderstack do when ingesting data. With that, you can at least add new historical events by adding their tables to your sync. But it still does not work for adding new properties to all historic events.

Luckily, the tech is moving forward. Mixpanel has introduced their new [Mirror](https://mixpanel.com/blog/mirror-product-analytics-data-warehouse-sync/) product, which does a kind of change data capture on top of your data and, therefore, can handle all the cases where you change things historically. I have not tested it yet since it works fine for Snowflake but not for BigQuery yet, and my current prod event data model is in BigQuery.

On the other hand, Amplitude has [a native connection](https://amplitude.com/blog/Snowflake-native-Amplitude) that connects directly to your Snowflake instance, sends the query down to the warehouse, and works with the result data. I saw one demo, and the speed and UX were still good (even if it was a bit slower). In return, you have a guarantee that it always works on the latest version of the data. This approach is quite similar to that of newcomers like Netspring, Kubit, or Mitzu.


## Working with the layers

Working with these layers is quite straightforward.

**Sourcing new events**

Your product and business are evolving. Often quite fast.

There is a new feature just deployed, and we want to see how the performance of the feature is developing.

There is a new integrated tool that takes care of specific onboarding steps. Its events would be great to have.

The last analysis surfaced some interesting drop-off patterns, but we are missing two property criteria, which can be loaded from one of our application tables.

**Adding new activities**

The new feature is about to be deployed, and we loaded the new events for it from our application database (we used the table where all items are added); now, we can transform these raw items into proper new activities so that the product team gets their release dashboards.

**Adding new properties**

The new properties for our existing onboarding events are now available. We can now enrich the existing events with them so our users can deepen their analysis.

**Removing activities**

Our latest survey and data usage analysis showed us three activities that don&apos;t make sense in their current form. Therefore, they will be replaced and transformed into new activities. We adapt the model with the changes and ensure the old activities are dropped from the analytics tools.

**Add new use cases**

The sales team is extremely curious to see which accounts performed five defined core activities in the last 30 days, so they have a better view when making calls. Therefore, we add a new table to the application layer that has these activities ready and sync it to our CRM.
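A sketch of what such an application-layer table could be built from (all names are invented for illustration; SQLite stands in for the warehouse):

```python
import sqlite3

# Hypothetical activity stream; account and activity names are invented.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE activities (account_id TEXT, activity TEXT, days_ago INTEGER)")
core = ("login", "create_report", "invite_user", "export_data", "use_api")
rows = [("acme", a, 5) for a in core]                        # did all five core activities recently
rows += [("globex", "login", 2), ("globex", "use_api", 40)]  # only one recent activity
con.executemany("INSERT INTO activities VALUES (?, ?, ?)", rows)

# Accounts that performed all five defined core activities in the last 30 days.
query = """
SELECT account_id
FROM activities
WHERE activity IN ('login', 'create_report', 'invite_user', 'export_data', 'use_api')
  AND days_ago BETWEEN 0 AND 30
GROUP BY account_id
HAVING COUNT(DISTINCT activity) = 5
"""
hot_accounts = [r[0] for r in con.execute(query)]
print(hot_accounts)  # ['acme']
```

The resulting table can then be synced to the CRM with whatever sync path is in place.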

## Final words

From my experience, a data model for event data is much easier to manage than an extended model for other business data.

In the past, the analytics use cases were just harder to handle. Writing custom funnel analysis SQL takes time and experience and therefore excludes many people. This is now changing with the DWH integrations of event analytics tools, and that is what makes a data model for event data valuable.
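To make that concrete: even a minimal two-step funnel already needs a join, and a real one would add per-user time ordering on top. The names below are purely illustrative:

```python
import sqlite3

# Illustrative event table for a two-step funnel (signup, then first_project).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (user_id TEXT, event TEXT)")
con.executemany("INSERT INTO events VALUES (?, ?)", [
    ("u1", "signup"), ("u1", "first_project"),
    ("u2", "signup"),             # dropped off after step 1
    ("u3", "first_project"),      # step 2 without step 1: not counted
])

# Two steps already require a self-join; each extra step adds another join,
# and a production funnel also needs time ordering between the steps.
funnel_sql = """
SELECT COUNT(DISTINCT s.user_id) AS signups,
       COUNT(DISTINCT p.user_id) AS converted
FROM events s
LEFT JOIN events p
  ON p.user_id = s.user_id AND p.event = 'first_project'
WHERE s.event = 'signup'
"""
signups, converted = con.execute(funnel_sql).fetchone()
print(signups, converted)  # 2 1
```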

In most cases, the data model for event data is an extension of the existing model; it does not replace it. And even if you have no data model in place, you can use the approach described here to get a data model for your event data quite straightforwardly.

![](/images/posts/eventify-everything-data-modeling/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fd1c65aae-09f3-4328-ac2c-329f3382ae1a_1080x1080-1.png)</content:encoded></item><item><title>Everyone is a CDP now</title><link>https://timo.space/blog/everyone-is-a-cdp-now/</link><guid isPermaLink="true">https://timo.space/blog/everyone-is-a-cdp-now/</guid><description>Why hasn’t MS Excel announced its CDP product yet? It was quite a while ago when I heard the term customer data platform for the first time.</description><pubDate>Mon, 27 Nov 2023 00:00:00 GMT</pubDate><content:encoded>Why hasn’t MS Excel announced its CDP product yet?

![](/images/posts/everyone-is-a-cdp-now/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f5be1a3e8-2a4f-4b13-8672-8681ea3ea103_1456x816-png.jpg)

It was quite a while ago when I heard the term customer data platform for the first time. And I had one big question: what is the difference between a CDP and a DWH?

And honestly, no one could give me a good answer then (and potentially couldn&apos;t today). Maybe this was a first glimpse of what we today call composable.

In some DWH projects, we had already done the same thing. We collected and unified different kinds of customer data: behavioral data and property data from sources like shop systems, ERPs, and analytics systems.

Based on what I know today, this was most likely a CDP. But looking at it, it already shows a problem. How do we even define a CDP?

## What is a CDP

Let&apos;s unpack the Customer Data Platform package a bit to get a better idea of what makes a CDP and what maybe does not.

I have followed [Arpit&apos;s](https://databeats.community) content for years; he is a great resource for making sense of the CDP space. And he has spent a reasonable amount of time defining some of these things.

&amp;gt; Check out Arpit’s series about composable and packaged CDPs: [https://databeats.community/series/understanding-composable-and-packaged-cdps](https://databeats.community/series/understanding-composable-and-packaged-cdps)  
&amp;gt;   
&amp;gt; They include a lot of foundational definitions.

He distinguishes between a CDP and Customer Data Infrastructure (CDI), which makes many things more manageable as we move on.

A CDI is not a marketing category; no vendor calls itself that. One of the reasons might be that event collection alone doesn&apos;t seem valuable enough. Alex Dean wrote [an excellent post](https://sourceoftruth.substack.com/p/your-primary-tag-needs-to-be-switzerland) about this phenomenon (and it is no surprise that he considers event collection a robust stand-alone solution - which I also think).

So, a CDI is an event pipeline. These can be instrumented pipelines like frontend or backend trackers. Or it can be endpoints that receive webhook data. A newer addition might be event streams that you collect into your Data Warehouse (like Kafka topics or CDC).

A CDP is a package that builds on top of the event data. Beyond event data pipelines, it helps with identity resolution and syncing the data back to marketing, sales, or other tools.

![](/images/posts/everyone-is-a-cdp-now/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f4191f04e-dd2e-475c-9cc6-01ad1d04f3bf_1818x1070-png.jpg)

The sync back is an important piece here. It was what was missing in the old CDP setup I described in the introduction: we usually had one or two syncs back to ad platforms like Google or Meta Ads, but that was the whole sync setup. We will get to the importance of the syncs later in this post.

## Composable or what?

There are minds in the tech scene that praise the marketing approach of creating a new category so you can own it as the category leader. This motion brought us things like Analytics Engineering and the composable CDP.

Honestly, I lost track of who came up with it first. But it was one of the vendors that offered one part of what could make up a CDP - most likely from the reverse ETL category - looking to make their solution attractive to the wealthy marketing teams.

Composable means the same as the Modern Data Stack did - unbundling a package and embracing a best-of-breed approach. So you put together event collection, batch loading, storage, transformation, and sync, and voila, you have built yourself a CDP.

![](/images/posts/everyone-is-a-cdp-now/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f8b80217b-0ae3-40b7-ae4a-97d9326fcb70_1730x1260-png.jpg)

The promise of composability is the same as the modern data stack - you pick the tools that do the job best for your use case.

And it has the same issues as the MDS - stacking eight tools together never works seamlessly.

So maybe composable CDP could have just been called the modern CDP to make it easier for people to understand.

Because the composable approach includes many different tools from different categories, it was attractive for all these vendors to push the concept of the composable CDP together. They could be part of a new category without shipping any new product features. Snowflake and Databricks happily promoted composable CDPs, since it is not relevant to them which queries keep their CPUs warm.

But similar to the MDS, and maybe even more so, we see a step-by-step composting of the composable CDP. My educated guess is that marketing teams have no interest in waiting 12 months until the data engineering team has freed up some resources and composed a CDP for them. They want to start now.

Therefore, they look for all-in-one CDPs. These come at a price, usually starting with event collection (you might need to add new instrumentation), and, of course, since they are a bundle, the initial pricing is higher (though, as we know from the MDS, a composable CDP can have sneakily increasing costs over time as well).

For the future, let&apos;s forget about composable versus all-in-one and focus on the value the solutions can bring a company. To do that, I want to examine the different CDPs based on their former functions, because there is one interesting thing: most of the CDPs I know were something different before.


## Different CDP buckets based on formers

### A lot of formers

As mentioned above with the reference to Alex&apos;s post, one of the takeaways was that event collection alone is not enough for most companies to meet their growth goals. We will see this pattern in more of the models below. Interestingly, CDP is a category that most of today&apos;s vendors moved into from another category. I keep a final bucket for original CDPs; there are some, but they are rare.

### The former Email tools

Oh, email tools - the first instance where people thought they would do proper retention marketing by sending newsletters. They spawned a whole category of email-sending tools. And some of them were successful and worked on providing more value to their customers.

![](/images/posts/everyone-is-a-cdp-now/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fa3385fde-b909-4d48-86cd-644095224574_1678x1106-png.jpg)

So, they started to collect some customer event data to enhance their reporting and show the success of their campaigns.

Based on that, they created better reporting and analysis, enabling the building of audiences you could use for your emails. And then they added integrations to other marketing platforms like Google Ads.

And voila, they got a CDP.

A good example is [Emarsys](https://emarsys.com). We used it a lot in the old days. They did such a great job that they got acquired by SAP (not sure if this is the so-called heaven, but who knows.)

Another example is [customer.io](https://customer.io), which started with email, added segmentation, then added communication flows that could also trigger things on other platforms, and has now introduced event pipelines to become a CDP.

Former email tools usually bring an activation layer with them - often just email, sometimes one or two more channels. The data integration came later, so they usually don&apos;t have many instruments for controlling data quality or transforming data.

### The former event pipelines

The event pipelines are the ones that start with collecting events from frontends and backends and, as a first step, usually store them in a database (or data warehouse). Additionally, they were often set up as a proxy to send these events on to marketing platforms.

![](/images/posts/everyone-is-a-cdp-now/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fd9d1afe6-d243-4de0-9f42-858194a58b64_1682x1060-png.jpg)

Segment started as this proxy. They began with an open-source tracker that made it possible to add tracking code once (and not for every tool that needs the event data) and then send the data to all tools. At some point, they also started to send the data to the data warehouse. And by that, they became a kind of mini-CDP, or rather a CDI.

At some point, Segment offered Personas, which was their segment builder (&quot;Segment Segments&quot; was potentially too hard to understand). They were already the core collection point for a lot of customer data, and with Personas, they introduced a way to handle identity resolution and segment building before the data was sent to the different marketing platforms.

Rudderstack, when they came out, quickly followed the same path, just faster (they had to catch up, of course). This could also make them a candidate for the &quot;no-former&quot; bucket, but they designed a lot based on Segment&apos;s offering, so they can live here as well.

Former event pipelines are more robust on the data quality side and often have services for that (also for handling PII data). Both now invest a lot in the segmentation part, to enable segmentation with data that does not go through their pipelines.

### The former reverse ETL tools

Reverse ETL as a category had a short lifetime. There might be people that say it still exists; there might be more people that say it never really existed at all.

Again, the term reverse ETL is mostly a marketing term. The vendors had to find something that set them apart but was still familiar enough that people would quickly get the idea of what they were doing. In the end, they offer a service that writes data from a data warehouse back into the tools (like marketing platforms) - hence the term reverse.

![](/images/posts/everyone-is-a-cdp-now/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2ffca56d8d-eabd-4e27-b631-86cf10335574_1200x942-png.jpg)

There is a feature in the CDPs that syncs the data to all the different platforms, where it is then used to trigger emails, push messages, or ads.

Since they were a feature in this setup, the natural step was to become a more integrated product. So, they started to move into more CDP features. Next up was the segmentation part: since they are built on top of data warehouse data, they offered the missing frontend for marketers to create segments based on DWH data - and then sync these segments to the different marketing &amp;amp; sales platforms.

The big question now is: where do they move next? Hightouch is taking the next logical step (moving away from the composable concept) and has added an event pipeline. Let&apos;s see what Census does here.

### The former analytics tools

They are kind of late guests to the party, which is a surprise since they have had the data and the tools for some years already.

You can build segments directly in Amplitude or Mixpanel (and now also Piwik PRO) and push them to marketing platforms.

![](/images/posts/everyone-is-a-cdp-now/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f76303ede-d017-43b1-9a3b-fc7d0aef34bc_1264x960-png.jpg)

When you are an analytics platform, the sync feels like a logical and natural extension of your business. You collect behavioral data; your platform allows the creation of segments anyway. So you &quot;just&quot; need to add the sync part.

Interestingly, Google Analytics was quite early here - but only for Google Ads. However, the ability to create segments in GA and turn them into Google Ads audiences is still the most powerful argument for teams to stick with Google Analytics 4.

The big gap for these platforms was all customer data beyond the behavioral data. But with the new Data Warehouse integrations, they open up for this kind of data. You can now enrich your customer properties in your DWH and sync them to your event analytics tool. You can now integrate event data from marketing platforms like email interactions and combine them with your data.

The analytics platforms are my dark horse in the CDP race.

### The no-formers

Yes, there are native CDPs. Maybe we can name one with a different former history that didn&apos;t fit into the other buckets: Tealium. Tealium was, first of all, an enterprise-grade tag manager - a bit like Segment - but pretty early on, it launched Audience Stream, built on top of the tag manager.

mParticle may be the better example since they don&apos;t have a visible event data pipeline history like Segment. But perhaps they were a former as well, and I just missed it.

From a feature-set perspective, I would put them close to Segment or Rudderstack. But please correct me here if I am wrong.

## It&apos;s not about customer data. It is about activation

Whenever a CDP is mentioned somewhere, the big talk is about the customer data: you collect it from different sources, put it in a central place, create some valuable audiences, and then pass them on to any tool that might use them.

### Ingredient: Customer data

Don&apos;t get me wrong - I get the value of having the customer data in one place. It then becomes possible to combine it and, in the process, apply identity resolution. In the end, you have an extensive dataset of your customer data. This is the foundation for what I describe next. But I don&apos;t think it is the essential thing. Datasets like this have existed for a long time - sometimes in a data warehouse, most often in a CRM.

So, for me, this is not the part that makes the CDP special. As mentioned at the beginning, this was the part I did not understand when I heard about CDP in the first place since I thought we had this already.

### The pricey meal: Activation

The real deal for me is the activation part.

I spend most of my life with event analytics. A typical use case looks like this: I talk with the marketing team and pick up a challenge they have - they are not sure if the onboarding emails make an impact. So I get into a first analysis, maybe extend our event design slightly, and finally deliver an analysis that shows which steps in the onboarding flow need some optimization.

Then, the marketing department will hopefully pick this up and test some different emails. So I could do the same analysis again (or, if I am smart, I create a funnel so they can monitor it themselves).

![](/images/posts/everyone-is-a-cdp-now/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f6fd52d1b-9e34-4107-8eef-64df4507e505_1288x856-png.jpg)

Sounds good, but not really: there are a lot of handovers and explanations involved, which is not efficient.

Here comes activation. Same example, this time different:

The marketing team is still struggling with their onboarding sequence, so I do the same analysis. But this time, I deliver different audiences for them: all people at onboarding step 3 (where we see the most drop-off), all people who have not moved on to step 4, and all people who, based on the first information, look like our core ICPs but have not moved to step 4.
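The three audiences above could be expressed as simple segment queries; the table and columns here are invented for the sketch, with SQLite standing in for the warehouse:

```python
import sqlite3

# Hypothetical onboarding state per user; columns invented for this sketch.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE onboarding (user_id TEXT, step INTEGER, looks_like_icp INTEGER)")
con.executemany("INSERT INTO onboarding VALUES (?, ?, ?)", [
    ("u1", 3, 1), ("u2", 3, 0), ("u3", 4, 1), ("u4", 2, 1),
])

audiences = {
    # all people at onboarding step 3 (the biggest drop-off)
    "at_step_3": "SELECT user_id FROM onboarding WHERE step = 3",
    # all people who have not moved on to step 4
    "not_reached_step_4": "SELECT user_id FROM onboarding WHERE step IN (1, 2, 3)",
    # people who look like core ICPs but have not reached step 4
    "icp_not_reached_step_4":
        "SELECT user_id FROM onboarding WHERE looks_like_icp = 1 AND step IN (1, 2, 3)",
}

result = {name: sorted(r[0] for r in con.execute(sql)) for name, sql in audiences.items()}
print(result)
```

Each audience then gets synced to the marketing tools instead of being handed over as a dashboard.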

Marketing can now create new communications and test them with particular campaign parameters, so we can look at the performance and potentially run more tests.

![](/images/posts/everyone-is-a-cdp-now/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f333f42d3-2bc5-4474-9563-b4320c962e21_1388x910-png.jpg)

That is activation.

Our handovers are no longer notebooks, dashboards, slides, or Loom videos but audiences with the potential for better communication flows.

And I usually would go a step further - invest in plenty of training and Loom videos to show marketing how to find audiences - so that, from then on, they are in constant experimentation and optimization mode.

This is real enablement.

For me, the activation part is the essential part of the CDP game. Not that you have to run the activation inside the CDP itself - what matters is that you have a closed loop from customer events &amp;amp; properties (analysis) to segmentation to activation and back to customer events.


## Where are things moving and some bold predictions

Of course, we are curious to see where this is all going. Maybe we should even create a feature-monitoring website to keep track of who launched which new feature at which timestamp.

Sure, AI will be part of it (mParticle has made this move already - at least in their website communication).

![](/images/posts/everyone-is-a-cdp-now/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f6b743d62-6c21-4a2c-b877-4a90ecb08608_1308x634.png)

But what else? We will see in the coming months.

So, to make it a bit more interesting, here are some predictions (aka acquisition ideas):

**Rudderstack to acquire Posthog**

The analytics part is missing in Rudderstack&apos;s setup. Yes, they have an audience builder, but Posthog would enable them to create audiences based on deeper analysis.

**Census to acquire Jitsu.**

Hightouch added their own event tracking (with some weird public uproar). So, to give a different example, Census could acquire Jitsu. Jitsu has a really solid event collector and could be a suitable extension.

If Census&apos;s treasure chest is still impressive, they could also look at Snowplow. This would give them the industry-best collectors, event data pipeline, and data contracts in one go.

**Amplitude &amp;amp; Mixpanel to integrate with Zapier or Make.com**

Honestly, this is the easiest one. Piwik PRO did it already. It would unlock so many tinkering possibilities, especially now with the new tables feature in Zapier - millions of activation scenarios. I also thought they could acquire one of the marketing platforms like customer.io, but I think they are better off integrating with as many marketing platforms as possible.

In general, let&apos;s revisit in 6 months which new CDPs have come around and where the current CDPs have moved by then. And by the way, I am a CDP, too (I have a perfect memory for names).

I hope I could make the differences and the parts of the CDP world a bit clearer. What are your thoughts and experiences? Let me know in the comments.

If you like this content, I have also written a deep-dive workbook on event data design (part of the CDI) - have a look here:

![](/images/posts/everyone-is-a-cdp-now/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fd1c65aae-09f3-4328-ac2c-329f3382ae1a_1080x1080-2.png)</content:encoded></item><item><title>Data pipeline orchestrators - the emerging force in the MDS?</title><link>https://timo.space/blog/data-pipeline-orchestrators-the-emerging/</link><guid isPermaLink="true">https://timo.space/blog/data-pipeline-orchestrators-the-emerging/</guid><description>Some weeks ago, I wrote about evolution threads in the Modern Data Stack and pointed out that data platforms are one of these evolutions happening right now.</description><pubDate>Wed, 18 Oct 2023 00:00:00 GMT</pubDate><content:encoded>Some weeks ago, I wrote about evolution threads in the Modern Data Stack and pointed out that data platforms are one of these evolutions happening right now. One of my takes is that what makes data platforms so strong is the control of the flow and the data loaded and transformed. And the sheer amount of metadata this produces.

[After the Modern Data Stack: Welcome back, Data platforms](https://hipster-data-show.ghost.io/after-the-modern-data-stack-welcome/) — Data platforms are the next iteration of the Modern Data Stack setups. Let&apos;s explore why.

When I posted about it on LinkedIn, [Simon Späti](https://www.ssp.sh) rightly commented that you can achieve something similar in an open-source stack by having a powerful orchestrator.


I never thought about orchestrators in that way, but he 100% has a point here.

This is why we look into a different MDS evolution thread today: data pipeline orchestrators, where they are, and what role they can play in the future. My take: it can become the one essential role. But let me explain this at a bit more length.

## What does an orchestrator do?

![](/images/posts/data-pipeline-orchestrators-the-emerging/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f10596212-bf32-4dde-82d5-927f4165cf61_1848x586-png.jpg)

Maybe we can start with &quot;moving things from left to right.&quot; Data pipelines are often multiple copy machines: data gets loaded, then changed and adapted across different copies in a multi-transform setup, and at some point finally stored.

How much an orchestrator does depends on the orchestrator. The pure orchestrator triggers steps, waits for a response, and, based on the result, initiates the next steps or multiple in parallel. But it does not have any load or transformation logic.

A hybrid orchestrator has some of both: for some steps, you use built-in connectors (for example, to load specific data from a specific platform); some transformations, as well as custom code, can be added to the orchestrator itself; and for other steps, you trigger an external service.

Some orchestrators need to persist data actively in each step to reuse it; others can &quot;hold&quot; it (well, they persist it, too; you don&apos;t need to do it actively).
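The pure-orchestration loop described above can be sketched in a few lines; the step names and the success/failure handling are made up for illustration:

```python
# A pure orchestrator holds no load or transform logic itself: it only
# triggers steps, checks results, and decides what happens next.

def run_pipeline(steps, on_failure=None):
    """Run steps in order; each step callable returns True on success."""
    completed = []
    for name, step in steps:
        ok = step()  # in reality: trigger an external service and await its result
        if not ok:
            if on_failure:
                on_failure(name)
            break  # a real orchestrator might retry, alert, or branch here
        completed.append(name)
    return completed

steps = [
    ("load", lambda: True),       # e.g. call an EL tool
    ("transform", lambda: True),  # e.g. trigger a dbt run
    ("sync", lambda: False),      # e.g. a reverse ETL sync that fails
]
failed = []
done = run_pipeline(steps, on_failure=failed.append)
print(done, failed)  # ['load', 'transform'] ['sync']
```

Everything beyond this loop - connectors, transformations, held state - is what pushes a tool toward the hybrid or data platform end of the spectrum.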

What combines them all is the value of taking data from a source in a specific shape and loading, transforming, and storing it along the way.

Why do we even need to do that? So far, we can&apos;t run queries like

```
SELECT lead_id, lead_name, lead_score
FROM hubspot
LEFT JOIN vertex ON hubspot.customer_id = vertex.customer_id
```

So, we need data pipelines to combine all these datasets and make them work together. And the orchestrator does a lot of the heavy lifting, or at least ensures that things happen.

We mentioned some aspects here already that we will pick up later, but let&apos;s look at what a data pipeline orchestrator does in detail.

## What does the data pipeline orchestrator do?

Sorry, I am taking a nice shortcut here: instead of coming up with my own examples, I pick the ones that the different orchestrators use in their documentation, getting-started guides, or use case videos/posts.

**Astronomer:**

[https://docs.astronomer.io/learn/cloud-ide-tutorial](https://docs.astronomer.io/learn/cloud-ide-tutorial)

![](/images/posts/data-pipeline-orchestrators-the-emerging/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fea9b0a90-c310-42a1-81e9-292f9f763251_1402x672-png.jpg)

Build a simple ML pipeline:

-   Query data from a database
-   Apply basic transformation
-   Train a simple ML model
-   Schedule and control the pipeline versions with Github

**Dagster:**

[https://docs.dagster.io/tutorial](https://docs.dagster.io/tutorial)

![](/images/posts/data-pipeline-orchestrators-the-emerging/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f6a93cae6-9025-4287-8527-0fcb8966754e_1102x510-png.jpg)

Build a simple load, transform, and store pipeline:

-   Ingest data
-   Work with DataFrames
-   Create a visualization
-   Store the data
-   Schedule the pipeline
-   Connect it to external destinations

**Kestra**

[https://kestra.io/docs/tutorial](https://kestra.io/docs/tutorial)

![](/images/posts/data-pipeline-orchestrators-the-emerging/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2faa5488f0-b194-47c0-a75c-0f7d23417e62_1244x576-png.jpg)

&quot;Hello world&quot; flow introducing the basic concepts (namespaces, tasks, parametrization, scheduling, and parallel flows):

-   working with the inputs and outputs of a task
-   define triggers in the flow
-   Add parallelism

**dbt**

[https://docs.getdbt.com/quickstarts/bigquery](https://docs.getdbt.com/quickstarts/bigquery)

Wait, what - dbt, here - yeah - in the end, it is a SQL query orchestrator and a pretty light one, too.

![](/images/posts/data-pipeline-orchestrators-the-emerging/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f9ecb827e-5ee9-4964-98ba-eca2a96b778b_1334x544.png)

-   Access data in BigQuery
-   Run transformations
-   Materialize the data
-   Run tests
-   Add a schedule

**Keboola**

[https://help.keboola.com/tutorial/](https://help.keboola.com/tutorial/)

Wait, what... all data platforms have orchestrators built in - Keboola serves as the example for the category here.

![](/images/posts/data-pipeline-orchestrators-the-emerging/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f4f4a13ba-ea6b-4f91-b6a2-43c57adc6cc3_1274x604.png)

-   Load data with standard connectors or extractors
-   Transform the data
-   Write the data
-   Send the data to destinations

**GCP Workflows**

[https://cloud.google.com/functions/docs/tutorials/workflows](https://cloud.google.com/functions/docs/tutorials/workflows)

We can skip the &quot;wait, what&quot; - I think you get the idea.

![](/images/posts/data-pipeline-orchestrators-the-emerging/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f8ccd05d5-b99b-4b44-befe-2098bb0be4d9_1330x626.png)

-   Create custom functions to do the heavy lifting
-   Use Workflows to trigger the functions
-   Request data from an external source and use Cloud Run to work with it

Naturally, these are basic examples (I still hope vendors will also put out complex tutorials for the next steps), but you should get the idea. Just doing the getting-started tutorials is a good way to test and learn about the different orchestration approaches (maybe something I will do a video about in the future).

So, these tutorials will already show you that there are differences between the different orchestrators. Let&apos;s take them apart more to understand the different approaches and models.

## What are the different types of orchestrators?

Compared products:

![](/images/posts/data-pipeline-orchestrators-the-emerging/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fe1f87f95-e9bc-4cf3-885b-8bf0ecd9468d_1152x626-png.jpg)

All placements on the charts are subjective and can be debated by you with a fierce comment.

### Pure orchestration to data platform

We span the axis from pure orchestration to data platforms. The significant aspect is how much heavy lifting is possible within the orchestrator - or, put differently, how dependent you are on external services.

![](/images/posts/data-pipeline-orchestrators-the-emerging/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f9eb8496d-2452-43e8-ba1b-7ea2123d9daa_1812x580-png.jpg)

**Pure orchestration**

A pure orchestrator is just handling the orchestration part: managing steps, triggering external services, taking the results, handling conditions, and scheduling the whole thing.

You will rarely find any custom code here that is loading or transforming data.

There might be some standard connectors already, but they are limited.
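To make the pure-orchestration idea concrete, here is a toy sketch in plain Python (all step names and results are invented, and real orchestrators add retries, logging, and distributed execution). The orchestrator only knows the step graph, the execution order, and a condition; the steps themselves would just trigger external services and hand back results.

```python
from graphlib import TopologicalSorter

# Toy steps: in a real pipeline each would just trigger an external
# service (an API call, a warehouse job) and report the result back.
def extract():
    return "raw rows"

def validate():
    return True

def load():
    return "loaded"

steps = {"extract": extract, "validate": validate, "load": load}
# The DAG: each step mapped to the steps it depends on.
dag = {"extract": set(), "validate": {"extract"}, "load": {"validate"}}

def run_pipeline():
    results = {}
    for name in TopologicalSorter(dag).static_order():
        # A condition: only load when validation succeeded.
        if name == "load" and results.get("validate") is not True:
            results[name] = "skipped"
            continue
        results[name] = steps[name]()
    return results

print(run_pipeline())
# {'extract': 'raw rows', 'validate': True, 'load': 'loaded'}
```

Scheduling then just means calling the pipeline on a timer or trigger; everything data-related stays outside the orchestrator.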

**Data Platform**

The backbone of a data platform is the orchestration.

But many of the tasks in the pipeline can also be done by the orchestrator itself, either with prebuilt components or with custom code that the orchestrator executes. External services are only utilized in exceptional cases. The integration between all tasks is usually deep, so metadata can easily be shared between them.

There is a clear tendency in the market to move toward data platforms. No big surprise: people will ask what your tool replaces, and an orchestrator alone is at first just an addition. I can replace an ETL tool once I can also handle 90% of data integration. Once I can do the transformation, I will replace dbt Cloud. And so on.

### General to Special

An orchestrator can either be built for basically any orchestration use case, or it is built strictly for one very special case.

![](/images/posts/data-pipeline-orchestrators-the-emerging/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f75fae30a-a2e6-4dfc-9ee0-ca2411048dc5_1824x574-png.jpg)

General is one tool to rule them all: one tool your team can focus on, with no patchwork of different solutions. It can even mean that data and application engineering build things on a similar stack, which can be beneficial.

Special gets you the best solution for a special use case. If this use case is dominant, a unique solution can be easily worth the investment.

There are different ways of being special. One is the kind of use case: dbt is an SQL transformation orchestrator. This is a very special use case and, therefore, a special orchestrator. GCP Workflows can be used for plenty of use cases, but it is specialized in GCP services (though not limited to them).

### By implementation

![](/images/posts/data-pipeline-orchestrators-the-emerging/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fb5880723-8641-43a6-b5cd-c90550556637_1808x464-png.jpg)

**Opinionated or standardized frontends to code**

Opinionated or standardized frontends can make a significant difference. They usually hide a lot of complexity under a specific kind of implementation. This can be a &quot;plug &amp; play&quot; experience as we know it from tools like Fivetran or Segment, or an opinionated way, like the YAML configuration you use with Kestra.

The natural limitation of standardized tools is customization. Every standardized tool tries to integrate customization at some point, but it is often limited and poorly integrated.

The opinionated approach needs upfront investment before it really pays off. There is a learning curve to get started, and you only become efficient once you master it. This is the real threshold for me: when the initial investment is too big and the gains are not clear, I won&apos;t invest the time.

**Code**

Of course, code solutions are also opinionated. Some more, some less. But you can incorporate them into your usual way of building applications. So, getting started with Airflow should be straightforward when you have a good Python understanding.

Code solutions are highly configurable and can usually extend to almost all use cases. When a team is experienced with a specific language and already has a codebase, an orchestrator in the same language can be a natural extension.

### Open source to proprietary

There are almost no pure open-source projects. Pure, for me, means that no paid or managed service sits behind it as the next upgrade step.

![](/images/posts/data-pipeline-orchestrators-the-emerging/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f22515593-3760-426c-b3a6-362040e7d291_1846x460-png.jpg)

**Open source**

-   Interestingly, we find quite a bunch of dominating open-source solutions among orchestrators. Airflow has been the dominating project of the last few years, and it has generated many follow-ups using the same model.
-   All of the ones mentioned also offer a managed or cloud version of their orchestrator.

**Proprietary**

-   There are at least two versions of proprietary: platform-based or own-platform.
-   The platform-based ones are part of a platform offer, like GCP Workflows or AWS Step Functions.
-   The own platforms are an essential part of a data platform.
-   Interestingly, no single-purpose orchestrator service is a proprietary paid platform (as far as I know). Maybe something we might see in the future.

## The metadata sorcerer

I started with emerging forces - there is a reason why I believe that orchestrators are in an interesting spot in the Modern data stack.

One of the MDS&apos;s strongest, but at the same time weakest, points is decoupling. It enables you to combine just the right tools without big platform lock-ins and bundled prices. On the other hand, the coupling is never easy and often so much work that most tools stay isolated.

This is fine as long as you have a small setup. It is no big surprise that most &quot;Hey, I have this cool new data stack&quot; posts or videos show stuff done on a local machine. But what works locally does not automatically work in a shared and scaled environment.

So, orchestrators can be the glue to bring all MDS things together. And in theory, they are doing this job, but interestingly, not always. This might be why most managed orchestrator services are integrating more and more typically decoupled MDS assets to couple and bundle them.

But there might be one resource where orchestrators are in the perfect spot and which is not yet utilized to its full extent: metadata.

Why is metadata important? [Lindsay](https://www.linkedin.com/in/lindsaymurphy4/) and I wrote a [post](https://www.secoda.co/blog/harnessing-metadata-the-future-of-effective-data-management) and even did a live show about it. So here is the gist: when we operate a data stack, we apply architecture, design, and implementation. But these things need constant feedback on whether they still do the job the way the data strategy has described. The best feedback system for operations is metadata. DevOps makes use of system and application metadata all the time to identify bottlenecks and optimization potential. In data, metadata is currently mainly used for some governance use cases. Even in data observability, metadata does not have a central role; most tasks are done by running SQL.

I already described this in a post about data platforms: one huge asset data platforms have is that they own and control 100% of the metadata. This enables them to offer tools to create data setups that scale under control and handle data quality at each step.

And if you are not on a data platform, the orchestrator becomes your lightweight data platform.

I am extremely curious to see what role metadata plays for orchestrators and whether it will become so strong that orchestrators can use it as their main secret sauce. The interesting step for me is when orchestrators extend their execution metadata by pulling metadata from the services they trigger. A central shared log would be a great start.
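To illustrate what such a central shared log could look like, here is a minimal sketch (all names and fields are made up, not any real product): the orchestrator wraps every task it triggers and appends execution metadata to one shared log that any other tool could then query.

```python
import time

RUN_LOG = []  # the "central shared log": one place for all task metadata

def run_task(name, fn, **params):
    """Execute a task and capture its execution metadata."""
    start = time.time()
    result = fn(**params)
    RUN_LOG.append({
        "task": name,
        "duration_s": round(time.time() - start, 3),
        "rows_out": len(result) if hasattr(result, "__len__") else None,
        "status": "success",
    })
    return result

rows = run_task("load_orders", lambda: [{"id": 1}, {"id": 2}])

# The orchestrator (and every other tool) can now answer questions like
# "which task produced how many rows?" from one shared place:
print([(r["task"], r["rows_out"]) for r in RUN_LOG])
# [('load_orders', 2)]
```

A real version would also record failures, retries, and metadata pulled back from the triggered services, which is exactly the extension step described above.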

## And when it might be over already

I predict that orchestrators will become the essential tool in the next wave of the modern data stack evolution, similar to data platforms. I see the generic ones as the strongest forces, and it looks like open source plays a vital role here.

Will it stay this way, with orchestrators ruling the data world for the next ten years?

So far, nothing has ruled the data space for long. Cloud warehouses may be the ones with the longest reign. But one pattern on the horizon will fundamentally change orchestrators: **Streams**.

With data streams, many of the currently dominating patterns will change. That is something for a different post. But an orchestrator built natively for streams will look very different.

But until then, keep an eye on orchestrators; if you don&apos;t use them, consider them for your stack.

Thanks for reading timo&apos;s substack! Subscribe for free to receive new posts and support my work.</content:encoded></item><item><title>About Data User Experience</title><link>https://timo.space/blog/about-data-user-experience/</link><guid isPermaLink="true">https://timo.space/blog/about-data-user-experience/</guid><description>I was in a podcast recording with Juliana and Simo this week, and we talked about event data and how to design it.</description><pubDate>Thu, 28 Sep 2023 00:00:00 GMT</pubDate><content:encoded>I was in a podcast recording with [Juliana and Simo](https://juliana-jackson.com/standard-deviation-podcast/) this week, and we talked about event data and how to design it.

And Simo asked an essential question: why can&apos;t we collect the events we need right now, name them correctly, and stick with that? That got me thinking, because he is right: in some situations, this is the better approach. But my gut feeling told me that these are exceptions. So I thought about it a bit.

I explained it in this way: The agile approach works for quick insights, especially when you are one person or a very small and hands-on team. Beyond this, it fails.

It does not fail technically (ok, truth be told, at some point this will also happen), but it fails for the **user experience**.

When people ask me why I recommend reducing the number of event definitions (names), dashboards, or anything in front of the user, this is all about data user experience.

**A product analytics example** - you want to find out what retention looks like for new free trial accounts: coming back and doing one of three important activities in our product.

Imagine selecting these events from a catalog of 150 unique event definitions. There is most likely no documentation of all events, and most likely no 100% clear naming standard. This can take up to a week. And that is a bad data user experience for me.

Let&apos;s dig deeper.


## What is user experience?

&gt; &quot;The user experience (UX) is how a user interacts with and experiences a product, system or service.&quot;  
&gt;   
&gt; &quot;It includes a person&apos;s perceptions of utility, ease of use, and efficiency.&quot;

[Wikipedia](https://en.wikipedia.org/wiki/User_experience)

Utility, ease of use, and efficiency are already good for deeper analysis. But maybe we can get more.

Let&apos;s look into Peter Morville&apos;s UX honeycomb:

![User Experience Honeycomb](/images/posts/about-data-user-experience/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fbc9f5b65-b328-4332-aa51-e4db777f4c31_440x440-jpeg.jpg)

Source: [http://semanticstudios.com/user\_experience\_design/](http://semanticstudios.com/user_experience_design/)

And more here: [https://www.interaction-design.org/literature/article/the-7-factors-that-influence-user-experience](https://www.interaction-design.org/literature/article/the-7-factors-that-influence-user-experience)

Here we find:

**Useful** - If a product lacks utility or purpose, it will struggle to compete in a market filled with useful items. However, &quot;usefulness&quot; can be subjective and include non-practical benefits like entertainment or aesthetic value.

**Usable** - Usability focuses on enabling users to achieve their goals effectively and efficiently, and products with poor usability, like first-generation MP3 players, are less likely to succeed compared to more usable alternatives, such as the iPod.

**Findable** - Findability is crucial for a product&apos;s success and user experience, as it ensures that the product and its internal content are easy to locate, much like how a well-organized newspaper enhances readability.

**Credible** - Credibility is essential for a product&apos;s success, as users seek trustworthy options and are unlikely to give a second chance to products that fail to deliver on promises, impacting both user experience and business viability.

**Desirable** - Desirability, influenced by factors like branding and emotional design, can set similar products apart, as seen in the preference for a Porsche over a Skoda, and highly desirable products are more likely to generate word-of-mouth promotion.

**Accessible** - Accessibility in design is often overlooked. Still, it is crucial for reaching a broader audience, including the nearly 20% of people with disabilities, and it not only benefits those with impairments but often makes products more accessible for everyone to use while also being a legal requirement in many jurisdictions.

We will take these factors and apply them to typical data UX scenarios. But before we do that, we look at where data UX happens.

## Where is data UX happening?

### The frontends

Most obviously, we can find data UX when people are trying to use the data to find and get valuable insights. These can be:

**Dashboards** - luckily for this asset, we have a UX discipline. Plenty of books have been written about Dashboard design.

**Analytics tools** - Compared to dashboards, these tools focus more on tool-supported data exploration. Some have very poor UX, but the leading ones enable users well. They do come with training, though: all analytics tools require learning their concepts before you master them.

**Monitoring &amp; Alerting** - Mikkel Dengsøe wrote an excellent piece about the challenges of getting alerts right and how to establish processes and standards to work with them.

[https://medium.com/@mikldd/data-tests-and-the-broken-windows-theory-60185afaade9](https://medium.com/@mikldd/data-tests-and-the-broken-windows-theory-60185afaade9)

Applying our UX factors for data frontends:

**Useful** - means in this context that the dashboard&apos;s insights can bring value to the viewer.

**Usable** - means that the core functions of the dashboard are easy to understand: how to select a timeframe, how to add a filter, and how to notice that there might be multiple pages.

**Findable** - An important one. Especially in bigger setups (15+ dashboards), it is often hard to find the right dashboard. And a search is often not a solution here; it needs a structure and a good discoverability approach.

**Credible** - Can we trust the data, and is it fresh? Good teams work with indicators for both on the dashboard.

**Desirable** - Most likely the most ignored aspect (only challenged by accessibility). Can a data frontend be made desirable? Of course it can. I can&apos;t work with BI tools with poorly designed charts and dashboards (looking at you, Power BI). If you have two similarly useful and usable dashboards, users will likely pick the one with the better visual design.

**Accessible** - In the UX sense, this means making interfaces accessible for users with disabilities. This should be something you strive for as well. For now, the tools limit you (I have yet to hear of good screen reader support for charts). But some basics, like contrast and accessible language, should be possible.

Another aspect is who can access the data. Data is often isolated and restricted from broader usage. Often for no real reason.


### The data structure

People who work in dashboards or analytics tools will rarely care about the data structure. However, anyone working to prepare the data for these systems will love an excellent data user experience.

How can we improve the data user experience here:

#### Naming of tables, models, and columns

Naming always sounds boring, but it has real superpowers. A straightforward naming convention used everywhere can improve the data UX significantly. For example, if your timestamp fields are named load\_time, ts, time\_stamp, timestamp\_, ... you need plenty more time to find them.

To solve this, you need two steps:

-   First, define a naming convention (e.g., timestamp is always ts when representing event data)
-   Second, a system that checks it and enforces the naming convention. This is a lot harder; as far as I know, no tool can provide this.

This will improve the Usable and Findable aspects.
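The checking part of the second step is easy to prototype, even if no off-the-shelf tool does it for you. A minimal sketch (the convention and the rules are invented examples):

```python
import re

# Example convention: event timestamps are always called "ts",
# and all column names are snake_case.
RULES = [
    (re.compile(r"^(time_stamp|timestamp_?|load_time)$"),
     'use "ts" for event timestamps'),
    (re.compile(r"[A-Z]"),
     "column names must be snake_case"),
]

def check_columns(columns):
    """Return a list of (column, violation) pairs for a set of columns."""
    violations = []
    for col in columns:
        for pattern, message in RULES:
            if pattern.search(col):
                violations.append((col, message))
    return violations

print(check_columns(["ts", "load_time", "userId"]))
# [('load_time', 'use "ts" for event timestamps'), ('userId', 'column names must be snake_case')]
```

Run something like this in CI against the information schema or your dbt model definitions, and the convention enforces itself.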

#### How you write SQL

Style guides for SQL are an excellent idea. Using SQL formatters (with your custom configuration) is a great way to establish a more straightforward UX. When a query or a dbt model always has a predictable structure, it is easy to find things and to work with the query.

As an example, here is the dbt style guide used by Gitlab:

[https://about.gitlab.com/handbook/business-technology/data-team/platform/dbt-guide/](https://about.gitlab.com/handbook/business-technology/data-team/platform/dbt-guide/)

CTEs might cause performance issues, but they are an excellent UX tool for making a model or query more readable and customizable.

This will improve the Usable, Desirable, and Accessible aspects.
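To illustrate the kind of predictable structure such a style guide enforces, here is a dbt-style model in the common import-CTE pattern (table and column names are invented; treat it as a sketch, not a prescribed standard):

```sql
-- import CTEs first, one per source
with orders as (
    select * from {{ ref('stg_orders') }}
),

customers as (
    select * from {{ ref('stg_customers') }}
),

-- logic CTE: joins and business logic in one predictable place
customer_orders as (
    select
        customers.customer_id,
        count(orders.order_id) as order_count
    from customers
    left join orders
        on orders.customer_id = customers.customer_id
    group by 1
)

select * from customer_orders
```

Because every model starts with its imports and ends with one final select, you always know where to look.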

#### Ownership

A schema has shifted, the value pattern in a column has changed, and something downstream breaks. Naturally, we are having discussions about data contracts. But one thing from data contracts can be applied much earlier: ownership.

Knowing who produces the data and whom to write to when something looks weird or breaks is already significant.

This will improve the Credible aspect.

#### Semantic continuous metadata

This is a broader term, and ownership will find its place here as well. Goods in manufacturing carry bar codes, and behind them sits an extensive history of metadata. Imagine we had metadata about a column telling us:

-   about the source
-   the ownership at the source
-   if this is PII data and which level it has

This is where data catalogs can shine. I&apos;m not sure how well this is supported there, but it is definitely something I will test more deeply in the future.

This will improve the Usable and Findable aspect.

#### Architecture and Design

Plenty of problems can be avoided or solved by having an efficient architecture or design. A data model comprising 300 dbt models gives everyone a hard time in the daily data user experience. And guess what, this is also something a data catalog can&apos;t solve. Imagine reducing this to 30 dbt models; that would improve the UX significantly.

This will improve the Useful, Usable, and Desirable aspects.

### The data itself

But what about the data itself?

Now, it gets more tricky. But there are some aspects we can look into:

#### The right form of the data

Most data types enforce the right form of data, like boolean, integer, or float. But others can become a mess. I am talking about string/varchar and their newest friends like JSON or OBJECT. They have their rightful place, but please handle and use them carefully and only where it makes sense.

Using the right type would be a Usable and Credible criterion for the user experience.

#### Consistency of the data

When we look into string values, there are at least two types. The first are widespread-variance columns, where the values can be unique or nearly unique; examples are email addresses or any user-generated content. This type of data is often useless for analysis in its original form. Then there are string columns with enumerations: a defined list of values. These are great for analysis but can also be a pain when the enumerations are not checked and enforced with tests and rules. Ensure that enumerations and cardinalities are tested and monitored (cardinality, in this case, being the number of unique values in a string column).

Well-monitored string enumerations make the user experience Usable and Credible.

Avoiding widespread variances in string fields improves the Useful user experience.
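Such an enumeration and cardinality check can be sketched in a few lines (the allowed values and the threshold are invented examples; in practice this would run against a warehouse column, not a list):

```python
ALLOWED_PLANS = {"free", "starter", "pro", "enterprise"}
MAX_CARDINALITY = 10  # alert threshold for this column

def check_enumeration(values, allowed, max_cardinality):
    """Report unexpected values and whether cardinality exceeds bounds."""
    distinct = set(values)
    unexpected = distinct - allowed
    return {
        "unexpected_values": sorted(unexpected),
        "cardinality": len(distinct),
        "cardinality_exceeded": len(distinct) > max_cardinality,
    }

report = check_enumeration(
    ["free", "pro", "Pro", "free"], ALLOWED_PLANS, MAX_CARDINALITY
)
print(report)
# {'unexpected_values': ['Pro'], 'cardinality': 3, 'cardinality_exceeded': False}
```

The &quot;Pro&quot; vs &quot;pro&quot; case is exactly the kind of silent inconsistency such a monitor catches before an analyst does.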

The NULLs. NULLs happen, and in many cases, they don&apos;t hurt. But sometimes they do. Make conscious decisions about when to replace NULL values with something meaningful like &quot;not provided&quot; or &quot;not applicable&quot;, anything that helps the data user understand the value immediately.

Replacing NULL values with understandable values can improve the Useable and Desirable user experience.
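In SQL this is a plain coalesce(country, 'not provided'); as a sketch in Python (the column name and replacement text are just examples):

```python
def fill_null(value, replacement="not provided"):
    """Replace NULL/None with an explicit, understandable value."""
    return replacement if value is None else value

countries = ["DE", None, "US", None]
print([fill_null(c) for c in countries])
# ['DE', 'not provided', 'US', 'not provided']
```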

## How to work on your data user experience

Reading all this sounds nice, but how do you incorporate this into your daily work?

I would start by creating checklists of things you want to cover; most of them would be regular checks (monthly or quarterly). Then I would set up a standard survey and do regular interviews.

Here is my example checklist:

-   **Improve Useful**
    
    -   I survey all data users, asking if they work with data, how disappointed they would be if the data were gone, and their blockers and use cases. This would be every quarter.
        
-   **Improve Usable**
    
    -   Conduct user tests for the significant dashboards or tools - to learn where they struggle.
        
        At least 1x a month.
        
    -   Look at cycle times and time to recover for your analytics engineers. Are the times long? Do they improve?
        
    -   Is a SQL style guide in place and used for queries added to the repo in the last four weeks?
        
    -   Are naming conventions in place, and are they followed - check commits of the last four weeks - ideally automatically.
        

-   **Improve Findable**
    
    -   Track the request where people ask for things that already exist. Ask them why they could not find it and how they tried.
        

-   **Improve Credible**
    
    -   Ask about the data trust in the survey; this can also include asking for specific metrics and if they are trusted. This would be in the quarterly survey.
        
    -   For essential source tables, are the sources and the source owners documented?
        

-   **Improve Desirable**
    
    -   It takes a lot of work to test. You can develop a visual guideline, apply it with one or more dashboards, and ask for feedback by comparing before and after.
        

-   **Improve Accessible**
    
    -   How many people have access to the data? What is your ratio of data users to all people in your company?
        

Generally, the best way to monitor the data UX is to work with a continuous survey and regular user tests and interviews. This is a must-have setup for any data team if you want to improve your impact and role in the company.

![](/images/posts/about-data-user-experience/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f806bdbbf-5be5-4fb1-b2b4-dddc13baf32f_1080x1080.png)</content:encoded></item><item><title>Quo vadis, Data Open source</title><link>https://timo.space/blog/quo-vadis-data-open-source/</link><guid isPermaLink="true">https://timo.space/blog/quo-vadis-data-open-source/</guid><description>A case study and review based on the examples of Snowplow, dbt, Rudderstack, and Iceberg</description><pubDate>Sun, 03 Sep 2023 00:00:00 GMT</pubDate><content:encoded>I have two close friends, and we initially bonded over one book series we all have read. The books are Daemon and Freedom (TM) by Daniel Suarez:

![](/images/posts/quo-vadis-data-open-source/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f8dcfaeb0-dec4-495a-afac-76938e7ac9f0_794x628-png.jpg)

[https://www.amazon.com/dp/B074CDHK46](https://www.amazon.com/dp/B074CDHK46)

Set in the near future: after his death, a genius game developer sets loose a Daemon program which, step by step, disassembles the capitalist world and its power structures, and promotes instead autonomous communities and knowledge sharing across them. The book, interestingly, is pretty violent, but yeah, there is no peaceful revolution in the catalog, I guess.

To quote from the Amazon book detail page:

&gt; Daniel Suarez&apos;s New York Times bestselling debut high-tech thriller is “so frightening even the government has taken note” (Entertainment Weekly).

But the message resonated a lot with us: an algorithm that provides everyone with a leveled playing field and promotes (and enforces) collaboration over harmful competition.

If only all this were so simple.

## The current state of open source in the data field

I would like to look at the current state not abstractly but using four examples of different open-source approaches in the data space. We can use these examples to learn more about the different characteristics of the approaches, what worked well, and what is complicated.

To be clear, this is not an &quot;open source is dead&quot; or whatever post. It is an examination by observation from the outside. The outside is important here: I have no exclusive insider knowledge since I never worked on any of these projects. My observations are based on the visible interactions (posts, videos, announcements, pricing) and my experiences working with the tools.

### 1 - Snowplow

I don&apos;t know which is the oldest open-source data product out there that is still in heavy use. But the one I know and have worked with (almost from the start) is Snowplow. Using Snowplow, you can build an event data pipeline that can receive events from practically everywhere, handle schema validation (hello, data contracts), and sink the data into Redshift, Snowflake, Databricks, or BigQuery.
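The schema validation works through self-describing events: each event carries a reference to the Iglu schema it must validate against, and events that fail validation land in a separate bad stream instead of the warehouse. A sketch of the format (the vendor and fields here are invented, only the envelope shape is Snowplow&apos;s):

```json
{
  "schema": "iglu:com.acme/checkout_started/jsonschema/1-0-0",
  "data": {
    "cart_value": 79.9,
    "currency": "EUR"
  }
}
```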

![](/images/posts/quo-vadis-data-open-source/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f34690ced-add6-4b53-a8cd-3202608a57f5_2978x1186-png.jpg)

![](/images/posts/quo-vadis-data-open-source/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fcf0fb796-6254-4f94-b9f8-e4a9e795520c_2610x1214.png)

When I discovered Snowplow for the first time, it was like finding a hidden, beautiful island. An island that was hard to get to and also hard to navigate once landed. But when you manage it, you have an excellent, reliable, transparent, and scalable event data pipeline solution. And the great feeling that you just have set everything up and feel a little bit like the god of this island.

This is the power of open source, and Snowplow is still one of the outstanding examples for me.

What characterizes Snowplow:

-   They had the open-source offering as a stand-alone for quite a long time.
    
-   They introduced a clear enterprise product with the BDP - which is a managed Snowplow instance in your cloud account. So, the upgrade scenarios from open source to managed are quite clear:
    
    -   We don&apos;t have the resources to manage the pipeline.
        
    -   The event data from the pipeline becomes so important for our business that we can&apos;t afford hiccups.
        
-   The open-source product until today is still close to the BDP from a feature perspective. It is widely supported and maintained by the Snowplow team. And when you talk to the engineers, you can hear the passion for the open-source product.
    
-   Setting up and maintaining the open-source version is not easy and requires either grit and passion or an ops team that does this for a living. This makes the OS version definitely not a free version where people can test things out.
    
-   The requirements that take you from the open-source solution to the BDP are quite special and narrow. This makes revenue development difficult, and it is one reason why Snowplow is experimenting with additional offers like the cloud offer (a more affordable BDP) and a digital analytics package.
    


## 2 - dbt

Just 18 months ago, this might have been the poster child for successful open-source and community enablement. Create buzz with an open-source solution that solves a real problem, then create a group feeling based on the pain people share getting from raw data to insights, and make it a community. Then create a new job role and level these people up as engineers. This played out exceptionally well.

![](/images/posts/quo-vadis-data-open-source/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f8140f764-b37e-4291-903c-6d068764fe52_2764x1180-png.jpg)

![](/images/posts/quo-vadis-data-open-source/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f5cc2a766-a6d9-4bd2-91b2-9ccfa8671351_2608x1138.png)

Until it was time for a go-to-market strategy.

Here comes the caveat:

We can&apos;t simply call everything that happened before (OS, community, conference, evangelism) a go-to-market strategy, because there was no market involved (when we define a market as a place where buyers and sellers exchange goods and services for money). It was more a go-to-attention, go-to-adoption strategy. And that one worked out like a poster child.

But as long as a big company or foundation does not fund you, you must find a market.

So far, this seems pretty hard. dbt tried the managed upgrade approach.

They offer a service on top of the OS solution that provides additional benefits to data teams. One essential problem here is that the benefits are too marginal. I still see the opportunity, since data teams have many problems working with SQL models. Plenty of opportunities exist to provide a dbt Cloud that makes people open up their budgets.

The tricky thing about **&quot;when you pay us, you get such a better experience&quot;** is that it is extremely hard to find the right angle for it.

In the web frontend space, GatsbyJS failed with a managed approach. They tried more of a lock-in-based approach and put essential features behind the paywall. Vercel with NextJS instead succeeded. They played it open (you can deploy NextJS everywhere - quite similar to dbt) but focused heavily on building a best-of-class developer experience.

In dbt’s case: running dbt build jobs for you improves the developer experience by maybe 5%.

I don&apos;t understand why dbt didn&apos;t double down on the editor and make it the most ground-breaking developer experience. This would have earned them more fame and more paying users. You can see the gap now with the emergence of pure dbt editors like Paradime or Deep Channel.

What characterizes dbt:

-   A masterclass of kicking off an open-source project. Yeah, it was a timing thing. But they did really good things. The initial office hours with Claire Carroll were extremely good and helped create the &quot;we are in this together&quot; vibe the community had initially. So, there are good learnings for others to pick up when launching an OS project.
-   A managed upgrade that was too weak to provide a better developer/analyst experience and, therefore, had a low pull impact.
-   There was no focus (at least from the outside): sidetracks with adding a metrics layer, and then not following through. Sometimes I thought the real goal of the company was to create the biggest and most relevant data conference (which it definitely was). Imagine what could have been achieved if they had spent all their people and money power on just building the most excellent editor.
-   Again, from the outside, there is no clear go-to-market strategy. What would an enterprise dbt look like? How do you make it irresistible for CIOs, CDOs, and CTOs to buy and roll it out across the organization?

The dbt approach became the blueprint for most OS-based data tools in the Modern Data Stack, for better or worse. This is one of the reasons we look at OS skeptically at the moment, since plenty of companies out there are already running, or will run, into the same problems as dbt.


## 3 - Rudderstack

Rudderstack is a really interesting one, and I can tell it has been a bit of an emotional journey for me. Rudderstack launched as an open-source version of Segment (quite similar to how Airbyte launched as an OS Fivetran alternative; the similarities stop there, since Rudderstack is extremely good at execution).

![](/images/posts/quo-vadis-data-open-source/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f1f70650e-896b-458d-8b05-3afa79f962b6_2398x996.png)

![](/images/posts/quo-vadis-data-open-source/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fec693724-3a97-473e-ab3c-8f60cc2be58a_2494x1430.png)

The open-source version was extremely close to Segment&apos;s solution (theoretically, you could simply switch the endpoint URL from Segment to Rudderstack but keep your whole implementation). It was designed and launched with one clear goal: Make it as easy as possible to replace Segment. This is, by the way, a GTM strategy.
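To make the &quot;switch the endpoint&quot; point concrete, here is a small sketch (not vendor documentation; the data-plane URL is a hypothetical placeholder) showing that a Segment-spec track call only needs a different URL to land at the new backend:

```python
import json

# Sketch: a Segment-spec "track" payload works against either backend.
# Endpoint URLs are illustrative; the RudderStack data plane URL is a
# hypothetical placeholder you would get from your own deployment.
SEGMENT_ENDPOINT = "https://api.segment.io/v1/track"
RUDDERSTACK_ENDPOINT = "https://dataplane.example.com/v1/track"

def build_track_request(endpoint, write_key, user_id, event, properties):
    """Assemble an HTTP request description for a Segment-spec track call."""
    return {
        "url": endpoint,
        "headers": {"Content-Type": "application/json"},
        "auth": (write_key, ""),  # write key sent as basic-auth username
        "body": json.dumps({"userId": user_id, "event": event,
                            "properties": properties}),
    }

seg = build_track_request(SEGMENT_ENDPOINT, "wk_123", "u1", "Signed Up", {"plan": "pro"})
rud = build_track_request(RUDDERSTACK_ENDPOINT, "wk_123", "u1", "Signed Up", {"plan": "pro"})
assert seg["body"] == rud["body"]  # identical payload, only the URL differs
```

The whole migration pitch is in that last assert: your instrumentation stays untouched.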

But for me, it felt a bit strange. Initially, OS for me was more about taking an existing proprietary solution and creating something better for free (free cakes on Idealism Island) - Jitsu (launched roughly at the same time) took that route. But in the end, Rudderstack played the initial Linux game (everyone who disagrees, please let me know; I am no expert here). Linux was foremost about making Unix-like systems affordable for the masses.

Rudderstack also launched a managed service pretty quickly—the first promise: we take care of the hosting for you. That opened it up to a lot more use cases and audiences.

And then the open-source and managed versions diverged quickly from a feature perspective. It was obvious that this gap would widen over time.

This was my second struggle with Rudderstack - I felt a bit betrayed in my image of OS. Working with Snowplow before had put me in the camp of: you provide an OS version that almost matches the managed feature set, just harder to set up.

But I came to terms with it. Their open-source offering is just the core of the service, for all teams who are happy with the core and want more control over the setup—a clear, focused solution for a small audience. What also changed was their communication. Today, you have a hard time finding open source mentioned on their website.

And they are still committed to their open source core - the repo is well maintained.

What characterizes Rudderstack:

-   Looking back, OS was a clear GTM strategy: we are here to replace an existing vendor with a cheaper, more transparent solution.
-   And on top of it, build out your real (money-making) service. Then use this money to invest in sales and growth teams and initiatives, and start to overtake the incumbents with new features.
-   But keep open source as a core offer and give back to the community.
-   If this continues to play out like this, it is not so far from how the big ones like Facebook or Google approach OS (on a much smaller scale): we make money with a managed service (or ads in FB&apos;s case), and we run OS as a giving-back, hiring, and goodwill offer.
-   This is not the idealized version of open source, but it is still open source.

## 4 - Apache Iceberg

Now it is getting interesting. I want to start with some history and context, also because I did not know it myself and had to read up on it.

![](/images/posts/quo-vadis-data-open-source/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f984b3b15-89d5-4e7c-b494-76c50b5a72b6_2472x1140-png.jpg)

![](/images/posts/quo-vadis-data-open-source/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f809aad09-5425-41bd-bff7-61fd37f21380_2590x1076.png)

First of all, what is Apache Iceberg? It is not my area of expertise, so I take the definition from Iceberg&apos;s website literally:

&gt; &quot;Iceberg is a high-performance format for huge analytic tables. Iceberg brings the reliability and simplicity of SQL tables to big data, while making it possible for engines like Spark, Trino, Flink, Presto, Hive and Impala to safely work with the same tables, at the same time.&quot;

I understand that Iceberg is one of the formats driving the Lakehouse approach. You can use it, for example, on AWS S3.

Here you can learn more: [https://iceberg.apache.org](https://iceberg.apache.org)

A bit about the history (from Wikipedia):

&gt; &quot;Iceberg was started at [Netflix](https://en.wikipedia.org/wiki/Netflix) by Ryan Blue and Dan Weeks. \[...\] Iceberg development started in 2017. The project was open-sourced and donated to the Apache Software Foundation in November 2018. In May 2020, the Iceberg project graduated to become a top level Apache project.&quot;

There are already two things standing out:

-   Iceberg is, when used, critical infrastructure on the one hand and a standard on the other. Both Snowflake and Databricks announced initial support for the Iceberg format this year. This is only possible with an open standard (Snowflake might not do the same with the Delta Lake format, which Databricks developed).
-   Like many data products, this one originated in the work of data teams at one of the big data-heavy services (here, Netflix). But instead of spinning it out and creating a business around it, it pretty quickly came under the umbrella of the Apache Foundation. This does not rule out commercial offerings down the road: Airflow and Superset are also Apache projects and have managed versions with Astronomer and Preset.

All this creates this characterization:

-   Iceberg is not a tool in the first place; it is a piece of infrastructure and a standard. Therefore, openness is an essential trait.
-   It can be maintained with the help of the Apache Foundation.
-   Future commercial models are not ruled out, but it will be interesting to see if and how they happen.
-   The impact and enablement are massive. Formats like Iceberg and Delta Lake already influence data infrastructure significantly and will continue to do so, and that gives us plenty of ways to work with them for good. But they will not be the sole foundation of a multi-billion-dollar company, which is absolutely fine.

## So now, quo vadis Data open source?

This is now very opinionated, and I would love to read your comments on why you think I am wrong.

The dbt case has shown how hard it is to combine an open-source strategy with a real GTM strategy. But it provides plenty for other data companies to learn from.

On the OS and community side: how to launch, promote, and develop an open-source product.

On the business side: what do you need in a business strategy to grow your business and not just GitHub stars? My take: find your OS audience&apos;s real and biggest pain point and focus, focus, focus on building something experience-changing as your managed service.

On the other hand, we will continue to see, and may well see more, open-source projects incubated by big companies and foundations. This is common in software development; just look at the OS projects Facebook maintains and supports, like React. We can see first glimpses of this in Confluent&apos;s move to acquire Immerok, one of the main maintainers behind Apache Flink. This space is definitely something I will investigate more, since it is the most solid model for OS projects.

Rudderstack&apos;s approach is still an interesting blueprint if you want to shake up an established category: use OS as a clear GTM strategy and then, at some point, treat it as a contribution.

In a nutshell, I think all companies following dbt&apos;s approach need to double down on their GTM strategy. Step 1: make sure to collect a ton of feedback and insights from the paying customers, not the engaged OS users. Then isolate the biggest problems and your hypothetical solutions for them and do product work (build, measure, learn). But don’t wait too long to work on your GTM strategy. Look out for early revenue learnings.

![](/images/posts/quo-vadis-data-open-source/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f01b40557-8d29-4b62-a966-b1905b657341_1080x1080.png)</content:encoded></item><item><title>After the Modern Data Stack: Welcome back, Data platforms</title><link>https://timo.space/blog/after-the-modern-data-stack-welcome/</link><guid isPermaLink="true">https://timo.space/blog/after-the-modern-data-stack-welcome/</guid><description>Data platforms are the next iteration of the Modern Data Stack setups. Let&apos;s explore why.</description><pubDate>Wed, 16 Aug 2023 00:00:00 GMT</pubDate><content:encoded>Data platforms are the next iteration of the Modern Data Stack setups. Let&apos;s explore why.

![](/images/posts/after-the-modern-data-stack-welcome/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f5376ab72-b14c-47a4-9097-c39e7592fb76_1456x816-png.jpg)

So we are talking a lot about the Modern data stack again.


What is it right now? Is it dead, irrelevant, post, pre-post, cubistic?

Let’s assume that Modern Data Stack is just about data stacks. We don’t care about any artistic movement here.

But we still need to start with some context about the Modern data stack, what made it look modern, what it was before, and what defined it. We need this to look for weaknesses and benefits, preparing us to take the next step.

Luckily I didn’t need much research and investigation, because Matthew Arderne and David Jayatillake wrote excellent posts about the beginnings of the MDS: what came before, and what it changed.

[The Way of Ways](https://groupby1.substack.com/p/the-way-of-ways) — Matthew Arderne’s eulogy for the Modern Data Stack, a Tour de Links that follows its journey.

[The Modern Data Stack is Dead… Long Live the Modern Data Stack - Part 1](https://davidsj.substack.com/p/the-modern-data-stack-is-dead-long) — David Jayatillake’s series on where the term came from and what the MDS actually delivered.

I highly recommend reading David’s full series (either now or after reading this).

But I summarize his definitions as a baseline here:

1.  **Enhanced Productivity and Capability**: Using MDS tooling, teams can build and maintain more than they could with legacy systems. The MDS allows for the expression of all necessary transformations, concurrent handling of various workloads, and defining metrics and dimensions for company-wide use.
2.  **Rapid Iteration and Delivery**: The MDS enhances the ability to iterate and deliver solutions quickly, serving many users and handling complex analytics tasks that would have been challenging with previous technologies.
3.  **Cost Efficiency**: Despite the potentially high costs associated with platforms like Snowflake, the MDS can run cheaper than previous data warehouses after some optimization.
4.  **Increased Data Utilization Across Roles**: The MDS democratizes data access and usage. Engineers, Product Managers, Finance professionals, Marketing teams, and even C-level executives can leverage the power of data, leading to a broader return on investment for businesses.
5.  **Flexibility and Scalability**: The MDS can adapt to organizations&apos; growing and changing needs. As more people realize the potential of platforms like Snowflake and Looker, they begin to use them more extensively, indicating the scalability and versatility of the MDS.

## The evolution

The whole dead, next, better, blabla discussion is mostly fueled by marketing. And to be honest, the marketing could be better: version A has problems, so I introduce version B as the better version, combine this with declaring a new era, and sound bold and confident. Congratulations.

But data stack has a natural evolution, as in every technology. We introduce a new concept that solves some of our problems, and then different things can happen:

![](/images/posts/after-the-modern-data-stack-welcome/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f4b362487-5606-414c-801c-86c6514ad55b_1508x1076-png.jpg)

The new solution could solve a more significant problem nicely but cause 1-2 new issues from the start or increase an existing issue. So it is a classic trade-off. One that sometimes can still be valuable if the solved problem is big enough.

Or the solution works well under specific circumstances but gets problematic in others. No-SQL DBs are a good example here.

Or the new solution solves the problem without introducing new problems immediately. This is quite good; these are usually the solutions that can generate a bunch of hype. Snowflake and dbt are good examples. Both enabled a different way to work with data, which opened up new use cases and only in the sum of everything created new problems.

The whole modern data stack is the second type. The democratization of data had a vast and essential impact. It made powerful data setups, beyond analytical or other closed systems, accessible and affordable for so many more people. It clearly defined and opened up an industry.

And, of course, with all the scaling up and new ideas that have been added and introduced, we see plenty of problems: some small and some business-threatening.

And this is when evolution is happening. Just based on the pure drive of engineers. We see something that is not working, so we build a solution. And in one of 100 times, one solution is so good and well done that it creates an evolutionary step.

![](/images/posts/after-the-modern-data-stack-welcome/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f1ff2d23b-a3e5-41db-bc6e-f512efbeaa8b_1180x684-png.jpg)

The funny thing about evolution, and potentially the thing most often missed: evolution is never linear. It branches out, explores, and creates massive amounts of variants. That is the beauty of it.

But it is also why there is never “the” next, but hundreds of nexts. And out of them, at some point, we will see a step that changes the ways in such a good way that we can declare it a new paradigm.

We are not there yet. But we can already see the branches, which is exciting.

## The problems

Spend a week on Data LinkedIn, and you will get a feeling for the problems of the modern data stack. Writing about pain and issues creates a lot of engagement, so you will see plenty of them. This is good; we must make the problems visible to create these new iterations.

Let’s pick three core problems:

![](/images/posts/after-the-modern-data-stack-welcome/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fde5a5dc7-d9de-4509-b1ad-c27007f11537_1166x770-png.jpg)

### Missing design or architecture

I guess this should have been the one that was the easiest to predict when the MDS happened. Of course, when you open up a space and make it accessible for many more people, you invite chaos. It’s quite a natural step.

Setups that started small, with a little bit of Fivetran and dbt, can grow into maintenance, quality, and finally cost monsters when scaled up.

The underlying cause is missing design principles. Not that they are missing in general: talk to the folks who have done data modeling for years; they have design principles. And you can see them suffering a lot on LinkedIn, explaining simple things to the new audience.

But it is also wrong to assume we already know and have everything. The environment has changed and keeps evolving. Just take event stream data: most existing data modeling approaches don’t have an answer for how to work with it effectively (how to store it, yes, but that is not the problem).

So design and architecture need to evolve. And potentially in at least two different ways:

We need to take a snapshot of the status quo, explain the landscape in its vast variation, bring it down to foundations again, and then apply common design principles. The [data engineering book](https://www.amazon.com/Fundamentals-Data-Engineering-Robust-Systems/dp/1098108302) by Joe Reis and Matt Housley is a great example. It is the book the current state of data stacks needs most.

![](/images/posts/after-the-modern-data-stack-welcome/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f45ed2415-3595-4d98-b731-c82358ae9514_548x704-png.jpg)

We also need to evolve common principles to match new challenges, simply because the circumstances have changed and we have to provide new answers. Let’s take my event data example. We can easily store event data properly with common principles. But we now have business teams asking entirely different questions that classic BI principles can’t answer anymore. Therefore we need work like Ahmed’s [activity schema](https://www.activityschema.com) and [temporal joins](https://www.activityschema.com/temporal-joins) to provide a new principle.

![](/images/posts/after-the-modern-data-stack-welcome/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f53bc9d8e-4d7a-4f35-9db4-36b2dcc7adb0_2386x788.png)
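To give a feel for what a temporal join does, here is a toy sketch with pandas (event and column names are made up; the real activity schema is more involved): for each order, find the most recent preceding campaign view per customer.

```python
import pandas as pd

# Toy temporal join: for each order, look up the last campaign view
# at or before the order time, per customer. merge_asof is exactly
# this kind of "as of" lookup (both frames must be sorted by the key).
orders = pd.DataFrame({
    "customer": ["a", "b", "a"],
    "ts": pd.to_datetime(["2023-01-05", "2023-01-10", "2023-01-20"]),
}).sort_values("ts")

campaigns = pd.DataFrame({
    "customer": ["a", "b", "a"],
    "ts": pd.to_datetime(["2023-01-01", "2023-01-12", "2023-01-15"]),
    "campaign": ["jan_promo", "b_promo", "mid_jan"],
}).sort_values("ts")

# Backward "as of" join: last campaign at or before each order, per customer
joined = pd.merge_asof(orders, campaigns, on="ts", by="customer",
                       direction="backward")
print(joined[["customer", "ts", "campaign"]])
```

This answers a question a plain equi-join cannot: not “did this customer ever see a campaign,” but “which campaign was in effect when the order happened.”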

### Data quality issues all around the place

This might be the fastest issue to appear and become a real problem. Also, because it was always a problem. Before MDS, there were data problems. But they were potentially smaller.

If you scale up systems with minor issues, they usually become big.

So the MDS never got tired of inventing new categories to throw at the problem. There were times when it looked like a new observability, governance, or catalog product was launching every day.

But it is the same as with design. Some things can be solved by applying common principles (like tests), but for some problems we need to take a step back and find solutions closer to the root. All the work around data contracts is such an example: taking an old principle and evolving it for a new situation.

[The Rise of Data Contracts](https://dataproducts.substack.com/p/the-rise-of-data-contracts) — Chad Sanderson on what data contracts are, what problems they solve, and how to use them.

### Integration hell

I can still remember one slide from a presentation I gave to a board of directors to get budget for a new data stack: the beauty of best of breed.

-   No more lock-ins
-   Fewer costs because we only buy what we need
-   Always the best solution for every phase

We got the budget immediately - everyone was in honeymoon mode. Over the next five years, they introduced two new stacks.

On paper, best of breed still sounds excellent. And you will see raving LinkedIn posts where people show off their cool (primarily open source) data stacks. Look how well I can assemble tools. And I should know: my first LinkedIn post that went viral was about something I called the Hipster Data Stack. People love it mainly because they don’t have to manage it.

My problems with many tools so far:

#### Constant context switching

This is the invisible one, but it is serious. Imagine you could do all of your work in Excel. Yes, it takes training to get better and faster. But you know that afterwards you can do everything. That is exceptionally cool because every small new learning immediately benefits all your work.

Now you have to do this for 5-7 products. You need to understand them and keep up, since they release new things and change things. And the scariest part (and it is not fun for the people involved) is when a product goes out of business.

You end up switching context multiple times a day, and introducing something new that needs changes in all the tools costs so much energy.

#### Playing them in sync

It is more than just an orchestration problem; for that, you could add an orchestration tool (there is tool number 8). It’s much more. Let’s take a simple problem: one of your metrics does not match.

My favorite example is ad spending.

You know how much you pay for it, but your dashboard tells you less. In theory, the problem can sit anywhere across your whole tool stack. Is the issue already in the load to your lake, in the ingestion into the warehouse, somewhere in the 20 transformations, or is it a BI cache layer problem? You spend days finding it.
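A first line of defense is a boring reconciliation check that compares the metric total at every hop and points at the first divergent stage. The stage names and numbers below are made up for illustration:

```python
# Sketch of a stage-by-stage reconciliation check for one metric (ad spend),
# assuming each stage can report its total. Stage names and numbers are
# illustrative, not from any real pipeline.
stage_totals = {
    "source_api": 10_000.00,
    "lake": 10_000.00,
    "warehouse_raw": 10_000.00,
    "transformed": 9_420.00,   # the discrepancy is introduced here
    "bi_dashboard": 9_420.00,
}

def first_divergent_stage(totals, tolerance=0.01):
    """Walk the stages in order and return the first stage whose total
    deviates from the previous stage by more than `tolerance`."""
    stages = list(totals.items())
    for (_prev_name, prev), (name, cur) in zip(stages, stages[1:]):
        if abs(cur - prev) > tolerance:
            return name
    return None

print(first_divergent_stage(stage_totals))  # -> transformed
```

It does not tell you why the numbers diverge, but it turns “days of searching across seven tools” into “look at this one hop first.”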

Given these problems, we see plenty of people and companies working on ways to improve or overcome them. I will now look at a specific evolution thread that is a natural response to what the MDS did in the first place.

## Hello, Data Platforms; great to have you back.

While the MDS broke things up and made them accessible and affordable, a natural pendulum movement is to go back and bring some pieces together again.

So, of course, this evolution step is now getting more attention. But since we are on the evolution track, data platforms today differ from the ones the MDS blew up.

### Next-gen data platforms - what makes them different?

#### Closed but open

![](/images/posts/after-the-modern-data-stack-welcome/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f8838c3f3-802f-4ece-91e8-14fe5a5ae5c9_1006x816-png.jpg)

Legacy data platforms were often closed environments. A closed environment has a lot of benefits; the major one is simply control.

Control from a technical perspective means you can ensure that no bad things happen since you know what goes into the system and in which way.

Control is also great for data quality since you control all steps for transformation.

And control is excellent for commercial success. Once on the platform, it is hard to move on, so you keep buying these user licenses.

But control also means you control the use cases the platform supports. And here we have the biggest problem: your customers are forced to live with the use cases and standards you provide. This might work out great for one team, while another runs into walls all the time.

New data platforms still give you control, but only where it is needed. The control here is mainly on the integration steps or the meta layer, where they need it to give you the platform&apos;s benefits.

But they are as open as possible. So that you can integrate other services, sources, and endpoints - the ones you need to enhance your setup.

What does this look like? I will do my examples based on the [Keboola](https://www.keboola.com) platform since I know this one best. But all others (I will list them later) should work similarly.

In Keboola, I can use their provisioned Snowflake instance. This is great because they will handle all the nasty things (at least for a small data team), like permission handling for me.

But I can point Keboola to my existing data storage if I want. Or I can run a hybrid: my existing storage plus some things on the Keboola one. It&apos;s a straightforward configuration.

I can add custom transformations in SQL or Python directly in the Keboola platform. But if I want to, I could have my transformation service trigger it from Keboola, access and transform the data, and then pick it up in Keboola again.

Once my data is in the right shape, I can push it everywhere where I like it to be.

So, in the end, whenever I decide to keep things in Keboola, I get a fast and reliable integration as a reward. But if I have something custom, the platform is flexible enough to support that. Sometimes with some additional work on my end. But doable.

#### The additional person on my team

In football, we talk about the 12th player on your team when you play at home with a massive, loud crowd, like in [Liverpool](https://youtu.be/lt_DdNUSF2I?t=111) or [Dortmund](https://www.youtube.com/watch?v=QwLfNsVQ93w). It gives you additional power.

For me, the new data platforms are just like this. I often work with companies that have 1-2 data team members who run the complete service, from data pipelines to the dashboard factory.

Hiring more people is often not an option. Data people are expensive, and even when you have the budget, they are not sitting in cafes and waiting to get called to the pipeline.

And we all know that you need experience to solve things quickly and scalably.

The new data platforms are like an additional hire. This hire doesn’t come for free, but they can start on Monday and scale like a pro.

Let me give you my favorite example. Usually, it is a good idea to move data around in incremental updates: you don’t want to load all the data all the time. But incremental loading is not so easy to implement from scratch. You must extend your data with proper columns that help indicate the delta payload.

In most data platforms, incremental load comes down to a checkbox (and a select for the column that serves as the criterion, an id or a timestamp). So I can manage the costs and runtime of the pipeline without extensive experience.
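Under the checkbox sits roughly this watermark logic. Here is a minimal sketch (function and column names are hypothetical) of what the platform does for you on every run:

```python
from datetime import datetime

# Minimal sketch of watermark-based incremental extraction: keep the
# timestamp of the newest row you loaded, and next time only take rows
# newer than that. Names like `updated_at` are hypothetical.
def incremental_extract(rows, watermark, cursor_column="updated_at"):
    """Return only rows newer than the watermark, plus the new watermark."""
    new_rows = [r for r in rows if r[cursor_column] > watermark]
    new_watermark = max((r[cursor_column] for r in new_rows), default=watermark)
    return new_rows, new_watermark

rows = [
    {"id": 1, "updated_at": datetime(2023, 8, 1)},
    {"id": 2, "updated_at": datetime(2023, 8, 3)},
    {"id": 3, "updated_at": datetime(2023, 8, 5)},
]

# A run with watermark 2023-08-02 only picks up ids 2 and 3.
delta, wm = incremental_extract(rows, datetime(2023, 8, 2))
print([r["id"] for r in delta], wm)  # -> [2, 3] 2023-08-05 00:00:00
```

The fiddly parts the platform hides are exactly here: persisting the watermark between runs and handling late-arriving or updated rows.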

#### The meta power

![](/images/posts/after-the-modern-data-stack-welcome/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f9277c164-c52b-4e6d-ae37-a649801b8897_1002x542-png.jpg)

The meta layer is the secret (or maybe not so secret anymore) missing piece of the modern data stack.

So what is the meta layer?

I will give a short definition here, since covering it in depth would require an entire post.

When we do things in a data stack, the data is moved or changed. All these transformations and copies create an invisible chunk of new data - the metadata.

-   How long did the load take?
-   How many rows were included? How many entries were new (when incremental)?
-   How much CPU runtime was required? How much did it cost?
-   How has the schema changed?

You can get additional analysis, alerting, and automation based on that.

We have a problem with the breakdown of this chart; the Sydney branch is missing. Oh wait, the cardinality is different after this transformation step. Solved and deployed in 60 minutes.

For me, working with metadata is just getting started. You can’t see it fully played out in the data platforms yet, but they have the ingredients to go all in on it. Classic MDS tools would have to integrate the metadata from other services (if they can even get it).
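To make the idea tangible, here is a toy sketch of the metadata a platform could record around every job run; the job and field names are made up for illustration:

```python
import time

# Toy sketch: wrap a job and record run metadata (duration, row count,
# a schema snapshot). This is the raw material for the alerting and
# automation described above. Names are illustrative.
def run_with_metadata(job_name, job, **kwargs):
    start = time.time()
    result = job(**kwargs)
    return {
        "job": job_name,
        "duration_s": round(time.time() - start, 3),
        "rows_out": len(result),
        "columns": sorted(result[0].keys()) if result else [],  # schema snapshot
    }

def load_orders():
    """A stand-in for a real extraction step."""
    return [{"id": 1, "total": 9.99}, {"id": 2, "total": 19.99}]

meta = run_with_metadata("load_orders", load_orders)
print(meta)
```

Compare `rows_out` or `columns` between two runs and you already have a primitive freshness and schema-drift alert.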

### So what are these new data platforms?

We see at least three different types at the moment.

#### Integrated data platforms

These platforms offer integration, transformation, and export in one platform with an integrated orchestration. Just as described above.

They are usually open enough to run parts of your stack outside their platform.

But you get all the benefits we mentioned before.

Vendors:

-   [Keboola](https://www.keboola.com)
-   [Rivery](https://rivery.io)
-   [Weld](https://weld.app)
-   [Mozart Data](https://mozartdata.com)

#### MDS tools in a box

You can compose your data platform by combining MDS tools like Fivetran, dbt, Preset, and many more.

The benefits are mainly on the admin side of things. You can add a new integration with a click (and no talking to sales), you get one bill (trust me, this is a significant help), and you can manage access roles in a central place.

Vendors:

-   [5x](https://www.5x.co)

#### Open source MDS tools in a box

These bring different open-source solutions together, host and maintain them, and make the transitions between them more manageable.

Vendors:

-   [Datacoves](https://datacoves.com)

## Summary and looking into the future

After doing “classic” MDS projects, I did two projects using Keboola, both for clients with small data teams. We got everything set up quicker and spent less time working on the stack and more time adding new sources or creating new data assets for the other teams. Is it always like that? Most likely not; as always, it comes down to the requirements.

I want to see more things built on the metadata: cost and resource control, and better guardrails before I break something. I saw some promising demos, but they are not generally available yet.

Is this the only evolution of the MDS? Of course not; as written, it is one thread. I will cover another thread in my next posts: Native Apps on Cloud Warehouse.

![](/images/posts/after-the-modern-data-stack-welcome/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f0ab403b8-398c-46a9-a699-0e4dda447839_1080x1080.png)

</content:encoded></item><item><title>Use feature analytics for better products</title><link>https://timo.space/blog/use-feature-analytics-for-better/</link><guid isPermaLink="true">https://timo.space/blog/use-feature-analytics-for-better/</guid><description>Digital products need feature analytics to benefit from iteration speed and unbiased feedback.</description><pubDate>Mon, 31 Jul 2023 00:00:00 GMT</pubDate><content:encoded>Digital products need feature analytics to benefit from iteration speed and unbiased feedback.

![](/images/posts/use-feature-analytics-for-better/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f20df5416-0463-4782-9949-c475f038d1bf_2048x1024-png.jpg)

As a start, let&apos;s imagine a product without features. That may not be possible. And for sure, not all features are visible, but they are still features.

And features are the one thing most product and engineering teams think about most of the time. Features are 90% of all the work; sometimes (or even more often), the teams and users go crazy because of them.


Features love to be listed and compared with the competition (one of the most ridiculous tasks ever invented). But we still have these feature list pages on almost all SaaS websites. And to be honest, I look at them from time to time, usually when I have a specific feature in mind and want to check if it is available in the plan I am looking at.

![](/images/posts/use-feature-analytics-for-better/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f2eb6bfea-03ba-47e9-8aa5-14c5a1aa1cde_2320x782.png)

As always, before we jump into the weeds, we should start with some definitions. For a better shared understanding, we will look at features for digital products.

&gt; &quot;In the context of product development and marketing, a product feature refers to a specific characteristic or function of a product that provides some benefit or advantage to the user. These features are the individual components and capabilities that make up the product and differentiate it from other products in the market.&quot;

_From ChatGPT (no Wikipedia definition found)_

The word **function** stands out for me: a feature enables the user to do something. This is important since we want users to do things in our product.

And this is connected with a benefit or an advantage. Or even making progress. This is helpful since we don&apos;t develop features for the sake of a feature.

A feature is an individual component. This makes it a bit harder to define. What is a component? We leave it like this for now and come back to this later.

## How to define a feature in your product?

From my product experience, there are no clear boundaries for defining a feature. Different teams and people might define features slightly differently, merging or separating them into bigger or smaller ones.

That is because there are different levels of features.

My approach is to start with the product entities and then examine the specific actions. And sometimes, from there, identify particular forms of activities if they make a huge difference.

I also like to define how a feature has been used successfully and have a list of feature success events, which is helpful for many things.

![](/images/posts/use-feature-analytics-for-better/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f891ea99b-378e-40bc-8915-b0317835c238_1162x648.png)

To make it more visual for us, we look at some examples.

Let&apos;s take **Substack as our product**. What kind of entities do we have?

![](/images/posts/use-feature-analytics-for-better/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f9b0c1b3c-5051-4ef6-8afc-2c2b0611c6b0_1732x836-png.jpg)

**Account** - the atomic item. You need at least an account to create a Substack, but you can also have an account without one.

A **Substack** itself might be a feature, but maybe not. This is harder to tell. It is separate from the account, and I can be a member of several Substacks. So I would vote to treat a Substack as a feature as well (in product marketing, the feature would rather be: multiple Substacks).

**Website** - the visible part for readers, where they can find and read new content.

**Newsletter** - we can send our new posts as updates to our subscribers

**Subscriptions** - works for both sides; you can be my subscriber or subscribe to my substack.

**Post** - belongs to a Substack and is the core growth entity.

These are the core product entities for me. But we can add layers beneath them to investigate them more deeply. The deeper levels create a hierarchy, and there is no right or wrong to it. Sometimes an entity on a deeper level can become a core entity because it is strategically vital.

Example:

**Substack &amp;gt; Dashboard** - the dashboard is an important feature but less critical for driving growth. So for me, it is a child entity of a Substack.

**Website &amp;gt; Custom Domain** - A custom domain is a feature entity connected clearly to the website.

![](/images/posts/use-feature-analytics-for-better/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f84a23e51-94c4-4603-8f8c-b0e5c367e420_2400x1258-png.jpg)

For an entity, you can then define the core activities. These activities usually represent a classic feature lifetime funnel. So for the posts, it would look like this:

![](/images/posts/use-feature-analytics-for-better/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fe7698dc8-1338-42f1-ba5f-72d4d1961dcb_2572x798-png.jpg)

The activities describe a typical post lifetime, from getting started to generating another subscriber. Here for the posts, it’s also the core flywheel. The more posts your users create, the more you grow traffic and subscribers. So the core success activity here is: “Post published”; a secondary would be “Post created subscription”.
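If you want to keep this catalog in code rather than in a slide, a plain data structure is enough. A minimal sketch in Python (the entity, activity, and success-event names are illustrative, not Substack’s real ones):

```python
# A feature catalog as plain data: entities, their activities,
# and the events that count as successful usage. All names are made up.
feature_catalog = {
    "post": {
        "activities": [
            "Post created",
            "Post drafted",
            "Post published",
            "Post viewed",
            "Post created subscription",
        ],
        "success_events": ["Post published", "Post created subscription"],
    },
    "website": {
        "activities": ["Website visited", "Custom domain connected"],
        "success_events": ["Website visited"],
    },
}

def success_events(entity: str) -> list:
    """Look up the success events defined for a feature entity."""
    return feature_catalog[entity]["success_events"]
```

Keeping the catalog as data like this makes it easy to review in a pull request and to grow it deliberately, one entity at a time.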

If you sketch this out, this catalog can increase based on the depth you want to add and how many new features and activities you add.

![](/images/posts/use-feature-analytics-for-better/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f4a70fdc4-882b-48dd-8be8-be57c583f560_1364x664-png.jpg)

Please go slow here, since every added item is work to set up and maintain.

And we now want to add metrics and create a dashboard to monitor the performance over time.

Thanks for reading timo&apos;s substack! Subscribe for free to receive new posts and support my work.

## Feature analytics

With the catalog in front of us, we immediately have some questions:

**Which feature is used most by our users?** An obvious and immediate question, and the one I like to start my discovery from. This usually happens on the feature action level. The interesting design question is how deep you go with the actions, and which activities you pick for the analysis.

**Which features are essential to convert free users to customers?** Now we are getting to something interesting. Here it is necessary to look at how often a feature is used. This analysis is pretty complex and always leaves room for interpretation, but it is very helpful for understanding usage and conversion patterns.

**Is the feature usage growing?** This has at least two sides: total growth, which is not interesting for analysis but great for PR, and growth relative to accounts, which is more interesting because it shows how newly onboarded accounts impact feature performance. You can create retention reports based on specific feature usage for a deeper picture.

There are at least two typical ways to do a feature analysis:

A standard feature data map or dashboard that helps everyone get a quick picture of how features are used in the product, plus some simple trends if things are changing.

A deep analysis of feature performance and cross-feature effects: are specific features connected? This is triggered by business questions, discovered data patterns, or sheer creativity (the core trait of all good analysts).

### How to create a feature dashboard

The deep analysis depends too much on the different use cases, so we focus on the feature dashboard.

I usually create two levels of a feature dashboard. The first is a high-level overview of all core feature entities, and then a detailed one for the entity features I am currently focusing on.

#### The high-level feature entity overview

![](/images/posts/use-feature-analytics-for-better/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2ffd6edb1a-2af4-41e6-b44a-9bea60111f1f_1240x432-png.jpg)

I usually create these daily check-in boards on the feature entity level for each product I work on, and I always use this structure. I add quick filter buttons to click through the relevant periods (not all tools support this).

Then I combine the total number and a time series to show detailed development.

For some metrics, I like to add a relative metric that shows the total metrics in relation to a baseline total (like total Substacks):

![](/images/posts/use-feature-analytics-for-better/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f7186af55-3834-4752-a58b-6683bbaae78f_1000x324-png.jpg)

This indicates I am growing new Substacks aggressively, but the post frequency needs to catch up.
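The calculation behind such a relative metric is trivial, but worth writing down once. A sketch with made-up numbers:

```python
def relative_metric(total_metric: float, baseline_total: float) -> float:
    """Express a total metric relative to a baseline total
    (for example, total posts per total Substacks)."""
    if baseline_total == 0:
        return 0.0
    return total_metric / baseline_total

# Illustrative numbers: 1,200 posts across 400 Substacks.
posts_per_substack = relative_metric(1200, 400)  # 3.0 posts per Substack
```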

So for our Substack reporting, it would look like this:

![](/images/posts/use-feature-analytics-for-better/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f664f9190-ece3-4cec-ab3d-0964b2e072f1_1028x1312-png.jpg)

#### Feature entity overview dashboard

Now we go one level deeper, select a specific feature entity, and create a dashboard. The charts are very dependent on the entity itself and our current focus. So there is no complete blueprint available, more a collection of different chart types I use.

![](/images/posts/use-feature-analytics-for-better/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fb5df3f92-b1b8-4b3e-8326-1e5e9d11a6ab_1716x854-png.jpg)

If possible, I like to start with a funnel that shows an entity’s lifetime along its activities. The steps depend a little bit on what I want to see. Here I might also end with the subscriptions the post has generated, but that would be misleading: funnels are excellent at showing whether steps have been done at all. So this funnel is more interesting for seeing where people struggle to publish a post, and we might remove the last step.

The next one looks at how we get new substack users to post regularly:

![](/images/posts/use-feature-analytics-for-better/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f6135f839-d7ea-441f-8a3c-61d43a15e4c0_1718x664-png.jpg)

I am a massive fan of retention reports. They are flexible, and I see development and not just one-offs.

This one gives me data about how good we are at getting new Substacks to publish regularly, which is important for getting new users into a publishing rhythm. The retention table helps me see how the initiatives we are deploying right now to activate new accounts are doing.
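If you want to compute such a publish-retention table yourself, the logic is compact. A rough sketch, assuming signups arrive as a dict of account to signup week and publish events as (account_id, week_index) pairs; both shapes are assumptions for the example:

```python
from collections import defaultdict

def weekly_publish_retention(signup_week, publish_events):
    """signup_week: {account_id: week index of account creation}
    publish_events: iterable of (account_id, week_index) for published posts.
    Returns {week offset since signup: share of accounts publishing then}."""
    published = defaultdict(set)  # week offset -> accounts that published
    for account_id, week in publish_events:
        if account_id in signup_week:
            offset = week - signup_week[account_id]
            if offset >= 0:
                published[offset].add(account_id)
    total = len(signup_week)
    return {offset: len(accounts) / total
            for offset, accounts in sorted(published.items())}

# Illustrative data: two accounts sign up in week 0, one keeps publishing.
signups = {"a1": 0, "a2": 0}
events = [("a1", 0), ("a2", 0), ("a1", 1), ("a1", 2)]
retention = weekly_publish_retention(signups, events)
# retention -> {0: 1.0, 1: 0.5, 2: 0.5}
```

Product analytics tools build this for you, but having the logic in plain code helps when you need a custom cohort definition.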

But retention can do more:

![](/images/posts/use-feature-analytics-for-better/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f2f922263-9cad-49d0-8b62-4a883f508ef1_1774x670-png.jpg)

This is a good check-in retention chart to see how different groups of writers do (grouped by publishing activity) and whether our power user groups are at risk of churning. This chart is even better when we add a second dimension showing the last five weeks, allowing us to conduct a trend analysis.

Finally, I like to add some criteria charts that show breakdowns of the product entity. This is helpful for all people working with the feature to understand how posts are split up.

![](/images/posts/use-feature-analytics-for-better/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f799bbf9d-2be2-4df0-8f21-38d0db09143d_1834x1250-png.jpg)

As said, there is no single right version of this one. These charts always represent our current focus for the product feature entity. Here, the focus is on getting people to write and on seeing the impact of SEO.

But what if we now run different initiatives to improve the SEO impact of posts? Is that a feature or something else?


## Initiative analytics

Now it gets interesting. We mentioned already that product teams are constantly working on features. But do they only develop new features? Of course not. Most of the work is the refinement and extension of existing features. Outstanding product teams are good at iterating on existing features to improve them with each deployment.

So this is a different kind of feature development and needs a different kind of data. I call these feature initiatives. Each initiative has a clear objective of what it tries to improve in the feature. And based on that, we can build a feature initiative dashboard.

I am a huge fan of feature initiative dashboards and could spend most of my time just checking them. They show the lifeblood of product development. Ideally, the feature initiative is deployed behind a feature flag and an a/b test. This gets us substantially better data to work with.

![](/images/posts/use-feature-analytics-for-better/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f068e52f8-919d-4605-8c94-4283b14cb28b_1712x480.png)

We start with some context data: how is the rollout going (assuming we use feature flagging)? This way we always know what the baseline looks like.

The following part again depends on what the initiative should improve. If the initiative were to get new users to post on a weekly basis, we would use a retention chart here to compare users with the feature vs. non-feature users.

Here we need to see the impact the changes have on SEO traffic. Therefore, we compare similar posts with the new feature deployed vs. non-feature posts, looking for increases in the feature-flagged posts over the old ones.

Ideally, this data comes from an A/B testing tool (I can highly recommend [Eppo](https://www.geteppo.com) here).

![](/images/posts/use-feature-analytics-for-better/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f7d245bb3-f243-421e-b3ef-7e8ff2fab110_1028x778-png.jpg)

So you could pull the confidence levels from the A/B testing tool (automating this is always a bit tricky).
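If you want a rough local sanity check next to the tool’s numbers, a simple two-proportion z-score gets you surprisingly far. This is a sketch with made-up counts, not how Eppo computes its confidence levels:

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Rough z-score for the difference between two conversion rates
    (positive when variant B converts better than control A)."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (conv_b / n_b - conv_a / n_a) / se

# Illustrative: 120/1000 conversions in control vs. 150/1000 with the flag on.
z = two_proportion_z(120, 1000, 150, 1000)
```

A |z| around 1.96 corresponds to roughly 95% confidence under the usual normal approximation; proper experimentation tools do considerably more than this.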

The feature initiative report/dashboard aims to be a check-in page where everyone who worked on the initiative, or whose goals are affected by it, can always check how the rollout is going.

If you work with something like Eppo (I guess other A/B testing tools might support this as well), you can use their feature reporting:

![](/images/posts/use-feature-analytics-for-better/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f7a1b2e4c-04e6-4807-b55e-9b87712fcf39_1930x858-png.jpg)

## Final thoughts

Feature analytics is a great way to start if you do not know what insights you should get from your product data. It is also an excellent way to consolidate things if you have plenty of different charts and analyses but need a structure for new people to see them in the right context.

I might end up with a dashboard for each feature entity, giving everyone an excellent foundation to learn how a product is used.

If you have enough event data volume for a/b testing, I would always go for feature-flagged releases. This gives you the confidence to roll out new improvements and enforces a very experimental, iterative, and data-driven culture for product development.

![](/images/posts/use-feature-analytics-for-better/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fd1c65aae-09f3-4328-ac2c-329f3382ae1a_1080x1080-3.png)</content:encoded></item><item><title>What is beyond event data?</title><link>https://timo.space/blog/what-is-beyond-event-data/</link><guid isPermaLink="true">https://timo.space/blog/what-is-beyond-event-data/</guid><description>Activities are the missing business layer for your event data. Yes, there was a time when digital analytics data was just pageviews, and it was fine since each click caused a new page loading— a…</description><pubDate>Tue, 18 Jul 2023 00:00:00 GMT</pubDate><content:encoded>Activities are the missing business layer for your event data.

![](/images/posts/what-is-beyond-event-data/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f0e563ba8-89e8-430d-8b5a-ae0a1dd831f7_1456x816-png.jpg)

Yes, there was a time when digital analytics data was just pageviews, and it was fine, since each click caused a new page load: a simple and straightforward system. But we moved on. Websites now use asynchronous requests (so no page loads after clicks), mobile apps have in-app interactions, backend processes emit events, and there are streaming queues and webhooks. Not so simple and straightforward anymore.


My analytics Eureka moment was when I discovered tracking events with Kissmetrics. It was before Google Analytics introduced them. And I was immediately hooked. Defining explicit events when something specific has happened made things simple again. Someone submits a form, and I send an event “Form submitted.”

Events became my first-class citizen in all my tracking setups. Pageviews became secondary. And with the new breed of analytics products like Amplitude, Mixpanel, or Segment, custom events became the standard.

I call them custom events here since events in the old (now sunsetted) Google Analytics or in Piwik Pro have a fixed structure of what kind of context information I can send with them. A proper event for me has no limitations.

## What is an event in data?

As just pointed out, even in analytics there are different forms of an event. So let’s dissect this a little so we have a shared understanding of the form that is important for the next steps.

In its simplest form, an event has:

![](/images/posts/what-is-beyond-event-data/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fba433b13-b742-49c8-a3e7-d9984573c15c_1354x478.png)

-   A timestamp
-   A unique identifier

The timestamp is essential since event data enables us to understand sequences of actions from users or systems, and we need a timestamp to bring everything into the right order. The timestamp itself can become complex, but this is a different topic for a separate post (as a teaser: a client timestamp and a server timestamp are different, and if we analyze micro sequences, milliseconds matter).

The identifier is often a programmatically generated unique id. Uniqueness is essential for handling potential deduplication.
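In code, such a minimal event and id-based deduplication could look like this (a sketch, not any specific tracker’s implementation):

```python
import uuid
from datetime import datetime, timezone

def minimal_event():
    """The simplest valid event: just a timestamp and a unique id."""
    return {
        "id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

def deduplicate(events):
    """Keep the first occurrence of each event id, preserving order."""
    seen, unique = set(), []
    for event in events:
        if event["id"] not in seen:
            seen.add(event["id"])
            unique.append(event)
    return unique
```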

In this form, the events tell us nothing. They are technically valid but missing at least one thing for analytical purposes: **a meaning.**

### Give an event a meaning.

So let’s extend it:

![](/images/posts/what-is-beyond-event-data/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f31216046-d85c-4cd7-a92d-765b302690d9_1396x494.png)

-   A name

Please keep the concept of meaning in the back of your head; this will play an essential role in what will come next.

We give an event a name. With this name, we try to describe, as well as possible, what kind of action triggered the event. There are famous blog posts on how to name events, and I am even currently writing a book that devotes a complete chapter to it.

The reason is that we are now leaving the purely technical space we had before and entering the area of human mess that is language.

There are whole books written about the mess language creates. But we can also see it in a simple example. We have an event and named it “Newsletter subscribed.” Now we go around and ask people what it means. And we ask beyond the obvious “Well, someone subscribed to a newsletter”: Did they submit the signup form? Have they been added to our CRM? Have they completed the double opt-in?

It’s nearly impossible to name an event in a way that answers all these context questions. Maybe “Newsletter double opt-in saved” would be technically more precise, but have fun letting business teams work with that. We pick this problem up in the next section.

### Adding event context

One way to make the meaning more precise is to add context. We usually do that by defining key-value pairs that we add to the event data.

So our “Newsletter subscribed” event could be extended like this:

![](/images/posts/what-is-beyond-event-data/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f2f2eadc7-d971-476f-942c-3b344981e006_1390x768.png)

These event properties help us better understand the event&apos;s context and meaning and give us additional dimensions to analyze event data (here, we could generate a chart that breaks down subscriptions by topic).

In most analytics systems, an event looks like this.

You often see a user id attached to group events under a specific identifier. This identifier concept can be even more complex, but in the end, we add ids that we use to join more context later during analysis.

![](/images/posts/what-is-beyond-event-data/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f3c9e6d81-c624-49b5-808f-15e6340285f0_1420x914.png)
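Putting the pieces together, a full event payload could look roughly like this (the property names and id formats are made up for illustration):

```python
# An illustrative event: timestamp, unique id, name, context properties,
# and a user id for joining more context later. All values are invented.
event = {
    "id": "evt-001",
    "timestamp": "2023-07-18T09:30:00Z",
    "name": "Newsletter subscribed",
    "properties": {
        "topic": "product analytics",
        "source": "blog footer form",
    },
    "user_id": "user-42",
}
```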

## The problems with event data

We already discovered a glitch in our event data setup in our definition: the meaning of an event. Most problems I discover when working with a client on an event data setup are based on this; the other problems are technical, around event collection.

Let me give you my list of event data meaning problems:

### Duplicate events

![](/images/posts/what-is-beyond-event-data/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f77e92bc4-0c58-4176-8a6b-f7e54a6bba94_1674x428-png.jpg)

You want to analyze the impact of newsletter subscriptions and the discount users get on your orders. So you check your current setup to find proper events that would allow you to build a cohort of subscribed users who got the discount.

You find these events:

-   “Newsletter clicked”
-   “Newsletter start”
-   “Discount added”

Now you are unsure which one would help you. The docs are outdated; no one can tell you which one to pick. And the developers tried to find where these are triggered, but you got mixed feedback.

This analysis is important since you will initiate revenue-relevant activities based on this. So you must be 100% sure that the events you use are triggered for the right context.

So you ask the developers to add a new event, “Newsletter subscription Discount added.” Your analysis is safe now, but you decreased the quality of the event data setup, because technically “Discount added” had a property “cd” holding the discount code and is triggered when a discount is applied to the cart. Now you track the same thing twice.

### Too many defined events

![](/images/posts/what-is-beyond-event-data/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f5bcdedcd-f30b-4c89-988e-194bf7253e99_1456x816-png.jpg)

The scenario above led to too many events. But there are more reasons.

The “be on the safe side” approach also causes this issue. You are unsure what kind of event data will be needed for analysis. Or you want to do a proper job and show people you took the setup seriously.

This always ends in a high number of unique event names. For me, “high” starts at 50.

This approach gives you great event names like these:

-   Layover top box close button clicked
-   Navigation second layer item clicked

and these are still good examples. I have seen far worse ones, and I guess you have too.

And concerning meaning: more events don’t make meaning easier; they usually make it more complicated.

### Hey, I’m from Data. I speak a different language.

Take a concept like revenue and ask all teams in your company how they would define it, and you will have X different definitions. All right in their context, but still different.

When I work with data teams to design event data, I always recommend taking their first version of names and talking to plenty of teams about how they understand it and if a different name might be better. This usually improves understanding.

But detailed knowledge can still be a problem. For a product team, “Task comment added” can be understandable, but for a sales team it is too far from their daily work (what does this mean for me?).

### Enter the world of event data abundance.

![](/images/posts/what-is-beyond-event-data/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f3c90230e-da3d-4c7b-b81c-6d67eded0821_1272x1026-png.jpg)

When I started with event data, we explicitly added events to our front-end application. So there was one place where they were collected.

At some point, we started to send some events from backend processes, either because it increased the quality or because the action happened solely in a backend process.

[The extensive guide for Server-Side Tracking](https://hipster-data-show.ghost.io/the-extensive-guide-for-server-side/) — I would love to learn if more people had a server-side awakening event. Mine was over six years ago. It was not my first time sending events from a backend, and I did this in projects before, mostly sending refund events to Google Analytics. But this time, it was a paradigm shift and a good example of why developers should always play a core and active role …

Then we realized plenty of core actions were based on database interactions, so why not use this data? There is the concept of CDC (change data capture): we can use these logs to derive events. If a new record is added to the members table, this is equivalent to a “Member created” event. Implementation and quality can be pretty good with that.

Then we wanted to know what happens in the third-party tools we use. Well, they offer an army of webhooks, so why not receive them and add them as events?

Our development team has started to use streams to trigger different software processes. We can subscribe to them and select meaningful events for analysis.

Now we can easily have 1000 unique events available. And some of them could be for a similar thing but at a different point of the sequence:

-   Newsletter form button clicked
-   Newsletter form submitted
-   Newsletter record saved in DB
-   Record synced to CRM
-   Created in CRM

If we added all these just into an analytics system, we would kill any usability.

This final problem was the tipping point for me to think about a different way to handle this.

## A new layer to bring meaning: Activities

![](/images/posts/what-is-beyond-event-data/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f8081db95-26cc-4b46-bafa-10cd71d9af12_1456x816-png.jpg)

Adding a new layer is something familiar and happens quite often. Just look at software engineering: we don’t write assembler code anymore; we added layers on top of it. Many layers, to be precise.

The same happens in data as well. [dbt](https://www.getdbt.com) is, in the end, a layer on top of the native SQL engine of a database. And [Tableau](https://www.tableau.com) is another layer on top of it.

A concept that has started to get more traction again is the semantic layer or, more precisely, the metrics layer part of it. The metrics layer is not a new concept but receives a new interpretation now.

One core function of a semantic layer is to serve as a translation layer between business requirements and data availability. In the case of the metrics layer, you define a metric by name and how it should be calculated (and formatted), and then you define which data should be used to calculate it. It is a bridge document: it frees the business from needing to know anything about the database source, and it helps the data team understand which metrics analysts need for their work.

[David](https://www.linkedin.com/in/david-jayatillake/) wrote an extensive 5-part series about the semantic layer, which I highly recommend reading:

[Semantic Superiority - Part 1](https://davidsj.substack.com/p/semantic-superiority-part-1) — I have written about semantic layers before, in specific contexts, but I thought it would be worth going back to basics and explaining the “what” and the “why” in depth - as well as the “how” - in this series. What is a semantic layer? A semantic layer

What if we introduce another part of the semantic layer: the activity layer?

This is also familiar. [Ahmed](https://www.linkedin.com/in/elsamadisi/) is doing this with his [Activity Schema](https://www.activityschema.com) approach, and they have an activity layer natively in [Narrator](https://www.narratordata.com).

But you only rarely see it in public posts, which is also because events are secondary items in most data models. But this is a topic for a future post.

So, let&apos;s have a look. What would an activity layer look like?

### The activity layer - a first draft

Ultimately, we want to achieve something similar to what the metrics layer does: bridging the data with the business requirements. But here we don’t map database table columns to metrics; we map events to activities.

![](/images/posts/what-is-beyond-event-data/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f9e70ac3e-ceac-4e88-bbe8-eb55416d380b_1958x634.png)

In this setup, activity is an event on a business level. Mmh, this is still too abstract.

Let&apos;s return to our example. An activity could be &quot;Newsletter subscription created.&quot; This activity is important for the growth team. We now have different technical events and can decide which one defines this activity. We choose the webhook event &quot;Subscription created&quot; from our CRM. This mapping goes into our activity layer definition (we get to that in a second); therefore, it is easy for everyone to check where the data for this activity is coming from.

We could change the CRM vendor; we could move it in-house. The activity will always stay the same, but we change the mapping underneath (versioned). So people can keep their analyses and the reports based on them unchanged.
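A hypothetical sketch of such a versioned direct mapping, kept as plain data so everyone can read it:

```python
# Hypothetical activity-layer entry: one business activity mapped directly
# to one source event, with a version so the mapping can change underneath.
activity_layer = {
    "Newsletter subscription created": {
        "version": 2,
        "source": "crm_webhook",
        "event": "Subscription created",
    },
}

def resolve(activity: str) -> str:
    """Return the source event currently backing a business activity."""
    mapping = activity_layer[activity]
    return f'{mapping["source"]}.{mapping["event"]}'
```

Swapping the CRM would mean bumping `version` and changing `source`/`event`; the activity name that analysts build on stays the same.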

This is a direct mapping example, but we can do more complex examples.

#### Using filters

Maybe we want to create a &quot;Lead newsletter subscription&quot; activity (we could also handle this with a property, but it is important to us). We can then use a filter on the primary event from the CRM to make sure we only map events from users who are not yet customers. Again, all in the same central place, visible to everyone.

![](/images/posts/what-is-beyond-event-data/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fee2b066c-7bc1-4013-a919-0c8bd4e41fe3_2020x672.png)
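The filter variant could be sketched like this (the `is_customer` property is an assumption for the example):

```python
# Hypothetical filtered mapping: the activity only matches source events
# whose payload says the user is not a customer yet.
def is_lead_subscription(event: dict) -> bool:
    return (
        event["name"] == "Subscription created"
        and event["properties"].get("is_customer") is False
    )

events = [
    {"name": "Subscription created", "properties": {"is_customer": False}},
    {"name": "Subscription created", "properties": {"is_customer": True}},
]
lead_subscriptions = [e for e in events if is_lead_subscription(e)]
```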

#### Using sequences

We can even use sequences of events for the definition of one activity. Something we already know from product analytics tools like Amplitude or Mixpanel. Let&apos;s take an activity, &quot;First value generated,&quot; which can combine different events indicating that users got their first value from a product.

![](/images/posts/what-is-beyond-event-data/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fd041afe4-8d15-4b71-96b0-e60ef35fa706_1744x790.png)
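A sketch of deriving such an activity from a user’s event stream (the event names and the rule of taking the first matching event are illustrative):

```python
# Hypothetical rule: "First value generated" fires the first time a user
# emits any event from a set of value-indicating events.
VALUE_EVENTS = {"Post published", "Subscriber gained", "Custom domain connected"}

def first_value_generated(user_events):
    """user_events: time-ordered (timestamp, event_name) tuples for one user.
    Returns the timestamp of the first value event, or None."""
    for timestamp, name in user_events:
        if name in VALUE_EVENTS:
            return timestamp
    return None

stream = [("t1", "Account created"), ("t2", "Post drafted"), ("t3", "Post published")]
```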

#### Coalesce event data

Or we coalesce event data. When we have two sources for a similar event, and each tends to miss some occurrences, we can define a coalesce that uses event source A if present and falls back to source B (as long as we have a shared identifier).
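A coalesce over two sources keyed by a shared identifier can be sketched in a few lines (the source names are made up):

```python
def coalesce_events(source_a, source_b):
    """Prefer source A's event per shared identifier; fall back to source B.
    Both inputs are dicts keyed by the shared identifier."""
    merged = dict(source_b)   # start with the fallback source
    merged.update(source_a)   # A overwrites B where both have the id
    return merged

a = {"u1": {"src": "frontend"}}
b = {"u1": {"src": "backend"}, "u2": {"src": "backend"}}
combined = coalesce_events(a, b)
```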

#### Refactor events

This new layer also finally helps us to refactor event data setups. You can reduce unique events, merge events, rename events, and improve their definition and quality over time.

Ultimately, the activity layer decouples the old paradigm: event instrumentation = event analysis.

## How to implement an activity layer

Most likely, this will be part of your data model. You add queries that work on your raw data and define the activity. In a dbt setup, that would be a folder serving as the activity layer where you work, e.g., with the raw event data you get from [Snowplow](https://snowplow.io), [Segment](https://segment.com), or [Rudderstack](https://www.rudderstack.com).

So far, I have used the ActivitySchema concept (in a slightly different form than proposed in the v2 schema). This works great for me.

I still think about a real layer approach where the activity layer is a configuration (YAML or JSON) that translates to SQL. We could also work in batch mode, compared to most metrics layer implementations that run ad-hoc queries. And a configuration would also enable streaming use cases.
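To make the configuration idea concrete, here is a sketch that translates a config dict (a stand-in for YAML) into SQL; the table and column names are assumptions:

```python
# Hypothetical activity-layer config: one activity, its source table,
# and the event name that defines it. Names are invented for the example.
config = {
    "activity": "newsletter_subscription_created",
    "source_table": "raw_events",
    "event_name": "Subscription created",
}

def to_sql(cfg: dict) -> str:
    """Render a simple SELECT that maps raw events to a named activity."""
    return (
        f"select event_id, user_id, event_timestamp, "
        f"'{cfg['activity']}' as activity "
        f"from {cfg['source_table']} "
        f"where event_name = '{cfg['event_name']}'"
    )

sql = to_sql(config)
```

A real implementation would also handle filters, sequences, and coalescing, but the core idea stays the same: the config is the readable contract, and SQL is generated from it.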


## Final thoughts and why I think this is important

We are moving to a new stage of event data. Event data used to be directly connected to an analytics solution. This is breaking up, slowly but steadily.

And we see more data use cases for event data.

-   We can now work with Data Warehouse event data in product analytics tools. Natively with [Kubit](https://kubit.ai) or synced with [Amplitude](https://amplitude.com) and [Mixpanel](https://mixpanel.com).
-   We can run a full-fledged experimentation stack on top of our DWH event data with [Eppo](https://www.geteppo.com).
-   You can sync the event data to [Customer.io](https://customer.io) or [Vero](https://www.getvero.com) from your DWH.
-   You can send modeled activities as conversion events to ad platforms like Google Ads.
-   You can qualify leads based on DWH activities in [Correlated](https://www.getcorrelated.com) and create sales tasks.
-   And more and more products are coming built on top of the DWH data.

We need an additional layer to abstract raw technical events from business activities.

Please let me know what your thoughts are. Does this make sense, or is it too abstract or too early?

![](/images/posts/what-is-beyond-event-data/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f6f13c7a0-148c-4520-ac9b-fd06676a2279_1080x1080.png)


When you write an article, the nice thing is you get to set the scene so the whole story can play out. So here it goes:

**#1** The notion of a “data product” is pretty new, so we don&apos;t talk about something that has been around for years or has a clear picture and definition. The phrase can mean multiple things; people will have various ideas or interpretations when they see it.

**#2** Most definitions come from the [data mesh concept](https://martinfowler.com/articles/data-mesh-principles.html#DataAsAProduct) and take a technical and architectural angle. This is interesting, but I miss an important aspect: a product is only a product with users and impact. This is the angle I care about.

## What is a Data Product (by someone coming from Product Management)?

The work of [Marty Cagan](https://www.linkedin.com/in/cagan/) heavily influences my product work. I was lucky enough to start my product career with a one-week workshop with him at Audible, and his workbook became the one thing lying next to my keyboard for years.

In one of his posts, he describes the building blocks of a product like this:

![](/images/posts/the-go-to-metrics-for-data-products/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f5686b0f0-4a3e-4ce9-b3ad-10bff5b1cbbf_1522x218.png)

The emphasis is that a product requires equal focus on all three parts, not just one (like being only customer-centric). If you fail to address one of them, your equation (= Product) results in zero.

This is an important part of setting the scene for this post. Most of what I read about data products is about technology. [Animesh](https://www.linkedin.com/in/anismiles/) wrote a good post covering the technical aspects of a data product:

[https://substack.com/profile/109344470-animesh-kumar](https://substack.com/profile/109344470-animesh-kumar)

But there is far less material about the Customer and the Business aspects of data products (if you have such links, please let me know in the comments).


This post will not cover the foundations of the two missing dimensions for data products. That&apos;s something for a different post. Here, I want to look at the metrics angle for data products. But all three dimensions that make a product are essential for choosing the proper metrics. Therefore, I will take each dimension and evaluate which metrics can help analyze it.


## On Internal Data Products

One more thing to set. When you ask ten people what a data product is, you can expect eight different answers, ranging from an API to a model to a Mac desktop app.

In this post, I will focus on internal data products. So what is that? I will use this definition (my own) for the next 2,000 words:

![](/images/posts/the-go-to-metrics-for-data-products/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f6a68d9dd-b72b-4543-b04c-76a505e0c9bf_1440x230.png)

My favorite example is this: a dashboard with a marketing campaign report showing core metrics for each campaign over a selected time frame is not a data product. But introduce 1-2 indicators for over- and underperformance plus 2-3 buttons to increase or decrease budget spending right within the campaign data table, and you have a first version of a data product - because of the immediate action.

![](/images/posts/the-go-to-metrics-for-data-products/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f139ea9be-1140-4188-8e0e-04750be6529c_1908x492.png)

Another example: a global marketing team oversees the marketing activities and rollouts in 30 local markets. The local teams do all the operations, and the global team is there to help with experience and standards, but it is also responsible for enforcing those standards. They use a simple alerting system based on operations, marketing, and sales data, triggered at different severity levels. All alerts have immediate actions built in (e.g., create a ticket with a click).

Enough of setting the scene - I hope you understand where we are going next.

* * *

## The Customer Dimension

For an internal product, the customers are internal teams, and sometimes you start with a single person. A completely different situation from an external product - but one with plenty of opportunities.

Even with a small target group, we can apply the same measures we use in external products. To build a customer-centric solution, we need to understand these aspects of our customers:

![](/images/posts/the-go-to-metrics-for-data-products/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f308a2eb7-cc15-42d3-9cf9-a55eb62c11f4_1252x670-png.jpg)

-   Deep understanding of the problem and how progress will look for our customers
-   Their resources and motivation to invest in a new solution
-   The severity of the problem in their daily work

### A deep understanding of the problem and what progress will look like

Each problem has internal and external factors. The external ones are things our customers can&apos;t influence, or can influence only with difficulty, like reporting obligations toward specific public services or investors. When you interview your customers, it is important to identify these external factors and make them transparent. You can never build a solution to change them; they are untouchable territory.

The internal factors are things your customer can control - like providing an early version of an investor report to have enough time for adaptation and changes. These are the candidates you can solve with a data product.

I usually use two approaches to identify these factors: Event storming and JTBD interviews. I did a [60m free course on how to use event storming for data setups](https://www.deepskydata.com/courses/event-storming-for-tracking-design).

Both are quite similar, so I usually run a combination of the two. Ultimately, it is about mapping out a process on a timeline and understanding the actions, the people involved, and the emotional and social challenges.

The result is a process map on a timeline with clusters of actions and problems.

![](/images/posts/the-go-to-metrics-for-data-products/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f5ea5d80a-7f43-40aa-af56-225302b93c84_1674x588.png)

_This is an example of a map with events._

**Metrics:**

For internal products, extensive event tracking is rarely worth it, but you can use 1-2 events to get a simple &quot;we are using it&quot; signal, like a successful authentication. This can give you a monthly retention rate (how many users return after 1, 2, 3, … months) or a minimal monthly-active-users metric.

Better are proper product-market-fit metrics. Not the NPS ([problems with NPS](https://hbr.org/2019/10/where-net-promoter-score-goes-wrong)) - I like the [PMF survey that Superhuman is using](https://coda.io/@rahulvohra/superhuman-product-market-fit-engine). Here you optimize for the share of very disappointed users.
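As a rough illustration of the &quot;we are using it&quot; signal mentioned above, here is a minimal sketch of computing monthly cohort retention from authentication events. The event list, user names, and month format are all invented for the example; real setups would read this from a table.

```python
from collections import defaultdict

# Hypothetical auth events: (user_id, month of a successful login)
events = [
    ("anna", "2023-01"), ("anna", "2023-02"), ("anna", "2023-03"),
    ("ben", "2023-01"), ("ben", "2023-03"),
    ("cara", "2023-02"),
]

def monthly_retention(events, cohort_month, months):
    """Share of the cohort (users first seen in cohort_month) active in each later month."""
    first_seen = {}
    active = defaultdict(set)
    for user, month in sorted(events):
        first_seen.setdefault(user, month)
        active[month].add(user)
    cohort = {u for u, m in first_seen.items() if m == cohort_month}
    if not cohort:
        return {}
    return {m: len(cohort & active[m]) / len(cohort) for m in months}

print(monthly_retention(events, "2023-01", ["2023-02", "2023-03"]))
# {'2023-02': 0.5, '2023-03': 1.0}
```

Even this tiny signal answers the core question: of the people who started in a month, how many keep coming back?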

### Their resources and motivation to invest in a new solution

Let&apos;s be clear: you don&apos;t develop a new product and then people magically use it every day. Sometimes it works like this, when you solve something that is deeply needed. But in all other cases, people must invest time in learning the new product. And this process is also in your hands: the more time you invest in teaching people and learning about their usage, the better the adoption will be.

**Metrics:**

Let&apos;s start with the number of newly onboarded users (and their retention rates).

### The severity of the problem in their daily work

This is more important for the ideation phase. Just because you understand the problem and the internal function and have 1-2 really good ideas doesn&apos;t mean the effort will have an impact. That is what the next dimension, the business, is about.

* * *

## The Business Dimension

This one is quite interesting in the context of an internal data product, since you might not charge the other teams for using your data product (I know accounting and controlling can have weird ways, but let&apos;s assume not). But this dimension is still interesting for us.

![](/images/posts/the-go-to-metrics-for-data-products/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fc7ea92ac-e98b-4c92-ac93-f59fca902e17_1428x792-png.jpg)

I already touched on this when writing about the severity of the problem you will solve with a data product. We could also call it business impact, which gives us the business angle.

But business impact is challenging to start with. How do you define and measure it? There are whole books written about that. Let&apos;s start with something simpler and closer to our context: the two basic forces of a business, revenue (or cash flow) and cost savings. An internal data product can impact one of them. An internal A/B testing product impacts revenue - not directly, but it does. On the cost side, an example would be an automation that pauses or adapts online campaign budgets based on stock levels to directly reduce spend.

If you can&apos;t make this connection, you can still go for time saved. Say a team has a task that takes them 15h a month - like preparing an investor report - and your internal product reduces this time by ten hours. That is a significant time saving of 120h per year, which gives you some budget to develop it. Let&apos;s say you can build it in 40h: that would be a 3x return on time.

This also helps you to get an idea of how much investment makes sense to build.
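The arithmetic behind the investor-report example can be captured in a tiny helper. This is just a sketch - the function name and the default 12-month horizon are my own choices, not an established formula:

```python
def return_on_time(hours_saved_per_month: float, build_hours: float,
                   horizon_months: int = 12) -> float:
    """Hours freed up over the horizon per hour invested in building the product."""
    return hours_saved_per_month * horizon_months / build_hours

# The example from above: 10h saved per month, 40h to build
print(return_on_time(10, 40))  # 3.0
```

Anything comfortably above 1.0 over a realistic horizon suggests the investment is worth discussing.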

**Metrics:**

-   Saved costs, if you can attribute them, are great to communicate
-   Directly generated revenue is even better - but often hard to attribute
-   Saved time can be an easier start and is still a good metric to communicate to other stakeholders

* * *

## The Technology Dimension

This dimension sounds easier at first glance but is the hardest to measure. How do you measure a product&apos;s technology? You can measure technical consumption, like how many API requests have been served, but that doesn&apos;t tell us much.

It is the same question as how to measure a data team&apos;s performance and output - also not easy. The number of tickets resolved? Not really a good measure.

I like to use metrics derived from DevOps, since they tell us something about how well a team operates. They give us indicators of whether a team needs help with architecture or workload.

In the book [Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations](https://itrevolution.com/product/accelerate/), Dr. Nicole Forsgren, Jez Humble, and Gene Kim analyzed strong-performing technology organizations and what sets them apart. Based on that, they came up with four core metrics:



**The Metrics:**

-   **Cycle Time**  
    The time to implement pipeline, model, and other changes for a new feature - from first commit to deployment. It tells us how fast the team can ship product improvements and therefore react to customer feedback.
-   **Deployment Frequency**  
    The number of deployments per week or month. Whether this makes sense depends on the maturity of the internal data product; a small one will have a low deployment frequency and still be fine.
-   **Change Failure Rate**  
    An interesting one: the share of deployments that caused a failure in production. Again, this might only apply to mature data products.
-   **Mean Time to Recovery**  
    I like this one in the data context. Here, stuff breaks without us even touching anything - an upstream change alone can break things for us. This metric measures the time from a failure being reported to it being restored in production. If you want to spice things up and can measure it, use the first appearance of the incident as the start time, since this also gives you feedback on how quickly you recognize problems.

If you are just starting, begin with cycle time and mean time to recovery.
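To make mean time to recovery concrete, here is a minimal sketch that averages detection-to-restore durations from a hypothetical incident log (the timestamps are invented; in practice they would come from your incident tracker):

```python
from datetime import datetime, timedelta

# Hypothetical incident log: (detected_at, restored_at)
incidents = [
    (datetime(2023, 5, 2, 9, 0), datetime(2023, 5, 2, 11, 30)),
    (datetime(2023, 5, 10, 14, 0), datetime(2023, 5, 10, 15, 0)),
]

def mean_time_to_recovery(incidents) -> timedelta:
    """Average duration between an incident being detected and service being restored."""
    total = sum((restored - detected for detected, restored in incidents),
                timedelta())
    return total / len(incidents)

print(mean_time_to_recovery(incidents))  # 1:45:00
```

Swapping `detected_at` for the first appearance of the incident, as suggested above, only changes which timestamp you record - the computation stays the same.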

* * *

## Summary and Takeaways

Even when the metrics for internal products are not obvious - and we might be tempted to say &quot;we build it, but we can&apos;t measure it&quot; - this post should show you that there are metrics that can help you. And most of them are quite easy to implement.

The three dimensions of a product are essential. Even if your product has great technology and solves a customer (internal team) problem, it can still be completely irrelevant to the business. So please make sure you cover all three.

And metrics are great to put into a quarterly or annual presentation where you show your team&apos;s work. I can tell you these slides look a lot better with stats like these:

-   75% of our product users would be very disappointed if we discontinued it (that&apos;s seven teams and 34 people)
-   Of the 65 created users, 54% use the product every month
-   This product saves time: based on our research, we save its users 145h every month - time they can now allocate to different tasks
-   We keep the ball rolling: we get around 2-3 feature requests monthly and implement them in an average of 12 days

I would love these slides. If you would too, start collecting some metrics!

Thanks for reading the hipster data stack! Subscribe for free to receive new posts and support my work.</content:encoded></item><item><title>Leaving product analytics</title><link>https://timo.space/blog/leaving-product-analytics/</link><guid isPermaLink="true">https://timo.space/blog/leaving-product-analytics/</guid><description>Last year Amplitude announced that they officially had left the product analytics space.</description><pubDate>Mon, 05 Jun 2023 00:00:00 GMT</pubDate><content:encoded>## The current situation: Amplitude, Mixpanel, and Heap are setting out to new offerings

Last year Amplitude announced that they officially had left the product analytics space.

[https://amplitude.com/blog/new-digital-analytics](https://amplitude.com/blog/new-digital-analytics)

Did they call it like that? Of course not. Did they really leave it? Depends on the definition. So let&apos;s start there.

* * *

**What is product analytics (in a nutshell)?**

An approach to understanding how users or accounts use a digital product, with a focus on feature usage, cohort analysis, and retention. Product analytics is based on event data that is sent when users or systems perform a specific action. (OK, that is really short.) But it contains the important ingredients that we need in the next steps.

* * *

**Amplitude** introduced several new features last year, but the major one was marketing attribution. It enables you to analyze the impact of a marketing campaign on a specific user (and therefore also to design a marketing funnel that does not stop at signup). It also enables you to group campaigns into specific marketing channels (for me, a core asset in marketing analytics). Finally, it gives you different attribution models you can apply and use.

![](/images/posts/leaving-product-analytics/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f3eb25c3a-78d9-44d6-9620-1980d42877de_1858x890-png.jpg)

This was a missing core feature for me in the past when I was talking with growth teams, since they need both parts: analyzing product usage performance, and knowing where a user came from and how many touchpoints they had during their lifetime.

This use case is clearly beyond classic core product analytics, and it is something we usually call marketing analytics. So Amplitude offers both now, and you can combine them.

**Mixpanel** introduced similar features just four weeks ago:

[https://mixpanel.com/blog/mixpanel-marketing-analytics/](https://mixpanel.com/blog/mixpanel-marketing-analytics/)

[**Heap**](https://www.heap.io/), in comparison, realigned their strategy at the same time as Amplitude but went off in a different direction. Heap&apos;s core distinctive feature was always the **auto-tracking of basically all interactions** in the front end. They sold it as product analytics without implementation. But it sounded easier on paper than in real life, since the real juice is in the context that you put into properties, which often can&apos;t be read from the browser context. And Amplitude &amp; Mixpanel, plus the league of analytics consultants, never tired of preaching that auto-tracking is the work of evil forces. They were not totally right, but they were definitely loud enough.

So Heap rethought their whole approach and found an interesting direction. On the one hand, due to the huge amount of automatically tracked data, they could discover usage patterns in the front end that you were unaware of. That is quite interesting.

![](/images/posts/leaving-product-analytics/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f51d7eae0-80c8-4f13-86f0-eb12d4644a39_1844x868-png.jpg)

Additionally, they combined event data with mouse tracking and survey data, thereby enriching the analysis workbench significantly.

They don&apos;t call it that themselves, but for me, they just created a new analytics category: user &amp; customer experience analytics. More on that later.

**Posthog** tried a broader approach from the start. Arriving significantly later to the product analytics game, they took two quite different paths. First, they target a different ICP: the product engineer. They do this by making Posthog more transparent (there is a minor open-source version) and by making the whole setup more extendable with plugins.

![](/images/posts/leaving-product-analytics/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fa19906a0-a23c-4898-b822-c2bfe96a3f3b_1930x260.png)

And second, they call themselves the product OS. This mostly means that they offer a collection of useful things to improve your product that the other tools only offered as expensive add-ons (or not at all): experimentation, mouse tracking, and group-level analytics.

And we have new ones arriving at the shores with Kubit and Netspring.

Kubit and Netspring are trying to solve something I have been waiting for for years now: product analytics on top of the events in your data warehouse. No more weird data loading and enrichment jobs (most of which never worked properly). And above all, no more separate setups for classic BI and product analytics use cases. We will spend some more time with this approach later.

![](/images/posts/leaving-product-analytics/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f5c825be9-c0b5-4d83-9f2b-603671086b90_2102x514.png)


## Some history of product analytics

An interesting fact about Amplitude and Mixpanel: both started as **mobile analytics** solutions (to be fair Mixpanel offered both app and web but was very popular for apps).

![](/images/posts/leaving-product-analytics/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fdb94ed26-1f62-45c9-b939-4298e81bb358_2438x1028.png)

Over time, both also added a web SDK that could be used for websites and web applications. But there was no talk of product analytics at all, even though both tools already had the core ingredients you need for proper product analytics: funnels, cohort retention analysis, and event data tracking. And mobile brought something else - a pretty consistent identifier: a core asset for product analytics work.

### **And then came Firebase Analytics**

Firebase was initially just a backend database that was popular for app development due to its easy real-time capabilities (quite impressive at that time). Then Google bought Firebase and positioned it as a backend service for mobile apps. This also included a brand-new mobile analytics solution called Firebase Analytics. And you got it all for free (well, we know it wasn&apos;t really free, but who cared at that time).

![](/images/posts/leaving-product-analytics/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fdabc508f-97a7-4dca-ae6d-4b4fca756366_1490x650.png)

And Firebase Analytics (and Firebase) took the same path Google Analytics had taken before - gaining market share quickly and significantly. Firebase Analytics did not have features as extensive as those of Amplitude and Mixpanel, but for most people it was sufficient (again, similar to Google Analytics). So the whole market became more difficult.

### **And mobile never really caught fire for Analytics**

But there was a different problem in the mobile analytics space, which will already foreshadow what we will see in the product space.

#### Attribution problem

The major use case for website analytics is marketing: analyzing campaign performance. This is also the core analytics use case for a mobile app, but there it is more difficult. When you run a paid campaign for mobile app installs, there is always an App Store or Play Store in between. Since you don&apos;t control it, you lose all campaign information along the way. So by default, when someone clicks an ad and installs the app, and you then want to track where the user came from when they open it, you can&apos;t - the information is gone.

That created a small selection of specialized analytics tools called mobile attribution tools. They remember the device ID when someone clicks an ad, match it against the device ID you send them after an install, and then tell you the campaign (this doesn&apos;t work so easily anymore since iOS 14.5, but even SKAdNetwork can be handled via these tools). So these became the essential tools for mobile analytics, at least for marketing teams.

#### Product not data-driven?

Now we get to a mysterious one. Throughout my data career, I have tried to convince, motivate, and push product teams to use event data as an input for making product decisions.

Most of the time, I failed.

To exaggerate: product teams are the most quantitative-data-avoiding teams I know. And to be honest, I don&apos;t really have an explanation for it - investigating it would be a post of its own. So, for now, based on my experience and that of plenty of data people: getting product teams to work with product analytics is a tough job.


## So, why move on?

Why do you move on with your product?

Moving on mostly means that your ICP (ideal customer profile) changes over time. That is quite a natural process. You usually start with a small and narrow ICP definition because it makes the product design significantly easier. Then you start to attract audiences beyond your ICP. Some of them might interest you because they have slightly different needs but are still close enough to your core product. So moving on is a natural process, and sometimes it is not even visible to customers.

![](/images/posts/leaving-product-analytics/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fedbe4ce3-0754-4908-9af8-614e9dab3010_1874x1058-png.jpg)

But what Amplitude and Mixpanel did is a visible move. Yes, you can still read product on their websites (which is important), but Amplitude now calls itself a digital analytics platform, and Mixpanel speaks of analytics for everyone.

![](/images/posts/leaving-product-analytics/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f3026d5c1-8361-432e-87e6-26b9a991e9d2_2760x638.png)

![](/images/posts/leaving-product-analytics/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f857c26a5-a8b1-4836-8991-e2da1efce179_2704x546.png)

Let&apos;s have a look at which significant forces could have driven this evolution.

### Product teams are difficult customers, and Growth teams need more.

We mentioned this already above: from my experience, product teams are the hardest to work with regarding analytics data. They are quite comfortable working with qualitative data like interviews or surveys. And this might already be part of the answer to why it is so hard: you can develop a good product working only with qualitative data. The feedback cycles might be longer, but that does not have to hurt at all.

![](/images/posts/leaving-product-analytics/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fde3c9fd1-58e9-48d9-9400-cc2e396f2b99_1526x950-png.jpg)

Event data in products serves two main jobs: fast feedback loops and unbiased feedback.

**Fast feedback loop** means that you can get quick indicators (after hours or 1-2 days) of how a new feature is developing (usage data) and, after a week, how it impacts the general performance towards your goals. You can get similar data with qualitative measures but this usually needs more time and effort.

**Unbiased feedback** because the data is collected while your users are in real use-case situations. It is not a lab environment or an artificial setting (which is what you have in a survey). This can give you a perspective on the data that might differ slightly from your interview data.

So event data is useful for product teams, but most of them still struggle to make heavy use of it. Another reason: it is far more complicated than marketing analytics.

**Marketing analytics** is straightforward. You define your customer journey funnel, which is usually made up of 5-7 core events. You track them and ensure all marketing campaigns that link to your website carry tracking parameters. Then you analyze the campaigns across this funnel. Your main concern is how you handle attribution (to be fair, that can be a real pain).
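To illustrate how few moving parts such a funnel has, here is a toy sketch that counts how many users reach each step in order. The funnel steps, users, and event streams are all invented for the example:

```python
# Hypothetical core funnel and per-user event streams (oldest first)
funnel = ["visit", "signup", "activate", "subscribe"]
user_events = {
    "u1": ["visit", "signup", "activate", "subscribe"],
    "u2": ["visit", "signup"],
    "u3": ["visit"],
}

def funnel_counts(funnel, user_events):
    """Count how many users reach each funnel step, requiring steps in order."""
    counts = []
    for i in range(len(funnel)):
        reached = 0
        for events in user_events.values():
            # A user reaches step i if the funnel prefix appears in order
            pos = -1
            ok = True
            for step in funnel[: i + 1]:
                try:
                    pos = events.index(step, pos + 1)
                except ValueError:
                    ok = False
                    break
            reached += ok
        counts.append(reached)
    return counts

print(funnel_counts(funnel, user_events))  # [3, 2, 1, 1]
```

Slicing these counts by campaign parameter is essentially all that classic marketing funnel analysis does.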

**In product analytics**, there is no simple funnel. Yes, you will also use the core customer journey funnel as a baseline, since the later steps usually happen within the product. But a core optimization metric for a product is whether users return, since most products nowadays are built on subscriptions. For this, you need to understand how cohort analysis works, and that is quite a complex topic to get your head around.

But a new ICP is appearing on the horizon: **Growth teams**, and even better, ones focusing on **Product-led growth**. They have some roots in product development, but their main focus is on growing the user base and, ultimately, the subscription base of a product. To do this, they need fast feedback loops, and they are naturally more metrics-driven than classic product managers. But they need something that classic product analytics could not offer: marketing attribution. We will cover this in a second.

### The Google Analytics opportunity

And then there is the Google Analytics opportunity - though maybe it was just a coincidence. Almost 14 months ago, Google announced that it would sunset Universal Analytics and force all users to migrate to Google Analytics 4. This is not a &quot;click a button&quot; migration; it is a full &quot;implement new code&quot; migration, so it takes a lot of effort.

And then Google decided to make Google Analytics 4 a different product. The different data model was necessary, but they also significantly changed the UI and the way you model data. This has led, and is still leading, to a lot of confusion and frustration among GA users. No one really knows who the solution is built for now. Google Analytics was always the best out-of-the-box tool for marketing analytics. You can still do similar things, but it does not feel out-of-the-box anymore.

As a reminder, here is my video about the weird Google Analytics 4 user strategy:



There is an opportunity for products to offer an advanced version of Google Analytics with a consistent product design. Something in between Google Analytics and Adobe Analytics.

Big GA accounts are paying 100k+ for their setup, so there is budget available for different solutions, since they have to migrate anyway. And the deep Google Ads integration is not the biggest asset anymore, since marketing channels already diversified more than five years ago.

### Analytics for all

And we may again see a time when it makes more sense to have one analytics product for all teams. For quite some time, I recommended multi-product setups, simply because there were specific marketing needs (like marketing campaign attribution) and specific product needs (like cohort analysis) that you couldn&apos;t cover with one product. So we ended up with something like Segment or Rudderstack for event collection and then Google Analytics and Amplitude on top of that data. Not a bad solution, but one with plenty of trade-offs.

![](/images/posts/leaving-product-analytics/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f4fe7bbf0-784f-4b79-8f05-f632cb3222aa_1386x1192-png.jpg)

Having everything about customer &amp; product events in one product can enable users faster and prevent complex onboarding and trade-offs. Therefore, there is a growing market for a one-for-all analytics solution. We get to this later; new requirements are already coming up.


## Where are Amplitude and Mixpanel going?

I will describe what has changed based on Amplitude&apos;s implementation since this is the one I have tested and applied.

So what has been added that makes it a different platform:

### Marketing campaign attribution

Even when it sounds simple, this core feature was missing before - because it is not so simple. It involves two core concepts:

A user can have multiple marketing touchpoints. These touchpoints are usually the information a user carries when accessing the website or application, like UTM parameters or referral data. When you look at a conversion goal like a subscription, you want a proper way to attribute it across all the marketing touchpoints this account (yes, I mean account rather than user here) had before signing up. This can help you analyze what kind of communication has an impact on the customer journey.

![](/images/posts/leaving-product-analytics/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f5b8238c6-89b8-4185-a8bd-4e4bce991da3_1772x1058-png.jpg)

Something that is often overlooked is the definition of marketing channels. Analyzing on the campaign level can be tedious, especially when you run plenty of campaigns. Therefore, you can group campaigns into specific channels. You choose these channels based on the kinds of users and behavior you expect to get from them. So it makes total sense to group remarketing campaigns into a remarketing channel, even when they run on search, because they usually perform differently from normal search campaigns. The same goes for brand campaigns.
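To make the two concepts concrete, here is a toy sketch of splitting one conversion across an account&apos;s touchpoints under first-touch, last-touch, and linear models. The touchpoint names are invented, and tools like Amplitude offer more models than these three:

```python
# Hypothetical touchpoint list for one account, oldest first
touchpoints = ["search_brand", "newsletter", "search_remarketing", "newsletter"]

def attribute(touchpoints, model="linear"):
    """Split one conversion across touchpoints; returns {channel: credit}."""
    if model == "first_touch":
        weights = [1.0] + [0.0] * (len(touchpoints) - 1)
    elif model == "last_touch":
        weights = [0.0] * (len(touchpoints) - 1) + [1.0]
    else:  # linear: equal credit for every touchpoint
        weights = [1.0 / len(touchpoints)] * len(touchpoints)
    credit = {}
    for channel, weight in zip(touchpoints, weights):
        credit[channel] = credit.get(channel, 0.0) + weight
    return credit

print(attribute(touchpoints, "linear"))
# {'search_brand': 0.25, 'newsletter': 0.5, 'search_remarketing': 0.25}
```

Note how the channel grouping happens before attribution: because both newsletter touches map to one channel, the linear model gives that channel half the credit.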

### Data tables

Maybe the most overlooked new feature. But it is such a powerful one.

Marketing campaign performance reports are usually not really about charts; the real juice is a big table with all campaigns, the funnel performance, attributed conversions, and ideally a cost/revenue relation - all of this, in the best case, over time. Think of it as a huge pivot table.

![](/images/posts/leaving-product-analytics/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f3bac14d0-1357-4c27-b09b-3361186f5f84_1256x628-png.jpg)

This kind of feature was missing in Amplitude (and, to be honest, also from GA - yes, there are custom reports, but they are a light version of what I am talking about).

The new data tables are extremely powerful, and not only for marketing campaign analysis. Describing them would take a post of its own.

### Cost attribution

When discussing marketing conversion attribution, you will next discuss cost attribution. In the end, you spend plenty of money on marketing campaigns, and based on the conversion performance, you might like to establish an ROI or cost/revenue metric to see at a glance whether a campaign is problematic and needs attention. Getting cost data into Amplitude was not possible before; you always had to export Amplitude data and then hook it up with cost data elsewhere.

![](/images/posts/leaving-product-analytics/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fd2e4b98b-95f4-45dc-8486-f7d3f76d4434_1660x678-png.jpg)

Now you can import specific marketing cost data automatically into Amplitude and work with it.
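As a toy illustration of why the cost import matters, here is a sketch of the kind of at-a-glance check it enables: flagging campaigns whose attributed revenue does not cover their spend. Campaign names, numbers, and the threshold are all invented:

```python
# Hypothetical campaign stats: spend and attributed revenue
campaigns = {
    "search_brand": {"cost": 400.0, "revenue": 2000.0},
    "display_prospecting": {"cost": 900.0, "revenue": 450.0},
}

def flag_problematic(campaigns, min_roas=1.0):
    """Return campaigns whose return on ad spend (revenue / cost) is below the threshold."""
    return [name for name, stats in campaigns.items()
            if stats["revenue"] / stats["cost"] < min_roas]

print(flag_problematic(campaigns))  # ['display_prospecting']
```

This is exactly the check that required exporting and joining data before cost import existed.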

These features now enable growth teams to work with a tool like Amplitude, since it provides the missing pieces of the former puzzle. But it also makes Amplitude a pro tool for marketing analysts, since Amplitude&apos;s funnel and cohort analyses and even its data tables are more powerful than their Google Analytics counterparts.

### E-Commerce Power Analytics?

The interesting part is what Amplitude can do in the E-Commerce analytics space. E-commerce analytics was always an enhanced marketing analytics use case for me. And again, Google Analytics was always the best out-of-the-box solution for it. E-Commerce analytics requires all the possibilities we described above but additionally has requirements towards the data model. One is properly handling object arrays, since you usually have multiple products in a cart. Another is specific product attributes that behave like user attributes but are bound to a product. And you might like to have internal attribution for onsite merchandising campaigns.

You also need a product perspective. Tools like Amplitude have by nature a user perspective. But you want to analyse how a product is performing across the funnel.

![](/images/posts/leaving-product-analytics/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2ffda847d3-9511-4670-9fd1-86db07d5beb8_1494x646-png.jpg)

With Cart Analysis, Amplitude is taking a first step into e-commerce analytics. I haven&apos;t tested it yet, so I can&apos;t tell how well it already works.

[https://amplitude.com/blog/cart-analysis](https://amplitude.com/blog/cart-analysis)

## Where is Heap going?

Entering UX &amp; CX analytics.

Heap took a slightly different direction. Because Heap&apos;s core asset was always the auto-capture of events, they start from an interesting position. They extended their product (partly through acquisition) with two essential things: AI-based insights and session replays.

The AI-based insights can surface interesting user patterns you were unaware of (since Heap captures every click or touch). One example: you set up a classic funnel, and Heap recommends adding or removing specific steps based on the additional data it has collected.

![](/images/posts/leaving-product-analytics/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2ff914f91b-53c9-4916-b6dc-f3ba1eda2737_974x448-png.jpg)

Session replays are also undervalued. When I worked in product, my data stack always included at least a product analytics tool and a session replay tool. I then had to make sure that ids were exchanged between both so that I could make connections. The workflow was often like this: I discovered an interesting set of users (like some dropping off in a funnel), went to the user explorer, picked some user ids, went to the session replay tool, looked for these sessions, and often got interesting hints about what was causing the drop-off by watching the replay.

In Heap, you can now do this with one click. This is extremely powerful.

You can track events in Heap from all kinds of places (like your backend), but with the strengths just described, Heap is a tool in a new kind of category which I call **User Experience analytics (or just UX Analytics)**, which is frontend-centric. And it is a powerful and important one. I would say UX designers are the most under-served group when it comes to qualitative data. And from my project experience, they are all eager to get more data. I see shiny eyes when I show them how features are used in detail and how a session replay uncovers a flaw in user interaction.

## What awaits us with the new arrivals? - Product analytics on your cloud data warehouse

I will focus here on [Kubit](https://kubit.ai/) since I only have experience with this product.

There was a constant itch for me when I worked on data setups. We were defining setups where we already knew there would be a time when we would have to move on. At some point we always ran into use cases with external data that we needed to push into Amplitude. But this was not an easy process and sometimes not possible at all. So we pulled some data from Amplitude into the data warehouse and ran some reports and analyses there. This left us with two systems to work in and analyze the data. I was not really happy with it.

There was not really a way around it. But one thing kept spinning in my head: why can&apos;t I get a product analytics tool that works directly on my data warehouse event data?

I wrote about why you can&apos;t use BI tools to work with event data here:

[Why product analytics is completely different than BI](https://hipsterdatastack.substack.com/p/why-product-analytics-is-completely)

Then I discovered Kubit. And they promised exactly what I wanted. Event analytics on top of my data warehouse data. And for me, **this is the future I want to live in**.

But as a disclaimer, it is not yet the world for everyone. If you don&apos;t have a data team that can load, transform, and prepare the data for Kubit, this is not your setup yet. And it is not as deep as Amplitude when it comes to analysis features.

![](/images/posts/leaving-product-analytics/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fbe431452-f52e-4eba-aa4f-789b997a18ce_1468x892.png)

This is not just product analytics on top of your data warehouse. When we project that approach into the near future, I would call it technically **event analytics** or, more sophisticated, a business and product activities platform - or maybe just **activities analytics**.

Thanks to the data warehouse, I can collect and load events from anywhere: my frontends (maybe less), my backend applications (maybe more), directly from databases (CDC), directly from third-party apps via webhooks - or I can connect to the data pool of a tool I use and load the events directly from there.

Based on all these different events, we then model activities by selecting, transforming, merging, or enriching the event data. And this data is then used by Kubit for analysis: **all use cases, no limits**.
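
A minimal sketch of this modeling step, with invented event shapes and activity names, could look like this:

```python
# Hypothetical raw events from different sources, to be normalized into one
# activity schema: (entity_id, activity, ts, attributes).
RAW = [
    {"source": "backend", "type": "subscription.created", "user": "u1",
     "ts": "2023-05-01T10:00:00Z", "plan": "pro"},
    {"source": "stripe", "type": "invoice.paid", "customer": "u1",
     "ts": "2023-06-01T10:00:00Z", "amount": 49},
    {"source": "frontend", "type": "page_view", "user": "u2",
     "ts": "2023-05-02T09:00:00Z"},
]

def to_activity(event):
    """Select and transform raw events into modeled activities;
    page views are dropped, payment events are renamed and enriched."""
    if event["type"] == "subscription.created":
        return {"entity_id": event["user"], "activity": "started_subscription",
                "ts": event["ts"], "attributes": {"plan": event["plan"]}}
    if event["type"] == "invoice.paid":
        return {"entity_id": event["customer"], "activity": "paid_invoice",
                "ts": event["ts"], "attributes": {"amount": event["amount"]}}
    return None  # everything else is filtered out of the activity stream

activities = [a for a in map(to_activity, RAW) if a]
print([a["activity"] for a in activities])  # ['started_subscription', 'paid_invoice']
```

In a real warehouse setup this selection and renaming would typically live in SQL transformation models rather than Python, but the shape of the work is the same.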

## An outlook

It is nice to see that a category that was quite static for some years is opening up and changing quite drastically. Are these changes good news? I think yes. We get features that were missing for quite some time, and therefore there are more use cases we can cover. Amplitude manages the new features in a way that is not overwhelming for beginners since they are clearly pro features. Kubit is opening up a new category, mostly for data teams who don&apos;t want to sync their data to an external tool for analysis.

But new horizons and similar problems:

**Event data collection** - We still need event data. So the usual problems with collecting and tracking it stay the same. Data exhaust in, no insights out. This is one of the reasons I decided to rename the book I am currently writing from &quot;how to fix your tracking&quot; to &quot;how to collect and track event data&quot;. Because I strongly believe that event data will become even more important in the future.

**Making sense of data** - Just because we have more ways to analyse the data doesn&apos;t mean we get more value. The people working with the tools still have to generate this value for the business and the product. The pros now get more tooling to deliver new insights, and you still have to train people to become pros - no changes here.

![](/images/posts/leaving-product-analytics/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f7567ed5c-3071-41dc-9949-2a23fd882565_1080x1080.png)</content:encoded></item><item><title>The extensive guide for Server-Side Tracking</title><link>https://timo.space/blog/the-extensive-guide-for-server-side/</link><guid isPermaLink="true">https://timo.space/blog/the-extensive-guide-for-server-side/</guid><description>I would love to learn if more people had a server-side awakening event.</description><pubDate>Tue, 16 May 2023 00:00:00 GMT</pubDate><content:encoded>I would love to learn if more people had a server-side awakening event.

Mine was over six years ago. It was not my first time sending events from a backend - I had done this in projects before, mostly sending refund events to Google Analytics. But this time, it was a paradigm shift and a good example of why developers should always play a core and active role in any tracking project.

The project was to implement a new tracking setup for a product with four different platforms (web, iOS, Android, Windows - yes, that was a thing), and a Windows desktop app was in development.

Because of that, the development team was not extremely happy about the implementation, but they definitely wanted the data. We were planning to implement Mixpanel on all of these frontends. As usual, we started with a platform-agnostic design since the events were similar across all platforms.

And when we talked about the implementation, we immediately dived into how the different developers can add the tracking SDKs to the platform they support.

Then one developer had a question: &quot;Is there an SDK we can use in our Python application?&quot;

I said, yes, there is one. &quot;Why do you ask? Do you have specific events only accessible in the backend?&quot;

&quot;No, I was thinking about something else. We have a central API serving all platforms, and most actions need an API call.&quot;

&quot;Would it be possible to implement the tracking in the API layer using the Mixpanel Python SDK - then we could get 90% of all events covered and implement them just once.&quot;

This was my server-side moment. I always had the tools before me but was missing the essential concept. Server-side before was just an extension of frontend tracking for special events. But I never thought about it as the leading setup.

And the implementation of the project worked well. There were specific things we had to test and refine (we will get to that later). But in general, the implementation time was naturally much faster, and the data quality was significantly better (also not a big surprise).

From then on, did I do all setups on the server side? Unfortunately not. Why is that? Let&apos;s have a look.

## Why don&apos;t we use server-side tracking in all projects?

There are different reasons for that. Some are obvious, some not so much, and we start with a not-obvious one.

### Tracking designs are done by clicking through the app.

![](/images/posts/the-extensive-guide-for-server-side/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f2e464345-8c54-44f6-8e31-d1c2eb18c4bb_1422x842.png)

Most tracking plans are defined and created by opening up the application or website you want to track and describing all the core actions happening in that frontend user journey.

This will end up in a tracking plan very close to the frontend application; therefore, the tracking implementation will be too.

My approach is different. I create the first tracking design by ignoring the application entirely. We define the typical user journeys from how they start (some business models have different entry points), then cover the essential value or aha moments and core functions of the app, up to monetization.

By that, we have an agnostic tracking design. This enables us to make implementation decisions after the design is done, without being too close to the frontend implementation. And we will most likely have discovered important events that are not visible on the front end.

### Development is not really involved in the process.

![](/images/posts/the-extensive-guide-for-server-side/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f49f8992f-626c-46e3-ab6a-1a91acf7b6b4_1448x982.png)

Unfortunately, in most tracking projects, the tracking plan is created in Product, Marketing, or Growth teams and then put into a ticket and thrown over the fence to the development team. Usually without any context - the tracking plan and a link to the docs, that is it. So development teams implement it quickly, add the frontend SDK, and are done.

When you involve the development team from the first meeting and work together to define the goals you want to achieve with the tracking, the development team will own the implementation approach. Interestingly, in these setups, we usually end up with a server-side implementation.

### The architecture is not built for it.

![](/images/posts/the-extensive-guide-for-server-side/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f3cb2fa49-93fb-41e4-a4b4-990af79f1981_1000x626-png.jpg)

Some architectures make server-side tracking a no-brainer since you have a central place to add the tracking. But not every setup is like that. When your applications are served by very different systems, orchestrating a server-side approach can be much more complicated.

It can still be an option to check if at least 50-60% can be covered via server-side tracking and the rest via the front end.

I hope these three reasons make sense to you. There might be more, but these are the ones I usually encounter. But I played one trick on you so far: I wrote everything as if server-side tracking were superior to frontend tracking. Let&apos;s have a look to see if that is really the case.

## Why server-side tracking is the better option

### The limits of frontend tracking

In the classic analytics setup, tracking was initially implemented in the front end (not totally true - there was log file analysis first, but let&apos;s not go back there).

There are multiple reasons for that:

\- In an anonymous environment, you get plenty of user context for free since the tracking is running on their systems

\- It&apos;s much easier to develop just one tracking SDK in Javascript. Backends can be in plenty of languages.

\- Old types of architecture made it easier to get the data where it is rendered and send it off to the analytics endpoints

But the browser and the mobile device always had, and still have, their problems. Most of all: you don&apos;t control them.

The browser is running on the client&apos;s system. In this case, you depend on two players: the browser vendor and the user using the browser. The vendor has the most influence, and it will always be in a browser vendor&apos;s interest to control tracking. Some have a policy of allowing as much tracking as possible, and some counter with the opposite. The user can control it by explicitly blocking tracking, but this requires some degree of knowledge.

On mobile devices, it is the operating system vendor, since the system and apps work more tightly together. If a vendor sees privacy as a core asset of their offer, tracking will be restricted.

And then there is the usage itself. Browsers especially are not built for capturing 100% of events. You can&apos;t use a local database to batch events, as you can on a mobile device, and make a guaranteed call to the tracking service. And the tracking service can&apos;t return an error when there are issues, since the front end can&apos;t handle it.

![](/images/posts/the-extensive-guide-for-server-side/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f8a9c0d83-64ab-4b4c-b61b-4f352618ef2a_962x774-png.jpg)

Let&apos;s keep one thing in mind: regarding data quality, you need as much control over the pipeline as possible. If data quality is important, with frontend tracking you have one essential part of your pipeline you don&apos;t control.

### Take control in your environment.

Speaking about control. Server-side tracking runs on your servers. Or at least in an environment that you control.

Control is the essential difference. You don&apos;t rely on a third party to determine how the tracking events are triggered and if they can be delivered successfully.

You can also include the tracking calls in unit tests much more easily than on the front end.

![](/images/posts/the-extensive-guide-for-server-side/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2ffc70663a-190d-439b-a22d-dff8533bb963_940x778-png.jpg)

But some things are missing compared to frontend tracking. As mentioned, frontend tracking gathers much information from the browser (like device type, IP address, screen size, and operating system). This context can be helpful in specific analyses. In server-side tracking, this data is missing, and you can only use the context your application provides.

And you need an identifier that you can send with each server-side tracking event. This is usually the user id that is present for all application actions. You don&apos;t have a quick way to do anonymous identification, like in a browser or on a device.

But you can use a hybrid approach where you track 1-2 events in the front end with a user id and then track the rest on the server side using the same user id. That way, both sets of information can be stitched together later.

## How to implement server-side tracking

### Close to the API layer

As described in my initial story, if you have a global API layer where most user actions trigger a call to this API, you can implement the tracking close to it.

The benefit of an API layer is that it already abstracts and provides most of the relevant information for tracking. Usually, all API requests require user information (usually by an auth token), which will then determine the user id. Then you can pull context information from the POST or PUT requests when things are created or updated. You can use the provided payload as tracking properties when someone requests information.
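
As a minimal sketch of this idea, here is a plain Python decorator with an invented `track()` stand-in instead of a real SDK. The handler signature and event names are assumptions for illustration; in a real API you would pull the user id from the auth token and the properties from the request payload.

```python
import functools

CAPTURED = []  # in a real setup this would be an analytics SDK client

def track(event_name, user_id, properties):
    """Stand-in for an analytics call, e.g. mixpanel.track()."""
    CAPTURED.append({"event": event_name, "user_id": user_id, **properties})

def tracked(event_name):
    """Decorator for API handlers: takes the user from the auth context
    and sends the request payload as event properties."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(user_id, payload):
            result = handler(user_id, payload)
            track(event_name, user_id, payload)  # fire after a successful call
            return result
        return wrapper
    return decorator

@tracked("project_created")
def create_project(user_id, payload):
    return {"id": "p_123", **payload}

create_project("u42", {"name": "Q3 launch"})
print(CAPTURED)  # [{'event': 'project_created', 'user_id': 'u42', 'name': 'Q3 launch'}]
```

Because every handler passes through the same layer, one decorator covers the bulk of the events, which is exactly the 90%-coverage argument from the opening story.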

### Close to the stream

Suppose you use a Kappa architecture and therefore have a streaming layer that receives events from application services, which other services can pull and process. In that case, you are as close as you can get to ideal server-side tracking. In many cases, the tracking is just another service subscribing to the stream, perhaps with some whitelist filtering to control which events are passed on.

I did two projects where the team already used stream technology to publish and subscribe to events, and the initial tracking implementation was done in a short time.

One challenge can be the stream data itself. If the data is lean and uses ids heavily, you might need to enhance the application that handles the events with additional API or database requests to get all the context.
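
A sketch of such a subscribing tracking service, with invented message shapes and an in-memory lookup standing in for the enrichment call:

```python
WHITELIST = {"order.placed", "user.signed_up"}  # events forwarded to analytics

USERS = {"u1": {"plan": "pro"}}  # stand-in for an enrichment lookup (API or DB)

def handle_stream_message(message):
    """Tracking service subscribed to the stream: filter by whitelist,
    then enrich lean, id-heavy payloads with extra context."""
    if message["type"] not in WHITELIST:
        return None
    enriched = dict(message)
    enriched["user_plan"] = USERS.get(message["user_id"], {}).get("plan")
    return enriched

stream = [
    {"type": "cache.invalidated", "user_id": "u1"},
    {"type": "order.placed", "user_id": "u1", "order_id": "o9"},
]
forwarded = [e for e in map(handle_stream_message, stream) if e]
print(len(forwarded), forwarded[0]["user_plan"])  # 1 pro
```

The whitelist keeps internal plumbing events out of the analytics system, and the enrichment step is where the extra API or database requests mentioned above would happen.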

### Close to the database

Similar to the stream approach, you can listen to database changes. This is one of the most common approaches, although it is usually not called tracking at all. The technique is called Change Data Capture (CDC), and different databases offer it out of the box. There are different types of CDC and different methods to guarantee, for example, transactional delivery (the outbox pattern).

As an alternative to CDC, some newer database systems offer webhooks (Supabase or Fauna). You can define a webhook that is triggered when a specific database operation has happened; an application receives it and then sends it to the analytics system. This brings us to the next implementation.
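
As an illustration, here is a sketch that translates a CDC-style change record into tracking events. The record shape and event names are invented; real CDC payloads look different per system.

```python
def change_to_event(change):
    """Translate a CDC-style change record (illustrative shape) into a
    tracking event; inserts and updates map to different event names."""
    table = change["table"]
    op = change["op"]
    row = change["row"]
    if table == "subscriptions" and op == "INSERT":
        return {"event": "subscription_created", "user_id": row["user_id"],
                "plan": row["plan"]}
    if table == "subscriptions" and op == "UPDATE" and row.get("status") == "canceled":
        return {"event": "subscription_canceled", "user_id": row["user_id"]}
    return None  # ignore changes that are not analytics-relevant

evt = change_to_event({"table": "subscriptions", "op": "INSERT",
                       "row": {"user_id": "u7", "plan": "starter"}})
print(evt["event"])  # subscription_created
```

The translation layer is where "a row changed" becomes "a user did something" - which is the whole trick of using CDC for tracking.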

### Close to third-party applications

Specific parts of an application are not handled within the application but by using third-party applications. The best example is Stripe, where you run all your subscription and payment processes. But these events are also essential for your analysis. In this case, you have two ways to get these events:

\- via webhooks. A lot of third-party services offer webhooks that are triggered by specific actions. You can build an endpoint to receive these events (sometimes you already have one for your applications) and transform them into proper tracking events.

\- via batch load. Many third-party services have an API where you can pull the data relevant to you. From there, you sometimes get an immutable event log (Stripe offers this, for example) or snapshot data from which you can derive events (a subscription has a created\_at date, a last\_renewal date, ...). The batch load has the benefit that you are not responsible for ensuring 100% delivery of the events, since you pull the result. But sometimes the API data cannot be transformed into proper event data. Webhooks are events by nature.
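
Here is a sketch of deriving events from such snapshot data. The record shape is invented and only loosely inspired by subscription objects like Stripe&apos;s:

```python
def derive_events(subscription):
    """Derive events from a snapshot record (created_at, last_renewal, ...)
    as pulled from a third-party API via batch load."""
    events = [{"event": "subscription_created",
               "subscription_id": subscription["id"],
               "ts": subscription["created_at"]}]
    if subscription.get("last_renewal"):
        events.append({"event": "subscription_renewed",
                       "subscription_id": subscription["id"],
                       "ts": subscription["last_renewal"]})
    if subscription.get("canceled_at"):
        events.append({"event": "subscription_canceled",
                       "subscription_id": subscription["id"],
                       "ts": subscription["canceled_at"]})
    return events

snap = {"id": "sub_1", "created_at": "2023-01-01", "last_renewal": "2023-04-01"}
print([e["event"] for e in derive_events(snap)])
# ['subscription_created', 'subscription_renewed']
```

The limitation mentioned above shows up here too: a snapshot only keeps the latest renewal date, so intermediate renewals are lost - an immutable event log would preserve them.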

### Close to your application code

When none of the above methods are possible, you must add the tracking code where the action is handled in your application code. You might abstract this into a specific module or class to make it easier to update tracking metadata (or when you want to switch the analytics service).

This approach definitely works, but it requires more implementation effort and leads to very spread-out tracking code, which can become a bit of a maintenance nightmare.
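
A minimal sketch of such an abstraction, with an injected delivery function standing in for the real SDK (names and defaults are invented):

```python
class Tracker:
    """Thin abstraction around the analytics SDK so tracking metadata
    (and even the vendor) can be changed in one place."""
    def __init__(self, send):
        self._send = send           # inject the actual delivery function
        self.defaults = {"app_version": "1.4.2", "environment": "production"}

    def track(self, user_id, event_name, properties=None):
        payload = {**self.defaults, **(properties or {}),
                   "user_id": user_id, "event": event_name}
        self._send(payload)

sent = []
tracker = Tracker(sent.append)      # swap for the real SDK call in production

# Called from wherever the action is handled in the application code:
tracker.track("u1", "report_exported", {"format": "csv"})
print(sent[0]["event"], sent[0]["app_version"])  # report_exported 1.4.2
```

Even with the calls spread across the codebase, the metadata and the vendor integration stay in one module.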

![](/images/posts/the-extensive-guide-for-server-side/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2ffc2e79ba-cdf3-42d9-a4d4-973d4bf0f5ff_3456x1728-png.jpg)

[Check out the book](https://timodechau.com/book)

## How to combine frontend and server-side tracking

In the end, it is not so complicated. The essential part is that both need the same identifier. When users are logged in, it is straightforward since you can use the user id in both environments.

![](/images/posts/the-extensive-guide-for-server-side/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f072d233a-0e6c-494f-bb5f-7f5e79c19d8e_2130x774-png.jpg)

The difficulties start when no user identification is possible, mainly on the front end. A popular example is a typical SaaS flow. A user arrives with marketing source information (campaign URL parameters, referrer) and is recognized by a generated id that is put into a cookie. All events are sent with this anonymous id. When this user signs up and gets a user id, the anonymous id and the user id are sent together at least once. By this, they can be stitched together later. All product analytics tools do this automatically when you provide both ids.

If you don&apos;t want to generate an anonymous id in a cookie, you can also pass the relevant marketing information as URL parameters until a user signs up.
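
The stitching logic itself can be sketched in a few lines of Python. The event shapes and the alias list are invented; product analytics tools run this kind of resolution for you server-side:

```python
def stitch(frontend_events, backend_events, alias_pairs):
    """Merge anonymous frontend events with server-side events by resolving
    each anonymous id to the user id seen at signup (the alias call)."""
    anon_to_user = {anon: user for anon, user in alias_pairs}
    merged = []
    for e in frontend_events:
        merged.append({**e, "user_id": anon_to_user.get(e["anonymous_id"])})
    merged.extend(backend_events)
    merged.sort(key=lambda e: e["ts"])  # one timeline per person
    return merged

frontend = [{"event": "landing_page_view", "anonymous_id": "anon-1",
             "utm_campaign": "spring_sale", "ts": 1}]
backend = [{"event": "signed_up", "user_id": "u9", "ts": 2}]
journey = stitch(frontend, backend, [("anon-1", "u9")])
print([(e["event"], e["user_id"]) for e in journey])
# [('landing_page_view', 'u9'), ('signed_up', 'u9')]
```

Note how the marketing source information from the anonymous event now belongs to the signed-up user - which is the whole point of the exercise.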

## How do I implement tracking in 2023

If you can, on the server side. It makes the implementation, in most cases, easier and more robust.

But first, involve the development team in the tracking project from day one. They usually have a good sense of what implementation makes sense, and from my experience, they tend to implement it on the server side. And they know when the server-side architecture is too complex for a simple server-side tracking implementation and a frontend implementation is more straightforward.

A hint at the end: if, while reading this, you thought of a server-side tag manager as a server-side tracking solution - it&apos;s not. They share the name, but that&apos;s it. A server-side tag manager is just a tracking proxy that runs on your server; tracking server-side is something completely different.

![](/images/posts/the-extensive-guide-for-server-side/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f8c7daa3d-0e32-433f-a2a7-8af118a9eb1a_1080x1080.png)

</content:encoded></item><item><title>Ways to solve the data user identity &amp; privacy crisis</title><link>https://timo.space/blog/ways-to-solve-the-data-user-identity/</link><guid isPermaLink="true">https://timo.space/blog/ways-to-solve-the-data-user-identity/</guid><description>Plenty of things made me fall in love with Kissmetrics when it came out - but one feature made it irresistible for me: The user explorer.</description><pubDate>Mon, 08 May 2023 00:00:00 GMT</pubDate><content:encoded>Plenty of things made me fall in love with Kissmetrics when it came out - but one feature made it irresistible for me: The user explorer.

![](/images/posts/ways-to-solve-the-data-user-identity/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f940a2a50-8d4b-485c-accd-78c69aec11f9_1176x938.png)

_The Wayback Machine is great when looking for some old feature descriptions._ [_https://web.archive.org/web/20140228094856/https://www.kissmetrics.com/features_](https://web.archive.org/web/20140228094856/https://www.kissmetrics.com/features)

In an aggregated world of analytics data, this report looked weird since it showed event sequences on a user level. But for me, it was like a treasure chest. I picked specific segments - like users who dropped off at a particular step of a funnel - and then checked 20-30 individual user explorer profiles. And I could find patterns and ideas for deeper investigations just from the sequence of events and user properties.

That made me also fall in love with the user id, which was required for good user explorer reports. So I still prefer setups where I can get a user id when users have to log in.

But data privacy is challenging this concept. So, do we always need a strong identifier like a user id, or can other approaches also work? Let&apos;s investigate this.

## Identifying a user as a global proxy

Why do we try to identify a &quot;user&quot; at all?

To get these event sequences for one person, because we hope the sequence can tell us more than all the isolated events alone. We get to the sequence in a second. But let&apos;s look at the different ways to identify a &quot;user&quot;. And yes, there is a reason why I put quotes around &quot;user.&quot;

### Ways to identify a user

![](/images/posts/ways-to-solve-the-data-user-identity/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f207c01de-0678-4ccd-9164-63f4d7aa47a0_1416x764.png)

**By IP-Address**

In log file analysis, the IP was the identifier. We could analyze a sequence of HTTP requests for one IP address to see how one &quot;user&quot; was browsing a website. But how unique is an IP address? Well, it has limits. If there are multiple people in your local network, they can appear with the same public IP address (at least that is what my 10 minutes of internet research told me). And IP addresses are usually dynamic, so they change within a day or two. So a short sequence within an hour is pretty likely unique, but not a user journey that spans days or weeks.

**By cookie value**

When we got the first Javascript trackers, we got session, client, or user ids saved in a cookie. The setup is simple. A person opens a website, and the script checks if there is an id in the cookie; if yes, it uses this one for its tracking; if no, it creates a new one and saves it in the cookie. Pretty simple.

How unique is it? If the user stays on the same device and does not delete the cookies, it is pretty consistent and unique. In the early days of the internet, that was not a huge problem. But with mobile phones and multi-platform usage, it became significantly more of a problem. And with tracking consent even more so: every identifier saved in a cookie or local storage potentially needs positive consent.

**By device id**

With mobile phones, we got something more persistent than cookies - the device id. This can be an id the operating system provides, like the IDFV, or a generated id stored on the device.

How unique is it? For a mobile device, it is pretty unique. Suppose the tracking uses an operating system identifier like the IDFV (and you don&apos;t offer multiple apps - the IDFV is the same for each app from one developer on the user&apos;s device). In that case, you don&apos;t even have to set a unique identifier yourself. But it might still be something you can only use with the user&apos;s consent (https://mobiledevmemo.com/french-privacy-watchdog-to-voodoo-games-use-of-the-idfv-requires-consent/)

**By login and user-id**

When a user signs into an application, the backend system usually returns a unique user-id for this account. This makes it possible to track users across multiple devices and platforms.

How unique is it? As unique as it gets. Since users proactively identify themselves in the application, you can be sure that this user really is one user. Does this need user consent? Yes - at least it is something you should check with your legal advisors.

### Why do we want to identify a user?

We don&apos;t need any identifier if we are interested in how many signups we have or how many new tasks have been created. The event count itself is sufficient.

![](/images/posts/ways-to-solve-the-data-user-identity/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fa1348b43-228d-473e-ab57-ac6864001d8b_1598x1348-png.jpg)

We need an identifier when:

\- we want to see if users are getting to the point where they get value from our application. This will always be some funnel - like account created -&gt; website published

\- we want to see if our application sticks with our users - if they come back over days, weeks, and months. Classic retention analysis works with consistent unique identifiers (like user ids).

These two use cases are the core cases for any product analytics work. Is product analytics then even possible without a unique user id? We get to this in a second.

Let&apos;s look at marketing analytics.

If we want to analyze a classic e-commerce funnel, we could do it without an identifier, assuming the usual buyer&apos;s journey happens within 30-60 minutes. We could look at the number of events for each step - from cart to order. Problematic are events that happen multiple times per user - like product views.

We need an identifier to know which initial marketing campaign led to an order. The information about the traffic source is usually only present with the first event/pageview, and an identifier allows us to apply the traffic source to all the following events.

So we need identifiers to do more advanced marketing analytics.

But let&apos;s ask the obvious question: do we need a user identifier?

### The user proxy

We use the concept of a user (ip, cookie, or user id) mainly because it was at hand when we started. And because it was as granular as possible, many different types of analysis could simply be derived from it.

The user identifier is a catch-all approach.

When we can track a unique user identifier, we can build anything from it since it is the most individual and granular identifier possible.

I am not a lawyer, so I will not examine this from a legal standpoint.

But let&apos;s look at it from a simple privacy-by-design perspective.

When I collect data in the way I need it for my analysis, I don&apos;t default to the identifier with the most privacy impact - the user id. Instead, I look at what kind of aggregated identifier is sufficient for the analysis.

## What are different aggregated identifiers?

Let&apos;s develop different identifiers that work on an aggregated level and see what we can do with them.

### Account or Team ID

What is the difference between an account and a user id? In an application where one user = one account, no difference at all. But in a typical B2B application, where an account can have plenty of users or even different teams, we have an interesting new identifier.

In a B2B SaaS use case, we are even more interested in account performance than in single-user performance, because the account performance generally tells us if this account might convert to a subscription (which lives on the account level) or if it is likely to churn. If we need more details about the different types of users in an account, we can introduce a role property that extends the data by tracking which role in the account triggered the event. With huge accounts (meaning 100+ users in an account), it can make sense to break it down to teams if teams are a feature of the application.

Limitations: An account-based identifier comes close to a user identifier in a B2B application with accounts and multiple users per account. Using a plain user id is problematic in these scenarios since the account dimension is missing. Classic product analytics tools like Amplitude or Mixpanel usually track this with the group identity feature.

### User group ID

This is a variation of the account id approach for cases where there is no account level; you group users based on different criteria.

At deepskydata, we ask members when they sign up about their experience level, job title (from a list), and where they heard about us. We can combine these criteria into one id per permutation. By that, we choose an aggregation level that is broad enough for privacy but close enough for correct analysis.

Limitations: A classic problem of aggregated identifiers is that the average time between funnel steps can be off since the data lacks individual granularity. One way to solve this is to extend the group criteria with the signup date, which also enables a proper retention analysis.
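As a sketch, such a permutation-based group id could be derived like this. The field names, the hashing, and the optional signup week are my own illustrative choices, not deepskydata&apos;s actual implementation:

```python
from hashlib import sha256

def group_id(experience, job_title, source, signup_week=""):
    # Every user with the same permutation of answers shares this id,
    # so no individual can be singled out from the event stream.
    key = "|".join([experience.lower(), job_title.lower(),
                    source.lower(), signup_week])
    # Hash the permutation so the raw answers do not leak into every event.
    return sha256(key.encode()).hexdigest()[:16]

# Two users with the same answers land in the same group:
a = group_id("beginner", "Data Analyst", "newsletter")
b = group_id("Beginner", "data analyst", "Newsletter")
assert a == b
```

Passing a signup week (e.g. "2023-W15") as the extra criterion keeps cohorts separable, which is what makes retention analysis possible again.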

### Content ID

Let&apos;s assume you have a content website. With 100-1000 different content assets, you are primarily interested in individual content performance. Here we can use the content id as an identifier to track core content usage events.

Limitations: The time between funnel steps will be missing. Same for content retention.

### Campaign ID

Let&apos;s assume you have a marketing website for a product. You are primarily interested in how a specific campaign performs on the customer journey. Therefore you can use the campaign id as an identifier and track all customer journey events. This gives you a proper customer journey funnel broken down by the different campaigns.

Limitations: Same as for the content id. The average time between funnel steps and retention analysis are missing.

The account and user group id are good examples of choosing an aggregation level of users. Still, they can do everything we usually do in product analytics (funnels and cohorts).

The content or campaign id is a good example where we choose a different entity we focus on and use it as an identifier because we are foremost interested in the performance of the content or the campaign and don&apos;t care about the individual user.

&amp;gt; I am currently writing my first workbook: How to fix tracking.  
&amp;gt;   
&amp;gt; Preorder it now for just 35 USD.

[Get my upcoming book](https://timodechau.com/book)

## How to track different identifiers?

All good product analytics products have a group feature for the account id approach. But be careful: in most of them, it is a paid add-on, so ask for the price. In cloud-warehouse-native tools like Kubit, the account is just a different schema you can use for analysis.

If you use an aggregated user id or something like the content id, you use the user id property for it. In a product analytics tool, you can use any value as the user id, usually set via an identify call.

If you want to make sure you respect user privacy and don&apos;t do any user-level tracking, you need to make sure that you:

\- disable cookie tracking (check the docs; it is possible for most tools, but not for all of them)

\- provide the user id with each event call
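A minimal sketch of the idea, with a hypothetical `build_event` helper rather than any vendor&apos;s real SDK call:

```python
def build_event(name, identifier, properties):
    # The payload shape and function name are illustrative only -
    # check your vendor docs for the real call signature.
    return {
        "event": name,
        "user_id": identifier,  # aggregated id instead of a personal one
        "properties": properties,
    }

# A campaign id takes the place of the user id on every single call,
# because with cookies disabled there is no stored identity to fall back on.
event = build_event("content_read", "campaign:spring-launch",
                    {"path": "/blog/retention"})
```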

You can even go further when using Snowplow for event tracking.

Snowplow has the concept of custom contexts, and you define which context is relevant for an event and provide it accordingly.

So for a &quot;content read&quot; event, you can have an account context (if the user is logged into an account), a content context (with content name, id, ...), a consent context (what the data can be used for), and an experiment context (which test variant is visible). The user tracking itself is set to anonymous and therefore doesn&apos;t generate any user-level data.
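Sketched as a plain payload (the structure mirrors Snowplow&apos;s self-describing contexts, but the schema URIs here are placeholders, not real published Iglu schemas):

```python
# One event, several Snowplow-style custom contexts, and no user identity.
event = {
    "event": "content_read",
    "user": None,  # anonymous tracking: no user-level data is generated
    "contexts": [
        {"schema": "iglu:com.example/account/jsonschema/1-0-0",
         "data": {"account_id": "acct_42"}},
        {"schema": "iglu:com.example/content/jsonschema/1-0-0",
         "data": {"content_id": "post_17", "name": "How to fix tracking"}},
        {"schema": "iglu:com.example/consent/jsonschema/1-0-0",
         "data": {"purposes": ["analytics"]}},
        {"schema": "iglu:com.example/experiment/jsonschema/1-0-0",
         "data": {"variant": "B"}},
    ],
}
```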

On my list is to experiment more with the aggregated user-id approach to introduce a level of user data that respects the individual but is still powerful enough for funnel and retention analysis.

![](/images/posts/ways-to-solve-the-data-user-identity/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fc25479a3-9ab9-4345-aa07-132a1df8d824_1080x1080.png)

Thanks for reading the hipster data stack! Subscribe for free to receive new posts and support my work.

**The &quot;how to use this content&quot; disclaimer**

After I wrote about taking every bit of data content with the right grain of salt, I use a disclaimer under each post to indicate how to use it.

Some concepts in this post are theoretical (the aggregated user id). I have implemented some ideas successfully (account id pretty often, content id, and campaign id in one project each).</content:encoded></item><item><title>Here is your stack of salt for reading or watching data content</title><link>https://timo.space/blog/here-is-your-stack-of-salt-for-reading/</link><guid isPermaLink="true">https://timo.space/blog/here-is-your-stack-of-salt-for-reading/</guid><description>Hey, welcome back. Maybe you have already read some of my content, or it is the first time. If you have read/watched something I have written or recorded - thank you for continuing to read it.</description><pubDate>Tue, 11 Apr 2023 00:00:00 GMT</pubDate><content:encoded>Hey, welcome back. Maybe you have already read some of my content, or it is the first time. If you have read/watched something I have written or recorded - thank you for continuing to read it.

But here is a warning - I might have **_influenced you_**. So let&apos;s hope it was for a good cause if I did.

First of all, it is a great thing when people share:

\- learnings

\- thoughts

\- rants

\- tutorials

\- reviews

about data setups, methods, frameworks, or projects.

Only when we are open to sharing our knowledge **do we make the benefits of data accessible** to as many people as possible.

This, at least, is my ideological agenda for why I share data content. Let&apos;s call it **the rational mind and goal**. We will get back to it.

But humans are complex; we rarely act based on one clear, rational goal. **We have a collection of agendas** when we do things.

And so is everyone who is creating data content. Myself included.

In the following, I try to give you different grains of salt you can apply when you read a piece of content, to add perspective. With that, you can better put content into the proper context. This doesn&apos;t remove any value from the content - quite the opposite, it adds value.

To make it easier to read, I will use myself as an example and share my different agendas influencing my writing and recording with you.

## Agenda 1 - what do I sell?

![](/images/posts/here-is-your-stack-of-salt-for-reading/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fc1fe4fb5-54cc-48e3-95be-bd3ed88af935_1542x694.png)

If you want to add just one context layer to a piece of content - that is the one.

Check the author&apos;s job and company.

Here is mine:

I am the founder of [Deepskydata](https://www.deepskydata.com/courses) - now the tricky part because it is in flux at the moment:

\- Deepskydata (from the website) is a platform that offers free data videos

\- Deepskydata (from network knowledge) is offering consulting and freelance services around tracking, analytics, and data engineering

The second one is easier - I produce content to find and build trust for future and existing clients. You will rarely find any call to action in my content to my consulting services, but I can tell that the content helps me a lot during an acquisition process of a new project (leads who know my content are more likely to convert).

The grain of salt is that I write this kind of content to position myself in a particular way for future projects. And it is true to some degree - I usually write about things that I already do as projects or that I want to do more of in the future.

So any founder creating content is doing it, to some degree, so that it positively impacts their product or service.

And this is fine if the content is about:

\- implementation examples

\- extensive description of projects (extensive - not just showing the sunny sides)

\- learnings from their product or business

Be careful when:

\- they mention their competition by name and put their product above them

\- they use provocative thoughts or classic clickbait - this is often aimed at setting their product and service apart from the &quot;usual&quot; way - it can make sense, but read it with plenty of grains

## Agenda 2: how deep is the knowledge?

![](/images/posts/here-is-your-stack-of-salt-for-reading/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f52e618ce-c92e-432f-95a7-9dd859ccb845_1566x630.png)

This is much more tricky to find out. Sometimes people ask me how I can test and write about so many different products.

The answer is simple: my depth of knowledge varies significantly between them.

I have ~5 products that I know at an intense level. The reason is simple: I have worked with them for years and on multiple projects. My knowledge is based on experience (and the extra effort to test out edge behaviors and ideas).

Then there are plenty of products that have enough similarities with the five products I know deeply.

I have a deep understanding of Amplitude, which also gives me a very good knowledge of Mixpanel, Heap, Posthog, and new kids on the block like Kubit and Netspring.

Why? Because the underlying principles of product analytics are similar across these solutions. Details and focus topics are different, but I can start from level 30 to learn about these.

When I write content about a specific product or approach, I only do that when I have some experience.

Some experience means, as a minimum for me, that I have already done an implementation end to end. Not always for a production environment - often I use sandbox environments.

When I write about thoughts or ideas, they are based on plenty of experiences, but the connection is not always visible. I will try to be more transparent in the future about which foundation led to a thought or idea, to make it clearer for the reader.

Good signs of knowledge:

\- code samples (beyond simple hello worlds) - someone shows how they implement things

\- understandable concept drawings - creating a concept drawing requires quite some effort to distill a topic down to a simple visualization

\- a clear description of a setup and steps/scopes of work

Pick your grains when:

\- when people skip complex steps by mentioning them in 3-4 words - like &quot;then you model the data&quot;. Spotting these hidden complicated steps is harder when the topic is new to you. But I use this method: when question marks start floating around your head while you read a simple sentence, the author has likely underplayed the complexity.

## Agenda 3: What are their background and history?

![](/images/posts/here-is-your-stack-of-salt-for-reading/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f27a0811a-1dc7-4494-bf6e-10673ba4cc90_1470x656.png)

This can also be hard to figure out.

The background shapes a lot about which products and approaches people write about. Because most of the time, we write based on our experiences.

Take me as an example:

I usually worked for smaller startups, so my tracking, analytics, and data engineering background is based on smaller and leaner setups.

I never worked for an enterprise with 1000 different data sources that needed integration, and I never worked in an actual streaming environment.

So if you read my content and wonder, &quot;Does this work in a big enterprise organization?&quot; - I can&apos;t tell you, because I don&apos;t know. You have to make the transfer yourself.

And this also goes in the other direction.

Plenty of data content is influenced by companies and people who worked at FAANG or in similar environments. I don&apos;t know why these groups have such a high content output, but they do.

The problem is that their approaches worked well for them in their setups (and they are remarkable in terms of scalability and streaming), but they are over-engineered for the rest of us.

Things to watch out for:

\- high focus on open source solutions (there needs to be someone to keep them up and running)

\- complex architecture diagrams

\- anything about streaming (Kafka, Flink) or heavy data transformations (Spark)

This content is still interesting to read but more as a guide for a future state where you have the same problems as high-scaled companies. And not as a blueprint for your next year&apos;s planning.

With these three grains of salt applied, you will get a lot more out of the content you read or watch about data.

## To finish:

What is my agenda when writing this?

As noted above - my idealistic view is:

Data content can help:

\- to provide you with real solutions to your implementations - giving you a shortcut

\- provide you with plenty of contexts, so you can adapt solutions to your setup and learn the mechanics

\- broaden your views with new approaches and solutions

So I get easily triggered by content:

\- that is written mainly for awareness (like &quot;XY is dead/broken&quot;) but does not provide any clear value beyond a useless rant

\- that is practically relevant for a small audience but six sizes too big for everyone else.

So I tried to write a small but practical guide for consuming data content. I hope this helps.

Let me know if you have any different approaches to data content.</content:encoded></item><item><title>Tracking, Measurement, Collection, Creation — again?</title><link>https://timo.space/blog/trackingmeasurementcollectioncreation/</link><guid isPermaLink="true">https://timo.space/blog/trackingmeasurementcollectioncreation/</guid><description>Trying to define something that needs definition but has a history that can&apos;t be changed easily.</description><pubDate>Sun, 26 Mar 2023 00:00:00 GMT</pubDate><content:encoded>Roughly two years ago, I wanted to do something that every business mentor recommends: specialize in something and create a category where people can recognize you easily, since the generic space is too crowded.

I did everything across the data stack at that time: from collection to pipelines, modeling, and monitoring. And, no surprise, everything was at a merely good-enough quality level, which left me pretty unhappy.

To find the special niche, I checked on two sides: What do I really like to do, and what is in constant need in all projects I have done so far?

And it was easier than I thought: I enjoy working where we create the data - this can be in frontends, backends, or webhooks. Just the place where something happens in the application and we say: &quot;Hey, that is interesting. Would it be ok if we piped this down to an analytical system?&quot;

And luckily, this was (and is) also one of the topics that come up in every project (at different severity levels).

I have my niche, but how do I name it?

That was a lot harder than finding the niche. And it still is. And this opens up the topic for this post.

Let&apos;s start with a technical definition:

&quot;When a specific action happens in an application (platform-agnostic), it can trigger an event, and we can send this event data to an endpoint where we store it for the analytical purpose (the purposes could also be automation, ML models,...)&quot;.

Quite a mouthful for a landing page headline.

At that time, I tested with potential clients which term let them immediately understand what I was offering. And it turns out there is one term that does precisely this:

### TRACKING DATA

Well, it was not something to open a bottle of champagne for. It&apos;s great that people immediately understand it (&quot;Do we need to fix your tracking?&quot; - &quot;Hell, yes, when can we start?&quot;). But not so great that the term has a bad reputation.

But let&apos;s start there with our journey of definitions:

&quot;Web tracking is the practice by which operators of websites and third parties collect, store and share information about visitor&apos;s activities on the World Wide Web&quot; - Wikipedia ([https://en.wikipedia.org/wiki/Web\_tracking](https://en.wikipedia.org/wiki/Web_tracking))

And Matthew Brand brought up an etymologic reference when we discussed on [LinkedIn](https://www.linkedin.com/feed/update/urn:li:activity:7043485742760611840?commentUrn=urn%3Ali%3Acomment%3A%28activity%3A7043485742760611840%2C7043521120834068480%29&amp;amp;replyUrn=urn%3Ali%3Acomment%3A%28activity%3A7043485742760611840%2C7043623411239018496%29&amp;amp;dashCommentUrn=urn%3Ali%3Afsd_comment%3A%287043521120834068480%2Curn%3Ali%3Aactivity%3A7043485742760611840%29&amp;amp;dashReplyUrn=urn%3Ali%3Afsd_comment%3A%287043623411239018496%2Curn%3Ali%3Aactivity%3A7043485742760611840%29):

![](/images/posts/trackingmeasurementcollectioncreation/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fcf4e7110-f814-4de6-bcf0-3d979b0230ea_1022x498.png)

Taking the initial &quot;trac&quot; and the Wikipedia definition gives us an excellent start on why tracking has a hard edge. We follow an individual digitally, collect their footprints (or better, fingerprints) along the way, and build (maybe even unintentionally) a profile. We intend to keep following and see where it goes, and we stop when a goal has been reached that satisfies us (the famous conversion) or when we lose track. Tracking on this level is not so far away from stalking, is it?

![](/images/posts/trackingmeasurementcollectioncreation/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f7a3b9ef5-fce2-445a-a9c1-cdad711aeabd_912x836-png.jpg)

Why do people immediately understand what service I offer when I mention tracking? For historical reasons, I guess. For as long as I can think back, tracking and analytics have gone hand in hand, with tracking pointing to the marketing part, where you tracked the performance of ads. And here the tracking part was essential, because to measure the success of an ad, you need to follow someone from the click to the conversion.

### What about Measurement?

[Stephané Hamel](https://www.linkedin.com/in/shamel/) used a definition quite like this:

![](/images/posts/trackingmeasurementcollectioncreation/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fc71c8422-4544-4d34-8e81-92ab8b903dc5_968x584.png)

He puts it in contrast to tracking, where the purpose is to observe users, and sees measurement as a more generic approach with no intention of identifying individuals.

This can also be true, since measurement is a much broader term. As Matthew put it here:

![](/images/posts/trackingmeasurementcollectioncreation/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f7b33e003-e37a-4445-b09b-e39dd6e884a5_932x246.png)

So tracking is just a special kind of measurement, which I think is true. But I still like the picture of a more generic measurement that does not need an identifier to collect individual activity over time.

![](/images/posts/trackingmeasurementcollectioncreation/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f2d3c2350-3a48-4186-b299-8fe7a9d906df_972x826-png.jpg)

Maybe measurement is the innocent version, where tracking already took the apple. Ok, perhaps too biblical.

### What about data collection?

Interestingly, since I also walk the different world of the data engineering island, I often come across the term data collection.

My unscientific explanation is that data collection is usually ignored by data engineers. The data drops in, and then the real work starts. Where does it come from? We don&apos;t care, since we have to clean it up anyway.

This also describes the term collection quite well. We only pick up what is already there, left by someone or something else. So if this data serves any tracking purpose, we might not care. Or, if we do, we make sure to tokenize it properly.

![](/images/posts/trackingmeasurementcollectioncreation/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f7648031a-20d8-45c7-b52d-c6bdd55e1f0e_1172x670-png.jpg)

What if we could come up with something friendly and positive?

### Maybe Data creation?

I don&apos;t know if Yali Sassoon was the first person to use the term in this context, but his post was the first time I saw it.

[He wrote](https://datacreation.substack.com/p/organizations-need-to-deliberately):

![](/images/posts/trackingmeasurementcollectioncreation/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f4e462103-85f6-46cd-a3b5-90be1cd8f374_1504x552.png)

What I like about this view is the proactive part and the real purpose of the process. I create this data point because it helps me understand whether we have generated value for our customers. It goes beyond the simple &quot;let&apos;s track our application,&quot; which usually ends with 95% of the data never being used. It also goes deeper than measuring something, where you focus on placing the sensor somewhere without a clear intention of what comes next. And it is way more active than just collecting data.

![](/images/posts/trackingmeasurementcollectioncreation/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f03c1f5dc-753a-4afa-8e71-5c1b3a2783a7_1246x644-png.jpg)

So, this would be the winner for me.

**Ok, I put &quot;How to fix your data creation&quot; on my landing page.**

**Not really.**

But language is lazy, solid, and static (ask women about &quot;guys&quot;), and it changes very slowly.

Sometimes a new term opens up a new area because it describes a changed behavior much better than the old terms. Analytics engineering is an excellent example.

Data creation is not there yet. But maybe we will get there in the future.

And you might have recognized that one term appeared throughout the post:

### Purpose.

Purpose was something that floated in my head, but unintentionally. Stephané used it with a lot of intention (same quote as above):

![](/images/posts/trackingmeasurementcollectioncreation/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f8c04ba90-4780-4e63-bb09-c6b063f49d0c_968x584.png)

And Aurélie Pols brought it up just recently in a different discussion in the same context of data privacy:

![](/images/posts/trackingmeasurementcollectioncreation/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f59d64ea7-f5f1-4f22-a8c4-a29e28ae09af_1088x236-png.jpg)

The purpose is what makes data creation powerful.

I first define the purpose of the data, and then I take care of the creation. With a goal in mind, we end up with less tracking because, to be honest, for plenty of use cases we don&apos;t need to know anything about individual behavior.

I will extend this purpose more in a future post.

</content:encoded></item><item><title>Retention Analytics - the definitive guide</title><link>https://timo.space/blog/retention-analytics-the-definitive/</link><guid isPermaLink="true">https://timo.space/blog/retention-analytics-the-definitive/</guid><description>There&apos;s just one thing to learn in product analytics - unfortunately it takes a bit</description><pubDate>Sun, 12 Mar 2023 00:00:00 GMT</pubDate><content:encoded>One of the reasons why I like to work on product analytics projects is that some questions are easy to answer.

Here is a regular one: What should be my North Star metric?

And here is my answer that fits 90% of all cases: a specific retention rate, like the first-week retention rate for your free users. Because you just need to love users coming back.

![](/images/posts/retention-analytics-the-definitive/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fac3c206d-eefd-457e-a162-cf12795c8520_2048x1152-png.jpg)

Wow, super easy. Isn’t it? So what are you waiting for?

This is great news for everyone building products and wanting to use data to optimize them - there is just one kind of analysis you need to master. The bad news: it is quite a steep hill to climb.

That might be why so **few teams use retention analytics** extensively to learn about their product activation and adoption.

So, let’s change this. My goal is quite a tough one - when you have finished this guide, you should have learned enough to open an analytics tool of your choice and create a retention rate that fits your product, telling you how you perform for product activation and adoption.

It might be a good idea to take the concept of retention apart as a start.

## What is retention?

Most straightforwardly - someone comes back.

This means there needs to be a first time and then at least a second time - and maybe more returns - of someone **we can identify**.

![](/images/posts/retention-analytics-the-definitive/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fecd43e5b-5a27-43ba-92fc-bf6ee3290f65_1520x614-png.jpg)

### What does “identify” mean?

To measure if someone has returned, we must be sure it is the same person. Therefore, our first essential requirement for retention analytics is a solid user (or account) identification.

I usually only do retention analytics when the service **requires a sign-in**. Only by that can we collect good enough identified user data to analyze retention.

### So what about the first time?

How do we define a first time, and is there even one first time?

Of course not; there are plenty of first times in a customer journey. So, we must decide which one to use for a retention analysis. As we see here, there will be plenty of different retention analyses - just by picking different first times.

![](/images/posts/retention-analytics-the-definitive/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2ff2ebcefa-1743-4396-9b63-e04c492970a7_1538x598-png.jpg)

Good candidates to pick as a first time:

\- **Account created** - the classic one and a good point to build your first retention; that is where everything starts (or at least where we indeed can track it and assign it to one identical account)

\- **Subscription started** \- another classic one, especially for revenue retention

\- **First value experience** - a lot more tricky, since you first have to create an event that marks when a user has experienced a first value from your service. This naturally varies based on your service/product. In an analytics tool, it could be the first chart shared; in a todo app, the first todo completed.

This retention can then be compared with the account-created one, which shows you the difference and the uplift you get when users experience a first value.

### Now the coming back

After you get your users to start something in your service, you need them to return.

Because as we all know, Acquisition &amp;gt; Activation &amp;gt; RETENTION &amp;gt; REVENUE. And we want to get to revenue eventually.

![](/images/posts/retention-analytics-the-definitive/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fb40634d8-9331-4d69-9cc9-61b105166584_1062x652-png.jpg)

We need to define the **“coming back condition,”** which is often a specific event, but in some cases, it can also be a combination of events or several events.

We use this “coming back condition” to analyze behavior over a more extended period, because we are primarily interested in seeing whether people return regularly and do something we hope they do.

**What are good coming back conditions?**

\- for the simplest retention, we used account creation as the first-time condition; as a counterpart, logging in, being authenticated (if you have a longer authentication cookie lifetime), or just any event is a good start. Here we are simply interested in getting users to return at all. This is fine for a start, but it doesn’t tell us if they get value out of the product

\- so the next one would extend the first-value example by simply taking the same value condition (see above) and using it as a “coming back condition.”

\- In the subscription context, it is more straightforward; this would be the subscription renewal as a “coming back condition.”

### How do we measure retention?

Now we have a “first-time” and a “coming back” condition. How do we calculate our retention? We can do this with a simple funnel and conversion rate.

![](/images/posts/retention-analytics-the-definitive/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f587ad655-3675-4d17-bae0-3f98bb651fd0_1068x632-png.jpg)

So we take all users that fulfill the &quot;first-time&quot; condition as the foundation and divide the number of users that satisfy the &quot;coming back&quot; condition by it. This gives us a conversion rate. Quite good and easy. But you will quickly see that it misses an essential piece that defines retention:
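The calculation above can be sketched in a few lines (illustrative code, not tied to any analytics tool):

```python
def retention_rate(first_time_users, came_back_users):
    # Foundation: everyone who fulfilled the first-time condition.
    # Numerator: those of them who also fulfilled the coming-back condition.
    if not first_time_users:
        return 0.0
    returned = first_time_users.intersection(came_back_users)
    return len(returned) / len(first_time_users)

signed_up = {"u1", "u2", "u3", "u4"}
logged_in_again = {"u2", "u4", "u9"}  # u9 never signed up in this window
rate = retention_rate(signed_up, logged_in_again)  # 2 of 4 came back -> 0.5
```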

**The time**

![](/images/posts/retention-analytics-the-definitive/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f423f24a6-c0d7-4133-8f2e-7dee136b0296_1568x896-png.jpg)

Looking at a conversion rate from a highly aggregated level does not tell us enough; it is too blended and generic because it ignores the time aspect. We want to see whether people are retained over time, not just in general.

How do we get time into this picture?

## Measuring Retention with Cohorts

One type of analysis lets us investigate the performance from a first-time event to a &quot;coming back&quot; event over a defined time: a cohort analysis.

And it gives us an analysis over time and a way to analyze how the performance improves or decreases with the release of new features.

That is quite a package - let&apos;s unpack it.

We keep our **&quot;first-time&quot; condition** or event. And we also maintain our **&quot;coming-back&quot; condition**.

Now we need to define how **we build a cohort**. A cohort is a group of users who share the same criteria.

The most commonly used criterion in a cohort analysis is time - whether a day, a week, or a month defines a cohort.

![](/images/posts/retention-analytics-the-definitive/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f7079daf6-5aee-4299-a521-6760c509b82a_1290x784-png.jpg)

Why is this helpful? It helps us quickly see if changes to our product are starting to increase the coming-back ratio.

Here is a visualization of weekly cohorts:

![](/images/posts/retention-analytics-the-definitive/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f7673ee88-abd0-4ba8-9c13-af0bdb29a594_1216x744-png.jpg)

Each row on the left is a cohort containing all users who signed up in that specific week. The coming-back percentage is the number of users logging in divided by all users in the cohort, on a weekly basis (in the next seven days, and so on).
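The weekly cohort computation described above can be sketched like this (a stdlib-only illustration; product analytics tools do this for you):

```python
from datetime import date

def week_of(d):
    # Label a date with its ISO year and week, e.g. "2023-W10".
    year, week, _ = d.isocalendar()
    return f"{year}-W{week:02d}"

def cohort_table(signups, returns):
    # signups: {user: signup date}; returns: [(user, return date), ...]
    # Result: {cohort week: {week offset: share of cohort that came back}}
    sizes, hits = {}, {}
    for user, day in signups.items():
        cohort = week_of(day)
        sizes[cohort] = sizes.get(cohort, 0) + 1
    for user, day in returns:
        if user not in signups:
            continue  # ignore activity we cannot assign to a cohort
        cohort = week_of(signups[user])
        offset = (day - signups[user]).days // 7
        hits.setdefault(cohort, {}).setdefault(offset, set()).add(user)
    return {c: {off: len(users) / sizes[c]
                for off, users in sorted(offs.items())}
            for c, offs in hits.items()}

signups = {"a": date(2023, 3, 6), "b": date(2023, 3, 7)}
returns = [("a", date(2023, 3, 13)), ("b", date(2023, 3, 8))]
table = cohort_table(signups, returns)
# Both users signed up in ISO week 2023-W10; "b" came back in week 0,
# "a" in week 1, so each cell shows 50% of the cohort.
```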

The table view of the cohort analysis immediately gives us feedback on whether we are improving our retention. How does it do that?

Let&apos;s assume we want to improve retention for the first week (the first seven days). On average, over the last 12 weeks, we have had a first-week retention rate of 60%. The team worked hard to find two new ways to onboard our users, and they deployed feature 1 three weeks ago and feature 2 last week.

Our cohort table now looks like this:

![](/images/posts/retention-analytics-the-definitive/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f52abe935-03b9-4e29-b53d-ee1d322b12bc_1266x764-png.jpg)

We can see that the first-week retention started to improve two weeks ago - which looks like a good indicator for feature 1. For feature 2, we need more data to come in. But as you can see here, we only need to watch each new cohort&apos;s first-week column to see if our efforts move any needle.

A/B testing is an excellent approach for evaluating new features, but cohort analysis gives you a second option and a view of the general performance.

### Going beyond the time-based cohorts

Defining different cohorts allows you to look at retention from different angles and find over- and under-performing cohorts.

Here are the cohorts I usually create for a product:

\- based on customer/subscription type - at least free vs. paid (if you have a free plan)

\- based on the initial marketing channel that brought the signup

\- based on subscription plans

\- based on the platform used - if you offer multi-platform products

\- based on feature usage

\- based on company size

\- based on the target audience (if defined and you track the data to identify them)

### How to work regularly with retention data?

First, define your **2-3 core retention tables**, like free vs. paid users, and put them on a dashboard. If you have a subscription model, add one for each plan.

When you start, use people simply coming back as your coming-back criterion. From there you can advance to defining how they get value from your product and use that as the criterion.

Then define which period makes the most sense for these. Ideally, you can do this based on when you expect the users to return. In most products, this is on a weekly basis, and for subscription retention, it is most often on a monthly basis.

Now check the retention tables and see which retention rate makes the most sense as a focus metric. Pick the one where the drop is most significant. When you are starting, this is usually already the first seven days.

Now you can slice and dice with different cohorts to see if cohorts significantly perform better or worse than the general retention rate.

And then comes the fun part - work with the team to move the retention rate and always keep in mind:

\- **better retention for free users -&amp;gt; higher chance of subscription conversion**

\- **better retention for paid users -&amp;gt; lower chance of churn**

Sounds quite good to me - your path in product to becoming **a revenue hero.**

There are more details about retention analytics - so I will extend this guide over time. Make sure to subscribe to get the updates.

Topics to cover:

\- extensive funnel/retention tables

\- forecasting with cohorts

\- rolling or fixed definitions

![](/images/posts/retention-analytics-the-definitive/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fce41ccf3-7466-42cb-bb8e-c07bf6511549_1080x1080.png)

Thanks for reading the hipster data stack! Subscribe for free to receive new posts and support my work.</content:encoded></item><item><title>Why product analytics is completely different than BI</title><link>https://timo.space/blog/why-product-analytics-is-completely/</link><guid isPermaLink="true">https://timo.space/blog/why-product-analytics-is-completely/</guid><description>Everyone in data has a unique origin story that brought them into data. At least I never met someone who didn’t have one.</description><pubDate>Tue, 28 Feb 2023 00:00:00 GMT</pubDate><content:encoded>Everyone in data has a unique origin story that brought them into data. At least I never met someone who didn’t have one.

My origin story is a classic one (at least I heard it plenty of times from others as well): I was working as a product manager for a marketplace. And my backlog was basically the management team. They assembled once a week to discuss anything about the business, and a part of it was filling my backlog with their input. So my backlog was a collection of features of competitors (90%) and gut feeling features (10%). No surprise; that sucked a lot.


![](/images/posts/why-product-analytics-is-completely/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fc322de43-7c93-44d4-8c42-086c3dac3861_1024x770-png.jpg)

My answer to this was data. Qualitative and quantitative. This gave me some superpowers in these meetings because I could respond with:

\- I can see that only 0.3% of users use this feature, and their avg. revenue is not significant

\- Based on interviews, user test sessions, and funnel data, we can see that we lost 78% here

\- Sounds like a good idea; we can test it if you like against the current one

These things drastically changed the conversation, and the fun returned for me. (Looking back, it was too easy; they were not qualified enough to challenge my approaches and deductions and thereby make them, and me, better. But that came later.)

So my approach to data comes from a product perspective. And this is still what drives everything in data for me. When I get frustrated and think, “why am I doing this?”, the answer is: because I want good products to exist so that they can provide real progress to their users. You know, the bicycle-of-the-mind thing really is my driving force. Building products that customers love ([Marty Cagan](https://www.amazon.com/INSPIRED-Create-Tech-Products-Customers/dp/1119387507) had an immense influence on my early work).

Product analytics as a category came a lot [later](https://web.archive.org/web/20161206004549/https://amplitude.com/), but I was home there already (even when it was not called that). I was an early adopter of Kissmetrics (omg - I loved this tool so much),

![](/images/posts/why-product-analytics-is-completely/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f7cfcbb34-f9a1-45ab-bf85-f5ccf8103e6f_1982x828-png.jpg)

And Mixpanel and Amplitude. Just because Google Analytics never did the job for me. I needed funnels (working funnels), cohorts, and user explorers.

At that time, I did not really realize how different my way of working with data was, compared to other web analysts. Their focus was more on classic marketing analytics or website usage use cases.

And funnily, the same thing happened later when I struggled to find a proper data model for product event data in a data warehouse. People did not understand why a star schema was not really working for me (40 fact tables, one per event type - join hell, you get it). Because their focus was on classic BI and marketing reporting.

And finally, I had so many conversations during the last 12 months about what makes product analytics so hard for product teams and why so few really embrace it and use it in their daily work. Which still keeps me up at night.

**That got me thinking - what makes product analytics so different from marketing analytics or classic BI?**

## The source data

**Marketing analytics** doesn’t need much source data:

\- event data for the core funnel events - these are usually 5-8 events - super easy to implement

\- aggregated data from the marketing platforms

![](/images/posts/why-product-analytics-is-completely/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fe866c50e-0437-4249-8ad3-aea6d5989cb8_1024x652-png.jpg)

_Credits: Excalidraw_

**Classic BI reporting** can have plenty of sources, but it can be easily modeled into facts and dimensions (potentially using something like a data vault to handle changes in the source data under the hood).

![](/images/posts/why-product-analytics-is-completely/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f9be19b9b-a7e6-4d89-b30a-a387b4fe955f_986x924-png.jpg)

[_Thoughtspot_](https://www.thoughtspot.com/business-leader) _demo report - quite a classic BI report. Metrics on time periods._

**Product analytics** has 2-3 types of source data (depending on the business model):

\- event data for product usage - can be up to 30-40 different events (or if not designed properly, some people end up with 300-500 different events)

\- user dimensional data - which can change pretty often (forget slowly changing dimensional data)

\- account dimensional data (if you have an account feature that can have multiple users) - which can change pretty often

![](/images/posts/why-product-analytics-is-completely/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2ff3e97371-f4f7-4e01-9e7d-6fe6ed27620f_1114x748.png)

_Credits: Excalidraw_
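For illustration, the three product-analytics source-data shapes could be sketched like this (a minimal sketch; all class and field names are assumptions, not a fixed schema):

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class ProductEvent:          # one of ~30-40 event types, identified by name
    event_name: str          # e.g. "task_created"
    user_id: str
    timestamp: datetime
    properties: dict = field(default_factory=dict)  # context at event time

@dataclass
class UserDimension:         # changes often - forget slowly changing dimensions
    user_id: str
    plan: str                # e.g. "free" / "paid"
    updated_at: datetime

@dataclass
class AccountDimension:      # only if multiple users can share an account
    account_id: str
    company_size: str
    updated_at: datetime
```

The point of the sketch: the event stream is append-only and high-volume, while the two dimension shapes are mutable snapshots that change frequently - a combination that neither a marketing funnel export nor a classic fact/dimension model handles comfortably.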

These are extremely different requirements already for data creation/sourcing or collection. It takes a lot more effort to implement the instrumentation for product analytics.

## Working with the data

In **marketing analytics**, everything is based on campaign performance and optimization. So you usually end up with a simple presentation layer representing the core funnel with some dimensional data about the marketing context (campaign, ads, channel,..).

The real marketing analytics challenge is handling conversion and cost attribution.

But once you have a solid model and a final reporting table, you are done. Some small adjustments over time, but not much. And the people working with the data will mostly work by comparing different dimensional combinations to find over and underperformers. So mostly filter work. Or you save yourself some time and use a tool like [Kausa](https://www.kausa.ai/).

**BI** is not that much different; you just have more, and different, reports for the various teams. But you usually end up with a set of reports that give users the ability to filter and break down by dimensional data.

**Product analytics** often starts on a blank sheet. You want to find patterns that tell you which kind of product usage leads to a high customer lifetime value. This sounds simple, but in the end, it is a bucket with millions of variants.

So, we could say that product analytics is exploratory, and BI is descriptive. That sounds harmless, but in practice it requires a completely different way of working with the data.

## How do you work in product analytics?

As I said before, determining which product usage leads to maximum customer lifetime value is the main objective.

When we unpack this we have first product usage. Product usage consists of:

\- a sequence of events (results of user or system actions) over a long period of time (weeks, months, years) conducted by one identified user

\- properties for these events (dimensional data) at the time of the event

\- properties for the user (dimensional data) that can change over time (quite often not slowly)

![](/images/posts/why-product-analytics-is-completely/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fd6839512-dbeb-4b41-bde8-4f9ce1f294dd_1286x522-png.jpg)

This alone leaves us with millions of combinations we would need to analyze (or parameterize if we want a model to do the job for us).

Customer lifetime value is, in theory, easier: just the amount collected over time through subscription or one-time revenue. Since you usually work with identified users, it is easier to calculate than in other business models. But waiting for monthly revenue to add up is often too late. So you need earlier indicators of whether your product works, and these can be hard to find.

In this labyrinth of options, you need to generalize to get indicators about your product performance and starting points where you want to explore patterns.

![](/images/posts/why-product-analytics-is-completely/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f7291a363-4978-4383-a51a-d6d8d258df4b_1728x864-png.jpg)

### **These general approaches are usually**

**#1 the core product journey funnel** (you sit down and sketch what a successful product usage looks like, then generalize it a lot, so it fits in a funnel with 6-8 steps).

![Interpret your funnel analysis – Amplitude](/images/posts/why-product-analytics-is-completely/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f4171887f-85c9-4465-bf42-102e08f616fc_2258x1162.png)

_Funnel in Amplitude_

**#2 2-3 different retention curves** (free vs. paid users, different ones for your main product use cases), and then identify which retention rate you want to focus on (2nd-week retention, 2nd-month retention).

![Interpret your retention analysis – Amplitude](/images/posts/why-product-analytics-is-completely/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fa9d44bac-22f5-423b-bf1b-1c5cdd38daf9_1179x679.png)

_Retention in Amplitude_

And then you start to explore. You take your baseline retention curve and start to break it down by different dimensional data. Or you combine different criteria to create different cohorts and then compare them in the retention chart.

This whole process takes time and only gets faster and better when you execute it constantly over a long time. It requires pure data senses, a good product understanding, and, ideally, a combination of qualitative data to get new ideas for quantitative exploration.

**But exploring is the key here.**

This makes the analyst role and mindset completely different from the one of a marketing or BI analyst. You need a lot more patience and time to conduct product analytics.

This also makes the technical requirements very different. There is a reason product analytics platforms are still dominant, even when a data warehouse exists and all other reporting is based on that data. Because the work is mostly exploration across event sequences and dimensional breakdowns, it would take a crazy SQL hacker to achieve the same by writing SQL in a reasonable time.

## What does that mean for a company?

If you already have a decent BI setup serving different teams, especially the marketing team with performance reports, adding the product team is not simply an extension.

It will take a different approach, different processes, different mindsets (and therefore, most likely different people), and most likely a different tool for this job.

![](/images/posts/why-product-analytics-is-completely/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fec4f5701-4a4d-4d5e-8fed-7f868c038da0_1728x864-png.jpg)

Good luck if you try to tackle it with your existing analytics (like Google Analytics) or BI setup (Tableau, Looker,..). I can already guarantee you that you will fail.

Adopting product analytics is hard but can be extremely rewarding when achieved. You need to invest first in how you approach the implementation and adoption of product analytics.

Look for the right people to own this process from day one. These are usually people with product backgrounds but with the mindset to happily spend hours slicing and dicing product data. The emphasis is on product background - without it, any analytical talent will not help you.

Then start with defining, tracking, and analyzing the core product usage funnel. Then set up your core retention curves. Once this is in place, the real expedition can start.

![](/images/posts/why-product-analytics-is-completely/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2fe59df57c-be1c-4459-90b0-51ab7b0e740a_1080x1080.png)

![](/images/posts/why-product-analytics-is-completely/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2ffbe12880-686b-47e2-9d12-f1185549d9d0_1728x864-png.jpg)

Thanks for reading the hipster data stack! Subscribe for free to receive new posts and support my work.</content:encoded></item><item><title>More than 30 unique tracking events will cause you problems</title><link>https://timo.space/blog/more-than-30-unique-tracking-events/</link><guid isPermaLink="true">https://timo.space/blog/more-than-30-unique-tracking-events/</guid><description>The easiest way to increase data adoption, productivity and decrease quality problems</description><pubDate>Sun, 12 Feb 2023 00:00:00 GMT</pubDate><content:encoded>I really have a problem here. People don’t take me seriously about my take on too many unique events.

![](/images/posts/more-than-30-unique-tracking-events/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f74542c5e-09dc-4c1c-a34f-637712d748a9_1536x1536-png.jpg)

Ok, maybe I was a bit too clickbaity in the past. Here is my video - “You only need eight events to track your business.”



But to be honest: this is also still true. For core business performance, eight events are usually sufficient.

## Let’s look into some definitions

### Unique events

But what actually are unique events?

An event usually has an event name, like “account created” (or a different notation if you use a different system).

When I talk about unique events, I mean unique event names.

### More than 30 unique events

I haven’t done any quantitative research here. It’s more of a threshold based on experience. Maybe 40-50 events will still work out for you. But in all the projects I worked on, beyond 30 the problems started to appear - the ones I will describe in a second.


## The problems with many events

### The Analyst productivity dilemma

If an event is not analyzed, it is useless (unless it is used in automation).

The typical analysis scenario is a classic exploration scenario. You start with an idea or a curiosity. Based on that, you open up your analytical tool of choice (a product analytics tool, a notebook). Now comes the first critical question: On which event data can I base my exploration?

Let’s take a real-life example. You are a SaaS company with a free plan. Your current conversion from the free to the paid plan is still quite poor, and you want to determine whether there are conditions under which free users are more likely to convert to a subscription.

So you would build different funnels, starting with whether a specific feature has been used, followed by whether the user converted to a paid subscription.

You will start with a feature list from your product sorted by what you think are the essential features. And now you quickly want to build out different funnels.

But this requires that you find the right event for “a feature has been used” and for “a subscription has been created.”

If you have a focused setup with around 30 events, finding potential candidates for the analysis should not take too long. At least the chance is low that there are multiple ones.

Finding the right events in a 150-unique event setup will take plenty of time. Because you often end up in a scenario where you discover multiple candidates for the right event. These candidates sometimes have quite similar names (subscription created, subscription submitted, create subscription, subscription started). So you first analyze all candidates for volumes and dependencies to rule out unlikely ones. And hopefully, end up with one promising candidate. And if not, your last resort is a debug session or speaking with a dev to figure out where the different events are triggered.

I call this process the analyst productivity dilemma. You are highly motivated to find clues to improve the product but need to spend days, sometimes even weeks, just to get started. It’s like using a search engine where the first message tells you that an index for this question still needs to be built and you should come back later.

### The documentation excuse

The most given answer to this problem above is: you need better documentation.

And it’s true - really good documentation would solve this problem. In an ideal world, you identify candidates for the right event and then check the documentation for how they are defined and triggered. The whole process takes a few minutes, and you are ready to start.

But let’s be honest - if our answer is: better documentation - we are most likely doomed.

Yes, it is possible to have really good documentation. But this requires a lot of resources, strict processes, and documentation monitoring. And yes, there are tools like Avo that help significantly with it. But still, when a developer doesn’t clearly state how they implemented the event, the question marks don’t disappear.

And now, think about maintaining this documentation for more than 30 events. This already requires a team.

### The monitoring problem

In a serious setup, you would monitor your events for two aspects:

\- syntactical correctness

\- volume changes

The first tells you if a new release suddenly has changed the event name slightly. The second tells you if there is a potential triggering issue when the volume drops or increases significantly.

All this monitoring is doable: syntactical correctness with a tool like Avo, volume with any data observability tool (where you potentially have to write a test query for each event - have fun doing that with 100 or more events).

But when you start to receive alerts for more than 100 events, you need to figure out when and how to react to this. Alerting on volume always creates noise - more events, more noise.
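A volume check of this kind can be as simple as comparing today’s count per event against its historical daily average. A minimal sketch (the function name and the 50% threshold are hypothetical choices, not from any tool):

```python
def volume_alerts(history, today, threshold=0.5):
    """history: {event_name: [daily counts]}; today: {event_name: count}.
    Flags events whose volume today deviates more than `threshold`
    (50% by default) from their historical daily average."""
    alerts = {}
    for name, counts in history.items():
        avg = sum(counts) / len(counts)
        current = today.get(name, 0)
        # relative change vs. the historical average (0.0 if no history)
        change = (current - avg) / avg if avg else 0.0
        if abs(change) > threshold:
            alerts[name] = round(change, 2)
    return alerts
```

Note how the noise problem scales: every entry in `history` is a potential alert, so 100+ events means 100+ thresholds to tune and 100+ ways to get paged.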


## Why do we create many events?

### The legacy problem

I would say the major reason for the abundance of events is “generations” of event ownership. It’s easy to define and implement an event. But time moves on, and the documentation is not precise enough. There comes a day when someone else needs a similar event. She finds the existing one but is unsure whether it covers what she is looking for. Not even the devs can give a clear answer. So, to be sure, they implement a new event (for the same case) at a slightly different place in the code base. And this process can repeat itself multiple times.

I worked on setups where we had over seven instances of the same event just because of this.

### The ownership problem

In a small startup, it should be unlikely, but I have seen it even there. Multiple teams require event data, and their use cases and preferences usually differ. When there is not one team responsible for the event tracking, you will almost certainly end up in chaos.

But even with a team that owns the event tracking, aligning all the requirements is a huge effort, and you always risk a severe bottleneck situation.

## How to avoid too many events

### By design

A good tracking event design is the best way to prevent too many events. My approach is to develop a source-agnostic event concept that is built around the customer and product/service journey and choose a high enough aggregation level.

![Inbox user interface](/images/posts/more-than-30-unique-tracking-events/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f48fa7212-1ad4-48ae-81cd-2752b2f20504_1796x1370-png.jpg)

That means breaking down your product/service journey into the entities that make up the product.

So for a task management tool, you might have these entities: customer, user, task, list - that’s it.

Make sure you work a lot with properties to keep the number of unique entities down.

For each entity, you then design the activities that matter. These usually cover the lifetime of an entity. So customer lifetime activities, task lifetime activities.

With these approaches, you will end up with 20-30 events.

But what about those granular events that are irrelevant to business questions but important for evaluating specific feature usage? E.g., how a list got sorted. Treat these as catch-all events like interaction clicked (element-type: sort, element-value: due date desc, element-text: sort by,…).

When you provide enough context in the properties, these events work perfectly for quick feature evaluations. When they become more important, upgrade them to proper product events.
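As a sketch of the idea (property names adapted to underscores; everything here is illustrative, not a fixed spec): one shared catch-all event carries the feature context in its properties, and a small filter pulls a single feature’s interactions back out of the shared stream:

```python
# A hypothetical catch-all event: one shared name, context in properties.
interaction_clicked = {
    "event_name": "interaction_clicked",
    "properties": {
        "element_type": "sort",
        "element_value": "due date desc",
        "element_text": "sort by",
    },
}

def filter_catch_all(events, **criteria):
    """Select catch-all events whose properties match all given criteria."""
    return [
        e for e in events
        if e["event_name"] == "interaction_clicked"
        and all(e["properties"].get(k) == v for k, v in criteria.items())
    ]
```

The unique-event count stays flat no matter how many UI elements you instrument; only the property values grow, which is the cheap direction to grow in.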

### By ownership

We had this already before. One team (or person when you start) needs to own the tracking event schema.

The major challenge for this team will be not to become a bottleneck. A good design can help with that.

So when a new tracking event request comes in, you check:

\- is this event already covered by the core customer or product events, and is maybe just a property missing to serve the exact use case? Extending and growing properties is less of a problem

\- if not, introduce the event as a catch-all event and show the team how they can work with this event (how to filter it out from the other catch-all events)

Besides new events, you are also responsible for event maintenance. Check regularly how the different events and properties are used in reports and analyses. Can they be improved or even reduced? If yes, do it. One event less is a big productivity win (see above).

## Embrace the lean event approach.

I have seen many problems with tracking data quality and adoption that had their root cause in the event design.

Based on my experience, you can drastically increase data productivity if you invest time and effort into data creation. And limiting the unique event count is a really good measure for this.

What are your experiences with high event volumes? Are you skeptical about reducing them? If yes, let me know in the comments.

[Subscribe now](#/portal/signup)</content:encoded></item><item><title>The data’s trojan horse</title><link>https://timo.space/blog/the-datas-trojan-horse/</link><guid isPermaLink="true">https://timo.space/blog/the-datas-trojan-horse/</guid><description>This is our dashboard at deepskydata - To be more precise, a part of it.</description><pubDate>Sun, 05 Feb 2023 00:00:00 GMT</pubDate><content:encoded>This is our dashboard at [deepskydata](https://www.deepskydata.com/) - To be more precise, a part of it.


It’s our only dashboard. There will be a second one tracking our initiatives, but that’s it. We don’t need more - we are young and laser-focused.

Two dashboards are great for me because I dislike building and maintaining them.

Both dashboards serve two different purposes:

1 - See if deepskydata is ok and moving - we currently focus only on member growth, so that’s what we are looking at. No retention, videos watched, or courses finished; all of that is important, but for a later stage.

2 - See how our bets are doing. We launch and try new things, smaller and bigger ones. Because we are not god-like, we want to see how the bets are doing. Do they move the needle or not?

So, I really resonated with Benn’s latest edition

[The insight industrial complex](https://benn.substack.com/p/insight-industrial-complex) — At some point over the last decade, every company became a platform, leeching all meaning out of the word: Peloton, which sells indoor exercise bikes, calls itself an “interactive fitness platform.” Casper, which sells mattresses, is a “platform built for better sleep.” Beyond Meat, which sells faux-burgers that taste like bee…

when he talks about getting to the basics of insights:

&amp;gt; “… we celebrate and reward the “simple” work of creating basic reports and of working with experts around the business…”

(sorry, I took this a bit out of the immediate context, but the meaning stayed how it was written).

Benn was pointing out the kind of ridiculous approach of BI tools to be more than just tools that provide “basic reports.” They want to be platforms for whatever. From a marketing and sales perspective, that makes sense. Who wants to justify the budget for a simple tool when you can get an insights platform that paints a bright future of data enlightenment?

But the problem companies have are not missing insights from data; it starts much earlier.

Most setups of BI platforms are **dumpsters of misunderstandings**. Between everyone. My favorite is still Metabase, who wanted people to approach data thoughtfully, so they introduced the concept of a question - first you ask, then you generate the results. And everyone later finds these questions and extends them. Most of the Metabase setups I know are a collection of 300+ questions, nothing related, most of them with data issues - in the end, of no use to anyone. And, of course, this is not Metabase’s fault (maybe a bit, with the question approach). It’s the people and the communication.

As [someone much smarter](https://twitter.com/ejames_c/status/1593074802050617348) than me pointed out, data is:

People &amp;gt; Process &amp;gt; Tools

Looking at this, I have an idea for BI tools: **Why not start with the people?**

Data and the rest of the business are often relatively isolated from each other. They live on different islands, speaking different dialects and having totally different lives.

Sure, there are multiple ways to improve that. More communication, shared work - classic relationship management - all the things that data teams love to do (skip pipelines for meetings).

But there is an additional way that not many have realized yet (including the tool vendors).

Dashboards are the common interface between data teams and the rest of the company. They are the product of all the pipeline, transformation, testing, and modeling efforts. And they are usually something that the other teams look at regularly.

So why don’t we use this space to improve our communication between data and the rest of the world? **Like a trojan horse into the growth, product, sales, and operations teams**.

Of course, a dashboard shows some numbers and charts, but why end there?

Let’s take our simple “is deepskydata still moving” dashboard as an example.

Sometimes deepskydata is not moving - numbers don’t change. So is nothing happening, or is something broken?

Some extensions:

\- Freshness state: when last generated or cached

\- Links out to results of e2e tests, tracking logs (quick check if something has failed)

\- Or in a v2 - results of these tests are pulled into the database and can be viewed with a click as context sidebar

\- Quick ticket generation - when something looks off - we provide a button to generate a “check-up” ticket quickly - when we do a good job, we provide a bunch of context from the data already to the ticket (charts, logs,…)

Or maybe these charts and numbers create plenty of question marks? What do we mean by members? Where does this data come from?

For everyone now googling the data catalog, forget it; it’s not suitable for business teams.

Why not:

\- Create and link Loom videos to dashboards - explaining the dashboard itself - or just every metric

\- Quick button for people to ask a question about the chart and when implemented - why not have a Q&amp;amp;A part linked to the chart

\- Ask the users if they trust this metric or, if not, why not? - So you can generate trust ratings for dashboards.

And maybe these charts are just sparking the creative minds of the viewers.

So let’s give them some space:

\- Quickly ask for a different view of a chart (and yes, a cool system could now reference existing versions of this chart)

\- Or create a ticket from it asking about the why and what

I think the options are limitless once we leave the path where dashboards are just for numbers and instead treat them as a starting point for a conversation. And yes, some tools allow comments on dashboards: a good start, but just a start.

And maybe there will be BI tools (platforms) that understand their real power - to support the conversation between data and business teams. To help with the people part of the problem.

Thanks for reading the hipster data stack! Subscribe for free to receive new posts and support my work.</content:encoded></item><item><title>Sometimes, you need to say: screw it</title><link>https://timo.space/blog/sometimes-you-need-to-say-screw-it/</link><guid isPermaLink="true">https://timo.space/blog/sometimes-you-need-to-say-screw-it/</guid><description>**When you ask me, what kind of book can you recommend?** Thanks for reading the hipster data stack! Subscribe for free to receive new posts and support my work.</description><pubDate>Sun, 29 Jan 2023 00:00:00 GMT</pubDate><content:encoded>


My answer in 95% of all situations:

“[Obviously Awesome](https://www.aprildunford.com/obviously-awesome)” by April Dunford.

As you can see, it’s not a data book. So why this one? Because it covers the one topic that is hard to get right but compelling when you do: positioning.

Positioning can be applied to many things: your products, obviously, but also to yourself (in a team, a community, …) or to your data team.

I spent plenty of time in the last months doing exactly this - figuring out how to position myself in the data space and how to position [Deepskydata](https://www.deepskydata.com/) as a learning platform.

I defined what topics I should cover, in which way, and how they connect the dots between LinkedIn, YouTube, this Substack, and then over to Deepskydata - a magic machine where all effort puts maximum energy into one flywheel.

**So did it work?**

In parts, yes. I got a better idea of my strengths, and, for example, I decided that Deepskydata should primarily be a free platform giving a broad, worldwide audience access to data learning.

**But as it happened to me repeatedly, I overdid it.**

I was so keen that everything I did should serve a clear purpose and work together hand in hand:

\- LinkedIn posts should convert people to YouTube, Deepskydata, or Substack

\- I only cover reasonable data setups - the ones I would do in a consulting situation

\- I try to cover new emerging topics like data products or impact strategies for data teams

In the end, **creating content was no fun anymore**. Take this Substack - I have already changed the title and positioning four times.

It finally became apparent when I talked to a friend who also produces content. He told me he had lost the motivation to write about the topic most people know him for. He just needed something new. So he switched topics, and his energy came back.

After that, I realized I had to do the same.

I’m just returning to the things I love - even when they don’t fit together.

This means for me:

\- writing on LinkedIn about the things I learn along the way, and the random thoughts I get while driving in the car. No more segues and pure promotion for stuff on other platforms; I will mention what I do in a comment, but more like a footer to my posts.

\- use this Substack to write about things I really love - even when they are highly unreasonable.

## The return of the hipster stack

When I talk with consulting clients about future setups, I have a clear message:

\- Data is always: People &amp;gt; Process &amp;gt; Tools

\- Start simple, master the simple setup and then upgrade

\- a tool is never a solution

But here is a secret about myself:

I am my worst customer. A data setup is only fun when I can play around with new tools and approaches.

The best example is Deepskydata - no, we don’t have a simple data stack:

\- we use a custom proxy instead of server-side GTM

\- we use Tinybird as a real-time data pipeline (and also for the upcoming personalization of the platform)

\- we do all dashboards in real-time

\- I have to hold myself back to stick with a simple database concept and not try new ones (Firebolt, ClickHouse, …) - if [PuffinDB](https://github.com/sutoiku/puffin) were pre-production and not just a concept, I would go for it.

So what should I do with that?

I keep my reasonable approach when I work with clients. It would be unacceptable to propose experimental or overfitted setups when I am out of the game after a few weeks.

For myself.

**I embrace the hipster data stack**: the comfortable approach of using and trying out new tools and ideas, of jumping into friendly rabbit holes.

When I resurface, I will write about it.

So, don’t forget to subscribe. Ah, I mean - do whatever you like!

Thanks for reading the hipster data stack! Subscribe for free to receive new posts and support my work.</content:encoded></item><item><title>Using EventStorming when building internal data products</title><link>https://timo.space/blog/using-eventstorming-when-building/</link><guid isPermaLink="true">https://timo.space/blog/using-eventstorming-when-building/</guid><description>why data teams need to become the domain experts of all domains in a company</description><pubDate>Sun, 22 Jan 2023 00:00:00 GMT</pubDate><content:encoded>When I was 16 years old, I loved adventure games. All my friends loved them too, and we spent afternoons discovering Monkey Islands or Maniac Mansions.

But I had a problem. While my friends had no problem investing hours and hours in figuring out where to find this missing piece or the right words for a pirate sword contest, I didn’t have any patience for that.

Which was a pity, because I was so much into the stories of these games. I wanted to learn what would happen to Guybrush and Elaine, about Zak and the aliens, and what makes Edna tick in her mansion.

![Maniac Mansion on Steam](/images/posts/using-eventstorming-when-building/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2feeb11ad8-9263-49ba-a73a-33eab89bfe2f_1280x720-jpeg.jpg)

My solution? Cheating.

I let my friends tell me the solutions. And I used walk-throughs from magazines (yes, that was pre-internet as we know it). Was that bad? My joy came from experiencing the game’s journey, not from solving the puzzles.

If I built a trillion-dollar app today, it would be a walk-through for start-ups. You would buy it, follow the steps, and enjoy the journey, the fame, and the fireworks exit. Yeah, that would be great. But stop dreaming; I don’t have this app for startups 😱😢.

But I have it for data teams who want to build internal data products. And I am giving it away for free—here and today.

Let’s unpack it.

What you can get as a data team is a detailed map that shows where to find your impact treasures - lying in the open, surfaced with a bit of work.

So where can you get this treasure map?

You provide the canvas and let your internal customers paint it.

Let’s dive into it.

## Introducing EventStorming

The good news is that a very simple workshop format will help you map out how a team in your company works.

### Why are we interested in how a team works?

In data, you will often hear: “try to learn about the problems the team wants to solve and base your work around them.” But problems are problematic. They often include aspects that can’t be controlled easily, and they tend to have too many input variables and complex interconnections. Because of this, they can quickly become hard to solve and will eat your resources for breakfast.

Focusing on the functions of a team is much more straightforward. First, it is much easier to map and understand a function (this is what we will do with EventStorming). Second, it is much easier to help improve a function - it’s just optimization, and data teams are extremely good at optimizing things.

### What is EventStorming

Alberto Brandolini invented EventStorming, describing it in [his blog post](http://ziobrando.blogspot.com/2013/11/introducing-event-storming.html) from November 2013 - see [https://en.wikipedia.org/wiki/Event_storming](https://en.wikipedia.org/wiki/Event_storming).

It is a specific workshop format to make actions, events, actors, and more visible in a particular domain. Alberto invented it in the context of [Domain Driven Design](https://en.wikipedia.org/wiki/Domain-driven_design). But I find it extremely useful in all different areas where you must figure out how things work together.

In an EventStorming workshop, a group of people with domain knowledge map out (or storm out) all events from a starting point to an end state.

They start by placing the events that happen along the way on a timeline. Then they extend these events with the specific actions that trigger them, the systems involved, the data needed, and the places where open questions and challenges sit.

The process can take a while - sessions of 2-3 hours or longer are not rare, depending on the scope of the analysis.

The outcome is a massive map with all events, actions, actors, systems, and data covering a specific function of a domain.

![](/images/posts/using-eventstorming-when-building/https-3a-2f-2fsubstack-post-media-s3-amazonaws-com-2fpublic-2fimages-2f24239a41-7110-478c-bc41-c739778fc09a_2762x1370.png)

The workshop can be done in person, using a huge whiteboard or wall to map things. Everyone gets post-its that represent the different assets:

\- 🟧 Orange: Events

\- 🟦 Blue: Actions/Command

\- 🟨 Yellow: Actor

\- 🟪 Purple: Business Rules

\- 🎟️ Pink: External System

\- 🟩 Green: Data needed

\- 🟥 Red: Errors, Challenges

But it also works fine in a remote environment using tools like Miro, Whimsical, or others.
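If you digitize the board afterwards, the sticky colors above map naturally onto typed records, which lets the data team query the session results later. A small hypothetical sketch (not tied to Miro or any other tool; all board entries are invented):

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical digitization of a storming board: each post-it becomes a
# typed record so you can e.g. count open challenges per event afterwards.

STICKY_TYPES = {
    "orange": "event",
    "blue": "action",
    "yellow": "actor",
    "purple": "business_rule",
    "pink": "external_system",
    "green": "data_needed",
    "red": "challenge",
}

@dataclass
class Sticky:
    color: str
    text: str
    linked_event: str = ""  # which orange event this sticky belongs to

    @property
    def kind(self) -> str:
        return STICKY_TYPES[self.color]

# An invented fragment of a board for one event on the timeline.
board = [
    Sticky("orange", "Trial started"),
    Sticky("blue", "User submits signup form", "Trial started"),
    Sticky("green", "Signup attribution data", "Trial started"),
    Sticky("red", "Unclear who owns trial emails", "Trial started"),
]

challenges = Counter(s.linked_event for s in board if s.kind == "challenge")
print(challenges["Trial started"])  # 1
```

The payoff is that the map survives the workshop: open challenges and data needs become a queryable backlog instead of a photo of a wall.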

Sounds good so far? Let’s walk through an example in the next step. But first, some links to videos and templates:

\- Miro Template: [https://miro.com/miroverse/event-storming/](https://miro.com/miroverse/event-storming/)

\- Website: [https://www.eventstorming.com/](https://www.eventstorming.com/)

\- “50,000 orange stickies later” - a talk by Alberto Brandolini

\- EventStorming for fun and profit: [https://speakerdeck.com/tastapod/event-storming-for-fun-and-profit](https://speakerdeck.com/tastapod/event-storming-for-fun-and-profit)

* * *

If you want to see it in action: Watch my free course for EventStorming for Tracking Design (just ⏰ 30m).

[Watch the free course](https://www.deepskydata.com/courses/event-storming-for-tracking-design?ref=sub_eventstorm)

* * *

## Example EventStorming sessions for data teams to understand how other teams are working

### Product - How does the feature lifecycle work?

Understanding how the product team and the product engineers work on and release features is a powerful asset. Data can help product and engineering teams save significant time and learn about future features. But it is crucial to know which data to apply, and where.

You invite a product team into a workshop. This should include all roles, from feature ideation to deployment and analysis.

In the session, let everyone map out all the steps required until a feature is released.

Listen carefully for these things:

\- Where is there uncertainty - where are there multiple back-and-forths? You can usually hear it when people describe situations they are not sure about.

\- Ask for the times between different events - and go deeper on the ones that take longer. Ask plenty of whys to understand why it takes so long.

\- Pay attention to data needs. These sometimes come hidden, in sentences like “we would need to know …”.

\- How do they follow up on tests and early releases? What could be exciting for them when a feature gets released (a release usually gets a lot of attention across the company)? Learn how you can help them shine - it will shine back on you.

### Growth - How to run experiments

Usually, growth teams work around experiments. If not, it is still good to understand their work and map their approach to growth activities.

The experimental approach is a great fit for data teams, since it requires good data to move quickly and produce solid insights.

Listen carefully for these things:

\- How do they design experiments? What kind of data do they take into account for the design

\- How long do the experiments run, and how long does it take to set them up? Can you help them speed things up, e.g., with different models to analyze the variant data?

\- How do they analyze performance - locally for the experiment, but also globally? Can you help develop templates that can easily be applied to new experiments?

There are plenty more examples. Just talk to the different teams initially and ask them about their primary functions. If they struggle to answer, you can run a storming session and map out what the team did in the last 30 days or three months.

## Takeaway: Are you an internal consultant now?

Yes, you are like an internal consultant. As a data team, you are, in 95% of all cases, dependent on other teams that implement your insights or put your data into workflows.

Therefore your first task is to understand how the different teams work and function. These maps of their ways of working will be precious for all discussions you will have in the future.

Use them in catch-up meetings and update them when something has changed.

Use them in feature meetings, asking where in the process the requested data or feature can help the team, and how. Even this small question will surface exciting answers and usually leads to different implementations than the usual dashboard no one is looking at.

Tell me how it went when you did your first storming session.

And don’t forget to subscribe to get my next post on how to build internal data products.

[Subscribe now](#/portal/signup)</content:encoded></item><item><title>Why not try a simple data stack</title><link>https://timo.space/blog/why-not-try-a-simple-data-stack/</link><guid isPermaLink="true">https://timo.space/blog/why-not-try-a-simple-data-stack/</guid><description>There are these rare no-nonsense people. I meet them sometimes in my projects. Their power is the focus. Their super-power is execution. This is what we need to fix in the next three months.</description><pubDate>Sun, 01 May 2022 00:00:00 GMT</pubDate><content:encoded>There are these rare no-nonsense people. I meet them sometimes in my projects. Their power is the focus. Their super-power is execution. This is what we need to fix in the next three months. Everything we do must contribute to this.

They make things simple by limiting the options. It’s convention over configuration (hello, Rails or Django).

The modern data stack is all about configuration.

It’s super flexible for any future scenario. It gives you the freedom to configure everything to fit 100% to your business model and operations (assuming you know it 100%). If you have an Avengers-like team of analytics engineers, the sky is the limit.

But when my focus is to learn where we lose leads or customers - is it OK that I only get these results after 12 weeks because everything needs to be configured first?

When I want to see if the new feature idea gets me more subscriptions, I don&apos;t want to wait four weeks for a new data model and dashboard.

Please don‘t get me wrong; an effective, organized, big data team with data ops can get these results much quicker at scale. But we are talking about Airbnb, Netflix &amp;amp; Uber scale (the ones who write a lot about their setups).

The modern data stack is an excellent option for bigger data teams with data ops - they can scale and monitor configuration. But what about the others?

## Introducing the simple data stack

![](/images/posts/why-not-try-a-simple-data-stack/https-3a-2f-2fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984-s3-amazonaws-com-2fpublic-2fimages-2fc9181a62-96f1-43cf-91ec-4651688e5275_1280x720-png.jpg)

Going back to our Rails example - what are the conventions that can speed things up?

\- no (or low) data transformation and modeling

\- managed data operations

\- simple (no-code) funnel, cohort, and segmentation analysis

When you get this out of the box, one person can deliver the insights that create quick returns:

\- Understanding your business model and scaling mechanics

\- How many potential customers do you need to win long-term customers, and where can you find them?

\- Instant feedback on business/marketing/product experiments - are they worth further investment?

\- What are the potentials to save costs (find the clutter - aka unprofitable operations)

Ok, so am I suggesting we move to a closed, proprietary platform? No - we keep the system open and flexible; we just add some shortcuts.

## The simple architecture

### decoupling

When systems get too complex, they slow down. With the modern data stack, this can happen pretty quickly. One strategy against this is decoupling. In our case, we introduce a fast and a slow track:

fast track - delivering the metrics we need for growth decisions: data for experimentation and for moving forward quickly. The single source of growth truth.

slow track - persisting the company&apos;s data brain: a well-thought-out data warehouse infrastructure. The single source of all truth.
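The split can be sketched as a tiny routing rule: everything is persisted on the slow track, while only the growth metrics also take the fast track. This is a hypothetical illustration - the event names and the allowlist are invented, not part of any real pipeline:

```python
# Hypothetical fast/slow routing (all names invented): every event is
# persisted to the warehouse, growth events additionally go to the
# fast track that feeds experimentation tooling.

GROWTH_EVENTS = {"signup", "trial_started", "subscription_created"}

fast_track = []  # feeds growth/experimentation tooling
slow_track = []  # feeds the data warehouse (the data brain)

def route(event):
    slow_track.append(event)            # everything is persisted
    if event["name"] in GROWTH_EVENTS:  # only growth metrics go fast
        fast_track.append(event)

route({"name": "signup", "user_id": "u1"})
route({"name": "page_viewed", "user_id": "u1"})
print(len(fast_track), len(slow_track))  # 1 2
```

The point of the design is that the fast track can stay small and opinionated because the slow track guarantees nothing is lost.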

### focus

It is not really about tools: issues in data stacks appear because of too much and too many - not because of too much data.

Too many events. Too many special business rules. Too many compromises. Too many lines of SQL. Too many people worked on parts of the solution.

Focus is essential. Do less, but do it great. Ok, that sounds like cheesy pub philosophy. But it’s pretty accurate in this case.

### Data schema

We introduce a hierarchy for the data we collect. Business core events are our backbone.

These events are designed carefully, tracked from reliable sources, monitored, and match the operational data 100% (no more missing transactions). They don’t need to be questioned because everyone would immediately know if there is an issue with one of them.

Product and UX-related events are essential for feature development. But they are not watched all the time - usually only when a team works on a feature. Important features should get a schema and monitoring. Anything else can be tracked more loosely (yes, even with auto-tracking).

For teams with an existing event setup - see this as an event diet. Any system bigger than 20-30 events becomes hard to use and maintain.
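As an illustration of such an event diet, here is a small hypothetical sketch - the event names and required properties are invented, not prescribed by the post - that validates incoming core events against a hand-maintained schema:

```python
# Hypothetical "event diet" allowlist: a handful of carefully designed
# business core events with required properties. Names are invented.

CORE_EVENTS = {
    "trial_started": {"user_id", "plan"},
    "subscription_created": {"user_id", "plan", "mrr"},
    "subscription_cancelled": {"user_id", "reason"},
}

def validate_event(name, properties):
    """Return a list of problems; an empty list means the event is valid."""
    problems = []
    if name not in CORE_EVENTS:
        problems.append(f"unknown event: {name}")
        return problems
    missing = CORE_EVENTS[name] - set(properties)
    for prop in sorted(missing):
        problems.append(f"missing property: {prop}")
    return problems

print(validate_event("subscription_created", {"user_id": "u1", "plan": "pro"}))
# ['missing property: mrr']
```

Keeping the allowlist at 20-30 entries is the whole trick: any event that is not on it is rejected at the door instead of quietly polluting the warehouse.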

### Data collection

We use one layer for all event data, whether we receive it from the frontend, the backend, or SaaS tools.

From there, the data is passed on to different systems. And yes, we also load it into a data warehouse. Why? Because it’s easy and cheap. And there can be use cases where it comes in handy (see later).

The unusual part here is the SaaS tools. More and more business data is generated outside of your own system - be it in a CRM, customer success, customer support, or subscription tool. You can integrate these using webhooks: the tools send relevant event data to your endpoints, and you collect it.
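A minimal sketch of such a collection layer could normalize each incoming webhook into the common event shape before it enters the pipeline. The payload field names below are invented for illustration and not taken from any real CRM or billing tool:

```python
# Hypothetical normalization of SaaS webhooks (all field names invented):
# map each tool-specific payload onto one common event schema.

def normalize(source, payload):
    """Turn a tool-specific webhook payload into a common event dict."""
    if source == "crm":
        return {
            "event": "deal_" + payload["deal_status"],
            "user_id": payload["contact_email"],
            "timestamp": payload["updated_at"],
        }
    if source == "subscriptions":
        return {
            "event": payload["type"],       # e.g. "subscription_created"
            "user_id": payload["customer"],
            "timestamp": payload["created"],
        }
    raise ValueError("unknown webhook source: " + source)

event = normalize("crm", {
    "deal_status": "won",
    "contact_email": "ada@example.com",
    "updated_at": "2022-05-01T10:00:00Z",
})
print(event["event"])  # deal_won
```

Once everything arrives in one shape, forwarding it to the warehouse and the analytics tool is the same code path regardless of where the event was born.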

### Data activation

For most companies, there are these core functions where data can immediately help:

\- Visualize the customer journey funnel (and in cohorts to see improvements over time) - this tells you where you need to focus

\- Show if growth experiments (the work on marketing, sales, or product features) change the funnel (aka business outcome)

\- Segmentation, Segmentation, Segmentation to find over- and under-performers within these reports (helps you with the optimization)

Use a tool that easily lets you create these reports. Besides the tool choice, everything else is just setting up a process.

## Stack examples

Pretty easy.

**Data schema:**

Avo or Segment Protocols (I recommend Avo since it is agnostic and has more collaboration and testing features)

**Data collection:**

Segment or Rudderstack (Jitsu is another option but relatively new; mParticle is interesting but targeted more at enterprises). Rudderstack and Jitsu both offer an open-source version. What about Snowplow? We will talk about Snowplow later.

**Data activation:**

_The classics:_

Amplitude or Mixpanel (Amplitude offers more for experimentation as an add-on).

_The challengers:_

Heap or Posthog: both offer auto-tracking which can be interesting for product feature analysis. Posthog is also open source.

## Extending the stack

Tool-wise, this is nothing spectacularly new. We have used this stack for years.

What has changed for me is the clear focus on tracking the customer lifecycle touchpoints. Wherever they are happening - if in a CRM, we use webhooks to track them.

The tools themselves are old stuff.

But now, let’s talk about the extensions. We all love add-ons, don’t we?

### Enrich with backend data

Your application database usually holds information that can be valuable for segmentation. Imagine you offer data integration like Fivetran does. I would put:

\- the number of rows loaded

\- the number of sources

\- what destinations

\- ...

With a Reverse ETL, I can get these into my analysis tool.

### Control and enrich the event data before it enters

In one use case I am implementing, we are using Snowplow for the event pipeline. This enforces a schema and gives us some enrichments out of the box. The data ends up in the database, and we send it from there into our analytics tool via Reverse ETL.

### Marketing cost attribution

Marketing cost attribution by itself is a complex thing. But you might start with some simple implementations. The calculation can happen in your database.

Ideas:

\- calculate the number of signups per campaign and the campaign costs for a week; divide them and push the result back to the user properties with Reverse ETL

\- map campaign data to channel data in your database and push it back
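The first idea above fits in a few lines. The campaign names and numbers here are made up, and in practice this calculation would live in the warehouse as SQL before being pushed back via Reverse ETL:

```python
# Hypothetical weekly cost-per-signup calculation (all numbers invented).
# In a real setup this would be a SQL model in the warehouse whose result
# is synced back to user properties via Reverse ETL.

signups = {"spring_sale": 40, "brand_search": 10}       # signups per campaign, one week
costs = {"spring_sale": 1200.0, "brand_search": 150.0}  # ad spend per campaign, same week

cost_per_signup = {
    campaign: costs[campaign] / n
    for campaign, n in signups.items()
    if n  # skip campaigns without signups to avoid division by zero
}
print(cost_per_signup["spring_sale"])  # 30.0
```

Even this crude number, attached to each user as a property, lets you segment funnels by acquisition cost without building a full attribution model first.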

### What’s next

I am trying out a lot to see how a simple data stack can be extended. But the core works really well. Especially for companies with no dedicated data people, it&apos;s better to start and move forward. You can hire someone to set up a modern data stack for you, but you can&apos;t extend it and maintain it afterward.

The modern data stack setup is &quot;easy&quot; to implement and hard to maintain.

The simple data stack setup is straightforward to maintain. That&apos;s why I like it.

Do you work with a similar approach, or something similar to some extent? Or do you think the modern stack is far easier to maintain than I described it? Let me know - just hit the reply button.</content:encoded></item><item><title>Coming soon — a new project announcement</title><link>https://timo.space/blog/coming-soon-677177e1a1b531001bb09181/</link><guid isPermaLink="true">https://timo.space/blog/coming-soon-677177e1a1b531001bb09181/</guid><description>**This is Hipster Data Lab**, a newsletter about “When no one asks anymore if the data is correct.”</description><pubDate>Fri, 18 Mar 2022 00:00:00 GMT</pubDate><content:encoded>**This is Hipster Data Lab**, a newsletter about “When no one asks anymore if the data is correct.”</content:encoded></item></channel></rss>

[Subscribe now](#/portal/signup)</content:encoded></item></channel></rss>