<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>Frequentist.org</title>
<link>https://frequentist.org/</link>
<atom:link href="https://frequentist.org/index.xml" rel="self" type="application/rss+xml"/>
<description>I build analytics systems, dashboards, and models that replace intuition with evidence — from KPI frameworks to forecasting and experimentation.</description>
<image>
<url>https://frequentist.org/assets/DSCF6338.jpg</url>
<title>Frequentist.org</title>
<link>https://frequentist.org/</link>
</image>
<generator>quarto-1.8.27</generator>
<lastBuildDate>Thu, 23 Apr 2026 00:00:00 GMT</lastBuildDate>
<item>
  <title>My German Website Is Live</title>
  <dc:creator>Aleksei Prishchepo</dc:creator>
  <link>https://frequentist.org/posts/20260423-frequentist-de/</link>
  <description><![CDATA[ 






<p>If you’ve been following this blog, you know that I write in English. But my professional work is largely rooted in the German-speaking market — Berlin, Germany, and the broader DACH region — so I’ve built a dedicated German-language site.</p>
<section id="what-is-it" class="level2">
<h2 class="anchored" data-anchor-id="what-is-it">What is it?</h2>
<p><a href="https://frequentist.de">Frequentist.de</a> is my professional homepage as a Business Intelligence &amp; Analytics freelancer based in Berlin. It lays out what I do, how I think about analytics, and who I typically work with. The site is written entirely in German, aimed at companies in the DACH region looking for hands-on BI expertise.</p>
</section>
<section id="whats-on-it" class="level2">
<h2 class="anchored" data-anchor-id="whats-on-it">What’s on it</h2>
<p>The site covers six service areas:</p>
<div class="quarto-layout-panel" data-layout-ncol="2">
<div class="quarto-layout-row">
<section id="kpi-systems-metric-design" class="level3 quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<h3 class="anchored" data-anchor-id="kpi-systems-metric-design">KPI Systems &amp; Metric Design</h3>
<p>Defining metrics that actually reflect business performance, not just ones that are easy to track.</p>
</section>
<section id="business-dashboards-reports" class="level3 quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<h3 class="anchored" data-anchor-id="business-dashboards-reports">Business Dashboards &amp; Reports</h3>
<p>SQL-driven dashboards, cleanly structured, without unnecessary overhead.</p>
</section>
</div>
<div class="quarto-layout-row">
<section id="forecasting-planning" class="level3 quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<h3 class="anchored" data-anchor-id="forecasting-planning">Forecasting &amp; Planning</h3>
<p>Projecting cashflow, revenue, and costs with explicit uncertainty modeling.</p>
</section>
<section id="experiments-causal-analysis" class="level3 quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<h3 class="anchored" data-anchor-id="experiments-causal-analysis">Experiments &amp; Causal Analysis</h3>
<p>Designing and evaluating experiments to answer the real question: does this actually work?</p>
</section>
</div>
<div class="quarto-layout-row">
<section id="analytics-data-engineering" class="level3 quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<h3 class="anchored" data-anchor-id="analytics-data-engineering">Analytics &amp; Data Engineering</h3>
<p>ETL/ELT pipelines and reproducible data workflows.</p>
</section>
<section id="decision-support" class="level3 quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<h3 class="anchored" data-anchor-id="decision-support">Decision Support</h3>
<p>Translating analytical findings into concrete, actionable recommendations.</p>
</section>
</div>
</div>
</section>
<section id="who-its-for" class="level2">
<h2 class="anchored" data-anchor-id="who-its-for">Who it’s for</h2>
<p>The typical clients I expect to gain through <a href="https://frequentist.de">frequentist.de</a> are mid-sized companies, startups, and agencies in Berlin and the DACH region — either building internal data capabilities from scratch or professionalizing existing BI structures.</p>
</section>
<section id="see-also" class="level2 unnumbered">
<h2 class="unnumbered anchored" data-anchor-id="see-also">See Also</h2>
<p>Here are some of my recent posts on BI, analytics, and strategy that might be of interest:</p>
<div id="listing-posts" class="quarto-listing quarto-listing-container-table">
<table class="quarto-listing-table table">
<thead>
<tr>

<th>
 
</th>

<th>
Title
</th>

<th>
Reading Time
</th>

</tr>
</thead>
<tbody class="list">

<tr data-index="0" data-categories="U2FsZXMlMkNTdHJhdGVneQ==" data-listing-date-sort="1774224000000" data-listing-file-modified-sort="1774302780174" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="5" data-listing-word-count-sort="855" data-listing-title-sort="Practical Concepts for AI Driven Sales Coaching" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20260323-ai-sales-coaching/image.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20260323-ai-sales-coaching/index.html" class="title listing-title">Practical Concepts for AI Driven Sales Coaching</a>
</td>
<td>
<span class="listing-reading-time">5 min</span>
</td>

</tr>

<tr data-index="1" data-categories="UHJvZHVjdCUyQ01hcmtldGluZyUyQ1NwYXRpYWwlMkNHZW9zcGF0aWFsJTJDU3RyYXRlZ3k=" data-listing-date-sort="1770422400000" data-listing-file-modified-sort="1770660370665" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="14" data-listing-word-count-sort="2663" data-listing-title-sort="Using Transit Time to Rethink Hotel Search" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20260207-transit-time-hotel-search/image.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20260207-transit-time-hotel-search/index.html" class="title listing-title">Using Transit Time to Rethink Hotel Search</a>
</td>
<td>
<span class="listing-reading-time">14 min</span>
</td>

</tr>

<tr data-index="2" data-categories="S1BJJTIwRGVzaWduJTJDQkklMkNTdHJhdGVneQ==" data-listing-date-sort="1769558400000" data-listing-file-modified-sort="1772988050630" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="14" data-listing-word-count-sort="2630" data-listing-title-sort="The 7-Step KPI Blueprint from Business Intelligence Analytics Perspective" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20260128-seven-step-kpi-blueprint/image.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20260128-seven-step-kpi-blueprint/index.html" class="title listing-title">The 7-Step KPI Blueprint from Business Intelligence Analytics Perspective</a>
</td>
<td>
<span class="listing-reading-time">14 min</span>
</td>

</tr>

<tr data-index="3" data-categories="QkklMkNTdGF0aXN0aWNzJTJDTUwlMkNWaXN1YWxpemF0aW9u" data-listing-date-sort="1764115200000" data-listing-file-modified-sort="1767873604969" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="5" data-listing-word-count-sort="982" data-listing-title-sort="Building an E-Commerce Dashboard with Power BI and R" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20251126-e-commerce-dashboard/0001.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20251126-e-commerce-dashboard/index.html" class="title listing-title">Building an E-Commerce Dashboard with Power BI and R</a>
</td>
<td>
<span class="listing-reading-time">5 min</span>
</td>

</tr>

<tr data-index="4" data-categories="QkklMkNTdGF0aXN0aWNzJTJDTUwlMkNWaXN1YWxpemF0aW9uJTJDUg==" data-listing-date-sort="1761436800000" data-listing-file-modified-sort="1767873994105" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="6" data-listing-word-count-sort="1109" data-listing-title-sort="Building the Analytical Dashboard with Power BI and R" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20251026-cfpb-dashboard/0001.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20251026-cfpb-dashboard/index.html" class="title listing-title">Building the Analytical Dashboard with Power BI and R</a>
</td>
<td>
<span class="listing-reading-time">6 min</span>
</td>

</tr>

<tr data-index="5" data-categories="QkklMkNTdGF0aXN0aWNzJTJDTUwlMkNWaXN1YWxpemF0aW9uJTJDUg==" data-listing-date-sort="1758931200000" data-listing-file-modified-sort="1767874188518" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="8" data-listing-word-count-sort="1544" data-listing-title-sort="Building a Credit Risk Dashboard with Power BI and R" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" data-src="../../posts/20250927-credit-risk-analytics/image.png" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20250927-credit-risk-analytics/index.html" class="title listing-title">Building a Credit Risk Dashboard with Power BI and R</a>
</td>
<td>
<span class="listing-reading-time">8 min</span>
</td>

</tr>

<tr data-index="6" data-categories="QkklMkNFVEw=" data-listing-date-sort="1736121600000" data-listing-file-modified-sort="1769683733598" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="2" data-listing-word-count-sort="397" data-listing-title-sort="BI System Blueprint" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" data-src="../../posts/20250106-bi-flowchart/image.png" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20250106-bi-flowchart/index.html" class="title listing-title">BI System Blueprint</a>
</td>
<td>
<span class="listing-reading-time">2 min</span>
</td>

</tr>

</tbody>
</table>

</div>



</section>

]]></description>
  <category>News</category>
  <guid>https://frequentist.org/posts/20260423-frequentist-de/</guid>
  <pubDate>Thu, 23 Apr 2026 00:00:00 GMT</pubDate>
  <media:content url="https://frequentist.org/posts/20260423-frequentist-de/image.jpeg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Practical Concepts for AI Driven Sales Coaching</title>
  <dc:creator>Aleksei Prishchepo</dc:creator>
  <link>https://frequentist.org/posts/20260323-ai-sales-coaching/</link>
  <description><![CDATA[ 






<p>The modern sales landscape is often characterized by a “broken” coaching model where managers, overwhelmed by data and deals, fall back on reactive, subjective feedback based on a tiny sample of calls.</p>
<p>In a recent webinar, <a href="https://www.linkedin.com/events/7437378587051048960/" target="_blank">How to Build Personalized Coaching Paths for Every Rep</a>, leaders from <strong>Docebo</strong>, <strong>Glyphic</strong>, and <strong>Enso Connect</strong> introduced several frameworks for addressing the scalability crisis in sales enablement.</p>
<p>In this article, I outline four of these concepts for building a high-performing sales team in the age of AI.</p>
<section id="the-nine-box-framework-for-managing-brain-fry" class="level2" data-number="1">
<h2 data-number="1" class="anchored" data-anchor-id="the-nine-box-framework-for-managing-brain-fry"><span class="header-section-number">1</span> The Nine-Box Framework for Managing “Brain Fry”</h2>
<p>One of the most common mistakes sales leaders make is “shoving the entire elephant” down their team’s throats by introducing too many changes at once. To combat this, Mark Kosoglow (CRO at Docebo) introduced a <strong>nine-box framework</strong> designed to measure a team’s <strong>“available capacity of change”</strong>.</p>
<p>The framework plots initiatives on two axes:</p>
<ul>
<li><p><strong>Competency:</strong> Moving from <strong>Aware</strong> (passed a quiz) to <strong>Competent</strong> (internal role-play) to <strong>Mastery</strong> (proven in a real-world call).</p></li>
<li><p><strong>Cognitive Load:</strong> The amount of effort a rep must exert to change their existing behavior, categorized as Low, Medium, or High.</p></li>
</ul>
<div style="margin: 2rem 0;">
<table class="caption-top table">
<caption>The 9-box matrix measures the mental effort required against the desired level of skill</caption>
<colgroup>
<col style="width: 19%">
<col style="width: 19%">
<col style="width: 19%">
<col style="width: 19%">
</colgroup>
<thead>
<tr class="header">
<th>Cognitive Load /<br>
Competency</th>
<th>Awareness<br>
(1-3 pts)</th>
<th>Competence<br>
(2-6 pts)</th>
<th>Mastery<br>
(3-9 pts)</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>Low</strong></td>
<td>1 Point</td>
<td>2 Points</td>
<td>3 Points</td>
</tr>
<tr class="even">
<td><strong>Medium</strong></td>
<td>2 Points</td>
<td>4 Points</td>
<td>6 Points</td>
</tr>
<tr class="odd">
<td><strong>High</strong></td>
<td>3 Points</td>
<td>6 Points</td>
<td>9 Points</td>
</tr>
</tbody>
</table>
</div>
<p>Under this system, a team is given a <strong>17-point quarterly budget</strong>. A “High Cognitive Load Mastery” project is worth <strong>9 points</strong>, so a team can handle only one such major behavioral shift per quarter: two would cost 18 points and exceed the budget. This forces leadership to make <strong>trade-offs</strong> rather than overwhelm reps to the point of “brain fry”, where employee satisfaction plummets.</p>
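<p>The budget arithmetic can be sketched in a few lines of Python. This is only an illustration: the function names and the sample plan are hypothetical, and the rule that points equal the load level multiplied by the competency level is inferred from the table above rather than quoted from the webinar.</p>

```python
# Hypothetical sketch of the nine-box point budget (names are illustrative).
# Assumption: points = load level x competency level, as the table suggests.
LOAD_POINTS = {"low": 1, "medium": 2, "high": 3}
COMPETENCY_POINTS = {"awareness": 1, "competence": 2, "mastery": 3}
QUARTERLY_BUDGET = 17  # budget quoted in the text


def initiative_points(load, competency):
    """Points one initiative consumes from the quarterly budget."""
    return LOAD_POINTS[load] * COMPETENCY_POINTS[competency]


def fits_budget(initiatives, budget=QUARTERLY_BUDGET):
    """Return (total points, True if the plan stays within the budget)."""
    total = sum(initiative_points(load, comp) for load, comp in initiatives)
    return total, budget - total >= 0


# A made-up quarterly plan: one big shift plus two smaller initiatives.
plan = [("high", "mastery"), ("medium", "competence"), ("low", "mastery")]
total, ok = fits_budget(plan)
print(total, ok)
```

<p>With this made-up plan the total is 16 points, which fits the budget, while two “High Cognitive Load Mastery” projects would already add up to 18.</p>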
</section>
<section id="moving-from-call-samples-to-aggregate-data-sets" class="level2" data-number="2">
<h2 data-number="2" class="anchored" data-anchor-id="moving-from-call-samples-to-aggregate-data-sets"><span class="header-section-number">2</span> Moving from Call Samples to Aggregate Data Sets</h2>
<p>Traditional coaching relies on managers manually reviewing a handful of “cherry-picked” calls — typically four or five a week — which provides a distorted view of performance. AI shifts this paradigm by providing an <strong>aggregate data set</strong>.</p>
<p>When every single call is automatically scored against a rubric, several things change:</p>
<ul>
<li><p><strong>Accountability:</strong> Reps behave differently when they know every interaction is being reviewed.</p></li>
<li><p><strong>Action over Collection:</strong> Managers stop spending hours collecting data and instead spend those hours <strong>actioning</strong>.</p></li>
<li><p><strong>Personalization:</strong> Leaders can see the team’s strengths and weaknesses, allowing them to tailor coaching to specific individual needs rather than generalities.</p></li>
</ul>
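<p>As a rough sketch of what an aggregate data set makes possible, the snippet below averages rubric scores over every call per rep and picks the weakest criterion as a coaching focus. The rep names, rubric criteria, and scores are all made up for illustration; a real rubric and scoring scale would come from the coaching tool in use.</p>

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rubric: each call is scored 0 or 1 on every criterion.
CRITERIA = ["discovery", "pain_quantified", "next_steps"]

# Made-up scores for every call (not a sample), one dict per call.
calls = [
    {"rep": "Ana", "discovery": 1, "pain_quantified": 0, "next_steps": 1},
    {"rep": "Ana", "discovery": 1, "pain_quantified": 0, "next_steps": 0},
    {"rep": "Ben", "discovery": 0, "pain_quantified": 1, "next_steps": 1},
    {"rep": "Ben", "discovery": 1, "pain_quantified": 1, "next_steps": 1},
]


def rep_profiles(all_calls):
    """Average score per criterion for each rep, across all of their calls."""
    grouped = defaultdict(list)
    for call in all_calls:
        grouped[call["rep"]].append(call)
    return {
        rep: {c: mean(call[c] for call in rep_calls) for c in CRITERIA}
        for rep, rep_calls in grouped.items()
    }


def weakest_criterion(profile):
    """Criterion with the lowest average score, a candidate coaching focus."""
    return min(profile, key=profile.get)


profiles = rep_profiles(calls)
for rep, profile in profiles.items():
    print(rep, "coach on:", weakest_criterion(profile))
```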
</section>
<section id="forecasting-via-binary-risk-assessment" class="level2" data-number="3">
<h2 data-number="3" class="anchored" data-anchor-id="forecasting-via-binary-risk-assessment"><span class="header-section-number">3</span> Forecasting via Binary Risk Assessment</h2>
<p>Personalized coaching is described as the most critical driver of <strong>forecast accuracy</strong>. Many organizations struggle with “haircuts”, where each layer of management arbitrarily reduces a forecast because they don’t trust the underlying details.</p>
<p>To fix this, the speakers suggested creating a <strong>binary forecasting system</strong>. Instead of using vague terms like “best case” or “most likely”, AI-driven scorecards can determine risk based on objective, binary yes/no criteria.</p>
<p>By decoupling <strong>deal maturity</strong> (the sales stage) from <strong>risk assessment</strong>, one organization reported increasing their forecasting accuracy from <strong>54% to 97%</strong>.</p>
<div class="callout callout-style-simple callout-tip">
<div class="callout-body d-flex">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-body-container">
<p>Sales teams often assume that deals in later stages are likely to close. In reality, stage ≠ certainty. Evaluating actual risk signals separately improves forecast accuracy.</p>
<p>A deal can be in a late stage and still be high risk because (a) the decision-maker hasn’t been involved, (b) a competitor is actively being considered, or (c) the budget is not fully confirmed.</p>
<p>By separating deal stage (where it is in the process) from risk signals (how healthy it actually is), teams can make more realistic forecasts and avoid last-minute surprises.</p>
</div>
</div>
</div>
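<p>A minimal sketch of such a binary scorecard, assuming the three yes/no criteria from the callout above, might look like this. The field names, the “at risk” and “on track” labels, and the sample deal are hypothetical. The point is only that the risk flags never look at the sales stage.</p>

```python
# Hypothetical binary risk checklist: criterion name -> the "healthy" answer.
HEALTHY_ANSWERS = {
    "decision_maker_involved": True,
    "competitor_in_play": False,
    "budget_confirmed": True,
}


def risk_flags(deal):
    """Every yes/no criterion whose answer differs from the healthy one."""
    return [name for name, healthy in HEALTHY_ANSWERS.items()
            if deal[name] != healthy]


def forecast_category(deal):
    """Risk is assessed from the flags alone, independent of deal stage."""
    return "at risk" if risk_flags(deal) else "on track"


# A made-up late-stage deal that is still risky.
deal = {
    "stage": "negotiation",
    "decision_maker_involved": False,
    "competitor_in_play": True,
    "budget_confirmed": True,
}
print(deal["stage"], forecast_category(deal), risk_flags(deal))
```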
</section>
<section id="rewarding-the-exposure-of-risk" class="level2" data-number="4">
<h2 data-number="4" class="anchored" data-anchor-id="rewarding-the-exposure-of-risk"><span class="header-section-number">4</span> Rewarding the “Exposure of Risk”</h2>
<p>AI tools now allow for <strong>proactive risk identification</strong> by tracking “competitive mentions” on calls and instantly alerting managers via Slack. This data is only useful if the organization fosters a culture where exposing risk is rewarded rather than punished.</p>
<p>The speakers emphasized that “losing alone is a fireable offense”. The goal of AI-supported deal reviews is to ensure that the minute a rep feels a deal is in danger, they flag it so the rest of the team can step in and work to save it.</p>
<p>AI provides the “receipts” to show reps that specific behaviors (like quantifying a pain point) can increase win rates significantly, turning the scorecard from a “Big Brother” surveillance tool into a “self-service” empowerment platform.</p>
</section>
<section id="conclusion" class="level2 unnumbered page-columns page-full">
<h2 class="unnumbered anchored" data-anchor-id="conclusion">Conclusion</h2>
<p>A recurring theme across these concepts is the need to move beyond individual call reviews toward <strong>continuous, contextual understanding of rep behavior</strong>.</p>

<div class="no-row-height column-margin column-container"><div class="">
<p>Check out the Projects section to learn more about the technology behind the Sales Signals app:</p>
<p><a href="https://frequentist.org/projects/app-sales-signals/">Sales Signals app (Agentic AI)</a></p>
</div></div><p>This is the core idea behind <a href="https://www.salessignals.app/" class="salessignals" target="_blank">Sales Signals</a> — a system turning conversation data into actionable signals that connect behavior, coaching, and deal outcomes over time.</p>
</section>
<section id="see-also" class="level2 unnumbered">
<h2 class="unnumbered anchored" data-anchor-id="see-also">See Also</h2>
<p>Here are some of my other posts related to sales strategy, coaching, and professional development:</p>
<div id="listing-posts" class="quarto-listing quarto-listing-container-table">
<table class="quarto-listing-table table">
<thead>
<tr>

<th>
 
</th>

<th>
Title
</th>

<th>
Reading Time
</th>

</tr>
</thead>
<tbody class="list">

<tr data-index="0" data-categories="UHJvZHVjdCUyQ01hcmtldGluZyUyQ1NwYXRpYWwlMkNHZW9zcGF0aWFsJTJDU3RyYXRlZ3k=" data-listing-date-sort="1770422400000" data-listing-file-modified-sort="1770660370665" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="14" data-listing-word-count-sort="2663" data-listing-title-sort="Using Transit Time to Rethink Hotel Search" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20260207-transit-time-hotel-search/image.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20260207-transit-time-hotel-search/index.html" class="title listing-title">Using Transit Time to Rethink Hotel Search</a>
</td>
<td>
<span class="listing-reading-time">14 min</span>
</td>

</tr>

<tr data-index="1" data-categories="S1BJJTIwRGVzaWduJTJDQkklMkNTdHJhdGVneQ==" data-listing-date-sort="1769558400000" data-listing-file-modified-sort="1772988050630" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="14" data-listing-word-count-sort="2630" data-listing-title-sort="The 7-Step KPI Blueprint from Business Intelligence Analytics Perspective" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20260128-seven-step-kpi-blueprint/image.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20260128-seven-step-kpi-blueprint/index.html" class="title listing-title">The 7-Step KPI Blueprint from Business Intelligence Analytics Perspective</a>
</td>
<td>
<span class="listing-reading-time">14 min</span>
</td>

</tr>

</tbody>
</table>
</div>



</section>

]]></description>
  <category>Sales</category>
  <category>Strategy</category>
  <guid>https://frequentist.org/posts/20260323-ai-sales-coaching/</guid>
  <pubDate>Mon, 23 Mar 2026 00:00:00 GMT</pubDate>
  <media:content url="https://frequentist.org/posts/20260323-ai-sales-coaching/image.svg" medium="image" type="image/svg+xml"/>
</item>
<item>
  <title>Implementing a Neural Network in Base R</title>
  <dc:creator>Aleksei Prishchepo</dc:creator>
  <link>https://frequentist.org/posts/20260214-backpropagating-love/</link>
  <description><![CDATA[ 






<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>This small project implements a fully connected neural network using only matrix operations, a logistic activation function, and manually derived gradients. There are no high-level frameworks like <em>keras</em> or <em>torch</em>, just linear algebra and backpropagation written explicitly.</p>
<p>The network is trained to approximate a parametric heart curve. During training, frames are saved and combined into a video, so you can watch the model gradually learn the shape.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://frequentist.org/posts/20260214-backpropagating-love/valentine.gif" class="img-fluid figure-img"></p>
<figcaption>Animation of the model learning a parametric curve</figcaption>
</figure>
</div>
</section>
<section id="network-architecture" class="level2">
<h2 class="anchored" data-anchor-id="network-architecture">Network Architecture</h2>
<p>This network is a <strong>Multilayer Perceptron (MLP)</strong> structured as a <img src="https://latex.codecogs.com/png.latex?1%20%5Cto%206%20%5Cto%207%20%5Cto%202"> feed-forward architecture. It maps a single input scalar, representing progress along the curve, through two hidden layers of 6 and 7 neurons to a 2D coordinate output.</p>
<ul>
<li><p><strong>Input Layer (</strong><img src="https://latex.codecogs.com/png.latex?L_0">): A single node (<img src="https://latex.codecogs.com/png.latex?n=1">) representing the value <img src="https://latex.codecogs.com/png.latex?x">, the amount traveled along the curve from 0 to 1.</p></li>
<li><p><strong>Hidden Layer 1 (</strong><img src="https://latex.codecogs.com/png.latex?L_1">): Six neurons using the logistic sigmoid activation function <img src="https://latex.codecogs.com/png.latex?%5Csigma(z)%20=%20%5Cfrac%7B1%7D%7B1%20+%20%5Cexp(-z)%7D"> to capture initial non-linear features.</p></li>
<li><p><strong>Hidden Layer 2 (</strong><img src="https://latex.codecogs.com/png.latex?L_2">): Seven neurons that further process the signals to handle the complex geometric “lobes” of the heart curve.</p></li>
<li><p><strong>Output Layer (</strong><img src="https://latex.codecogs.com/png.latex?L_3">): Two nodes representing the <img src="https://latex.codecogs.com/png.latex?x"> and <img src="https://latex.codecogs.com/png.latex?y"> (or <img src="https://latex.codecogs.com/png.latex?y_1,%20y_2">) coordinates of the curve’s position in 2D space.</p></li>
<li><p><strong>Connectivity:</strong> The topology is <strong>fully connected</strong>, meaning every neuron in one layer is linked to every neuron in the next via a weight matrix (<img src="https://latex.codecogs.com/png.latex?W">) and a bias vector (<img src="https://latex.codecogs.com/png.latex?b">).</p></li>
</ul>
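<p>The layer sizes above translate directly into weight-matrix shapes. The following is a pure-Python sketch of the forward pass, not the project’s R code: the function names are illustrative, the Gaussian initialization is an assumption, and the sigmoid on the output layer is inferred from the gradient formula further down rather than stated explicitly.</p>

```python
import math
import random

LAYER_SIZES = [1, 6, 7, 2]  # input, hidden 1, hidden 2, output


def sigmoid(z):
    """Logistic activation."""
    return 1.0 / (1.0 + math.exp(-z))


def init_params(sizes, seed=0):
    """One (W, b) pair per layer; W has shape (n_out, n_in).

    Assumption: standard normal initialization, chosen only for the sketch.
    """
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_out)]
        params.append((W, b))
    return params


def forward(x, params):
    """Propagate a scalar input through every fully connected layer."""
    a = [x]
    for W, b in params:
        a = [sigmoid(sum(w * v for w, v in zip(row, a)) + bias)
             for row, bias in zip(W, b)]
    return a


params = init_params(LAYER_SIZES)
shapes = [(len(W), len(W[0])) for W, _ in params]
print(shapes)                # weight shapes for the three layers
print(forward(0.5, params))  # two output coordinates for x = 0.5
```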
<div class="light-content">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="network.svg" class="lightbox" data-gallery="quarto-lightbox-gallery-1" title="Network topology: a multilayer perceptron"><img src="https://frequentist.org/posts/20260214-backpropagating-love/network.svg" class="img-fluid quarto-figure quarto-figure-center figure-img" width="480" alt="Network topology: a multilayer perceptron"></a></p>
</figure>
</div>
<figcaption>Network topology: a multilayer perceptron</figcaption>
</figure>
</div>
</div>
<div class="dark-content">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="network-dark.svg" class="lightbox" data-gallery="quarto-lightbox-gallery-2" title="Network topology: a multilayer perceptron"><img src="https://frequentist.org/posts/20260214-backpropagating-love/network-dark.svg" class="img-fluid quarto-figure quarto-figure-center figure-img" width="480" alt="Network topology: a multilayer perceptron"></a></p>
</figure>
</div>
<figcaption>Network topology: a multilayer perceptron</figcaption>
</figure>
</div>
</div>
</section>
<section id="gradient-descent-process" class="level2">
<h2 class="anchored" data-anchor-id="gradient-descent-process">Gradient Descent Process</h2>
<p>For each iteration of the training loop, the network updates its parameters to minimize the distance between the predicted coordinates (<img src="https://latex.codecogs.com/png.latex?a%5E%7B(3)%7D">) and the target heart curve coordinates (<img src="https://latex.codecogs.com/png.latex?y">).</p>
<section id="cost-gradient-calculation" class="level4">
<h4 class="anchored" data-anchor-id="cost-gradient-calculation">1. Cost Gradient Calculation:</h4>
<p>The error is propagated backward through the layers. For the final weight matrix (<img src="https://latex.codecogs.com/png.latex?W%5E%7B(3)%7D">), the Jacobian is calculated as:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Cmathbf%7BJ%7D_%7B%5Cmathbf%7BW%7D%5E%7B(3)%7D%7D%20=%20%5Cfrac%7B1%7D%7BN%7D%20%5Csum%202(%5Cmathbf%7Ba%7D%5E%7B(3)%7D%20-%20%5Cmathbf%7By%7D)%20%5Ccdot%20%5Csigma'(%7Bz%7D%5E%7B(3)%7D)%20%5Ccdot%20%5Cmathbf%7Ba%7D%5E%7B(2)T%7D"></p>
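<p>Read literally, the formula averages an outer product over the N training samples. The sketch below is one way to compute it in plain Python, assuming the first two factors are multiplied element-wise per output node. The function and argument names are illustrative and not taken from the project’s source.</p>

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def d_sigmoid(z):
    """Derivative of the logistic function."""
    s = sigmoid(z)
    return s * (1.0 - s)


def jacobian_w3(a2_batch, z3_batch, y_batch):
    """Average over samples of 2(a3 - y) * sigma'(z3), outer product with a2.

    a2_batch: activations of hidden layer 2, one list of 7 values per sample
    z3_batch: pre-activations of the output layer, 2 values per sample
    y_batch:  target coordinates, 2 values per sample
    """
    n = len(y_batch)
    rows, cols = len(z3_batch[0]), len(a2_batch[0])
    J = [[0.0] * cols for _ in range(rows)]
    for a2, z3, y in zip(a2_batch, z3_batch, y_batch):
        # Element-wise error term for each output node (a3 = sigmoid(z3)).
        delta = [2.0 * (sigmoid(z) - t) * d_sigmoid(z) for z, t in zip(z3, y)]
        for i in range(rows):
            for j in range(cols):
                J[i][j] += delta[i] * a2[j] / n
    return J


# One made-up sample: the second output already matches its target.
J = jacobian_w3([[1.0] * 7], [[0.0, 0.0]], [[0.0, 0.5]])
print(J[0][0], J[1][0])
```

<p>The result has the same 2x7 shape as the final weight matrix, so it can be subtracted from it directly in the update step.</p>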
</section>
<section id="stochastic-update" class="level4">
<h4 class="anchored" data-anchor-id="stochastic-update">2. Stochastic Update:</h4>
<p>The code applies “aggression” (the learning rate) and optional “noise” to reduce the risk of the model getting stuck in local minima:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Cmathbf%7BW%7D_%7Bnew%7D%20=%20%5Cmathbf%7BW%7D_%7Bold%7D%20-%20(%5Cmathbf%7BJ%7D%20%5Ctimes%20(1%20+%20%5Ctext%7Brandom%5C_noise%7D))%20%5Ctimes%20%5Ctext%7Baggression%7D"></p>
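<p>A small Python sketch of this update rule is shown below. The noise model is an assumption: here each gradient entry is perturbed by independent Gaussian noise scaled by a <code>noise</code> parameter, which may differ from what the original script does. The function name and the sample values are hypothetical.</p>

```python
import random


def update_weights(W, J, aggression, noise=0.0, rng=None):
    """Return W - (J * (1 + random_noise)) * aggression, element by element.

    Assumption: random_noise is drawn per entry as noise * N(0, 1).
    With noise=0.0 this reduces to plain gradient descent.
    """
    rng = rng or random.Random()
    return [
        [w - g * (1.0 + noise * rng.gauss(0.0, 1.0)) * aggression
         for w, g in zip(w_row, j_row)]
        for w_row, j_row in zip(W, J)
    ]


# Made-up 1x2 weight matrix and gradient.
W_old = [[1.0, 2.0]]
J = [[0.5, -1.0]]
W_plain = update_weights(W_old, J, aggression=0.1)
W_noisy = update_weights(W_old, J, aggression=0.1, noise=0.2,
                         rng=random.Random(42))
print(W_plain, W_noisy)
```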
</section>
<section id="topological-impact" class="level4">
<h4 class="anchored" data-anchor-id="topological-impact">3. Topological Impact:</h4>
<ul>
<li><p><img src="https://latex.codecogs.com/png.latex?W%5E%7B(1)%7D"> (6x1): Adjusts how the single input <img src="https://latex.codecogs.com/png.latex?x"> is mapped to the 6 neurons of the first hidden layer.</p></li>
<li><p><img src="https://latex.codecogs.com/png.latex?W%5E%7B(2)%7D"> (7x6): Adjusts the 42 unique connections between the first and second hidden layers.</p></li>
<li><p><img src="https://latex.codecogs.com/png.latex?W%5E%7B(3)%7D"> (2x7): Adjusts how the 7 high-level features are combined into the final <img src="https://latex.codecogs.com/png.latex?x"> and <img src="https://latex.codecogs.com/png.latex?y"> coordinates of the heart.</p></li>
</ul>
<p>The script performs approximately 20,000 of these updates to refine the curve. Each iteration slightly shifts the 2D coordinates produced by the output layer until they trace the distinctive heart shape defined in the <code>training_data</code> function.</p>
</section>
</section>
<section id="acknowledgments" class="level2">
<h2 class="anchored" data-anchor-id="acknowledgments">Acknowledgments</h2>
<p>This project is based on one of the lab exercises from Imperial College London’s “Multivariate Calculus for Machine Learning” course. I re-implemented the network in base R, translated the mathematical components from Python, and extended the project with a visualization pipeline to produce a frame-by-frame learning animation.</p>
</section>
<section id="source-code" class="level2">
<h2 class="anchored" data-anchor-id="source-code">Source Code</h2>
<p>Source code is available on GitHub: <a href="https://github.com/AxesAccess/Backpropagating-Love">Backpropagating Love</a>.</p>
</section>
<section id="see-also" class="level2">
<h2 class="anchored" data-anchor-id="see-also">See Also</h2>
<p>Here are some related posts that you might find interesting:</p>
<div id="listing-posts" class="quarto-listing quarto-listing-container-table">
<table class="quarto-listing-table table">
<thead>
<tr>

<th>
 
</th>

<th>
Title
</th>

<th>
Reading Time
</th>

</tr>
</thead>
<tbody class="list">

<tr data-index="0" data-categories="VmlzdWFsaXphdGlvbiUyQ1NwYXRpYWwlMkNBbmltYXRpb24lMkNS" data-listing-date-sort="1751587200000" data-listing-file-modified-sort="1770926628882" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="7" data-listing-word-count-sort="1400" data-listing-title-sort="Animation of Spatial Data" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20250704-animation/image.gif" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20250704-animation/index.html" class="title listing-title">Animation of Spatial Data</a>
</td>
<td>
<span class="listing-reading-time">7 min</span>
</td>

</tr>

<tr data-index="1" data-categories="UHl0aG9uJTJDUiUyQ01hdGxhYg==" data-listing-date-sort="1739491200000" data-listing-file-modified-sort="1770926078462" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="4" data-listing-word-count-sort="607" data-listing-title-sort="Nerdy Valentine's in Python, R, and Matlab" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20250214-valentines/image.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20250214-valentines/index.html" class="title listing-title">Nerdy Valentine’s in Python, R, and Matlab</a>
</td>
<td>
<span class="listing-reading-time">4 min</span>
</td>

</tr>

</tbody>
</table>
</div>



</section>

]]></description>
  <category>ML</category>
  <category>Animation</category>
  <category>R</category>
  <guid>https://frequentist.org/posts/20260214-backpropagating-love/</guid>
  <pubDate>Sat, 14 Feb 2026 00:00:00 GMT</pubDate>
  <media:content url="https://frequentist.org/posts/20260214-backpropagating-love/image.gif" medium="image" type="image/gif"/>
</item>
<item>
  <title>Using Transit Time to Rethink Hotel Search</title>
  <dc:creator>Aleksei Prishchepo</dc:creator>
  <link>https://frequentist.org/posts/20260207-transit-time-hotel-search/</link>
  <description><![CDATA[ 






<section id="executive-summary" class="level2 unnumbered unlisted">
<h2 class="unnumbered unlisted anchored" data-anchor-id="executive-summary">Executive summary</h2>
<p>Price competition in hotel search has largely saturated, especially for leisure travel. As a result, platforms increasingly differentiate by helping users make better decisions.</p>
<p>One of the largest hidden costs of travel is time spent commuting between accommodations and the places travelers want to visit. Yet most hotel search interfaces still rely on distance-based location filters that ignore public transit topology and fail to reflect real travel effort.</p>
<p>This article explores an alternative approach: ranking accommodations by total public transit time to user-selected points of interest. Using Berlin as a proof of concept, I show that <strong>transit-time-based ranking</strong> reveals meaningful structure that distance-based filters consistently miss. Well-connected areas extend far beyond the city center, while poorly connected locations can triple total travel time over the course of a trip.</p>
<p>The findings suggest that optimizing for transit time can improve user decision-making, <strong>increase conversion</strong> rates, reduce pressure on a small set of central hotels, and surface better-distributed inventory — all with <strong>minimal operational cost</strong> and deterministic analytics.</p>
</section>
<section id="the-problem-isnt-price" class="level2">
<h2 class="anchored" data-anchor-id="the-problem-isnt-price">The problem isn’t price</h2>
<p>Most travel today is leisure travel. And leisure travelers don’t optimize for the cheapest possible stay, but for the overall experience.</p>
<p>Price still matters, of course. But once price competition is saturated, reducing it further yields diminishing returns. At that point, the real differentiation comes from helping users make <em>better decisions</em>, not just cheaper ones.</p>
<p>One of the biggest hidden costs of a trip is time.</p>
<ul>
<li><p>Time spent commuting.</p></li>
<li><p>Time spent navigating unfamiliar transit systems.</p></li>
<li><p>Time spent realizing that a hotel was “well located” only on a map.</p></li>
</ul>
</section>
<section id="optimization-problem" class="level2">
<h2 class="anchored" data-anchor-id="optimization-problem">Optimization problem</h2>
<p>When searching for a hotel, users are already solving a complex optimization problem.</p>
<p>They juggle price, amenities, aesthetics, availability, and location. On top of that, they carry a mental list of places they want to visit: attractions, museums, neighborhoods, restaurants.</p>
<p>Now add three constraints:</p>
<ul>
<li><p>unfamiliar city,</p></li>
<li><p>multiple points of interest,</p></li>
<li><p>spatial reasoning under uncertainty.</p></li>
</ul>
<p>This quickly exceeds human cognitive limits. Most people cannot reliably reason about more than a handful of entities at once — and certainly not while doing spatial calculations in their head.</p>
<p>As a result, users fall back to heuristics. The most common one is proximity.</p>
</section>
<section id="beyond-the-proximity-paradigm" class="level2">
<h2 class="anchored" data-anchor-id="beyond-the-proximity-paradigm">Beyond the proximity paradigm</h2>
<p>“Close to the center” is the default shortcut, but proximity is a poor proxy for convenience:</p>
<ul>
<li><p><strong>Distance</strong> <img src="https://latex.codecogs.com/png.latex?%5Cne"> travel time</p></li>
<li><p><strong>City center</strong> <img src="https://latex.codecogs.com/png.latex?%5Cne"> optimal access</p></li>
<li><p><strong>Straight-line proximity ignores transit topology</strong></p></li>
</ul>
<p>Two hotels equally distant from an attraction can have completely different travel times depending on transfers, line frequency, or network connectivity.</p>
<p>If travelers care about how much time they spend moving through a city (and they do), then optimizing for coordinates is the wrong abstraction.</p>
<p>We should optimize for <em>time</em>, not for distance.</p>
</section>
<section id="different-concept-total-transit-time" class="level2">
<h2 class="anchored" data-anchor-id="different-concept-total-transit-time">Different concept: total transit time</h2>
<p>Instead of asking <em>how far from the city center</em> a hotel is, we can ask a more relevant question:</p>
<blockquote class="blockquote">
<p><strong>How much time will a traveler spend getting to the places they want to visit?</strong></p>
</blockquote>
<p>The idea is simple:</p>
<ul>
<li><p>Let users select multiple points of interest.</p></li>
<li><p>Calculate public transit time between hotels and those POIs.</p></li>
<li><p>Rank accommodations by <strong>aggregate transit time.</strong></p></li>
</ul>
<p>No heavy machine learning is involved in the background: just a deterministic algorithm that precomputes transit times, and a simple aggregation function that ranks hotels for the user-selected POIs.</p>
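<p>As a rough illustration of the aggregation step, here is a minimal Python sketch. The lookup table, the hotel and POI names, and the <code>rank_hotels</code> helper are all hypothetical and are not the code behind this analysis. It assumes transit times are already precomputed in minutes, and that a hotel with a missing route to any selected POI is simply skipped.</p>

```python
from typing import Dict, List, Tuple

# Hypothetical precomputed lookup: (hotel_id, poi_id) -> transit minutes.
TransitTable = Dict[Tuple[str, str], float]


def rank_hotels(
    transit: TransitTable, hotels: List[str], selected_pois: List[str]
) -> List[Tuple[str, float]]:
    """Return (hotel, total minutes) pairs, shortest total first."""
    ranked = []
    for hotel in hotels:
        times = [transit.get((hotel, poi)) for poi in selected_pois]
        if any(t is None for t in times):
            continue  # assumption: drop hotels with an unknown route
        ranked.append((hotel, sum(times)))
    return sorted(ranked, key=lambda pair: pair[1])


# Toy values for illustration only, not data from the Berlin study.
transit = {
    ("A", "museum"): 12, ("A", "park"): 30,
    ("B", "museum"): 20, ("B", "park"): 15,
    ("C", "museum"): 10,  # no route from C to "park" in this toy table
}
ranking = rank_hotels(transit, ["A", "B", "C"], ["museum", "park"])
print(ranking)
```

<p>Summing is only one possible aggregate. A mean, or a sum weighted by how often each place is visited, would slot into the same function.</p>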
</section>
<section id="proof-of-concept" class="level2">
<h2 class="anchored" data-anchor-id="proof-of-concept">Proof of concept</h2>
<p>To test whether this idea is practical and meaningful, I built a proof of concept using Berlin as an example. The goal was to understand whether transit-time–based ranking reveals meaningful structure and whether it contradicts common assumptions.</p>
<p>I gathered data on 455 hotels (Figure&nbsp;1) and the 50 most popular points of interest in Berlin (Figure&nbsp;2), and calculated 9,444 transit times between hotels and those attractions.</p>
<div class="cell">
<div class="cell-output-display">
<div id="fig-hotels-cluster-map" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-hotels-cluster-map-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<a href="index_files/figure-html/fig-hotels-cluster-map-1.svg" class="lightbox" data-gallery="quarto-lightbox-gallery-1" title="Figure&nbsp;1: Hotel density in Berlin (H3 hexagons). Red circles show 1km, 3km, and 5km radii from the city center. These distances are typically used in filters when users search for hotels. However, they do not correspond to actual transit times, which are influenced by the city’s transit network topology."><img src="https://frequentist.org/posts/20260207-transit-time-hotel-search/index_files/figure-html/fig-hotels-cluster-map-1.svg" class="img-fluid figure-img"></a>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-hotels-cluster-map-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;1: Hotel density in Berlin (H3 hexagons). Red circles show 1km, 3km, and 5km radii from the city center. These distances are typically used in filters when users search for hotels. However, they do not correspond to actual transit times, which are influenced by the city’s transit network topology.
</figcaption>
</figure>
</div>
</div>
</div>
<div class="cell">
<div class="cell-output-display">
<div id="fig-popular-pois-map" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-popular-pois-map-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<a href="index_files/figure-html/fig-popular-pois-map-1.svg" class="lightbox" data-gallery="quarto-lightbox-gallery-2" title="Figure&nbsp;2: Popular POIs in Berlin (top 50 by user ratings). Shapes and colors indicate categories. Note that the most popular POIs are not necessarily the closest to the city center, and they are distributed across the city, which highlights the importance of transit connectivity over simple proximity."><img src="https://frequentist.org/posts/20260207-transit-time-hotel-search/index_files/figure-html/fig-popular-pois-map-1.svg" class="img-fluid figure-img"></a>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-popular-pois-map-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;2: Popular POIs in Berlin (top 50 by user ratings). Shapes and colors indicate categories. Note that the most popular POIs are not necessarily the closest to the city center, and they are distributed across the city, which highlights the importance of transit connectivity over simple proximity.
</figcaption>
</figure>
</div>
</div>
</div>
<p>Then I simulated travelers visiting attractions based on different interest profiles. Assuming a 7-day trip in which the traveler visits a new location each day, I picked 7 random locations and calculated the average transit time from each hotel to the chosen locations (Figure&nbsp;3, Figure&nbsp;4, Figure&nbsp;5).</p>
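<p>The random-interest case can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the original analysis code: the <code>simulate_trip</code> helper, the fixed seed, and the toy transit table (every POI 10 minutes from one hotel and 30 minutes from the other) are hypothetical.</p>

```python
import random
from statistics import mean
from typing import Dict, List, Tuple


def simulate_trip(
    transit: Dict[Tuple[str, str], float],
    hotels: List[str],
    pois: List[str],
    days: int = 7,
    seed: int = 0,
) -> Tuple[List[str], Dict[str, float]]:
    """Pick one distinct POI per day and average transit time per hotel."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    itinerary = rng.sample(pois, days)  # sampling without replacement
    averages = {
        hotel: mean(transit[(hotel, poi)] for poi in itinerary)
        for hotel in hotels
    }
    return itinerary, averages


# Toy inputs for illustration only, not the Berlin data set.
pois = [f"poi{i}" for i in range(10)]
hotels = ["well_connected", "poorly_connected"]
minutes = {"well_connected": 10, "poorly_connected": 30}
transit = {(h, p): minutes[h] for h in hotels for p in pois}

itinerary, averages = simulate_trip(transit, hotels, pois)
print(itinerary)
print(averages)
```

<p>An interest profile such as museums and theaters would restrict <code>pois</code> to the matching categories before sampling.</p>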
<div class="cell">
<div class="cell-output-display">
<div id="fig-scenario-1" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-scenario-1-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<a href="index_files/figure-html/fig-scenario-1-1.svg" class="lightbox" data-gallery="quarto-lightbox-gallery-3" title="Figure&nbsp;3: Scenario 1: Traveler with random interests. Red circles show 1km, 3km, and 5km radii from the city center. Hexagons are colored by average transit time to the selected POIs. Black isolines show travel time in 5-minute intervals."><img src="https://frequentist.org/posts/20260207-transit-time-hotel-search/index_files/figure-html/fig-scenario-1-1.svg" class="img-fluid figure-img"></a>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-scenario-1-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;3: Scenario 1: Traveler with random interests. Red circles show 1km, 3km, and 5km radii from the city center. Hexagons are colored by average transit time to the selected POIs. Black isolines show travel time in 5-minute intervals.
</figcaption>
</figure>
</div>
</div>
</div>
<div class="cell">
<div class="cell-output-display">
<div id="fig-scenario-2" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-scenario-2-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<a href="index_files/figure-html/fig-scenario-2-1.svg" class="lightbox" data-gallery="quarto-lightbox-gallery-4" title="Figure&nbsp;4: Scenario 2: Traveler interested in museums and theaters. Background map tiles are hidden for clarity. Distance-based filters (1km, 3km, 5km) do not align with optimal transit times. Note that there are several well-connected areas outside the 5km radius."><img src="https://frequentist.org/posts/20260207-transit-time-hotel-search/index_files/figure-html/fig-scenario-2-1.svg" class="img-fluid figure-img"></a>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-scenario-2-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;4: Scenario 2: Traveler interested in museums and theaters. Background map tiles are hidden for clarity. Distance-based filters (1km, 3km, 5km) do not align with optimal transit times. Note that there are several well-connected areas outside the 5km radius.
</figcaption>
</figure>
</div>
</div>
</div>
<div class="cell">
<div class="cell-output-display">
<div id="fig-scenario-3" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-scenario-3-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<a href="index_files/figure-html/fig-scenario-3-1.svg" class="lightbox" data-gallery="quarto-lightbox-gallery-5" title="Figure&nbsp;5: Scenario 3: Traveler interested in nightclubs and attractions. Note that the isolines of travel time do not correspond to distance-based filters. While the area within the 1km radius is well connected, there are several areas with equal travel time outside this circle."><img src="https://frequentist.org/posts/20260207-transit-time-hotel-search/index_files/figure-html/fig-scenario-3-1.svg" class="img-fluid figure-img"></a>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-scenario-3-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;5: Scenario 3: Traveler interested in nightclubs and attractions. Note that the isolines of travel time do not correspond to distance-based filters. While the area within the 1km radius is well connected, there are several areas with equal travel time outside this circle.
</figcaption>
</figure>
</div>
</div>
</div>
</section>
<section id="key-findings" class="level2">
<h2 class="anchored" data-anchor-id="key-findings">Key findings</h2>
<p>Looking at the results, several key insights emerge:</p>
<div class="quarto-layout-panel" data-layout-ncol="2">
<div class="quarto-layout-row">
<div id="col1" class="quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<div class="callout callout-style-simple callout-tip">
<div class="callout-body d-flex">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-body-container">
<p>First, the area with minimal to moderate transit times is much wider than expected. Efficient access is not confined to the city center.</p>
</div>
</div>
</div>
<div class="callout callout-style-simple callout-tip">
<div class="callout-body d-flex">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-body-container">
<p>Second, distance from the center is a poor predictor of transit time. Areas with equal transit times are far from circular.</p>
</div>
</div>
</div>
</div>
<div id="col2" class="quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<div class="callout callout-style-simple callout-tip">
<div class="callout-body d-flex">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-body-container">
<p>Third, booking a hotel in a poorly connected location can easily <strong>triple</strong> total transit time over the course of a trip.</p>
</div>
</div>
</div>
<div class="callout callout-style-simple callout-tip">
<div class="callout-body d-flex">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-body-container">
<p>Finally, neighboring areas can differ dramatically. Two hotels just a few streets apart may lead to completely different experiences of the same city.</p>
</div>
</div>
</div>
</div>
</div>
</div>
<p>These effects are invisible when using distance-based filters, but immediately obvious when optimizing for time.</p>
</section>
<section id="product-implications" class="level2">
<h2 class="anchored" data-anchor-id="product-implications">Product implications</h2>
<p>From a product perspective, the implications are straightforward. This approach improves both user experience and marketplace balance.</p>
<div class="callout callout-style-simple callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Higher conversion rates
</div>
</div>
<div class="callout-body-container callout-body">
<p>Better decision support leads to higher conversion rates. When users understand the trade-offs, they commit with more confidence.</p>
</div>
</div>
<div class="callout callout-style-simple callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>More value for travelers
</div>
</div>
<div class="callout-body-container callout-body">
<p>Reducing transit time creates real value for travelers: not perceived value, but actual hours saved during a trip. This can lead to higher satisfaction and repeat bookings.</p>
</div>
</div>
<div class="callout callout-style-simple callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Better marketplace balance
</div>
</div>
<div class="callout-body-container callout-body">
<p>At the platform level, this approach reduces pressure on a small set of “top” hotels and improves visibility for well-connected, non-central accommodations that are currently underrepresented.</p>
</div>
</div>
</section>
<section id="operational-costs" class="level2">
<h2 class="anchored" data-anchor-id="operational-costs">Operational costs</h2>
<p>From an engineering and analytics standpoint, this approach is lightweight.</p>
<ul>
<li><p>Transit times are precomputed.</p></li>
<li><p>Routing APIs are inexpensive.</p></li>
<li><p>Results are deterministic and predictable.</p></li>
</ul>
<p>There is no real-time inference, no complex infrastructure, and no hidden operational risk. Running costs are close to zero and, more importantly, easy to forecast.</p>
</section>
<section id="risks-and-second-order-effects" class="level2">
<h2 class="anchored" data-anchor-id="risks-and-second-order-effects">Risks and second-order effects</h2>
<p>Any ranking change shifts market dynamics.</p>
<ul>
<li><p>Central hotels may face stronger competition as their implicit advantage erodes.</p></li>
<li><p>Poorly connected hotels may lose visibility once transit time becomes explicit.</p></li>
</ul>
<p>These are not unintended consequences. The filter surfaces information that already affects the user experience, whether platforms acknowledge it or not.</p>
</section>
<section id="next-steps" class="level2">
<h2 class="anchored" data-anchor-id="next-steps">Next steps</h2>
<p>Several steps would be needed to move from concept to product:</p>
<ol type="1">
<li><p>User research to validate perceived value (e.g.&nbsp;using the <a href="https://frequentist.org/posts/20240805-kano-model/" target="_blank">Kano method</a>).</p></li>
<li><p>Replicating the analysis across multiple cities.</p></li>
<li><p>A pilot implementation with a limited user segment.</p></li>
<li><p>A/B testing the impact on conversion, engagement, and satisfaction.</p></li>
</ol>
</section>
<section id="conclusion" class="level2">
<h2 class="anchored" data-anchor-id="conclusion">Conclusion</h2>
<p>Optimizing hotel search for transit time rather than distance offers a more accurate and user-centric way to help travelers make informed decisions. It reveals hidden structure in the city that distance-based filters miss, and it has the potential to improve user experience and marketplace balance with minimal operational cost. As travel platforms seek new ways to differentiate, this approach provides a compelling opportunity to enhance decision support and create real value for users.</p>
</section>
<section id="see-also" class="level2">
<h2 class="anchored" data-anchor-id="see-also">See Also</h2>
<p>Here are some of my other posts related to product analytics, geospatial analysis, and data visualization:</p>
<div id="listing-posts" class="quarto-listing quarto-listing-container-table">
<table class="quarto-listing-table table">
<thead>
<tr>

<th>
 
</th>

<th>
Title
</th>

<th>
Reading Time
</th>

</tr>
</thead>
<tbody class="list">

<tr data-index="0" data-categories="U2FsZXMlMkNTdHJhdGVneQ==" data-listing-date-sort="1774224000000" data-listing-file-modified-sort="1774302780174" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="5" data-listing-word-count-sort="855" data-listing-title-sort="Practical Concepts for AI Driven Sales Coaching" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20260323-ai-sales-coaching/image.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20260323-ai-sales-coaching/index.html" class="title listing-title">Practical Concepts for AI Driven Sales Coaching</a>
</td>
<td>
<span class="listing-reading-time">5 min</span>
</td>

</tr>

<tr data-index="1" data-categories="S1BJJTIwRGVzaWduJTJDQkklMkNTdHJhdGVneQ==" data-listing-date-sort="1769558400000" data-listing-file-modified-sort="1772988050630" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="14" data-listing-word-count-sort="2630" data-listing-title-sort="The 7-Step KPI Blueprint from Business Intelligence Analytics Perspective" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20260128-seven-step-kpi-blueprint/image.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20260128-seven-step-kpi-blueprint/index.html" class="title listing-title">The 7-Step KPI Blueprint from Business Intelligence Analytics Perspective</a>
</td>
<td>
<span class="listing-reading-time">14 min</span>
</td>

</tr>

<tr data-index="2" data-categories="QSUyRkIlMjBUZXN0aW5nJTJDUHJvZHVjdCUyQ1N0YXRpc3RpY3MlMkNS" data-listing-date-sort="1754524800000" data-listing-file-modified-sort="1759267192458" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="6" data-listing-word-count-sort="1187" data-listing-title-sort="Minimum Detectable Effect (MDE) Calculation" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20250807-mde/image.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20250807-mde/index.html" class="title listing-title">Minimum Detectable Effect (MDE) Calculation</a>
</td>
<td>
<span class="listing-reading-time">6 min</span>
</td>

</tr>

<tr data-index="3" data-categories="QSUyRkIlMjBUZXN0aW5nJTJDUHJvZHVjdCUyQ1N0YXRpc3RpY3M=" data-listing-date-sort="1753747200000" data-listing-file-modified-sort="1770450295226" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="37" data-listing-word-count-sort="7205" data-listing-title-sort="A/B Testing: Concepts and Techniques" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20250729-ab-testing/image.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20250729-ab-testing/index.html" class="title listing-title">A/B Testing: Concepts and Techniques</a>
</td>
<td>
<span class="listing-reading-time">37 min</span>
</td>

</tr>

<tr data-index="4" data-categories="VmlzdWFsaXphdGlvbiUyQ1NwYXRpYWwlMkNBbmltYXRpb24lMkNS" data-listing-date-sort="1751587200000" data-listing-file-modified-sort="1770926628882" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="7" data-listing-word-count-sort="1400" data-listing-title-sort="Animation of Spatial Data" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20250704-animation/image.gif" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20250704-animation/index.html" class="title listing-title">Animation of Spatial Data</a>
</td>
<td>
<span class="listing-reading-time">7 min</span>
</td>

</tr>

<tr data-index="5" data-categories="TWFya2V0aW5nJTJDUHJvZHVjdCUyQ1B5dGhvbg==" data-listing-date-sort="1722816000000" data-listing-file-modified-sort="1770626733464" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="10" data-listing-word-count-sort="1892" data-listing-title-sort="Kano Method for Prioritization of Features" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20240805-kano-model/image.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20240805-kano-model/index.html" class="title listing-title">Kano Method for Prioritization of Features</a>
</td>
<td>
<span class="listing-reading-time">10 min</span>
</td>

</tr>

</tbody>
</table>

</div>



</section>

 ]]></description>
  <category>Product</category>
  <category>Marketing</category>
  <category>Spatial</category>
  <category>Geospatial</category>
  <category>Strategy</category>
  <guid>https://frequentist.org/posts/20260207-transit-time-hotel-search/</guid>
  <pubDate>Sat, 07 Feb 2026 00:00:00 GMT</pubDate>
  <media:content url="https://frequentist.org/posts/20260207-transit-time-hotel-search/image.svg" medium="image" type="image/svg+xml"/>
</item>
<item>
  <title>Sales Signals app (Agentic AI)</title>
  <dc:creator>Aleksei Prishchepo</dc:creator>
  <link>https://frequentist.org/projects/app-sales-signals/</link>
  <description><![CDATA[ 






<section id="project-overview" class="level2">
<h2 class="anchored" data-anchor-id="project-overview">Project Overview</h2>
<p>The <a href="https://www.salessignals.app/" class="salessignals" target="_blank">Sales Signals</a> app analyzes sales conversations using LLMs enriched with long-term customer context via a specialized RAG pipeline. By synthesizing current calls with historical interactions, it delivers continuity-aware coaching that reflects relationship momentum rather than isolated incidents. The result is consistent, data-driven revenue enablement grounded in real-world commercial outcomes.</p>
<div class="callout callout-style-simple callout-note no-icon callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Role
</div>
</div>
<div class="callout-body-container callout-body">
<p>End-to-end owner: system architecture, LLM orchestration, RAG-driven context injection, and interactive analytics dashboard.</p>
</div>
</div>
<div class="callout callout-style-simple callout-note no-icon callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Tools
</div>
</div>
<div class="callout-body-container callout-body">
<p>FastAPI • Python • LLM APIs • Retrieval-Augmented Generation (RAG) • Altair Viz • Async Event Processing • PostgreSQL/SQLAlchemy</p>
</div>
</div>
<div class="callout callout-style-simple callout-note no-icon callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Domain
</div>
</div>
<div class="callout-body-container callout-body">
<p>B2B Sales Analytics • Revenue Enablement • Conversational Analytics • Performance Management</p>
</div>
</div>
</section>
<section id="key-features-components" class="level2">
<h2 class="anchored" data-anchor-id="key-features-components">Key Features &amp; Components</h2>
<div class="quarto-layout-panel" data-layout-ncol="2">
<div class="quarto-layout-row">
<section id="context-aware-coaching" class="level3 quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<h3 class="anchored" data-anchor-id="context-aware-coaching">Context-Aware Coaching</h3>
<p>Generates targeted observations and next-step recommendations, dynamically adjusted based on retrieved historical client context and previous commitments.</p>
</section>
<section id="automated-periodic-synthesis" class="level3 quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<h3 class="anchored" data-anchor-id="automated-periodic-synthesis">Automated Periodic Synthesis</h3>
<p>Orchestrates daily and operator-level “roll-up” summaries, distilling high-volume call data into actionable executive insights and individual performance trends.</p>
</section>
</div>
<div class="quarto-layout-row">
<section id="advanced-sales-kpi-tracking" class="level3 quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<h3 class="anchored" data-anchor-id="advanced-sales-kpi-tracking">Advanced Sales KPI Tracking</h3>
<p>Calculates sophisticated metrics including <strong>Objection Intensity</strong>, <strong>Discovery Depth</strong>, <strong>Customer Sentiment</strong>, and <strong>Explicit Close Attempt Rates</strong> to identify behavioral gaps.</p>
</section>
<section id="historical-relationship-rag" class="level3 quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<h3 class="anchored" data-anchor-id="historical-relationship-rag">Historical Relationship RAG</h3>
<p>Heuristic-driven retrieval of caller history ensures the LLM “remembers” previous objections, pricing discussions, and deal momentum.</p>
</section>
</div>
</div>
<div class="light-content">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="architecture.svg" class="lightbox" data-gallery="quarto-lightbox-gallery-1" title="System architecture diagram"><img src="https://frequentist.org/projects/app-sales-signals/architecture.svg" class="img-fluid quarto-figure quarto-figure-center figure-img" style="width:80.0%" alt="System architecture diagram"></a></p>
</figure>
</div>
<figcaption>System architecture diagram</figcaption>
</figure>
</div>
</div>
<div class="dark-content">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="architecture-dark.svg" class="lightbox" data-gallery="quarto-lightbox-gallery-2" title="System architecture diagram"><img src="https://frequentist.org/projects/app-sales-signals/architecture-dark.svg" class="img-fluid quarto-figure quarto-figure-center figure-img" style="width:80.0%" alt="System architecture diagram"></a></p>
</figure>
</div>
<figcaption>System architecture diagram</figcaption>
</figure>
</div>
</div>
</section>
<section id="implementation" class="level2">
<h2 class="anchored" data-anchor-id="implementation">Implementation</h2>
<div class="quarto-layout-panel" data-layout-ncol="2">
<div class="quarto-layout-row">
<section id="bimodal-llm-orchestration" class="level3 quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<h3 class="anchored" data-anchor-id="bimodal-llm-orchestration">Bimodal LLM Orchestration</h3>
<p>Lightweight models handle classification and routing, while higher-reasoning models generate nuanced coaching insights.</p>
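<p>A minimal sketch of this two-tier routing idea, assuming each model is wrapped as a plain Python callable. The <code>make_router</code> helper, the <code>"sales_call"</code> label, and both stub models are hypothetical stand-ins, not the app's actual code or any provider's API.</p>

```python
from typing import Callable, Dict, Optional


def make_router(
    light_model: Callable[[str], str],
    heavy_model: Callable[[str], str],
) -> Callable[[str], Dict[str, Optional[str]]]:
    """Build a handler that classifies cheaply and escalates selectively."""

    def handle(transcript: str) -> Dict[str, Optional[str]]:
        label = light_model(transcript)  # cheap classification step
        if label != "sales_call":
            # Not worth the expensive model: return the label only.
            return {"label": label, "coaching": None}
        # Only relevant transcripts reach the higher-reasoning model.
        return {"label": label, "coaching": heavy_model(transcript)}

    return handle


# Stub models standing in for real LLM calls (assumptions for the demo).
def stub_classifier(text: str) -> str:
    return "sales_call" if "price" in text.lower() else "other"


def stub_coach(text: str) -> str:
    return f"Coaching notes for a {len(text.split())}-word transcript"


handle = make_router(stub_classifier, stub_coach)
relevant = handle("Can we revisit the price before rollout")
skipped = handle("Wrong number, sorry")
print(relevant)
print(skipped)
```

<p>Keeping the models behind callables means either tier can be swapped without touching the routing logic.</p>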
</section>
<section id="asynchronous-pipeline" class="level3 quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<h3 class="anchored" data-anchor-id="asynchronous-pipeline">Asynchronous Pipeline</h3>
<p>Event-driven architecture with FastAPI background tasks ensures seamless webhook ingestion from transcription providers.</p>
</section>
</div>
<div class="quarto-layout-row">
<section id="interactive-performance-dashboard" class="level3 quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<h3 class="anchored" data-anchor-id="interactive-performance-dashboard">Interactive Performance Dashboard</h3>
<p>Custom Altair-based visualizations allow stakeholders to filter performance by operator, deal stage, and behavioral signals.</p>
</section>
<section id="automated-insights-delivery" class="level3 quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<h3 class="anchored" data-anchor-id="automated-insights-delivery">Automated Insights Delivery</h3>
<p>Markdown-based coaching analyses are automatically rendered into HTML and distributed via email for immediate consumption by sales teams.</p>
</section>
</div>
</div>
</section>
<section id="outcomes-impact" class="level2">
<h2 class="anchored" data-anchor-id="outcomes-impact">Outcomes &amp; Impact</h2>
<div class="quarto-layout-panel" data-layout-ncol="3">
<div class="quarto-layout-row">
<div class="quarto-layout-cell" style="flex-basis: 33.3%;justify-content: flex-start;">
<div class="callout callout-style-simple callout-tip no-icon callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Tip</span>Near-Real-Time Feedback
</div>
</div>
<div class="callout-body-container callout-body">
<p>Coaching latency reduced from days to minutes, enabling faster behavioral adjustment.</p>
</div>
</div>
</div>
<div class="quarto-layout-cell" style="flex-basis: 33.3%;justify-content: flex-start;">
<div class="callout callout-style-simple callout-tip no-icon callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Tip</span>Churn Risk Detection
</div>
</div>
<div class="callout-body-container callout-body">
<p>Early identification of sentiment shifts, objections, and signals tied to retention risk.</p>
</div>
</div>
</div>
<div class="quarto-layout-cell" style="flex-basis: 33.3%;justify-content: flex-start;">
<div class="callout callout-style-simple callout-tip no-icon callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Tip</span>Scalable Sales Expertise
</div>
</div>
<div class="callout-body-container callout-body">
<p>Encodes senior-level B2B sales knowledge into a consistent, automated coaching framework.</p>
</div>
</div>
</div>
</div>
</div>
</section>
<section id="skills-demonstrated" class="level2">
<h2 class="anchored" data-anchor-id="skills-demonstrated">Skills Demonstrated</h2>
<p>LLM Orchestration • Retrieval-Augmented Generation (RAG) • Context-Aware Synthesis • Sales &amp; Revenue Analytics • B2B Sales Methodology • Asynchronous API Design • Prompt Engineering • Data Visualization (Altair) • Multi-model LLM Architectures • Automated Insight Generation • End-to-End System Design</p>
</section>
<section id="apply-this-to-your-business" class="level2 page-columns page-full">
<h2 class="anchored" data-anchor-id="apply-this-to-your-business">Apply This to Your Business</h2>
<p>Want to scale sales coaching without it becoming generic? <a href="../../contact.html">Let’s talk</a> about how I help turn real sales conversations into actionable, context-aware insights.</p>

<div class="no-row-height column-margin column-container"><div class="">
<p>Check out the <a href="https://www.salessignals.app/" class="salessignals" target="_blank">Sales Signals</a> app website.</p>
</div></div></section>
<section id="see-also" class="level2">
<h2 class="anchored" data-anchor-id="see-also">See Also</h2>
<div id="listing-projects" class="quarto-listing quarto-listing-container-grid">
<div class="list grid quarto-listing-cols-3">
<div class="g-col-1" data-index="0" data-categories="UGlwZWxpbmUlMkNMZWFkJTIwR2VuZXJhdGlvbiUyQ0RhdGElMjBFbmdpbmVlcmluZyUyQ1B5dGhvbg==" data-listing-date-sort="1777161600000" data-listing-file-modified-sort="1777217381211" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="2" data-listing-word-count-sort="288">
<a href="../../projects/report-european-environmental-companies/index.html" class="quarto-grid-link">
<div class="quarto-grid-item card h-100 card-left">
<p class="card-img-top">
<img loading="lazy" src="https://frequentist.org/projects/report-european-environmental-companies/image.png" class="thumbnail-image card-img" style="height: 150px;">
</p>
<div class="card-body post-contents">
<h5 class="no-anchor card-title listing-title">
European Environmental Company Intelligence Pipeline
</h5>
<div class="card-text listing-description delink">
<p>An automated pipeline that discovers EU-funded environmental companies, maps their C-suite leadership via LinkedIn, and delivers HubSpot-ready contact intelligence with…</p>
</div>
</div>
</div></a>
</div>
<div class="g-col-1" data-index="1" data-categories="V2ViYXBwJTJDU2hpbnklMjBmb3IlMjBQeXRob24lMkNQeXRob24lMkNGZWF0dXJlZA==" data-listing-date-sort="1766793600000" data-listing-file-modified-sort="1770399872011" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="3" data-listing-word-count-sort="431">
<a href="../../projects/webapp-linkedin-analytics/index.html" class="quarto-grid-link">
<div class="quarto-grid-item card h-100 card-left">
<p class="card-img-top">
<img loading="lazy" src="https://frequentist.org/projects/webapp-linkedin-analytics/image.png" class="thumbnail-image card-img" style="height: 150px;">
</p>
<div class="card-body post-contents">
<h5 class="no-anchor card-title listing-title">
LinkedIn Analytics Web Application
</h5>
<div class="card-text listing-description delink">
<p>A local-first web application that transforms LinkedIn Takeout exports into structured analytics on roles, industries, and geographic reach using NLP, unsupervised learning…</p>
</div>
</div>
</div></a>
</div>
<div class="g-col-1" data-index="2" data-categories="QXBwJTJDQWdlbnRpYyUyMEFJJTJDUHl0aG9u" data-listing-date-sort="1765584000000" data-listing-file-modified-sort="1767872588515" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="2" data-listing-word-count-sort="391">
<a href="../../projects/app-autonomous-career-agent/index.html" class="quarto-grid-link">
<div class="quarto-grid-item card h-100 card-left">
<p class="card-img-top">
<img loading="lazy" src="https://frequentist.org/projects/app-autonomous-career-agent/image.png" class="thumbnail-image card-img" style="height: 150px;">
</p>
<div class="card-body post-contents">
<h5 class="no-anchor card-title listing-title">
Autonomous Career Agent (Agentic AI Application)
</h5>
<div class="card-text listing-description delink">
<p>A multi-agent AI system that automates the job search and application process, demonstrating LLM orchestration and autonomous agent patterns.</p>
</div>
</div>
</div></a>
</div>
<div class="g-col-1" data-index="3" data-categories="V2ViYXBwJTJDUHl0aG9uJTJDRmVhdHVyZWQ=" data-listing-date-sort="1748649600000" data-listing-file-modified-sort="1767873003569" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="3" data-listing-word-count-sort="481">
<a href="../../projects/webapp-content-mate/index.html" class="quarto-grid-link">
<div class="quarto-grid-item card h-100 card-left">
<p class="card-img-top">
<img loading="lazy" data-src="../../projects/webapp-content-mate/image.svg" class="thumbnail-image card-img" style="height: 150px;">
</p>
<div class="card-body post-contents">
<h5 class="no-anchor card-title listing-title">
E-commerce Content Automation Platform
</h5>
<div class="card-text listing-description delink">
<p>A web application that automates the generation of e-commerce product cards using asynchronous pipelines and LLM-assisted content creation.</p>
</div>
</div>
</div></a>
</div>
<div class="g-col-1" data-index="4" data-categories="QkklMkNQb3dlciUyMEJJJTJDUHl0aG9uJTJDRmVhdHVyZWQ=" data-listing-date-sort="1736121600000" data-listing-file-modified-sort="1767886951027" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="3" data-listing-word-count-sort="462">
<a href="../../projects/bi-system-telecamera/index.html" class="quarto-grid-link">
<div class="quarto-grid-item card h-100 card-left">
<p class="card-img-top">
<img loading="lazy" data-src="../../projects/bi-system-telecamera/image.png" class="thumbnail-image card-img" style="height: 150px;">
</p>
<div class="card-body post-contents">
<h5 class="no-anchor card-title listing-title">
Company-Wide Business Intelligence System
</h5>
<div class="card-text listing-description delink">
<p>An end-to-end BI system consolidating operational, financial, marketing, and sales data into a single decision-support layer.</p>
</div>
</div>
</div></a>
</div>
<div class="g-col-1" data-index="5" data-categories="TGlicmFyeSUyQ1BhY2thZ2UlMkNQeXRob24=" data-listing-date-sort="1724284800000" data-listing-file-modified-sort="1767872644185" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="2" data-listing-word-count-sort="310">
<a href="../../projects/python-sophisthse/index.html" class="quarto-grid-link">
<div class="quarto-grid-item card h-100 card-left">
<p class="card-img-top">
<img loading="lazy" data-src="../../projects/python-sophisthse/image.png" class="thumbnail-image card-img" style="height: 150px;">
</p>
<div class="card-body post-contents">
<h5 class="no-anchor card-title listing-title">
Python Library for Russian Macroeconomic Time Series
</h5>
<div class="card-text listing-description delink">
<p>A Python package that simplifies access to Russian macroeconomic time-series data from the Higher School of Economics (HSE) sophist.hse.ru repository.</p>
</div>
</div>
</div></a>
</div>
</div>
<div class="listing-no-matching d-none">No matching items</div>

</div>



</section>

]]></description>
  <category>App</category>
  <category>Agentic AI</category>
  <category>Python</category>
  <category>Featured</category>
  <guid>https://frequentist.org/projects/app-sales-signals/</guid>
  <pubDate>Thu, 05 Feb 2026 00:00:00 GMT</pubDate>
  <media:content url="https://frequentist.org/projects/app-sales-signals/image.png" medium="image" type="image/png" height="76" width="144"/>
</item>
<item>
  <title>The 7-Step KPI Blueprint from a Business Intelligence Analytics Perspective</title>
  <dc:creator>Aleksei Prishchepo</dc:creator>
  <link>https://frequentist.org/posts/20260128-seven-step-kpi-blueprint/</link>
  <description><![CDATA[ 






<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>A business intelligence specialist’s real responsibility is not dashboards, charts, or even metrics, but decision quality.</p>
<p>A strong KPI system is the mechanism that connects strategy to execution through data. When KPIs are poorly designed, analytics turns into reporting theater. When they are designed well, analytics becomes a lever for business change.</p>
<p>In this article, I outline a seven-step KPI methodology from a practitioner’s perspective, grounded in real operational constraints.</p>
<p>These are the seven steps we follow:</p>
<ol type="1">
<li>Create objectives.</li>
<li>Describe results.</li>
<li>Identify measures.</li>
<li>Define thresholds.</li>
<li>Model structure and data.</li>
<li>Interpret results.</li>
<li>Drive action.</li>
</ol>
<p>This is not a rigid checklist. It describes a mature end state, though I often find that organizations operate in partial or iterative versions of this flow.</p>
</section>
<section id="create-objectives" class="level2">
<h2 class="anchored" data-anchor-id="create-objectives">1. Create Objectives</h2>
<p>This is where a business intelligence analyst adds strategic value. The golden rule of KPI design I follow is simple:</p>
<div class="callout callout-style-simple callout-tip">
<div class="callout-body d-flex">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-body-container">
<p>No KPI should exist unless it supports a clearly defined business objective.</p>
</div>
</div>
</div>
<p>From a data perspective, we can treat objectives as filters. They limit what deserves to be measured and prevent metrics overgrowth.</p>
<p>I look for three things in a good objective:</p>
<ul>
<li>It explicitly supports business strategy.</li>
<li>It is material enough to influence decisions.</li>
<li>It can be influenced by the organization (rather than external noise).</li>
</ul>
<p>This step may require pushing back if stakeholders request KPIs before objectives are articulated. I treat that as a signal to pause and reframe.</p>
<div class="callout callout-style-simple callout-tip callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Tip</span>The “so what?” test
</div>
</div>
<div class="callout-body-container callout-body">
<p>If a stakeholder requests a KPI, I would ask: “If this number drops by 20% tomorrow, what specific meeting gets called?” If they can’t answer, we are looking at <em>a metric</em>, not a KPI.</p>
</div>
</div>
</section>
<section id="describe-results" class="level2">
<h2 class="anchored" data-anchor-id="describe-results">2. Describe Results</h2>
<p>In this step, we translate strategy into observable outcomes.</p>
<p>A common pitfall I see in KPI design is confusing activities with results.</p>
<div class="quarto-layout-panel" data-layout-ncol="2">
<div class="quarto-layout-row">
<div class="quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<div class="callout callout-style-simple callout-warning">
<div class="callout-body d-flex">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-body-container">
<p>“Launch a retention initiative” is an activity.</p>
</div>
</div>
</div>
</div>
<div class="quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<div class="callout callout-style-simple callout-tip">
<div class="callout-body d-flex">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-body-container">
<p>“Increase 90-day repeat purchase rate” is a result.</p>
</div>
</div>
</div>
</div>
</div>
</div>
<p>I aim to describe results in language that is:</p>
<ul>
<li>Outcome-oriented.</li>
<li>Free of vague terms like <em>optimized</em>, <em>improved</em>, or <em>efficient</em>.</li>
<li>Observable and interpretable in the real world.</li>
</ul>
<p>For analytics specialists, this step is critical because ambiguous results lead to ambiguous measures. If we cannot clearly observe success, we cannot reliably model it.</p>
<div class="cell" data-file="mermaid/kpi-hierarchy.mmd" data-layout-align="default">
<div class="cell-output-display">
<div id="fig-hierarchy" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-hierarchy-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<div>
<pre class="mermaid mermaid-js" data-label="fig-hierarchy">graph TD
    subgraph Strategic ["Level 1: Strategic"]
        A[Net Profit / ARR]
    end

    subgraph Tactical ["Level 2: Tactical"]
        B1[Customer Acquisition Cost]
        B2[Churn Rate]
        B3[Average Order Value]
    end

    subgraph Operational ["Level 3: Operational"]
        C1[Ad Spend / Clicks]
        C2[Support Ticket Vol]
        C3[Discount Usage]
        C4[Landing Page Conv %]
    end

    A --- B1
    A --- B2
    A --- B3

    B1 --- C1
    B1 --- C4
    B2 --- C2
    B3 --- C3

    style Strategic fill:#F8CECC,stroke:#333,stroke-width:2px
    style Tactical fill:#E1D5E7,stroke:#333,stroke-width:1px
    style Operational fill:#D5E8D4,stroke:#333,stroke-width:1px
</pre>
</div>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-hierarchy-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;1: KPI hierarchy
</figcaption>
</figure>
</div>
</div>
</div>
</section>
<section id="identify-measures" class="level2">
<h2 class="anchored" data-anchor-id="identify-measures">3. Identify Measures</h2>
<p>I believe a KPI should be expressible in one sentence containing countable entities. At this stage, I explicitly consider Lead vs.&nbsp;Lag vs.&nbsp;Diagnostic indicators.</p>
<div id="tbl-lag-lead-diagnostic" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-tbl figure">
<figcaption class="quarto-float-caption-top quarto-float-caption quarto-float-tbl" id="tbl-lag-lead-diagnostic-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Table&nbsp;1: A robust KPI system combines all three types of indicators.
</figcaption>
<div aria-describedby="tbl-lag-lead-diagnostic-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<table class="caption-top table">
<colgroup>
<col style="width: 25%">
<col style="width: 25%">
<col style="width: 25%">
<col style="width: 25%">
</colgroup>
<thead>
<tr class="header">
<th style="text-align: left;">Category</th>
<th style="text-align: left;">Indicator</th>
<th style="text-align: left;">Metric Example</th>
<th style="text-align: left;">BI Value</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: left;"><strong>Lag</strong></td>
<td style="text-align: left;">Outcome</td>
<td style="text-align: left;">Annual Recurring Revenue (ARR)</td>
<td style="text-align: left;">Confirms what happened.</td>
</tr>
<tr class="even">
<td style="text-align: left;"><strong>Lead</strong></td>
<td style="text-align: left;">Predictive</td>
<td style="text-align: left;">Product Qualified Leads (PQLs)</td>
<td style="text-align: left;">Predicts future ARR.</td>
</tr>
<tr class="odd">
<td style="text-align: left;"><strong>Diagnostic</strong></td>
<td style="text-align: left;">Root Cause</td>
<td style="text-align: left;">Feature Adoption Rate</td>
<td style="text-align: left;">Explains <em>why</em> PQLs dropped.</td>
</tr>
</tbody>
</table>
</div>
</figure>
</div>
<section id="measure-quality" class="level3">
<h3 class="anchored" data-anchor-id="measure-quality">Measure Quality</h3>
<p>Candidate measures can be evaluated based on:</p>
<ul>
<li>Alignment with the objective.</li>
<li>Business relevance.</li>
<li>Data availability and reliability.</li>
</ul>
</section>
<section id="ownership-and-definition" class="level3">
<h3 class="anchored" data-anchor-id="ownership-and-definition">Ownership and Definition</h3>
<p>I’ve found that every KPI requires three factors to survive:</p>
<ul>
<li><strong>A business owner</strong> (accountable for outcomes).</li>
<li><strong>A data owner</strong> (responsible for logic and updates).</li>
<li><strong>A stable, explicit formula.</strong></li>
</ul>
<p>From a BI perspective, I see unclear ownership and changing definitions as bigger risks than imperfect data.</p>
</section>
</section>
<section id="define-thresholds" class="level2">
<h2 class="anchored" data-anchor-id="define-thresholds">4. Define Thresholds</h2>
<p>A KPI without a threshold is just a data point; a KPI with a threshold is a <strong>call to action</strong>. For an analyst, the challenge is defining “Normal” vs.&nbsp;“Critical” without relying on gut feeling.</p>
<section id="beyond-static-targets" class="level3">
<h3 class="anchored" data-anchor-id="beyond-static-targets">Beyond Static Targets</h3>
<p>Many organizations use “flat” thresholds (e.g., <em>Red if Sales &lt; $100k</em>). However, businesses are rarely static. I prefer a more mature BI approach:</p>
<section id="historical-baselines" class="level4">
<h4 class="anchored" data-anchor-id="historical-baselines">Historical Baselines</h4>
<p>Comparing performance against a rolling average or the same period last year (YoY) to account for seasonality.</p>
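<p>A minimal sketch of both baselines, using only the Python standard library. The function names, the three-month window, and the <code>monthly_sales</code> figures are illustrative assumptions, not values from a real system.</p>

```python
from statistics import mean


def rolling_baseline(values, window):
    """Mean of the `window` observations preceding each point.

    Returns None for positions that do not yet have enough history.
    """
    return [
        mean(values[i - window:i]) if i >= window else None
        for i in range(len(values))
    ]


def yoy_change(values, period=12):
    """Relative change versus the same position one cycle (default 12 months) earlier."""
    return [
        (values[i] - values[i - period]) / values[i - period] if i >= period else None
        for i in range(len(values))
    ]


# Hypothetical monthly sales for two consecutive years (illustrative numbers only).
monthly_sales = [
    100, 90, 110, 120, 130, 150, 160, 155, 140, 125, 170, 210,
    108, 99, 121, 126, 143, 165, 168, 170, 147, 135, 187, 220,
]

baseline = rolling_baseline(monthly_sales, window=3)
yoy = yoy_change(monthly_sales)
```

<p>Comparing each month with <code>baseline</code> shows drift against recent history, while <code>yoy</code> compares like with like across the seasonal cycle.</p>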
</section>
<section id="statistical-process-control-spc" class="level4">
<h4 class="anchored" data-anchor-id="statistical-process-control-spc">Statistical Process Control (SPC)</h4>
<p>Using standard deviations from the mean to define “Natural Variation”. If a metric stays within 1 to 2 SD of the mean, it is noise. If it crosses the 3 SD limit, it is a signal.</p>
<div class="cell">
<div class="cell-output-display">
<div id="fig-thresholds" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-thresholds-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://frequentist.org/posts/20260128-seven-step-kpi-blueprint/index_files/figure-html/fig-thresholds-1.svg" class="img-fluid figure-img">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-thresholds-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;2: XmR chart: separating signal from noise
</figcaption>
</figure>
</div>
</div>
</div>
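<p>The limits of an XmR chart can be sketched in a few lines. This version assumes the conventional XmR approach, where the average moving range scaled by 2.66 approximates three standard deviations. The weekly values and function names are hypothetical.</p>

```python
from statistics import mean


def xmr_limits(values):
    """Return (center, lower, upper) natural process limits for an XmR chart."""
    center = mean(values)
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    avg_moving_range = mean(moving_ranges)
    # 2.66 is the conventional XmR constant (3 / 1.128), approximating 3 sigma.
    spread = 2.66 * avg_moving_range
    return center, center - spread, center + spread


def signals(values, lower, upper):
    """Indices of observations that fall outside the natural process limits."""
    return [i for i, v in enumerate(values) if v > upper or lower > v]


# Hypothetical weekly KPI values used to establish the baseline.
baseline_weeks = [52, 49, 51, 50, 48, 53, 50, 47]
center, lower, upper = xmr_limits(baseline_weeks)

# New observations are judged against the baseline limits.
new_weeks = [51, 49, 62, 50]
flagged = signals(new_weeks, lower, upper)
```

<p>Here only the third new observation lands outside the limits, so it is the only one that warrants a conversation.</p>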
</section>
<section id="dynamic-thresholds" class="level4">
<h4 class="anchored" data-anchor-id="dynamic-thresholds">Dynamic Thresholds</h4>
<p>Adjusting targets based on external variables (e.g., lower conversion targets during a known website migration).</p>
<div class="cell">
<div class="cell-output-display">
<div id="fig-dynamic-thresholds" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-dynamic-thresholds-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://frequentist.org/posts/20260128-seven-step-kpi-blueprint/index_files/figure-html/fig-dynamic-thresholds-1.svg" class="img-fluid figure-img">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-dynamic-thresholds-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;3: Dynamic thresholds vs.&nbsp;static targets
</figcaption>
</figure>
</div>
</div>
</div>
</section>
</section>
<section id="rag-model" class="level3">
<h3 class="anchored" data-anchor-id="rag-model">RAG Model</h3>
<p>The most common approach is the RAG model (Red, Amber, Green), where specific values are set to trigger a status change:</p>
<ul>
<li>Green: An acceptable result or on-target performance.</li>
<li>Amber: A warning sign that requires investigation.</li>
<li>Red: An unacceptable result requiring rectification.</li>
</ul>
<div id="tbl-rag" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-tbl figure">
<figcaption class="quarto-float-caption-top quarto-float-caption quarto-float-tbl" id="tbl-rag-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Table&nbsp;2: An example of the RAG model implementation.
</figcaption>
<div aria-describedby="tbl-rag-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<table class="caption-top table">
<colgroup>
<col style="width: 25%">
<col style="width: 25%">
<col style="width: 25%">
<col style="width: 25%">
</colgroup>
<thead>
<tr class="header">
<th style="text-align: left;">Status</th>
<th style="text-align: left;">Threshold Logic</th>
<th style="text-align: left;">BI Implementation</th>
<th style="text-align: left;">Business Action</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: left;"><strong>Green</strong></td>
<td style="text-align: left;"><img src="https://latex.codecogs.com/png.latex?%3E%2095%5C%25"> of Target</td>
<td style="text-align: left;">Automated “Good News” report</td>
<td style="text-align: left;">Maintain current strategy</td>
</tr>
<tr class="even">
<td style="text-align: left;"><strong>Amber</strong></td>
<td style="text-align: left;"><img src="https://latex.codecogs.com/png.latex?80%5C%25%20-%2095%5C%25"> (Warning)</td>
<td style="text-align: left;">Trendline analysis &amp; breakdown</td>
<td style="text-align: left;">Investigate root cause</td>
</tr>
<tr class="odd">
<td style="text-align: left;"><strong>Red</strong></td>
<td style="text-align: left;"><img src="https://latex.codecogs.com/png.latex?%3C%2080%5C%25"> (Critical)</td>
<td style="text-align: left;">Real-time Slack/Email alert</td>
<td style="text-align: left;">Immediate tactical intervention</td>
</tr>
</tbody>
</table>
</div>
</figure>
</div>
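<p>The cut-offs from the table translate into a small status function. This sketch assumes a “higher is better” KPI and treats exactly 80% and exactly 95% of target as Amber. The name <code>rag_status</code> is illustrative.</p>

```python
def rag_status(actual, target):
    """Map attainment (actual / target) to a RAG status using the table's cut-offs."""
    if target == 0:
        raise ValueError("target must be non-zero")
    attainment = actual / target
    if attainment > 0.95:
        return "Green"
    if attainment >= 0.80:
        return "Amber"
    return "Red"


# Hypothetical readings against a target of 100.
statuses = [rag_status(value, 100) for value in (97, 88, 72)]
```

<p>For “lower is better” KPIs such as churn, the comparison would need to be inverted before applying the same bands.</p>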
</section>
<section id="normalization" class="level3">
<h3 class="anchored" data-anchor-id="normalization">Normalization</h3>
<p>When we have 50 different KPIs with different units, we cannot roll them up into a “Health Score” unless we normalize them.</p>
<p>By converting every KPI into a percentage of its target (<img src="https://latex.codecogs.com/png.latex?Actual%20/%20Target">), we can create a <strong>Weighted Health Index</strong>. This allows a CEO to see a single “Operations Score” mathematically derived from all of the underlying metrics.</p>
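<p>One possible way to compute such an index is a weighted average of attainment ratios. The sketch below makes two assumptions that are not part of the method itself: every KPI is “higher is better”, and attainment is capped at 100% so that one overperforming metric cannot hide a weak one. The KPI names, values, and weights are hypothetical.</p>

```python
def health_index(kpis):
    """Weighted average of actual / target across KPIs.

    `kpis` is a list of (name, actual, target, weight) tuples.
    Attainment is capped at 1.0 (an assumption of this sketch).
    """
    total_weight = sum(weight for _, _, _, weight in kpis)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    weighted = sum(
        weight * min(actual / target, 1.0)
        for _, actual, target, weight in kpis
    )
    return weighted / total_weight


# Hypothetical KPIs in different units, normalized by their own targets.
operations = [
    ("On-time delivery rate", 0.90, 1.00, 2),
    ("Orders shipped", 450, 500, 1),
    ("NPS", 60, 50, 1),
]
operations_score = health_index(operations)
```

<p>With these inputs the score comes out at roughly 0.925, that is, about 92.5% of target across the weighted set.</p>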
<div class="callout callout-style-simple callout-tip callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Tip</span>Benchmark traps
</div>
</div>
<div class="callout-body-container callout-body">
<p>I would recommend avoiding industry benchmarks because they are averages of companies we aren’t competing with. I prefer thresholds derived from our specific unit economics and historical capability.</p>
</div>
</div>
</section>
</section>
<section id="model-structure-and-data" class="level2">
<h2 class="anchored" data-anchor-id="model-structure-and-data">5. Model Structure and Data</h2>
<p>This is where we see many KPI initiatives fail. Inexperienced teams build KPIs directly inside a visualization tool instead of building them into the <strong>data architecture</strong>.</p>
<section id="semantic-layer-vs.-visualization-layer" class="level3">
<h3 class="anchored" data-anchor-id="semantic-layer-vs.-visualization-layer">Semantic Layer vs.&nbsp;Visualization Layer</h3>
<p>To provide a <strong>Single Source of Truth</strong>, I decouple logic from the dashboard. Whether using dbt (Semantic Layer), Looker (LookML), or Power BI (Tabular Models), the goal is to define the metric <strong>once</strong> in code and reference it <strong>everywhere</strong>.</p>
</section>
<section id="key-architectural-requirements" class="level3">
<h3 class="anchored" data-anchor-id="key-architectural-requirements">Key Architectural Requirements</h3>
<ul>
<li><strong>Granularity and Grain:</strong> We must define the lowest level of detail the KPI can be sliced by. If the grain is inconsistent across the model, our KPI aggregations will be wrong.</li>
<li><strong>History and Snapshots:</strong> I determine if our model needs to support <em>point-in-time</em> reporting versus just showing the current state.</li>
<li><strong>The KPI Hierarchy:</strong> I structure data to support a <em>Drill-Down</em> path:
<ul>
<li><strong>Level 1 (Executive):</strong> The North Star KPI (e.g., Total Revenue).</li>
<li><strong>Level 2 (Operational):</strong> The drivers (e.g., Average Order Value).</li>
<li><strong>Level 3 (Diagnostic):</strong> The raw attributes (e.g., Discount Code usage).</li>
</ul></li>
</ul>
</section>
<section id="technical-governance" class="level3">
<h3 class="anchored" data-anchor-id="technical-governance">Technical Governance</h3>
<p>We must ensure every KPI in the model is accompanied by:</p>
<ol type="1">
<li><strong>The SQL/Code definition:</strong> e.g. <code>SUM(net_revenue) / NULLIF(COUNT(DISTINCT user_id), 0)</code>.</li>
<li><strong>Update Frequency:</strong> Clearly defined as real-time, hourly, or daily.</li>
<li><strong>Upstream Lineage:</strong> I map exactly which raw tables feed the KPI so I can perform impact analysis when a source system changes.</li>
</ol>
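<p>To illustrate “define once, reference everywhere”, the sketch below stores the SQL expression from the first item in a single constant and reuses it for several slices. It uses an in-memory SQLite database purely for demonstration. The <code>orders</code> table, its columns, and the helper <code>revenue_per_user</code> are hypothetical.</p>

```python
import sqlite3

# Single definition of the metric; every query below references this constant.
REVENUE_PER_USER = "SUM(net_revenue) / NULLIF(COUNT(DISTINCT user_id), 0)"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user_id INTEGER, region TEXT, net_revenue REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "DE", 100.0), (1, "DE", 50.0), (2, "DE", 150.0), (3, "AT", 80.0)],
)


def revenue_per_user(where="1 = 1", params=()):
    """Evaluate the shared metric expression for an optional, trusted filter clause."""
    sql = f"SELECT {REVENUE_PER_USER} FROM orders WHERE {where}"
    return conn.execute(sql, params).fetchone()[0]


overall = revenue_per_user()                          # all regions
germany = revenue_per_user("region = ?", ("DE",))     # one slice, same formula
no_data = revenue_per_user("region = ?", ("CH",))     # no rows: NULL, not an error
```

<p>A slice with no rows yields <code>None</code> rather than a division error, which is what the <code>NULLIF</code> guard is there for in engines that raise on division by zero.</p>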
<div class="callout callout-style-simple callout-warning">
<div class="callout-body d-flex">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-body-container">
<p>If your KPI logic lives in a hidden calculated field inside a specific dashboard, you haven’t built a framework; you’ve built a technical debt trap.</p>
</div>
</div>
</div>
</section>
</section>
<section id="interpret-results" class="level2">
<h2 class="anchored" data-anchor-id="interpret-results">6. Interpret Results</h2>
<p>Interpretation is where we move from being a “data provider” to a “business partner.” Most dashboards fail because they show the <em>what</em> but leave the <em>why</em> as an exercise for the reader.</p>
<section id="variance-analysis" class="level3">
<h3 class="anchored" data-anchor-id="variance-analysis">Variance Analysis</h3>
<p>To interpret results, I start with <strong>Variance Analysis</strong>. If Revenue is down, we check whether it is because we sold fewer units (<strong>Volume Variance</strong>) or because we sold them at a lower price (<strong>Price/Mix Variance</strong>).</p>
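<p>For a single product, the split can be sketched with one common convention: volume variance is valued at the prior price, and price variance at the current volume, so the two add up to the total revenue change. The figures are hypothetical, and a true mix effect would require product-level data that this sketch does not model.</p>

```python
def price_volume_variance(prior_units, prior_price, current_units, current_price):
    """Decompose a revenue change into volume and price components."""
    # Change in units, valued at the prior period's price.
    volume_variance = (current_units - prior_units) * prior_price
    # Change in price, applied to the current period's units.
    price_variance = (current_price - prior_price) * current_units
    total = current_units * current_price - prior_units * prior_price
    return {"volume": volume_variance, "price": price_variance, "total": total}


# Hypothetical periods: fewer units sold at a slightly higher price.
result = price_volume_variance(
    prior_units=1000, prior_price=50.0,
    current_units=900, current_price=52.0,
)
```

<p>In this example the revenue decline comes from volume, while the price change partly offsets it.</p>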
</section>
<section id="three-pillars-of-interpretation" class="level3">
<h3 class="anchored" data-anchor-id="three-pillars-of-interpretation">Three Pillars of Interpretation</h3>
<ol type="1">
<li><strong>Contextual Benchmarking:</strong> Never present a number in isolation.
<ul>
<li><em>Bad:</em> “Churn is 5%.”</li>
<li><em>Good:</em> “Churn is 5%, which is a 12% increase MoM, primarily driven by the Enterprise segment.”</li>
</ul></li>
<li><strong>Cohort Analysis:</strong> Aggregates lie. A KPI might look stable while a specific cohort is collapsing. Always look beneath the surface to see if a segment is skewing the average.</li>
<li><strong>Correlation vs.&nbsp;Causality:</strong> I use BI tools to overlay external events. Did the dip in “Engagement” happen exactly when the new UI was deployed? This helps us transform a correlation into a testable hypothesis.</li>
</ol>
<div class="cell">
<div class="cell-output-display">
<div id="fig-cohorts" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-cohorts-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://frequentist.org/posts/20260128-seven-step-kpi-blueprint/index_files/figure-html/fig-cohorts-1.svg" class="img-fluid figure-img">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-cohorts-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;4: Cohort retention heatmap: revealing structural changes
</figcaption>
</figure>
</div>
</div>
</div>
</section>
<section id="avoiding-reporting-theater" class="level3">
<h3 class="anchored" data-anchor-id="avoiding-reporting-theater">Avoiding “Reporting Theater”</h3>
<p>I’ve seen teams spend hours explaining tiny fluctuations. My solution is <strong>Exception Reporting</strong>: I build views that only highlight KPIs that have breached their thresholds (using <strong>XmR charts</strong> to distinguish noise from signals). This forces the conversation to stay focused on what actually requires attention.</p>
<div class="callout callout-style-simple callout-tip callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Tip</span>Analyst’s Note
</div>
</div>
<div class="callout-body-container callout-body">
<p>My goal is to reduce the “Time to Insight.” If a stakeholder has to click five filters to understand why a KPI is red, my interpretation layer has failed.</p>
</div>
</div>
</section>
</section>
<section id="drive-action" class="level2">
<h2 class="anchored" data-anchor-id="drive-action">7. Drive Action</h2>
<p>The final stage is ensuring the data actually changes the trajectory of the business. I aim to move from <strong>Passive Monitoring</strong> to <strong>Active Orchestration</strong>.</p>
<section id="linking-metrics-to-decision-rights" class="level3">
<h3 class="anchored" data-anchor-id="linking-metrics-to-decision-rights">Linking Metrics to Decision Rights</h3>
<p>A KPI framework only works if there is an agreement on who acts when a threshold is breached. I facilitate this by embedding “Action Triggers” in our reporting:</p>
<ul>
<li><strong>Remedial Actions:</strong> Short-term “fixes” (e.g., “If inventory falls below X, trigger a reorder”).</li>
<li><strong>Strategic Pivots:</strong> Long-term shifts (e.g., “If CAC stays above LTV for two quarters, we re-evaluate the channel mix”).</li>
</ul>
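<p>One way to make such triggers explicit is to store them as data next to the KPI definitions. The sketch below is a minimal illustration under assumptions: the <code>ActionTrigger</code> class, the thresholds, and the owners are hypothetical, and the “two quarters” persistence rule is simplified to a single reading.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">from dataclasses import dataclass
from typing import Callable


@dataclass
class ActionTrigger:
    kpi: str
    condition: Callable[[float], bool]  # returns True when the trigger fires
    action: str
    owner: str


# Hypothetical triggers; thresholds and owners are placeholders.
TRIGGERS = [
    ActionTrigger("inventory_units", lambda v: 500 > v,
                  "Trigger a reorder", "Supply Chain Lead"),
    ActionTrigger("cac_to_ltv_ratio", lambda v: v > 1.0,
                  "Re-evaluate the channel mix", "VP of Marketing"),
]


def fired_actions(readings):
    """Return (owner, action) pairs for every trigger whose condition is met."""
    return [
        (t.owner, t.action)
        for t in TRIGGERS
        if t.kpi in readings and t.condition(readings[t.kpi])
    ]


result = fired_actions({"inventory_units": 320, "cac_to_ltv_ratio": 0.8})</code></pre></div>
<p>Keeping the owner on the trigger itself is the point: when a condition fires, there is no debate about who acts.</p>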
</section>
<section id="tracking-the-action-roi" class="level3">
<h3 class="anchored" data-anchor-id="tracking-the-action-roi">Tracking the “Action ROI”</h3>
<p>I make it a point to <strong>measure the impact of the actions taken.</strong> We shouldn’t just report that a KPI went from Red to Green. I build a follow-up analysis to check whether the “Retention Initiative” actually caused the move, or whether it was just seasonal noise.</p>
</section>
<section id="building-the-decision-log" class="level3">
<h3 class="anchored" data-anchor-id="building-the-decision-log">Building the “Decision Log”</h3>
<p>I am seeing more teams move toward “Decision Intelligence.” We keep a log of actions taken in response to KPI signals. Over time, this allows us to:</p>
<ul>
<li>Evaluate the effectiveness of past decisions.</li>
<li>Onboard new leaders by showing them the “playbook”.</li>
<li>Fine-tune thresholds based on whether past “Red” alerts actually required intervention.</li>
</ul>
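<p>A decision log does not need special tooling. The sketch below assumes a hypothetical <code>DecisionLogEntry</code> record and shows how the last point, checking whether “Red” alerts actually required intervention, could be computed from it. The field names and entries are illustrative only.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">from dataclasses import dataclass
from datetime import date


@dataclass
class DecisionLogEntry:
    logged_on: date
    kpi: str
    signal: str        # e.g. "Red" or "Amber"
    action_taken: str  # empty string when nobody needed to act
    outcome: str = ""


def intervention_rate(entries, signal="Red"):
    """Share of alerts with the given signal that led to an action."""
    alerts = [e for e in entries if e.signal == signal]
    if not alerts:
        return None
    return sum(1 for e in alerts if e.action_taken) / len(alerts)


log = [
    DecisionLogEntry(date(2026, 1, 5), "repeat_purchase_rate", "Red",
                     "Launched win-back sequence"),
    DecisionLogEntry(date(2026, 1, 12), "repeat_purchase_rate", "Red", ""),
    DecisionLogEntry(date(2026, 1, 19), "inventory_units", "Amber",
                     "Reordered early"),
]
rate = intervention_rate(log)</code></pre></div>
<p>A low intervention rate for a given KPI suggests its “Red” threshold is too sensitive and worth revisiting.</p>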
<div class="callout callout-style-simple callout-tip callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Tip</span>Analyst’s Note
</div>
</div>
<div class="callout-body-container callout-body">
<p>Success isn’t measured by dashboard views, but by how many business decisions were influenced by the data.</p>
</div>
</div>
</section>
</section>
<section id="kpi-definition-template" class="level2">
<h2 class="anchored" data-anchor-id="kpi-definition-template">KPI Definition Template</h2>
<p>To put it all together, here is the template for documenting every core metric in a semantic layer. This ensures that the logic is transparent and the accountability is clear.</p>
<div id="tbl-kpi-definition-template" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-tbl figure">
<figcaption class="quarto-float-caption-top quarto-float-caption quarto-float-tbl" id="tbl-kpi-definition-template-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Table&nbsp;3: KPI definition template for consistent documentation.
</figcaption>
<div aria-describedby="tbl-kpi-definition-template-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<table class="caption-top table">
<colgroup>
<col style="width: 33%">
<col style="width: 33%">
<col style="width: 33%">
</colgroup>
<tbody>
<tr class="odd">
<td style="text-align: left;"><strong>Section</strong></td>
<td style="text-align: left;"><strong>Field</strong></td>
<td style="text-align: left;"><strong>Description / Example</strong></td>
</tr>
<tr class="even">
<td style="text-align: left;"><strong>1. Strategic Context</strong></td>
<td style="text-align: left;"><strong>Business Objective</strong></td>
<td style="text-align: left;"><em>e.g., Increase long-term customer value.</em></td>
</tr>
<tr class="odd">
<td style="text-align: left;"></td>
<td style="text-align: left;"><strong>The “So What?”</strong></td>
<td style="text-align: left;"><em>If this drops 15%, we pause ad spend and audit the onboarding funnel.</em></td>
</tr>
<tr class="even">
<td style="text-align: left;"><strong>2. Definition</strong></td>
<td style="text-align: left;"><strong>KPI Name</strong></td>
<td style="text-align: left;"><em>e.g., 90-Day Repeat Purchase Rate</em></td>
</tr>
<tr class="odd">
<td style="text-align: left;"></td>
<td style="text-align: left;"><strong>Indicator Type</strong></td>
<td style="text-align: left;"><em>Lead / Lag / Diagnostic</em></td>
</tr>
<tr class="even">
<td style="text-align: left;"></td>
<td style="text-align: left;"><strong>Formula (Plain English)</strong></td>
<td style="text-align: left;"><em>(Customers with &gt;1 order in 90 days) / (Total customers acquired in period)</em></td>
</tr>
<tr class="odd">
<td style="text-align: left;"><strong>3. Technical Logic</strong></td>
<td style="text-align: left;"><strong>SQL / Code Snippet</strong></td>
<td style="text-align: left;"><code>COUNT(DISTINCT CASE WHEN order_count &gt; 1...)</code></td>
</tr>
<tr class="even">
<td style="text-align: left;"></td>
<td style="text-align: left;"><strong>Data Grain</strong></td>
<td style="text-align: left;"><em>Daily by Region, Category, and Customer Segment.</em></td>
</tr>
<tr class="odd">
<td style="text-align: left;"></td>
<td style="text-align: left;"><strong>Update Frequency</strong></td>
<td style="text-align: left;"><em>Daily (T+1)</em></td>
</tr>
<tr class="even">
<td style="text-align: left;"><strong>4. Thresholds</strong></td>
<td style="text-align: left;"><strong>Green (Healthy)</strong></td>
<td style="text-align: left;"><em>Baseline + 5% (moving average)</em></td>
</tr>
<tr class="odd">
<td style="text-align: left;"></td>
<td style="text-align: left;"><strong>Amber (Warning)</strong></td>
<td style="text-align: left;"><em>Between 2 and 3 Standard Deviations from the Mean</em></td>
</tr>
<tr class="even">
<td style="text-align: left;"></td>
<td style="text-align: left;"><strong>Red (Critical)</strong></td>
<td style="text-align: left;"><em>Outside 3 Standard Deviations (XmR Signal)</em></td>
</tr>
<tr class="odd">
<td style="text-align: left;"><strong>5. Governance</strong></td>
<td style="text-align: left;"><strong>Business Owner</strong></td>
<td style="text-align: left;"><em>VP of Marketing (Accountable for the outcome)</em></td>
</tr>
<tr class="even">
<td style="text-align: left;"></td>
<td style="text-align: left;"><strong>Data Owner</strong></td>
<td style="text-align: left;"><em>BI Team / Aleksei (Responsible for logic integrity)</em></td>
</tr>
<tr class="odd">
<td style="text-align: left;"><strong>6. Interpretation</strong></td>
<td style="text-align: left;"><strong>Common Variances</strong></td>
<td style="text-align: left;"><em>Are fluctuations driven by “Mix” (new vs.&nbsp;old users) or “Volume”?</em></td>
</tr>
<tr class="even">
<td style="text-align: left;"></td>
<td style="text-align: left;"><strong>Action Trigger</strong></td>
<td style="text-align: left;"><em>If Red: Notify CRM team to launch Win-back sequence.</em></td>
</tr>
</tbody>
</table>
</div>
</figure>
</div>
<p>Download printable table: <a href="../../posts/20260128-seven-step-kpi-blueprint/kpi-definition-template.pdf" target="_blank">kpi-definition-template.pdf</a></p>
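<p>To make the plain-English formula from the template concrete, here is a minimal sketch of the 90-Day Repeat Purchase Rate in plain Python. It assumes that orders arrive as <code>(customer_id, order_date)</code> pairs, that the window is inclusive, and that it starts at each customer’s acquisition date. The function name and data are hypothetical.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">from datetime import date, timedelta


def repeat_purchase_rate(orders, acquired, window_days=90):
    """orders: list of (customer_id, order_date); acquired: {customer_id: date}."""
    window = timedelta(days=window_days)
    counts = dict.fromkeys(acquired, 0)
    for customer_id, order_date in orders:
        start = acquired.get(customer_id)
        # Count only orders placed within the window after acquisition
        if start is not None and window >= order_date - start >= timedelta(0):
            counts[customer_id] += 1
    repeaters = sum(1 for n in counts.values() if n > 1)
    return repeaters / len(counts) if counts else None


acquired = {c: date(2026, 1, 1) for c in "abcd"}
orders = [
    ("a", date(2026, 1, 1)), ("a", date(2026, 2, 1)),  # 2 orders in window
    ("b", date(2026, 1, 1)), ("b", date(2026, 5, 1)),  # second order is too late
    ("c", date(2026, 1, 2)),                           # single order
    ("d", date(2026, 1, 1)), ("d", date(2026, 1, 15)), ("d", date(2026, 3, 1)),
]
rate = repeat_purchase_rate(orders, acquired)</code></pre></div>
<p>Writing the definition down this precisely is what surfaces the edge cases (inclusive windows, customers with no orders) that the template is meant to settle.</p>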
</section>
<section id="final-thoughts" class="level2">
<h2 class="anchored" data-anchor-id="final-thoughts">Final Thoughts</h2>
<p>KPI design is not a reporting task but a form of systems design. A strong framework creates alignment between strategy, data, and decisions.</p>
</section>
<section id="see-also" class="level2">
<h2 class="anchored" data-anchor-id="see-also">See Also</h2>
<p>Here are some of my other posts related to Business Intelligence, KPI design, and data visualization:</p>
<div id="listing-posts" class="quarto-listing quarto-listing-container-table">
<table class="quarto-listing-table table">
<thead>
<tr>

<th>
 
</th>

<th>
Title
</th>

<th>
Reading Time
</th>

</tr>
</thead>
<tbody class="list">

<tr data-index="0" data-categories="U2FsZXMlMkNTdHJhdGVneQ==" data-listing-date-sort="1774224000000" data-listing-file-modified-sort="1774302780174" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="5" data-listing-word-count-sort="855" data-listing-title-sort="Practical Concepts for AI Driven Sales Coaching" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20260323-ai-sales-coaching/image.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20260323-ai-sales-coaching/index.html" class="title listing-title">Practical Concepts for AI Driven Sales Coaching</a>
</td>
<td>
<span class="listing-reading-time">5 min</span>
</td>

</tr>

<tr data-index="1" data-categories="UHJvZHVjdCUyQ01hcmtldGluZyUyQ1NwYXRpYWwlMkNHZW9zcGF0aWFsJTJDU3RyYXRlZ3k=" data-listing-date-sort="1770422400000" data-listing-file-modified-sort="1770660370665" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="14" data-listing-word-count-sort="2663" data-listing-title-sort="Using Transit Time to Rethink Hotel Search" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20260207-transit-time-hotel-search/image.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20260207-transit-time-hotel-search/index.html" class="title listing-title">Using Transit Time to Rethink Hotel Search</a>
</td>
<td>
<span class="listing-reading-time">14 min</span>
</td>

</tr>

<tr data-index="2" data-categories="QkklMkNTdGF0aXN0aWNzJTJDTUwlMkNWaXN1YWxpemF0aW9u" data-listing-date-sort="1764115200000" data-listing-file-modified-sort="1767873604969" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="5" data-listing-word-count-sort="982" data-listing-title-sort="Building an E-Commerce Dashboard with Power BI and R" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20251126-e-commerce-dashboard/0001.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20251126-e-commerce-dashboard/index.html" class="title listing-title">Building an E-Commerce Dashboard with Power BI and R</a>
</td>
<td>
<span class="listing-reading-time">5 min</span>
</td>

</tr>

<tr data-index="3" data-categories="QkklMkNTdGF0aXN0aWNzJTJDTUwlMkNWaXN1YWxpemF0aW9uJTJDUg==" data-listing-date-sort="1761436800000" data-listing-file-modified-sort="1767873994105" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="6" data-listing-word-count-sort="1109" data-listing-title-sort="Building the Analytical Dashboard with Power BI and R" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20251026-cfpb-dashboard/0001.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20251026-cfpb-dashboard/index.html" class="title listing-title">Building the Analytical Dashboard with Power BI and R</a>
</td>
<td>
<span class="listing-reading-time">6 min</span>
</td>

</tr>

<tr data-index="4" data-categories="QkklMkNTdGF0aXN0aWNzJTJDTUwlMkNWaXN1YWxpemF0aW9uJTJDUg==" data-listing-date-sort="1758931200000" data-listing-file-modified-sort="1767874188518" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="8" data-listing-word-count-sort="1544" data-listing-title-sort="Building a Credit Risk Dashboard with Power BI and R" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20250927-credit-risk-analytics/image.png" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20250927-credit-risk-analytics/index.html" class="title listing-title">Building a Credit Risk Dashboard with Power BI and R</a>
</td>
<td>
<span class="listing-reading-time">8 min</span>
</td>

</tr>

<tr data-index="5" data-categories="QkklMkNFVEw=" data-listing-date-sort="1736121600000" data-listing-file-modified-sort="1769683733598" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="2" data-listing-word-count-sort="397" data-listing-title-sort="BI System Blueprint" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" data-src="../../posts/20250106-bi-flowchart/image.png" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20250106-bi-flowchart/index.html" class="title listing-title">BI System Blueprint</a>
</td>
<td>
<span class="listing-reading-time">2 min</span>
</td>

</tr>

</tbody>
</table>
<div class="listing-no-matching d-none">No matching items</div>

</div>



</section>

 ]]></description>
  <category>KPI Design</category>
  <category>BI</category>
  <category>Strategy</category>
  <guid>https://frequentist.org/posts/20260128-seven-step-kpi-blueprint/</guid>
  <pubDate>Wed, 28 Jan 2026 00:00:00 GMT</pubDate>
  <media:content url="https://frequentist.org/posts/20260128-seven-step-kpi-blueprint/image.svg" medium="image" type="image/svg+xml"/>
</item>
<item>
  <title>Building a Privacy-First LinkedIn Analytics Platform</title>
  <dc:creator>Aleksei Prishchepo</dc:creator>
  <link>https://frequentist.org/posts/20251227-linkedin-analytics/</link>
  <description><![CDATA[ 






<section id="misaligned-objective-function" class="level2 page-columns page-full">
<h2 class="anchored" data-anchor-id="misaligned-objective-function">Misaligned Objective Function</h2>
<p>LinkedIn’s recommendation system might be an engineering masterpiece, but its objective function is fundamentally misaligned with mine. From the engineering point of view, every system optimizes for a specific target variable. LinkedIn’s algorithms solve for engagement metrics like time on site, scroll depth, and ad impressions. They prioritize content that triggers a reaction, often at the expense of substance.</p>
<p>In contrast, I try to solve for meaningful connection. My goal is to foster genuine professional alignment, identify mentorship opportunities, and strengthen ties within specific industry clusters.</p>
<p>When two optimization problems have different loss functions, they inevitably yield different results. Relying solely on the platform’s feed means delegating your professional network’s growth to an algorithm designed to keep you addicted, not necessarily to help you succeed. I decided I wanted to choose who to interact with deliberately, moving from a passive consumer of a feed to an active architect of my network.</p>
<p>To achieve this, I started building a tool that takes the LinkedIn™ data export and transforms it into an intelligence layer. It turns that raw CSV data into a clear, visual overview of professional relationships, helping users:</p>

<div class="no-row-height column-margin column-container"><div class="">
<p>Explore the application:</p>
<p><a href="https://linked.frequentist.org/" target="_blank">My Professional Network</a></p>
</div></div><ul>
<li>See who they’re connected with through interactive visualizations.</li>
<li>Identify meaningful professional clusters using unsupervised learning.</li>
<li>Prioritize important contacts who may have slipped through the cracks.</li>
<li>Clean up outdated connections that add noise to the signal.</li>
<li>Discover new people worth engaging with based on geographic and professional proximity.</li>
</ul>
<p>This article details how I architected this solution using Python, Shiny, and a local inference stack to bring observability and agency back to professional networking.</p>
</section>
<section id="microservices-approach" class="level2">
<h2 class="anchored" data-anchor-id="microservices-approach">Microservices Approach</h2>
<p>Building a local-first application that handles data munging, heavy UI rendering, and asynchronous secondary data fetching requires a robust architecture. I opted for a microservices-based approach orchestrated via <strong>Docker Compose</strong>. This ensures that ML inference and data scraping don’t starve the web server of resources.</p>
<section id="high-level-system-design" class="level3">
<h3 class="anchored" data-anchor-id="high-level-system-design">High-Level System Design</h3>
<p>The architecture is built on five pillars:</p>
<ol type="1">
<li><p><strong>Orchestration:</strong> Nginx serves as a reverse proxy and load balancer.</p></li>
<li><p><strong>UI Layer:</strong> Shiny for Python handles the reactive frontend.</p></li>
<li><p><strong>Database:</strong> PostgreSQL stores connection metadata, parsed profiles, and transaction logs.</p></li>
<li><p><strong>Asynchronous Workers:</strong> Celery + Redis handle long-running tasks like profile scraping and geocoding.</p></li>
<li><p><strong>Local Inference Service:</strong> A dedicated Text Embeddings Inference (TEI) service runs locally to provide high-performance vector representation of professional data.</p></li>
</ol>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://frequentist.org/posts/20251227-linkedin-analytics/architecture.svg" class="img-fluid figure-img"></p>
<figcaption>Application architecture</figcaption>
</figure>
</div>
</section>
<section id="why-shiny-for-python" class="level3">
<h3 class="anchored" data-anchor-id="why-shiny-for-python">Why Shiny for Python?</h3>
<p>Choosing the right frontend framework was critical. Traditional SPAs (Single Page Applications) often feel disconnected from the data science lifecycle. Shiny for Python bridges this gap.</p>
<div class="quarto-layout-panel" data-layout-ncol="3">
<div class="quarto-layout-row">
<div class="quarto-layout-cell" style="flex-basis: 33.3%;justify-content: flex-start;">
<div class="callout callout-style-simple callout-note no-icon callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Reactive Programming
</div>
</div>
<div class="callout-body-container callout-body">
<p>I can define dependencies between UI filters (like geography or industry clusters) and the underlying data without manually managing state in JavaScript.</p>
</div>
</div>
</div>
<div class="quarto-layout-cell" style="flex-basis: 33.3%;justify-content: flex-start;">
<div class="callout callout-style-simple callout-note no-icon callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Native Integration
</div>
</div>
<div class="callout-body-container callout-body">
<p>Since the logic stays in Python, I can call directly into Pandas, Scikit-learn, and ML services without building REST APIs for every filter action.</p>
</div>
</div>
</div>
<div class="quarto-layout-cell" style="flex-basis: 33.3%;justify-content: flex-start;">
<div class="callout callout-style-simple callout-note no-icon callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Enterprise-Grade UI
</div>
</div>
<div class="callout-body-container callout-body">
<p>By applying curated themes and vanilla CSS, I was able to create a refined aesthetic while preserving the performance of an efficient website.</p>
</div>
</div>
</div>
</div>
</div>
</section>
<section id="decoupling-with-celery-and-redis" class="level3">
<h3 class="anchored" data-anchor-id="decoupling-with-celery-and-redis">Decoupling with Celery and Redis</h3>
<p>One of the requirements for this project was “No-Hang UI”. Parsing thousands of profiles or geocoding hundreds of locations can take minutes or even hours. If these were synchronous calls, the application would be unusable.</p>
<p>I implemented a robust Celery task group (chord/group pattern). When a user uploads their ZIP, the system immediately recognizes which profiles are missing enriched data. It spins up a chord of tasks:</p>
<ul>
<li><p><strong>Trigger</strong>: Dispatch batches of URLs to the scraper.</p></li>
<li><p><strong>Poll</strong>: Periodically check for result readiness.</p></li>
<li><p><strong>Finalize</strong>: Once all batches are parsed, update the database and notify the UI via a reactive signal.</p></li>
</ul>
<p>This separation of concerns ensures that the web server remains responsive to user interactions while the background workers handle the heavy I/O and processing.</p>
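<p>The trigger, wait, and finalize flow can be sketched without a broker. The example below uses the standard library’s <code>concurrent.futures</code> as a stand-in for Celery, so it illustrates the same fan-out/fan-in shape, not the actual task code: <code>scrape_batch</code> plays the role of the chord’s header tasks and <code>finalize</code> the role of its callback. All names and URLs are hypothetical.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">from concurrent.futures import ThreadPoolExecutor


def batched(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def scrape_batch(urls):
    """Stand-in for the scraper call; returns one parsed record per URL."""
    return [{"url": u, "parsed": True} for u in urls]


def finalize(batch_results):
    """Runs once, after every batch has finished (the chord callback role)."""
    return [record for batch in batch_results for record in batch]


def enrich_profiles(urls, batch_size=2):
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Trigger: dispatch every batch without blocking the caller
        futures = [pool.submit(scrape_batch, b) for b in batched(urls, batch_size)]
        # Wait for all batches, preserving submission order
        results = [f.result() for f in futures]
    return finalize(results)


profiles = enrich_profiles([f"https://example.com/in/user{i}" for i in range(5)])</code></pre></div>
<p>In a Celery setup the waiting step happens in the worker pool and the result backend, so the web process never blocks on it.</p>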
</section>
</section>
<section id="technical-highlights" class="level2">
<h2 class="anchored" data-anchor-id="technical-highlights">Technical Highlights</h2>
<section id="semantic-understanding-of-job-titles" class="level3">
<h3 class="anchored" data-anchor-id="semantic-understanding-of-job-titles">Semantic Understanding of Job Titles</h3>
<p>The core value of this application lies in its ability to turn various forms of job titles into meaningful groupings.</p>
<div class="callout callout-style-simple callout-tip callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Tip</span>Local ML Inference with Vector Embeddings
</div>
</div>
<div class="callout-body-container callout-body">
<p>The very first implementation used TF-IDF vectorization combined with KMeans clustering. While this provided a basic grouping, it failed to capture semantic nuances. For example, “Data Scientist” and “ML Engineer” would be placed in separate clusters despite their close relationship.</p>
<p>To achieve semantic understanding, I moved toward <strong>Vector Embeddings</strong>. Instead of treating words as discrete tokens, I represent each job title as a 768-dimensional vector in a continuous semantic space.</p>
</div>
</div>
<p>Instead of relying on costly external APIs (like OpenAI), I hosted a local Text Embeddings Inference (TEI) service using Hugging Face’s <code>huggingface/text-embeddings-inference</code> container. This provides:</p>
<ul>
<li><p><strong>Privacy</strong>: No professional data ever leaves the local environment.</p></li>
<li><p><strong>Low Latency, No Per-Token Cost</strong>: Inference runs locally, with no round trip to an external API and no per-token billing.</p></li>
<li><p><strong>Semantic Accuracy</strong>: Using multilingual-mpnet-base-v2, the system handles professional jargon across multiple languages.</p></li>
</ul>
<p>To maintain performance, I implemented a multi-layered caching strategy:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">@lru_cache</span>(maxsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1024</span>)</span>
<span id="cb1-2"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> get_embedding(text: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">str</span>) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-&gt;</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">list</span>[<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">float</span>]:</span>
<span id="cb1-3">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> text.strip():</span>
<span id="cb1-4">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> [<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> _ <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">768</span>)]</span>
<span id="cb1-5">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">try</span>:</span>
<span id="cb1-6">        response <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> requests.post(</span>
<span id="cb1-7">            TEXT_EMBEDDINGS_URL,</span>
<span id="cb1-8">            json<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>{<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"inputs"</span>: [text]},</span>
<span id="cb1-9">        )</span>
<span id="cb1-10">        response.raise_for_status()</span>
<span id="cb1-11">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> response.json()[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb1-12">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">except</span> <span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">Exception</span> <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> e:</span>
<span id="cb1-13">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fallback to zero-vector or log error</span></span>
<span id="cb1-14">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> [<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> _ <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">768</span>)]</span></code></pre></div></div>
<p>This caching ensures that common titles (e.g., “Founder”, “Engineer”) only hit the inference service once per process, for as long as they stay among the 1,024 most recently used entries.</p>
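<p>One caveat with decorating the whole function: <code>lru_cache</code> stores whatever the function returns, so a zero-vector returned from the <code>except</code> branch would be cached like any real embedding. A possible variation, sketched below with a stand-in for the HTTP call, caches only successful lookups and applies the fallback outside the cached function. <code>fetch_embedding</code> and the <code>calls</code> list are hypothetical helpers for illustration.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">from functools import lru_cache

EMBEDDING_DIM = 768
calls = []  # records every simulated request, for illustration only


def fetch_embedding(text):
    """Stand-in for the HTTP call to the embedding service (hypothetical)."""
    calls.append(text)
    if text == "unreachable":
        raise ConnectionError("embedding service unavailable")
    return [0.1] * EMBEDDING_DIM


@lru_cache(maxsize=1024)
def _cached_embedding(text):
    # Exceptions propagate, and lru_cache does not store raised exceptions.
    return tuple(fetch_embedding(text))


def get_embedding(text):
    if not text.strip():
        return [0.0] * EMBEDDING_DIM
    try:
        return list(_cached_embedding(text))
    except ConnectionError:
        return [0.0] * EMBEDDING_DIM  # fallback is returned but never cached


get_embedding("Founder")
get_embedding("Founder")      # served from the cache
get_embedding("unreachable")
get_embedding("unreachable")  # retried, because the failure was not cached</code></pre></div>
<p>Returning a tuple from the cached function and a fresh list from the wrapper also keeps callers from mutating the cached value.</p>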
</section>
<section id="unsupervised-clustering-pipeline" class="level3">
<h3 class="anchored" data-anchor-id="unsupervised-clustering-pipeline">Unsupervised Clustering Pipeline</h3>
<p>Once we have vectors for every connection, the task is to group them. I built a pipeline that combines Latent Semantic Analysis (LSA) for dimensionality reduction and KMeans for clustering.</p>
<p>The pipeline looks like this:</p>
<ul>
<li><p><strong>Preprocessing</strong>: Normalize job titles and resolve common abbreviations (e.g., “Sr.” to “Senior”).</p></li>
<li><p><strong>LSA (TruncatedSVD + Normalizer)</strong>: Reduces the 768 dimensions to a denser representation (96 components), focusing on the most significant semantic variance and reducing computational overhead for the clustering step.</p></li>
<li><p><strong>KMeans Clustering</strong>: Groups the denser vectors into <img src="https://latex.codecogs.com/png.latex?N"> clusters.</p></li>
</ul>
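<p>The reduction and clustering steps map naturally onto a scikit-learn pipeline. The sketch below is an assumption about how they could be wired together, not the application’s code: the embeddings are random placeholders, and the cluster count of 8 is arbitrary.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(300, 768))  # placeholder for real title embeddings

pipeline = make_pipeline(
    TruncatedSVD(n_components=96, random_state=0),    # 768 -> 96 dimensions
    Normalizer(copy=False),                           # unit-length rows
    KMeans(n_clusters=8, n_init=10, random_state=0),  # cluster count is arbitrary
)
labels = pipeline.fit_predict(embeddings)  # one cluster ID per connection</code></pre></div>
<p>Normalizing after the SVD step puts every vector on the unit sphere, so KMeans’ Euclidean distance behaves much like cosine similarity.</p>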
<p>A key challenge was dynamic cluster naming. An unsupervised model only gives you cluster IDs (e.g., Cluster #4), which are useless for a user. I implemented a heuristic that automatically names each cluster by identifying the most frequently occurring job title closest to the cluster’s centroid:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Identifying the representative name for Cluster #X</span></span>
<span id="cb2-2">df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"PosFreq"</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df.groupby(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Position"</span>)[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Cluster"</span>].transform(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"count"</span>)</span>
<span id="cb2-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Filter and sort to find the dominant title that isn't too short</span></span>
<span id="cb2-4">positions <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df.query(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"PosFreq == MaxFreq"</span>).drop_duplicates(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Cluster"</span>)</span>
<span id="cb2-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># The resulting Cluster Name provides context like "Engineering Manager" or "Product Design"</span></span></code></pre></div></div>
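<p>The excerpt above refers to a <code>MaxFreq</code> column that is not shown. A self-contained version might look like the following sketch, which assumes that frequency is counted per title within each cluster and leaves out the short-title filter and the centroid-distance step. The sample data is invented.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import pandas as pd

df = pd.DataFrame({
    "Position": ["Data Scientist", "Data Scientist", "ML Engineer",
                 "Product Designer", "Product Designer", "UX Researcher"],
    "Cluster": [0, 0, 0, 1, 1, 1],
})

# How often each title appears inside its cluster
df["PosFreq"] = df.groupby(["Cluster", "Position"])["Position"].transform("count")
# Highest title frequency within each cluster
df["MaxFreq"] = df.groupby("Cluster")["PosFreq"].transform("max")

# Keep the dominant title per cluster; ties resolve to the first row encountered
names = (
    df.query("PosFreq == MaxFreq")
      .drop_duplicates("Cluster")
      .set_index("Cluster")["Position"]
      .to_dict()
)</code></pre></div>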
</section>
<section id="geocoding-at-scale" class="level3">
<h3 class="anchored" data-anchor-id="geocoding-at-scale">Geocoding at Scale</h3>
<p>Finally, to power the geographical groupings and visualizations, I integrated a geocoding service. To keep the system efficient, I implemented a persistence-layer cache. Instead of geocoding every connection’s location string individually, I maintain a locations table. The Celery worker only hits the Google Maps Geocoding API for location strings that haven’t been resolved before, significantly reducing API usage and improving processing speed for subsequent data uploads.</p>
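<p>The lookup-before-request idea can be illustrated in a few lines. The sketch below uses an in-memory SQLite table instead of PostgreSQL so that it runs standalone, and <code>geocode_remote</code> is a hypothetical stand-in for the external API call. The table layout is an assumption.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import sqlite3

api_calls = []  # records simulated API requests, for illustration only


def geocode_remote(location):
    """Stand-in for the external geocoding request (hypothetical)."""
    api_calls.append(location)
    return 52.52, 13.405  # fixed coordinates, for illustration only


def geocode_cached(conn, location):
    row = conn.execute(
        "SELECT lat, lon FROM locations WHERE name = ?", (location,)
    ).fetchone()
    if row is not None:
        return row  # already resolved: no API call
    lat, lon = geocode_remote(location)
    with conn:  # commit the new row, or roll back on error
        conn.execute(
            "INSERT INTO locations (name, lat, lon) VALUES (?, ?, ?)",
            (location, lat, lon),
        )
    return lat, lon


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE locations (name TEXT PRIMARY KEY, lat REAL, lon REAL)")
coords = [geocode_cached(conn, loc) for loc in ["Berlin, Germany", "Berlin, Germany"]]</code></pre></div>
<p>Because the cache lives in the database, not in process memory, it survives restarts and is shared by every worker.</p>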
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://frequentist.org/posts/20251227-linkedin-analytics/geography.png" class="img-fluid figure-img" width="640"></p>
<figcaption>Geographic locations of connections are displayed as grouped sections in a treemap and also shown on a world map</figcaption>
</figure>
</div>
</section>
</section>
<section id="system-robustness" class="level2">
<h2 class="anchored" data-anchor-id="system-robustness">System Robustness</h2>
<p>An engineer’s work is defined not just by the “happy path”, but by how the system handles state, persistence, and deployment.</p>
<div class="quarto-layout-panel" data-layout-ncol="2">
<div class="quarto-layout-row">
<section id="data-integrity" class="level3 quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<h3 class="anchored" data-anchor-id="data-integrity">Data Integrity</h3>
<p>I utilized PostgreSQL to manage the application’s state. This allows for complex relation management (e.g., linking a Connection to their Geolocation and their Clusters). To handle the “Generative Credits” system, I implemented atomic transactions to ensure that balance deductions only occur after a successful profile parse, preventing data inconsistency.</p>
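<p>A minimal sketch of that guarantee, using SQLite instead of PostgreSQL so it runs standalone: the deduction and the parse share one transaction, so a failed parse rolls the balance back. The table, function names, and URLs are hypothetical.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import sqlite3


def parse_profile(url):
    """Stand-in for the profile parser; raises on failure (hypothetical)."""
    if "broken" in url:
        raise ValueError("could not parse profile")
    return {"url": url}


def parse_and_charge(conn, user_id, url, cost=1):
    try:
        with conn:  # one transaction: commit on success, roll back on exception
            cur = conn.execute(
                "UPDATE credits SET balance = balance - ? "
                "WHERE user_id = ? AND balance >= ?",
                (cost, user_id, cost),
            )
            if cur.rowcount != 1:
                raise RuntimeError("insufficient credits")
            parse_profile(url)  # a failure here undoes the deduction
        return True
    except (ValueError, RuntimeError):
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE credits (user_id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO credits VALUES (1, 2)")
conn.commit()
ok = parse_and_charge(conn, 1, "https://example.com/in/alice")
failed = parse_and_charge(conn, 1, "https://example.com/in/broken")
balance = conn.execute("SELECT balance FROM credits WHERE user_id = 1").fetchone()[0]</code></pre></div>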
</section>
<section id="local-observability" class="level3 quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<h3 class="anchored" data-anchor-id="local-observability">Local Observability</h3>
<p>Running a complex stack locally can be a “black box” challenge. By utilizing Docker Compose, I centralized log management and service health checks. If the Text Embeddings Inference (TEI) service or the Redis broker fails, the Nginx load balancer and Shiny app provide immediate feedback rather than obscure Python stack traces.</p>
</section>
</div>
</div>
</section>
<section id="future-roadmap" class="level2">
<h2 class="anchored" data-anchor-id="future-roadmap">Future Roadmap</h2>
<p>RAG (Retrieval-Augmented Generation): By indexing the parsed profiles into a vector database (like Chroma or Faiss), I could implement a private LLM chat interface to ask questions like: “Who in my network has experience with Kubernetes and has been active in recent months?”</p>
</section>
<section id="conclusion" class="level2">
<h2 class="anchored" data-anchor-id="conclusion">Conclusion</h2>
<p>The <a href="https://linked.frequentist.org" target="_blank">My Professional Network</a> project is more than just a dashboard; it’s a prototype for data agency. By moving the analytics layer from a centralized platform to a local, user-controlled environment, we reclaim the ability to navigate our professional lives with intent.</p>
<p>By combining modern frontend reactivity (Shiny), asynchronous infrastructure (Celery/Redis), and local ML inference (TEI), I’ve built a system that respects privacy while delivering the kind of “superpowers” usually reserved for big-tech internal tools.</p>
<p>In the end, your network is your most valuable professional asset — it’s time you had the tools to actually manage it.</p>
</section>
<section id="see-also" class="level2">
<h2 class="anchored" data-anchor-id="see-also">See Also</h2>
<p>Below are some of my other posts related to building applications, natural language processing, and data visualization:</p>
<div id="listing-posts" class="quarto-listing quarto-listing-container-table">
<table class="quarto-listing-table table">
<thead>
<tr>

<th>
 
</th>

<th>
Title
</th>

<th>
Reading Time
</th>

</tr>
</thead>
<tbody class="list">

<tr data-index="0" data-categories="QWdlbnRzJTJDTExNJTJDQXBwJTJDUHl0aG9u" data-listing-date-sort="1765584000000" data-listing-file-modified-sort="1773058068416" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="7" data-listing-word-count-sort="1393" data-listing-title-sort="Agentic vs Deterministic Workflows: Designing a Reliable AI Application" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20251213-agentic-vs-deterministic/image.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20251213-agentic-vs-deterministic/index.html" class="title listing-title">Agentic vs Deterministic Workflows: Designing a Reliable AI Application</a>
</td>
<td>
<span class="listing-reading-time">7 min</span>
</td>

</tr>

<tr data-index="1" data-categories="QkklMkNTdGF0aXN0aWNzJTJDTUwlMkNWaXN1YWxpemF0aW9u" data-listing-date-sort="1764115200000" data-listing-file-modified-sort="1767873604969" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="5" data-listing-word-count-sort="982" data-listing-title-sort="Building an E-Commerce Dashboard with Power BI and R" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20251126-e-commerce-dashboard/0001.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20251126-e-commerce-dashboard/index.html" class="title listing-title">Building an E-Commerce Dashboard with Power BI and R</a>
</td>
<td>
<span class="listing-reading-time">5 min</span>
</td>

</tr>

<tr data-index="2" data-categories="QkklMkNTdGF0aXN0aWNzJTJDTUwlMkNWaXN1YWxpemF0aW9uJTJDUg==" data-listing-date-sort="1761436800000" data-listing-file-modified-sort="1767873994105" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="6" data-listing-word-count-sort="1109" data-listing-title-sort="Building the Analytical Dashboard with Power BI and R" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20251026-cfpb-dashboard/0001.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20251026-cfpb-dashboard/index.html" class="title listing-title">Building the Analytical Dashboard with Power BI and R</a>
</td>
<td>
<span class="listing-reading-time">6 min</span>
</td>

</tr>

<tr data-index="3" data-categories="QkklMkNTdGF0aXN0aWNzJTJDTUwlMkNWaXN1YWxpemF0aW9uJTJDUg==" data-listing-date-sort="1758931200000" data-listing-file-modified-sort="1767874188518" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="8" data-listing-word-count-sort="1544" data-listing-title-sort="Building a Credit Risk Dashboard with Power BI and R" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20250927-credit-risk-analytics/image.png" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20250927-credit-risk-analytics/index.html" class="title listing-title">Building a Credit Risk Dashboard with Power BI and R</a>
</td>
<td>
<span class="listing-reading-time">8 min</span>
</td>

</tr>

<tr data-index="4" data-categories="VmlzdWFsaXphdGlvbiUyQ1NwYXRpYWwlMkNBbmltYXRpb24lMkNS" data-listing-date-sort="1751587200000" data-listing-file-modified-sort="1770926628882" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="7" data-listing-word-count-sort="1400" data-listing-title-sort="Animation of Spatial Data" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20250704-animation/image.gif" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20250704-animation/index.html" class="title listing-title">Animation of Spatial Data</a>
</td>
<td>
<span class="listing-reading-time">7 min</span>
</td>

</tr>

<tr data-index="5" data-categories="QXBwJTJDTExNJTJDUHl0aG9u" data-listing-date-sort="1748649600000" data-listing-file-modified-sort="1767875135219" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="1" data-listing-word-count-sort="110" data-listing-title-sort="Product Cards Creation Application" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" data-src="../../posts/20250531-content-mate/mark-konig-Tl8mDaue_II-unsplash_square.jpg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20250531-content-mate/index.html" class="title listing-title">Product Cards Creation Application</a>
</td>
<td>
<span class="listing-reading-time">1 min</span>
</td>

</tr>

<tr data-index="6" data-categories="TkxQJTJDUHl0aG9u" data-listing-date-sort="1746230400000" data-listing-file-modified-sort="1750795873147" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="6" data-listing-word-count-sort="1111" data-listing-title-sort="Creating Anki Flashcards From List of Words" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" data-src="../../posts/20250503-anki-part-1/image.png" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20250503-anki-part-1/index.html" class="title listing-title">Creating Anki Flashcards From List of Words</a>
</td>
<td>
<span class="listing-reading-time">6 min</span>
</td>

</tr>

<tr data-index="7" data-categories="UkFHJTJDTkxQJTJDTExNJTJDUHl0aG9u" data-listing-date-sort="1742515200000" data-listing-file-modified-sort="1748974609635" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="12" data-listing-word-count-sort="2344" data-listing-title-sort="Implementing a Local Retrieval-Augmented Generation System" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" data-src="../../posts/20250321-rag/image.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20250321-rag/index.html" class="title listing-title">Implementing a Local Retrieval-Augmented Generation System</a>
</td>
<td>
<span class="listing-reading-time">12 min</span>
</td>

</tr>

</tbody>
</table>
<div class="listing-no-matching d-none">No matching items</div>

</div>



</section>

 ]]></description>
  <category>App</category>
  <category>Visualization</category>
  <category>NLP</category>
  <category>Python</category>
  <guid>https://frequentist.org/posts/20251227-linkedin-analytics/</guid>
  <pubDate>Sat, 27 Dec 2025 00:00:00 GMT</pubDate>
  <media:content url="https://frequentist.org/posts/20251227-linkedin-analytics/image.png" medium="image" type="image/png" height="77" width="144"/>
</item>
<item>
  <title>LinkedIn Analytics Web Application</title>
  <dc:creator>Aleksei Prishchepo</dc:creator>
  <link>https://frequentist.org/projects/webapp-linkedin-analytics/</link>
  <description><![CDATA[ 






<section id="project-overview" class="level2">
<h2 class="anchored" data-anchor-id="project-overview">Project Overview</h2>
<p><strong>My Professional Network</strong> is a local-first, privacy-preserving analytics platform that transforms raw LinkedIn data exports into an interactive dashboard. The project demonstrates full-stack data science skills by combining ETL, NLP, unsupervised learning, and asynchronous web architecture to help users understand and strategically grow their professional networks.</p>
<div class="callout callout-style-simple callout-note no-icon callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Role
</div>
</div>
<div class="callout-body-container callout-body">
<p>Full-Stack Data Scientist / Systems Engineer</p>
</div>
</div>
<div class="callout callout-style-simple callout-note no-icon callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Domain
</div>
</div>
<div class="callout-body-container callout-body">
<p>Network Analysis, NLP, Personal Analytics</p>
</div>
</div>
<div class="callout callout-style-simple callout-note no-icon callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Tools
</div>
</div>
<div class="callout-body-container callout-body">
<p>Python, Shiny for Python, PostgreSQL, Redis, Celery, Docker, Hugging Face embeddings, Google Maps API</p>
</div>
</div>
</section>
<section id="key-features-problems-solved" class="level2">
<h2 class="anchored" data-anchor-id="key-features-problems-solved">Key Features &amp; Problems Solved</h2>
<div class="quarto-layout-panel" data-layout-ncol="2">
<div class="quarto-layout-row">
<section id="personal-network-intelligence-from-raw-platform-exports" class="level3 quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<h3 class="anchored" data-anchor-id="personal-network-intelligence-from-raw-platform-exports">Personal network intelligence from raw platform exports</h3>
<p>Converts static archives into structured data without relying on third-party SaaS or cloud lock-in.</p>
</section>
<section id="semantic-role-clustering" class="level3 quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<h3 class="anchored" data-anchor-id="semantic-role-clustering">Semantic role clustering</h3>
<p>Uses vector embeddings and unsupervised learning to group thousands of heterogeneous job titles into meaningful professional clusters.</p>
</section>
</div>
<div class="quarto-layout-row">
<section id="local-first-privacy-preserving" class="level3 quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<h3 class="anchored" data-anchor-id="local-first-privacy-preserving">Local-first, privacy-preserving</h3>
<p>All data processing, modeling, and visualization run locally, keeping sensitive career data fully under user control.</p>
</section>
<section id="analysis-of-network-structure" class="level3 quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<h3 class="anchored" data-anchor-id="analysis-of-network-structure">Analysis of network structure</h3>
<p>Reveals dominant industries, role distributions, and underrepresented areas within a personal network.</p>
</section>
</div>
<div class="quarto-layout-row">
<section id="geographic-visualization" class="level3 quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<h3 class="anchored" data-anchor-id="geographic-visualization">Geographic visualization</h3>
<p>Maps global connections by resolving ambiguous location strings into precise coordinates.</p>
</section>
<div class="quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<div class="callout callout-style-simple callout-note">
<div class="callout-body d-flex">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-body-container">
<p>Follow the link to explore the application:</p>
<p><a href="https://linked.frequentist.org/" target="_blank">My Professional Network</a></p>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="architecture.svg" class="lightbox" data-gallery="quarto-lightbox-gallery-1" title="The application design is based on microservices orchestrated via Docker Compose, ensuring scalability and isolation."><img src="https://frequentist.org/projects/webapp-linkedin-analytics/architecture.svg" class="img-fluid quarto-figure quarto-figure-center figure-img" style="width:80.0%" alt="The application design is based on microservices orchestrated via Docker Compose, ensuring scalability and isolation."></a></p>
</figure>
</div>
<figcaption>The application design is based on microservices orchestrated via Docker Compose, ensuring scalability and isolation.</figcaption>
</figure>
</div>
</section>
<section id="implementation-highlights" class="level2">
<h2 class="anchored" data-anchor-id="implementation-highlights">Implementation Highlights</h2>
<div class="quarto-layout-panel" data-layout-ncol="2">
<div class="quarto-layout-row">
<section id="microservices-based-local-architecture" class="level3 quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<h3 class="anchored" data-anchor-id="microservices-based-local-architecture">Microservices-based local architecture</h3>
<p>Orchestrated with Docker Compose, separating web UI, background workers, database, cache, and ML inference services for scalability and isolation.</p>
</section>
<section id="asynchronous-processing-with-celery-redis" class="level3 quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<h3 class="anchored" data-anchor-id="asynchronous-processing-with-celery-redis">Asynchronous processing with Celery &amp; Redis</h3>
<p>Long-running tasks (parsing profiles, embedding generation, geocoding) execute in the background, keeping the UI responsive.</p>
</section>
</div>
<div class="quarto-layout-row">
<section id="local-nlp-with-vector-embeddings" class="level3 quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<h3 class="anchored" data-anchor-id="local-nlp-with-vector-embeddings">Local NLP with vector embeddings</h3>
<p>Job titles are embedded using a locally hosted Hugging Face inference service, with aggressive caching to reduce recomputation and latency.</p>
</section>
<section id="unsupervised-clustering-pipeline" class="level3 quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<h3 class="anchored" data-anchor-id="unsupervised-clustering-pipeline">Unsupervised clustering pipeline</h3>
<p>Combines dimensionality reduction (LSA / SVD) with K-Means, including a custom heuristic for automatic cluster labeling and long-tail handling.</p>
</section>
</div>
<div class="quarto-layout-row">
<section id="optimized-geocoding-strategy" class="level3 quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<h3 class="anchored" data-anchor-id="optimized-geocoding-strategy">Optimized geocoding strategy</h3>
<p>Deduplicates and persists resolved locations to minimize API usage and control operational costs.</p>
</section>
<div class="quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<div class="callout callout-style-simple callout-note">
<div class="callout-body d-flex">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-body-container">
<p>Read the blog post detailing the technical implementation: <a href="https://frequentist.org/posts/20251227-linkedin-analytics/" target="_blank">Building a Privacy-First LinkedIn Analytics Platform</a>.</p>
</div>
</div>
</div>
</div>
</div>
</div>
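<p>The geocoding strategy above (deduplicate, then persist) can be sketched as follows. This is an assumption-laden illustration: <code>GeocodeCache</code> is a hypothetical name, the JSON file stands in for the database table, and <code>geocoder</code> is any callable that maps a location string to a coordinate pair, so no real geocoding API is called here.</p>

```python
import json
from pathlib import Path


class GeocodeCache:
    """Resolve each distinct location string at most once."""

    def __init__(self, path, geocoder):
        self.path = Path(path)
        self.geocoder = geocoder  # callable: str -> (lat, lon); hypothetical
        # Load previously resolved locations, if any were persisted.
        self.store = json.loads(self.path.read_text()) if self.path.exists() else {}

    @staticmethod
    def normalize(location):
        # Collapse case and whitespace so trivial variants share one key.
        return " ".join(location.lower().split())

    def resolve_all(self, locations):
        # Only distinct, not-yet-known keys trigger a geocoder call.
        for key in {self.normalize(loc) for loc in locations}:
            if key not in self.store:
                self.store[key] = list(self.geocoder(key))
        self.path.write_text(json.dumps(self.store))
        return {loc: tuple(self.store[self.normalize(loc)]) for loc in locations}
```

<p>With thousands of connections concentrated in a few hundred cities, the number of paid lookups then scales with the number of distinct locations rather than with the size of the network.</p>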
</section>
<section id="outcomes-impact" class="level2">
<h2 class="anchored" data-anchor-id="outcomes-impact">Outcomes &amp; Impact</h2>
<div class="quarto-layout-panel" data-layout-ncol="2">
<div class="quarto-layout-row">
<div id="col1" class="quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<ul>
<li><p>Turned opaque, static LinkedIn exports into a living analytical system.</p></li>
<li><p>Demonstrated how data science, backend engineering, and UX can coexist in a single cohesive product.</p></li>
</ul>
</div>
<div id="col2" class="quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<ul>
<li><p>Showcased trade-offs around performance, privacy, scalability, and maintainability.</p></li>
<li><p>Created a foundation for future extensions such as graph databases and conversational network analysis.</p></li>
</ul>
</div>
</div>
</div>
</section>
<section id="skills-demonstrated" class="level2">
<h2 class="anchored" data-anchor-id="skills-demonstrated">Skills Demonstrated</h2>
<p>Data Engineering • NLP &amp; Embeddings • Unsupervised Learning • Network Analysis • Asynchronous Systems • Docker &amp; DevOps • Full-Stack Data Science • Privacy-First System Design • System Architecture</p>
</section>
<section id="apply-this-to-your-business" class="level2">
<h2 class="anchored" data-anchor-id="apply-this-to-your-business">Apply This to Your Business</h2>
<p>If you have a business problem that requires data-driven solutions, feel free to reach out via the <a href="../../contact.html">contact page</a> to discuss how I can help leverage data science, analytics, and automation to drive value for your organization.</p>
</section>
<section id="see-also" class="level2">
<h2 class="anchored" data-anchor-id="see-also">See Also</h2>
<div id="listing-projects" class="quarto-listing quarto-listing-container-grid">
<div class="list grid quarto-listing-cols-3">
<div class="g-col-1" data-index="0" data-categories="UGlwZWxpbmUlMkNMZWFkJTIwR2VuZXJhdGlvbiUyQ0RhdGElMjBFbmdpbmVlcmluZyUyQ1B5dGhvbg==" data-listing-date-sort="1777161600000" data-listing-file-modified-sort="1777217381211" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="2" data-listing-word-count-sort="288">
<a href="../../projects/report-european-environmental-companies/index.html" class="quarto-grid-link">
<div class="quarto-grid-item card h-100 card-left">
<p class="card-img-top">
<img loading="lazy" src="https://frequentist.org/projects/report-european-environmental-companies/image.png" class="thumbnail-image card-img" style="height: 150px;">
</p>
<div class="card-body post-contents">
<h5 class="no-anchor card-title listing-title">
European Environmental Company Intelligence Pipeline
</h5>
<div class="card-text listing-description delink">
<p>An automated pipeline that discovers EU-funded environmental companies, maps their C-suite leadership via LinkedIn, and delivers HubSpot-ready contact intelligence with…</p>
</div>
</div>
</div></a>
</div>
<div class="g-col-1" data-index="1" data-categories="QXBwJTJDQWdlbnRpYyUyMEFJJTJDUHl0aG9uJTJDRmVhdHVyZWQ=" data-listing-date-sort="1770249600000" data-listing-file-modified-sort="1774474774886" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="2" data-listing-word-count-sort="358">
<a href="../../projects/app-sales-signals/index.html" class="quarto-grid-link">
<div class="quarto-grid-item card h-100 card-left">
<p class="card-img-top">
<img loading="lazy" src="https://frequentist.org/projects/app-sales-signals/image.png" class="thumbnail-image card-img" style="height: 150px;">
</p>
<div class="card-body post-contents">
<h5 class="no-anchor card-title listing-title">
Sales Signals app (Agentic AI)
</h5>
<div class="card-text listing-description delink">
<p>Automated sales coaching engine that turns B2B call transcripts into real-time, context-aware feedback, combining LLMs and historical customer data to surface revenue and…</p>
</div>
</div>
</div></a>
</div>
<div class="g-col-1" data-index="2" data-categories="QXBwJTJDQWdlbnRpYyUyMEFJJTJDUHl0aG9u" data-listing-date-sort="1765584000000" data-listing-file-modified-sort="1767872588515" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="2" data-listing-word-count-sort="391">
<a href="../../projects/app-autonomous-career-agent/index.html" class="quarto-grid-link">
<div class="quarto-grid-item card h-100 card-left">
<p class="card-img-top">
<img loading="lazy" src="https://frequentist.org/projects/app-autonomous-career-agent/image.png" class="thumbnail-image card-img" style="height: 150px;">
</p>
<div class="card-body post-contents">
<h5 class="no-anchor card-title listing-title">
Autonomous Career Agent (Agentic AI Application)
</h5>
<div class="card-text listing-description delink">
<p>A multi-agent AI system that automates the job search and application process, demonstrating LLM orchestration and autonomous agent patterns.</p>
</div>
</div>
</div></a>
</div>
<div class="g-col-1" data-index="3" data-categories="RGFzaGJvYXJkJTJDUG93ZXIlMjBCSSUyQ1IlMkNGZWF0dXJlZA==" data-listing-date-sort="1764115200000" data-listing-file-modified-sort="1767872628244" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="2" data-listing-word-count-sort="378">
<a href="../../projects/dashboard-e-commerce/index.html" class="quarto-grid-link">
<div class="quarto-grid-item card h-100 card-left">
<p class="card-img-top">
<img loading="lazy" data-src="../../projects/dashboard-e-commerce/image.png" class="thumbnail-image card-img" style="height: 150px;">
</p>
<div class="card-body post-contents">
<h5 class="no-anchor card-title listing-title">
E-Commerce Analytics Dashboard (Power BI + R)
</h5>
<div class="card-text listing-description delink">
<p>An interactive BI dashboard combining customer RFM segmentation, ABC/XYZ product analysis, and revenue forecasting using R models embedded in Power BI.</p>
</div>
</div>
</div></a>
</div>
<div class="g-col-1" data-index="4" data-categories="RGFzaGJvYXJkJTJDUG93ZXIlMjBCSSUyQ1IlMkNGZWF0dXJlZA==" data-listing-date-sort="1761523200000" data-listing-file-modified-sort="1767872662287" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="2" data-listing-word-count-sort="332">
<a href="../../projects/dashboard-financial-complaints/index.html" class="quarto-grid-link">
<div class="quarto-grid-item card h-100 card-left">
<p class="card-img-top">
<img loading="lazy" data-src="../../projects/dashboard-financial-complaints/image.png" class="thumbnail-image card-img" style="height: 150px;">
</p>
<div class="card-body post-contents">
<h5 class="no-anchor card-title listing-title">
Consumer Financial Complaints Dashboard (Power BI + R)
</h5>
<div class="card-text listing-description delink">
<p>An interactive Power BI dashboard analyzing CFPB consumer complaints with trend forecasting, geographic and product breakdowns, and causal factor analysis using R models.</p>
</div>
</div>
</div></a>
</div>
<div class="g-col-1" data-index="5" data-categories="V2ViYXBwJTJDUHl0aG9uJTJDRmVhdHVyZWQ=" data-listing-date-sort="1748649600000" data-listing-file-modified-sort="1767873003569" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="3" data-listing-word-count-sort="481">
<a href="../../projects/webapp-content-mate/index.html" class="quarto-grid-link">
<div class="quarto-grid-item card h-100 card-left">
<p class="card-img-top">
<img loading="lazy" data-src="../../projects/webapp-content-mate/image.svg" class="thumbnail-image card-img" style="height: 150px;">
</p>
<div class="card-body post-contents">
<h5 class="no-anchor card-title listing-title">
E-commerce Content Automation Platform
</h5>
<div class="card-text listing-description delink">
<p>A web application that automates the generation of e-commerce product cards using asynchronous pipelines and LLM-assisted content creation.</p>
</div>
</div>
</div></a>
</div>
<div class="g-col-1" data-index="6" data-categories="QkklMkNQb3dlciUyMEJJJTJDUHl0aG9uJTJDRmVhdHVyZWQ=" data-listing-date-sort="1736121600000" data-listing-file-modified-sort="1767886951027" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="3" data-listing-word-count-sort="462">
<a href="../../projects/bi-system-telecamera/index.html" class="quarto-grid-link">
<div class="quarto-grid-item card h-100 card-left">
<p class="card-img-top">
<img loading="lazy" data-src="../../projects/bi-system-telecamera/image.png" class="thumbnail-image card-img" style="height: 150px;">
</p>
<div class="card-body post-contents">
<h5 class="no-anchor card-title listing-title">
Company-Wide Business Intelligence System
</h5>
<div class="card-text listing-description delink">
<p>An end-to-end BI system consolidating operational, financial, marketing, and sales data into a single decision-support layer.</p>
</div>
</div>
</div></a>
</div>
<div class="g-col-1" data-index="7" data-categories="TGlicmFyeSUyQ1BhY2thZ2UlMkNQeXRob24=" data-listing-date-sort="1724284800000" data-listing-file-modified-sort="1767872644185" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="2" data-listing-word-count-sort="310">
<a href="../../projects/python-sophisthse/index.html" class="quarto-grid-link">
<div class="quarto-grid-item card h-100 card-left">
<p class="card-img-top">
<img loading="lazy" data-src="../../projects/python-sophisthse/image.png" class="thumbnail-image card-img" style="height: 150px;">
</p>
<div class="card-body post-contents">
<h5 class="no-anchor card-title listing-title">
Python Library for Russian Macroeconomic Time Series
</h5>
<div class="card-text listing-description delink">
<p>A Python package that simplifies access to Russian macroeconomic time-series data from the Higher School of Economics (HSE) sophist.hse.ru repository.</p>
</div>
</div>
</div></a>
</div>
</div>
<div class="listing-no-matching d-none">No matching items</div>

</div>



</section>

 ]]></description>
  <category>Webapp</category>
  <category>Shiny for Python</category>
  <category>Python</category>
  <category>Featured</category>
  <guid>https://frequentist.org/projects/webapp-linkedin-analytics/</guid>
  <pubDate>Sat, 27 Dec 2025 00:00:00 GMT</pubDate>
  <media:content url="https://frequentist.org/projects/webapp-linkedin-analytics/image.png" medium="image" type="image/png" height="77" width="144"/>
</item>
<item>
  <title>Agentic vs Deterministic Workflows: Designing a Reliable AI Application</title>
  <dc:creator>Aleksei Prishchepo</dc:creator>
  <link>https://frequentist.org/posts/20251213-agentic-vs-deterministic/</link>
  <description><![CDATA[ 






<p>In this article, I’ll walk through the architecture of the <strong>Autonomous Career Agent (ACA)</strong>, a system I built to automate the tedious process of job hunting. Instead of just chatting, this agent acts as a specialized recruiter: it searches for live job vacancies, assesses your resume against them, and generates a tailored resume and cover letter ready for submission.</p>
<p>Jump to the Application Demo to see the application in action.</p>
<p>We’ll dive into the <strong>Orchestrator Pattern</strong>, managing state across multiple specialized agents, and solving production challenges like secure artifact delivery using Google Cloud Storage (GCS).</p>
<p>Many parts of job search automation can be implemented without LLM autonomy, so this project became a practical comparison between <em>explicit control flow</em> and <em>delegated reasoning</em>.</p>
<section id="architecture" class="level2 page-columns page-full">
<h2 class="anchored" data-anchor-id="architecture">Architecture</h2>
<p>When building complex agentic workflows, a common pitfall is the “God Prompt” — stuffing every possible instruction (search logic, evaluation criteria, formatting rules) into a single system prompt. This leads to fragile systems that are hard to debug and even harder to scale.</p>

<div class="no-row-height column-margin column-container"><div class="">
<p>This application was created as a capstone project for the <a href="https://www.kaggle.com/learn-guide/5-day-agents" target="_blank">5-Day AI Agents Intensive Course with Google</a>.</p>
</div></div><p>For this application, I adopted a <strong>Multi-Agent Orchestration</strong> architecture using the <strong>Google Agent Development Kit (ADK)</strong>. This design separates high-level reasoning from low-level execution.</p>
<section id="orchestrator-pattern" class="level3 page-columns page-full">
<h3 class="anchored" data-anchor-id="orchestrator-pattern">Orchestrator Pattern</h3>
<p>At the core is the <strong>Orchestrator Agent</strong>. Think of it as the project manager. It doesn’t know how to scrape LinkedIn or how to format a resume. Its job is to understand the user’s intent, maintain the state of the workflow, and delegate tasks to specialists.</p>

<div class="no-row-height column-margin column-container"><div class="">
<p>Explore the full source code on GitHub: <a href="https://github.com/AxesAccess/Autonomous-Career-Agent" target="_blank">AxesAccess / Autonomous-Career-Agent</a>.</p>
</div></div><div id="fig-workflow" class="quarto-float quarto-figure quarto-figure-center anchored" data-fig-align="center">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-workflow-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://frequentist.org/posts/20251213-agentic-vs-deterministic/workflow.svg" class="img-fluid quarto-figure quarto-figure-center figure-img">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-workflow-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;1: The Application Workflow
</figcaption>
</figure>
</div>
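<p>The delegation idea can be illustrated without any LLM or framework. The sketch below is plain Python and does not use the ADK API; <code>WorkflowState</code>, <code>search_agent</code>, and <code>assessment_agent</code> are hypothetical stand-ins whose bodies are deliberately trivial.</p>

```python
from dataclasses import dataclass, field


@dataclass
class WorkflowState:
    # The "big picture" that only the orchestrator keeps.
    goal: str
    vacancies: list = field(default_factory=list)
    assessments: dict = field(default_factory=dict)


def search_agent(goal):
    # Hypothetical specialist: a real one would call a job-search tool.
    return [f"{goal} at Acme", f"Senior {goal} at Globex"]


def assessment_agent(vacancy, resume):
    # Hypothetical specialist: a real one would ask an LLM for a score.
    words = set(vacancy.lower().split())
    return len(words.intersection(resume.lower().split()))


class Orchestrator:
    """Holds workflow state and delegates each step to a specialist."""

    def __init__(self, search, assess):
        self.search = search
        self.assess = assess

    def run(self, goal, resume):
        state = WorkflowState(goal=goal)
        # Step 1: delegate the search and remember the results.
        state.vacancies = self.search(goal)
        # Step 2: reuse those results as input for the assessment.
        for vacancy in state.vacancies:
            state.assessments[vacancy] = self.assess(vacancy, resume)
        return state
```

<p>Because the specialists are injected, either one can be replaced (for example, by a mock during tests) without changing the orchestrator itself.</p>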
</section>
<section id="why-this-approach" class="level3">
<h3 class="anchored" data-anchor-id="why-this-approach">Why This Approach?</h3>
<ol type="1">
<li><strong>Separation of Concerns</strong>: The Search Agent can be iterated on (e.g., swapping a mock API for a real LinkedIn integration) without touching the Orchestrator’s logic.</li>
<li><strong>Context Management</strong>: The Orchestrator keeps the “big picture” (the user’s goal), while sub-agents only need the context for their specific task. This saves tokens and reduces hallucinations.</li>
<li><strong>Testability</strong>: We can unit test the Assessment Agent’s scoring logic independently of the Job Search results.</li>
</ol>
</section>
<section id="what-a-deterministic-version-would-look-like" class="level3 column-screen-inset-shaded page-columns">
<h3 class="anchored" data-anchor-id="what-a-deterministic-version-would-look-like">What a Deterministic Version Would Look Like</h3>
<p>To understand the trade-offs, it’s helpful to imagine a deterministic version of this system:</p>
<ul>
<li><p><strong>Hard-coded Pipeline</strong>: <code>Input -&gt; Search() -&gt; Loop(Results) -&gt; Assess() -&gt; Generate() -&gt; Output</code>.</p></li>
<li><p><strong>Rule-based Scoring</strong>: “If ‘Python’ in Resume and ‘Python’ in Job Description, Score += 10”.</p></li>
<li><p><strong>Explicit Transitions</strong>: The code strictly dictates that step B always follows step A.</p></li>
</ul>
<p>While efficient, this approach requires us to pre-define every possible filter criterion. If a user asks for “a job with a great engineering culture,” a deterministic system fails unless we have an explicit “culture_score” column. An agent, however, can reason about the <em>unstructured</em> text in a job description to infer cultural values without requiring a schema change. The trade-off is clear: do we value raw efficiency (deterministic) or the ability to handle open-ended, ambiguous requirements (agentic)?</p>
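<p>The deterministic variant described above can be sketched in a few functions. This is an illustrative assumption, not code from the project: the function names, the keyword list, and the job dictionaries are all hypothetical.</p>

```python
def search(query, jobs):
    # Keep only postings whose title mentions the query.
    return [job for job in jobs if query.lower() in job["title"].lower()]


def assess(resume, job, keywords=("python", "sql", "docker")):
    # Rule-based scoring: +10 for every keyword present in both texts.
    resume_l = resume.lower()
    desc_l = job["description"].lower()
    return sum(10 for kw in keywords if kw in resume_l and kw in desc_l)


def generate(job, score):
    # Fixed output template; nothing here is decided by a model.
    return f"{job['title']}: score {score}"


def pipeline(query, resume, jobs):
    # Explicit transitions: search, then assess, then generate.
    return [generate(job, assess(resume, job)) for job in search(query, jobs)]
```

<p>Every step is cheap and predictable, but a query such as “great engineering culture” matches no title and yields an empty result, which is exactly the rigidity discussed above.</p>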
</section>
</section>
<section id="orchestration-logic" class="level2">
<h2 class="anchored" data-anchor-id="orchestration-logic">Orchestration Logic</h2>
<p>The Orchestrator’s intelligence comes from its ability to maintain a ‘Context Loop’ — remembering the results of Step 1 (Search) to inform Step 2 (Assessment).</p>
<p>In <code>src/agents/orchestrator.py</code>, the system instruction acts as a state machine. It doesn’t just say “Help the user”; it defines a strict protocol:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1">orchestrator_agent <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> LlmAgent(</span>
<span id="cb1-2">    name<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"career_orchestrator"</span>,</span>
<span id="cb1-3">    instruction<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"""</span></span>
<span id="cb1-4"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    Your responsibilities:</span></span>
<span id="cb1-5"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    ...</span></span>
<span id="cb1-6"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    3. Use the `recruitment_search_agent` to find relevant vacancies.</span></span>
<span id="cb1-7"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    4. Provide user with the list of relevant vacancies.</span></span>
<span id="cb1-8"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    5. Ask user to provide information about their skills and experiences.</span></span>
<span id="cb1-9"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    6. For each relevant vacancy chosen by the user:</span></span>
<span id="cb1-10"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">        a. Use the `skills_assessment_agent` to analyze the fit...</span></span>
<span id="cb1-11"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">        b. If the fit is good ... generate the tailored resume...</span></span>
<span id="cb1-12"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    """</span></span>
<span id="cb1-13">)</span></code></pre></div></div>
<p>By explicitly numbering the steps and restricting available tools at each phase, we reduce the degrees of freedom available to the LLM. The goal is not to make the model “smarter”, but to make incorrect behavior harder to express. This reliability is crucial for user trust.</p>
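<p>The idea of limiting tools per phase can also be expressed in plain Python. The sketch below is not how the project enforces it (there the protocol lives in the system instruction); the phase names and the <code>resume_generator</code> tool name are hypothetical, while the two agent names come from the instruction above.</p>

```python
# Hypothetical phase order; the real protocol is defined in the prompt.
PHASES = ["search", "assessment", "generation"]

# Tools permitted in each phase. "resume_generator" is an invented name.
ALLOWED_TOOLS = {
    "search": {"recruitment_search_agent"},
    "assessment": {"skills_assessment_agent"},
    "generation": {"resume_generator"},
}


def is_allowed(phase: str, tool: str) -> bool:
    """Return True only if the tool may be called in the current phase."""
    return tool in ALLOWED_TOOLS.get(phase, set())


def next_phase(phase: str) -> str:
    """Advance strictly in order; the last phase is terminal."""
    index = PHASES.index(phase)
    return PHASES[min(index + 1, len(PHASES) - 1)]


print(is_allowed("search", "skills_assessment_agent"))
print(next_phase("search"))
```

<p>A gate like this makes an out-of-order tool call impossible rather than merely unlikely, at the cost of hard-coding the transitions.</p>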
</section>
<section id="production-grade-integrations" class="level2 page-columns page-full">
<h2 class="anchored" data-anchor-id="production-grade-integrations">Production-Grade Integrations</h2>
<p>Building a demo is one thing; building something that works in a restricted environment is another. One major challenge I faced was <strong>Artifact Delivery</strong>.</p>

<div class="no-row-height column-margin column-container"><div class="">
<p>Explore the full source code on GitHub: <a href="https://github.com/AxesAccess/Autonomous-Career-Agent" target="_blank">AxesAccess / Autonomous-Career-Agent</a>.</p>
</div></div><section id="challenge-where-do-the-files-go" class="level3">
<h3 class="anchored" data-anchor-id="challenge-where-do-the-files-go">Challenge: “Where do the files go?”</h3>
<p>When the agent generates a resume (Markdown in this version), we need to deliver it to the user.</p>
<ul>
<li><p><strong>Local File System</strong>: In a containerized web deployment, local files aren’t accessible to the user’s browser.</p></li>
<li><p><strong>Chat Attachment</strong>: At the time, the ADK UI showed an error for attached files.</p></li>
</ul>
</section>
<section id="solution-hybrid-cloud-storage" class="level3">
<h3 class="anchored" data-anchor-id="solution-hybrid-cloud-storage">Solution: Hybrid Cloud Storage</h3>
<p>I implemented a hybrid tooling strategy (<code>src/tools/hybrid_artifact_tools.py</code>) that satisfies both the agent’s memory needs and the user’s UX needs.</p>
<ol type="1">
<li><strong>Internal Memory</strong>: The file is saved to the Agent’s internal artifact store so it can “remember” what it wrote.</li>
<li><strong>Public Delivery</strong>: The file is simultaneously uploaded to a private <strong>Google Cloud Storage (GCS)</strong> bucket to deliver it to the user.</li>
<li><strong>Secure Access</strong>: The app generates a <strong>Signed URL</strong> (valid for 24 hours) and presents <em>that</em> link to the user in the chat.</li>
</ol>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># From src/tools/hybrid_artifact_tools.py</span></span>
<span id="cb2-2"></span>
<span id="cb2-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 1. Upload to GCS</span></span>
<span id="cb2-4">blob <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> bucket.blob(gcs_filename)</span>
<span id="cb2-5">blob.upload_from_string(content, content_type<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>mime_type)</span>
<span id="cb2-6"></span>
<span id="cb2-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 2. Generate Signed URL</span></span>
<span id="cb2-8">signed_url <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> blob.generate_signed_url(</span>
<span id="cb2-9">    version<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"v4"</span>, </span>
<span id="cb2-10">    expiration<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>timedelta(hours<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">24</span>), </span>
<span id="cb2-11">    method<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"GET"</span></span>
<span id="cb2-12">)</span>
<span id="cb2-13"></span>
<span id="cb2-14"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 3. Return to Agent to show the user</span></span>
<span id="cb2-15"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> <span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"📥 Download Link: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>signed_url<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span></span></code></pre></div></div>
<p>This approach bridges the gap between AI generation and standard web infrastructure, a key step in moving agents from prototype to production. The pattern proved essential because it decouples generation from delivery, allowing the agent to operate in restricted execution environments without leaking infrastructure concerns into the prompting layer.</p>
</section>
</section>
<section id="observability-evaluation" class="level2">
<h2 class="anchored" data-anchor-id="observability-evaluation">Observability &amp; Evaluation</h2>
<p>A reliable agent system requires more than just code; it requires rigorous testing.</p>
<section id="evaluation-with-golden-datasets" class="level3">
<h3 class="anchored" data-anchor-id="evaluation-with-golden-datasets">Evaluation with Golden Datasets</h3>
<p>We don’t trust the agent blindly. We use an automated evaluation script (<code>tests/evaluation/evaluate.py</code>) that runs the agent against a <code>golden_dataset.json</code>. This dataset contains typical user scenarios (e.g., “Find Python jobs in Berlin”) and verifies:</p>
<ol type="1">
<li><p><strong>Safety</strong>: Did the agent error out?</p></li>
<li><p><strong>Correctness</strong>: Did the response contain expected keywords (e.g., job titles found)?</p></li>
<li><p><strong>Tool Usage</strong>: Did it call the Search Tool?</p></li>
</ol>
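<p>The three checks above could be wired together roughly as follows. This is a sketch under stated assumptions: the JSON schema (<code>query</code>, <code>expected_keywords</code>, <code>expected_tool</code>), the <code>search_jobs</code> tool name, and the <code>fake_agent</code> stand-in are all hypothetical, since the real <code>golden_dataset.json</code> format and <code>evaluate.py</code> are not shown in this post.</p>

```python
import json

# Hypothetical schema; the real golden_dataset.json format is not shown here.
GOLDEN_JSON = """
[
  {"query": "Find Python jobs in Berlin",
   "expected_keywords": ["Python", "Berlin"],
   "expected_tool": "search_jobs"}
]
"""


def fake_agent(query: str) -> dict:
    """Stand-in for the real agent call; returns response text plus the tools it called."""
    return {
        "text": "Found 2 Python roles in Berlin: Data Engineer, Backend Developer.",
        "tool_calls": ["search_jobs"],
    }


def evaluate_case(case: dict, agent=fake_agent) -> dict:
    """Run one scenario and report the three checks: safety, correctness, tool usage."""
    try:
        result = agent(case["query"])
    except Exception:
        # The agent errored out, so the remaining checks cannot pass either.
        return {"safety": False, "correctness": False, "tool_usage": False}
    text = result["text"].lower()
    return {
        "safety": True,
        "correctness": all(kw.lower() in text for kw in case["expected_keywords"]),
        "tool_usage": case["expected_tool"] in result["tool_calls"],
    }


cases = json.loads(GOLDEN_JSON)
report = [evaluate_case(case) for case in cases]
print(report)
```

<p>Keyword matching is a coarse correctness signal, but it is cheap to run on every change and catches regressions such as a missing tool call.</p>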
</section>
<section id="observability" class="level3">
<h3 class="anchored" data-anchor-id="observability">Observability</h3>
<p>Using ADK’s built-in observability features, I trace every step of the orchestration. This allows for inspection of raw prompts and responses, helping to debug why an agent might have “hallucinated” a step or missed a user instruction.</p>
<p>Notably, evaluation becomes more important as autonomy increases; unlike deterministic pipelines, agentic systems require behavioral testing rather than simple output validation.</p>
</section>
</section>
<section id="application-demo" class="level2 page-columns page-full">
<h2 class="anchored" data-anchor-id="application-demo">Application Demo</h2>
<div class="quarto-video ratio ratio-16x9"><iframe data-external="1" src="https://www.youtube.com/embed/o8VLHrvs3z8" title="Autonomous Career Agent Demo" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe></div>

<div class="no-row-height column-margin column-container"><div class="">
<p>Read the hackathon writeup on Kaggle: <a href="https://www.kaggle.com/competitions/agents-intensive-capstone-project/writeups/aca-autonomous-career-agent" target="_blank">ACA aka Autonomous Career Agent</a>.</p>
</div></div></section>
<section id="challenges-lessons-learned" class="level2 page-columns page-full">
<h2 class="anchored" data-anchor-id="challenges-lessons-learned">Challenges &amp; Lessons Learned</h2>
<ul>
<li><strong>Latency vs.&nbsp;Accuracy</strong>: Splitting tasks into sub-agents improves accuracy but adds latency because of multiple LLM round-trips. I optimized this by having the Orchestrator handle simple info-gathering without delegation where possible.</li>
<li><strong>Tool Hallucinations</strong>: Early versions often tried to invent search parameters. Strictly typing the Tool classes with <code>pydantic</code> models solved 90% of these issues.</li>
</ul>
<section id="where-a-deterministic-approach-was-better" class="level3 column-screen-inset-shaded page-columns">
<h3 class="anchored" data-anchor-id="where-a-deterministic-approach-was-better">Where a Deterministic Approach Was Better</h3>
<p>It’s important to acknowledge where the agentic approach introduced friction compared to a traditional script:</p>
<ul>
<li><strong>Latency</strong>: The “thought process” of an LLM is orders of magnitude slower than a function call.</li>
<li><strong>Reproducibility</strong>: Even with temperature=0, minor variations in phrasing can occur, making regression testing harder.</li>
<li><strong>Cost Predictability</strong>: A loop in a script costs nothing; a loop in an agent consumes tokens with every iteration.</li>
<li><strong>Debugging Difficulty</strong>: You can’t just set a breakpoint in a prompt; you have to trace the semantic flow.</li>
</ul>
</section>
</section>
<section id="conclusion" class="level2">
<h2 class="anchored" data-anchor-id="conclusion">Conclusion</h2>
<p>The Autonomous Career Agent app demonstrates that building useful GenAI applications goes beyond prompt engineering. It requires solid software engineering principles: modular architecture, secure integration patterns, and automated testing.</p>
<p>This project reaffirmed an important engineering reality: for the specific task of matching keywords and filling templates, a deterministic pipeline offers distinct advantages in speed and predictability. The agentic overhead here is non-trivial.</p>
<p>However, this architecture lays the groundwork for a much broader application, for example <strong>Career Consulting</strong>. By building on the observability and evaluation frameworks explored here, we can safely expand the agentic application to handle unstructured tasks, such as helping a user figure out <em>what</em> they want to do or pivoting strategies mid-search, where no deterministic application could succeed.</p>
</section>
<section id="see-also" class="level2">
<h2 class="anchored" data-anchor-id="see-also">See Also</h2>
<p>Here are some related articles you might find interesting:</p>
<div id="listing-posts" class="quarto-listing quarto-listing-container-table">
<table class="quarto-listing-table table">
<thead>
<tr>

<th>
 
</th>

<th>
Title
</th>

<th>
Reading Time
</th>

</tr>
</thead>
<tbody class="list">

<tr data-index="0" data-categories="QXBwJTJDVmlzdWFsaXphdGlvbiUyQ05MUCUyQ1B5dGhvbg==" data-listing-date-sort="1766793600000" data-listing-file-modified-sort="1770067837480" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="8" data-listing-word-count-sort="1445" data-listing-title-sort="Building a Privacy-First LinkedIn Analytics Platform" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20251227-linkedin-analytics/image.png" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20251227-linkedin-analytics/index.html" class="title listing-title">Building a Privacy-First LinkedIn Analytics Platform</a>
</td>
<td>
<span class="listing-reading-time">8 min</span>
</td>

</tr>

<tr data-index="1" data-categories="QXBwJTJDTExNJTJDUHl0aG9u" data-listing-date-sort="1748649600000" data-listing-file-modified-sort="1767875135219" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="1" data-listing-word-count-sort="110" data-listing-title-sort="Product Cards Creation Application" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20250531-content-mate/mark-konig-Tl8mDaue_II-unsplash_square.jpg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20250531-content-mate/index.html" class="title listing-title">Product Cards Creation Application</a>
</td>
<td>
<span class="listing-reading-time">1 min</span>
</td>

</tr>

<tr data-index="2" data-categories="TkxQJTJDUHl0aG9u" data-listing-date-sort="1746230400000" data-listing-file-modified-sort="1750795873147" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="6" data-listing-word-count-sort="1111" data-listing-title-sort="Creating Anki Flashcards From List of Words" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20250503-anki-part-1/image.png" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20250503-anki-part-1/index.html" class="title listing-title">Creating Anki Flashcards From List of Words</a>
</td>
<td>
<span class="listing-reading-time">6 min</span>
</td>

</tr>

<tr data-index="3" data-categories="UkFHJTJDTkxQJTJDTExNJTJDUHl0aG9u" data-listing-date-sort="1742515200000" data-listing-file-modified-sort="1748974609635" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="12" data-listing-word-count-sort="2344" data-listing-title-sort="Implementing a Local Retrieval-Augmented Generation System" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20250321-rag/image.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20250321-rag/index.html" class="title listing-title">Implementing a Local Retrieval-Augmented Generation System</a>
</td>
<td>
<span class="listing-reading-time">12 min</span>
</td>

</tr>

<tr data-index="4" data-categories="UHl0aG9uJTJDUiUyQ01hdGxhYg==" data-listing-date-sort="1739491200000" data-listing-file-modified-sort="1770926078462" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="4" data-listing-word-count-sort="607" data-listing-title-sort="Nerdy Valentine's in Python, R, and Matlab" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20250214-valentines/image.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20250214-valentines/index.html" class="title listing-title">Nerdy Valentine’s in Python, R, and Matlab</a>
</td>
<td>
<span class="listing-reading-time">4 min</span>
</td>

</tr>

<tr data-index="5" data-categories="VGltZS1TZXJpZXMlMkNNYWNyb2Vjb25vbWljcyUyQ1B5dGhvbg==" data-listing-date-sort="1724284800000" data-listing-file-modified-sort="1756304275864" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="6" data-listing-word-count-sort="1064" data-listing-title-sort="Python Library for Russian Macroeconomics Data" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" data-src="../../posts/20240821-sophisthse/image.png" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20240821-sophisthse/index.html" class="title listing-title">Python Library for Russian Macroeconomics Data</a>
</td>
<td>
<span class="listing-reading-time">6 min</span>
</td>

</tr>

<tr data-index="6" data-categories="TWFya2V0aW5nJTJDUHJvZHVjdCUyQ1B5dGhvbg==" data-listing-date-sort="1722816000000" data-listing-file-modified-sort="1770626733464" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="10" data-listing-word-count-sort="1892" data-listing-title-sort="Kano Method for Prioritization of Features" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" data-src="../../posts/20240805-kano-model/image.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20240805-kano-model/index.html" class="title listing-title">Kano Method for Prioritization of Features</a>
</td>
<td>
<span class="listing-reading-time">10 min</span>
</td>

</tr>

<tr data-index="7" data-categories="UHl0aG9uJTJDR3JhcGhz" data-listing-date-sort="1722384000000" data-listing-file-modified-sort="1735422870721" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="8" data-listing-word-count-sort="1431" data-listing-title-sort="Merging Customers Records Using Graphs in Python" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" data-src="../../posts/20240731-customers-graphs/image.png" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20240731-customers-graphs/index.html" class="title listing-title">Merging Customers Records Using Graphs in Python</a>
</td>
<td>
<span class="listing-reading-time">8 min</span>
</td>

</tr>

</tbody>
</table>
<div class="listing-no-matching d-none">No matching items</div>

</div>



</section>

 ]]></description>
  <category>Agents</category>
  <category>LLM</category>
  <category>App</category>
  <category>Python</category>
  <guid>https://frequentist.org/posts/20251213-agentic-vs-deterministic/</guid>
  <pubDate>Sat, 13 Dec 2025 00:00:00 GMT</pubDate>
  <media:content url="https://frequentist.org/posts/20251213-agentic-vs-deterministic/image.svg" medium="image" type="image/svg+xml"/>
</item>
<item>
  <title>Building an E-Commerce Dashboard with Power BI and R</title>
  <dc:creator>Aleksei Prishchepo</dc:creator>
  <link>https://frequentist.org/posts/20251126-e-commerce-dashboard/</link>
  <description><![CDATA[ 






<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>This project showcases my process of building an analytical dashboard for business decision-making, specifically in the e-commerce domain. The dashboard is built in Power BI and integrates R scripts for advanced analytics. It was created as part of the November 2025 DataDNA Dataset Challenge and was selected as the winner in both the overall and accessibility categories.</p>
</section>
<section id="key-features" class="level2">
<h2 class="anchored" data-anchor-id="key-features">Key Features</h2>
<p>The dashboard includes several analytical features. The implementation of these involved data processing and modeling techniques, which I will outline below.</p>
<section id="revenue-forecast" class="level3">
<h3 class="anchored" data-anchor-id="revenue-forecast">Revenue Forecast</h3>
<p>For executives, it’s important to know ahead of time whether revenue will meet the target for the set period. To address this, I implemented a time series forecasting model that takes seasonality and trend into account to provide revenue predictions. Because the data spans less than two years, I opted for a simpler forecasting approach using the <code>auto.arima</code> function from R’s <code>forecast</code> package.</p>
<div id="fig-revenue-forecast" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-revenue-forecast-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://frequentist.org/posts/20251126-e-commerce-dashboard/revenue-forecast.svg" class="img-fluid figure-img">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-revenue-forecast-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;1: Total &amp; Forecasted Revenue bar plot created in the Deneb custom visual using the Vega-Lite grammar
</figcaption>
</figure>
</div>
<p>The model was used to generate revenue forecasts for various segments, including country, category, and channel. I integrated the forecasting results into the Power BI dashboard, enabling users to easily view and interact with forecasts alongside historical data, with smooth segment switching when applying cross-filtering.</p>
</section>
<section id="abcxyz-analysis" class="level3">
<h3 class="anchored" data-anchor-id="abcxyz-analysis">ABC/XYZ Analysis</h3>
<p>A classic way to segment products is by using ABC analysis. This method classifies products based on their revenue, placing items that contribute the most to revenue in category A, those with moderate contribution in category B, and the rest in category C.</p>
<div id="fig-abc-xyz-matrix" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-abc-xyz-matrix-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://frequentist.org/posts/20251126-e-commerce-dashboard/abc-xyz-matrix.svg" class="img-fluid figure-img">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-abc-xyz-matrix-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;2: Number of SKUs by ABC/XYZ Segment, visualized using the Matrix visual from the standard Power BI library
</figcaption>
</figure>
</div>
<p>To complement this, I applied XYZ analysis, which segments products based on their sales variability. Products with stable demand are classified as X, those with moderate variability as Y, and highly variable products as Z. Combining ABC and XYZ analyses provides a comprehensive view of product performance, enabling more informed inventory and marketing strategies.</p>
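<p>To make the two classifications concrete, here is a small Python sketch. The dashboard itself does this in R inside Power Query, so this is only an illustration; the cut-offs (80% and 95% cumulative revenue share for ABC, coefficient of variation of 0.5 and 1.0 for XYZ) are common defaults assumed for the example, and the sample data is invented.</p>

```python
from statistics import mean, pstdev


def abc_classes(revenue: dict[str, float], a_cut: float = 0.80, b_cut: float = 0.95) -> dict[str, str]:
    """Rank SKUs by revenue and label them by cumulative revenue share."""
    total = sum(revenue.values())
    classes, cumulative = {}, 0.0
    for sku, value in sorted(revenue.items(), key=lambda kv: kv[1], reverse=True):
        cumulative += value / total
        classes[sku] = "A" if cumulative <= a_cut else "B" if cumulative <= b_cut else "C"
    return classes


def xyz_class(sales: list[float], x_cut: float = 0.5, y_cut: float = 1.0) -> str:
    """Label demand stability by the coefficient of variation of periodic sales."""
    avg = mean(sales)
    cv = pstdev(sales) / avg if avg else float("inf")
    return "X" if cv <= x_cut else "Y" if cv <= y_cut else "Z"


# Invented sample data: total revenue per SKU and units sold per month.
revenue = {"sku1": 700.0, "sku2": 200.0, "sku3": 100.0}
monthly_units = {
    "sku1": [100, 110, 90, 100],
    "sku2": [10, 60, 20, 110],
    "sku3": [0, 0, 200, 0],
}

abc = abc_classes(revenue)
segments = {sku: abc[sku] + xyz_class(monthly_units[sku]) for sku in revenue}
print(segments)
```

<p>Each SKU ends up with a two-letter code such as AX (high revenue, stable demand) or CZ (low revenue, erratic demand), which is what the matrix in Figure&nbsp;2 counts.</p>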
</section>
<section id="rfm-segmentation" class="level3">
<h3 class="anchored" data-anchor-id="rfm-segmentation">RFM Segmentation</h3>
<p>Another widely used customer segmentation technique is RFM analysis, which focuses on three key metrics: Recency (how recently a customer made a purchase), Frequency (how often they purchase), and Monetary value (how much they spend). By scoring customers on these dimensions, businesses can identify their most valuable customers and tailor marketing efforts accordingly.</p>
<div id="fig-rfm" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-rfm-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://frequentist.org/posts/20251126-e-commerce-dashboard/rfm.svg" class="img-fluid figure-img">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-rfm-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;3: RFM segments are three-dimensional, which is why it’s common practice to combine Frequency and Monetary into a single FM axis (on the left).
</figcaption>
</figure>
</div>
<p>To make the segmentation more intuitive, I defined nine segments (levels) based on RFM scores: “Champions”, “Loyal Customers”, “New Customers”, “Potential Loyalists”, “Promising”, “Needs Attention”, “Cannot Lose Them”, “At Risk”, “Hibernating”. This categorization helps in devising targeted strategies for customer retention and engagement.</p>
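<p>The scoring step can be sketched in Python as follows. The dashboard computes RFM in R, and its actual cut-offs and segment rules are not given in this post, so the thresholds and the 3×3 grid below are purely illustrative assumptions; only the nine segment names come from the text.</p>

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    recency_days: int  # days since the last purchase
    frequency: int     # number of orders
    monetary: float    # total spend


def score(value: float, low: float, high: float, reverse: bool = False) -> int:
    """Map a metric to a 1-3 score using two cut-offs; reverse when lower is better."""
    raw = 1 if value <= low else 2 if value <= high else 3
    return 4 - raw if reverse else raw


# Illustrative grid keyed by (recency score, FM score); the real rules may differ.
SEGMENTS = {
    (3, 3): "Champions", (3, 2): "Potential Loyalists", (3, 1): "New Customers",
    (2, 3): "Loyal Customers", (2, 2): "Needs Attention", (2, 1): "Promising",
    (1, 3): "Cannot Lose Them", (1, 2): "At Risk", (1, 1): "Hibernating",
}


def rfm_segment(c: Customer) -> str:
    r = score(c.recency_days, low=30, high=90, reverse=True)  # recent buyers score 3
    f = score(c.frequency, low=1, high=4)
    m = score(c.monetary, low=100, high=500)
    fm = (f + m + 1) // 2  # average of F and M, rounding halves up
    return SEGMENTS[(r, fm)]


customers = [
    Customer("Ann", 10, 6, 800.0),
    Customer("Bob", 200, 1, 50.0),
    Customer("Cid", 15, 1, 80.0),
    Customer("Dee", 150, 5, 900.0),
]
labels = {c.name: rfm_segment(c) for c in customers}
print(labels)
```

<p>Collapsing Frequency and Monetary into one FM score is what lets the segments be drawn on a two-dimensional grid, as in Figure&nbsp;3.</p>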
</section>
</section>
<section id="details-of-implementation" class="level2 page-columns page-full">
<h2 class="anchored" data-anchor-id="details-of-implementation">Details of Implementation</h2>
<section id="data-cleaning-and-transformation" class="level3 page-columns page-full">
<h3 class="anchored" data-anchor-id="data-cleaning-and-transformation">Data Cleaning and Transformation</h3>
<p>Most data cleaning and transformation steps were performed using R scripts running in Power Query, but first I explored the dataset in RStudio. Here are the documents produced by Quarto during the data analysis phase:</p>

<div class="no-row-height column-margin column-container"><div class="">
<p>You can find the source code as R Markdown documents in the <a href="https://github.com/AxesAccess/DataDNA-Dataset-Challenge-E-commerce-Dataset-November-2025" target="_blank">project repository</a>.</p>
</div></div><table class="caption-top table">
<thead>
<tr class="header">
<th>Document</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><a href="01_EDA.html" target="_blank">Exploratory Data Analysis</a></td>
<td>The initial steps to explore the dataset structure, factors, and distributions.</td>
</tr>
<tr class="even">
<td><a href="02_Preprocess_in_Power_Query.html" target="_blank">Preprocess in Power Query</a></td>
<td>The R code for processing the dataset in Power Query, including RFM and ABC/XYZ segmentations.</td>
</tr>
<tr class="odd">
<td><a href="03_TS_Forecasts.html" target="_blank">Time-Series Forecasting</a></td>
<td>A separate document containing R code for time-series forecasting, also designed to run in Power Query.</td>
</tr>
<tr class="even">
<td><a href="04_Refund_Prediction.html" target="_blank">Refund Prediction</a></td>
<td>An attempt to create a model to predict refunds; no statistically significant variables were found.</td>
</tr>
</tbody>
</table>
</section>
<section id="power-bi-data-model" class="level3">
<h3 class="anchored" data-anchor-id="power-bi-data-model">Power BI Data Model</h3>
<p>Tables and relations in the Power BI data model are organized following a “star” schema, with fact tables at the center connected to dimension tables. This structure optimizes query performance and simplifies data analysis.</p>
<div id="fig-data-model" class="lightbox quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-data-model-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<a href="power-bi-data-model.png" class="lightbox" data-gallery="quarto-lightbox-gallery-1" title="Figure&nbsp;4: The Power BI data model may appear extensive but it’s organized following a “star” schema"><img src="https://frequentist.org/posts/20251126-e-commerce-dashboard/power-bi-data-model.png" class="img-fluid figure-img"></a>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-data-model-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;4: The Power BI data model may appear extensive but it’s organized following a “star” schema
</figcaption>
</figure>
</div>
</section>
<section id="power-bi-visuals" class="level3">
<h3 class="anchored" data-anchor-id="power-bi-visuals">Power BI Visuals</h3>
<p>Power BI visualizations are built using visuals from the standard Power BI library, except for a few bar charts created with the Deneb custom visual, which allows for advanced charting using the Vega-Lite grammar.</p>
</section>
</section>
<section id="dashboard-overview" class="level2 page-columns page-full">
<h2 class="anchored" data-anchor-id="dashboard-overview">Dashboard Overview</h2>
<p>The dashboard is structured into five key sections.</p>

<div class="no-row-height column-margin column-container"><div class="">
<p>You can try the interactive version of the dashboard <a href="https://app.powerbi.com/view?r=eyJrIjoiNzY5NjZhMDAtZjNjNS00NmYxLTkzNWUtNGJkZWZlNWMzOWIxIiwidCI6ImZmYzg3OTVlLTAxODUtNDg5Yi05ZGE2LTQ5MDI0MTJmMDNhMCIsImMiOjl9" target="_blank">here</a>.</p>
</div></div><section id="summary-page" class="level3">
<h3 class="anchored" data-anchor-id="summary-page">Summary Page</h3>
<p>The dashboard immediately shows key metrics.</p>
<div id="fig-summary" class="lightbox quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-summary-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<a href="0001.svg" class="lightbox" data-gallery="quarto-lightbox-gallery-2" title="Figure&nbsp;5: Summary"><img src="https://frequentist.org/posts/20251126-e-commerce-dashboard/0001.svg" class="img-fluid figure-img"></a>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-summary-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;5: Summary
</figcaption>
</figure>
</div>
</section>
<section id="loyalty-section" class="level3">
<h3 class="anchored" data-anchor-id="loyalty-section">Loyalty Section</h3>
<p>This section presents customer loyalty metrics, including Repeat Buyers Share, Life-time Value, and Purchase Frequency.</p>
<div id="fig-loyalty" class="lightbox quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-loyalty-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<a href="0002.svg" class="lightbox" data-gallery="quarto-lightbox-gallery-3" title="Figure&nbsp;6: Loyalty"><img src="https://frequentist.org/posts/20251126-e-commerce-dashboard/0002.svg" class="img-fluid figure-img"></a>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-loyalty-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;6: Loyalty
</figcaption>
</figure>
</div>
</section>
<section id="products-section" class="level3">
<h3 class="anchored" data-anchor-id="products-section">Products Section</h3>
<p>This section provides descriptive statistics about products. It includes ABC/XYZ analysis, Total Revenue breakdown by Category and Vendor, and Top Products.</p>
<div id="fig-products" class="lightbox quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-products-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<a href="0003.svg" class="lightbox" data-gallery="quarto-lightbox-gallery-4" title="Figure&nbsp;7: Products"><img src="https://frequentist.org/posts/20251126-e-commerce-dashboard/0003.svg" class="img-fluid figure-img"></a>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-products-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;7: Products
</figcaption>
</figure>
</div>
</section>
<section id="pricing-section" class="level3">
<h3 class="anchored" data-anchor-id="pricing-section">Pricing Section</h3>
<p>This section provides insights into product pricing strategies, including Discount Penetration, Average Discount, and Revenue Lift from Discounts, as well as time series of Average Discount and Discount Penetration.</p>
<div id="fig-pricing" class="lightbox quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-pricing-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<a href="0004.svg" class="lightbox" data-gallery="quarto-lightbox-gallery-5" title="Figure&nbsp;8: Pricing"><img src="https://frequentist.org/posts/20251126-e-commerce-dashboard/0004.svg" class="img-fluid figure-img"></a>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-pricing-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;8: Pricing
</figcaption>
</figure>
</div>
</section>
<section id="customers-section" class="level3">
<h3 class="anchored" data-anchor-id="customers-section">Customers Section</h3>
<p>This section presents the RFM segmentation results, showing the distribution of customers across different segments and their contribution to total revenue.</p>
<div id="fig-customers" class="lightbox quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-customers-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<a href="0005.svg" class="lightbox" data-gallery="quarto-lightbox-gallery-6" title="Figure&nbsp;9: Customers"><img src="https://frequentist.org/posts/20251126-e-commerce-dashboard/0005.svg" class="img-fluid figure-img"></a>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-customers-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;9: Customers
</figcaption>
</figure>
</div>
</section>
</section>
<section id="areas-to-improve" class="level2">
<h2 class="anchored" data-anchor-id="areas-to-improve">Areas to Improve</h2>
<section id="dynamic-time-periods" class="level3">
<h3 class="anchored" data-anchor-id="dynamic-time-periods">Dynamic Time Periods</h3>
<p>The latest data point in the dataset is <code>2025-10-21</code>. All visuals in the dashboard reflect calendar year 2025, with time-series forecasts filling in the remaining months. This makes the dashboard unsuitable for use outside the narrow window of October-December 2025. To turn it into a working instrument, I would need to add a slicer for selecting the time period and adjust all measures accordingly. A year slicer would not be a good choice for this task; I’d rather opt for a date slicer with the “Relative date filtering” option set to “in the last 12 months”, plus a one- to two-month forecast.</p>
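<p>A minimal sketch of the date window such a slicer would need to produce, written in Python for illustration only (the dashboard itself is built with Power BI and R). The helper names <code>shift_months</code> and <code>reporting_window</code> are hypothetical. The sketch assumes that the month containing the latest data point counts as actuals and that the forecast starts with the following month.</p>

```python
from datetime import date


def shift_months(d: date, months: int) -> date:
    """Return the first day of the month that lies `months` away from d's month."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def reporting_window(latest: date, history_months: int = 12, forecast_months: int = 2):
    """Return (actuals_start, forecast_start, window_end).

    Actuals cover [actuals_start, forecast_start) and the forecast covers
    [forecast_start, window_end). Both defaults are assumptions, not settings
    taken from the dashboard.
    """
    forecast_start = shift_months(latest, 1)  # first month without any actuals
    actuals_start = shift_months(forecast_start, -history_months)
    window_end = shift_months(forecast_start, forecast_months)
    return actuals_start, forecast_start, window_end


actuals_start, forecast_start, window_end = reporting_window(date(2025, 10, 21))
print(actuals_start, forecast_start, window_end)
```

<p>For the latest data point mentioned above, the window holds actuals from 2024-11-01, switches to the forecast on 2025-11-01, and closes before 2026-01-01. Recomputing the bounds from the most recent date in the data, rather than hard-coding a year, is what keeps the report usable after each refresh.</p>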
</section>
<section id="web-and-marketing-data" class="level3">
<h3 class="anchored" data-anchor-id="web-and-marketing-data">Web and Marketing Data</h3>
<p>The dataset lacks web-analytics data, which would enable more comprehensive customer behavior analysis. Additionally, the absence of marketing data (except for the acquisition channel) prevents the evaluation of campaign effectiveness and customer acquisition strategies. Incorporating these data sources would significantly enhance the analytical capabilities of the dashboard.</p>
</section>
<section id="causal-modeling" class="level3">
<h3 class="anchored" data-anchor-id="causal-modeling">Causal Modeling</h3>
<p>The synthetic nature of the dataset also limits the depth of analysis: the data is too random to uncover meaningful patterns or trends. Access to real-world data would allow for causal modeling and insights.</p>
</section>
</section>
<section id="conclusion" class="level2">
<h2 class="anchored" data-anchor-id="conclusion">Conclusion</h2>
<p>I hope this overview of the dashboard construction process, from initial data cleaning to final model integration, gives the reader a clear picture of the analytical work involved.</p>
<div class="quarto-layout-panel" data-layout-ncol="2">
<div class="quarto-layout-row">
<div class="quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="winner-certificate.png" class="lightbox" data-gallery="quarto-lightbox-gallery-7" title="This dashboard was selected as the winner of the November 2025 DataDNA Dataset Challenge in the overall standings"><img src="https://frequentist.org/posts/20251126-e-commerce-dashboard/winner-certificate.png" class="img-fluid figure-img" alt="This dashboard was selected as the winner of the November 2025 DataDNA Dataset Challenge in the overall standings"></a></p>
<figcaption>This dashboard was selected as the winner of the November 2025 DataDNA Dataset Challenge in the overall standings</figcaption>
</figure>
</div>
</div>
<div class="quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="accessibility-certificate.png" class="lightbox" data-gallery="quarto-lightbox-gallery-8" title="…and also in the Accessibility category, thanks to the carefully chosen palette"><img src="https://frequentist.org/posts/20251126-e-commerce-dashboard/accessibility-certificate.png" class="img-fluid figure-img" alt="…and also in the Accessibility category, thanks to the carefully chosen palette"></a></p>
<figcaption>…and also in the Accessibility category, thanks to the carefully chosen palette</figcaption>
</figure>
</div>
</div>
</div>
</div>
</section>
<section id="references" class="level2">
<h2 class="anchored" data-anchor-id="references">References</h2>
<ul>
<li><p><a href="https://github.com/AxesAccess/DataDNA-Dataset-Challenge-E-commerce-Dataset-November-2025" target="_blank">Project Repository</a></p></li>
<li><p><a href="https://app.powerbi.com/view?r=eyJrIjoiNzY5NjZhMDAtZjNjNS00NmYxLTkzNWUtNGJkZWZlNWMzOWIxIiwidCI6ImZmYzg3OTVlLTAxODUtNDg5Yi05ZGE2LTQ5MDI0MTJmMDNhMCIsImMiOjl9" target="_blank">Interactive Dashboard</a></p></li>
</ul>
</section>
<section id="see-also" class="level2">
<h2 class="anchored" data-anchor-id="see-also">See Also</h2>
<p>Here are some of my other posts related to BI, statistics, machine learning, and data visualization:</p>
<div id="listing-posts" class="quarto-listing quarto-listing-container-table">
<table class="quarto-listing-table table">
<thead>
<tr>

<th>
 
</th>

<th>
Title
</th>

<th>
Reading Time
</th>

</tr>
</thead>
<tbody class="list">

<tr data-index="0" data-categories="TUwlMkNBbmltYXRpb24lMkNS" data-listing-date-sort="1771027200000" data-listing-file-modified-sort="1770982202443" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="9" data-listing-word-count-sort="1687" data-listing-title-sort="Implementing a Neural Network in Base R" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20260214-backpropagating-love/image.gif" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20260214-backpropagating-love/index.html" class="title listing-title">Implementing a Neural Network in Base R</a>
</td>
<td>
<span class="listing-reading-time">9 min</span>
</td>

</tr>

<tr data-index="1" data-categories="S1BJJTIwRGVzaWduJTJDQkklMkNTdHJhdGVneQ==" data-listing-date-sort="1769558400000" data-listing-file-modified-sort="1772988050630" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="14" data-listing-word-count-sort="2630" data-listing-title-sort="The 7-Step KPI Blueprint from Business Intelligence Analytics Perspective" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20260128-seven-step-kpi-blueprint/image.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20260128-seven-step-kpi-blueprint/index.html" class="title listing-title">The 7-Step KPI Blueprint from Business Intelligence Analytics Perspective</a>
</td>
<td>
<span class="listing-reading-time">14 min</span>
</td>

</tr>

<tr data-index="2" data-categories="QXBwJTJDVmlzdWFsaXphdGlvbiUyQ05MUCUyQ1B5dGhvbg==" data-listing-date-sort="1766793600000" data-listing-file-modified-sort="1770067837480" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="8" data-listing-word-count-sort="1445" data-listing-title-sort="Building a Privacy-First LinkedIn Analytics Platform" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20251227-linkedin-analytics/image.png" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20251227-linkedin-analytics/index.html" class="title listing-title">Building a Privacy-First LinkedIn Analytics Platform</a>
</td>
<td>
<span class="listing-reading-time">8 min</span>
</td>

</tr>

<tr data-index="3" data-categories="QSUyRkIlMjBUZXN0aW5nJTJDU3RhdGlzdGljcyUyQ1I=" data-listing-date-sort="1761868800000" data-listing-file-modified-sort="1767873908846" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="6" data-listing-word-count-sort="1178" data-listing-title-sort="Propensity Score Matching for Causal Analysis" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20251028-propensity-score-matching/image.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20251028-propensity-score-matching/index.html" class="title listing-title">Propensity Score Matching for Causal Analysis</a>
</td>
<td>
<span class="listing-reading-time">6 min</span>
</td>

</tr>

<tr data-index="4" data-categories="QkklMkNTdGF0aXN0aWNzJTJDTUwlMkNWaXN1YWxpemF0aW9uJTJDUg==" data-listing-date-sort="1761436800000" data-listing-file-modified-sort="1767873994105" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="6" data-listing-word-count-sort="1109" data-listing-title-sort="Building the Analytical Dashboard with Power BI and R" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20251026-cfpb-dashboard/0001.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20251026-cfpb-dashboard/index.html" class="title listing-title">Building the Analytical Dashboard with Power BI and R</a>
</td>
<td>
<span class="listing-reading-time">6 min</span>
</td>

</tr>

<tr data-index="5" data-categories="QkklMkNTdGF0aXN0aWNzJTJDTUwlMkNWaXN1YWxpemF0aW9uJTJDUg==" data-listing-date-sort="1758931200000" data-listing-file-modified-sort="1767874188518" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="8" data-listing-word-count-sort="1544" data-listing-title-sort="Building a Credit Risk Dashboard with Power BI and R" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" data-src="../../posts/20250927-credit-risk-analytics/image.png" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20250927-credit-risk-analytics/index.html" class="title listing-title">Building a Credit Risk Dashboard with Power BI and R</a>
</td>
<td>
<span class="listing-reading-time">8 min</span>
</td>

</tr>

<tr data-index="6" data-categories="QSUyRkIlMjBUZXN0aW5nJTJDUHJvZHVjdCUyQ1N0YXRpc3RpY3MlMkNS" data-listing-date-sort="1754524800000" data-listing-file-modified-sort="1759267192458" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="6" data-listing-word-count-sort="1187" data-listing-title-sort="Minimum Detectable Effect (MDE) Calculation" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" data-src="../../posts/20250807-mde/image.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20250807-mde/index.html" class="title listing-title">Minimum Detectable Effect (MDE) Calculation</a>
</td>
<td>
<span class="listing-reading-time">6 min</span>
</td>

</tr>

<tr data-index="7" data-categories="QSUyRkIlMjBUZXN0aW5nJTJDUHJvZHVjdCUyQ1N0YXRpc3RpY3M=" data-listing-date-sort="1753747200000" data-listing-file-modified-sort="1770450295226" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="37" data-listing-word-count-sort="7205" data-listing-title-sort="A/B Testing: Concepts and Techniques" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" data-src="../../posts/20250729-ab-testing/image.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20250729-ab-testing/index.html" class="title listing-title">A/B Testing: Concepts and Techniques</a>
</td>
<td>
<span class="listing-reading-time">37 min</span>
</td>

</tr>

<tr data-index="8" data-categories="VmlzdWFsaXphdGlvbiUyQ1NwYXRpYWwlMkNBbmltYXRpb24lMkNS" data-listing-date-sort="1751587200000" data-listing-file-modified-sort="1770926628882" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="7" data-listing-word-count-sort="1400" data-listing-title-sort="Animation of Spatial Data" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" data-src="../../posts/20250704-animation/image.gif" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20250704-animation/index.html" class="title listing-title">Animation of Spatial Data</a>
</td>
<td>
<span class="listing-reading-time">7 min</span>
</td>

</tr>

<tr data-index="9" data-categories="QkklMkNFVEw=" data-listing-date-sort="1736121600000" data-listing-file-modified-sort="1769683733598" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="2" data-listing-word-count-sort="397" data-listing-title-sort="BI System Blueprint" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" data-src="../../posts/20250106-bi-flowchart/image.png" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20250106-bi-flowchart/index.html" class="title listing-title">BI System Blueprint</a>
</td>
<td>
<span class="listing-reading-time">2 min</span>
</td>

</tr>

<tr data-index="10" data-categories="Q29tcFZpcyUyQ01M" data-listing-date-sort="1734480000000" data-listing-file-modified-sort="1743537159492" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="10" data-listing-word-count-sort="1958" data-listing-title-sort="CV Week 2024" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" data-src="../../posts/20241218-cv-week/image.png" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20241218-cv-week/index.html" class="title listing-title">CV Week 2024</a>
</td>
<td>
<span class="listing-reading-time">10 min</span>
</td>

</tr>

</tbody>
</table>

</div>



</section>

]]></description>
  <category>BI</category>
  <category>Statistics</category>
  <category>ML</category>
  <category>Visualization</category>
  <guid>https://frequentist.org/posts/20251126-e-commerce-dashboard/</guid>
  <pubDate>Wed, 26 Nov 2025 00:00:00 GMT</pubDate>
  <media:content url="https://frequentist.org/posts/20251126-e-commerce-dashboard/0001.svg" medium="image" type="image/svg+xml"/>
</item>
<item>
  <title>E-Commerce Analytics Dashboard (Power BI + R)</title>
  <dc:creator>Aleksei Prishchepo</dc:creator>
  <link>https://frequentist.org/projects/dashboard-e-commerce/</link>
  <description><![CDATA[ 






<section id="project-overview" class="level2">
<h2 class="anchored" data-anchor-id="project-overview">Project Overview</h2>
<p>The dashboard integrates Power BI’s interactive reporting capabilities with R’s advanced analytics, enabling both descriptive and predictive analysis in a single, user-friendly interface. It was designed to help decision-makers monitor performance, segment customers, forecast revenue, and inform strategic decisions.</p>
<div class="callout callout-style-simple callout-note no-icon callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Role
</div>
</div>
<div class="callout-body-container callout-body">
<p>BI Analyst &amp; Developer</p>
</div>
</div>
<div class="callout callout-style-simple callout-note no-icon callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Tools
</div>
</div>
<div class="callout-body-container callout-body">
<p>Power BI, R (forecast, RFM &amp; ABC/XYZ analysis), Power Query, Deneb/Vega-Lite visualizations</p>
</div>
</div>
<div class="callout callout-style-simple callout-note no-icon callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Context
</div>
</div>
<div class="callout-body-container callout-body">
<p>Winner of the November 2025 DataDNA Dataset Challenge (Overall &amp; Accessibility categories)</p>
</div>
</div>
</section>
<section id="key-features-components" class="level2">
<h2 class="anchored" data-anchor-id="key-features-components">Key Features &amp; Components</h2>
<section id="time-series-revenue-forecast" class="level3">
<h3 class="anchored" data-anchor-id="time-series-revenue-forecast">Time Series Revenue Forecast</h3>
<p>Implemented a forecasting model using R (auto.arima) to project future revenue based on historical trends and seasonality. Forecasts are embedded directly into Power Query.</p>
</section>
<section id="product-segmentation-abcxyz-analysis" class="level3">
<h3 class="anchored" data-anchor-id="product-segmentation-abcxyz-analysis">Product Segmentation (ABC/XYZ Analysis)</h3>
<p>Combined traditional ABC revenue contribution analysis with XYZ demand variability to classify products by both importance and consistency. This dual segmentation supports better inventory, pricing, and merchandising decisions.</p>
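<p>The idea can be sketched in a few lines of Python. This is an illustration, not the R code used in the dashboard. The function names are hypothetical, and the cut-offs are assumptions: 80% and 95% of cumulative revenue for ABC, and a coefficient of variation of 0.5 and 1.0 for XYZ. A product is labeled by the cumulative share reached once its own revenue is included, and total revenue is assumed to be positive.</p>

```python
from statistics import mean, pstdev


def abc_classes(revenue_by_product, a_cut=0.80, b_cut=0.95):
    """Label products A/B/C by cumulative share of total revenue (largest first)."""
    total = sum(revenue_by_product.values())
    ranked = sorted(revenue_by_product.items(), key=lambda kv: kv[1], reverse=True)
    labels, running = {}, 0.0
    for product, revenue in ranked:
        running += revenue
        share = running / total  # cumulative share including this product
        if share > b_cut:
            labels[product] = "C"
        elif share > a_cut:
            labels[product] = "B"
        else:
            labels[product] = "A"
    return labels


def xyz_class(demand, x_cut=0.5, y_cut=1.0):
    """Label a demand series X/Y/Z by its coefficient of variation."""
    avg = mean(demand)
    if avg == 0:
        return "Z"  # no demand at all: treat as the least predictable class
    cv = pstdev(demand) / avg
    if cv > y_cut:
        return "Z"
    if cv > x_cut:
        return "Y"
    return "X"


def abc_xyz(revenue_by_product, demand_by_product):
    """Combine both labels into a two-letter class such as 'AX' or 'CZ'."""
    abc = abc_classes(revenue_by_product)
    return {p: abc[p] + xyz_class(demand_by_product[p]) for p in abc}


# Made-up figures, for illustration only.
revenue = {"P1": 700, "P2": 200, "P3": 60, "P4": 40}
demand = {
    "P1": [100, 110, 90, 100],
    "P2": [10, 60, 5, 45],
    "P3": [0, 0, 40, 0],
    "P4": [5, 5, 5, 5],
}
print(abc_xyz(revenue, demand))
```

<p>In this made-up example, P1 lands in AX (high revenue, steady demand) and P3 in CZ (low revenue, erratic demand), which are the two extremes the combined matrix is meant to separate.</p>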
</section>
<section id="customer-rfm-segmentation" class="level3">
<h3 class="anchored" data-anchor-id="customer-rfm-segmentation">Customer RFM Segmentation</h3>
<p>Performed Recency-Frequency-Monetary (RFM) analysis to uncover customer value tiers (e.g., “Champions”, “Loyal”, “At-Risk”). This drives targeted retention and growth strategies by identifying high-value and at-risk customers.</p>
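<p>A simplified Python sketch of RFM scoring follows, for illustration only and not the implementation behind the dashboard. It assumes rank-based scores from 1 to 3 on each dimension. The rules in <code>segment</code> are assumptions, as is the catch-all “Other” label; only the “Champions”, “Loyal”, and “At-Risk” names come from the text above. All function names are hypothetical.</p>

```python
from datetime import date


def rank_scores(values, bins=3, reverse=False):
    """Score keys from 1 to `bins` by rank; the largest value gets the top score.

    With reverse=True the smallest value gets the top score, which suits
    recency, where fewer days since the last order is better.
    """
    ordered = sorted(values, key=values.get, reverse=reverse)
    n = len(ordered)
    return {key: 1 + (i * bins) // n for i, key in enumerate(ordered)}


def rfm_table(orders, today):
    """Build {customer: (R, F, M)} from (customer_id, order_date, amount) rows."""
    recency, frequency, monetary = {}, {}, {}
    for customer, day, amount in orders:
        days = (today - day).days
        recency[customer] = min(days, recency.get(customer, days))
        frequency[customer] = frequency.get(customer, 0) + 1
        monetary[customer] = monetary.get(customer, 0.0) + amount
    r = rank_scores(recency, reverse=True)
    f = rank_scores(frequency)
    m = rank_scores(monetary)
    return {c: (r[c], f[c], m[c]) for c in recency}


def segment(r, f, m):
    """Map scores to a segment name; these thresholds are illustrative."""
    if r == 3 and f >= 2 and m >= 2:
        return "Champions"
    if r == 1 and f >= 2:
        return "At-Risk"
    if f >= 2:
        return "Loyal"
    return "Other"


# Made-up orders, for illustration only.
orders = [
    ("c1", date(2025, 10, 20), 120.0),
    ("c1", date(2025, 9, 1), 80.0),
    ("c1", date(2025, 6, 15), 200.0),
    ("c2", date(2025, 8, 10), 60.0),
    ("c3", date(2025, 1, 5), 300.0),
    ("c3", date(2025, 2, 10), 250.0),
    ("c3", date(2025, 2, 20), 50.0),
    ("c3", date(2025, 3, 1), 100.0),
]
scores = rfm_table(orders, today=date(2025, 10, 21))
segments = {c: segment(*s) for c, s in scores.items()}
print(scores)
print(segments)
```

<p>Here <code>c1</code> bought recently and repeatedly, so it scores as a Champion, while <code>c3</code> spent the most but has been inactive since March and is flagged At-Risk. Rank-based scoring keeps the bins roughly balanced, though quantile cut-offs or fixed business thresholds are equally common choices.</p>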
</section>
<section id="visual-interactive-reporting" class="level3">
<h3 class="anchored" data-anchor-id="visual-interactive-reporting">Visual &amp; Interactive Reporting</h3>
<p>Designed with accessibility and clarity in mind. The dashboard includes:</p>
<ul>
<li><p>Summary KPIs and trends.</p></li>
<li><p>Loyalty and customer behavior insights.</p></li>
<li><p>Product performance breakdowns.</p></li>
<li><p>Pricing and discount impact visualizations.</p></li>
</ul>
<p>All visuals offer interactive filtering and drill-downs for ad hoc analysis.</p>
</section>
</section>
<section id="dashboard-design" class="level2">
<h2 class="anchored" data-anchor-id="dashboard-design">Dashboard Design</h2>
<div class="quarto-layout-panel" data-layout="[[1,1,1], [1,1,1]]">
<div class="quarto-layout-row">
<div class="quarto-layout-cell" style="flex-basis: 33.3%;justify-content: flex-start;">
<div id="fig-summary" class="lightbox quarto-float quarto-figure quarto-figure-center anchored" data-group="dashboard">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-summary-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<a href="0001.svg" class="lightbox" data-gallery="dashboard" title="Figure&nbsp;1: Summary"><img src="https://frequentist.org/projects/dashboard-e-commerce/0001.svg" class="img-fluid figure-img"></a>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-summary-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;1: Summary
</figcaption>
</figure>
</div>
</div>
<div class="quarto-layout-cell" style="flex-basis: 33.3%;justify-content: flex-start;">
<div id="fig-loyalty" class="lightbox quarto-float quarto-figure quarto-figure-center anchored" data-group="dashboard">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-loyalty-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<a href="0002.svg" class="lightbox" data-gallery="dashboard" title="Figure&nbsp;2: Loyalty"><img src="https://frequentist.org/projects/dashboard-e-commerce/0002.svg" class="img-fluid figure-img"></a>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-loyalty-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;2: Loyalty
</figcaption>
</figure>
</div>
</div>
<div class="quarto-layout-cell" style="flex-basis: 33.3%;justify-content: flex-start;">
<div id="fig-products" class="lightbox quarto-float quarto-figure quarto-figure-center anchored" data-group="dashboard">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-products-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<a href="0003.svg" class="lightbox" data-gallery="dashboard" title="Figure&nbsp;3: Products"><img src="https://frequentist.org/projects/dashboard-e-commerce/0003.svg" class="img-fluid figure-img"></a>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-products-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;3: Products
</figcaption>
</figure>
</div>
</div>
</div>
<div class="quarto-layout-row">
<div class="quarto-layout-cell" style="flex-basis: 33.3%;justify-content: flex-start;">
<div id="fig-pricing" class="lightbox quarto-float quarto-figure quarto-figure-center anchored" data-group="dashboard">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-pricing-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<a href="0004.svg" class="lightbox" data-gallery="dashboard" title="Figure&nbsp;4: Pricing"><img src="https://frequentist.org/projects/dashboard-e-commerce/0004.svg" class="img-fluid figure-img"></a>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-pricing-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;4: Pricing
</figcaption>
</figure>
</div>
</div>
<div class="quarto-layout-cell" style="flex-basis: 33.3%;justify-content: flex-start;">
<div id="fig-customers" class="lightbox quarto-float quarto-figure quarto-figure-center anchored" data-group="dashboard">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-customers-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<a href="0005.svg" class="lightbox" data-gallery="dashboard" title="Figure&nbsp;5: Customers"><img src="https://frequentist.org/projects/dashboard-e-commerce/0005.svg" class="img-fluid figure-img"></a>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-customers-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;5: Customers
</figcaption>
</figure>
</div>
</div>
</div>
</div>
</section>
<section id="implementation" class="level2 page-columns page-full">
<h2 class="anchored" data-anchor-id="implementation">Implementation</h2>
<section id="data-prep-modeling" class="level3 page-columns page-full">
<h3 class="anchored" data-anchor-id="data-prep-modeling">Data Prep &amp; Modeling</h3>

<div class="no-row-height column-margin column-container"><div class="">
<p>See details of implementation in the blog post: <a href="https://frequentist.org/posts/20251126-e-commerce-dashboard/">Building an E-Commerce Dashboard with Power BI and R</a></p>
</div></div><p>Cleaned and shaped data; structured the model with a “star schema” to optimize performance.</p>
</section>
<section id="custom-visuals" class="level3">
<h3 class="anchored" data-anchor-id="custom-visuals">Custom Visuals</h3>
<p>Leveraged Deneb with Vega-Lite to create advanced visualizations beyond the standard Power BI library.</p>
</section>
<section id="r-integration" class="level3">
<h3 class="anchored" data-anchor-id="r-integration">R Integration</h3>
<p>Integrated R directly into Power BI workflows for forecasting and segmentation logic.</p>
</section>
</section>
<section id="outcomes-impact" class="level2">
<h2 class="anchored" data-anchor-id="outcomes-impact">Outcomes &amp; Impact</h2>
<div class="quarto-layout-panel" data-layout-ncol="3">
<div class="quarto-layout-row">
<div class="quarto-layout-cell" style="flex-basis: 33.3%;justify-content: flex-start;">
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Insights
</div>
</div>
<div class="callout-body-container callout-body">
<p>The dashboard allows users to explore revenue trends, understand customer segments, and assess portfolio performance.</p>
</div>
</div>
</div>
<div class="quarto-layout-cell" style="flex-basis: 33.3%;justify-content: flex-start;">
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Predictive Capability
</div>
</div>
<div class="callout-body-container callout-body">
<p>Forecasting equips business leaders with forward-looking insights that support planning and target-setting.</p>
</div>
</div>
</div>
<div class="quarto-layout-cell" style="flex-basis: 33.3%;justify-content: flex-start;">
<div class="callout callout-style-default callout-important callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Important</span>Recognition
</div>
</div>
<div class="callout-body-container callout-body">
<p>Selected as the winner in both the Overall and Accessibility categories of the November 2025 DataDNA Dataset Challenge.</p>
</div>
</div>
</div>
</div>
</div>
</section>
<section id="skills-demonstrated" class="level2">
<h2 class="anchored" data-anchor-id="skills-demonstrated">Skills Demonstrated</h2>
<p>Power BI report development • R analytics &amp; forecasting • Data modeling &amp; ETL • Advanced segmentation • Interactive visualization • BI storytelling</p>
</section>
<section id="apply-this-to-your-business" class="level2">
<h2 class="anchored" data-anchor-id="apply-this-to-your-business">Apply This to Your Business</h2>
<p>If your organization could benefit from enhanced e-commerce analytics, you’re welcome to reach out via the <a href="../../contact.html">contact page</a> to discuss how I can help you use data science, analytics, and automation to drive value for your business.</p>
</section>
<section id="see-also" class="level2">
<h2 class="anchored" data-anchor-id="see-also">See Also</h2>
<div id="listing-projects" class="quarto-listing quarto-listing-container-grid">
<div class="list grid quarto-listing-cols-3">
<div class="g-col-1" data-index="0" data-categories="QXBwJTJDQWdlbnRpYyUyMEFJJTJDUHl0aG9uJTJDRmVhdHVyZWQ=" data-listing-date-sort="1770249600000" data-listing-file-modified-sort="1774474774886" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="2" data-listing-word-count-sort="358">
<a href="../../projects/app-sales-signals/index.html" class="quarto-grid-link">
<div class="quarto-grid-item card h-100 card-left">
<p class="card-img-top">
<img loading="lazy" src="https://frequentist.org/projects/app-sales-signals/image.png" class="thumbnail-image card-img" style="height: 150px;">
</p>
<div class="card-body post-contents">
<h5 class="no-anchor card-title listing-title">
Sales Signals app (Agentic AI)
</h5>
<div class="card-text listing-description delink">
<p>Automated sales coaching engine that turns B2B call transcripts into real-time, context-aware feedback, combining LLMs and historical customer data to surface revenue and…</p>
</div>
</div>
</div></a>
</div>
<div class="g-col-1" data-index="1" data-categories="V2ViYXBwJTJDU2hpbnklMjBmb3IlMjBQeXRob24lMkNQeXRob24lMkNGZWF0dXJlZA==" data-listing-date-sort="1766793600000" data-listing-file-modified-sort="1770399872011" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="3" data-listing-word-count-sort="431">
<a href="../../projects/webapp-linkedin-analytics/index.html" class="quarto-grid-link">
<div class="quarto-grid-item card h-100 card-left">
<p class="card-img-top">
<img loading="lazy" src="https://frequentist.org/projects/webapp-linkedin-analytics/image.png" class="thumbnail-image card-img" style="height: 150px;">
</p>
<div class="card-body post-contents">
<h5 class="no-anchor card-title listing-title">
LinkedIn Analytics Web Application
</h5>
<div class="card-text listing-description delink">
<p>A local-first web application that transforms LinkedIn Takeout exports into structured analytics on roles, industries, and geographic reach using NLP, unsupervised learning…</p>
</div>
</div>
</div></a>
</div>
<div class="g-col-1" data-index="2" data-categories="RGFzaGJvYXJkJTJDUG93ZXIlMjBCSSUyQ1IlMkNGZWF0dXJlZA==" data-listing-date-sort="1761523200000" data-listing-file-modified-sort="1767872662287" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="2" data-listing-word-count-sort="332">
<a href="../../projects/dashboard-financial-complaints/index.html" class="quarto-grid-link">
<div class="quarto-grid-item card h-100 card-left">
<p class="card-img-top">
<img loading="lazy" src="https://frequentist.org/projects/dashboard-financial-complaints/image.png" class="thumbnail-image card-img" style="height: 150px;">
</p>
<div class="card-body post-contents">
<h5 class="no-anchor card-title listing-title">
Consumer Financial Complaints Dashboard (Power BI + R)
</h5>
<div class="card-text listing-description delink">
<p>An interactive Power BI dashboard analyzing CFPB consumer complaints with trend forecasting, geographic and product breakdowns, and causal factor analysis using R models.</p>
</div>
</div>
</div></a>
</div>
<div class="g-col-1" data-index="3" data-categories="V2ViYXBwJTJDUHl0aG9uJTJDRmVhdHVyZWQ=" data-listing-date-sort="1748649600000" data-listing-file-modified-sort="1767873003569" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="3" data-listing-word-count-sort="481">
<a href="../../projects/webapp-content-mate/index.html" class="quarto-grid-link">
<div class="quarto-grid-item card h-100 card-left">
<p class="card-img-top">
<img loading="lazy" data-src="../../projects/webapp-content-mate/image.svg" class="thumbnail-image card-img" style="height: 150px;">
</p>
<div class="card-body post-contents">
<h5 class="no-anchor card-title listing-title">
E-commerce Content Automation Platform
</h5>
<div class="card-text listing-description delink">
<p>A web application that automates the generation of e-commerce product cards using asynchronous pipelines and LLM-assisted content creation.</p>
</div>
</div>
</div></a>
</div>
<div class="g-col-1" data-index="4" data-categories="QkklMkNQb3dlciUyMEJJJTJDUHl0aG9uJTJDRmVhdHVyZWQ=" data-listing-date-sort="1736121600000" data-listing-file-modified-sort="1767886951027" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="3" data-listing-word-count-sort="462">
<a href="../../projects/bi-system-telecamera/index.html" class="quarto-grid-link">
<div class="quarto-grid-item card h-100 card-left">
<p class="card-img-top">
<img loading="lazy" data-src="../../projects/bi-system-telecamera/image.png" class="thumbnail-image card-img" style="height: 150px;">
</p>
<div class="card-body post-contents">
<h5 class="no-anchor card-title listing-title">
Company-Wide Business Intelligence System
</h5>
<div class="card-text listing-description delink">
<p>An end-to-end BI system consolidating operational, financial, marketing, and sales data into a single decision-support layer.</p>
</div>
</div>
</div></a>
</div>
</div>

</div>



</section>

 ]]></description>
  <category>Dashboard</category>
  <category>Power BI</category>
  <category>R</category>
  <category>Featured</category>
  <guid>https://frequentist.org/projects/dashboard-e-commerce/</guid>
  <pubDate>Wed, 26 Nov 2025 00:00:00 GMT</pubDate>
  <media:content url="https://frequentist.org/projects/dashboard-e-commerce/image.png" medium="image" type="image/png" height="77" width="144"/>
</item>
<item>
  <title>Propensity Score Matching for Causal Analysis</title>
  <dc:creator>Aleksei Prishchepo</dc:creator>
  <link>https://frequentist.org/posts/20251028-propensity-score-matching/</link>
  <description><![CDATA[ 






<p>In marketing analytics, one of the most common questions is:</p>
<blockquote class="blockquote">
<p>Did our campaign actually cause people to subscribe, or were the subscribers already more likely to do so?</p>
</blockquote>
<p>When we only have observational data — not a randomized experiment — it’s tricky to separate correlation from causation. This is where Propensity Score Matching (PSM) comes in.</p>
<p>In this tutorial, we’ll use the <a href="https://archive.ics.uci.edu/dataset/222/bank+marketing" target="_blank">Bank Marketing dataset</a> from the UCI Machine Learning Repository to estimate the causal effect of being previously contacted on the probability of subscribing to a term deposit.</p>
<p>We’ll use the R package <a href="https://cloud.r-project.org/web/packages/MatchIt/index.html" target="_blank">MatchIt</a> to perform matching and evaluate balance.</p>
<section id="data-overview" class="level2">
<h2 class="anchored" data-anchor-id="data-overview">Data Overview</h2>
<p>First, let’s load the necessary libraries and the dataset.</p>
<section id="load-packages" class="level3">
<h3 class="anchored" data-anchor-id="load-packages">Load Packages</h3>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(MatchIt)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span>
<span id="cb1-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(lubridate)</span>
<span id="cb1-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ggplot2)</span>
<span id="cb1-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(readr)</span>
<span id="cb1-6"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(skimr)</span>
<span id="cb1-7"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(here)</span></code></pre></div></div>
</div>
</section>
<section id="load-data" class="level3">
<h3 class="anchored" data-anchor-id="load-data">Load Data</h3>
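<p>Here we download the UCI Bank Marketing dataset unless <code>bank-full.csv</code> is already present in the project root, as resolved by <code>here()</code>. The code unzips twice because the data file sits inside a nested <code>bank.zip</code> archive.</p>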
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">file.exists</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">here</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"bank-full.csv"</span>))) {</span>
<span id="cb2-2">  dataset <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read.csv</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">file.path</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">here</span>(), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"bank-full.csv"</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sep =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">";"</span>)</span>
<span id="cb2-3">} <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> {</span>
<span id="cb2-4">  url <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"https://archive.ics.uci.edu/static/public/222/bank+marketing.zip"</span></span>
<span id="cb2-5">  temp <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tempfile</span>()</span>
<span id="cb2-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">download.file</span>(url, temp)</span>
<span id="cb2-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unzip</span>(temp, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"bank.zip"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">exdir =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">here</span>())</span>
<span id="cb2-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unzip</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">here</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"bank.zip"</span>), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"bank-full.csv"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">exdir =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">here</span>())</span>
<span id="cb2-9">  dataset <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read.csv</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">file.path</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">here</span>(), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"bank-full.csv"</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sep =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">";"</span>)</span>
<span id="cb2-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unlink</span>(temp)</span>
<span id="cb2-11">}</span>
<span id="cb2-12">dataset <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glimpse</span>()</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>Rows: 45,211
Columns: 17
$ age       &lt;int&gt; 58, 44, 33, 47, 33, 35, 28, 42, 58, 43, 41, 29, 53, 58, 57, …
$ job       &lt;chr&gt; "management", "technician", "entrepreneur", "blue-collar", "…
$ marital   &lt;chr&gt; "married", "single", "married", "married", "single", "marrie…
$ education &lt;chr&gt; "tertiary", "secondary", "secondary", "unknown", "unknown", …
$ default   &lt;chr&gt; "no", "no", "no", "no", "no", "no", "no", "yes", "no", "no",…
$ balance   &lt;int&gt; 2143, 29, 2, 1506, 1, 231, 447, 2, 121, 593, 270, 390, 6, 71…
$ housing   &lt;chr&gt; "yes", "yes", "yes", "yes", "no", "yes", "yes", "yes", "yes"…
$ loan      &lt;chr&gt; "no", "no", "yes", "no", "no", "no", "yes", "no", "no", "no"…
$ contact   &lt;chr&gt; "unknown", "unknown", "unknown", "unknown", "unknown", "unkn…
$ day       &lt;int&gt; 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, …
$ month     &lt;chr&gt; "may", "may", "may", "may", "may", "may", "may", "may", "may…
$ duration  &lt;int&gt; 261, 151, 76, 92, 198, 139, 217, 380, 50, 55, 222, 137, 517,…
$ campaign  &lt;int&gt; 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ pdays     &lt;int&gt; -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, …
$ previous  &lt;int&gt; 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ poutcome  &lt;chr&gt; "unknown", "unknown", "unknown", "unknown", "unknown", "unkn…
$ y         &lt;chr&gt; "no", "no", "no", "no", "no", "no", "no", "no", "no", "no", …</code></pre>
</div>
</div>
</section>
<section id="data-summary" class="level3">
<h3 class="anchored" data-anchor-id="data-summary">Data Summary</h3>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">dataset <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">skim</span>()</span></code></pre></div></div>
<div class="cell-output-display">
<table class="caption-top table table-sm table-striped small">
<caption>Data summary</caption>
<tbody>
<tr class="odd">
<td style="text-align: left;">Name</td>
<td style="text-align: left;">dataset</td>
</tr>
<tr class="even">
<td style="text-align: left;">Number of rows</td>
<td style="text-align: left;">45211</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Number of columns</td>
<td style="text-align: left;">17</td>
</tr>
<tr class="even">
<td style="text-align: left;">_______________________</td>
<td style="text-align: left;"></td>
</tr>
<tr class="odd">
<td style="text-align: left;">Column type frequency:</td>
<td style="text-align: left;"></td>
</tr>
<tr class="even">
<td style="text-align: left;">character</td>
<td style="text-align: left;">10</td>
</tr>
<tr class="odd">
<td style="text-align: left;">numeric</td>
<td style="text-align: left;">7</td>
</tr>
<tr class="even">
<td style="text-align: left;">________________________</td>
<td style="text-align: left;"></td>
</tr>
<tr class="odd">
<td style="text-align: left;">Group variables</td>
<td style="text-align: left;">None</td>
</tr>
</tbody>
</table>
<p><strong>Variable type: character</strong></p>
<table class="caption-top table table-sm table-striped small">
<colgroup>
<col style="width: 19%">
<col style="width: 13%">
<col style="width: 19%">
<col style="width: 5%">
<col style="width: 5%">
<col style="width: 8%">
<col style="width: 12%">
<col style="width: 15%">
</colgroup>
<thead>
<tr class="header">
<th style="text-align: left;">skim_variable</th>
<th style="text-align: right;">n_missing</th>
<th style="text-align: right;">complete_rate</th>
<th style="text-align: right;">min</th>
<th style="text-align: right;">max</th>
<th style="text-align: right;">empty</th>
<th style="text-align: right;">n_unique</th>
<th style="text-align: right;">whitespace</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: left;">job</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">6</td>
<td style="text-align: right;">13</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">12</td>
<td style="text-align: right;">0</td>
</tr>
<tr class="even">
<td style="text-align: left;">marital</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">6</td>
<td style="text-align: right;">8</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">3</td>
<td style="text-align: right;">0</td>
</tr>
<tr class="odd">
<td style="text-align: left;">education</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">7</td>
<td style="text-align: right;">9</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">4</td>
<td style="text-align: right;">0</td>
</tr>
<tr class="even">
<td style="text-align: left;">default</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">2</td>
<td style="text-align: right;">3</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">2</td>
<td style="text-align: right;">0</td>
</tr>
<tr class="odd">
<td style="text-align: left;">housing</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">2</td>
<td style="text-align: right;">3</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">2</td>
<td style="text-align: right;">0</td>
</tr>
<tr class="even">
<td style="text-align: left;">loan</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">2</td>
<td style="text-align: right;">3</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">2</td>
<td style="text-align: right;">0</td>
</tr>
<tr class="odd">
<td style="text-align: left;">contact</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">7</td>
<td style="text-align: right;">9</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">3</td>
<td style="text-align: right;">0</td>
</tr>
<tr class="even">
<td style="text-align: left;">month</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">3</td>
<td style="text-align: right;">3</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">12</td>
<td style="text-align: right;">0</td>
</tr>
<tr class="odd">
<td style="text-align: left;">poutcome</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">5</td>
<td style="text-align: right;">7</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">4</td>
<td style="text-align: right;">0</td>
</tr>
<tr class="even">
<td style="text-align: left;">y</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">2</td>
<td style="text-align: right;">3</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">2</td>
<td style="text-align: right;">0</td>
</tr>
</tbody>
</table>
<p><strong>Variable type: numeric</strong></p>
<table class="caption-top table table-sm table-striped small">
<colgroup>
<col style="width: 16%">
<col style="width: 11%">
<col style="width: 16%">
<col style="width: 9%">
<col style="width: 9%">
<col style="width: 6%">
<col style="width: 4%">
<col style="width: 4%">
<col style="width: 5%">
<col style="width: 8%">
<col style="width: 6%">
</colgroup>
<thead>
<tr class="header">
<th style="text-align: left;">skim_variable</th>
<th style="text-align: right;">n_missing</th>
<th style="text-align: right;">complete_rate</th>
<th style="text-align: right;">mean</th>
<th style="text-align: right;">sd</th>
<th style="text-align: right;">p0</th>
<th style="text-align: right;">p25</th>
<th style="text-align: right;">p50</th>
<th style="text-align: right;">p75</th>
<th style="text-align: right;">p100</th>
<th style="text-align: left;">hist</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: left;">age</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">40.94</td>
<td style="text-align: right;">10.62</td>
<td style="text-align: right;">18</td>
<td style="text-align: right;">33</td>
<td style="text-align: right;">39</td>
<td style="text-align: right;">48</td>
<td style="text-align: right;">95</td>
<td style="text-align: left;">▅▇▃▁▁</td>
</tr>
<tr class="even">
<td style="text-align: left;">balance</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">1362.27</td>
<td style="text-align: right;">3044.77</td>
<td style="text-align: right;">-8019</td>
<td style="text-align: right;">72</td>
<td style="text-align: right;">448</td>
<td style="text-align: right;">1428</td>
<td style="text-align: right;">102127</td>
<td style="text-align: left;">▇▁▁▁▁</td>
</tr>
<tr class="odd">
<td style="text-align: left;">day</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">15.81</td>
<td style="text-align: right;">8.32</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">8</td>
<td style="text-align: right;">16</td>
<td style="text-align: right;">21</td>
<td style="text-align: right;">31</td>
<td style="text-align: left;">▇▆▇▆▆</td>
</tr>
<tr class="even">
<td style="text-align: left;">duration</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">258.16</td>
<td style="text-align: right;">257.53</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">103</td>
<td style="text-align: right;">180</td>
<td style="text-align: right;">319</td>
<td style="text-align: right;">4918</td>
<td style="text-align: left;">▇▁▁▁▁</td>
</tr>
<tr class="odd">
<td style="text-align: left;">campaign</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">2.76</td>
<td style="text-align: right;">3.10</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">2</td>
<td style="text-align: right;">3</td>
<td style="text-align: right;">63</td>
<td style="text-align: left;">▇▁▁▁▁</td>
</tr>
<tr class="even">
<td style="text-align: left;">pdays</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">40.20</td>
<td style="text-align: right;">100.13</td>
<td style="text-align: right;">-1</td>
<td style="text-align: right;">-1</td>
<td style="text-align: right;">-1</td>
<td style="text-align: right;">-1</td>
<td style="text-align: right;">871</td>
<td style="text-align: left;">▇▁▁▁▁</td>
</tr>
<tr class="odd">
<td style="text-align: left;">previous</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">0.58</td>
<td style="text-align: right;">2.30</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">275</td>
<td style="text-align: left;">▇▁▁▁▁</td>
</tr>
</tbody>
</table>
</div>
</div>
</section>
</section>
<section id="correlation-matrix" class="level2">
<h2 class="anchored" data-anchor-id="correlation-matrix">Correlation Matrix</h2>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(PerformanceAnalytics)</span>
<span id="cb5-2"></span>
<span id="cb5-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">123</span>)</span>
<span id="cb5-4"></span>
<span id="cb5-5">dataset <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample_n</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select_if</span>(is.numeric) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chart.Correlation</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">histogram =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">pch =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"+"</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div id="fig-correlation-matrix" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-correlation-matrix-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://frequentist.org/posts/20251028-propensity-score-matching/index_files/figure-html/fig-correlation-matrix-1.png" class="img-fluid figure-img" width="672">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-correlation-matrix-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;1: Distributions and correlations of numeric variables in the dataset.
</figcaption>
</figure>
</div>
</div>
</div>
</section>
<section id="defining-treatment-and-outcome" class="level2">
<h2 class="anchored" data-anchor-id="defining-treatment-and-outcome">Defining Treatment and Outcome</h2>
<p>Our treatment variable will be whether the client was previously contacted (<code>pdays != -1</code>), and the outcome variable will be whether the client subscribed to a term deposit (<code>y == "yes"</code>).</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">dataset <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> dataset <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb6-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb6-3">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">treat =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ifelse</span>(pdays <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>),    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># previously contacted</span></span>
<span id="cb6-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">outcome =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ifelse</span>(y <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"yes"</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span>
<span id="cb6-5">  )</span></code></pre></div></div>
</div>
<ul>
<li><p><code>treat = 1</code>: client was previously contacted (<code>pdays != -1</code>);</p></li>
<li><p><code>treat = 0</code>: client was not contacted in a previous campaign (<code>pdays == -1</code>);</p></li>
<li><p><code>outcome = 1</code>: client subscribed to term deposit;</p></li>
<li><p><code>outcome = 0</code>: client did not subscribe.</p></li>
</ul>
<p>Let’s check the basic rates:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1">dataset <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb7-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(treat) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb7-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarise</span>(</span>
<span id="cb7-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">n</span>(),</span>
<span id="cb7-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">subscription_rate =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(outcome)</span>
<span id="cb7-6">  )</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 2 × 3
  treat     n subscription_rate
  &lt;dbl&gt; &lt;int&gt;             &lt;dbl&gt;
1     0 36954            0.0916
2     1  8257            0.231 </code></pre>
</div>
</div>
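<p>Previously contacted clients (<code>treat = 1</code>) subscribe at a much higher rate, 23.1% versus 9.2%, but this gap might be due to other factors such as income or engagement rather than the contact itself. A Welch t-test confirms that the raw difference is statistically significant, which still says nothing about whether it is causal:</p>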
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1">t_test_all <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">t.test</span>(outcome <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> treat, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> dataset)</span>
<span id="cb9-2">t_test_all</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>
    Welch Two Sample t-test

data:  outcome by treat
t = -28.552, df = 10051, p-value &lt; 2.2e-16
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
 -0.1486926 -0.1295874
sample estimates:
mean in group 0 mean in group 1 
     0.09157331      0.23071333 </code></pre>
</div>
</div>
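<p>To see why a raw comparison like the one above can mislead, here is a minimal simulation sketch. It does not use the bank data: the confounder <code>engagement</code>, the variables <code>treat_sim</code> and <code>outcome_sim</code>, and all coefficients are made up for illustration. The treatment has no effect on the outcome by construction, yet the naive difference in rates comes out positive.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r">set.seed(42)
n &lt;- 10000

# Hypothetical confounder (e.g. customer engagement); not a column in the bank data
engagement &lt;- rnorm(n)

# Treatment assignment depends on the confounder
treat_sim &lt;- rbinom(n, 1, plogis(-1 + 1.5 * engagement))

# Outcome depends on the confounder only; the true treatment effect is zero
outcome_sim &lt;- rbinom(n, 1, plogis(-2 + 1.0 * engagement))

# Naive difference in outcome rates between "treated" and "control" units
naive_diff &lt;- mean(outcome_sim[treat_sim == 1]) - mean(outcome_sim[treat_sim == 0])
naive_diff

# Adjusting for the confounder should shrink the treatment coefficient toward zero
adjusted &lt;- glm(outcome_sim ~ treat_sim + engagement, family = binomial)
coef(adjusted)["treat_sim"]</code></pre></div>
</div>
<p>In real data the confounders are not handed to us this neatly, which is why we balance the groups on observed covariates before comparing outcomes.</p>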
</section>
<section id="covariates-and-model-formula" class="level2">
<h2 class="anchored" data-anchor-id="covariates-and-model-formula">Covariates and Model Formula</h2>
<p>We include demographic and financial variables that can plausibly influence both being re-contacted and subscribing. We exclude <code>campaign</code> (number of contacts performed during this campaign for this client), <code>contact</code> (contact communication type), <code>previous</code> (number of contacts performed before this campaign), and <code>poutcome</code> (outcome of the previous marketing campaign), because they describe the contact process itself and would leak information about treatment assignment into the propensity model.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1">formula <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> treat <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> age <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> job <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> marital <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> education <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> default <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb11-2">  balance <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> housing <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> loan </span></code></pre></div></div>
</div>
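<p>A propensity score is the estimated probability of receiving the treatment given the covariates. By default, MatchIt estimates it from a formula like this one with logistic regression. The sketch below shows that step in isolation on simulated data; the data frame <code>toy</code>, its values, and the coefficient used to generate <code>treat</code> are assumptions made purely for illustration.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r">set.seed(1)
n &lt;- 500

# Toy covariates; names mirror the bank data but the values are simulated
toy &lt;- data.frame(
  age     = round(runif(n, 18, 80)),
  balance = round(rnorm(n, 1300, 3000)),
  housing = sample(c("yes", "no"), n, replace = TRUE)
)

# Simulated treatment whose probability rises with age (an arbitrary choice)
toy$treat &lt;- rbinom(n, 1, plogis(-3 + 0.05 * toy$age))

# Propensity model: logistic regression of treatment on the covariates
ps_model &lt;- glm(treat ~ age + balance + housing, data = toy, family = binomial)

# Fitted probabilities are the estimated propensity scores
toy$pscore &lt;- predict(ps_model, type = "response")
summary(toy$pscore)</code></pre></div>
</div>
<p>Units with similar scores have a similar estimated chance of being treated, so comparing them is closer to comparing like with like.</p>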
</section>
<section id="estimating-propensity-scores-and-matching" class="level2">
<h2 class="anchored" data-anchor-id="estimating-propensity-scores-and-matching">Estimating Propensity Scores and Matching</h2>
<p>We now run the matching with <code>matchit()</code>, using 1:1 nearest-neighbor matching on the estimated propensity score:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1">psm <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">matchit</span>(</span>
<span id="cb12-2">  formula,</span>
<span id="cb12-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> dataset,</span>
<span id="cb12-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">method =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"nearest"</span>,</span>
<span id="cb12-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ratio =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb12-6">)</span></code></pre></div></div>
</div>
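<p>Let’s inspect the result. Rather than printing the full <code>summary()</code> output, we show only the sample sizes: each of the 8257 treated customers is matched to one control, and the remaining 28697 controls are left unmatched:</p>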
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summary</span>(psm)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>nn</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>              Control Treated
All (ESS)       36954    8257
All             36954    8257
Matched (ESS)    8257    8257
Matched          8257    8257
Unmatched       28697       0
Discarded           0       0</code></pre>
</div>
</div>
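<p>To make the two steps in the section title explicit, here is a minimal base R sketch of what 1:1 nearest-neighbor matching on a propensity score does: fit a logistic regression for treatment, then pair each treated unit with the closest unused control. It runs on simulated data with hypothetical columns (<code>age</code>, <code>balance</code>, <code>treat</code>) and a simplified greedy loop, so it illustrates the idea rather than reproducing the implementation or matching order used by MatchIt.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Simulated data with hypothetical columns (not the bank dataset)
set.seed(42)
n &lt;- 500
sim &lt;- data.frame(
  age     = round(rnorm(n, mean = 40, sd = 10)),
  balance = rnorm(n, mean = 1500, sd = 800)
)
# Assumption for the simulation: older customers are more likely to be treated
sim$treat &lt;- rbinom(n, size = 1, prob = plogis(-3.5 + 0.06 * sim$age))

# Step 1: propensity score = fitted probability of treatment
ps_model &lt;- glm(treat ~ age + balance, data = sim, family = binomial())
sim$ps &lt;- fitted(ps_model)

# Step 2: greedy 1:1 nearest-neighbor matching without replacement
treated_idx &lt;- which(sim$treat == 1)
control_idx &lt;- which(sim$treat == 0)
matches &lt;- integer(length(treated_idx))

for (i in seq_along(treated_idx)) {
  gaps &lt;- abs(sim$ps[control_idx] - sim$ps[treated_idx[i]])
  best &lt;- which.min(gaps)
  matches[i] &lt;- control_idx[best]
  control_idx &lt;- control_idx[-best]  # each control is used at most once
}

# Matched sample: all treated units plus their selected controls
matched_sim &lt;- sim[c(treated_idx, matches), ]
table(matched_sim$treat)</code></pre></div>
</div>
<p>In practice <code>matchit()</code> handles both steps in one call and offers further options such as calipers and matching with replacement; the loop above is only meant to show the mechanics.</p>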
</section>
<section id="visualizing-balance" class="level2">
<h2 class="anchored" data-anchor-id="visualizing-balance">Visualizing Balance</h2>
<p>Before estimating the effect, we check whether matching actually balanced the treated and control groups. The plots below compare the distribution of propensity scores and of individual covariates before and after matching.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot</span>(psm, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"jitter"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">interactive =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div id="fig-balance-plot" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-balance-plot-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://frequentist.org/posts/20251028-propensity-score-matching/index_files/figure-html/fig-balance-plot-1.png" class="img-fluid figure-img" width="672">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-balance-plot-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;2: Plot of propensity scores before and after matching.
</figcaption>
</figure>
</div>
</div>
</div>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb16-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot</span>(psm, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"density"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">interactive =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>,</span>
<span id="cb16-2">     <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">which.xs =</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> age <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> job <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> marital <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> education <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> default <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> balance <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb16-3">    housing <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> loan)</span></code></pre></div></div>
<div class="cell-output-display">
<div id="fig-density-plots-1" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-density-plots-1-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://frequentist.org/posts/20251028-propensity-score-matching/index_files/figure-html/fig-density-plots-1.png" class="img-fluid figure-img" width="672">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-density-plots-1-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;3: Density plots of variables before and after matching.
</figcaption>
</figure>
</div>
</div>
<div class="cell-output-display">
<div id="fig-density-plots-2" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-density-plots-2-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://frequentist.org/posts/20251028-propensity-score-matching/index_files/figure-html/fig-density-plots-2.png" class="img-fluid figure-img" width="672">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-density-plots-2-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;4: Density plots of variables before and after matching.
</figcaption>
</figure>
</div>
</div>
<div class="cell-output-display">
<div id="fig-density-plots-3" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-density-plots-3-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://frequentist.org/posts/20251028-propensity-score-matching/index_files/figure-html/fig-density-plots-3.png" class="img-fluid figure-img" width="672">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-density-plots-3-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;5: Density plots of variables before and after matching.
</figcaption>
</figure>
</div>
</div>
</div>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb17-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(cobalt)</span>
<span id="cb17-2"></span>
<span id="cb17-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">love.plot</span>(psm, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">threshold =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div id="fig-cobalt-love-plot" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-cobalt-love-plot-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://frequentist.org/posts/20251028-propensity-score-matching/index_files/figure-html/fig-cobalt-love-plot-1.png" class="img-fluid figure-img" width="672">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-cobalt-love-plot-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;6: Love plot showing standardized mean differences before and after matching.
</figcaption>
</figure>
</div>
</div>
</div>
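<p>The love plot summarizes balance through standardized mean differences (SMDs). As a rough illustration of the quantity being plotted, the sketch below computes an SMD for a single numeric covariate on made-up numbers, scaling by the standard deviation of the treated group, which is one common convention for the ATT. The helper <code>smd()</code> and the toy vectors are hypothetical, and the exact conventions used by cobalt (for example for binary covariates) may differ.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical helper: standardized mean difference for one numeric covariate,
# scaled by the standard deviation of the treated group
smd &lt;- function(x, treat) {
  (mean(x[treat == 1]) - mean(x[treat == 0])) / sd(x[treat == 1])
}

# Toy data before matching: controls are younger on average
age_before   &lt;- c(30, 40, 50, 60, 20, 25, 30, 35, 40, 50, 60)
treat_before &lt;- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0)

# Toy data after matching: each treated unit keeps one similar control
age_after   &lt;- c(30, 40, 50, 60, 31, 39, 50, 61)
treat_after &lt;- c(1, 1, 1, 1, 0, 0, 0, 0)

smd_before &lt;- smd(age_before, treat_before)
smd_after  &lt;- smd(age_after, treat_after)

# Compare absolute SMDs against the 0.1 threshold used in the love plot
abs(c(before = smd_before, after = smd_after)) &lt; 0.1</code></pre></div>
</div>
<p>An absolute SMD below 0.1, the threshold drawn in the plot, is a widely used rule of thumb for acceptable balance.</p>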
</section>
<section id="estimating-the-treatment-effect" class="level2">
<h2 class="anchored" data-anchor-id="estimating-the-treatment-effect">Estimating the Treatment Effect</h2>
<p>Extract the matched data and estimate the effect on subscription:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb18-1">matched_data <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">match.data</span>(psm)  </span>
<span id="cb18-2">t_test <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">t.test</span>(outcome <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> treat, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> matched_data)</span>
<span id="cb18-3">t_test</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>
    Welch Two Sample t-test

data:  outcome by treat
t = -18.501, df = 15563, p-value &lt; 2.2e-16
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
 -0.1201447 -0.0971255
sample estimates:
mean in group 0 mean in group 1 
      0.1220782       0.2307133 </code></pre>
</div>
</div>
<p>The difference in means represents the <strong>Average Treatment Effect on the Treated (ATT)</strong> — how much more likely previously contacted customers are to subscribe, compared to similar new customers.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb20-1">diff_means <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unname</span>(t_test<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>estimate[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> t_test<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>estimate[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb20-2">diff_means</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 0.1086351</code></pre>
</div>
</div>
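<p>As a sanity check, the test statistic and confidence interval can be reproduced by hand from the two group means, assuming the outcome is coded 0/1 and each matched group has 8257 customers. The sketch below copies the numbers from the output above; the variable names are illustrative. Note that <code>t.test()</code> reports group 0 minus group 1, so its signs are flipped relative to the treated-minus-control difference used here.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Values copied from the t-test output above
n  &lt;- 8257        # matched customers per group
p0 &lt;- 0.1220782   # subscription rate, matched controls
p1 &lt;- 0.2307133   # subscription rate, treated
df &lt;- 15563       # Welch degrees of freedom reported by t.test()

# Sample variance of a 0/1 outcome: p * (1 - p) * n / (n - 1)
v0 &lt;- p0 * (1 - p0) * n / (n - 1)
v1 &lt;- p1 * (1 - p1) * n / (n - 1)

att    &lt;- p1 - p0                  # treated minus control
se     &lt;- sqrt(v0 / n + v1 / n)    # Welch standard error
t_stat &lt;- att / se
ci     &lt;- att + c(-1, 1) * qt(0.975, df = df) * se

round(c(att = att, se = se, t = t_stat), 4)
round(ci, 4)</code></pre></div>
</div>
<p>Up to sign and rounding, these values should agree with the <code>t.test()</code> output, which confirms that the reported interval is a plain two-sample comparison of subscription rates in the matched sample.</p>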
</section>
<section id="results-and-interpretation" class="level2">
<h2 class="anchored" data-anchor-id="results-and-interpretation">Results and Interpretation</h2>
<p>Before matching, previously contacted customers were <img src="https://latex.codecogs.com/png.latex?13.91"> percentage points more likely to subscribe to a term deposit than new customers. After adjusting for demographic and financial covariates via propensity score matching, they were <img src="https://latex.codecogs.com/png.latex?10.86"> percentage points more likely to subscribe than comparable new customers (<img src="https://latex.codecogs.com/png.latex?p%20=%201.3128662%5Ctimes%2010%5E%7B-75%7D">). The matched estimate is <img src="https://latex.codecogs.com/png.latex?3.05"> percentage points lower than the unadjusted one, a relative reduction of <img src="https://latex.codecogs.com/png.latex?21.92%5C%25">.</p>
<ul>
<li><p><strong>Before matching</strong>: previously contacted customers have a much higher subscription rate.</p></li>
<li><p><strong>After matching</strong>: the difference shrinks but remains clearly positive, indicating that part of the initial gap was due to selection bias on the observed covariates rather than to the previous contact itself.</p></li>
</ul>
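<p>The size of the adjustment can be worked out directly from the two estimates. The snippet below uses the rounded unadjusted gap quoted in the text and the matched estimate from <code>diff_means</code>; because the inputs are rounded, the relative figure may differ from the reported one in the last digit. The variable names are illustrative.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">naive_diff &lt;- 0.1391      # unadjusted gap, as quoted in the text (rounded)
att        &lt;- 0.1086351   # matched estimate (diff_means)

absolute_drop &lt;- naive_diff - att            # on the probability scale
relative_drop &lt;- absolute_drop / naive_diff  # share of the unadjusted gap

round(100 * absolute_drop, 2)  # in percentage points
round(100 * relative_drop, 1)  # in percent of the unadjusted gap</code></pre></div>
</div>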
<div class="cell">
<div class="cell-output-display">
<div id="fig-subscription-rates" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-subscription-rates-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://frequentist.org/posts/20251028-propensity-score-matching/index_files/figure-html/fig-subscription-rates-1.png" class="img-fluid figure-img" width="576">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-subscription-rates-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;7: Subscription rates before and after matching.
</figcaption>
</figure>
</div>
</div>
</div>
</section>
<section id="discussion-and-limitations" class="level2">
<h2 class="anchored" data-anchor-id="discussion-and-limitations">Discussion and Limitations</h2>
<p><strong>Unobserved confounding</strong>: we only matched on observed variables; factors like personality or spending habits might still bias the result.</p>
<p><strong>Choice of treatment</strong>: we defined the treatment as “previous contact” and interpreted it as the cause; other definitions (e.g., contact channel, number of calls) could be explored.</p>
<p><strong>Generalization</strong>: results apply to customers similar to those treated (ATT), not necessarily to all clients.</p>
</section>
<section id="conclusion" class="level2">
<h2 class="anchored" data-anchor-id="conclusion">Conclusion</h2>
<p>In this tutorial, we’ve demonstrated how to estimate a causal effect using Propensity Score Matching with real marketing data.</p>
<section id="the-key-takeaways" class="level3">
<h3 class="anchored" data-anchor-id="the-key-takeaways">The key takeaways:</h3>
<ul>
<li><p>PSM helps approximate experimental conditions in observational settings.</p></li>
<li><p>Always check covariate balance before interpreting results.</p></li>
<li><p>Proper treatment and covariate definitions are critical.</p></li>
</ul>
<p>The full R code can be easily adapted to other business questions, for example, measuring the effect of marketing emails, app notifications, or loyalty programs when randomization is not possible.</p>
</section>
</section>
<section id="references" class="level2">
<h2 class="anchored" data-anchor-id="references">References</h2>
<p>Ho D, Imai K, King G, Stuart E (2011). “MatchIt: Nonparametric Preprocessing for Parametric Causal Inference.” <em>Journal of Statistical Software</em>, <em>42</em>(8), 1–28. <a href="https://doi.org/10.18637/jss.v042.i08" class="uri">https://doi.org/10.18637/jss.v042.i08</a>.</p>
</section>
<section id="see-also" class="level2">
<h2 class="anchored" data-anchor-id="see-also">See Also</h2>
<p>Here are some of my other posts related to A/B testing, statistics, and R:</p>
<div id="listing-posts" class="quarto-listing quarto-listing-container-table">
<table class="quarto-listing-table table">
<thead>
<tr>

<th>
 
</th>

<th>
Title
</th>

<th>
Reading Time
</th>

</tr>
</thead>
<tbody class="list">

<tr data-index="0" data-categories="TUwlMkNBbmltYXRpb24lMkNS" data-listing-date-sort="1771027200000" data-listing-file-modified-sort="1770982202443" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="9" data-listing-word-count-sort="1687" data-listing-title-sort="Implementing a Neural Network in Base R" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20260214-backpropagating-love/image.gif" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20260214-backpropagating-love/index.html" class="title listing-title">Implementing a Neural Network in Base R</a>
</td>
<td>
<span class="listing-reading-time">9 min</span>
</td>

</tr>

<tr data-index="1" data-categories="QkklMkNTdGF0aXN0aWNzJTJDTUwlMkNWaXN1YWxpemF0aW9u" data-listing-date-sort="1764115200000" data-listing-file-modified-sort="1767873604969" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="5" data-listing-word-count-sort="982" data-listing-title-sort="Building an E-Commerce Dashboard with Power BI and R" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20251126-e-commerce-dashboard/0001.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20251126-e-commerce-dashboard/index.html" class="title listing-title">Building an E-Commerce Dashboard with Power BI and R</a>
</td>
<td>
<span class="listing-reading-time">5 min</span>
</td>

</tr>

<tr data-index="2" data-categories="QkklMkNTdGF0aXN0aWNzJTJDTUwlMkNWaXN1YWxpemF0aW9uJTJDUg==" data-listing-date-sort="1761436800000" data-listing-file-modified-sort="1767873994105" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="6" data-listing-word-count-sort="1109" data-listing-title-sort="Building the Analytical Dashboard with Power BI and R" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20251026-cfpb-dashboard/0001.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20251026-cfpb-dashboard/index.html" class="title listing-title">Building the Analytical Dashboard with Power BI and R</a>
</td>
<td>
<span class="listing-reading-time">6 min</span>
</td>

</tr>

<tr data-index="3" data-categories="QkklMkNTdGF0aXN0aWNzJTJDTUwlMkNWaXN1YWxpemF0aW9uJTJDUg==" data-listing-date-sort="1758931200000" data-listing-file-modified-sort="1767874188518" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="8" data-listing-word-count-sort="1544" data-listing-title-sort="Building a Credit Risk Dashboard with Power BI and R" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20250927-credit-risk-analytics/image.png" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20250927-credit-risk-analytics/index.html" class="title listing-title">Building a Credit Risk Dashboard with Power BI and R</a>
</td>
<td>
<span class="listing-reading-time">8 min</span>
</td>

</tr>

<tr data-index="4" data-categories="VGltZS1TZXJpZXMlMkNDbHVzdGVyaW5nJTJDUg==" data-listing-date-sort="1756252800000" data-listing-file-modified-sort="1767875578057" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="8" data-listing-word-count-sort="1558" data-listing-title-sort="Time-Series Clustering with R's dtwclust Package" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20250827-time-series-clustering/image.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20250827-time-series-clustering/index.html" class="title listing-title">Time-Series Clustering with R’s dtwclust Package</a>
</td>
<td>
<span class="listing-reading-time">8 min</span>
</td>

</tr>

<tr data-index="5" data-categories="QSUyRkIlMjBUZXN0aW5nJTJDUHJvZHVjdCUyQ1N0YXRpc3RpY3MlMkNS" data-listing-date-sort="1754524800000" data-listing-file-modified-sort="1759267192458" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="6" data-listing-word-count-sort="1187" data-listing-title-sort="Minimum Detectable Effect (MDE) Calculation" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" data-src="../../posts/20250807-mde/image.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20250807-mde/index.html" class="title listing-title">Minimum Detectable Effect (MDE) Calculation</a>
</td>
<td>
<span class="listing-reading-time">6 min</span>
</td>

</tr>

<tr data-index="6" data-categories="QSUyRkIlMjBUZXN0aW5nJTJDUHJvZHVjdCUyQ1N0YXRpc3RpY3M=" data-listing-date-sort="1753747200000" data-listing-file-modified-sort="1770450295226" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="37" data-listing-word-count-sort="7205" data-listing-title-sort="A/B Testing: Concepts and Techniques" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" data-src="../../posts/20250729-ab-testing/image.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20250729-ab-testing/index.html" class="title listing-title">A/B Testing: Concepts and Techniques</a>
</td>
<td>
<span class="listing-reading-time">37 min</span>
</td>

</tr>

<tr data-index="7" data-categories="VmlzdWFsaXphdGlvbiUyQ1NwYXRpYWwlMkNBbmltYXRpb24lMkNS" data-listing-date-sort="1751587200000" data-listing-file-modified-sort="1770926628882" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="7" data-listing-word-count-sort="1400" data-listing-title-sort="Animation of Spatial Data" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" data-src="../../posts/20250704-animation/image.gif" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20250704-animation/index.html" class="title listing-title">Animation of Spatial Data</a>
</td>
<td>
<span class="listing-reading-time">7 min</span>
</td>

</tr>

<tr data-index="8" data-categories="UHl0aG9uJTJDUiUyQ01hdGxhYg==" data-listing-date-sort="1739491200000" data-listing-file-modified-sort="1770926078462" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="4" data-listing-word-count-sort="607" data-listing-title-sort="Nerdy Valentine's in Python, R, and Matlab" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" data-src="../../posts/20250214-valentines/image.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20250214-valentines/index.html" class="title listing-title">Nerdy Valentine’s in Python, R, and Matlab</a>
</td>
<td>
<span class="listing-reading-time">4 min</span>
</td>

</tr>

<tr data-index="9" data-categories="TW9uZXklMkNS" data-listing-date-sort="1727395200000" data-listing-file-modified-sort="1735422934971" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="19" data-listing-word-count-sort="3793" data-listing-title-sort="European Tech Salaries" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" data-src="../../posts/20240927-euro-tech-money/image.png" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20240927-euro-tech-money/index.html" class="title listing-title">European Tech Salaries</a>
</td>
<td>
<span class="listing-reading-time">19 min</span>
</td>

</tr>

<tr data-index="10" data-categories="UiUyQ0dlbw==" data-listing-date-sort="1721865600000" data-listing-file-modified-sort="1735422851553" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="5" data-listing-word-count-sort="931" data-listing-title-sort="Exploring Geospatial Insights with R and rnaturalearth" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" data-src="../../posts/20240725-views-of-russia/image.png" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20240725-views-of-russia/index.html" class="title listing-title">Exploring Geospatial Insights with R and rnaturalearth</a>
</td>
<td>
<span class="listing-reading-time">5 min</span>
</td>

</tr>

</tbody>
</table>

</div>



</section>

 ]]></description>
  <category>A/B Testing</category>
  <category>Statistics</category>
  <category>R</category>
  <guid>https://frequentist.org/posts/20251028-propensity-score-matching/</guid>
  <pubDate>Fri, 31 Oct 2025 00:00:00 GMT</pubDate>
  <media:content url="https://frequentist.org/posts/20251028-propensity-score-matching/image.svg" medium="image" type="image/svg+xml"/>
</item>
<item>
  <title>Consumer Financial Complaints Dashboard (Power BI + R)</title>
  <dc:creator>Aleksei Prishchepo</dc:creator>
  <link>https://frequentist.org/projects/dashboard-financial-complaints/</link>
  <description><![CDATA[ 






<section id="project-overview" class="level2">
<h2 class="anchored" data-anchor-id="project-overview">Project Overview</h2>
<p>A Power BI dashboard analyzing CFPB consumer complaint data with built-in forecasting and causal analysis. The project focuses on turning public regulatory data into forward-looking, decision-ready insights for risk and compliance analysis.</p>
<div class="callout callout-style-simple callout-note no-icon callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Role
</div>
</div>
<div class="callout-body-container callout-body">
<p>BI Analyst &amp; Developer</p>
</div>
</div>
<div class="callout callout-style-simple callout-note no-icon callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Tools
</div>
</div>
<div class="callout-body-container callout-body">
<p>Power BI, Power Query, DAX, R (time-series analysis), custom visuals</p>
</div>
</div>
<div class="callout callout-style-simple callout-note no-icon callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Domain
</div>
</div>
<div class="callout-body-container callout-body">
<p>Financial services, consumer protection, regulatory analytics</p>
</div>
</div>
</section>
<section id="key-features-components" class="level2">
<h2 class="anchored" data-anchor-id="key-features-components">Key Features &amp; Components</h2>
<section id="trend-analysis-of-consumer-complaints" class="level3">
<h3 class="anchored" data-anchor-id="trend-analysis-of-consumer-complaints">Trend analysis of consumer complaints</h3>
<p>Identifies long-term patterns and structural changes in complaint volumes across financial products and institutions.</p>
</section>
<section id="forecasting-of-complaint-volumes" class="level3">
<h3 class="anchored" data-anchor-id="forecasting-of-complaint-volumes">Forecasting of complaint volumes</h3>
<p>Uses time-series modeling to anticipate future complaint levels, enabling proactive risk, compliance, and capacity planning.</p>
</section>
<section id="causal-analysis-of-complaint-drivers" class="level3">
<h3 class="anchored" data-anchor-id="causal-analysis-of-complaint-drivers">Causal analysis of complaint drivers</h3>
<p>Applies causal inference techniques to understand why complaint volumes change, separating underlying drivers from surface correlations.</p>
</section>
<section id="company-geographic-concentration-analysis" class="level3">
<h3 class="anchored" data-anchor-id="company-geographic-concentration-analysis">Company &amp; geographic concentration analysis</h3>
<p>Highlights institutions and regions that disproportionately contribute to complaint volumes, supporting prioritization and oversight.</p>
</section>
<section id="product-level-issue-diagnostics" class="level3">
<h3 class="anchored" data-anchor-id="product-level-issue-diagnostics">Product-level issue diagnostics</h3>
<p>Reveals which financial products and issue types generate the most consumer friction, guiding targeted investigation and remediation.</p>
</section>
</section>
<section id="dashboard-design" class="level2">
<h2 class="anchored" data-anchor-id="dashboard-design">Dashboard Design</h2>
<div class="quarto-layout-panel" data-layout="[[1,1,1], [1,1,1]]">
<div class="quarto-layout-row">
<div class="quarto-layout-cell" style="flex-basis: 33.3%;justify-content: flex-start;">
<div id="fig-overview" class="lightbox quarto-float quarto-figure quarto-figure-center anchored" data-group="dashboard">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-overview-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<a href="0001.svg" class="lightbox" data-gallery="dashboard" title="Figure&nbsp;1: Overview"><img src="https://frequentist.org/projects/dashboard-financial-complaints/0001.svg" class="img-fluid figure-img"></a>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-overview-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;1: Overview
</figcaption>
</figure>
</div>
</div>
<div class="quarto-layout-cell" style="flex-basis: 33.3%;justify-content: flex-start;">
<div id="fig-geography" class="lightbox quarto-float quarto-figure quarto-figure-center anchored" data-group="dashboard">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-geography-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<a href="0002.svg" class="lightbox" data-gallery="dashboard" title="Figure&nbsp;2: Geography"><img src="https://frequentist.org/projects/dashboard-financial-complaints/0002.svg" class="img-fluid figure-img"></a>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-geography-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;2: Geography
</figcaption>
</figure>
</div>
</div>
<div class="quarto-layout-cell" style="flex-basis: 33.3%;justify-content: flex-start;">
<div id="fig-products" class="lightbox quarto-float quarto-figure quarto-figure-center anchored" data-group="dashboard">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-products-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<a href="0003.svg" class="lightbox" data-gallery="dashboard" title="Figure&nbsp;3: Products"><img src="https://frequentist.org/projects/dashboard-financial-complaints/0003.svg" class="img-fluid figure-img"></a>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-products-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;3: Products
</figcaption>
</figure>
</div>
</div>
</div>
<div class="quarto-layout-row">
<div class="quarto-layout-cell" style="flex-basis: 33.3%;justify-content: flex-start;">
<div id="fig-companies" class="lightbox quarto-float quarto-figure quarto-figure-center anchored" data-group="dashboard">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-companies-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<a href="0004.svg" class="lightbox" data-gallery="dashboard" title="Figure&nbsp;4: Companies"><img src="https://frequentist.org/projects/dashboard-financial-complaints/0004.svg" class="img-fluid figure-img"></a>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-companies-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;4: Companies
</figcaption>
</figure>
</div>
</div>
<div class="quarto-layout-cell" style="flex-basis: 33.3%;justify-content: flex-start;">
<div id="fig-factors" class="lightbox quarto-float quarto-figure quarto-figure-center anchored" data-group="dashboard">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-factors-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<a href="0005.svg" class="lightbox" data-gallery="dashboard" title="Figure&nbsp;5: Factor Analysis"><img src="https://frequentist.org/projects/dashboard-financial-complaints/0005.svg" class="img-fluid figure-img"></a>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-factors-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;5: Factor Analysis
</figcaption>
</figure>
</div>
</div>
</div>
</div>
</section>
<section id="implementation" class="level2 page-columns page-full">
<h2 class="anchored" data-anchor-id="implementation">Implementation</h2>

<div class="no-row-height column-margin column-container"><div class="">
<p>See the implementation details in the <a href="https://frequentist.org/posts/20251026-cfpb-dashboard/">blog post</a>.</p>
</div></div><div class="quarto-layout-panel" data-layout-ncol="2">
<div class="quarto-layout-row">
<section id="data-preparation" class="level3 quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<h3 class="anchored" data-anchor-id="data-preparation">Data Preparation</h3>
<p>Cleaned and standardized raw CFPB complaint data using Power Query, focusing on temporal consistency, categorical normalization, and analytical readiness.</p>
</section>
<section id="embedded-analytics" class="level3 quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<h3 class="anchored" data-anchor-id="embedded-analytics">Embedded Analytics</h3>
<p>Implemented forecasting and causal analysis using R scripts running directly inside Power Query.</p>
</section>
</div>
<div class="quarto-layout-row">
<section id="data-modeling" class="level3 quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<h3 class="anchored" data-anchor-id="data-modeling">Data Modeling</h3>
<p>Built a performant semantic model optimized for time-series analysis, forecast comparison, and dimensional slicing.</p>
</section>
<section id="visualization-design" class="level3 quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<h3 class="anchored" data-anchor-id="visualization-design">Visualization Design</h3>
<p>Emphasized clarity and decision-readiness, with clear separation between historical data, modeled forecasts, and causal insights.</p>
</section>
</div>
</div>
</section>
<section id="outcomes-impact" class="level2">
<h2 class="anchored" data-anchor-id="outcomes-impact">Outcomes &amp; Impact</h2>
<div class="quarto-layout-panel" data-layout-ncol="3">
<div class="quarto-layout-row">
<div class="quarto-layout-cell" style="flex-basis: 33.3%;justify-content: flex-start;">
<div class="callout callout-style-simple callout-tip">
<div class="callout-body d-flex">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-body-container">
<p>Enables forward-looking monitoring of complaint trends.</p>
</div>
</div>
</div>
</div>
<div class="quarto-layout-cell" style="flex-basis: 33.3%;justify-content: flex-start;">
<div class="callout callout-style-simple callout-tip">
<div class="callout-body d-flex">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-body-container">
<p>Helps identify products and institutions driving complaint growth.</p>
</div>
</div>
</div>
</div>
<div class="quarto-layout-cell" style="flex-basis: 33.3%;justify-content: flex-start;">
<div class="callout callout-style-simple callout-tip">
<div class="callout-body d-flex">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-body-container">
<p>Demonstrates how advanced analytics can be embedded directly into BI workflows.</p>
</div>
</div>
</div>
</div>
</div>
</div>
</section>
<section id="skills-demonstrated" class="level2">
<h2 class="anchored" data-anchor-id="skills-demonstrated">Skills Demonstrated</h2>
<p>Business Intelligence dashboarding • Public data analytics • Financial services domain knowledge • Data modeling &amp; ETL • Power BI &amp; DAX • Analytical storytelling • Regulatory and compliance analytics</p>
</section>
<section id="apply-this-to-your-business" class="level2">
<h2 class="anchored" data-anchor-id="apply-this-to-your-business">Apply This to Your Business</h2>
<p>If your organization could benefit from advanced analytics embedded into Power BI dashboards to support risk, compliance, or operational decision-making, <a href="contact.html">let’s talk</a>. I can help design and build tailored BI solutions that turn data into actionable insights.</p>
</section>
<section id="see-also" class="level2">
<h2 class="anchored" data-anchor-id="see-also">See Also</h2>
<div id="listing-projects" class="quarto-listing quarto-listing-container-grid">
<div class="list grid quarto-listing-cols-3">
<div class="g-col-1" data-index="0" data-categories="QXBwJTJDQWdlbnRpYyUyMEFJJTJDUHl0aG9uJTJDRmVhdHVyZWQ=" data-listing-date-sort="1770249600000" data-listing-file-modified-sort="1774474774886" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="2" data-listing-word-count-sort="358">
<a href="../../projects/app-sales-signals/index.html" class="quarto-grid-link">
<div class="quarto-grid-item card h-100 card-left">
<p class="card-img-top">
<img loading="lazy" src="https://frequentist.org/projects/app-sales-signals/image.png" class="thumbnail-image card-img" style="height: 150px;">
</p>
<div class="card-body post-contents">
<h5 class="no-anchor card-title listing-title">
Sales Signals app (Agentic AI)
</h5>
<div class="card-text listing-description delink">
<p>Automated sales coaching engine that turns B2B call transcripts into real-time, context-aware feedback, combining LLMs and historical customer data to surface revenue and…</p>
</div>
</div>
</div></a>
</div>
<div class="g-col-1" data-index="1" data-categories="V2ViYXBwJTJDU2hpbnklMjBmb3IlMjBQeXRob24lMkNQeXRob24lMkNGZWF0dXJlZA==" data-listing-date-sort="1766793600000" data-listing-file-modified-sort="1770399872011" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="3" data-listing-word-count-sort="431">
<a href="../../projects/webapp-linkedin-analytics/index.html" class="quarto-grid-link">
<div class="quarto-grid-item card h-100 card-left">
<p class="card-img-top">
<img loading="lazy" src="https://frequentist.org/projects/webapp-linkedin-analytics/image.png" class="thumbnail-image card-img" style="height: 150px;">
</p>
<div class="card-body post-contents">
<h5 class="no-anchor card-title listing-title">
LinkedIn Analytics Web Application
</h5>
<div class="card-text listing-description delink">
<p>A local-first web application that transforms LinkedIn Takeout exports into structured analytics on roles, industries, and geographic reach using NLP, unsupervised learning…</p>
</div>
</div>
</div></a>
</div>
<div class="g-col-1" data-index="2" data-categories="RGFzaGJvYXJkJTJDUG93ZXIlMjBCSSUyQ1IlMkNGZWF0dXJlZA==" data-listing-date-sort="1764115200000" data-listing-file-modified-sort="1767872628244" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="2" data-listing-word-count-sort="378">
<a href="../../projects/dashboard-e-commerce/index.html" class="quarto-grid-link">
<div class="quarto-grid-item card h-100 card-left">
<p class="card-img-top">
<img loading="lazy" src="https://frequentist.org/projects/dashboard-e-commerce/image.png" class="thumbnail-image card-img" style="height: 150px;">
</p>
<div class="card-body post-contents">
<h5 class="no-anchor card-title listing-title">
E-Commerce Analytics Dashboard (Power BI + R)
</h5>
<div class="card-text listing-description delink">
<p>An interactive BI dashboard combining customer RFM segmentation, ABC/XYZ product analysis, and revenue forecasting using R models embedded in Power BI.</p>
</div>
</div>
</div></a>
</div>
<div class="g-col-1" data-index="3" data-categories="V2ViYXBwJTJDUHl0aG9uJTJDRmVhdHVyZWQ=" data-listing-date-sort="1748649600000" data-listing-file-modified-sort="1767873003569" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="3" data-listing-word-count-sort="481">
<a href="../../projects/webapp-content-mate/index.html" class="quarto-grid-link">
<div class="quarto-grid-item card h-100 card-left">
<p class="card-img-top">
<img loading="lazy" data-src="../../projects/webapp-content-mate/image.svg" class="thumbnail-image card-img" style="height: 150px;">
</p>
<div class="card-body post-contents">
<h5 class="no-anchor card-title listing-title">
E-commerce Content Automation Platform
</h5>
<div class="card-text listing-description delink">
<p>A web application that automates the generation of e-commerce product cards using asynchronous pipelines and LLM-assisted content creation.</p>
</div>
</div>
</div></a>
</div>
<div class="g-col-1" data-index="4" data-categories="QkklMkNQb3dlciUyMEJJJTJDUHl0aG9uJTJDRmVhdHVyZWQ=" data-listing-date-sort="1736121600000" data-listing-file-modified-sort="1767886951027" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="3" data-listing-word-count-sort="462">
<a href="../../projects/bi-system-telecamera/index.html" class="quarto-grid-link">
<div class="quarto-grid-item card h-100 card-left">
<p class="card-img-top">
<img loading="lazy" data-src="../../projects/bi-system-telecamera/image.png" class="thumbnail-image card-img" style="height: 150px;">
</p>
<div class="card-body post-contents">
<h5 class="no-anchor card-title listing-title">
Company-Wide Business Intelligence System
</h5>
<div class="card-text listing-description delink">
<p>An end-to-end BI system consolidating operational, financial, marketing, and sales data into a single decision-support layer.</p>
</div>
</div>
</div></a>
</div>
</div>
<div class="listing-no-matching d-none">No matching items</div>

</div>



</section>

 ]]></description>
  <category>Dashboard</category>
  <category>Power BI</category>
  <category>R</category>
  <category>Featured</category>
  <guid>https://frequentist.org/projects/dashboard-financial-complaints/</guid>
  <pubDate>Mon, 27 Oct 2025 00:00:00 GMT</pubDate>
  <media:content url="https://frequentist.org/projects/dashboard-financial-complaints/image.png" medium="image" type="image/png" height="77" width="144"/>
</item>
<item>
  <title>Building the Analytical Dashboard with Power BI and R</title>
  <dc:creator>Aleksei Prishchepo</dc:creator>
  <link>https://frequentist.org/posts/20251026-cfpb-dashboard/</link>
  <description><![CDATA[ 






<p>In this article, I walk through the process I followed to build this analytical dashboard. My goal was to distill the Consumer Financial Complaints Dataset into a set of visualizations that integrate statistical modeling and time series analysis.</p>
<p>For those who enjoy the technical deep dives, the repository containing Exploratory Data Analysis, Time Series Forecasting, and Causal Analysis documents is linked at the end of this post. Here is an overview of how the dashboard came together.</p>
<section id="data-preparation-and-exploratory-data-analysis-eda" class="level2">
<h2 class="anchored" data-anchor-id="data-preparation-and-exploratory-data-analysis-eda">Data Preparation and Exploratory Data Analysis (EDA)</h2>
<p>The project started with making sure the data was clean and ready for analysis. I performed Exploratory Data Analysis (EDA) to check the data structure, inspect distributions, and examine correlations.</p>
<ul>
<li><p><strong>Loading and Cleaning:</strong> I loaded two main sheets from the Excel file: the <code>complaints</code> data, which contained 62,516 rows and 19 columns, and the <code>companies</code> data, which contained 1,081 rows and 9 columns.</p></li>
<li><p><strong>Joining Tables:</strong> To create a single comprehensive analytical dataset, I joined the <code>complaints</code> data with the <code>companies</code> data. This resulted in a dataset comprising 62,516 rows and 27 variables.</p></li>
</ul>
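<p>The post does not show the join itself, so the following is only a minimal sketch of what it could look like with <code>dplyr</code>. The key name <code>company_id</code>, the choice of a left join, and the toy data are assumptions made for illustration. The reported shape (19 + 9 − 1 = 27 variables, with the row count unchanged) would be consistent with a join on a single shared key that keeps every complaint.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(dplyr)

# Toy stand-ins for the two sheets; `company_id` is a hypothetical key name
complaints &lt;- data.frame(
  complaint_id = 1:4,
  company_id   = c(10, 10, 20, 30),
  product      = c("Mortgage", "Credit card", "Mortgage", "Student loan")
)
companies &lt;- data.frame(
  company_id = c(10, 20, 30),
  size_tier  = c("Large", "Medium", "Small")
)

# A left join keeps every complaint row and appends the company attributes
dataset &lt;- left_join(complaints, companies, by = "company_id")

# Rows stay at 4; columns are 3 + 2 - 1 = 4 because the key is not duplicated
dim(dataset)</code></pre></div>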
<section id="visualizing-distributions-and-relationships" class="level3">
<h3 class="anchored" data-anchor-id="visualizing-distributions-and-relationships">Visualizing Distributions and Relationships</h3>
<p>During EDA, I needed to check the univariate distributions of the numeric variables (Figure 1) and examine correlations between them to identify relationships and potential multicollinearity (Figure 2).</p>
<div id="fig-numeric-distributions-1" class="quarto-float quarto-figure quarto-figure-center anchored" alt="Distributions of Numeric Variables">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-numeric-distributions-1-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://frequentist.org/posts/20251026-cfpb-dashboard/01_eda_files/figure-html/fig-numeric-distributions-1.svg" class="img-fluid figure-img" alt="Distributions of Numeric Variables">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-numeric-distributions-1-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;1: Distributions of Numeric Variables
</figcaption>
</figure>
</div>
<div id="fig-corrplot-1" class="quarto-float quarto-figure quarto-figure-center anchored" alt="Correlation Matrix of Numeric Variables">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-corrplot-1-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://frequentist.org/posts/20251026-cfpb-dashboard/01_eda_files/figure-html/fig-corrplot-1.svg" class="img-fluid figure-img" alt="Correlation Matrix of Numeric Variables">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-corrplot-1-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;2: Correlation Matrix of Numeric Variables
</figcaption>
</figure>
</div>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p>See the document <a href="../../posts/20251026-cfpb-dashboard/01_eda.html" target="_blank">Exploratory Data Analysis</a> for the detailed description of the EDA process.</p>
</div>
</div>
</section>
</section>
<section id="forecasting-complaint-trends-time-series-analysis" class="level2">
<h2 class="anchored" data-anchor-id="forecasting-complaint-trends-time-series-analysis">Forecasting Complaint Trends (Time Series Analysis)</h2>
<p>My next step was to create time series forecasts of complaint volumes segmented by state, which is essential for identifying upcoming trends.</p>
<section id="granularity-adjustment" class="level3">
<h3 class="anchored" data-anchor-id="granularity-adjustment">Granularity Adjustment</h3>
<p>My initial approach used daily granularity, which proved <strong>too slow</strong> to run inside Power Query in the Power BI environment. To speed up processing, I switched the analysis to <strong>monthly granularity</strong>.</p>
</section>
<section id="modeling-approach" class="level3">
<h3 class="anchored" data-anchor-id="modeling-approach">Modeling Approach</h3>
<p>The modeling function relied on <code>auto.arima</code> for time series analysis, with a fallback to ETS if ARIMA failed, and a forecast horizon of 6 months. This produced actual complaint counts, forecasts, and confidence intervals that could be integrated into Power BI.</p>
<p>Before running the forecast models, I aggregated the data by state and month:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1">monthly <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> dataset <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb1-2">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">month =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">floor_date</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.Date</span>(date_received), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"month"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-3">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(state, month) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span>   </span>
<span id="cb1-4">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarise</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">complaints =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">n</span>(), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.groups =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"drop"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span>   </span>
<span id="cb1-5">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">arrange</span>(state, month) </span></code></pre></div></div>
<p>Using <code>floor_date</code> from the <code>lubridate</code> package, which rounds each date down to the first day of its month, allowed me to create a “many-to-one” relationship between the resulting table and the <code>Calendar</code> table in Power BI, using the <code>month</code> column as the key.</p>
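<p>To illustrate the ARIMA-with-ETS-fallback idea described in this section, here is a minimal sketch using the <code>forecast</code> package. The helper name <code>forecast_one</code>, the 95% interval level, and the simulated series are assumptions made for this example and are not taken from the project code.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(forecast)

# Hypothetical helper: forecast one monthly series h steps ahead
forecast_one &lt;- function(y, h = 6) {
  series &lt;- ts(y, frequency = 12)
  # Try auto.arima first; if it throws an error, fall back to ETS
  fit &lt;- tryCatch(
    auto.arima(series),
    error = function(e) ets(series)
  )
  fc &lt;- forecast(fit, h = h, level = 95)
  data.frame(
    step     = seq_len(h),
    forecast = as.numeric(fc$mean),
    lower    = as.numeric(fc$lower[, 1]),
    upper    = as.numeric(fc$upper[, 1])
  )
}

# Simulated monthly counts standing in for one state's complaints
set.seed(1)
y &lt;- round(100 + 10 * sin(seq_len(36) * 2 * pi / 12) + rnorm(36, sd = 5))
result &lt;- forecast_one(y)</code></pre></div>
<p>In a setup like this, the helper would be applied to each state’s series in turn and the per-state results stacked into one table for Power BI.</p>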
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p>See the full Time Series Analysis document <a href="../../posts/20251026-cfpb-dashboard/02_time_series_forecast.html" target="_blank">here</a> for details.</p>
</div>
</div>
</section>
<section id="dashboard-visual-complaints-over-time" class="level3">
<h3 class="anchored" data-anchor-id="dashboard-visual-complaints-over-time">Dashboard Visual: Complaints Over Time</h3>
<p>The resulting forecast is presented visually in the dashboard’s “Overview” and “Geography” sections, showing the actual and forecasted complaint volumes over time.</p>
<div id="fig-time-series-by-census-division" class="quarto-float quarto-figure quarto-figure-center anchored" alt="Time series by census division">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-time-series-by-census-division-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://frequentist.org/posts/20251026-cfpb-dashboard/time-series-by-census-division.svg" class="img-fluid figure-img" alt="Time series by census division">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-time-series-by-census-division-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;3: Time series by census division
</figcaption>
</figure>
</div>
</section>
</section>
<section id="factor-analysis-causal-modeling" class="level2">
<h2 class="anchored" data-anchor-id="factor-analysis-causal-modeling">Factor Analysis (Causal Modeling)</h2>
<p>I performed causal analysis to estimate the effect of different company characteristics, such as enforcement history and size, on critical performance metrics like the complaint rate.</p>
<section id="full-linear-model" class="level3">
<h3 class="anchored" data-anchor-id="full-linear-model">Full Linear Model</h3>
<p>I built a linear regression model that included multiple covariates: enforcement status, company size tier, reputation score, and timely response rate, targeting the log of <code>complaints_per_1pct_share</code>.</p>
<ul>
<li><strong>Key Finding:</strong> In the full model, <strong>company size</strong> was the only statistically significant variable.</li>
</ul>
</section>
<section id="reduced-model" class="level3">
<h3 class="anchored" data-anchor-id="reduced-model">Reduced Model</h3>
<p>I then created a reduced model using only company size as a predictor. This model showed strong quality metrics: an adjusted R² of 0.548 and an overall p-value of 1.143e-188. Company size alone therefore explains about 55% of the variance in the log complaint rate, and the fit indicates that <strong>larger companies tend to experience lower complaint rates</strong> than smaller ones.</p>
<p>The model metrics confirmed the quality of the reduced regression model:</p>
<table class="caption-top table">
<caption>Reduced Model Quality Metrics</caption>
<thead>
<tr class="header">
<th>r_squared</th>
<th>adj_r_squared</th>
<th>f_statistic</th>
<th>p_value</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>0.5487796</td>
<td>0.5483614</td>
<td>1312.292</td>
<td>1.142991e-188</td>
</tr>
</tbody>
</table>
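<p>To make the full-versus-reduced comparison concrete, the sketch below fits both models with base R on simulated data. Every column name except <code>complaints_per_1pct_share</code> is hypothetical, and the simulated numbers will not match the metrics reported here; the point is only to show how such metrics and a nested-model test can be obtained.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">set.seed(42)
n &lt;- 300

# Simulated company table; only size_tier truly drives the outcome
companies &lt;- data.frame(
  size_tier   = factor(sample(c("Small", "Medium", "Large"), n, replace = TRUE)),
  enforcement = rbinom(n, 1, 0.2),
  reputation  = runif(n, 1, 5),
  timely_rate = runif(n, 0.7, 1)
)
size_effect &lt;- unname(c(Small = 2, Medium = 1, Large = 0)[as.character(companies$size_tier)])
companies$complaints_per_1pct_share &lt;- exp(1 + size_effect + rnorm(n, sd = 0.8))

full &lt;- lm(log(complaints_per_1pct_share) ~ enforcement + size_tier + reputation + timely_rate,
           data = companies)
reduced &lt;- lm(log(complaints_per_1pct_share) ~ size_tier, data = companies)

# Collect R², adjusted R², F statistic and overall p-value for one fit
metrics &lt;- function(fit) {
  s &lt;- summary(fit)
  f &lt;- s$fstatistic
  c(r_squared     = s$r.squared,
    adj_r_squared = s$adj.r.squared,
    f_statistic   = unname(f["value"]),
    p_value       = unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)))
}
comparison &lt;- rbind(full = metrics(full), reduced = metrics(reduced))

# Partial F-test: do the extra covariates add explanatory power?
nested_test &lt;- anova(reduced, full)</code></pre></div>
<p>When the extra covariates carry no signal, the two R² values end up nearly identical, which is the pattern that motivates preferring the reduced model.</p>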
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p>See the full <a href="../../posts/20251026-cfpb-dashboard/03_causal_inference.html" target="_blank">Causal Analysis</a> document for details.</p>
</div>
</div>
</section>
<section id="dashboard-visuals-model-comparison" class="level3">
<h3 class="anchored" data-anchor-id="dashboard-visuals-model-comparison">Dashboard Visuals: Model Comparison</h3>
<p>The dashboard’s “Factor Analysis” tab transparently presents the modeling results, comparing the full and reduced models and highlighting the significant variable.</p>
<table class="caption-top table">
<caption>Model Comparison Table</caption>
<thead>
<tr class="header">
<th>Metric</th>
<th>Full</th>
<th>Reduced</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>F statistic</td>
<td>328.6</td>
<td>1312</td>
</tr>
<tr class="even">
<td>p value</td>
<td>9.721e-185</td>
<td>1.143e-188</td>
</tr>
<tr class="odd">
<td>R²</td>
<td>0.5498</td>
<td>0.5488</td>
</tr>
<tr class="even">
<td>R² adj.</td>
<td>0.5482</td>
<td>0.5484</td>
</tr>
</tbody>
</table>
</section>
</section>
<section id="advanced-techniques-anomaly-detection-and-clustering" class="level2">
<h2 class="anchored" data-anchor-id="advanced-techniques-anomaly-detection-and-clustering">Advanced Techniques (Anomaly Detection and Clustering)</h2>
<p>I also explored advanced techniques, though I ultimately chose to exclude their visuals from the final report to maintain focus and brevity.</p>
<section id="anomaly-detection" class="level3">
<h3 class="anchored" data-anchor-id="anomaly-detection">Anomaly Detection</h3>
<p>I used STL decomposition and z-score methods to identify anomalies in the monthly complaint volumes. This approach flagged only two anomalies (one spike and one drop). I decided to <strong>leave this visual out of the final Power BI report</strong> because so few anomalies added little new information at this stage.</p>
<div id="fig-anomaly-plot" class="quarto-float quarto-figure quarto-figure-center anchored" alt="Time Series with Anomalies Highlighted">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-anomaly-plot-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://frequentist.org/posts/20251026-cfpb-dashboard/04_anomaly_detection_files/figure-html/anomaly_plot-1.svg" class="img-fluid figure-img" alt="Time Series with Anomalies Highlighted">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-anomaly-plot-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;4: Time Series with Anomalies Highlighted
</figcaption>
</figure>
</div>
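<p>As a rough sketch of the STL-plus-z-score approach, the example below uses base R’s <code>stl</code> on a simulated monthly series with one injected spike and one injected drop. The threshold of 3 standard deviations, the <code>robust = TRUE</code> setting, and the data are assumptions for illustration; the post does not state the exact settings used.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">set.seed(7)

# Simulated monthly complaint counts with one injected spike and one drop
n &lt;- 48
counts &lt;- 200 + 20 * sin(2 * pi * seq_len(n) / 12) + rnorm(n, sd = 5)
counts[20] &lt;- counts[20] + 100
counts[35] &lt;- counts[35] - 100
series &lt;- ts(counts, frequency = 12, start = c(2021, 1))

# STL splits the series into seasonal, trend and remainder components
fit &lt;- stl(series, s.window = "periodic", robust = TRUE)
remainder &lt;- fit$time.series[, "remainder"]

# Flag months whose remainder lies more than 3 standard deviations from its mean
z &lt;- as.numeric((remainder - mean(remainder)) / sd(remainder))
anomalies &lt;- which(abs(z) &gt; 3)</code></pre></div>
<p>Scoring the remainder rather than the raw counts keeps ordinary seasonal peaks from being flagged as anomalies.</p>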
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p>See <a href="../../posts/20251026-cfpb-dashboard/04_anomaly_detection.html" target="_blank">Anomaly Detection</a> for full details.</p>
</div>
</div>
</section>
<section id="clustering-analysis" class="level3">
<h3 class="anchored" data-anchor-id="clustering-analysis">Clustering Analysis</h3>
<p>I performed K-means clustering on various company features (like enforcement history, size, and complaint rate). I found that the optimal number of clusters was <strong>3</strong>, based on the silhouette scores. These clusters effectively separated companies:</p>
<ul>
<li><p>Cluster 1 (Large/Medium, low complaints, near-zero enforcement),</p></li>
<li><p>Cluster 2 (Small, high complaints, zero enforcement), and</p></li>
<li><p>Cluster 3 (Medium/Small, high complaints, some enforcement history).</p></li>
</ul>
<p>However, since visualizing these clusters simply reinforced the patterns already established by the causal analysis (the importance of size), I chose to <strong>skip the clustering visualizations</strong> in the final dashboard.</p>
<div id="fig-cluster_plot" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-cluster_plot-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://frequentist.org/posts/20251026-cfpb-dashboard/05_clustering_files/figure-html/cluster_plot-1.svg" class="img-fluid figure-img">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-cluster_plot-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;5: The plot of clusters reveals the same patterns observed in the causal analysis
</figcaption>
</figure>
</div>
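<p>Choosing the number of clusters by silhouette score can be sketched as follows, using <code>kmeans</code> from base R and <code>silhouette</code> from the <code>cluster</code> package. The two feature names, the candidate range of 2 to 6 clusters, and the simulated data are assumptions made for this example.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(cluster)

set.seed(123)

# Simulated numeric company features drawn around three group centers
centers &lt;- rbind(c(0, 0), c(5, 5), c(0, 6))
features &lt;- do.call(rbind, lapply(1:3, function(i) {
  cbind(size_score         = rnorm(50, centers[i, 1]),
        log_complaint_rate = rnorm(50, centers[i, 2]))
}))

# Standardize so that no single feature dominates the distance
scaled &lt;- scale(features)
d &lt;- dist(scaled)

# Average silhouette width for each candidate number of clusters
avg_sil &lt;- sapply(2:6, function(k) {
  km &lt;- kmeans(scaled, centers = k, nstart = 25)
  mean(silhouette(km$cluster, d)[, "sil_width"])
})
names(avg_sil) &lt;- 2:6

# Pick the k with the highest average silhouette width
best_k &lt;- as.integer(names(which.max(avg_sil)))</code></pre></div>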
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p>See <a href="../../posts/20251026-cfpb-dashboard/05_clustering.html" target="_blank">Clustering Analysis</a> for full details.</p>
</div>
</div>
</section>
</section>
<section id="key-dashboard-visuals" class="level2">
<h2 class="anchored" data-anchor-id="key-dashboard-visuals">Key Dashboard Visuals</h2>
<p>The final dashboard provides users with an interactive, clear overview of the financial complaints landscape, structured across tabs for Overview, Geography, Products, Companies, and Factor Analysis.</p>
<section id="overview" class="level3">
<h3 class="anchored" data-anchor-id="overview">Overview</h3>
<p>The Overview tab opens with the key metrics and a breakdown of complaints by product.</p>
<div id="fig-overview" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-overview-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://frequentist.org/posts/20251026-cfpb-dashboard/0001.svg" class="img-fluid figure-img">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-overview-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;6: Overview Metrics and Complaints by Product
</figcaption>
</figure>
</div>
</section>
<section id="geography-and-product-breakdowns" class="level3">
<h3 class="anchored" data-anchor-id="geography-and-product-breakdowns">Geography and Product Breakdowns</h3>
<p>The Geography tab highlights totals by Census Region. For example, the South accounted for 4K complaints, the West for 3K, and the Midwest for 1K.</p>
<div id="fig-geography" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-geography-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://frequentist.org/posts/20251026-cfpb-dashboard/0002.svg" class="img-fluid figure-img">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-geography-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;7: Complaints by Region
</figcaption>
</figure>
</div>
<p>The Products section breaks complaints down by product, including year-over-year (YoY) changes.</p>
<div id="fig-product" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-product-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://frequentist.org/posts/20251026-cfpb-dashboard/0003.svg" class="img-fluid figure-img">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-product-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;8: Complaints by Product Details
</figcaption>
</figure>
</div>
</section>
<section id="companies" class="level3">
<h3 class="anchored" data-anchor-id="companies">Companies</h3>
<p>This report section provides descriptive statistics about companies.</p>
<div id="fig-companies" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-companies-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://frequentist.org/posts/20251026-cfpb-dashboard/0004.svg" class="img-fluid figure-img">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-companies-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;9: Company Statistics
</figcaption>
</figure>
</div>
</section>
<section id="factor-analysis" class="level3">
<h3 class="anchored" data-anchor-id="factor-analysis">Factor Analysis</h3>
<p>This tab presents the results of the causal modeling, comparing the full and reduced models.</p>
<div id="fig-factor-analysis" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-factor-analysis-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://frequentist.org/posts/20251026-cfpb-dashboard/0005.svg" class="img-fluid figure-img">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-factor-analysis-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;10: Factor Analysis
</figcaption>
</figure>
</div>
</section>
</section>
<section id="conclusion" class="level2">
<h2 class="anchored" data-anchor-id="conclusion">Conclusion</h2>
<p>I hope this overview of the dashboard construction process, from initial data cleaning to final model integration, gives the reader a clear picture of the analytical work involved.</p>
</section>
<section id="references" class="level2">
<h2 class="anchored" data-anchor-id="references">References</h2>
<ul>
<li><p><a href="https://github.com/AxesAccess/DataDNA-Dataset-Challenge-Consumer-Financial-Complaints-Dataset-October-2025" target="_blank">Project Repository</a></p></li>
<li><p><a href="https://app.powerbi.com/view?r=eyJrIjoiYzgwMmRmZTEtZWJiZi00NGIwLWE3YTUtNjJiMDFjZTg2NTU2IiwidCI6ImZmYzg3OTVlLTAxODUtNDg5Yi05ZGE2LTQ5MDI0MTJmMDNhMCIsImMiOjl9" target="_blank">Interactive Dashboard</a></p></li>
</ul>
</section>

]]></description>
  <category>BI</category>
  <category>Statistics</category>
  <category>ML</category>
  <category>Visualization</category>
  <category>R</category>
  <guid>https://frequentist.org/posts/20251026-cfpb-dashboard/</guid>
  <pubDate>Sun, 26 Oct 2025 00:00:00 GMT</pubDate>
  <media:content url="https://frequentist.org/posts/20251026-cfpb-dashboard/0001.svg" medium="image" type="image/svg+xml"/>
</item>
<item>
  <title>Building a Credit Risk Dashboard with Power BI and R</title>
  <dc:creator>Aleksei Prishchepo</dc:creator>
  <link>https://frequentist.org/posts/20250927-credit-risk-analytics/</link>
  <description><![CDATA[ 






<p>I recently took part in the Credit Risk Analytics Challenge, where the task was to build a dynamic financial risk dashboard using Power BI. Seeing the words “Risk” and “Analytics” in the title, I focused on modeling reality: building a system that could estimate default probabilities, simulate portfolio outcomes, and demonstrate the technical side of credit risk analytics.</p>
<p>In this article, I’ll walk through the ideas behind my approach, the models I built, and how I translated them into an interactive dashboard.</p>
<p>I chose <strong>R</strong> for data processing and modeling because it handles statistical modeling of complex datasets with many categorical variables efficiently. Importantly, I wrote R scripts that are embedded directly in the Power BI report through Power Query, with no external data processing: everything lives in one <strong>.pbix</strong> file, which also runs <a href="https://app.powerbi.com/view?r=eyJrIjoiNTkzOTc5OTQtYmQxNS00YzlmLWE3OWYtY2JjYWUwNTI3MGEzIiwidCI6ImZmYzg3OTVlLTAxODUtNDg5Yi05ZGE2LTQ5MDI0MTJmMDNhMCIsImMiOjl9" target="_blank">in the Power BI service</a>.</p>
<section id="phase-1-data-cleaning-and-engineering" class="level2">
<h2 class="anchored" data-anchor-id="phase-1-data-cleaning-and-engineering">Phase 1: Data Cleaning and Engineering</h2>
<p>The project began with a Credit Risk Dataset composed of <strong>32,581 rows and 29 columns</strong>, split between 12 character (categorical) and 17 numeric variables. My first step, documented in detail in the <a href="01_eda.html" target="_blank">Exploratory Data Analysis of Credit Risk Dataset</a>, was ensuring the data was pristine for modeling.</p>
<section id="variable-selection-and-redundancy" class="level3">
<h3 class="anchored" data-anchor-id="variable-selection-and-redundancy">1. Variable Selection and Redundancy</h3>
<p>I immediately excluded variables that lacked predictive utility or caused issues in regression models. I removed the simple identifier <strong><code>client_ID</code></strong> and the geographical coordinates (<strong><code>city_latitude</code></strong> and <strong><code>city_longitude</code></strong>).</p>
<div id="fig-corrplot" class="quarto-float quarto-figure quarto-figure-center anchored" data-fig-align="center">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-corrplot-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://frequentist.org/posts/20250927-credit-risk-analytics/fig-corrplot-1.svg" class="img-fluid quarto-figure quarto-figure-center figure-img" width="576">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-corrplot-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;1: The correlation plot shows a strong correlation between some of the numerical variables.
</figcaption>
</figure>
</div>
<p>A key part of cleaning was addressing redundancy. A correlation check revealed that <strong><code>loan_percent_income</code></strong> and <strong><code>loan_to_income_ratio</code></strong> were almost perfectly correlated (0.9989417), so I dropped <strong><code>loan_percent_income</code></strong> to eliminate the multicollinearity and simplify the feature set.</p>
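<p>The redundancy check described above can be sketched as follows. This is a minimal illustration in plain Python rather than the R used in the project; the function names <code>pearson</code> and <code>redundant_pairs</code>, the 0.95 threshold, and the toy values are all assumptions made for the example, not taken from the original analysis.</p>

```python
import math

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def redundant_pairs(columns: dict[str, list[float]], threshold: float = 0.95) -> list[tuple[str, str, float]]:
    """Return column pairs whose absolute correlation exceeds the threshold."""
    names = list(columns)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(columns[first], columns[second])
            if abs(r) > threshold:
                pairs.append((first, second, r))
    return pairs

# Hypothetical toy values, not taken from the challenge dataset.
toy = {
    "loan_percent_income": [0.10, 0.25, 0.40, 0.15, 0.30],
    "loan_to_income_ratio": [0.11, 0.24, 0.41, 0.15, 0.29],
    "person_age": [25.0, 41.0, 33.0, 58.0, 29.0],
}
flagged = redundant_pairs(toy)
```

<p>With these toy values only the two income-ratio columns are flagged, so one of them can be dropped before fitting a regression model.</p>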
</section>
<section id="handling-missing-data-and-outliers" class="level3">
<h3 class="anchored" data-anchor-id="handling-missing-data-and-outliers">2. Handling Missing Data and Outliers</h3>
<p>I identified missing values primarily in <strong><code>person_emp_length</code></strong> (97% complete) and <strong><code>loan_int_rate</code></strong> (90% complete). Since no obvious pattern of missingness was found, I employed a simple and robust imputation method: replacing all missing numeric values with the <strong>median</strong> of their respective columns.</p>
<div id="fig-missing-upset" class="quarto-float quarto-figure quarto-figure-center anchored" data-fig-align="center">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-missing-upset-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://frequentist.org/posts/20250927-credit-risk-analytics/fig-missing-upset-1.svg" class="img-fluid quarto-figure quarto-figure-center figure-img">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-missing-upset-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;2: The UpSet plot reveals no noticeable pattern in the missing data.
</figcaption>
</figure>
</div>
<p>Outlier handling required careful judgment. I found extreme values in <strong><code>person_age</code></strong> and <strong><code>person_emp_length</code></strong> that seemed like data entry errors (e.g., ages greater than 100).</p>
<div id="fig-outliers-zoom" class="quarto-float quarto-figure quarto-figure-center anchored" data-fig-align="center">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-outliers-zoom-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://frequentist.org/posts/20250927-credit-risk-analytics/fig-outliers-zoom-1.svg" class="img-fluid quarto-figure quarto-figure-center figure-img" width="576">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-outliers-zoom-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;3: A boxplot displaying numerical variables, with outliers highlighted in red.
</figcaption>
</figure>
</div>
<p>I replaced these anomalies with the column median, which removes the implausible values while keeping the rest of each record. However, extreme values in <strong><code>other_debt</code></strong> and <strong><code>person_income</code></strong> were retained, as they were judged to be genuine observations.</p>
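<p>Median imputation and median replacement of implausible values follow the same pattern, which the sketch below illustrates in plain Python (the project itself uses R). The helper <code>clean_column</code> and the toy values are hypothetical, and computing the median from the valid entries only is an assumption of this sketch rather than a detail confirmed by the original scripts.</p>

```python
from statistics import median

def clean_column(values, upper_limit=None):
    """Replace missing (None) and implausibly large entries with the median of the valid entries."""
    def is_valid(v):
        return v is not None and (upper_limit is None or upper_limit >= v)
    valid = [v for v in values if is_valid(v)]
    fill = median(valid)
    return [v if is_valid(v) else fill for v in values]

# Hypothetical toy values; the cutoff of 100 for age mirrors the example in the text.
person_age = [22, 35, 144, 41, 28]
loan_int_rate = [11.5, None, 7.9, 13.2, None]

clean_age = clean_column(person_age, upper_limit=100)
clean_rate = clean_column(loan_int_rate)
```

<p>Columns such as <code>other_debt</code> and <code>person_income</code> would simply not be passed an <code>upper_limit</code>, so their extreme but genuine values stay untouched.</p>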
</section>
</section>
<section id="phase-2-building-and-optimizing-predictive-models" class="level2">
<h2 class="anchored" data-anchor-id="phase-2-building-and-optimizing-predictive-models">Phase 2: Building and Optimizing Predictive Models</h2>
<p>The core task was to predict the <strong><code>loan_status</code></strong> variable, where <code>1</code> signifies default and <code>0</code> signifies no default. I explored multiple models, prioritizing predictive power (AUC) and, for deployment, interpretability (Marginal Effects for the dashboard). The process is described in detail in <a href="02_modelling.html" target="_blank">Building Predictive Models for Credit Risk</a>.</p>
<section id="logistic-regression-glm-for-interpretability" class="level3">
<h3 class="anchored" data-anchor-id="logistic-regression-glm-for-interpretability">1. Logistic Regression (GLM) for Interpretability</h3>
<p>I initially built a Logistic Regression model (GLM), which provides easily interpretable coefficient estimates. Due to the nature of the model and to avoid singularity issues, I excluded <strong><code>loan_grade</code></strong> and <strong><code>loan_int_rate</code></strong> (as they are likely set by the bank based on pre-assessment).</p>
<p>For categorical features with high cardinality, such as <code>city</code> and <code>country</code>, I used <strong>one-hot encoding</strong>. I then employed <strong>stepwise selection (MASS::stepAIC)</strong>, an optimization technique that systematically adds or removes variables to minimize the model’s AIC (Akaike Information Criterion), yielding a balance between fit and simplicity.</p>
<p>However, when reviewing the stepwise results, I made sure that if a categorical variable (like <code>person_home_ownership</code>) was selected only through one of its encoded levels (like <code>person_home_ownershipRENT</code>), all of its levels were included in the final model, so that the variable enters the model as a whole. The final GLM, using only the selected significant variables (<strong><code>person_home_ownership</code>, <code>person_emp_length</code>, <code>loan_intent</code>, <code>loan_amnt</code>, <code>loan_to_income_ratio</code>, <code>cb_person_default_on_file</code></strong>), performed well, achieving a <strong>Cross-Validated ROC (AUC) of 0.809076</strong>.</p>
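<p>The rule of keeping a categorical variable whole once any of its encoded levels is selected can be sketched like this. The example is plain Python for illustration only; <code>parent_factors</code> is a hypothetical helper, the selected terms are made up, and it assumes dummy columns are named as the variable name followed by the level, as in <code>person_home_ownershipRENT</code>.</p>

```python
def parent_factors(selected_terms: list[str], factor_names: list[str]) -> list[str]:
    """Map stepwise-selected terms back to whole variables, keeping first-seen order."""
    # Try longer factor names first so that a name that is a prefix of another does not shadow it.
    by_length = sorted(factor_names, key=len, reverse=True)
    kept = []
    for term in selected_terms:
        # A dummy column is named factor + level; a numeric term has no matching factor prefix.
        owner = next((f for f in by_length if term.startswith(f)), term)
        if owner not in kept:
            kept.append(owner)
    return kept

# Hypothetical stepwise output: two dummy levels plus two numeric predictors.
selected = ["person_home_ownershipRENT", "loan_amnt", "loan_intentMEDICAL", "loan_to_income_ratio"]
factors = ["person_home_ownership", "loan_intent"]
final_terms = parent_factors(selected, factors)
```

<p>The resulting list names whole variables, so refitting the model on it brings every level of each selected factor back in.</p>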
<p>The diagnostics also confirmed that the model was well-behaved, with a dispersion parameter less than 1, indicating <strong>no overdispersion</strong>.</p>
</section>
<section id="advanced-modeling-random-forest-and-xgboost" class="level3">
<h3 class="anchored" data-anchor-id="advanced-modeling-random-forest-and-xgboost">2. Advanced Modeling: Random Forest and XGBoost</h3>
<p>To benchmark and potentially surpass the GLM performance, I implemented a <strong>Random Forest</strong> model. Since Random Forests can handle correlated features well, I initially trained this model on the full dataset (excluding only the two variables noted above).</p>
<div id="fig-variable-importance" class="quarto-float quarto-figure quarto-figure-center anchored" data-fig-align="center">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-variable-importance-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://frequentist.org/posts/20250927-credit-risk-analytics/rf-variable-importance-1.svg" class="img-fluid quarto-figure quarto-figure-center figure-img" width="576">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-variable-importance-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;4: The variable importance plot from the Random Forest model trained on the full dataset helps identify the least important features by both the Accuracy and Gini criteria.
</figcaption>
</figure>
</div>
<p>The initial Random Forest model achieved an AUC of <strong>0.8519</strong>. Using the variable importance metrics (Mean Decrease Accuracy and Mean Decrease Gini), I identified features contributing little to predictive power, such as <code>gender</code>, <code>marital_status</code>, and <code>education_level</code>. After dropping these weak predictors, the simplified Random Forest performed slightly better, reaching an AUC of <strong>0.861</strong>.</p>
<p>Finally, I tested <strong>XGBoost (Extreme Gradient Boosting)</strong>, a gradient-boosted tree ensemble known for strong predictive accuracy. After encoding the categorical variables into a numeric matrix (XGBoost requires numeric input), this model delivered the best result: an AUC of <strong>0.8963</strong>. XGBoost’s feature importance analysis indicated that <strong><code>loan_to_income_ratio</code></strong> and <strong><code>person_income</code></strong> were the most influential predictors, followed by <strong><code>person_home_ownership</code></strong>.</p>
<div id="fig-roc-plot" class="quarto-float quarto-figure quarto-figure-center anchored" data-fig-align="center">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-roc-plot-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://frequentist.org/posts/20250927-credit-risk-analytics/xgboost-1.svg" class="img-fluid quarto-figure quarto-figure-center figure-img" width="576">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-roc-plot-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;5: The ROC plot shows the relationship between the true positive rate (sensitivity) and the false positive rate (1 – specificity). The closer the curve is to the top-left corner, the better the model.
</figcaption>
</figure>
</div>
<p>The superior performance of XGBoost led me to select it as the engine for the portfolio simulation component of the dashboard, while the GLM was retained for the interpretable marginal effects calculations.</p>
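<p>The AUC values used to compare the three models have a simple interpretation: the probability that a randomly chosen defaulter receives a higher score than a randomly chosen non-defaulter. The sketch below computes it that way in plain Python. It is an illustration only, not the code behind the reported numbers; <code>roc_auc</code> and the toy labels and scores are assumptions.</p>

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical labels (1 = default) and predicted probabilities of default.
y_true = [1, 0, 1, 0, 0, 1]
y_score = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
auc = roc_auc(y_true, y_score)
```

<p>The pairwise loop is quadratic in the number of records, which is fine for a toy example; dedicated libraries use a rank-based computation instead.</p>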
</section>
</section>
<section id="phase-3-operationalizing-models-via-power-query-r-scripts" class="level2">
<h2 class="anchored" data-anchor-id="phase-3-operationalizing-models-via-power-query-r-scripts">Phase 3: Operationalizing Models via Power Query R Scripts</h2>
<p>A critical technical challenge was integrating these analytical processes so the dashboard could update dynamically. I created R scripts to be embedded directly into Power Query, documented in <a href="03_power_query.html" target="_blank">Scripts for Power Query</a> and <a href="04_synthetic_data.html" target="_blank">Synthetic Data For Simulations</a>.</p>
<section id="calculating-marginal-effects" class="level3">
<h3 class="anchored" data-anchor-id="calculating-marginal-effects">1. Calculating Marginal Effects</h3>
<p>For the <strong>Default Risk Calculator</strong> section of the dashboard, I needed to show how a change in an individual factor affects the probability of default (PD). For this, I used the interpretable Logistic Regression model.</p>
<p>The script calculates <strong>Average Marginal Effects (AMEs)</strong> twice: on standardized predictors (to compare effect sizes) and on raw values (for the PD calculation). An AME estimates the change in the probability of default (<img src="https://latex.codecogs.com/png.latex?P">) associated with a one-unit change in a predictor variable (<img src="https://latex.codecogs.com/png.latex?x_j">), calculated as <img src="https://latex.codecogs.com/png.latex?%5Cbeta_j%20%5Ccdot%20p%20%5Ccdot%20(1-p)">, where <img src="https://latex.codecogs.com/png.latex?p"> is the average predicted probability. This expresses each driver’s impact on the probability scale rather than as a raw logistic coefficient (<img src="https://latex.codecogs.com/png.latex?%5Cbeta_j">), which is on the log-odds scale.</p>
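<p>As a minimal sketch of the formula above, the following plain Python computes the marginal effect of each predictor from its coefficient and the mean predicted probability. The function <code>marginal_effects</code>, the coefficients, and the fitted probabilities are hypothetical; the project’s own calculation is done in R.</p>

```python
def marginal_effects(coefficients: dict[str, float], predicted_pd: list[float]) -> dict[str, float]:
    """Apply beta_j * p * (1 - p) with p set to the mean predicted probability."""
    p = sum(predicted_pd) / len(predicted_pd)
    return {name: beta * p * (1.0 - p) for name, beta in coefficients.items()}

# Hypothetical coefficients and predicted probabilities, for illustration only.
betas = {"loan_to_income_ratio": 4.0, "person_emp_length": -0.05}
fitted = [0.1, 0.2, 0.3]
effects = marginal_effects(betas, fitted)
```

<p>Here the mean probability is 0.2, so every coefficient is scaled by 0.16. Averaging the per-record products instead of plugging in the mean probability would give a slightly different number; the sketch follows the formula as stated in the text.</p>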
<p>The GLM used for this analysis achieved a Recall of <strong>0.9572</strong> and a Precision of <strong>0.8494</strong>.</p>
</section>
<section id="generating-synthetic-data-for-simulation" class="level3">
<h3 class="anchored" data-anchor-id="generating-synthetic-data-for-simulation">2. Generating Synthetic Data for Simulation</h3>
<p>To implement the <strong>Portfolio Simulation</strong> feature, I needed a large, realistic dataset capable of reflecting various scenarios. I generated <strong>50,000 synthetic borrowers</strong>.</p>
<p>The generation process involved sampling the original data while ensuring that the proportions of categorical features, such as <code>loan_intent</code> and <code>person_home_ownership</code>, were preserved. To simulate realistic diversity, I added <strong>jitter</strong> (random noise) to numeric features like <code>person_age</code> and <code>loan_amnt</code>. Finally, I applied the XGBoost model, pre-trained on the original data, to these 50,000 synthetic records to compute a <strong><code>Predicted_PD</code></strong> for each one. I also added categorization bins (e.g., <code>age_bin</code>, <code>income_bin</code>) to the synthetic dataset to enable interactive slicing and scenario testing within the dashboard.</p>
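<p>The sampling-plus-jitter idea can be sketched as follows in plain Python. This is an illustration under stated assumptions, not the project’s R script: <code>synthesize</code> is a hypothetical helper, it stratifies on a single categorical field only, it uses multiplicative noise of up to 5 percent, and the four toy borrowers are invented.</p>

```python
import random
from collections import Counter

def synthesize(rows, n, category, numeric, noise=0.05, seed=42):
    """Draw n rows stratified by one categorical field, then jitter the numeric fields."""
    rng = random.Random(seed)
    shares = Counter(row[category] for row in rows)
    synthetic = []
    for level, count in shares.items():
        pool = [row for row in rows if row[category] == level]
        # Quota keeps each level's share close to its share in the original data.
        quota = round(n * count / len(rows))
        for _ in range(quota):
            new_row = dict(rng.choice(pool))
            for field in numeric:
                # Multiplicative noise of up to +/- 5 percent by default.
                new_row[field] = new_row[field] * (1.0 + rng.uniform(-noise, noise))
            synthetic.append(new_row)
    return synthetic

# Hypothetical toy borrowers, not real records.
original = [
    {"loan_intent": "EDUCATION", "person_age": 23, "loan_amnt": 5000},
    {"loan_intent": "EDUCATION", "person_age": 31, "loan_amnt": 12000},
    {"loan_intent": "MEDICAL", "person_age": 45, "loan_amnt": 8000},
    {"loan_intent": "MEDICAL", "person_age": 52, "loan_amnt": 3000},
]
borrowers = synthesize(original, n=1000, category="loan_intent", numeric=["person_age", "loan_amnt"])
```

<p>Because quotas are rounded per level, the total can differ from <code>n</code> by a few rows when the shares do not divide evenly. A scoring step would then attach a predicted PD to every synthetic row.</p>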
</section>
</section>
<section id="phase-4-building-power-bi-report" class="level2">
<h2 class="anchored" data-anchor-id="phase-4-building-power-bi-report">Phase 4: Building Power BI Report</h2>
<section id="power-query-integration" class="level3">
<h3 class="anchored" data-anchor-id="power-query-integration">1. Power Query Integration</h3>
<p>There are two ways to run R scripts in Power BI: through the R visual or via Power Query. R visuals are mainly for creating plots and have multiple limitations. Power Query is more versatile because the script runs as a data preparation step, before the data is loaded into the report.</p>
<div id="fig-power-query" class="quarto-float quarto-figure quarto-figure-center anchored" data-fig-align="center">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-power-query-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://frequentist.org/posts/20250927-credit-risk-analytics/screenshot-power-query.png" class="img-fluid quarto-figure quarto-figure-center figure-img" width="576">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-power-query-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;6: An R script can be integrated into a Power BI report using Power Query.
</figcaption>
</figure>
</div>
</section>
<section id="dashboard-design" class="level3">
<h3 class="anchored" data-anchor-id="dashboard-design">2. Dashboard Design</h3>
<p>The Power Query R scripts run only when the data is refreshed, so parameters cannot be adjusted interactively through them. Because this ruled out advanced models such as Gradient Boosting for interactive calculations, the <strong>Default Risk Calculator</strong> uses the coefficients of the GLM trained on non-standardized data: a DAX measure sums the intercept and the coefficient-value products and converts the result to a predicted probability of default with the inverse-logit function <img src="https://latex.codecogs.com/png.latex?1/(1%20+%20%5Cexp(-(%5Cbeta_0%20+%20%5Cbeta_1x_1%20+...+%5Cbeta_kx_k)))">.</p>
<p>In the <strong>Portfolio Simulation</strong> there was no need to calculate PDs interactively, so I created DAX measures and slicers that filter the pre-calculated synthetic dataset, effectively allowing users to simulate different portfolio compositions and see the impact on overall default rates and losses. The underlying XGBoost model reached a Sensitivity of <strong>0.9831</strong> and a Specificity of <strong>0.7012</strong>.</p>
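<p>The arithmetic behind the calculator is small enough to show in full. The sketch below reproduces the inverse-logit step in plain Python purely for illustration; in the report it lives in a DAX measure, and the intercept, coefficients, and applicant values here are hypothetical stand-ins for the fitted GLM.</p>

```python
import math

def predicted_pd(intercept: float, coefficients: dict[str, float], borrower: dict[str, float]) -> float:
    """Inverse-logit of the linear predictor beta_0 + sum(beta_i * x_i)."""
    linear = intercept + sum(beta * borrower[name] for name, beta in coefficients.items())
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical coefficients and inputs; the real values come from the fitted GLM.
coefs = {"loan_to_income_ratio": 4.0, "person_emp_length": -0.05}
applicant = {"loan_to_income_ratio": 0.5, "person_emp_length": 4.0}
pd_estimate = predicted_pd(-1.8, coefs, applicant)
```

<p>With these made-up numbers the linear predictor is zero, so the predicted probability of default is 0.5. Raising the loan-to-income ratio pushes the predictor up and the probability towards 1.</p>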
<p>The final dashboard was structured into three main sections:</p>
<ul>
<li><p><strong>Portfolio Overview:</strong> key statistics such as total borrowers, portfolio value, and overall default rate. Users can cross-filter to explore how different segments (e.g., by loan intent) perform.</p></li>
<li><p><strong>Default Risk Calculator:</strong> an interactive tool allowing users to input borrower characteristics and see the predicted probability of default along with marginal effects for each factor.</p></li>
<li><p><strong>Portfolio Simulation:</strong> a scenario analysis tool where users can modify variables, such as the proportions of loans by intent or thresholds based on borrower data like loan-to-income (LTI) ratio, to observe the projected effects on portfolio default rates, losses, and profits.</p></li>
</ul>
<div class="quarto-layout-panel" data-layout="[[1,1,1], [1]]">
<div class="quarto-layout-row">
<div class="quarto-layout-cell" style="flex-basis: 33.3%;justify-content: flex-start;">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="September_2025_DataDNA_Credit_Risk_Analytics_Challenge_0001.svg" class="lightbox" data-gallery="quarto-lightbox-gallery-1" title="Credit Portfolio Overview"><img src="https://frequentist.org/posts/20250927-credit-risk-analytics/September_2025_DataDNA_Credit_Risk_Analytics_Challenge_0001.svg" class="img-fluid figure-img" alt="Credit Portfolio Overview"></a></p>
<figcaption>Credit Portfolio Overview</figcaption>
</figure>
</div>
</div>
<div class="quarto-layout-cell" style="flex-basis: 33.3%;justify-content: flex-start;">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="September_2025_DataDNA_Credit_Risk_Analytics_Challenge_0002.svg" class="lightbox" data-gallery="quarto-lightbox-gallery-2" title="Default Risk Calculator"><img src="https://frequentist.org/posts/20250927-credit-risk-analytics/September_2025_DataDNA_Credit_Risk_Analytics_Challenge_0002.svg" class="img-fluid figure-img" alt="Default Risk Calculator"></a></p>
<figcaption>Default Risk Calculator</figcaption>
</figure>
</div>
</div>
<div class="quarto-layout-cell" style="flex-basis: 33.3%;justify-content: flex-start;">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="September_2025_DataDNA_Credit_Risk_Analytics_Challenge_0003.svg" class="lightbox" data-gallery="quarto-lightbox-gallery-3" title="Portfolio Simulation"><img src="https://frequentist.org/posts/20250927-credit-risk-analytics/September_2025_DataDNA_Credit_Risk_Analytics_Challenge_0003.svg" class="img-fluid figure-img" alt="Portfolio Simulation"></a></p>
<figcaption>Portfolio Simulation</figcaption>
</figure>
</div>
</div>
</div>
</div>
</section>
</section>
<section id="conclusion" class="level2">
<h2 class="anchored" data-anchor-id="conclusion">Conclusion</h2>
<p>This project combined data science, statistical modeling, and business intelligence. Using R for data processing and modeling and Power BI for reporting, I built an interactive tool for credit risk analysis.</p>
<section id="key-features" class="level3">
<h3 class="anchored" data-anchor-id="key-features">Key Features</h3>
<ul>
<li><p>All data processing and modeling are done <strong>within Power BI</strong> using embedded R scripts, ensuring a single, portable .pbix file.</p></li>
<li><p>The <strong>Default Risk Calculator</strong> uses a Logistic Regression model for interpretability, allowing users to understand the impact of various factors on default probability.</p></li>
<li><p>The <strong>Portfolio Simulation</strong> utilizes a high-performing XGBoost model to provide realistic scenario analysis based on synthetic data.</p></li>
</ul>
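<p>To illustrate why a logistic regression is easy to interpret, here is a minimal sketch on simulated data. The column names (<code>lti</code>, <code>prior_default</code>), the coefficients used to generate the outcome, and the applicant values are all assumptions made for this example; they are not the dataset or the fitted model behind the dashboard.</p>

```r
# Illustrative only: simulated data and hypothetical column names,
# not the dataset or model from the dashboard.
set.seed(42)
n <- 2000
loans <- data.frame(
  lti = runif(n, 0.05, 0.6),         # loan-to-income ratio
  prior_default = rbinom(n, 1, 0.2)  # 1 if the borrower defaulted before
)

# Assumed relationship used only to generate the simulated outcome
log_odds <- -3 + 5 * loans$lti + 1 * loans$prior_default
loans$default <- rbinom(n, 1, plogis(log_odds))

fit <- glm(default ~ lti + prior_default, data = loans, family = binomial())

# exp(coefficient) is the multiplicative change in the odds of default
# per one-unit increase in a predictor, holding the others fixed
odds_ratios <- exp(coef(fit))
odds_ratios

# Predicted default probability for one hypothetical applicant
applicant <- data.frame(lti = 0.4, prior_default = 0)
p_default <- predict(fit, newdata = applicant, type = "response")
p_default
```

<p>Each coefficient maps to an odds ratio that can be read on its own, which is the kind of transparency a calculator aimed at business users can benefit from. A boosted-tree model typically needs extra tooling to give a comparable explanation.</p>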
</section>
</section>
<section id="references" class="level2">
<h2 class="anchored" data-anchor-id="references">References</h2>
<ul>
<li><p><a href="https://datadna.onyxdata.co.uk/challenges/september-2025-datadna-credit-risk-analytics-challenge/" target="_blank">September 2025 DataDNA - Credit Risk Analytics Challenge - Onyx Data</a> — challenge web-page.</p></li>
<li><p><a href="https://github.com/AxesAccess/DataDNA-Challenge-Credit-Risk-Analytics-September-2025" target="_blank">DataDNA Challenge Credit Risk Analytics September 2025</a> — GitHub repository.</p></li>
<li><p><a href="https://app.powerbi.com/view?r=eyJrIjoiNTkzOTc5OTQtYmQxNS00YzlmLWE3OWYtY2JjYWUwNTI3MGEzIiwidCI6ImZmYzg3OTVlLTAxODUtNDg5Yi05ZGE2LTQ5MDI0MTJmMDNhMCIsImMiOjl9" target="_blank">Interactive dashboard running in Power BI service</a>.</p></li>
</ul>
</section>
<section id="see-also" class="level2">
<h2 class="anchored" data-anchor-id="see-also">See Also</h2>
<p>Here are other related articles you might find interesting:</p>
<div id="listing-posts" class="quarto-listing quarto-listing-container-table">
<table class="quarto-listing-table table">
<thead>
<tr>

<th>
 
</th>

<th>
Title
</th>

<th>
Reading Time
</th>

</tr>
</thead>
<tbody class="list">

<tr data-index="0" data-categories="TUwlMkNBbmltYXRpb24lMkNS" data-listing-date-sort="1771027200000" data-listing-file-modified-sort="1770982202443" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="9" data-listing-word-count-sort="1687" data-listing-title-sort="Implementing a Neural Network in Base R" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20260214-backpropagating-love/image.gif" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20260214-backpropagating-love/index.html" class="title listing-title">Implementing a Neural Network in Base R</a>
</td>
<td>
<span class="listing-reading-time">9 min</span>
</td>

</tr>

<tr data-index="1" data-categories="S1BJJTIwRGVzaWduJTJDQkklMkNTdHJhdGVneQ==" data-listing-date-sort="1769558400000" data-listing-file-modified-sort="1772988050630" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="14" data-listing-word-count-sort="2630" data-listing-title-sort="The 7-Step KPI Blueprint from Business Intelligence Analytics Perspective" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20260128-seven-step-kpi-blueprint/image.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20260128-seven-step-kpi-blueprint/index.html" class="title listing-title">The 7-Step KPI Blueprint from Business Intelligence Analytics Perspective</a>
</td>
<td>
<span class="listing-reading-time">14 min</span>
</td>

</tr>

<tr data-index="2" data-categories="QXBwJTJDVmlzdWFsaXphdGlvbiUyQ05MUCUyQ1B5dGhvbg==" data-listing-date-sort="1766793600000" data-listing-file-modified-sort="1770067837480" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="8" data-listing-word-count-sort="1445" data-listing-title-sort="Building a Privacy-First LinkedIn Analytics Platform" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20251227-linkedin-analytics/image.png" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20251227-linkedin-analytics/index.html" class="title listing-title">Building a Privacy-First LinkedIn Analytics Platform</a>
</td>
<td>
<span class="listing-reading-time">8 min</span>
</td>

</tr>

<tr data-index="3" data-categories="QkklMkNTdGF0aXN0aWNzJTJDTUwlMkNWaXN1YWxpemF0aW9u" data-listing-date-sort="1764115200000" data-listing-file-modified-sort="1767873604969" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="5" data-listing-word-count-sort="982" data-listing-title-sort="Building an E-Commerce Dashboard with Power BI and R" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20251126-e-commerce-dashboard/0001.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20251126-e-commerce-dashboard/index.html" class="title listing-title">Building an E-Commerce Dashboard with Power BI and R</a>
</td>
<td>
<span class="listing-reading-time">5 min</span>
</td>

</tr>

<tr data-index="4" data-categories="QSUyRkIlMjBUZXN0aW5nJTJDU3RhdGlzdGljcyUyQ1I=" data-listing-date-sort="1761868800000" data-listing-file-modified-sort="1767873908846" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="6" data-listing-word-count-sort="1178" data-listing-title-sort="Propensity Score Matching for Causal Analysis" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20251028-propensity-score-matching/image.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20251028-propensity-score-matching/index.html" class="title listing-title">Propensity Score Matching for Causal Analysis</a>
</td>
<td>
<span class="listing-reading-time">6 min</span>
</td>

</tr>

<tr data-index="5" data-categories="QkklMkNTdGF0aXN0aWNzJTJDTUwlMkNWaXN1YWxpemF0aW9uJTJDUg==" data-listing-date-sort="1761436800000" data-listing-file-modified-sort="1767873994105" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="6" data-listing-word-count-sort="1109" data-listing-title-sort="Building the Analytical Dashboard with Power BI and R" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" data-src="../../posts/20251026-cfpb-dashboard/0001.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20251026-cfpb-dashboard/index.html" class="title listing-title">Building the Analytical Dashboard with Power BI and R</a>
</td>
<td>
<span class="listing-reading-time">6 min</span>
</td>

</tr>

<tr data-index="6" data-categories="VGltZS1TZXJpZXMlMkNDbHVzdGVyaW5nJTJDUg==" data-listing-date-sort="1756252800000" data-listing-file-modified-sort="1767875578057" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="8" data-listing-word-count-sort="1558" data-listing-title-sort="Time-Series Clustering with R's dtwclust Package" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" data-src="../../posts/20250827-time-series-clustering/image.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20250827-time-series-clustering/index.html" class="title listing-title">Time-Series Clustering with R’s dtwclust Package</a>
</td>
<td>
<span class="listing-reading-time">8 min</span>
</td>

</tr>

<tr data-index="7" data-categories="QSUyRkIlMjBUZXN0aW5nJTJDUHJvZHVjdCUyQ1N0YXRpc3RpY3MlMkNS" data-listing-date-sort="1754524800000" data-listing-file-modified-sort="1759267192458" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="6" data-listing-word-count-sort="1187" data-listing-title-sort="Minimum Detectable Effect (MDE) Calculation" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" data-src="../../posts/20250807-mde/image.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20250807-mde/index.html" class="title listing-title">Minimum Detectable Effect (MDE) Calculation</a>
</td>
<td>
<span class="listing-reading-time">6 min</span>
</td>

</tr>

<tr data-index="8" data-categories="QSUyRkIlMjBUZXN0aW5nJTJDUHJvZHVjdCUyQ1N0YXRpc3RpY3M=" data-listing-date-sort="1753747200000" data-listing-file-modified-sort="1770450295226" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="37" data-listing-word-count-sort="7205" data-listing-title-sort="A/B Testing: Concepts and Techniques" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" data-src="../../posts/20250729-ab-testing/image.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20250729-ab-testing/index.html" class="title listing-title">A/B Testing: Concepts and Techniques</a>
</td>
<td>
<span class="listing-reading-time">37 min</span>
</td>

</tr>

<tr data-index="9" data-categories="VmlzdWFsaXphdGlvbiUyQ1NwYXRpYWwlMkNBbmltYXRpb24lMkNS" data-listing-date-sort="1751587200000" data-listing-file-modified-sort="1770926628882" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="7" data-listing-word-count-sort="1400" data-listing-title-sort="Animation of Spatial Data" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" data-src="../../posts/20250704-animation/image.gif" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20250704-animation/index.html" class="title listing-title">Animation of Spatial Data</a>
</td>
<td>
<span class="listing-reading-time">7 min</span>
</td>

</tr>

<tr data-index="10" data-categories="UHl0aG9uJTJDUiUyQ01hdGxhYg==" data-listing-date-sort="1739491200000" data-listing-file-modified-sort="1770926078462" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="4" data-listing-word-count-sort="607" data-listing-title-sort="Nerdy Valentine's in Python, R, and Matlab" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" data-src="../../posts/20250214-valentines/image.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20250214-valentines/index.html" class="title listing-title">Nerdy Valentine’s in Python, R, and Matlab</a>
</td>
<td>
<span class="listing-reading-time">4 min</span>
</td>

</tr>

<tr data-index="11" data-categories="QkklMkNFVEw=" data-listing-date-sort="1736121600000" data-listing-file-modified-sort="1769683733598" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="2" data-listing-word-count-sort="397" data-listing-title-sort="BI System Blueprint" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" data-src="../../posts/20250106-bi-flowchart/image.png" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20250106-bi-flowchart/index.html" class="title listing-title">BI System Blueprint</a>
</td>
<td>
<span class="listing-reading-time">2 min</span>
</td>

</tr>

<tr data-index="12" data-categories="Q29tcFZpcyUyQ01M" data-listing-date-sort="1734480000000" data-listing-file-modified-sort="1743537159492" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="10" data-listing-word-count-sort="1958" data-listing-title-sort="CV Week 2024" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" data-src="../../posts/20241218-cv-week/image.png" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20241218-cv-week/index.html" class="title listing-title">CV Week 2024</a>
</td>
<td>
<span class="listing-reading-time">10 min</span>
</td>

</tr>

<tr data-index="13" data-categories="TW9uZXklMkNS" data-listing-date-sort="1727395200000" data-listing-file-modified-sort="1735422934971" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="19" data-listing-word-count-sort="3793" data-listing-title-sort="European Tech Salaries" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" data-src="../../posts/20240927-euro-tech-money/image.png" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20240927-euro-tech-money/index.html" class="title listing-title">European Tech Salaries</a>
</td>
<td>
<span class="listing-reading-time">19 min</span>
</td>

</tr>

<tr data-index="14" data-categories="UiUyQ0dlbw==" data-listing-date-sort="1721865600000" data-listing-file-modified-sort="1735422851553" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="5" data-listing-word-count-sort="931" data-listing-title-sort="Exploring Geospatial Insights with R and rnaturalearth" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" data-src="../../posts/20240725-views-of-russia/image.png" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20240725-views-of-russia/index.html" class="title listing-title">Exploring Geospatial Insights with R and rnaturalearth</a>
</td>
<td>
<span class="listing-reading-time">5 min</span>
</td>

</tr>

</tbody>
</table>
<div class="listing-no-matching d-none">No matching items</div>

</div>



</section>

 ]]></description>
  <category>BI</category>
  <category>Statistics</category>
  <category>ML</category>
  <category>Visualization</category>
  <category>R</category>
  <guid>https://frequentist.org/posts/20250927-credit-risk-analytics/</guid>
  <pubDate>Sat, 27 Sep 2025 00:00:00 GMT</pubDate>
  <media:content url="https://frequentist.org/posts/20250927-credit-risk-analytics/image.png" medium="image" type="image/png" height="144" width="144"/>
</item>
<item>
  <title>Time-Series Clustering with R’s dtwclust Package</title>
  <dc:creator>Aleksei Prishchepo</dc:creator>
  <link>https://frequentist.org/posts/20250827-time-series-clustering/</link>
  <description><![CDATA[ 






<p>This article is a practical guide to time-series clustering with the <code>dtwclust</code> package for R (see the <a href="https://cran.r-project.org/web/packages/dtwclust/vignettes/dtwclust.pdf" target="_blank">vignette</a>). The package provides a flexible framework for implementing and comparing clustering algorithms, particularly those based on Dynamic Time Warping (DTW). The walkthrough covers data preparation, clustering, visualization, and evaluation.</p>
<section id="what-is-dynamic-time-warping-dtw" class="level2" data-number="1">
<h2 data-number="1" class="anchored" data-anchor-id="what-is-dynamic-time-warping-dtw"><span class="header-section-number">1</span> What is Dynamic Time Warping (DTW)</h2>
<p><a href="https://en.wikipedia.org/wiki/Dynamic_time_warping" target="_blank">Dynamic Time Warping (DTW)</a> is a widely used distance measure in shape-based time-series clustering. Unlike Euclidean distance, which compares points at the same time index, DTW allows “warping” the time axis of one series, stretching or compressing it, to find an optimal alignment with another. As a result, it can measure similarity between time series that differ in speed, length, or phase but share a similar overall shape.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://frequentist.org/posts/20250827-time-series-clustering/dtw-alignment.svg" class="img-fluid figure-img" style="width:5in"></p>
<figcaption>Sample alignment performed by the DTW algorithm between two series. The dashed blue lines exemplify how some points are mapped to each other, which shows how they can be warped in time. Note that the vertical position of each series was artificially altered for visualization. Credits: Alexis Sardá-Espinosa</figcaption>
</figure>
</div>
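<p>To make the idea concrete, below is a minimal base R sketch of the classic DTW recursion. It is an illustration under stated assumptions (absolute difference as the local cost, equal weight for all three steps, no warping window), and it is not the implementation used by <code>dtwclust</code>; the function name <code>dtw_distance</code> and the toy series are made up for this example.</p>

```r
# Minimal DTW distance in base R (illustrative sketch, not dtwclust's code).
# Assumptions: absolute difference as local cost, equal step weights,
# no warping window constraint.
dtw_distance <- function(x, y) {
  n <- length(x)
  m <- length(y)
  # cost[i + 1, j + 1] is the cheapest cumulative cost of aligning
  # x[1:i] with y[1:j]; the extra row and column hold the boundary.
  cost <- matrix(Inf, nrow = n + 1, ncol = m + 1)
  cost[1, 1] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      local <- abs(x[i] - y[j])
      cost[i + 1, j + 1] <- local + min(
        cost[i, j],      # advance both series
        cost[i, j + 1],  # advance x only (y[j] is reused)
        cost[i + 1, j]   # advance y only (x[i] is reused)
      )
    }
  }
  cost[n + 1, m + 1]
}

a <- c(0, 1, 2, 1, 0)
b <- c(0, 0, 1, 2, 1, 0)  # same shape as a, delayed by one step

dtw_distance(a, b)    # 0: warping absorbs the delay
sum(abs(a - b[1:5]))  # 4: index-by-index comparison penalizes the delay
```

<p>The two toy series have the same shape but are shifted by one step, so the warped alignment costs nothing while the index-by-index comparison does not. The double loop makes the quadratic cost of the basic algorithm visible, which is one reason to rely on an optimized implementation for real data.</p>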
</section>
<section id="data-preparation" class="level2" data-number="2">
<h2 data-number="2" class="anchored" data-anchor-id="data-preparation"><span class="header-section-number">2</span> Data Preparation</h2>
<p>We will use daily closing prices of various cryptocurrencies from the dYdX exchange. The data will be fetched using the <code>httr2</code> package.</p>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p><strong>dYdX Exchange</strong> is a decentralized finance (DeFi) platform that allows users to trade perpetual derivatives, margin, and spot crypto assets without a centralized intermediary.</p>
</div>
</div>
<p>Let’s start by fetching the list of available perpetual markets from the dYdX API. We will then filter for popular cryptocurrencies and retrieve their daily closing prices.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(httr2)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tibble)</span>
<span id="cb1-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(dplyr)</span>
<span id="cb1-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyr)</span>
<span id="cb1-5"></span>
<span id="cb1-6">base_url <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"https://indexer.dydx.trade/v4"</span></span>
<span id="cb1-7">full_url <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(base_url, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"/perpetualMarkets"</span>)</span>
<span id="cb1-8"></span>
<span id="cb1-9">req <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">request</span>(full_url) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">req_headers</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Accept =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"application/json"</span>)</span>
<span id="cb1-11"></span>
<span id="cb1-12">resp <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">req_perform</span>(req)</span>
<span id="cb1-13">body <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">resp_body_json</span>(resp, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">simplifyVector =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb1-14"></span>
<span id="cb1-15">markets <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">bind_rows</span>(body<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>markets) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as_tibble</span>()</span>
<span id="cb1-16">markets <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> markets <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">volume24H =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.numeric</span>(volume24H))</span>
<span id="cb1-17">markets <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> markets <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">arrange</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">desc</span>(volume24H)) </span>
<span id="cb1-18"></span>
<span id="cb1-19">markets <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(ticker, oraclePrice, priceChange24H, volume24H) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">head</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 5 × 4
  ticker   oraclePrice  priceChange24H  volume24H
  &lt;chr&gt;    &lt;chr&gt;        &lt;chr&gt;               &lt;dbl&gt;
1 ETH-USD  4585.44      -10.614039     150676770.
2 BTC-USD  112855.34    2015.74841      43359731.
3 SOL-USD  212.08       7.72            18677423.
4 XRP-USD  2.9988498849 -0.0133507301    3201248.
5 LINK-USD 23.7155      -0.694490999     2551624.</code></pre>
</div>
</div>
<p>Let’s take the 30 most traded markets.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">tickers <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lapply</span>(body<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>markets[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">31</span>], <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(market) {</span>
<span id="cb3-2">  market<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>ticker</span>
<span id="cb3-3">})</span>
<span id="cb3-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># remove MATIC-USD due to missing data</span></span>
<span id="cb3-5">tickers <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> tickers[<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span>tickers <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%in%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"MATIC-USD"</span>)]</span>
<span id="cb3-6">tickers <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> tickers <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unlist</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">array</span>()</span></code></pre></div></div>
</div>
<p>Below are helper functions to convert date-times to the required format and to fetch daily candle data from the dYdX API. At the time of writing, the API appears to ignore the <code>fromIso</code> parameter, so we filter the data after fetching it.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">to_iso8601_ns_utc <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(datetime) {</span>
<span id="cb4-2">  datetime_utc <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.POSIXct</span>(datetime, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">tz =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"UTC"</span>)</span>
<span id="cb4-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">format</span>(datetime_utc, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">format =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"%Y-%m-%dT%H:%M:%OS9Z"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">usetz =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>)</span>
<span id="cb4-4">}</span>
<span id="cb4-5"></span>
<span id="cb4-6">get_dydx_daily_candles <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(</span>
<span id="cb4-7">    market,</span>
<span id="cb4-8">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">from_iso =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span>,</span>
<span id="cb4-9">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">to_iso =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span>,</span>
<span id="cb4-10">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">limit =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span>) {</span>
<span id="cb4-11">  base_url <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"https://indexer.dydx.trade/v4"</span></span>
<span id="cb4-12">  full_url <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(base_url, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"/candles/perpetualMarkets/"</span>, market)</span>
<span id="cb4-13"></span>
<span id="cb4-14">  req <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">request</span>(full_url) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-15">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">req_headers</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Accept =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"application/json"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-16">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">req_url_query</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">resolution =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"1DAY"</span>)</span>
<span id="cb4-17"></span>
<span id="cb4-18">  from_iso <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.null</span>(from_iso)) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">to_iso8601_ns_utc</span>(from_iso) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span></span>
<span id="cb4-19">  to_iso <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.null</span>(to_iso)) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">to_iso8601_ns_utc</span>(to_iso) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span></span>
<span id="cb4-20"></span>
<span id="cb4-21">  <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.null</span>(from_iso)) req <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> req <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">req_url_query</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fromIso =</span> from_iso)</span>
<span id="cb4-22">  <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.null</span>(to_iso)) req <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> req <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">req_url_query</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">toIso =</span> to_iso)</span>
<span id="cb4-23">  <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.null</span>(limit)) req <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> req <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">req_url_query</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">limit =</span> limit)</span>
<span id="cb4-24"></span>
<span id="cb4-25">  resp <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">req_perform</span>(req)</span>
<span id="cb4-26">  body <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">resp_body_json</span>(resp, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">simplifyVector =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb4-27"></span>
<span id="cb4-28">  df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> body<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>candles</span>
<span id="cb4-29">  df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb4-30">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">startedAt =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.POSIXct</span>(startedAt,</span>
<span id="cb4-31">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">format =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"%Y-%m-%dT%H:%M:%OSZ"</span>,</span>
<span id="cb4-32">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">tz =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"UTC"</span></span>
<span id="cb4-33">    ),</span>
<span id="cb4-34">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">open =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.numeric</span>(open),</span>
<span id="cb4-35">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">high =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.numeric</span>(high),</span>
<span id="cb4-36">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">low =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.numeric</span>(low),</span>
<span id="cb4-37">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">close =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.numeric</span>(close)</span>
<span id="cb4-38">  )</span>
<span id="cb4-39">  df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as_tibble</span>()</span>
<span id="cb4-40">}</span>
<span id="cb4-41"></span>
<span id="cb4-42"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">get_dydx_daily_candles</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"BTC-USD"</span>,</span>
<span id="cb4-43">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">from_iso =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"2025-01-01T00:00:00Z"</span>,</span>
<span id="cb4-44">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">to_iso =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"2025-08-26T00:00:00Z"</span></span>
<span id="cb4-45">) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">head</span>()</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 6 × 13
  startedAt           ticker  resolution    low   high   open  close
  &lt;dttm&gt;              &lt;chr&gt;   &lt;chr&gt;       &lt;dbl&gt;  &lt;dbl&gt;  &lt;dbl&gt;  &lt;dbl&gt;
1 2025-08-28 00:00:00 BTC-USD 1DAY       110841 113400 111243 112888
2 2025-08-27 00:00:00 BTC-USD 1DAY       110395 112674 111809 111260
3 2025-08-26 00:00:00 BTC-USD 1DAY       108713 112397 110124 111797
4 2025-08-25 00:00:00 BTC-USD 1DAY       109296 113679 113537 110132
5 2025-08-24 00:00:00 BTC-USD 1DAY       110550 115662 115405 113539
6 2025-08-23 00:00:00 BTC-USD 1DAY       114569 117024 116948 115406
# ℹ 6 more variables: baseTokenVolume &lt;chr&gt;, usdVolume &lt;chr&gt;, trades &lt;int&gt;,
#   startingOpenInterest &lt;chr&gt;, orderbookMidPriceOpen &lt;chr&gt;,
#   orderbookMidPriceClose &lt;chr&gt;</code></pre>
</div>
</div>
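<p>Now we fetch daily closing prices for each cryptocurrency in our selection. For every ticker we keep only the timestamp and the closing price, then z-score normalize the series so that clustering compares the shape of the price movements rather than absolute price levels.</p>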
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">cryptos_list <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lapply</span>(tickers, <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(coin) {</span>
<span id="cb6-2">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># cat("Fetching data for:", coin, "\n")</span></span>
<span id="cb6-3">  df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">get_dydx_daily_candles</span>(coin,</span>
<span id="cb6-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">from_iso =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"2025-01-01T00:00:00Z"</span>,</span>
<span id="cb6-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">to_iso =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"2025-08-26T00:00:00Z"</span></span>
<span id="cb6-6">  )</span>
<span id="cb6-7">  df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(startedAt <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.POSIXct</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"2025-01-01"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">tz =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"UTC"</span>))</span>
<span id="cb6-8">  df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ds =</span> startedAt, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> close) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">arrange</span>(ds)</span>
<span id="cb6-9">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># z-score normalization</span></span>
<span id="cb6-10">  df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> (y <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(y, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sd</span>(y, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>))</span>
<span id="cb6-11">  df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>coin <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gsub</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"-USD"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">toupper</span>(coin))</span>
<span id="cb6-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">return</span>(df)</span>
<span id="cb6-13">})</span>
<span id="cb6-14"></span>
<span id="cb6-15">cryptos_list <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">bind_rows</span>(cryptos_list)</span></code></pre></div></div>
</div>
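<p>To see why the z-score step matters, here is a small sketch with two made-up price series (the values are illustrative, not real market data) that move identically but trade at very different price levels. After normalization they coincide, so a shape-based distance treats them as the same pattern.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Two hypothetical price series with the same shape but different scales
btc_like &lt;- c(100000, 102000, 101000, 105000, 104000)
doge_like &lt;- btc_like / 500000

# z-score normalization: subtract the mean, divide by the standard deviation
z_score &lt;- function(y) (y - mean(y, na.rm = TRUE)) / sd(y, na.rm = TRUE)

z_btc &lt;- z_score(btc_like)
z_doge &lt;- z_score(doge_like)

# Both normalized series have mean 0 and standard deviation 1 ...
round(c(mean = mean(z_btc), sd = sd(z_btc)), 10)

# ... and they are numerically equal despite the scale difference
all.equal(z_btc, z_doge)</code></pre></div>
</div>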
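<p>Next, we reshape the data into a wide format, where each row is a date and each column holds the normalized closing prices of one cryptocurrency. We then drop the date column and convert the table into a named list of numeric vectors, one per coin, which is one of the input formats <code>tsclust()</code> accepts.</p>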
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1">cryptos_list_wide <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> cryptos_list  <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb7-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_wider</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_from =</span> coin, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_from =</span> y) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb7-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">arrange</span>(ds) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb7-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>ds) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span>  </span>
<span id="cb7-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.list</span>()</span></code></pre></div></div>
</div>
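<p>As a quick check of what this reshaping produces, the sketch below applies the same steps to a tiny made-up long table (the names <code>toy_long</code> and <code>toy_series</code> and all values are illustrative). The result is a named list with one numeric vector per coin.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(dplyr)
library(tidyr)

# Hypothetical long-format data: one row per (date, coin) pair
toy_long &lt;- tibble(
  ds = rep(as.Date("2025-01-01") + 0:2, times = 2),
  y = c(-1, 0, 1, 1, 0, -1),
  coin = rep(c("AAA", "BBB"), each = 3)
)

toy_series &lt;- toy_long |&gt;
  pivot_wider(names_from = coin, values_from = y) |&gt; # one column per coin
  arrange(ds) |&gt;
  select(-ds) |&gt; # drop the date column, keep only the series
  as.list() # one named numeric vector per coin

str(toy_series)</code></pre></div>
</div>
<p>If some coins have no data for certain dates, <code>pivot_wider()</code> fills those cells with <code>NA</code>, so it is worth checking for missing values before clustering.</p>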
</section>
<section id="performing-hierarchical-clustering" class="level2" data-number="3">
<h2 data-number="3" class="anchored" data-anchor-id="performing-hierarchical-clustering"><span class="header-section-number">3</span> Performing Hierarchical Clustering</h2>
<p>We perform hierarchical clustering with the DTW distance and the “ward.D2” agglomeration method. Hierarchical clustering builds the full hierarchy of groups first, so the number of clusters does not have to be fixed before the tree is built, and the procedure is deterministic. The call below uses the following arguments:</p>
<ul>
<li><p><code>k = 4</code> specifies the number of clusters into which the hierarchy is cut.</p></li>
<li><p><code>type = "hierarchical"</code> sets the clustering algorithm type.</p></li>
<li><p><code>distance = "dtw"</code> uses Dynamic Time Warping distance.</p></li>
<li><p><code>seed = 42</code> fixes the random seed for reproducibility; hierarchical clustering itself is deterministic, so this matters mainly for algorithms with random initialization.</p></li>
<li><p><code>control = hierarchical_control(method = "ward.D2")</code> specifies the linkage method.</p></li>
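<li><p><code>args = tsclust_args(dist = list(window.size = 7))</code> sets the DTW window constraint, which limits how far apart in time two matched observations may be (here 7 daily observations).</p></li>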
</ul>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(dtwclust)</span>
<span id="cb8-2"></span>
<span id="cb8-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Perform hierarchical clustering</span></span>
<span id="cb8-4">hc_4_ward <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tsclust</span>(cryptos_list_wide,</span>
<span id="cb8-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">k =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>,</span>
<span id="cb8-6">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"hierarchical"</span>,</span>
<span id="cb8-7">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">distance =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"dtw"</span>,</span>
<span id="cb8-8">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">seed =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">42</span>,</span>
<span id="cb8-9">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">control =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hierarchical_control</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">method =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ward.D2"</span>),</span>
<span id="cb8-10">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">args =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tsclust_args</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">dist =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">window.size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>))</span>
<span id="cb8-11">)</span>
<span id="cb8-12"></span>
<span id="cb8-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># View the clustering summary</span></span>
<span id="cb8-14">hc_4_ward</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>hierarchical clustering with 4 clusters
Using dtw distance
Using PAM (Hierarchical) centroids
Using method ward.D2 

Time required for analysis:
   user  system elapsed 
  2.958   0.047   3.005 

Cluster sizes with average intra-cluster distance:

  size  av_dist
1    4 87.52806
2    2 80.73321
3   12 66.03449
4   12 51.73463</code></pre>
</div>
</div>
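<p>The output reports the distance measure, the centroid method, the linkage method, and the size of each cluster with its average intra-cluster distance. Here the two largest clusters (12 series each) are also the most compact, while cluster 1, with four series, has the largest average distance.</p>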
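<p>For intuition about what a DTW distance with a window constraint computes, here is a minimal base-R sketch of the classic dynamic-programming recursion restricted to a band around the diagonal. It is an illustration under simplifying assumptions (absolute-difference local cost, symmetric steps, no normalization) and not the implementation <code>dtwclust</code> uses; the function name <code>dtw_window</code> is made up for this example.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Minimal DTW sketch: cumulative cost of the best alignment between x and y,
# where point i of x may only be matched with points j of y with |i - j| &lt;= window
dtw_window &lt;- function(x, y, window = Inf) {
  n &lt;- length(x)
  m &lt;- length(y)
  # The band must be wide enough to reach the corner (n, m)
  window &lt;- max(window, abs(n - m))
  cost &lt;- matrix(Inf, nrow = n + 1, ncol = m + 1)
  cost[1, 1] &lt;- 0
  for (i in seq_len(n)) {
    j_from &lt;- max(1, i - window)
    j_to &lt;- min(m, i + window)
    for (j in j_from:j_to) {
      local &lt;- abs(x[i] - y[j])
      # Extend the cheapest of the three admissible predecessor alignments
      cost[i + 1, j + 1] &lt;- local + min(
        cost[i, j + 1], # step in x only
        cost[i + 1, j], # step in y only
        cost[i, j] # step in both
      )
    }
  }
  cost[n + 1, m + 1]
}

a &lt;- c(0, 0, 1, 2, 1, 0, 0)
b &lt;- c(0, 1, 2, 1, 0, 0, 0) # same bump, shifted one step earlier

sum(abs(a - b)) # point-by-point comparison penalizes the shift
dtw_window(a, b, window = 1) # warping within the band absorbs it
dtw_window(a, b, window = 0) # a zero-width band forbids warping</code></pre></div>
</div>
<p>A wider window allows more flexible alignments but also more computation, which is why a modest constraint is a common compromise for daily data.</p>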
</section>
<section id="accessing-clustering-results" class="level2" data-number="4">
<h2 data-number="4" class="anchored" data-anchor-id="accessing-clustering-results"><span class="header-section-number">4</span> Accessing Clustering Results</h2>
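<p>The <code>tsclust()</code> function returns an S4 object that inherits from class <code>TSClusters</code> (for hierarchical clustering, a <code>HierarchicalTSClusters</code> object). Its slots, such as the cluster assignments, are accessed with the <code>@</code> operator.</p>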
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># View cluster assignments for each time series</span></span>
<span id="cb10-2">hc_4_ward<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">@</span>cluster</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code> BTC  ETH LINK  CRV  SOL  ADA AVAX  FIL  LTC DOGE ATOM  DOT  UNI  BCH  TRX NEAR 
   1    2    3    2    3    3    3    4    3    3    4    4    3    1    1    4 
 MKR  XLM  ETC COMP  WLD  APE  APT  ARB BLUR  LDO   OP PEPE  SEI SHIB 
   1    3    3    4    4    4    4    3    4    3    4    4    3    4 </code></pre>
</div>
</div>
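<p>Because the <code>cluster</code> slot is a named integer vector, base R tools are enough to list the members of each cluster. The sketch below uses a hand-copied subset of the assignments printed above so that it runs on its own; with the real object you would pass <code>hc_4_ward@cluster</code> instead.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Subset of the assignments shown above, copied by hand for illustration
assignments &lt;- c(BTC = 1L, ETH = 2L, LINK = 3L, CRV = 2L, FIL = 4L, BCH = 1L)

# Names of the series in each cluster
members &lt;- split(names(assignments), assignments)
members

# Number of series per cluster
table(assignments)</code></pre></div>
</div>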
</section>
<section id="visualizing-clustering-results" class="level2" data-number="5">
<h2 data-number="5" class="anchored" data-anchor-id="visualizing-clustering-results"><span class="header-section-number">5</span> Visualizing Clustering Results</h2>
<p>The <code>plot()</code> method for <code>TSClusters</code> objects supports several visualization types, selected with the <code>type</code> argument.</p>
<section id="dendrogram" class="level3" data-number="5.1">
<h3 data-number="5.1" class="anchored" data-anchor-id="dendrogram"><span class="header-section-number">5.1</span> Dendrogram</h3>
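<p>A dendrogram shows the hierarchy of merges: series that join lower in the tree are more similar to each other. Calling <code>plot()</code> on a hierarchical result without a <code>type</code> argument draws it.</p>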
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">par</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mar =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Adjust margins for better plot</span></span>
<span id="cb12-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot</span>(hc_4_ward,</span>
<span id="cb12-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xlab =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ylab =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sub =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>,</span>
<span id="cb12-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">main =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Hierarchical Clustering Dendrogram (DTW, Ward.D2)"</span></span>
<span id="cb12-5">)</span></code></pre></div></div>
<div class="cell-output-display">
<div id="fig-dendrogram-2" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-dendrogram-2-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://frequentist.org/posts/20250827-time-series-clustering/index_files/figure-html/fig-dendrogram-2-1.svg" class="img-fluid figure-img">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-dendrogram-2-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;1: Dendrogram of hierarchical clustering using DTW distance and Ward.D2 linkage.
</figcaption>
</figure>
</div>
</div>
</div>
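<p>To make the link between the dendrogram and <code>k</code> concrete, the following sketch uses the base R functions <code>hclust()</code> and <code>cutree()</code> on a small made-up data set (with Euclidean rather than DTW distances, purely for illustration). Cutting the same tree at different levels yields different numbers of clusters without refitting.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Six hypothetical points forming three tight pairs; the third pair is far away
points &lt;- matrix(c(0, 0.1, 5, 5.1, 20, 20.1),
  ncol = 1,
  dimnames = list(c("A", "B", "C", "D", "E", "F"), NULL)
)

# Build the hierarchy once with Ward linkage on Euclidean distances
tree &lt;- hclust(dist(points), method = "ward.D2")

# Cut the same tree into different numbers of clusters
cutree(tree, k = 3) # each pair forms its own cluster
cutree(tree, k = 2) # the two nearby pairs are merged</code></pre></div>
</div>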
</section>
<section id="series-and-centroids" class="level3" data-number="5.2">
<h3 data-number="5.2" class="anchored" data-anchor-id="series-and-centroids"><span class="header-section-number">5.2</span> Series and Centroids</h3>
<p>Visualize the time series grouped by cluster, along with their representative prototypes (centroids). By default, prototypes for hierarchical clustering with PAM centroids are actual series from the data.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot</span>(hc_4_ward, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sc"</span>) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># sc = series + centroids</span></span></code></pre></div></div>
<div class="cell-output-display">
<div id="fig-series-centroids-2" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-series-centroids-2-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://frequentist.org/posts/20250827-time-series-clustering/index_files/figure-html/fig-series-centroids-2-1.svg" class="img-fluid figure-img">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-series-centroids-2-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;2: Time series clustered with their centroids using DTW distance and Ward.D2 linkage.
</figcaption>
</figure>
</div>
</div>
</div>
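<p>A PAM-style centroid (a medoid) is the member series whose total distance to the other members of its cluster is smallest. The sketch below selects the medoid from a small made-up distance matrix; the labels and distances are hypothetical and only illustrate the selection rule, not the internal code of <code>dtwclust</code>.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical pairwise distances between three series in one cluster
d &lt;- matrix(
  c(
    0, 2, 9,
    2, 0, 4,
    9, 4, 0
  ),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("AAA", "BBB", "CCC"), c("AAA", "BBB", "CCC"))
)

# Total distance from each series to all others in the cluster
total_dist &lt;- rowSums(d)
total_dist

# The medoid is the series with the smallest total distance
medoid &lt;- names(which.min(total_dist))
medoid</code></pre></div>
</div>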
<p>You can also plot a specific centroid, and even customize its appearance.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot</span>(hc_4_ward,</span>
<span id="cb14-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"centroids"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">clus =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,</span>
<span id="cb14-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">linetype =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.8</span></span>
<span id="cb14-4">)</span></code></pre></div></div>
<div class="cell-output-display">
<div id="fig-specific-centroid" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-specific-centroid-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://frequentist.org/posts/20250827-time-series-clustering/index_files/figure-html/fig-specific-centroid-1.svg" class="img-fluid figure-img">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-specific-centroid-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;3: Specific centroid (cluster 1).
</figcaption>
</figure>
</div>
</div>
</div>
</section>
</section>
<section id="comparing-multiple-clustering-solutions-and-evaluation" class="level2" data-number="6">
<h2 data-number="6" class="anchored" data-anchor-id="comparing-multiple-clustering-solutions-and-evaluation"><span class="header-section-number">6</span> Comparing Multiple Clustering Solutions and Evaluation</h2>
<p>In practice, choosing the optimal number of clusters (<code>k</code>) and other parameters is crucial. <code>dtwclust</code> allows you to test multiple configurations simultaneously and evaluate them using Cluster Validity Indices (CVIs).</p>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p><strong>Cluster Validity Indices</strong> are quantitative metrics used to assess the quality and “purity” of clustering results. Since clustering is often an unsupervised process, CVIs provide an objective way to evaluate performance, especially when comparing different clustering algorithms or configurations.</p>
</div>
</div>
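<p>As one concrete example of a CVI, the silhouette width compares, for each observation, its average distance to the members of its own cluster with its average distance to the nearest other cluster; values near 1 indicate compact, well-separated clusters. The base-R sketch below computes the mean silhouette width from a distance matrix and a vector of labels. The function name <code>mean_silhouette</code> and the toy data are assumptions made for illustration, and the sketch expects at least two clusters.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Mean silhouette width from a distance matrix and cluster labels
mean_silhouette &lt;- function(d, labels) {
  d &lt;- as.matrix(d)
  widths &lt;- vapply(seq_along(labels), function(i) {
    same &lt;- labels == labels[i]
    same[i] &lt;- FALSE # exclude the observation itself
    if (!any(same)) return(0) # convention for singleton clusters
    a &lt;- mean(d[i, same]) # cohesion: mean distance within own cluster
    other &lt;- setdiff(unique(labels), labels[i])
    # separation: mean distance to the closest other cluster
    b &lt;- min(vapply(other, function(k) mean(d[i, labels == k]), numeric(1)))
    (b - a) / max(a, b)
  }, numeric(1))
  mean(widths)
}

# Toy data: two tight, well-separated groups on a line
x &lt;- c(0, 0.2, 0.4, 10, 10.2, 10.4)
d &lt;- dist(x)

good &lt;- mean_silhouette(d, c(1, 1, 1, 2, 2, 2)) # labels match the groups
bad &lt;- mean_silhouette(d, c(1, 2, 1, 2, 1, 2)) # labels ignore the groups
c(good = good, bad = bad)</code></pre></div>
</div>
<p>In <code>dtwclust</code>, indices of this kind are available through the <code>cvi()</code> function, so there is no need to implement them by hand in practice.</p>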
<p>To accelerate the process, especially when testing many combinations, parallelization is highly recommended. <code>dtwclust</code> integrates with the <code>foreach</code> and <code>doParallel</code> packages for this purpose.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(bigmemory)</span>
<span id="cb15-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(doParallel)</span>
<span id="cb15-3"></span>
<span id="cb15-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Define a range of k values and agglomeration methods to test</span></span>
<span id="cb15-5">k_values <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span></span>
<span id="cb15-6">linkage_methods <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ward.D2"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"average"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"single"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"complete"</span>)</span>
<span id="cb15-7"></span>
<span id="cb15-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Initialize a parallel backend</span></span>
<span id="cb15-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Use detectCores() - 1 to leave one core free</span></span>
<span id="cb15-10">num_cores <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">detectCores</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb15-11"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (num_cores <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) num_cores <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Ensure at least one core is used</span></span>
<span id="cb15-12"></span>
<span id="cb15-13">cl <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">makeCluster</span>(num_cores)</span>
<span id="cb15-14"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">registerDoParallel</span>(cl)</span>
<span id="cb15-15"></span>
<span id="cb15-16"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Perform multiple hierarchical clusterings in parallel</span></span>
<span id="cb15-17">hc_par <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tsclust</span>(cryptos_list_wide,</span>
<span id="cb15-18">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">k =</span> k_values,</span>
<span id="cb15-19">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"hierarchical"</span>,</span>
<span id="cb15-20">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">distance =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"dtw"</span>,</span>
<span id="cb15-21">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">seed =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">42</span>,</span>
<span id="cb15-22">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">control =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hierarchical_control</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">method =</span> linkage_methods),</span>
<span id="cb15-23">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">args =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tsclust_args</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">dist =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">window.size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>)),</span>
<span id="cb15-24">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">trace =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span></span>
<span id="cb15-25">)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>
Calculating distance matrix...
Performing hierarchical clustering...
Extracting centroids...

    Elapsed time is 13.749 seconds.</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb17-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Stop the parallel cluster and revert to sequential computation</span></span>
<span id="cb17-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">stopCluster</span>(cl)</span>
<span id="cb17-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">registerDoSEQ</span>()</span></code></pre></div></div>
</div>
<section id="evaluate-the-results-using-internal-cvis" class="level3" data-number="6.1">
<h3 data-number="6.1" class="anchored" data-anchor-id="evaluate-the-results-using-internal-cvis"><span class="header-section-number">6.1</span> Evaluate the results using internal CVIs</h3>
<p>We’ll use <a href="https://en.wikipedia.org/wiki/Silhouette_(clustering)" target="_blank">Silhouette (Sil)</a>, <a href="https://en.wikipedia.org/wiki/Dunn_index" target="_blank">Dunn (D)</a>, and <a href="https://en.wikipedia.org/wiki/Calinski%E2%80%93Harabasz_index" target="_blank">Calinski-Harabasz (CH)</a> indices. Higher values generally indicate better clustering for these indices.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb18-1">cvi_results <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lapply</span>(hc_par, cvi, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sil"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"D"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"CH"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb18-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">do.call</span>(rbind, .)</span>
<span id="cb18-3"></span>
<span id="cb18-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Find the configuration that maximizes each CVI</span></span>
<span id="cb18-5">optimal_indices <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">apply</span>(cvi_results, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">MARGIN =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">FUN =</span> which.max)</span></code></pre></div></div>
</div>
<section id="display-cvi-results-and-optimal-configurations" class="level4" data-number="6.1.1">
<h4 data-number="6.1.1" class="anchored" data-anchor-id="display-cvi-results-and-optimal-configurations"><span class="header-section-number">6.1.1</span> Display CVI results and optimal configurations</h4>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb19-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">print</span>(cvi_results)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>            Sil         D        CH
 [1,] 0.3463459 0.1355412 21.828708
 [2,] 0.5144728 0.3810831  9.673635
 [3,] 0.1888799 0.2514864  2.709671
 [4,] 0.3369159 0.2050711 18.657689
 [5,] 0.2989078 0.1745498 13.778813
 [6,] 0.3477774 0.3091404  8.406351
 [7,] 0.3689743 0.3451739  6.165031
 [8,] 0.3290618 0.2245861 14.349598
 [9,] 0.2838572 0.2257579 12.063620
[10,] 0.3054947 0.3091404  6.059278
[11,] 0.3441495 0.4739975  5.473468
[12,] 0.2518630 0.2031676 11.466633
[13,] 0.2731625 0.2257579 10.492873
[14,] 0.3439047 0.3098938 10.261007
[15,] 0.2602655 0.4210699  7.605540
[16,] 0.2297632 0.2559787  9.802860</code></pre>
</div>
</div>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb21-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">print</span>(optimal_indices)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>Sil   D  CH 
  2  11   1 </code></pre>
</div>
</div>
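<p>The row numbers in <code>cvi_results</code> can be mapped back to a number of clusters and a linkage method. The sketch below assumes the configurations are ordered with the linkage method varying fastest within each value of <code>k</code>. That ordering is inferred from the summaries printed in the next subsections, not taken from the <code>dtwclust</code> documentation, and <code>config_grid</code> is a hypothetical helper name.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Assumption: results are ordered with the linkage method varying fastest
k_values &lt;- 3:6
linkage_methods &lt;- c("ward.D2", "average", "single", "complete")
config_grid &lt;- expand.grid(
  method = linkage_methods,
  k = k_values,
  stringsAsFactors = FALSE
)

# Rows selected by Sil, D, and CH in the output above
config_grid[c(Sil = 2, D = 11, CH = 1), ]</code></pre></div>
</div>
<p>Under that assumption, rows 2, 11, and 1 correspond to average linkage with 3 clusters, single linkage with 5 clusters, and Ward.D2 linkage with 3 clusters.</p>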
</section>
<section id="retrieve-the-best-clustering-based-on-silhouette-index" class="level4" data-number="6.1.2">
<h4 data-number="6.1.2" class="anchored" data-anchor-id="retrieve-the-best-clustering-based-on-silhouette-index"><span class="header-section-number">6.1.2</span> Retrieve the best clustering based on Silhouette index</h4>
<p>Let’s extract the clustering that achieved the highest Silhouette score.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb23-1">best_clustering_sil <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> hc_par[[optimal_indices[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sil"</span>]]]</span>
<span id="cb23-2">best_clustering_sil</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>hierarchical clustering with 3 clusters
Using dtw distance
Using PAM (Hierarchical) centroids
Using method average 

Time required for analysis:
   user  system elapsed 
  1.364   0.122  13.749 

Cluster sizes with average intra-cluster distance:

  size  av_dist
1    4 80.06529
2   25 84.14893
3    1  0.00000</code></pre>
</div>
</div>
</section>
<section id="retrieve-the-best-clustering-based-on-dunn-index" class="level4" data-number="6.1.3">
<h4 data-number="6.1.3" class="anchored" data-anchor-id="retrieve-the-best-clustering-based-on-dunn-index"><span class="header-section-number">6.1.3</span> Retrieve the best clustering based on Dunn index</h4>
<p>Similarly, we can extract the clustering that achieved the highest Dunn score.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb25" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb25-1">best_clustering_dunn <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> hc_par[[optimal_indices[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"D"</span>]]]</span>
<span id="cb25-2">best_clustering_dunn</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>hierarchical clustering with 5 clusters
Using dtw distance
Using PAM (Hierarchical) centroids
Using method single 

Time required for analysis:
   user  system elapsed 
  1.364   0.122  13.749 

Cluster sizes with average intra-cluster distance:

  size  av_dist
1    3 62.58062
2    1  0.00000
3   24 78.78400
4    1  0.00000
5    1  0.00000</code></pre>
</div>
</div>
</section>
<section id="retrieve-the-best-clustering-based-on-calinski-harabasz-index" class="level4" data-number="6.1.4">
<h4 data-number="6.1.4" class="anchored" data-anchor-id="retrieve-the-best-clustering-based-on-calinski-harabasz-index"><span class="header-section-number">6.1.4</span> Retrieve the best clustering based on Calinski-Harabasz index</h4>
<p>Finally, we can extract the clustering that achieved the highest Calinski-Harabasz score.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb27" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb27-1">best_clustering_ch <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> hc_par[[optimal_indices[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"CH"</span>]]]</span>
<span id="cb27-2">best_clustering_ch</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>hierarchical clustering with 3 clusters
Using dtw distance
Using PAM (Hierarchical) centroids
Using method ward.D2 

Time required for analysis:
   user  system elapsed 
  1.364   0.122  13.749 

Cluster sizes with average intra-cluster distance:

  size   av_dist
1    6 106.59632
2   12  66.03449
3   12  51.73463</code></pre>
</div>
</div>
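<p>These summaries let you compare the candidate solutions on more than one criterion. Here the three indices disagree: Silhouette favors 3 clusters with average linkage, Dunn favors 5 clusters with single linkage, and Calinski-Harabasz favors 3 clusters with Ward.D2 linkage. The first two solutions place 25 and 24 of the 30 series in a single cluster and leave the rest in small or singleton clusters, whereas the Ward.D2 solution is more balanced (6, 12, and 12 series).</p>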
</section>
</section>
</section>
<section id="performing-clustering-with-optimal-configuration" class="level2" data-number="7">
<h2 data-number="7" class="anchored" data-anchor-id="performing-clustering-with-optimal-configuration"><span class="header-section-number">7</span> Performing Clustering with Optimal Configuration</h2>
<p>Let’s take a closer look at the configuration with the highest Calinski-Harabasz score: 3 clusters with Ward.D2 linkage.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb29" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb29-1">hc_3_ward <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tsclust</span>(cryptos_list_wide,</span>
<span id="cb29-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">k =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>,</span>
<span id="cb29-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"hierarchical"</span>,</span>
<span id="cb29-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">distance =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"dtw"</span>,</span>
<span id="cb29-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">seed =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">42</span>,</span>
<span id="cb29-6">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">control =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hierarchical_control</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">method =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ward.D2"</span>),</span>
<span id="cb29-7">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">args =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tsclust_args</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">dist =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">window.size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>))</span>
<span id="cb29-8">)</span>
<span id="cb29-9"></span>
<span id="cb29-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># View the clustering summary</span></span>
<span id="cb29-11">hc_3_ward</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>hierarchical clustering with 3 clusters
Using dtw distance
Using PAM (Hierarchical) centroids
Using method ward.D2 

Time required for analysis:
   user  system elapsed 
  2.910   0.106   3.017 

Cluster sizes with average intra-cluster distance:

  size   av_dist
1    6 106.59632
2   12  66.03449
3   12  51.73463</code></pre>
</div>
</div>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb31" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb31-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># View cluster assignments for each time series</span></span>
<span id="cb31-2">hc_3_ward<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">@</span>cluster</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code> BTC  ETH LINK  CRV  SOL  ADA AVAX  FIL  LTC DOGE ATOM  DOT  UNI  BCH  TRX NEAR 
   1    1    2    1    2    2    2    3    2    2    3    3    2    1    1    3 
 MKR  XLM  ETC COMP  WLD  APE  APT  ARB BLUR  LDO   OP PEPE  SEI SHIB 
   1    2    2    3    3    3    3    2    3    2    3    3    2    3 </code></pre>
</div>
</div>
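<p>To read the assignments cluster by cluster, the named vector can be split by cluster label. This is a minimal sketch: the <code>assignments</code> vector is re-typed from the output above so that the snippet runs on its own, and with the fitted object available <code>hc_3_ward@cluster</code> would take its place. The names <code>assignments</code> and <code>members</code> are illustrative.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Cluster labels copied from the printed output above
assignments &lt;- c(
  BTC = 1, ETH = 1, LINK = 2, CRV = 1, SOL = 2, ADA = 2, AVAX = 2, FIL = 3,
  LTC = 2, DOGE = 2, ATOM = 3, DOT = 3, UNI = 2, BCH = 1, TRX = 1, NEAR = 3,
  MKR = 1, XLM = 2, ETC = 2, COMP = 3, WLD = 3, APE = 3, APT = 3, ARB = 2,
  BLUR = 3, LDO = 2, OP = 3, PEPE = 3, SEI = 2, SHIB = 3
)

# List of ticker symbols per cluster
members &lt;- split(names(assignments), assignments)
members</code></pre></div>
</div>
<p>The list has one element per cluster, with 6, 12, and 12 tickers, matching the cluster sizes in the summary.</p>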
<p>Plot the dendrogram for this clustering.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb33" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb33-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Plot the dendrogram</span></span>
<span id="cb33-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">par</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mar =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>))</span>
<span id="cb33-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot</span>(hc_3_ward,</span>
<span id="cb33-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xlab =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ylab =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sub =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>,</span>
<span id="cb33-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">main =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Hierarchical Clustering Dendrogram (DTW, Ward.D2)"</span></span>
<span id="cb33-6">)</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://frequentist.org/posts/20250827-time-series-clustering/index_files/figure-html/plot-dendrogram-1.svg" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>The line plots of the time series, grouped by cluster, show a distinct pattern in each cluster.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb34" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb34-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Plot time series and their centroids</span></span>
<span id="cb34-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot</span>(hc_3_ward, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sc"</span>) </span></code></pre></div></div>
<div class="cell-output-display">
<div id="fig-series-centroids-optimal" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-series-centroids-optimal-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://frequentist.org/posts/20250827-time-series-clustering/index_files/figure-html/fig-series-centroids-optimal-1.svg" class="img-fluid figure-img">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-series-centroids-optimal-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;4: Time series clustered using DTW distance and Ward.D2 linkage.
</figcaption>
</figure>
</div>
</div>
</div>
</section>
<section id="conclusion" class="level2" data-number="8">
<h2 data-number="8" class="anchored" data-anchor-id="conclusion"><span class="header-section-number">8</span> Conclusion</h2>
<p><code>dtwclust</code> provides a modular and efficient framework for time-series clustering in R, implementing various algorithms (especially DTW-related ones) and allowing for customization and comparison. It serves as a strong starting point for time-series clustering tasks.</p>


</section>

 ]]></description>
  <category>Time-Series</category>
  <category>Clustering</category>
  <category>R</category>
  <guid>https://frequentist.org/posts/20250827-time-series-clustering/</guid>
  <pubDate>Wed, 27 Aug 2025 00:00:00 GMT</pubDate>
  <media:content url="https://frequentist.org/posts/20250827-time-series-clustering/image.svg" medium="image" type="image/svg+xml"/>
</item>
<item>
  <title>Minimum Detectable Effect (MDE) Calculation</title>
  <dc:creator>Aleksei Prishchepo</dc:creator>
  <link>https://frequentist.org/posts/20250807-mde/</link>
  <description><![CDATA[ 






<p>Minimum Detectable Effect (MDE) is defined as the smallest difference between a control and a test group that your A/B test can reliably identify as statistically significant. It’s a critical concept because it helps in determining the necessary sample size for an experiment and in interpreting the results.</p>
<section id="how-mde-is-calculated-in-experimental-design" class="level2">
<h2 class="anchored" data-anchor-id="how-mde-is-calculated-in-experimental-design">How MDE is calculated in experimental design</h2>
<p>The MDE calculation depends on several key parameters:</p>
<ul>
<li>Sample size (<img src="https://latex.codecogs.com/png.latex?n"> for the test group, <img src="https://latex.codecogs.com/png.latex?m"> for the control group).</li>
<li>Significance level (alpha, <img src="https://latex.codecogs.com/png.latex?%5Calpha">): This is the probability of a&nbsp;Type I error (falsely rejecting the null hypothesis), typically set at 0.05.</li>
<li>Statistical power (1-beta, <img src="https://latex.codecogs.com/png.latex?1-%5Cbeta">): This is the probability of correctly detecting an effect when one truly exists, commonly set at 0.8 (or 80%).</li>
<li>Variance (<img src="https://latex.codecogs.com/png.latex?%5Csigma%5E2">) of the metric being measured in the population, often estimated from historical data.</li>
<li>Ratio of control to test group sizes (<img src="https://latex.codecogs.com/png.latex?k%20=%20m/n">).</li>
</ul>
<div class="cell">
<div class="cell-output-display">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://frequentist.org/posts/20250807-mde/index_files/figure-html/plot-normal-distribution-1.svg" class="img-fluid figure-img"></p>
<figcaption>Control and test distributions with critical values, alpha, beta, and power shaded.</figcaption>
</figure>
</div>
</div>
</div>
</section>
<section id="calculating-mde-for-a-given-sample-size" class="level2">
<h2 class="anchored" data-anchor-id="calculating-mde-for-a-given-sample-size">Calculating MDE for a given sample size</h2>
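<p>If the total number of users is fixed, and with it the group sizes <img src="https://latex.codecogs.com/png.latex?m"> and <img src="https://latex.codecogs.com/png.latex?n">, you can determine the smallest effect (<img src="https://latex.codecogs.com/png.latex?e">) that your test can reliably detect:</p>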
<p><img src="https://latex.codecogs.com/png.latex?e%20%3E%20%5Csqrt%7B%5Cdfrac%7B(z_%7B1-%5Calpha/2%7D%20+%20z_%7B1-%5Cbeta%7D)%5E2%20(1%20+%20k)%5Csigma%5E2%7D%7Bm%7D%7D"></p>
<p><img src="https://latex.codecogs.com/png.latex?z_%7B1-%5Calpha/2%7D"> and <img src="https://latex.codecogs.com/png.latex?z_%7B1-%5Cbeta%7D"> are the Z-scores corresponding to the desired significance level and&nbsp;power, respectively.</p>
<p><img src="https://latex.codecogs.com/png.latex?k=%5Cfrac%7Bm%7D%7Bn%7D"> is the ratio of the control group size to the test group size (e.g., <img src="https://latex.codecogs.com/png.latex?k=1"> for a 1:1 split).</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Csigma%5E2"> is the estimated variance of the metric.</p>
<p><img src="https://latex.codecogs.com/png.latex?e"> is the minimum detectable effect, expressed in the units of the metric.</p>
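<p>The factor <img src="https://latex.codecogs.com/png.latex?(1%20+%20k)"> comes from the variance of the difference between the two group means. A short derivation, assuming independent groups that share the same variance <img src="https://latex.codecogs.com/png.latex?%5Csigma%5E2"> (the subscripts T and C for test and control are notation introduced here):</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Cmathrm%7BVar%7D(%5Cbar%7Bx%7D_T%20-%20%5Cbar%7Bx%7D_C)%20=%20%5Cdfrac%7B%5Csigma%5E2%7D%7Bn%7D%20+%20%5Cdfrac%7B%5Csigma%5E2%7D%7Bm%7D%20=%20%5Cdfrac%7Bk%5Csigma%5E2%7D%7Bm%7D%20+%20%5Cdfrac%7B%5Csigma%5E2%7D%7Bm%7D%20=%20%5Cdfrac%7B(1%20+%20k)%5Csigma%5E2%7D%7Bm%7D"></p>
<p>The second step uses <img src="https://latex.codecogs.com/png.latex?n%20=%20m/k">. The MDE is the square root of this variance, the standard error of the difference, multiplied by <img src="https://latex.codecogs.com/png.latex?z_%7B1-%5Calpha/2%7D%20+%20z_%7B1-%5Cbeta%7D">.</p>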
<section id="how-to-calculate-mde-in-r" class="level3">
<h3 class="anchored" data-anchor-id="how-to-calculate-mde-in-r">How to calculate MDE in R</h3>
<p>For instance, with 100,000 total users, a standard deviation (sigma) of 500, a control-to-test ratio of k = 2, alpha = 0.05, and beta = 0.2, the minimum detectable effect is approximately 9.397 units. For a metric with a mean of 500, that is a 1.88% change.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Calculate the MDE; n is the total number of users across both groups</span></span>
<span id="cb1-2">calculate_mde <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(n, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.05</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">beta =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sigma =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">k =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) {</span>
<span id="cb1-3">  z_alpha <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">qnorm</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> alpha <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb1-4">  z_beta <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">qnorm</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> beta)</span>
<span id="cb1-5">  m <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> n <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> k <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> (k <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb1-6">  e <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt</span>((z_alpha <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> z_beta)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> k) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> sigma<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> m)</span>
<span id="cb1-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">return</span>(e)</span>
<span id="cb1-8">}</span>
<span id="cb1-9"></span>
<span id="cb1-10"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">calculate_mde</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100000</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.05</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">beta =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sigma =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">500</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">k =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 9.396802</code></pre>
</div>
</div>
</section>
<section id="interactive-mde-calculator" class="level3">
<h3 class="anchored" data-anchor-id="interactive-mde-calculator">Interactive MDE calculator</h3>
<p>Below is an interactive calculator for finding the MDE based on your sample size, alpha, beta, sigma, and k-ratio. Adjust the parameters to see how they affect the MDE.</p>
<iframe src="https://a13ks3i.shinyapps.io/mde-calculator/" width="100%" height="600" title="MDE Calculator">
</iframe>
</section>
</section>
<section id="finding-the-required-sample-size-for-a-given-mde" class="level2">
<h2 class="anchored" data-anchor-id="finding-the-required-sample-size-for-a-given-mde">Finding the required sample size for a given MDE</h2>
<p>The formula used to determine the necessary sample size (e.g., for the control group, <img src="https://latex.codecogs.com/png.latex?m">) to detect a&nbsp;specific MDE (<img src="https://latex.codecogs.com/png.latex?e">) is:</p>
<p><img src="https://latex.codecogs.com/png.latex?m%20%3E%20%5Cdfrac%7B(z_%7B1-%5Calpha/2%7D%20+%20z_%7B1-%5Cbeta%7D)%5E2%20(1%20+%20k)%5Csigma%5E2%7D%7Be%5E2%7D"></p>
<section id="how-to-calculate-sample-size-in-r" class="level3">
<h3 class="anchored" data-anchor-id="how-to-calculate-sample-size-in-r">How to calculate sample size in R</h3>
<p>Example: a monetary metric has a mean of 500 and a standard deviation (sigma) of 500, and you want to detect a 2% effect (MDE = 10) with alpha = 0.05, beta = 0.2, and a control-to-test ratio of k = 2 (the control group is twice the size of the test group). The required sample size is then approximately 58,867 users in the control group and 29,434 users in the test group, 88,301 users in total.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Function to calculate required sample size</span></span>
<span id="cb3-2">calculate_sample_size <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(e, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.05</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">beta =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sigma =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">k =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) {</span>
<span id="cb3-3">  z_alpha <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">qnorm</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> alpha <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb3-4">  z_beta <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">qnorm</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> beta)</span>
<span id="cb3-5">  m <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> (z_alpha <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> z_beta)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> k) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> sigma<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> e<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span></span>
<span id="cb3-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">return</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ceiling</span>(m))</span>
<span id="cb3-7">}</span>
<span id="cb3-8">m <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">calculate_sample_size</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">e =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.05</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">beta =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sigma =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">500</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">k =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb3-9">n <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ceiling</span>(m <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb3-10">total_users <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> m <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> n</span>
<span id="cb3-11"></span>
<span id="cb3-12"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(m, n, total_users)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 58867 29434 88301</code></pre>
</div>
</div>
</section>
<section id="interactive-sample-size-calculator" class="level3">
<h3 class="anchored" data-anchor-id="interactive-sample-size-calculator">Interactive sample size calculator</h3>
<p>Below is an interactive calculator that allows you to input your desired MDE, alpha, beta, sigma, and k-ratio to compute the required sample size. Adjust the parameters to see how they affect the sample size needed for your A/B test.</p>
<iframe src="https://a13ks3i.shinyapps.io/sample-size-calculator/" width="100%" height="600" title="Sample Size Calculator">
</iframe>
</section>
</section>
<section id="conclusion" class="level2">
<h2 class="anchored" data-anchor-id="conclusion">Conclusion</h2>
<p>A/B testing is a powerful tool for product development, and understanding the concept of Minimal Detectable Effect (MDE) is crucial for designing effective experiments. By calculating the required sample size or MDE, you can ensure that your tests are statistically sound and capable of providing meaningful insights into user behavior and product performance.</p>


</section>

 ]]></description>
  <category>A/B Testing</category>
  <category>Product</category>
  <category>Statistics</category>
  <category>R</category>
  <guid>https://frequentist.org/posts/20250807-mde/</guid>
  <pubDate>Thu, 07 Aug 2025 00:00:00 GMT</pubDate>
  <media:content url="https://frequentist.org/posts/20250807-mde/image.svg" medium="image" type="image/svg+xml"/>
</item>
<item>
  <title>A/B Testing: Concepts and Techniques</title>
  <dc:creator>Aleksei Prishchepo</dc:creator>
  <link>https://frequentist.org/posts/20250729-ab-testing/</link>
  <description><![CDATA[ 






<section id="introduction" class="level2 unnumbered unlisted">
<h2 class="unnumbered unlisted anchored" data-anchor-id="introduction">Introduction</h2>
<p>This article is a short recap of the intensive course <a href="https://shad.yandex.ru/abweek" target="_blank">A/B Week by YSDA</a>. It gives an overview of A/B testing: its key components, common metrics, types of errors, and advanced techniques like CUPED. It also discusses the challenges of peeking at results, the problem of multiple testing, and how to validate statistical criteria using A/A tests.</p>
</section>
<section id="what-is-ab-testing-and-what-are-its-key-components" class="level2">
<h2 class="anchored" data-anchor-id="what-is-ab-testing-and-what-are-its-key-components">1. What is A/B testing and what are its key components?</h2>
<p>A/B testing is a method for measuring the impact of a change to a product while isolating that impact from external factors. It involves randomly dividing users into two groups: a control group (A) that experiences no changes, and a test group (B) that is exposed to a new feature.</p>
<p>The key components of an A/B test include:</p>
<ul>
<li><p><strong>Infrastructure:</strong> A robust system is required to conduct and manage experiments.</p></li>
<li><p><strong>Customer Base:</strong> A sufficiently large user base is necessary for the test to have enough statistical power to detect the effects of interest.</p></li>
<li><p><strong>Time:</strong> Sufficient time is needed for the experiment to run and for the data to be analyzed.</p></li>
<li><p><strong>Metrics:</strong> Carefully selected metrics are used to measure the effect of the changes. These can be “value metrics” (e.g., total cost of successful trips, number of unique completed orders) or “ratio metrics” (e.g., acceptance rate, completed rate, tips as a share of GMV).</p></li>
<li><p><strong>User Aggregation:</strong> Data is typically aggregated per user rather than per event to ensure independent observations, which is crucial for valid statistical analysis. Comparing raw event-level data can introduce dependencies that invalidate standard statistical tests.</p></li>
</ul>
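<p>The user-aggregation step can be sketched in a few lines of Python. The event log below is hypothetical (made-up user ids and order values); the point is only that the unit of analysis becomes one number per user.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">from collections import defaultdict

# Hypothetical event log: (user_id, order_value), several events per user.
events = [
    ("u1", 12.0), ("u1", 8.0), ("u2", 30.0),
    ("u3", 5.0), ("u3", 5.0), ("u3", 10.0),
]

def aggregate_per_user(event_log):
    """Collapse event-level rows into one total per user."""
    totals = defaultdict(float)
    for user_id, value in event_log:
        totals[user_id] += value
    return dict(totals)

per_user = aggregate_per_user(events)
print(per_user)  # one observation per user instead of one per event
</code></pre></div>
<p>A statistical test would then be run on the values of <code>per_user</code>, not on the six raw events.</p>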
<div id="f76b78f9" class="cell" data-execution_count="3">
<div class="cell-output cell-output-display">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://frequentist.org/posts/20250729-ab-testing/index_files/figure-html/cell-4-output-1.svg" class="img-fluid figure-img"></p>
<figcaption>User aggregation vs Event-level data</figcaption>
</figure>
</div>
</div>
</div>
<div class="callout callout-style-simple callout-note">
<div class="callout-body d-flex">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-body-container">
<p>The figures in this article were created based on the code from the <a href="https://github.com/dakhakimova/YSDA_ABweek">dakhakimova/YSDA_ABweek</a> repository.</p>
</div>
</div>
</div>
</section>
<section id="what-are-the-common-types-of-metrics-used-in-ab-testing-and-how-are-they-handled" class="level2">
<h2 class="anchored" data-anchor-id="what-are-the-common-types-of-metrics-used-in-ab-testing-and-how-are-they-handled">2. What are the common types of metrics used in A/B testing and how are they handled?</h2>
<p>Metrics in A/B testing are broadly categorized into:</p>
<ul>
<li><p><strong>Value Metrics:</strong> These represent absolute values or sums, such as Gross Merchandise Value (GMV), total number of impressions, or total dwell time. For these metrics, the average (mean) is commonly compared between test and control groups.</p></li>
<li><p><strong>Ratio Metrics:</strong> These represent a proportion or ratio, such as Acceptance Rate (accepted offers to seen offers) or CTR (number of clicks to the number of views). These are more complex because they involve both a numerator and a denominator, and the simple t-test for means may not be appropriate due to the inherent correlation between the numerator and denominator within each user.</p></li>
</ul>
<p>For ratio metrics, several advanced methods are used:</p>
<ul>
<li><p><strong>Delta Method:</strong> This statistical technique estimates the variance of a ratio by using the variances and covariance of its numerator and denominator. It approximates the distribution of the ratio using a <a href="https://en.wikipedia.org/wiki/Taylor_series" target="_blank">Taylor series</a> expansion.</p></li>
<li><p><strong>Linearization:</strong> This method transforms the ratio into a linear approximation, allowing the use of standard t-tests on the transformed data. There are different types of linearization, typically involving a reference value (e.g., the control group’s ratio) to define the linear terms.</p>
<div class="callout callout-style-simple callout-note callout-titled" title="More information">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>More information
</div>
</div>
<div class="callout-body-container callout-body">
<p><a href="https://www.researchgate.net/publication/322969314_Consistent_Transformation_of_Ratio_Metrics_for_Efficient_Online_Controlled_Experiments" target="_blank">Consistent Transformation of Ratio Metrics for Efficient Online Controlled Experiments</a> (<a href="http://dx.doi.org/10.1145/3159652.3159699" target="_blank">DOI:10.1145/3159652.3159699</a>).</p>
</div>
</div></li>
<li><p><strong>Bucketization (Bucketing):</strong> Instead of analyzing individual user data, users (or their aggregated events) are grouped into “buckets”. The ratio is then calculated for each bucket, and a t-test is performed on the bucket-level ratios. This can help normalize the distribution and reduce the impact of outliers but may lead to loss of information or reduced power with too few buckets.</p></li>
</ul>
<div id="f0285ca1" class="cell" data-execution_count="4">
<div class="cell-output cell-output-display">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://frequentist.org/posts/20250729-ab-testing/index_files/figure-html/cell-5-output-1.svg" class="img-fluid figure-img"></p>
<figcaption>Bucketing vs Uniform (without effect)</figcaption>
</figure>
</div>
</div>
</div>
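<p>As a rough illustration of the delta method and linearization described above, here is a minimal Python sketch using only the standard library. The function names and the toy click/view data are assumptions made for this example, and the z-test relies on a large-sample normal approximation.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import math
from statistics import NormalDist, mean

def _cov(a, b):
    """Sample covariance of two equally long sequences."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def ratio_and_delta_var(num, den):
    """Ratio sum(num) / sum(den) and its delta-method variance.

    num, den: per-user numerators (e.g. clicks) and denominators (e.g. views).
    """
    n = len(num)
    mu_d = mean(den)
    ratio = mean(num) / mu_d
    var = (_cov(num, num) - 2 * ratio * _cov(num, den)
           + ratio ** 2 * _cov(den, den)) / (n * mu_d ** 2)
    return ratio, var

def linearize(num, den, ref_ratio):
    """Per-user linearized values: num_i - ref_ratio * den_i."""
    return [x - ref_ratio * y for x, y in zip(num, den)]

def delta_z_test(num_a, den_a, num_b, den_b):
    """Two-sided z-test for the difference of two ratio metrics."""
    r_a, v_a = ratio_and_delta_var(num_a, den_a)
    r_b, v_b = ratio_and_delta_var(num_b, den_b)
    z = (r_b - r_a) / math.sqrt(v_a + v_b)
    return r_b - r_a, 2 * (1 - NormalDist().cdf(abs(z)))

# Toy per-user data (illustrative only).
clicks = [1, 0, 3, 2, 5]
views = [10, 4, 12, 9, 20]
ctr, ctr_var = ratio_and_delta_var(clicks, views)
print(f"CTR = {ctr:.3f}, delta-method SE = {math.sqrt(ctr_var):.4f}")
</code></pre></div>
<p>The two approaches are closely related: the sample variance of <code>linearize(num, den, ratio)</code>, divided by <code>n * mean(den) ** 2</code>, reproduces the delta-method variance when the reference ratio is the group's own ratio.</p>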
<ul>
<li><strong>Bootstrap:</strong> A non-parametric resampling technique that involves repeatedly drawing samples with replacement from the observed data to create an empirical distribution of the statistic of interest (e.g., the difference in ratios). This distribution is then used to construct confidence intervals and calculate p-values, making it robust to distributional assumptions. Poisson bootstrap is a variant suitable for large datasets, allowing parallelization.</li>
</ul>
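<p>A percentile bootstrap for the difference of two ratio metrics might look like the sketch below. It resamples whole users (not events), uses synthetic click data generated on the spot, and keeps the number of resamples small so that it runs quickly in pure Python; all names and parameters are illustrative assumptions.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import random

def resample_ratio(users, rng):
    """Ratio computed on one bootstrap resample of (num, den) user pairs."""
    sample = rng.choices(users, k=len(users))  # draw users with replacement
    return sum(n for n, _ in sample) / sum(d for _, d in sample)

def bootstrap_ratio_diff_ci(users_a, users_b, n_boot=1000, conf=0.95, seed=0):
    """Percentile bootstrap CI for ratio_B - ratio_A."""
    rng = random.Random(seed)
    diffs = sorted(
        resample_ratio(users_b, rng) - resample_ratio(users_a, rng)
        for _ in range(n_boot)
    )
    tail = (1 - conf) / 2
    lower = diffs[int(tail * n_boot)]
    upper = diffs[int((1 - tail) * n_boot) - 1]
    return lower, upper

def synthetic_users(n_users, ctr, rng):
    """Hypothetical users: 1 to 20 views each, each view clicked with probability ctr."""
    users = []
    for _ in range(n_users):
        views = rng.randint(1, 20)
        clicks = sum(ctr > rng.random() for _ in range(views))
        users.append((clicks, views))
    return users

rng = random.Random(123)
control = synthetic_users(300, 0.10, rng)
test = synthetic_users(300, 0.30, rng)
lower, upper = bootstrap_ratio_diff_ci(control, test)
print(f"95% CI for the CTR difference: [{lower:.3f}, {upper:.3f}]")
</code></pre></div>
<p>If the interval excludes zero, the difference is significant at the corresponding level. A Poisson bootstrap would replace the <code>rng.choices</code> call with independent Poisson(1) weights per user, which is what makes it easy to parallelize.</p>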
</section>
<section id="what-are-type-i-and-type-ii-errors-in-ab-testing-and-how-do-they-relate-to-mde" class="level2">
<h2 class="anchored" data-anchor-id="what-are-type-i-and-type-ii-errors-in-ab-testing-and-how-do-they-relate-to-mde">3. What are Type I and Type II errors in A/B testing, and how do they relate to MDE?</h2>
<p>In hypothesis testing:</p>
<ul>
<li><p><strong>Null Hypothesis (H0):</strong> States there is no effect or difference between groups (e.g., the new feature has no impact).</p></li>
<li><p><strong>Alternative Hypothesis (H1):</strong> States there is an effect or difference.</p></li>
</ul>
<p>The two types of errors are:</p>
<ul>
<li><p><strong>Type I Error</strong> <img src="https://latex.codecogs.com/png.latex?(%5Calpha)">: Rejecting the null hypothesis when it is actually true, i.e. a false positive. Its probability, <em>alpha</em>, is known as the “significance level”: the probability of concluding that an effect exists when it doesn’t.</p></li>
<li><p><strong>Type II Error</strong> <img src="https://latex.codecogs.com/png.latex?(%5Cbeta)">: Failing to reject the null hypothesis when the alternative hypothesis is true. This means failing to detect an effect that actually exists.</p></li>
</ul>
<p>For a fixed sample size and effect size, there is a trade-off between Type I and Type II errors: decreasing <em>alpha</em> (demanding stronger evidence before declaring an effect) increases <em>beta</em> (making it more likely that a real effect is missed), and vice versa.</p>
<p><strong>Minimal Detectable Effect (MDE)</strong> is the smallest true difference between the control and test groups that an A/B test can reliably detect as statistically significant, given predefined values for:</p>
<ul>
<li><p><strong>Sample Size</strong> <img src="https://latex.codecogs.com/png.latex?(n)">: The number of users in each group.</p></li>
<li><p><strong>Significance Level</strong> <img src="https://latex.codecogs.com/png.latex?(%5Calpha)">: The probability of a Type I error (e.g., 0.05).</p></li>
<li><p><strong>Statistical Power</strong> <img src="https://latex.codecogs.com/png.latex?(1%20-%20%5Cbeta)">: The probability of correctly detecting a true effect (e.g., 0.8 or 80%).</p></li>
</ul>
<p>MDE is crucial for experiment design, helping to estimate the required sample size and understand the sensitivity of the test to detect meaningful changes.</p>
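<p>For intuition, the sketch below computes the MDE for the simplest case: a two-sided z-test on means with equal group sizes and a known standard deviation. The function name and the example value <code>sigma = 500</code> are assumptions chosen for illustration.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import math
from statistics import NormalDist

def mde_equal_groups(n, sigma, alpha=0.05, beta=0.2):
    """MDE for a two-sided z-test with n users in each group."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(1 - beta)
    # Variance of the difference in means is sigma^2 / n + sigma^2 / n.
    return z * math.sqrt(2 * sigma ** 2 / n)

for n in (1_000, 10_000, 100_000):
    print(n, round(mde_equal_groups(n, sigma=500), 2))
</code></pre></div>
<p>Because the MDE scales with one over the square root of the sample size, halving it requires four times as many users.</p>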
<div class="callout callout-style-simple callout-note callout-titled" title="More information">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>More information
</div>
</div>
<div class="callout-body-container callout-body">
<p>For a detailed explanation of MDE and its calculation, see <a href="https://frequentist.org/posts/20250807-mde/" target="_blank">this article</a>.</p>
</div>
</div>
<div id="d4afb2d3" class="cell" data-execution_count="5">
<div class="cell-output cell-output-display">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://frequentist.org/posts/20250729-ab-testing/index_files/figure-html/cell-6-output-1.svg" class="img-fluid figure-img"></p>
<figcaption>Dependency of MDE on <img src="https://latex.codecogs.com/png.latex?m"></figcaption>
</figure>
</div>
</div>
</div>
</section>
<section id="how-can-the-validity-of-a-statistical-criterion-be-checked-using-aa-tests" class="level2">
<h2 class="anchored" data-anchor-id="how-can-the-validity-of-a-statistical-criterion-be-checked-using-aa-tests">4. How can the validity of a statistical criterion be checked using A/A tests?</h2>
<p>A/A testing is a method where two identical groups are compared to each other, with no actual changes introduced. Since no effect is expected, an A/A test helps validate the statistical criterion used in A/B tests.</p>
<p>The primary principle for validation is that if the null hypothesis is true (i.e., there is no actual difference between the groups), the p-values obtained from the statistical tests should be uniformly distributed between 0 and 1.</p>
<p>Validation steps involve:</p>
<ol type="1">
<li><p><strong>Synthetic Data Generation:</strong> Create simulated datasets for test and control groups where no effect is present.</p></li>
<li><p><strong>Repeated Testing:</strong> Run the statistical criterion (e.g., t-test) many times (e.g., 10,000 times) on these synthetic A/A datasets.</p></li>
<li><p><strong>P-value Distribution Analysis:</strong> Check whether the resulting p-values are uniformly distributed, using:</p>
<ul>
<li><p><strong>Histogram of P-values:</strong> If the p-values are uniformly distributed, the histogram should appear flat. Any peaks or skews indicate issues with the criterion.</p></li>
<li><p><strong>QQ-plot (Quantile-Quantile Plot):</strong> This plot compares the quantiles of the observed p-values against the theoretical quantiles of a uniform distribution. Points should fall approximately along a 45-degree line; deviations suggest the p-values are not uniformly distributed.</p></li>
<li><p><strong>Empirical Cumulative Distribution Function (ECDF):</strong> Plot the ECDF of the p-values against the theoretical CDF of a uniform distribution (a straight line from (0, 0) to (1, 1)). As with the QQ-plot, a close fit indicates uniformity.</p></li>
<li><p><a href="https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test" target="_blank"><strong>Kolmogorov-Smirnov (KS) Test</strong></a><strong>:</strong> A non-parametric statistical test that formally assesses whether the observed p-values differ significantly from a uniform distribution. A high p-value from the KS test (e.g., &gt; 0.05) means the test found no evidence against uniformity.</p></li>
</ul></li>
<li><p><strong>Confidence Interval for Type I Error:</strong> Calculate a confidence interval for the proportion of A/A tests in which the null hypothesis was incorrectly rejected (the observed Type I error rate). The observed rate should be close to the chosen alpha level (e.g., 0.05), and alpha should fall within the confidence interval.</p></li>
</ol>
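<p>The validation steps above can be sketched as a small simulation. This is a simplified illustration, not the course's code: it uses a large-sample z-test instead of a t-test, exponential synthetic data, 2,000 repetitions instead of 10,000 to stay fast in pure Python, and reports only the KS distance rather than a full KS test.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import math
import random
from statistics import NormalDist

def mean_and_var(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def z_test_p_value(a, b):
    """Two-sided p-value for a difference in means (large-sample z-test)."""
    (m_a, v_a), (m_b, v_b) = mean_and_var(a), mean_and_var(b)
    z = (m_b - m_a) / math.sqrt(v_a / len(a) + v_b / len(b))
    return 2 * (1 - NormalDist().cdf(abs(z)))

def run_aa_tests(n_sims=2000, n_users=200, seed=42):
    """P-values from repeated A/A comparisons on synthetic data."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_sims):
        a = [rng.expovariate(1.0) for _ in range(n_users)]
        b = [rng.expovariate(1.0) for _ in range(n_users)]
        p_values.append(z_test_p_value(a, b))
    return p_values

def type_one_error_ci(p_values, alpha=0.05, conf=0.95):
    """Observed rejection rate with a normal-approximation confidence interval."""
    n = len(p_values)
    rate = sum(alpha >= p for p in p_values) / n
    half = NormalDist().inv_cdf(0.5 + conf / 2) * math.sqrt(rate * (1 - rate) / n)
    return rate, rate - half, rate + half

def ks_distance_to_uniform(p_values):
    """Largest gap between the ECDF of the p-values and the Uniform(0, 1) CDF."""
    xs = sorted(p_values)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

p_values = run_aa_tests()
rate, ci_low, ci_high = type_one_error_ci(p_values)
print(f"rejection rate {rate:.3f}, 95% CI [{ci_low:.3f}, {ci_high:.3f}]")
print(f"KS distance to uniform: {ks_distance_to_uniform(p_values):.3f}")
</code></pre></div>
<p>For a valid criterion, the interval should cover the nominal alpha of 0.05 and the KS distance should be small. Replacing the data generator with one that mimics your real metric is what makes the check meaningful.</p>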
<p>If any of these checks fail, it indicates that the chosen statistical criterion is not valid for the given data and experiment setup, even before considering any actual effects.</p>
<div id="414fd815" class="cell" data-execution_count="6">
<div class="cell-output cell-output-display">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://frequentist.org/posts/20250729-ab-testing/index_files/figure-html/cell-7-output-1.svg" class="img-fluid figure-img"></p>
<figcaption>User aggregation vs Uniform (A/A test without effect)</figcaption>
</figure>
</div>
</div>
</div>
</section>
<section id="peeking" class="level2">
<h2 class="anchored" data-anchor-id="peeking">5. What are the challenges with “peeking” at A/B test results and how can they be addressed?</h2>
<p>“Peeking”, a form of “p-hacking”, refers to the practice of repeatedly checking the results of an A/B test as data accumulates and stopping the experiment as soon as a statistically significant result is observed.</p>
<section id="challenges" class="level3">
<h3 class="anchored" data-anchor-id="challenges">Challenges:</h3>
<ul>
<li><p><strong>Increased Type I Error (False Positives):</strong> Every time you “peek” at the data and run a statistical test, you increase the probability of encountering a false positive (Type I error). If you test multiple times, the cumulative probability of making at least one Type I error across all checks dramatically inflates beyond the chosen alpha level (e.g., 0.05). This leads to unreliable and irreproducible findings.</p></li>
<li><p><strong>Misinterpretation of P-values:</strong> The p-value’s interpretation relies on the assumption of a single, pre-specified test. Continuous monitoring violates this.</p></li>
</ul>
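<p>The inflation is easy to reproduce in a simulation. The sketch below runs A/A experiments (no true effect), checks a z-test after every batch of users, and compares “stop at the first significant look” with a single test at the end. The number of looks, batch size, and the known-variance z-test are assumptions chosen to keep the example short.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import math
import random
from statistics import NormalDist

def peeking_false_positive_rates(n_sims=1000, n_looks=5, step=100,
                                 alpha=0.05, seed=1):
    """A/A simulation: rejection rate when stopping at any significant look
    versus testing once at the end."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    any_look = final_only = 0
    for _sim in range(n_sims):
        diff_sum, n, crossed, z = 0.0, 0, False, 0.0
        for _look in range(n_looks):
            # Add `step` users per group; both groups are N(0, 1), so H0 holds.
            diff_sum += sum(rng.gauss(0, 1) - rng.gauss(0, 1) for _ in range(step))
            n += step
            # Known unit variance, so Var(mean_b - mean_a) = 2 / n.
            z = (diff_sum / n) / math.sqrt(2 / n)
            crossed = crossed or abs(z) > z_crit
        any_look += crossed
        final_only += abs(z) > z_crit
    return any_look / n_sims, final_only / n_sims

with_peeking, single_test = peeking_false_positive_rates()
print(f"stop at first significant look: {with_peeking:.3f}")
print(f"single test at the end:         {single_test:.3f}")
</code></pre></div>
<p>The single final test stays near the nominal 5%, while the peeking rate is noticeably higher even with only five looks, and it keeps growing as looks are added.</p>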
</section>
<section id="solutions-to-address-peeking" class="level3">
<h3 class="anchored" data-anchor-id="solutions-to-address-peeking">Solutions to Address Peeking:</h3>
<ul>
<li><p><strong>Group Sequential Testing (GST):</strong> This approach allows for multiple interim analyses (peeks) while controlling the overall Family-Wise Error Rate (FWER). It achieves this by adjusting the significance thresholds for each sequential look. Common methods for setting these boundaries include:</p>
<ul>
<li><p><strong>O’Brien-Fleming (OBF) Boundaries:</strong> These set very stringent (hard-to-cross) thresholds at early stages of the experiment, which gradually become less strict as more data accumulates, approaching the traditional alpha level at the final analysis.</p></li>
<li><p><strong>Pocock Boundaries:</strong> These use the same critical value at every analysis, including the final one. That constant threshold is stricter than the single-test threshold: a higher z-boundary, i.e. a lower nominal alpha at each look.</p></li>
</ul></li>
<li><p><strong>Fixed Sample Size:</strong> Pre-determining the sample size and running the experiment until that size is reached, then performing a single statistical test. This removes the opportunity to stop early on a chance result and keeps the Type I error at the nominal level.</p></li>
<li><p><strong>Sequential Testing with Alpha Spending Functions:</strong> More flexible methods that distribute the total Type I error rate across multiple analyses, allowing for adaptive monitoring of experiments.</p></li>
</ul>
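<p>To make the two boundary shapes concrete, the sketch below calibrates them by Monte Carlo rather than by the numerical integration used in dedicated packages. It assumes equally spaced looks, a normally distributed z-statistic with known variance, and a two-sided test; the resulting thresholds are simulation estimates, not exact tabulated constants.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import math
import random

def simulate_boundaries(n_looks=5, alpha=0.05, n_sims=20000, seed=7):
    """Monte Carlo estimate of two-sided Pocock and O'Brien-Fleming z-boundaries."""
    rng = random.Random(seed)
    max_plain, max_scaled = [], []
    for _ in range(n_sims):
        s = 0.0
        worst_plain = worst_scaled = 0.0
        for k in range(1, n_looks + 1):
            s += rng.gauss(0, 1)        # one equally sized data increment under H0
            z = abs(s) / math.sqrt(k)   # |z-statistic| at look k
            worst_plain = max(worst_plain, z)
            # OBF rejects when |z_k| * sqrt(k / K) exceeds a constant.
            worst_scaled = max(worst_scaled, z * math.sqrt(k / n_looks))
        max_plain.append(worst_plain)
        max_scaled.append(worst_scaled)
    idx = math.ceil((1 - alpha) * n_sims) - 1
    pocock_c = sorted(max_plain)[idx]
    obf_c = sorted(max_scaled)[idx]
    pocock = [pocock_c] * n_looks
    obf = [obf_c * math.sqrt(n_looks / k) for k in range(1, n_looks + 1)]
    return pocock, obf

pocock, obf = simulate_boundaries()
for look, (p_bound, o_bound) in enumerate(zip(pocock, obf), start=1):
    print(f"look {look}: Pocock {p_bound:.2f}, OBF {o_bound:.2f}")
</code></pre></div>
<p>The O’Brien-Fleming thresholds start far above the Pocock constant and end below it, close to the usual 1.96, which is why OBF costs little power at the final analysis.</p>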
<div id="3eb50fe0" class="cell" data-execution_count="7">
<div class="cell-output cell-output-display">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://frequentist.org/posts/20250729-ab-testing/index_files/figure-html/cell-8-output-1.svg" class="img-fluid figure-img"></p>
<figcaption>GST: dynamic thresholds for z-statistic</figcaption>
</figure>
</div>
</div>
</div>
</section>
</section>
<section id="what-is-the-problem-of-multiple-testing-and-how-can-it-be-mitigated" class="level2">
<h2 class="anchored" data-anchor-id="what-is-the-problem-of-multiple-testing-and-how-can-it-be-mitigated">6. What is the problem of multiple testing and how can it be mitigated?</h2>
<p>The “multiple testing problem” arises when multiple statistical hypotheses are tested simultaneously. If you perform <img src="https://latex.codecogs.com/png.latex?m"> independent A/B tests, each with a Type I error rate (<em>alpha</em>) of, say, 0.05, the probability of making at least one false positive (Family-Wise Error Rate, FWER) increases significantly with <img src="https://latex.codecogs.com/png.latex?m">.</p>
<section id="how-fwer-grows" class="level3">
<h3 class="anchored" data-anchor-id="how-fwer-grows">How FWER grows:</h3>
<p>For <img src="https://latex.codecogs.com/png.latex?m"> independent tests, the probability of <em>not</em> making a Type I error in any single test is <img src="https://latex.codecogs.com/png.latex?1%20-%20%5Calpha">. Therefore, the probability of <em>not</em> making any Type I error across all <img src="https://latex.codecogs.com/png.latex?m"> tests is <img src="https://latex.codecogs.com/png.latex?(1%20-%20%5Calpha)%5Em">. Consequently, the FWER (probability of at least one Type I error) is <img src="https://latex.codecogs.com/png.latex?1%20-%20(1%20-%20%5Calpha)%5Em">. This value quickly exceeds the nominal <img src="https://latex.codecogs.com/png.latex?%5Calpha"> as <img src="https://latex.codecogs.com/png.latex?m"> increases.</p>
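<p>A quick numerical check of this formula (the function name is just for illustration):</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">def fwer(alpha, m):
    """Probability of at least one false positive across m independent tests."""
    return 1 - (1 - alpha) ** m

for m in (1, 5, 10, 20, 50):
    print(f"m = {m:2d}: FWER = {fwer(0.05, m):.3f}")
</code></pre></div>
<p>With alpha = 0.05, ten independent tests already give a FWER of about 0.40, and twenty tests about 0.64.</p>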
</section>
<section id="mitigation-strategies" class="level3">
<h3 class="anchored" data-anchor-id="mitigation-strategies">Mitigation Strategies:</h3>
<p>To keep false positives under control when performing multiple comparisons, adjusted significance thresholds or stepwise procedures are used:</p>
<ul>
<li><p><strong>Bonferroni Correction:</strong> A very conservative method that divides the original alpha by the number of tests: <img src="https://latex.codecogs.com/png.latex?%5Calpha_%7Badjusted%7D%20=%20%5Calpha/m">. While effective at controlling FWER, it often severely reduces statistical power, making it harder to detect true effects.</p></li>
<li><p><strong>Šidák Correction:</strong> A slightly less conservative method than Bonferroni, calculating the adjusted alpha as <img src="https://latex.codecogs.com/png.latex?%5Calpha_%7Badjusted%7D%20=%201%20-%20(1%20-%20%5Calpha)%5E%7B1/m%7D">. It assumes the tests are independent.</p></li>
<li><p><strong>Holm-Bonferroni Method (Holm):</strong> A stepwise procedure that is less conservative than Bonferroni while still controlling FWER. It sorts p-values and adjusts them iteratively.</p></li>
<li><p><strong>False Discovery Rate (FDR) Control (e.g., Benjamini-Hochberg):</strong> Instead of controlling FWER (the probability of <em>any</em> false positive), FDR methods control the expected proportion of false positives among <em>all</em> rejected hypotheses. This approach is less stringent than FWER control, leading to higher statistical power, and is often preferred in exploratory research or when many tests are performed.</p></li>
</ul>
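<p>The four procedures can be sketched in plain Python as follows. Each function returns a list of booleans (reject or not) in the original order of the p-values; the function names and the toy p-values are assumptions made for this example.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">def bonferroni(p_values, alpha=0.05):
    """Reject when p does not exceed alpha / m."""
    m = len(p_values)
    return [alpha / m >= p for p in p_values]

def sidak(p_values, alpha=0.05):
    """Reject when p does not exceed 1 - (1 - alpha)^(1 / m)."""
    m = len(p_values)
    threshold = 1 - (1 - alpha) ** (1 / m)
    return [threshold >= p for p in p_values]

def holm(p_values, alpha=0.05):
    """Step-down: compare the i-th smallest p-value (i from 0) with alpha / (m - i)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] > alpha / (m - rank):
            break  # stop at the first non-rejection
        reject[idx] = True
    return reject

def benjamini_hochberg(p_values, alpha=0.05):
    """Step-up: reject the k smallest p-values, where k is the largest rank
    whose p-value does not exceed alpha * rank / m."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if alpha * rank / m >= p_values[idx]:
            k_max = rank
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject

p = [0.004, 0.011, 0.02, 0.03, 0.2]
for method in (bonferroni, sidak, holm, benjamini_hochberg):
    print(f"{method.__name__}: {sum(method(p))} rejections")
</code></pre></div>
<p>On this toy input, Bonferroni and Šidák reject one hypothesis, Holm rejects two, and Benjamini-Hochberg rejects four, which illustrates the ordering from most to least conservative.</p>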
<p>Choosing the right correction depends on the specific goals: if avoiding <em>any</em> false positive is paramount (e.g., clinical trials), FWER control is chosen. If a higher number of true positives is desired even with some false positives (e.g., feature development), FDR control might be more appropriate.</p>
</section>
</section>
<section id="what-is-cuped-and-how-does-it-improve-ab-test-sensitivity" class="level2">
<h2 class="anchored" data-anchor-id="what-is-cuped-and-how-does-it-improve-ab-test-sensitivity">7. What is CUPED and how does it improve A/B test sensitivity?</h2>
<p><strong>CUPED (Controlled-experiment Using Pre-Experiment Data)</strong> is a technique designed to improve the sensitivity (power) of A/B tests by reducing the variance of the metrics being analyzed. It achieves this by leveraging pre-experiment data (covariates) for each user.</p>
<section id="how-cuped-works" class="level3">
<h3 class="anchored" data-anchor-id="how-cuped-works">How CUPED works:</h3>
<p>CUPED works by creating an adjusted metric (<img src="https://latex.codecogs.com/png.latex?Z_i">) for each user (<img src="https://latex.codecogs.com/png.latex?i">):</p>
<p><img src="https://latex.codecogs.com/png.latex?Z_i%20=%20Y_i%20-%20%5Ctheta%20X_i%20+%20%5Ctheta%20E%5BX%5D~,"> where:</p>
<ul>
<li><p><img src="https://latex.codecogs.com/png.latex?Y_i"> is the observed metric value for user <img src="https://latex.codecogs.com/png.latex?i"> in the experiment.</p></li>
<li><p><img src="https://latex.codecogs.com/png.latex?X_i"> is a pre-experiment covariate for user <img src="https://latex.codecogs.com/png.latex?i"> (e.g., the same metric’s value during a period <em>before</em> the experiment began).</p></li>
<li><p><img src="https://latex.codecogs.com/png.latex?E%5BX%5D"> is the expected value of the covariate across the entire population (or both groups combined).</p></li>
<li><p><img src="https://latex.codecogs.com/png.latex?%5Ctheta"> is a coefficient calculated as <img src="https://latex.codecogs.com/png.latex?Cov(X,%20Y)%20/%20Var(X)">, which maximizes variance reduction.</p></li>
</ul>
<p>By using this adjusted metric <img src="https://latex.codecogs.com/png.latex?Z_i">, the variance of each group mean <img src="https://latex.codecogs.com/png.latex?Var(%5Cbar%20Z)">, and hence the variance of the difference between the test and control groups, can be significantly reduced: it becomes <img src="https://latex.codecogs.com/png.latex?(1%20-%20r%5E2)"> times the original variance, where <img src="https://latex.codecogs.com/png.latex?r"> is the Pearson correlation coefficient between <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Y">. A higher correlation between pre-experiment and in-experiment metrics leads to a greater reduction in variance, while the mean of the metric is unchanged.</p>
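<p>A minimal sketch of the adjustment on synthetic data is shown below. The data-generating process (a pre-experiment value plus noise) and all names are assumptions for illustration; in a real test the coefficient would typically be estimated on pooled data from both groups.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import random

def sample_var(values):
    """Unbiased sample variance."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def cuped_adjust(y, x):
    """Return CUPED-adjusted values Z_i = Y_i - theta * X_i + theta * mean(X) and theta."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    theta = cov_xy / sample_var(x)
    return [b - theta * a + theta * mean_x for a, b in zip(x, y)], theta

# Synthetic users: the in-experiment metric y is correlated with the
# pre-experiment covariate x.
rng = random.Random(0)
x = [rng.gauss(100, 20) for _ in range(5000)]
y = [0.8 * xi + rng.gauss(0, 10) for xi in x]

z, theta = cuped_adjust(y, x)
print(f"theta = {theta:.3f}")
print(f"Var(Z) / Var(Y) = {sample_var(z) / sample_var(y):.3f}")
</code></pre></div>
<p>The printed variance ratio equals one minus the squared sample correlation between <code>x</code> and <code>y</code>, while the mean of <code>z</code> matches the mean of <code>y</code>.</p>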
<div class="callout callout-style-simple callout-note callout-titled" title="More information">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>More information
</div>
</div>
<div class="callout-body-container callout-body">
<p><a href="https://www.researchgate.net/publication/237838291_Improving_the_Sensitivity_of_Online_Controlled_Experiments_by_Utilizing_Pre-Experiment_Data" target="_blank">Improving the Sensitivity of Online Controlled Experiments by Utilizing Pre-Experiment Data</a> (<a href="http://dx.doi.org/10.1145/2433396.2433413" target="_blank">DOI:10.1145/2433396.2433413</a>).</p>
</div>
</div>
</section>
<section id="benefits" class="level3">
<h3 class="anchored" data-anchor-id="benefits">Benefits:</h3>
<ul>
<li><strong>Increased Sensitivity/Power:</strong> By reducing variance, CUPED allows the A/B test to detect smaller effects (lower MDE) with the same sample size, or to achieve the same power with a smaller sample size (thus saving time and resources).</li>
</ul>
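<p>To make the sample-size benefit concrete, here is a rough sketch based on the standard normal-approximation formula for a two-sided, two-sample test on means. The standard deviation, effect size, and correlation are hypothetical values, and <code>sample_size_per_group</code> is an illustrative helper rather than a function from any particular library.</p>

```python
import math
from statistics import NormalDist


def sample_size_per_group(sigma, delta, alpha=0.05, power=0.8):
    """Per-group n for a two-sided, two-sample z-test on means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)


# Hypothetical inputs: metric std dev, effect to detect, pre/post correlation
sigma, delta, r = 20.0, 1.0, 0.7

n_plain = sample_size_per_group(sigma, delta)
# CUPED scales the variance by (1 - r^2), so sigma shrinks by sqrt(1 - r^2)
n_cuped = sample_size_per_group(sigma * math.sqrt(1 - r ** 2), delta)

print(f"Without CUPED: {n_plain} users per group")
print(f"With CUPED:    {n_cuped} users per group")
```

<p>Because the required sample size is proportional to the variance, the ratio of the two results is approximately 1 − r².</p>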
<div id="9a5835e8" class="cell" data-execution_count="8">
<div class="cell-output cell-output-display">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://frequentist.org/posts/20250729-ab-testing/index_files/figure-html/cell-9-output-1.svg" class="img-fluid figure-img"></p>
<figcaption>CUPED vs T-test power comparison</figcaption>
</figure>
</div>
</div>
</div>
<ul>
<li><strong>Applicability:</strong> It’s particularly useful when pre-experiment data is available and correlates well with the outcome metric.</li>
</ul>
</section>
<section id="limitationsconsiderations" class="level3">
<h3 class="anchored" data-anchor-id="limitationsconsiderations">Limitations/Considerations:</h3>
<ul>
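<li><p>Relies on pre-experiment data being available for users in both groups; the covariate must be measured before the experiment starts so that the treatment cannot affect it.</p></li>
<li><p>New users and others with no pre-experiment data need special handling (e.g., imputing the covariate mean, which leaves their metric unadjusted).</p></li>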
</ul>
</section>
</section>
<section id="what-are-the-key-differences-between-frequentist-and-bayesian-ab-testing-approaches" class="level2">
<h2 class="anchored" data-anchor-id="what-are-the-key-differences-between-frequentist-and-bayesian-ab-testing-approaches">8. What are the key differences between frequentist and Bayesian A/B testing approaches?</h2>
<section id="frequentist-classical-ab-testing" class="level3">
<h3 class="anchored" data-anchor-id="frequentist-classical-ab-testing">Frequentist (Classical) A/B Testing:</h3>
<ul>
<li><p><strong>Core Idea:</strong> Focuses on the probability of observing the data given a specific hypothesis (typically the null hypothesis H0). It uses p-values to determine statistical significance.</p></li>
<li><p><strong>Hypothesis:</strong> Formulates a null hypothesis (e.g., no difference between groups) and an alternative hypothesis H1.</p></li>
<li><p><strong>P-value:</strong> The probability of obtaining results as extreme as, or more extreme than, the observed results, assuming the null hypothesis is true.</p></li>
<li><p><strong>Decision Rule:</strong> Compare the p-value to a pre-defined significance level (alpha, e.g., 0.05). If p-value &lt; alpha, reject H0.</p></li>
<li><p><strong>Interpretation:</strong> “If there were no effect, there would be an X% chance of observing data at least as extreme as this.” Does NOT directly state the probability that H1 is true.</p></li>
<li><p><strong>Stopping Rules:</strong> Requires pre-defined sample sizes or sequential testing methods to control Type I error. Peeking is a major concern.</p></li>
</ul>
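<p>As a rough illustration of the decision rule, the sketch below runs a pooled two-proportion z-test on made-up conversion counts. The counts and the <code>two_proportion_z_test</code> helper are assumptions for the example; production analyses would normally rely on an established statistics package.</p>

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Under H0 both groups share one rate, so the standard error is pooled
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


alpha = 0.05  # significance level fixed before the test
# Hypothetical counts: 5.0% vs 5.7% conversion on 10,000 users each
z, p_value = two_proportion_z_test(conv_a=500, n_a=10_000, conv_b=570, n_b=10_000)
reject_h0 = alpha > p_value

print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

<p>The significance level and the sample size are fixed up front; re-running this check repeatedly while data accumulate is exactly the peeking problem mentioned above.</p>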
</section>
<section id="bayesian-ab-testing" class="level3">
<h3 class="anchored" data-anchor-id="bayesian-ab-testing">Bayesian A/B Testing:</h3>
<ul>
<li><p><strong>Core Idea:</strong> Updates beliefs about parameters (e.g., conversion rates, average revenue) based on observed data. It uses probability distributions to represent knowledge.</p></li>
<li><p><strong>Prior Distribution:</strong> Represents initial beliefs about the parameter before the experiment (e.g., prior knowledge that average conversion is around 5%).</p></li>
<li><p><strong>Likelihood:</strong> The probability of observing the data given different possible parameter values.</p></li>
<li><p><strong>Posterior Distribution:</strong> The updated probability distribution of the parameter after incorporating the observed data. Calculated as <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BPosterior%7D%20%5Cpropto%20%5Ctext%7BLikelihood%7D%20%5Ctimes%20%5Ctext%7BPrior%7D">.</p></li>
<li><p><strong>Decision Rule:</strong> Directly calculates the probability that one variant is better than another (e.g., P(Variant B &gt; Variant A)). A common threshold is 95% or 98%.</p></li>
<li><p><strong>Interpretation:</strong> “There is an X% probability that Variant B is better than Variant A.” This is more intuitive for business stakeholders.</p></li>
<li><p><strong>Stopping Rules:</strong> Supports continuous monitoring, because the posterior distribution simply updates as new data arrive and remains interpretable at any point. Stopping the moment a probability threshold is crossed can still raise the false-positive rate in the frequentist sense, so the stopping criterion should be defined in advance.</p></li>
</ul>
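<p>For conversion rates, the posterior has a closed form: a Beta prior combined with binomial data yields a Beta posterior. The sketch below assumes uniform Beta(1, 1) priors and made-up counts, and estimates P(Variant B &gt; Variant A) by Monte Carlo sampling; <code>prob_b_beats_a</code> is an illustrative helper, not necessarily the approach behind the simulation shown in this post.</p>

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=50_000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior for a conversion rate: Beta(1 + successes, 1 + failures)
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws


# Hypothetical counts: 5.0% vs 5.7% conversion on 10,000 users each
p = prob_b_beats_a(conv_a=500, n_a=10_000, conv_b=570, n_b=10_000)
print(f"P(B > A) = {p:.3f}")
```

<p>The resulting probability is compared with the chosen threshold (for instance 95%). An informative prior would replace the two 1s with the prior's own parameters.</p>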
<div id="f25de0a3" class="cell" data-execution_count="9">
<div class="cell-output cell-output-display">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://frequentist.org/posts/20250729-ab-testing/index_files/figure-html/cell-10-output-1.svg" class="img-fluid figure-img"></p>
<figcaption>Bayesian A/B testing simulation</figcaption>
</figure>
</div>
</div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://frequentist.org/posts/20250729-ab-testing/index_files/figure-html/cell-10-output-2.svg" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
</section>
<section id="key-advantages-of-bayesian" class="level3">
<h3 class="anchored" data-anchor-id="key-advantages-of-bayesian">Key Advantages of Bayesian:</h3>
<ul>
<li><p><strong>Intuitive Interpretation:</strong> Directly provides probabilities of hypotheses (e.g., “B is better than A”).</p></li>
<li><p><strong>Flexibility:</strong> Easily incorporates prior knowledge, handles unequal sample sizes, and can be used for complex models.</p></li>
<li><p><strong>Natural Interim Analyses:</strong> Interim looks fit the framework, as beliefs are simply updated with each new batch of data; a pre-defined stopping criterion is still advisable.</p></li>
</ul>
</section>
<section id="key-disadvantages-of-bayesian" class="level3">
<h3 class="anchored" data-anchor-id="key-disadvantages-of-bayesian">Key Disadvantages of Bayesian:</h3>
<ul>
<li><p><strong>Computational Cost:</strong> Can be more intensive for complex models (though straightforward for common A/B test scenarios).</p></li>
<li><p><strong>Prior Selection:</strong> Requires choosing a prior distribution, which can sometimes be subjective, though with large datasets, the choice of a “non-informative” prior typically has minimal impact.</p></li>
</ul>
</section>
</section>
<section id="see-also" class="level2">
<h2 class="anchored" data-anchor-id="see-also">See Also</h2>
<p>Here are some related articles you might find interesting:</p>
<div id="listing-posts" class="quarto-listing quarto-listing-container-table">
<table class="quarto-listing-table table">
<thead>
<tr>

<th>
 
</th>

<th>
Title
</th>

<th>
Reading Time
</th>

</tr>
</thead>
<tbody class="list">

<tr data-index="0" data-categories="UHJvZHVjdCUyQ01hcmtldGluZyUyQ1NwYXRpYWwlMkNHZW9zcGF0aWFsJTJDU3RyYXRlZ3k=" data-listing-date-sort="1770422400000" data-listing-file-modified-sort="1770660370665" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="14" data-listing-word-count-sort="2663" data-listing-title-sort="Using Transit Time to Rethink Hotel Search" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20260207-transit-time-hotel-search/image.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20260207-transit-time-hotel-search/index.html" class="title listing-title">Using Transit Time to Rethink Hotel Search</a>
</td>
<td>
<span class="listing-reading-time">14 min</span>
</td>

</tr>

<tr data-index="1" data-categories="QkklMkNTdGF0aXN0aWNzJTJDTUwlMkNWaXN1YWxpemF0aW9u" data-listing-date-sort="1764115200000" data-listing-file-modified-sort="1767873604969" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="5" data-listing-word-count-sort="982" data-listing-title-sort="Building an E-Commerce Dashboard with Power BI and R" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20251126-e-commerce-dashboard/0001.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20251126-e-commerce-dashboard/index.html" class="title listing-title">Building an E-Commerce Dashboard with Power BI and R</a>
</td>
<td>
<span class="listing-reading-time">5 min</span>
</td>

</tr>

<tr data-index="2" data-categories="QSUyRkIlMjBUZXN0aW5nJTJDU3RhdGlzdGljcyUyQ1I=" data-listing-date-sort="1761868800000" data-listing-file-modified-sort="1767873908846" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="6" data-listing-word-count-sort="1178" data-listing-title-sort="Propensity Score Matching for Causal Analysis" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20251028-propensity-score-matching/image.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20251028-propensity-score-matching/index.html" class="title listing-title">Propensity Score Matching for Causal Analysis</a>
</td>
<td>
<span class="listing-reading-time">6 min</span>
</td>

</tr>

<tr data-index="3" data-categories="QkklMkNTdGF0aXN0aWNzJTJDTUwlMkNWaXN1YWxpemF0aW9uJTJDUg==" data-listing-date-sort="1761436800000" data-listing-file-modified-sort="1767873994105" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="6" data-listing-word-count-sort="1109" data-listing-title-sort="Building the Analytical Dashboard with Power BI and R" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20251026-cfpb-dashboard/0001.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20251026-cfpb-dashboard/index.html" class="title listing-title">Building the Analytical Dashboard with Power BI and R</a>
</td>
<td>
<span class="listing-reading-time">6 min</span>
</td>

</tr>

<tr data-index="4" data-categories="QkklMkNTdGF0aXN0aWNzJTJDTUwlMkNWaXN1YWxpemF0aW9uJTJDUg==" data-listing-date-sort="1758931200000" data-listing-file-modified-sort="1767874188518" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="8" data-listing-word-count-sort="1544" data-listing-title-sort="Building a Credit Risk Dashboard with Power BI and R" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20250927-credit-risk-analytics/image.png" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20250927-credit-risk-analytics/index.html" class="title listing-title">Building a Credit Risk Dashboard with Power BI and R</a>
</td>
<td>
<span class="listing-reading-time">8 min</span>
</td>

</tr>

<tr data-index="5" data-categories="QSUyRkIlMjBUZXN0aW5nJTJDUHJvZHVjdCUyQ1N0YXRpc3RpY3MlMkNS" data-listing-date-sort="1754524800000" data-listing-file-modified-sort="1759267192458" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="6" data-listing-word-count-sort="1187" data-listing-title-sort="Minimum Detectable Effect (MDE) Calculation" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" data-src="../../posts/20250807-mde/image.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20250807-mde/index.html" class="title listing-title">Minimum Detectable Effect (MDE) Calculation</a>
</td>
<td>
<span class="listing-reading-time">6 min</span>
</td>

</tr>

<tr data-index="6" data-categories="TWFya2V0aW5nJTJDUHJvZHVjdCUyQ1B5dGhvbg==" data-listing-date-sort="1722816000000" data-listing-file-modified-sort="1770626733464" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="10" data-listing-word-count-sort="1892" data-listing-title-sort="Kano Method for Prioritization of Features" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" data-src="../../posts/20240805-kano-model/image.svg" style="height: 40px;"></span>
</td>
<td>
<a href="../../posts/20240805-kano-model/index.html" class="title listing-title">Kano Method for Prioritization of Features</a>
</td>
<td>
<span class="listing-reading-time">10 min</span>
</td>

</tr>

</tbody>
</table>

</div>



</section>

]]></description>
  <category>A/B Testing</category>
  <category>Product</category>
  <category>Statistics</category>
  <guid>https://frequentist.org/posts/20250729-ab-testing/</guid>
  <pubDate>Tue, 29 Jul 2025 00:00:00 GMT</pubDate>
  <media:content url="https://frequentist.org/posts/20250729-ab-testing/image.svg" medium="image" type="image/svg+xml"/>
</item>
<item>
  <title>Animation of Spatial Data</title>
  <dc:creator>Aleksei Prishchepo</dc:creator>
  <link>https://frequentist.org/posts/20250704-animation/</link>
  <description><![CDATA[ 






<p>This is an example of how to create an animated visualization of spatial data using R. The data is sourced from the German Weather Service (Deutscher Wetterdienst, DWD) and includes cloud coverage and density observations from various weather stations across Germany.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="banner.png" class="lightbox" data-gallery="quarto-lightbox-gallery-1" title="Image credits: Author"><img src="https://frequentist.org/posts/20250704-animation/banner.png" class="img-fluid figure-img" alt="Image credits: Author"></a></p>
<figcaption>Image credits: Author</figcaption>
</figure>
</div>
<section id="load-the-stations-data" class="level2">
<h2 class="anchored" data-anchor-id="load-the-stations-data">Load the stations data</h2>
<p>Here we will download the stations data from the DWD website. The data contains information about weather stations, including their IDs, names, locations, and the time period they were active.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1">url <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/subdaily/cloudiness/historical/"</span></span>
<span id="cb1-2">stations_file <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"N_Terminwerte_Beschreibung_Stationen.txt"</span></span>
<span id="cb1-3"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">file.exists</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"data"</span>)) {</span>
<span id="cb1-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dir.create</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"data"</span>)</span>
<span id="cb1-5">}</span>
<span id="cb1-6"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">file.exists</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"data/"</span>, stations_file))) {</span>
<span id="cb1-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">download.file</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(url, stations_file), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">here</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"posts"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"20250704-animation"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"data"</span>, stations_file), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mode =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"wb"</span>)</span>
<span id="cb1-8">}</span></code></pre></div></div>
</div>
<p>Let’s read the stations data.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1">col_names <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(</span>
<span id="cb2-2">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"STATIONS_ID"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"von_datum"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"bis_datum"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Stationshoehe"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lat"</span>,</span>
<span id="cb2-3">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lon"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Stationsname"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Bundesland"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Abgabe"</span></span>
<span id="cb2-4">)</span>
<span id="cb2-5"></span>
<span id="cb2-6">stations <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read.fwf</span>(</span>
<span id="cb2-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">here</span>(</span>
<span id="cb2-8">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"posts"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"20250704-animation"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"data"</span>,</span>
<span id="cb2-9">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"N_Terminwerte_Beschreibung_Stationen.txt"</span></span>
<span id="cb2-10">  ),</span>
<span id="cb2-11">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">widths =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">9</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">9</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">15</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">41</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">41</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">skip =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>,</span>
<span id="cb2-12">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fileEncoding =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Windows-1252"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">col.names =</span> col_names</span>
<span id="cb2-13">) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.data.table</span>()</span>
<span id="cb2-14"></span>
<span id="cb2-15"></span>
<span id="cb2-16">stations[, von_datum <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.Date</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_trim</span>(von_datum), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">format =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"%Y%m%d"</span>)]</span>
<span id="cb2-17">stations[, bis_datum <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.Date</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_trim</span>(bis_datum), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">format =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"%Y%m%d"</span>)]</span>
<span id="cb2-18">stations[, lon <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.numeric</span>(lon)]</span>
<span id="cb2-19">stations[, lat <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.numeric</span>(lat)]</span></code></pre></div></div>
</div>
</section>
<section id="read-the-links-to-the-data-files" class="level2">
<h2 class="anchored" data-anchor-id="read-the-links-to-the-data-files">Read the links to the data files</h2>
<p>We will read the HTML content of the DWD website to extract the links to the cloudiness data files. The links will be filtered to include only those that contain the term “terminwerte”.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">page_content <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_html</span>(url)</span>
<span id="cb3-2"></span>
<span id="cb3-3">links <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> page_content <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">html_nodes</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"a"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">html_attr</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"href"</span>)</span>
<span id="cb3-6"></span>
<span id="cb3-7">links <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> links[<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.na</span>(links) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_detect</span>(links, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"terminwerte"</span>)]</span>
<span id="cb3-8">links <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> links <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">data.table</span>()</span>
<span id="cb3-9">links <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">head</span>()</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>                                            links
                                           &lt;char&gt;
1: terminwerte_N_00001_19370101_19860630_hist.zip
2: terminwerte_N_00003_18910101_20110331_hist.zip
3: terminwerte_N_00044_19710301_20111231_hist.zip
4: terminwerte_N_00052_19730101_20011231_hist.zip
5: terminwerte_N_00061_19750701_19780831_hist.zip
6: terminwerte_N_00070_19730601_19860930_hist.zip</code></pre>
</div>
</div>
<p>Extract the station IDs from the links. The station IDs are 5-digit numbers that are part of the file names.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">links[, STATIONS_ID <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_extract</span>(links, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"[0-9]{5}"</span>)]</span>
<span id="cb5-2">links[, STATIONS_ID <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.integer</span>(STATIONS_ID)]</span></code></pre></div></div>
</div>
</section>
<section id="filter-the-stations-data" class="level2">
<h2 class="anchored" data-anchor-id="filter-the-stations-data">Filter the stations data</h2>
<p>We will keep only the stations that were active throughout the period of interest (from 2023-12-01 to 2025-01-01) and download data for those stations only.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">stations <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> stations[von_datum <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"2023-12-01"</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> bis_datum <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"2025-01-01"</span>]</span>
<span id="cb6-2">links <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> links[stations, on <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"STATIONS_ID"</span>]</span></code></pre></div></div>
</div>
</section>
<section id="download-and-process-the-data-files" class="level2">
<h2 class="anchored" data-anchor-id="download-and-process-the-data-files">Download and process the data files</h2>
<p>In this section, we will download the data files from the DWD website and process them to extract the cloud coverage and density observations. The data will be stored in a DuckDB database, so it can be reused later without downloading and parsing it again.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># check if the files exist</span></span>
<span id="cb7-2">files <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list.files</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"data"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">full.names =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb7-3">files <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> files[<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_detect</span>(files, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"produkt_n_termin"</span>)]</span>
<span id="cb7-4"></span>
<span id="cb7-5"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">length</span>(files) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>) {</span>
<span id="cb7-6">  <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> (link <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> links<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>links) {</span>
<span id="cb7-7">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">download.file</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(url, link), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">here</span>(</span>
<span id="cb7-8">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"posts"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"20250704-animation"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"data"</span>, link</span>
<span id="cb7-9">    ), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mode =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"wb"</span>)</span>
<span id="cb7-10">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unzip</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"data/"</span>, link), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">exdir =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"data"</span>)</span>
<span id="cb7-11">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unlink</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"data/Metadaten*"</span>)</span>
<span id="cb7-12">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unlink</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"data/*.html"</span>)</span>
<span id="cb7-13">  }</span>
<span id="cb7-14">}</span>
<span id="cb7-15"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unlink</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"data/*.zip"</span>)</span></code></pre></div></div>
</div>
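<p>The next chunk parses the downloaded files, keeps observations from <code>start_date</code> onward, treats <code>-999</code> as a missing-value marker, and caches the result in the <code>cloudiness</code> table.</p>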
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1">con <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dbConnect</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">duckdb</span>(),</span>
<span id="cb8-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">dbdir =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">here</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"posts"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"20250704-animation"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"db"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"weather.duckdb"</span>)</span>
<span id="cb8-3">)</span>
<span id="cb8-4">tables <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dbGetQuery</span>(con, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"SHOW ALL TABLES;"</span>)[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"name"</span>]</span>
<span id="cb8-5">start_date <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"2023-12-01"</span></span>
<span id="cb8-6"></span>
<span id="cb8-7"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"cloudiness"</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%in%</span> tables<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>name)) {</span>
<span id="cb8-8">  files <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list.files</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"data"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">full.names =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb8-9">  files <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> files[<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_detect</span>(files, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"produkt_n_termin"</span>)]</span>
<span id="cb8-10"></span>
<span id="cb8-11">  observations <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">data.table</span>()</span>
<span id="cb8-12"></span>
<span id="cb8-13">  <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> (file <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> files) {</span>
<span id="cb8-14">    temp_data <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read.csv</span>(file, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sep =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">";"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.data.table</span>()</span>
<span id="cb8-15">    temp_data[, MESS_DATUM <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.Date</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_trim</span>(MESS_DATUM), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">format =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"%Y%m%d"</span>)]</span>
<span id="cb8-16">    temp_data <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> temp_data[MESS_DATUM <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> start_date]</span>
<span id="cb8-17">    observations <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rbind</span>(</span>
<span id="cb8-18">      observations,</span>
<span id="cb8-19">      temp_data</span>
<span id="cb8-20">    )</span>
<span id="cb8-21">  }</span>
<span id="cb8-22"></span>
<span id="cb8-23">  observations <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> observations[N_TER <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">999</span>]</span>
<span id="cb8-24">  observations[, CD_TER <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ifelse</span>(CD_TER <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">999</span>, <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NA</span>, CD_TER)]</span>
<span id="cb8-25"></span>
<span id="cb8-26">  <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">file.exists</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"db"</span>)) {</span>
<span id="cb8-27">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dir.create</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"db"</span>)</span>
<span id="cb8-28">  }</span>
<span id="cb8-29"></span>
<span id="cb8-30">  con <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dbConnect</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">duckdb</span>(),</span>
<span id="cb8-31">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">dbdir =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">here</span>(</span>
<span id="cb8-32">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"posts"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"20250704-animation"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"db"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"weather.duckdb"</span></span>
<span id="cb8-33">    )</span>
<span id="cb8-34">  )</span>
<span id="cb8-35">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dbWriteTable</span>(con, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"cloudiness"</span>, observations, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">overwrite =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb8-36">}</span>
<span id="cb8-37"></span>
<span id="cb8-38">observations <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dbGetQuery</span>(</span>
<span id="cb8-39">  con,</span>
<span id="cb8-40">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sprintf</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"SELECT * FROM cloudiness WHERE MESS_DATUM &gt;= '%s'"</span>, start_date)</span>
<span id="cb8-41">) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-42">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.data.table</span>()</span>
<span id="cb8-43"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dbDisconnect</span>(con)</span>
<span id="cb8-44"></span>
<span id="cb8-45">observations <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glimpse</span>()</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>Rows: 231,108
Columns: 6
$ STATIONS_ID &lt;int&gt; 4024, 4024, 4024, 4024, 4024, 4024, 4024, 4024, 4024, 4024…
$ MESS_DATUM  &lt;date&gt; 2023-12-01, 2023-12-01, 2023-12-01, 2023-12-02, 2023-12-0…
$ QN_4        &lt;int&gt; 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9…
$ N_TER       &lt;int&gt; 4, 5, 8, 8, 8, 8, 8, 7, 8, 8, 8, 8, 7, 8, 8, 8, 7, 8, 8, 8…
$ CD_TER      &lt;lgl&gt; NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ eor         &lt;chr&gt; "eor", "eor", "eor", "eor", "eor", "eor", "eor", "eor", "e…</code></pre>
</div>
</div>
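<p>The missing-value handling above can be tried on a tiny made-up table. This is a minimal sketch, assuming that <code>-999</code> marks a missing reading, as the filter in the chunk suggests; the table <code>toy</code> and its values are invented for illustration.</p>
<div class="cell">
<pre class="sourceCode r"><code class="sourceCode r">library(data.table)

# Toy data (made up): -999 stands in for a missing reading
toy &lt;- data.table(
  STATIONS_ID = c(1L, 1L, 2L, 2L),
  N_TER       = c(4L, -999L, 8L, 7L),
  CD_TER      = c(-999L, -999L, 1L, -999L)
)

# Drop rows where the cloud coverage itself is missing
toy &lt;- toy[N_TER != -999]

# Keep the remaining rows, but blank out a missing cloud density
toy[CD_TER == -999, CD_TER := NA_integer_]

print(toy)</code></pre>
</div>
<p>Rows without a coverage value are removed entirely, while a missing density only becomes <code>NA</code>, so the coverage reading in that row is still usable.</p>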
<p>Because each station reports several observations per day, we aggregate them to the average cloud coverage and density per station and day.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1">observations <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> observations[, .(</span>
<span id="cb10-2">  .N,</span>
<span id="cb10-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cloud_coverage =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(N_TER, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),</span>
<span id="cb10-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cloud_density =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(CD_TER, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb10-5">), by <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"STATIONS_ID"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"MESS_DATUM"</span>)]</span></code></pre></div></div>
</div>
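<p>To see what this grouping does, here is a small sketch on invented data. The table <code>toy</code> and the result name <code>daily</code> are hypothetical; only the grouping pattern mirrors the chunk above.</p>
<div class="cell">
<pre class="sourceCode r"><code class="sourceCode r">library(data.table)

# Toy data (made up): three readings on the first day, one on the second
toy &lt;- data.table(
  STATIONS_ID = 1L,
  MESS_DATUM  = as.Date(c("2024-01-01", "2024-01-01", "2024-01-01", "2024-01-02")),
  N_TER       = c(4L, 5L, 8L, 8L),
  CD_TER      = c(NA, 1L, 2L, NA)
)

# One row per station and day: reading count plus the two daily means
daily &lt;- toy[, .(
  .N,
  cloud_coverage = mean(N_TER, na.rm = TRUE),
  cloud_density  = mean(CD_TER, na.rm = TRUE)
), by = c("STATIONS_ID", "MESS_DATUM")]

print(daily)</code></pre>
</div>
<p>Note that a day whose density readings are all <code>NA</code> ends up with <code>NaN</code>, because <code>mean()</code> with <code>na.rm = TRUE</code> is then applied to an empty vector.</p>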
<p>Let’s plot the cloud coverage for a specific station as a time series to visualize the data.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1">observations[MESS_DATUM <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"2024-01-01"</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> STATIONS_ID <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">433</span>] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb11-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> MESS_DATUM, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> cloud_coverage)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb11-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gray"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb11-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rollapply</span>(cloud_coverage,</span>
<span id="cb11-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">width =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">FUN =</span> mean, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">align =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"center"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">partial =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span></span>
<span id="cb11-6">  )), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"blue"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb11-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Cloud Coverage"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb11-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_gray</span>()</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://frequentist.org/posts/20250704-animation/index_files/figure-html/plot-cloud-coverage-1.svg" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
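<p>The blue line is a centered rolling mean. The sketch below shows the same <code>rollapply()</code> call from the zoo package on a short invented vector, with a window of 3 instead of 7 to keep it readable; <code>x</code> and <code>smoothed</code> are hypothetical names.</p>
<div class="cell">
<pre class="sourceCode r"><code class="sourceCode r">library(zoo)

# Toy series (made up)
x &lt;- c(8, 6, 4, 2, 0, 2, 4)

# Centered rolling mean; partial = TRUE shrinks the window at both ends
# instead of dropping the first and last values
smoothed &lt;- rollapply(x, width = 3, FUN = mean, align = "center", partial = TRUE)

print(smoothed)</code></pre>
</div>
<p>With <code>partial = TRUE</code> the result has the same length as the input, which is what lets it be used directly as a <code>y</code> aesthetic next to the raw series.</p>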
</section>
<section id="add-h3-addresses" class="level2">
<h2 class="anchored" data-anchor-id="add-h3-addresses">Add H3 addresses</h2>
<p>To visualize the data on a map, we will convert the latitude and longitude coordinates of the stations into H3 addresses. H3 is a geospatial indexing system that allows us to represent geographic locations as hexagonal cells.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1">points <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> stations[, .(lon, lat)] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unique</span>()</span>
<span id="cb12-2"></span>
<span id="cb12-3">points[, h3_address <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">point_to_cell</span>(points, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">res =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>)]</span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>Assuming columns 1 and 2 contain x, y coordinates in EPSG:4326</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1">stations <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> stations[, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(</span>
<span id="cb14-2">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"STATIONS_ID"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"von_datum"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"bis_datum"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Stationshoehe"</span>,</span>
<span id="cb14-3">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lat"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lon"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Stationsname"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Bundesland"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Abgabe"</span></span>
<span id="cb14-4">)]</span>
<span id="cb14-5"></span>
<span id="cb14-6">stations[points, on <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> .(lon, lat), h3_address <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">=</span> h3_address]</span>
<span id="cb14-7"></span>
<span id="cb14-8">stations <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> stations[, geometry <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cell_to_polygon</span>(h3_address, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">simple =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>)[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]]</span></code></pre></div></div>
</div>
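<p>The step that copies <code>h3_address</code> back onto the stations is a data.table update join. Here is a minimal sketch on invented coordinates; the tables <code>stations_toy</code> and <code>points_toy</code> are hypothetical, and the labels <code>"cell_a"</code> and <code>"cell_b"</code> are placeholders, not real H3 indexes.</p>
<div class="cell">
<pre class="sourceCode r"><code class="sourceCode r">library(data.table)

# Toy stations (made up): the first two share the same coordinates
stations_toy &lt;- data.table(
  STATIONS_ID = c(1L, 2L, 3L),
  lon = c(13.4, 13.4, 11.6),
  lat = c(52.5, 52.5, 48.1)
)

# One row per distinct coordinate pair, with a placeholder cell label
points_toy &lt;- unique(stations_toy[, .(lon, lat)])
points_toy[, h3_address := c("cell_a", "cell_b")]

# Update join: add h3_address to stations_toy by reference,
# matching rows on the coordinate pair
stations_toy[points_toy, on = .(lon, lat), h3_address := i.h3_address]

print(stations_toy)</code></pre>
</div>
<p>Stations with identical coordinates receive the same cell label, and no intermediate merged copy of the table is created.</p>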
<p>Load the boundaries of Germany to use as a background for the map.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1">boundaries <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geoboundaries</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Germany"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">release_type =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gbOpen"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">adm_lvl =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"adm1"</span>)</span></code></pre></div></div>
</div>
<p>Join the stations to the observations so that each observation carries its H3 cell and geometry. Then average cloud coverage within each H3 cell and keep one row per cell and day.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb16-1">observations <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> observations[stations, on <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"STATIONS_ID"</span>]</span>
<span id="cb16-2">observations[, cloud_coverage <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(cloud_coverage, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>), by <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> .(h3_address, MESS_DATUM)]</span>
<span id="cb16-3">observations <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> observations <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unique</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">by =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"h3_address"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"MESS_DATUM"</span>))</span></code></pre></div></div>
</div>
</section>
<section id="create-animations-of-cloud-coverage-in-germany" class="level2">
<h2 class="anchored" data-anchor-id="create-animations-of-cloud-coverage-in-germany">Create animations of cloud coverage in Germany</h2>
<p>In this section, we create animations of cloud coverage in Germany from the 2024 observations. The code below builds a series of maps showing the average cloud coverage for each day of 2024, smoothed with a 7-day rolling average.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb17-1">min_date <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> observations[, MESS_DATUM] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">min</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb17-2">max_date <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> observations[, MESS_DATUM] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">max</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb17-3"></span>
<span id="cb17-4">min_date <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">max</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(min_date, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.Date</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"2024-01-01"</span>)))</span>
<span id="cb17-5">max_date <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">min</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(max_date, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.Date</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"2024-12-31"</span>)))</span>
<span id="cb17-6"></span>
<span id="cb17-7">max_coverage <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> observations[(MESS_DATUM <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> min_date) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span></span>
<span id="cb17-8">  (MESS_DATUM <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;=</span> max_date), cloud_coverage] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">max</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb17-9">min_coverage <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> observations[(MESS_DATUM <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> min_date) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span></span>
<span id="cb17-10">  (MESS_DATUM <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;=</span> max_date), cloud_coverage] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">min</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb17-11"></span>
<span id="cb17-12">table_dates <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq</span>(min_date <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>, max_date <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">by =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb17-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.data.table</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb17-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rename</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">MESS_DATUM =</span> V1)</span>
<span id="cb17-15">observations <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> table_dates[observations, on <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"MESS_DATUM"</span>]</span>
<span id="cb17-16"></span>
<span id="cb17-17">observations[, cloud_coverage_r7 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rollapply</span>(cloud_coverage,</span>
<span id="cb17-18">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">width =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">FUN =</span> mean, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">align =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"center"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">partial =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span></span>
<span id="cb17-19">),</span>
<span id="cb17-20">by <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> STATIONS_ID</span>
<span id="cb17-21">]</span>
<span id="cb17-22"></span>
<span id="cb17-23">dates <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq</span>(min_date, max_date, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">by =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb17-24"></span>
<span id="cb17-25"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">file.exists</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"figures"</span>)) {</span>
<span id="cb17-26">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dir.create</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"figures"</span>)</span>
<span id="cb17-27">}</span>
<span id="cb17-28"></span>
<span id="cb17-29"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> (d <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.character</span>(dates)) {</span>
<span id="cb17-30">  p <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> cloud_coverage_r7),</span>
<span id="cb17-31">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> observations[MESS_DATUM <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> d] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.data.frame</span>()</span>
<span id="cb17-32">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb17-33">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_sf</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> boundaries, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gray78"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gray54"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb17-34">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_sf</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">geometry =</span> geometry), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gray78"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb17-35">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_fill_whitebox_c</span>(</span>
<span id="cb17-36">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">palette =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"deep"</span>,</span>
<span id="cb17-37">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">direction =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,</span>
<span id="cb17-38">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">limits =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(min_coverage, max_coverage)</span>
<span id="cb17-39">    ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb17-40">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coord_sf</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">default_crs =</span> sf<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_crs</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4326</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb17-41">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_void</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb17-42">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(</span>
<span id="cb17-43">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend.position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"bottom"</span>,</span>
<span id="cb17-44">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend.key.height =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unit</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"pt"</span>),</span>
<span id="cb17-45">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend.key.width =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unit</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"pt"</span>),</span>
<span id="cb17-46">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend.title.position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"top"</span>,</span>
<span id="cb17-47">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.grid.minor =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb17-48">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.grid.major =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb17-49">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">plot.background =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_rect</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NA</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NA</span>),</span>
<span id="cb17-50">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gray35"</span>)</span>
<span id="cb17-51">    ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb17-52">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> d, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Cloud Coverage"</span>)</span>
<span id="cb17-53"></span>
<span id="cb17-54">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggsave</span>(</span>
<span id="cb17-55">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">here</span>(</span>
<span id="cb17-56">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"posts"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"20250704-animation"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"figures"</span>,</span>
<span id="cb17-57">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"cloudiness-"</span>, d, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">".png"</span>)</span>
<span id="cb17-58">    ),</span>
<span id="cb17-59">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">plot =</span> p,</span>
<span id="cb17-60">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">units =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"px"</span>,</span>
<span id="cb17-61">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">width =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1200</span>,</span>
<span id="cb17-62">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">height =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1200</span>,</span>
<span id="cb17-63">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">dpi =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">300</span></span>
<span id="cb17-64">  )</span>
<span id="cb17-65">}</span></code></pre></div></div>
</div>
</section>
<section id="create-a-gif-animation" class="level2">
<h2 class="anchored" data-anchor-id="create-a-gif-animation">Create a GIF animation</h2>
<p>Finally, we combine the generated PNG files into a GIF animation. It shows cloud coverage in Germany over the course of 2024, with one frame per day.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb18-1">png_files <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list.files</span>(</span>
<span id="cb18-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">here</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"posts"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"20250704-animation"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"figures"</span>),</span>
<span id="cb18-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">full.names =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">pattern =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"cloudiness.+</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\\</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">.png"</span></span>
<span id="cb18-4">) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb18-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sort</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb18-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.character</span>()</span>
<span id="cb18-7"></span>
<span id="cb18-8">gif_file <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">here</span>(</span>
<span id="cb18-9">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"posts"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"20250704-animation"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"animation-cloudiness.gif"</span></span>
<span id="cb18-10">)</span>
<span id="cb18-11"></span>
<span id="cb18-12"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gifski</span>(png_files, gif_file, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">width =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1200</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">height =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1200</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">delay =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">loop =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] "/mnt/Projects/Blog/posts/20250704-animation/animation-cloudiness.gif"</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb20-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unlink</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">here</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"posts"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"20250704-animation"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"figures"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"cloudiness*"</span>))</span></code></pre></div></div>
</div>
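<p>Two details of the call above can be checked in isolation: the frames are ordered with a plain <code>sort()</code>, which works because ISO dates in the file names sort lexicographically in calendar order, and the playback time per loop is roughly the number of frames times <code>delay</code>. The sketch below is illustrative only. It rebuilds the frame names from a hypothetical full-year date sequence instead of reading them from disk, and assumes the observations cover every day of 2024.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical stand-in for the frame names written by the plotting loop
dates &lt;- seq(as.Date("2024-01-01"), as.Date("2024-12-31"), by = 1)
frames &lt;- paste0("cloudiness-", dates, ".png")

# Shuffle the names, then sort them again:
# ISO dates (YYYY-MM-DD) sort as text in calendar order
set.seed(1)
shuffled &lt;- sample(frames)
frames_in_order &lt;- identical(sort(shuffled), frames)
print(frames_in_order)

# Approximate duration of one loop in seconds: frames times delay
n_frames &lt;- length(frames)
loop_seconds &lt;- n_frames * 0.1
print(c(frames = n_frames, seconds = loop_seconds))</code></pre></div>
</div>
<p>Under these assumptions there are 366 frames (2024 is a leap year), so one loop at <code>delay = 0.1</code> takes about 36.6 seconds. A date format such as <code>DD-MM-YYYY</code> in the file names would not sort this way and would need an explicit ordering step.</p>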
<p><img src="https://frequentist.org/posts/20250704-animation/animation-cloudiness.gif" class="img-fluid"></p>
<p>The resulting GIF shows cloud coverage in Germany for each day of 2024, smoothed with a centered 7-day rolling average. The animation makes it easy to see how cloud coverage changed over time across different regions of Germany.</p>
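<p>To make the smoothing concrete, here is a minimal base R sketch of a centered 7-day rolling mean whose window shrinks at the edges of the series. It is meant to mirror the <code>rollapply()</code> call used earlier (<code>width = 7</code>, <code>align = "center"</code>, <code>partial = TRUE</code>), but the helper <code>rolling_mean_center()</code> and the input values are made up for illustration and are not part of the original code.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical helper: centered rolling mean with partial windows at the edges
rolling_mean_center &lt;- function(x, width = 7) {
  half &lt;- (width - 1) %/% 2 # days taken on each side of the current day
  n &lt;- length(x)
  vapply(seq_len(n), function(i) {
    # Clip the window to the available range instead of returning NA
    window &lt;- x[max(1, i - half):min(n, i + half)]
    mean(window)
  }, numeric(1))
}

# Made-up daily cloud coverage values for a single station
coverage &lt;- c(8, 8, 1, 1, 1, 8, 8, 8, 1)
smoothed &lt;- rolling_mean_center(coverage)
print(round(smoothed, 2))</code></pre></div>
</div>
<p>Interior days are averaged over a full 7-day window, while the first and last three days use only the values that exist, so the smoothed series has no gaps at the start or end of the year. Like the original call, this sketch does not remove missing values, so any <code>NA</code> inside a window makes that day's average <code>NA</code>.</p>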
<p>The source code is available in the repository: <a href="https://github.com/AxesAccess/Animations-in-R" target="_blank">Animations-in-R</a>.</p>
</section>
<section id="see-also" class="level2">
<h2 class="anchored" data-anchor-id="see-also">See Also</h2>
<p>Here are some related posts that you might find interesting:</p>
<div id="listing-posts" class="quarto-listing quarto-listing-container-table">
<table class="quarto-listing-table table">
<thead>
<tr>

<th>
 
</th>

<th>
Title
</th>

<th>
Reading Time
</th>

</tr>
</thead>
<tbody class="list">

<tr data-index="0" data-categories="TUwlMkNBbmltYXRpb24lMkNS" data-listing-date-sort="1771027200000" data-listing-file-modified-sort="1770982202443" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="9" data-listing-word-count-sort="1687" data-listing-title-sort="Implementing a Neural Network in Base R" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20260214-backpropagating-love/image.gif" style="height: 40px;"></span>
</td>
<td>
<a href="https://frequentist.org/posts/20260214-backpropagating-love/index.html" class="title listing-title">Implementing a Neural Network in Base R</a>
</td>
<td>
<span class="listing-reading-time">9 min</span>
</td>

</tr>

<tr data-index="1" data-categories="UHJvZHVjdCUyQ01hcmtldGluZyUyQ1NwYXRpYWwlMkNHZW9zcGF0aWFsJTJDU3RyYXRlZ3k=" data-listing-date-sort="1770422400000" data-listing-file-modified-sort="1770660370665" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="14" data-listing-word-count-sort="2663" data-listing-title-sort="Using Transit Time to Rethink Hotel Search" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20260207-transit-time-hotel-search/image.svg" style="height: 40px;"></span>
</td>
<td>
<a href="https://frequentist.org/posts/20260207-transit-time-hotel-search/index.html" class="title listing-title">Using Transit Time to Rethink Hotel Search</a>
</td>
<td>
<span class="listing-reading-time">14 min</span>
</td>

</tr>

</tbody>
</table>
</div>



</section>

]]></description>
  <category>Visualization</category>
  <category>Spatial</category>
  <category>Animation</category>
  <category>R</category>
  <guid>https://frequentist.org/posts/20250704-animation/</guid>
  <pubDate>Fri, 04 Jul 2025 00:00:00 GMT</pubDate>
  <media:content url="https://frequentist.org/posts/20250704-animation/image.gif" medium="image" type="image/gif"/>
</item>
<item>
  <title>Product Cards Creation Application</title>
  <dc:creator>Aleksei Prishchepo</dc:creator>
  <link>https://frequentist.org/posts/20250531-content-mate/</link>
  <description><![CDATA[ 






<p>This application creates product cards for an online store. It uses a large language model (LLM) and supporting libraries to generate detailed descriptions from product specifications, images, and files. A business can therefore produce informative, engaging product listings with minimal manual effort, which improves both productivity and content quality.</p>
<section id="application-features" class="level2">
<h2 class="anchored" data-anchor-id="application-features">Application Features</h2>
<p>For a detailed description of the application's features, see the <a href="https://frequentist.org/projects/">Projects</a> section: <a href="https://frequentist.org/projects/webapp-content-mate/" target="_blank">Product Cards Creation Application</a>.</p>
</section>
<section id="discussion-on-linkedin" class="level2">
<h2 class="anchored" data-anchor-id="discussion-on-linkedin">Discussion on LinkedIn</h2>
<p>Here is a LinkedIn post where I discuss how I created this AI application:</p>
<iframe src="https://www.linkedin.com/embed/feed/update/urn:li:ugcPost:7350848934489063424?collapsed=1" height="568" width="100%" frameborder="0" allowfullscreen="1" title="How I created an AI application from scratch">
</iframe>
</section>
<section id="see-also" class="level2">
<h2 class="anchored" data-anchor-id="see-also">See Also</h2>
<p>Below you can find some related articles on building AI applications and working with LLMs:</p>
<div id="listing-posts" class="quarto-listing quarto-listing-container-table">
<table class="quarto-listing-table table">
<thead>
<tr>

<th>
 
</th>

<th>
Title
</th>

<th>
Reading Time
</th>

</tr>
</thead>
<tbody class="list">

<tr data-index="0" data-categories="QWdlbnRzJTJDTExNJTJDQXBwJTJDUHl0aG9u" data-listing-date-sort="1765584000000" data-listing-file-modified-sort="1773058068416" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="7" data-listing-word-count-sort="1393" data-listing-title-sort="Agentic vs Deterministic Workflows: Designing a Reliable AI Application" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20251213-agentic-vs-deterministic/image.svg" style="height: 40px;"></span>
</td>
<td>
<a href="https://frequentist.org/posts/20251213-agentic-vs-deterministic/index.html" class="title listing-title">Agentic vs Deterministic Workflows: Designing a Reliable AI Application</a>
</td>
<td>
<span class="listing-reading-time">7 min</span>
</td>

</tr>

<tr data-index="1" data-categories="UkFHJTJDTkxQJTJDTExNJTJDUHl0aG9u" data-listing-date-sort="1742515200000" data-listing-file-modified-sort="1748974609635" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="12" data-listing-word-count-sort="2344" data-listing-title-sort="Implementing a Local Retrieval-Augmented Generation System" data-listing-filename-sort="index.qmd">
<td>
<span class="listing-image"><img loading="lazy" src="https://frequentist.org/posts/20250321-rag/image.svg" style="height: 40px;"></span>
</td>
<td>
<a href="https://frequentist.org/posts/20250321-rag/index.html" class="title listing-title">Implementing a Local Retrieval-Augmented Generation System</a>
</td>
<td>
<span class="listing-reading-time">12 min</span>
</td>

</tr>

</tbody>
</table>
</div>



</section>

]]></description>
  <category>App</category>
  <category>LLM</category>
  <category>Python</category>
  <guid>https://frequentist.org/posts/20250531-content-mate/</guid>
  <pubDate>Sat, 31 May 2025 00:00:00 GMT</pubDate>
  <media:content url="https://frequentist.org/posts/20250531-content-mate/mark-konig-Tl8mDaue_II-unsplash_square.jpg" medium="image" type="image/jpeg"/>
</item>
</channel>
</rss>
