LinkedIn Analytics Web Application

A local-first web application that transforms LinkedIn Takeout exports into structured analytics on roles, industries, and geographic reach using NLP, unsupervised learning, and asynchronous data processing.
Webapp
Shiny for Python
Python
Featured
Author

Aleksei Prishchepo

Published

December 27, 2025

Project Overview

My Professional Network is a local-first, privacy-preserving analytics platform that turns raw LinkedIn Takeout data into an interactive dashboard. The project demonstrates full-stack data science skills by combining ETL, NLP, unsupervised learning, and an asynchronous web architecture to help users understand and strategically grow their professional networks.

Role

Full-Stack Data Scientist / Systems Engineer

Domain

Network Analysis, NLP, Personal Analytics

Tools

Python, Shiny for Python, PostgreSQL, Redis, Celery, Docker, Hugging Face embeddings, Google Maps API

Key Features & Problems Solved

Follow the link to explore the application:

My Professional Network

Personal network intelligence from raw platform exports

Converts static LinkedIn archives into structured, queryable data without relying on third-party SaaS or cloud lock-in.

Semantic role clustering

Uses vector embeddings and unsupervised learning to group thousands of heterogeneous job titles into meaningful professional clusters.

Network structure and reach analysis

Reveals dominant industries, role distributions, and underrepresented areas within a personal network.

Geographic footprint visualization

Maps global connections by resolving ambiguous location strings into precise coordinates.

Local-first, privacy-preserving analytics

All data processing, modeling, and visualization run locally, keeping sensitive career data fully under user control.

Implementation Highlights

Microservices-based local architecture

Orchestrated with Docker Compose, separating web UI, background workers, database, cache, and ML inference services for scalability and isolation.

Asynchronous processing with Celery & Redis

Long-running tasks (profile parsing, embedding generation, geocoding) execute in background workers, keeping the UI responsive.

Local NLP with vector embeddings

Job titles are embedded using a locally hosted Hugging Face inference service, with aggressive caching to reduce recomputation and latency.
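One way such caching might look is sketched below, assuming the embedding service accepts a batch of strings and returns one vector per string. The function names, the normalization rule, and the dict-backed cache are assumptions made for illustration; the project's actual cache layer is not described in detail here. `fake_embed_batch` is a placeholder, not a real model call.

```python
def normalize_title(title):
    """Collapse case and whitespace so trivially different titles share a key."""
    return " ".join(title.lower().split())


def embed_titles(titles, embed_batch, cache):
    """Return one vector per title, calling the embedder only for cache misses.

    embed_batch: hypothetical callable that takes a list of strings and returns
                 a list of vectors (for example, a wrapper around an HTTP call).
    cache:       any dict-like store; a plain dict here, but it could be backed
                 by Redis or a database table.
    """
    keys = [normalize_title(t) for t in titles]
    # dict.fromkeys de-duplicates while preserving first-seen order.
    missing = [k for k in dict.fromkeys(keys) if k not in cache]
    if missing:
        for key, vector in zip(missing, embed_batch(missing)):
            cache[key] = vector
    return [cache[k] for k in keys]


# Placeholder embedder for illustration only: records each batch it receives
# and returns a one-dimensional "vector" based on string length.
received_batches = []


def fake_embed_batch(batch):
    received_batches.append(list(batch))
    return [[float(len(text))] for text in batch]
```

Because lookups happen on the normalized key, "Data Scientist" and "data  scientist" cost a single embedding call, and a repeated upload of the same export triggers no calls at all.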

Unsupervised clustering pipeline

Combines dimensionality reduction (LSA / SVD) with K-Means, including a custom heuristic for automatic cluster labeling and long-tail handling.
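The labeling and long-tail step could take many forms. The sketch below is one hypothetical heuristic, not the project's own: it assumes cluster assignments already exist (for example, from K-Means), names each cluster after the tokens that appear in more than half of its titles, and folds clusters smaller than `min_size` into a shared "Other" bucket. The stopword list and thresholds are illustrative assumptions.

```python
from collections import Counter, defaultdict

# Illustrative stopword list; a real one would be larger.
STOPWORDS = {"of", "and", "the", "at", "in", "for", "&"}


def label_clusters(titles, assignments, min_size=2, other_label="Other"):
    """Return a human-readable cluster label for each title.

    titles:      list of job title strings
    assignments: cluster id per title, e.g. the output of a K-Means fit
    """
    members = defaultdict(list)
    for title, cluster_id in zip(titles, assignments):
        members[cluster_id].append(title)

    names = {}
    for cluster_id, group in members.items():
        if len(group) < min_size:
            # Long tail: tiny clusters share one bucket instead of a noisy label.
            names[cluster_id] = other_label
            continue
        # Count in how many titles each token occurs (document frequency).
        doc_freq = Counter(
            token
            for title in group
            for token in set(title.lower().split())
            if token not in STOPWORDS
        )
        majority = [t for t, n in doc_freq.items() if n * 2 > len(group)]
        # Most frequent first; alphabetical order breaks ties deterministically.
        majority.sort(key=lambda t: (-doc_freq[t], t))
        names[cluster_id] = " ".join(majority[:2]).title() or other_label
    return [names[c] for c in assignments]
```

Keeping the labeling separate from the clustering itself means the number of clusters or the embedding model can change without touching how labels are produced.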

Optimized geocoding strategy

Deduplicates and persists resolved locations to minimize API usage and control operational costs.
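A minimal sketch of this deduplicate-then-persist idea follows. It uses the standard-library `sqlite3` module as a stand-in for the PostgreSQL store listed under Tools, and a placeholder `fake_geocode` function instead of a real Google Maps API call. The table name, normalization rule, and function names are assumptions for illustration only.

```python
import sqlite3


def normalize_location(raw):
    """Lowercase, drop commas, and collapse whitespace to merge near-duplicates."""
    return " ".join(raw.lower().replace(",", " ").split())


def geocode_all(locations, geocode_fn, db):
    """Resolve each location to (lat, lng), calling geocode_fn once per new key.

    geocode_fn: hypothetical callable mapping a query string to (lat, lng).
    db:         an open sqlite3 connection used as the persistent cache.
    """
    db.execute(
        "CREATE TABLE IF NOT EXISTS geocache "
        "(location TEXT PRIMARY KEY, lat REAL, lng REAL)"
    )
    resolved = {}
    # dict.fromkeys de-duplicates the normalized keys, preserving order.
    for key in dict.fromkeys(normalize_location(loc) for loc in locations):
        row = db.execute(
            "SELECT lat, lng FROM geocache WHERE location = ?", (key,)
        ).fetchone()
        if row is None:
            lat, lng = geocode_fn(key)  # the only place an API call happens
            db.execute("INSERT INTO geocache VALUES (?, ?, ?)", (key, lat, lng))
            row = (lat, lng)
        resolved[key] = row
    db.commit()
    return [resolved[normalize_location(loc)] for loc in locations]


# Placeholder geocoder for illustration only: a fixed lookup that records calls.
geocode_calls = []


def fake_geocode(query):
    geocode_calls.append(query)
    table = {
        "berlin germany": (52.52, 13.405),
        "lisbon portugal": (38.7223, -9.1393),
    }
    return table[query]
```

With this shape, the number of billable requests grows with the number of distinct locations rather than the number of connections, and a second run over the same export makes no requests at all.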

Outcomes & Impact

  • Turned opaque, static LinkedIn exports into a living analytical system.

  • Demonstrated how data science, backend engineering, and UX can coexist in a single cohesive product.

  • Showcased trade-offs around performance, privacy, scalability, and maintainability.

  • Created a foundation for future extensions such as graph databases and conversational network analysis.

Skills Demonstrated

Data Engineering • NLP & Embeddings • Unsupervised Learning • Network Analysis • Asynchronous Systems • Docker & DevOps • Full-Stack Data Science • Privacy-First System Design • System Architecture

Apply This to Your Business

If you have a business problem that calls for a data-driven solution, feel free to reach out via the contact page to discuss how I can use data science, analytics, and automation to drive value for your organization.

