Processing of Financial Transactions for Market Analysis of Shopping Centers
The goal of this project is to build a dashboard that visualizes consumer trends among individual customers of Spanish shopping centers. My role focuses on data processing; the resulting datasets are then integrated into a custom-built dashboard by another team.
The raw data consist of hundreds of millions of individual customer transactions, each corresponding to a checking-account or card operation carried out by a customer (all customer data have been anonymized). These data are processed with PySpark algorithms on Databricks (the project runs in the cloud because of data-security requirements) to extract key attributes from each transaction, such as the store, shopping center, and city where the purchase took place, and to assign each transaction to predefined categories and subcategories.
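The enrichment step described above can be illustrated with a minimal, plain-Python sketch of rule-based matching. It assumes, hypothetically, that each transaction carries a free-text description in which store and shopping-center names appear; the rule tables (`STORE_RULES`, `CENTER_RULES`), the example names, and the category labels are all invented for illustration and are not the project's actual rules or taxonomy. In PySpark, one common approach is to express the same normalization with built-in column functions such as `upper` and `regexp_replace` and to match against a reference table with a join, which usually performs better than a Python UDF at this data volume.

```python
import re
import unicodedata

# Hypothetical reference tables (illustrative only, not the project's real rules).
# Keys are normalized keywords expected to appear in a transaction description.
STORE_RULES = {
    "LIBRERIA LUNA": ("Librería Luna", "Leisure", "Books"),
    "CAFE ALBA": ("Café Alba", "Food & Beverage", "Cafés"),
}
CENTER_RULES = {
    "CC PLAZA SOL": ("CC Plaza Sol", "Madrid"),
    "CC MAR AZUL": ("CC Mar Azul", "Valencia"),
}


def normalize(text: str) -> str:
    """Strip accents, collapse whitespace, and uppercase a raw description."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"\s+", " ", no_accents).strip().upper()


def enrich(description: str) -> dict:
    """Assign store, category, subcategory, shopping center, and city by keyword.

    Fields stay None when no rule matches, so unmatched transactions
    can be reviewed separately.
    """
    text = normalize(description)
    result = {
        "store": None,
        "category": None,
        "subcategory": None,
        "shopping_center": None,
        "city": None,
    }
    # First matching store keyword wins (dicts preserve insertion order).
    for keyword, (store, category, subcategory) in STORE_RULES.items():
        if keyword in text:
            result.update(store=store, category=category, subcategory=subcategory)
            break
    # First matching shopping-center keyword wins.
    for keyword, (center, city) in CENTER_RULES.items():
        if keyword in text:
            result.update(shopping_center=center, city=city)
            break
    return result


if __name__ == "__main__":
    print(enrich("Compra tarjeta  Librería Luna  CC Plaza Sol"))
    print(enrich("Transferencia recibida"))
```

For example, `enrich("Compra tarjeta  Librería Luna  CC Plaza Sol")` maps the accented, irregularly spaced description to the store "Librería Luna" (Leisure / Books) in "CC Plaza Sol", Madrid, while a description with no matching keyword returns all fields as `None`.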
The algorithms I develop are regularly shared with the data engineering team, who integrate them into the full processing pipeline. This pipeline generates the datasets that are subsequently used by the various dashboards.
These dashboards are ultimately used by the end client (a shopping center management company) to support business decision-making.
Project dates: June 2024 – Ongoing
Tools used:
- PySpark on Databricks
- Excel