Infrastructure & Architecture

Jobs Pipeline

Project background

With this project, I wanted to both demonstrate a broad range of data and BI skills and gain practical insights into the job market to support my own career transition.

Data selection

Initially, I considered using sample datasets from Kaggle or Hugging Face, but after researching publicly available options, I found that the Reed.co.uk API stood out as a richer, more realistic source for a project centred on analytics, ETL, and reporting.

Skills demonstration

I designed this project as a way to deepen my practical experience with Python, ETL development, cloud services, and end-to-end data pipeline design. The goal was to build something real and production-like, rather than follow isolated tutorials.

To accelerate development and expose myself to best-practice patterns, I used GitHub Copilot within VS Code. Copilot helped generate scaffolding for functions, classes, retries, pagination, and error-handling logic. However, I designed the overall architecture, data flow, transformations, and schema myself, and iteratively refined, debugged, and reworked the Python modules as my understanding grew.

Working with AI-assisted coding helped me:

  • Learn Python syntax and structure more quickly
  • Understand common ETL and API integration patterns
  • Focus on problem-solving rather than boilerplate
  • Build modular, reusable components
  • Adopt better coding practices over time

As I am new to Python, I don't claim to fully understand every line of the generated code yet, but I do understand the structure, flow, and purpose of each component. This project was intentionally a practical learning exercise, and there are still areas, particularly around advanced patterns and optimisation, that I am actively developing.

Debugging, integrating, and refining the solution gave me a strong foundation, and I continue to revisit and improve sections as my fluency in Python increases.

The project provided hands-on experience with the following areas (illustrative sketches of several of them follow the list):

  • API ingestion (pagination, filtering, retries, error handling)
  • Python ETL workflows (Pandas, JSON parsing, cleaning logic)
  • Azure Functions for scheduled ingestion
  • Azure Storage Containers for raw and staged data
  • Azure Key Vault for secure secret management
  • PostgreSQL schema design and loading
  • Dimensional modelling (fact and dimension tables)
  • Power BI modelling, DAX, and semantic layer design
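
To give a flavour of the ingestion layer, the sketch below shows the kind of paginated, retrying API client the pipeline is built around. It is a minimal illustration rather than the code in the repo: the endpoint, the resultsToTake/resultsToSkip parameters, the page size, and the basic-auth convention are assumptions based on Reed's public API documentation.

    import requests
    from requests.adapters import HTTPAdapter
    from urllib3.util.retry import Retry

    REED_SEARCH_URL = "https://www.reed.co.uk/api/1.0/search"  # assumed endpoint
    PAGE_SIZE = 100

    def build_session(api_key: str) -> requests.Session:
        """Session with retries; Reed's docs describe basic auth with the key as the username."""
        session = requests.Session()
        session.auth = (api_key, "")
        retries = Retry(total=5, backoff_factor=1,
                        status_forcelist=[429, 500, 502, 503, 504])
        session.mount("https://", HTTPAdapter(max_retries=retries))
        return session

    def fetch_jobs(session: requests.Session, keywords: str) -> list[dict]:
        """Page through search results until an empty page is returned."""
        jobs, skip = [], 0
        while True:
            response = session.get(
                REED_SEARCH_URL,
                params={"keywords": keywords,
                        "resultsToTake": PAGE_SIZE,
                        "resultsToSkip": skip},
                timeout=30,
            )
            response.raise_for_status()  # surface HTTP errors to the caller
            page = response.json().get("results", [])
            if not page:
                break
            jobs.extend(page)
            skip += PAGE_SIZE
        return jobs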
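
Secret management and the raw landing step sit inside the scheduled Azure Function. The sketch below shows one way to wire those pieces together with the Azure SDKs; the vault URL, storage account URL, container name, and blob path are placeholders, not the project's actual configuration.

    import json
    from datetime import datetime, timezone

    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient
    from azure.storage.blob import BlobServiceClient

    VAULT_URL = "https://<your-key-vault>.vault.azure.net"                 # placeholder
    STORAGE_URL = "https://<your-storage-account>.blob.core.windows.net"   # placeholder

    def get_secret(name: str) -> str:
        """Fetch a secret (e.g. the Reed API key) using the function's identity."""
        credential = DefaultAzureCredential()
        client = SecretClient(vault_url=VAULT_URL, credential=credential)
        return client.get_secret(name).value

    def land_raw_json(jobs: list[dict]) -> None:
        """Write the raw API payload to a 'raw' container, partitioned by run date."""
        credential = DefaultAzureCredential()
        service = BlobServiceClient(account_url=STORAGE_URL, credential=credential)
        blob_name = f"reed/{datetime.now(timezone.utc):%Y/%m/%d}/jobs.json"
        blob = service.get_blob_client(container="raw", blob=blob_name)
        blob.upload_blob(json.dumps(jobs), overwrite=True)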
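
On the transform side, the sketch below illustrates the kind of Pandas cleaning applied to the raw JSON. The field names (jobId, jobTitle, minimumSalary and so on) are assumptions about the payload; the real mapping lives in the repo.

    import pandas as pd

    def clean_jobs(raw_jobs: list[dict]) -> pd.DataFrame:
        """Flatten raw JSON records into a typed, de-duplicated DataFrame."""
        df = pd.json_normalize(raw_jobs)
        # Column names here are illustrative; the real payload drives the mapping.
        df = df.rename(columns={"jobId": "job_id",
                                "jobTitle": "job_title",
                                "locationName": "location",
                                "minimumSalary": "salary_min",
                                "maximumSalary": "salary_max"})
        df["date_posted"] = pd.to_datetime(df["date"], dayfirst=True, errors="coerce")
        df[["salary_min", "salary_max"]] = df[["salary_min", "salary_max"]].apply(
            pd.to_numeric, errors="coerce")
        return df.drop_duplicates(subset="job_id")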
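
Finally, the warehouse side: cleaned records are loaded into a PostgreSQL star schema. The sketch below upserts a dimension row and then inserts a fact row with psycopg2; dim_location and fact_job_posting are illustrative names rather than the project's actual tables.

    import psycopg2

    DIM_LOCATION_UPSERT = """
        INSERT INTO dim_location (location_name)
        VALUES (%s)
        ON CONFLICT (location_name) DO UPDATE SET location_name = EXCLUDED.location_name
        RETURNING location_key;
    """

    FACT_JOB_INSERT = """
        INSERT INTO fact_job_posting
            (job_id, location_key, salary_min, salary_max, date_posted)
        VALUES (%s, %s, %s, %s, %s)
        ON CONFLICT (job_id) DO NOTHING;
    """

    def load_row(conn, row: dict) -> None:
        """Upsert the dimension first, then insert the fact keyed on the surrogate key.

        conn is an open psycopg2 connection, e.g. psycopg2.connect(...).
        """
        with conn.cursor() as cur:
            cur.execute(DIM_LOCATION_UPSERT, (row["location"],))
            location_key = cur.fetchone()[0]
            cur.execute(FACT_JOB_INSERT, (row["job_id"], location_key,
                                          row["salary_min"], row["salary_max"],
                                          row["date_posted"]))
        conn.commit()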

Although I am still strengthening my Python capability, particularly around structuring larger applications and testing, this project demonstrates my ability to design solutions, learn quickly, and deliver working, cloud-based data pipelines using modern tools and BI practices.

If you are interested in looking at the code, it is available in the public repository here: https://github.com/graemeboulton/job-insights-project