Job hunt in Germany - featuring Python, R, and SQL

Reporting with Quarto

Author

Nico Blokker

Published

03-09-2023


Load, manipulate, and visualize open positions in Berlin.

%%{init: {"flowchart": {"htmlLabels": false}} }%%
flowchart LR
  subgraph Process
     A 
     B(Manipulate data in SQL) --> C[Visualize in R] 
  end
  A[Call API in Python] --> Z[(Setup DB in R)]
  Z[(Setup DB in R)] --> B(Manipulate data in SQL)
  
  classDef white fill: #fff
  class Process white

Download data using Python

  • search for positions including ‘data’
python
import requests
import pandas as pd

# build query-url
where = 'Berlin'; what = 'data'
url = 'https://rest.arbeitsagentur.de/jobboerse/jobsuche-service/pc/v4/jobs?'
query = f'{url}&was={what}&wo={where}&veroeffentlichtseit=30&size=100&'

# make & parse query
result = requests.get(query, headers = {'X-API-Key' : key}) # key from official documentation
offers = pd.DataFrame(result.json()["stellenangebote"])

# unnest list columns & merge
locations = offers['arbeitsort'].apply(pd.Series) 
coords = locations['koordinaten'].apply(pd.Series)  
offers_wide = pd.concat([offers, locations, coords], axis = 1) 

# subset
offers_wide_subset = offers_wide[['titel', 'arbeitgeber', 'ort', 'lat', 'lon']]

The query yields 100 open positions, which are written to the database and subsequently filtered.

Mimic SQL database using R

R
library(reticulate)
library(DBI)
con <- dbConnect(RSQLite::SQLite(), dbname = ":memory:")
dbWriteTable(con, "jobs", py$offers_wide_subset)

Subset data using SQL

SQL
SELECT * FROM jobs
WHERE titel LIKE '%Analyst%' OR titel LIKE '%Business%';
  • output limited to the following
3 records
titel arbeitgeber ort lat lon
Big Data Analyst alfatraining Bildungszentrum GmbH Berlin 52.52408 13.41371
Data Analyst:in mit Schwerpunkt Marketing VE Sales GmbH Berlin 52.49268 13.40258
Senior Data Analyst - Digital Marketing (w/m/d) Mercedes-Benz AG Berlin 52.50609 13.43890

Visualize locations using R

  • the outputted table is handed over to R as table_jobs
R
library(leaflet)
leaflet() %>%
  addTiles() %>%
  addMarkers(lat = table_jobs$lat, lng = table_jobs$lon, label = table_jobs$titel)

References

  • https://jobsuche.api.bund.dev/
  • https://github.com/bundesAPI/jobsuche-api
  • https://medium.com/@skyetetra/pretending-to-have-a-sql-database-with-rstudio-e80d9a1338b1
  • https://lazyanalyst.medium.com/sql-in-r-markdown-2ceffeb7df4
  • https://chat.openai.com/
  • https://campus.datacamp.com/courses/building-dashboards-with-flexdashboard/data-visualization-for-dashboards?ex=10
  • https://github.com/mermaid-js/mermaid/issues/391