%%{init: {"flowchart": {"htmlLabels": false}} }%% flowchart LR subgraph Process A B(Manipulate data in SQL) --> C[Visualize in R] end A[Call API in Python] --> Z[(Setup DB in R)] Z[(Setup DB in R)] --> B(Manipulate data in SQL) classDef white fill: #fff class Process white
Job hunt in Germany - featuring Python, R, and SQL
Reporting with Quarto
Load, manipulate, and visualize open positions in Berlin.
Download data using Python
- search for positions including ‘data’
python
import requests
import pandas as pd
# build query-url
= 'Berlin'; what = 'data'
where = 'https://rest.arbeitsagentur.de/jobboerse/jobsuche-service/pc/v4/jobs?'
url = f'{url}&was={what}&wo={where}&veroeffentlichtseit=30&size=100&'
query
# make & parse query
= requests.get(query, headers = {'X-API-Key' : key}) # key from official documentation
result = pd.DataFrame(result.json()["stellenangebote"])
offers
# unnest list columns & merge
= offers['arbeitsort'].apply(pd.Series)
locations = locations['koordinaten'].apply(pd.Series)
coords = pd.concat([offers, locations, coords], axis = 1)
offers_wide
# subset
= offers_wide[['titel', 'arbeitgeber', 'ort', 'lat', 'lon']] offers_wide_subset
The query yields 100 open positions, which are written to the database and subsequently filtered.
Mimic SQL database using R
R
library(reticulate)
library(DBI)
<- dbConnect(RSQLite::SQLite(), dbname = ":memory:")
con dbWriteTable(con, "jobs", py$offers_wide_subset)
Subset data using SQL
SQL
SELECT * FROM jobs
WHERE titel LIKE '%Analyst%' OR titel LIKE '%Business%';
- output limited to the following
titel | arbeitgeber | ort | lat | lon |
---|---|---|---|---|
Big Data Analyst | alfatraining Bildungszentrum GmbH | Berlin | 52.52408 | 13.41371 |
Data Analyst:in mit Schwerpunkt Marketing | VE Sales GmbH | Berlin | 52.49268 | 13.40258 |
Senior Data Analyst - Digital Marketing (w/m/d) | Mercedes-Benz AG | Berlin | 52.50609 | 13.43890 |
Visualize locations using R
- the outputted table is handed over to R as
table_jobs
R
library(leaflet)
leaflet() %>%
addTiles() %>%
addMarkers(lat = table_jobs$lat, lng = table_jobs$lon, label = table_jobs$titel)
References
- https://jobsuche.api.bund.dev/
- https://github.com/bundesAPI/jobsuche-api
- https://medium.com/@skyetetra/pretending-to-have-a-sql-database-with-rstudio-e80d9a1338b1
- https://lazyanalyst.medium.com/sql-in-r-markdown-2ceffeb7df4
- https://chat.openai.com/
- https://campus.datacamp.com/courses/building-dashboards-with-flexdashboard/data-visualization-for-dashboards?ex=10
- https://github.com/mermaid-js/mermaid/issues/391