%%{init: {"flowchart": {"htmlLabels": false}} }%%
flowchart LR
subgraph Process
A
B(Manipulate data in SQL) --> C[Visualize in R]
end
A[Call API in Python] --> Z[(Setup DB in R)]
Z[(Setup DB in R)] --> B(Manipulate data in SQL)
classDef white fill: #fff
class Process white
Job hunt in Germany - featuring Python, R, and SQL
Reporting with Quarto
Load, manipulate, and visualize open positions in Berlin.
Download data using Python
- search for positions including ‘data’
python
import requests
import pandas as pd
# build query-url
where = 'Berlin'; what = 'data'
url = 'https://rest.arbeitsagentur.de/jobboerse/jobsuche-service/pc/v4/jobs?'
query = f'{url}&was={what}&wo={where}&veroeffentlichtseit=30&size=100&'
# make & parse query
result = requests.get(query, headers = {'X-API-Key' : key}) # key from official documentation
offers = pd.DataFrame(result.json()["stellenangebote"])
# unnest list columns & merge
locations = offers['arbeitsort'].apply(pd.Series)
coords = locations['koordinaten'].apply(pd.Series)
offers_wide = pd.concat([offers, locations, coords], axis = 1)
# subset
offers_wide_subset = offers_wide[['titel', 'arbeitgeber', 'ort', 'lat', 'lon']]The query yields 100 open positions, which are written to the database and subsequently filtered.
Mimic SQL database using R
R
library(reticulate)
library(DBI)
con <- dbConnect(RSQLite::SQLite(), dbname = ":memory:")
dbWriteTable(con, "jobs", py$offers_wide_subset)Subset data using SQL
SQL
SELECT * FROM jobs
WHERE titel LIKE '%Analyst%' OR titel LIKE '%Business%';- output limited to the following
| titel | arbeitgeber | ort | lat | lon |
|---|---|---|---|---|
| Big Data Analyst | alfatraining Bildungszentrum GmbH | Berlin | 52.52408 | 13.41371 |
| Data Analyst:in mit Schwerpunkt Marketing | VE Sales GmbH | Berlin | 52.49268 | 13.40258 |
| Senior Data Analyst - Digital Marketing (w/m/d) | Mercedes-Benz AG | Berlin | 52.50609 | 13.43890 |
Visualize locations using R
- the outputted table is handed over to R as
table_jobs
R
library(leaflet)
leaflet() %>%
addTiles() %>%
addMarkers(lat = table_jobs$lat, lng = table_jobs$lon, label = table_jobs$titel)References
- https://jobsuche.api.bund.dev/
- https://github.com/bundesAPI/jobsuche-api
- https://medium.com/@skyetetra/pretending-to-have-a-sql-database-with-rstudio-e80d9a1338b1
- https://lazyanalyst.medium.com/sql-in-r-markdown-2ceffeb7df4
- https://chat.openai.com/
- https://campus.datacamp.com/courses/building-dashboards-with-flexdashboard/data-visualization-for-dashboards?ex=10
- https://github.com/mermaid-js/mermaid/issues/391