Data Report

Authors

Laura Conti

Sabrina Popp

Alex Leccadito

John Truninger

Published

June 11, 2023

1 Raw Data

The raw data sets can all be found in the directory data/01_raw. Each data set is then located in its own folder, which includes a spreadsheet (.xlsx, .xls, .csv or .ods), as well as the URL to the source for downloading the data.

Number	Dataset Name	Source	Folder in data/01_raw
R01	Swiss Real Estate Offer Index	`https://www.iazicifi.ch/`	01_Swiss Real Estate Offer Index
R02	Distribution of population by tenure status	`https://ec.europa.eu/`	02_Distribution of population by tenure status
R03	Average sales price of new and existing residential property	`https://www.statista.com/`	03_Average price per square meter in european countries
R04	General overview “Buildings” by cantons 2021	`https://www.bfs.admin.ch/`	04_Allgemeine Übersicht Gebäude nach Kantonen 2021
R05	Structure of the permanent resident population by canton, 1999-2021	`https://www.bfs.admin.ch/`	05_Struktur der ständigen Wohnbevölkerung nach Kanton, 1999-2021
R06	Swiss Construction Price Index - Average unit prices in Switzerland and the major regions	`https://www.bfs.admin.ch/`	06_Schweizerischer Baupreisindex - Durchschnittliche Einheitspreise in der Schweiz und in den Grossregionen
R07	Taxes & Public Finances	`https://www.statista.com/`	07_Steuern_&_Staatsfinanzen

Code

from pathlib import Path
import numpy as np
import pandas as pd
import openpyxl

import warnings
warnings.filterwarnings('ignore', category=UserWarning)
warnings.filterwarnings('ignore', category=FutureWarning)

Code

data_path = Path('../data')
raw_data = Path(data_path, '01_raw')
prc_data = Path(data_path, '02_processed')
exp_data = Path(data_path, '03_exports')

1.1 Dataset R01: Swiss Real Estate Offer Index

1.1.1 Details Dataset

The source of this dataset consists of an excel file with multiple spreadsheets. Collectivly, it shows the monthly average price of real estate in selected regions of switzerland.

Sheet: Miet_Indizes: This sheet contains the monthly averaged offer prices for rental appartments in selected regions of switherland.
- Description of rows:
  - respective month index for the body of the dataset
- Description of columns:
  - t: date of the index in format dd.mm.yyyy
  - Eigenthumswohnungen: prices for condominiums in switzerland
  - Einfamilienhäuser: prices for houses in switzerland
Sheet: Preis_Indizes: This sheet contains the monthly averaged price offers for condominiumns and houses in switzerland.
- Description of rows:
  - respective month index for the body of the dataset
- Description of columns:
  - t: date of the index in format dd.mm.yyyy
  - Eigenthumswohnungen: prices for condominiums in switzerland
  - Einfamilienhäuser: prices for houses in switzerland

This dataset was created by the company IAZI AG - CIFI SA and ImmoScout24. The Swiss Real Estate Offer Index is a real estate index family. They are the world’s first hedonic indices that are calculated and updated in real time. Thanks to the continuous updating and their methodology, they allow the timely monitoring of the development of offer prices and offer rents on the largest real estate platform in Switzerland.

In the following steps, the data will be extracted from 2 sheets and in the end combined to create the raw dataset R01.

1.1.2 Data Quality

The data is already preprocessed which means, handling NaN values won’t be necessairy. However since two sheet will be merged, a cut-off from a specific year is nescessairy since the two subdatasets do not have the same time range.

Code

rDataSet_10 = Path(
    raw_data, 
    '01_Swiss Real Estate Offer Index', 
    'Swiss Real Estate Offer Index.xlsx'
)

1.1.3 Data Extraction & Cleansing

As before mentioned a new dataset will be created through extracting the specific sheets from the main dataset and combining them into one dataset.

Data extraction and reshaping Miet_Indizes

Code

dataframe_10 = pd.read_excel(
    rDataSet_10, 
    sheet_name='Miet_Indizes',
    skiprows=6,
    skipfooter=3,
)

# drop unused columns
dataframe_10 = dataframe_10.drop(
    dataframe_10.columns[2:], 
    axis=1
)

# rename columns
dataframe_10 = dataframe_10.rename(
    columns={
        't': 'date',
        'CH': 'rental_index'
    }
)

# format column date to datetime
dataframe_10['date'] = pd.to_datetime(
                            dataframe_10['date'],
                            dayfirst=True
                       ).dt.date


# reset index & sort by date
dataframe_10.reset_index(drop=True, inplace=True)
dataframe_10.sort_values(by='date', inplace=True)

dataframe_10.head()

	date	rental_index
0	2014-12-31	260.82
1	2015-01-31	259.36
2	2015-02-28	258.83
3	2015-03-31	258.10
4	2015-04-30	261.57

Data extraction and reshaping Preis_Indizes

Code

dataframe_11 = pd.read_excel(
    rDataSet_10, 
    sheet_name='Preis_Indizes',
    skiprows=6,
    skipfooter=3,
)

# rename columns
dataframe_11 = dataframe_11.rename(
    columns={
        't': 'date',
        'Eigentumswohnungen': 'condos',
        'Einfamilienhäuser': 'houses'
    }
)

# format column date to datetime
dataframe_11['date'] = pd.to_datetime(
                            dataframe_11['date'],
                            dayfirst=True
                       ).dt.date

# reset index & sort by date
dataframe_11.reset_index(drop=True, inplace=True)
dataframe_11.sort_values(by='date', inplace=True)

dataframe_11.head()

	date	condos	houses
0	2010-12-31	5822.53	5458.33
1	2011-01-31	5852.88	5502.88
2	2011-02-28	5846.56	5509.60
3	2011-03-31	5964.04	5597.43
4	2011-04-30	5943.01	5594.06

1.1.4 Data merging

Code

dataframe_10 = pd.merge(
    dataframe_10,
    dataframe_11,
    on='date'
)

dataframe_10.head()

	date	rental_index	condos	houses
0	2014-12-31	260.82	6898.10	5990.61
1	2015-01-31	259.36	6916.64	5964.14
2	2015-02-28	258.83	6894.75	5974.35
3	2015-03-31	258.10	6879.93	5988.68
4	2015-04-30	261.57	6901.96	5987.68

1.2 Dataset R02: Distribution of population by tenure status

1.2.1 Details Dataset

Collectivly, this presents the distribution of population by tenure status (tenure, owner) in selected european countries.

Sheet: Summary: This sheet gives additional informationa bout the source and the contents of the dataset
Sheet: Structure: The summary sheet gives additionaly information about the structure of the data.
Sheet: Sheet 1: This sheet shows the distribution of population by tenure status, type of household and income group with the tenure type owner
- Description of rows:
  - respective country index for the body of the dataset
- Description of columns:
  - 2003 - 2022: year of the data
Sheet: Sheet 2: This sheet shows the distribution of population by tenure status, type of household and income group with the tenure type tenure

Note: The relevant sheet for this task is Sheet 1

1.2.2 Data Quality

Although there are some NaN entries in the dataset, they wont be replaced or looked at differently. Instead, the visualisation will only cover reference a specific column where there are the least NaN values. Aditionally there are some countries whose names are abnormal. These will be renamed to match the rest of the data.

Code

rDataSet_20 = Path(
    raw_data, 
    '02_Distribution of population by tenure status',
    'ilc_lvho02__custom_6240818_spreadsheet.xlsx'
)

1.2.3 Data Extraction & Cleansing

Code

dataframe_20 = pd.read_excel(
    rDataSet_20, 
    sheet_name='Sheet 1',
    skiprows=11,
    skipfooter=3,
)

# keep only relevant columns
dataframe_20 = dataframe_20.iloc[:, [0, 16]]

# rename columns
dataframe_20 = dataframe_20.rename(
    columns={
        'GEO (Labels)': 'country',
        'Unnamed: 16': '2018'
    }
)

# rename row with Germany
dataframe_20 = dataframe_20.replace(
    'Germany (until 1990 former territory of the FRG)',
    'Germany'
)

# reset index
dataframe_20.reset_index(drop=True, inplace=True)


dataframe_20.head()

	country	2018
0	Belgium	72.3
1	Bulgaria	83.6
2	Czechia	78.7
3	Denmark	60.5
4	Germany	51.5

1.3 Dataset R03: Average price per square meter in european countries

1.3.1 Details Dataset

This dataset contains the monthly averaged price per square meter for owned real estate in selected european countries.

Sheet: Overview: This sheet containes the additionaly information about the data and its source
Sheet: Data: This sheet contains the monthly averaged price offers for condominiumns and houses in switzerland
- Description of Rows:
  - Country: respective country
- Description of Columns:
  - New Housing: average sales price for new built houses
  - Existing Housing: average sales price for existing houses

1.3.2 Data Quality

The data for this dataset was originally published by the company Deloitte in August 2022 in a pdf File. Afterwards the relevant data has been copied into a workable Excel file and uploaded to Statista. In the following steps an extraction of the sheet ‘data’ will be made. Since only one price is needed in the end, the average of the price for existing houses and new houses gets calculated. This also removes the problematic of the NaN Values because in that case only the existing value is kept.

Code

rDataSet_30 = Path(
    raw_data, 
    '03_Average price per square meter in european countries',
    'statistic_id722905_average-residential-real-estate-square-meter-prices-in-europe-2021-by-country.xlsx'
)

1.3.3 Data Extraction & Cleansing

Code

dataframe_30 = pd.read_excel(
    rDataSet_30, 
    sheet_name='Data',
    skiprows=4,
)

# drop unused columns
dataframe_30.drop(
    ['Unnamed: 0'],
    axis=1,
    inplace=True
)

# rename columns
dataframe_30 = dataframe_30.rename(
    columns={
        'Unnamed: 1': 'country',
        'New housing': 'new_housing',
        'Existing housing': 'existing_housing'
    }
)

# calculate average price
dataframe_30['avg_price'] = np.nanmean(
                                dataframe_30[['new_housing', 'existing_housing']],
                                axis=1
                            )

# drop unused columns
dataframe_30.drop(
    ['new_housing', 'existing_housing'],
    axis=1,
    inplace=True
)

dataframe_30.head()

	country	avg_price
0	UK	4235.0
1	Austria	4205.0
2	France	4012.0
3	Netherlands	3599.5
4	Norway	4286.0

1.4 Dataset R04: General overview “Buildings” by cantons 2021

1.4.1 Details Dataset

This dataset contains detailed information on the types of habitable buildings in switzerland, categorized by swiss cantons. Each sheet in this dataset represents a canton of switzerland, as well as a sheet for the entire country.

Description of rows:
- Total: contains different types of housings and buildings
- Bauperiode: contains ranges of years when the houses were built
- Grossanzahl: number of storeys of the buildings
- Energiequelle der Heizung: energy source for heating
- Energiequelle für die Warmwasseraufbereitung: energy source for warm water treatment
- Eigentümertyp: type of ownership
Description of columns:
- Total: cumulative number of habitable buildings
- Bauperiode: building periods
- Mit ... Wohnungen: number of appartments within a building
Body of the dataset:
- The body of the dataset is filled with the number of buildings that satisfy both, the row and column conditions.

1.4.2 Data Quality

The files provided by the BFS (Bundesamt für Statistik, Federal Statistics Office) are already cleaned datasets. Therefore, handling of NaN values will mostly not be necessary. The current dataset exhibits some cells without values, but this is due to yearly numbers being present in the rows and the columns. This leads to values along the diagonal and NaN values in the triangles above and below the diagonal. The main objective of processing this dataset is to extract the necessary columns and rows for later analysis and visualizations.

Code

rDataSet_40 = Path(
    raw_data, 
    '04_Allgemeine Übersicht Gebäude nach Kantonen 2021',
    'je-d-09.02.00-2021.xlsx'
)

1.4.3 Data Extraction & Cleansing

Code

wb_40 = openpyxl.load_workbook(rDataSet_40, read_only=True)
sheets_40 = wb_40.sheetnames


def buildings_extraction(path: str, sheet: str, df_out: pd.DataFrame) -> None:
    """
    Extracts the total number of buildings and single family homes for
    a given sheet (canton)

    :param path: path to the excel file
    :param sheet: sheet name
    :param df_out: dataframe to store the extracted data
    :return: None
    """

    df_temp = pd.read_excel(io=path, sheet_name=sheet, skiprows=4)
    canton = sheet[-2:]
    total = int(df_temp.iloc[0, 1])
    single_house = int(df_temp.iloc[2, 1])

    df_out.loc[canton] = [total, single_house]

dataframe_40 = pd.DataFrame(
    columns=['buildings_total', 'single_family_home']
)

for sheet in sheets_40:
    buildings_extraction(
        path=rDataSet_40,
        sheet=sheet,
        df_out=dataframe_40
    )

dataframe_40 = dataframe_40.iloc[1:].copy()
dataframe_40.index.name = 'canton'

dataframe_40 = dataframe_40.reset_index()
dataframe_40 = dataframe_40.sort_values(by='canton')

dataframe_40.head()

	canton	buildings_total	single_family_home
18	AG	153894	102206
15	AI	5299	2917
14	AR	16323	9251
1	BE	238111	114053
12	BL	67390	46632

1.5 Dataset R05: Structure of the permanent resident population by canton, 1999-2021

1.5.1 Details Dataset

This dataset contains information about the population size in each canton of switzerland.

Description of rows:
- The rows represent the different cantons of Switzerland by regions
Description of columns:
- The columns categorize the population size by age, sex, nationality (Swiss or non-Swiss), civil status and the typology of the residential area
- Total: aggregates the data from all other columns
Body of the dataset:
- The body of the dataset contains the population count that satisfies both, the row and column conditions.

1.5.2 Data Quality

As previously mentioned, the datasets from the BFS are known for their high data quality, requiring primarily the extraction of relevant information.

Code

rDataSet_50 = Path(
    raw_data,
    '05_Struktur der ständigen Wohnbevölkerung nach Kanton, 1999-2021',
    'je-d-01.02.03.04.xlsx'
)

1.5.3 Data Extraction & Cleansing

Code

dataframe_50 = pd.read_excel(
    io=rDataSet_50,
    sheet_name='2021',
    skiprows=4,
)

# selecting and renaming relevant columns
dataframe_50 = dataframe_50[['Unnamed: 0', 'Unnamed: 1']].copy()
dataframe_50.rename(
    inplace=True,
    columns={
        'Unnamed: 0': 'canton',
        'Unnamed: 1': 'inhabitants'
    },
)

# select canton rows (present in dataframe_40)
dataframe_50 = dataframe_50[dataframe_50['canton'].isin(dataframe_40['canton'])]

dataframe_50.head()

	canton	inhabitants
2	VD	822968.0
3	VS	353209.0
4	GE	509448.0
6	BE	1047473.0
7	FR	329809.0

1.6 Dataset R06: Swiss Construction Price Index - Average unit prices in Switzerland and the major regions

1.6.1 Details Dataset

This dataset contains detailed information about pricing in the building industry across different regions of Switzerland. The columns represent the different regions, while the rows contain various segments of the building process, such as attic, framework, flooring, and more.

Description of rows:
- The rows contain different categories and specifications of construction activities.
Description of columns:
- The columns represent the unit and quantity of construction activities for the different regions of Switzerland.
Body of the Dataset:
- The body of the dataset is filled with the number of inhabitants that satisfy both the row and column conditions.

1.6.2 Data Quality

As mentioned previously, the datasets from BFS are very clean. Therefore, this part mainly involves extracting the relevant data. Since this dataset contains macros, a copy was created with the suffix ’_raw’ for convenience. Some rows consist of NaN (Not a Number) values, which is a result of the dataset structure. Additionally, some values are missing for the region of Ticino. This information is provided at the end of the Excel sheet as a comment: “Drei Punkte (…) bedeuten, dass der Wert nicht vorhanden, nicht genügend repräsentativ oder unter Datenschutz ist.” (EN: Three dots (…) mean that the value is not present, not sufficiently representative, or under data protection.)

Code

rDataSet_60 = Path(
    raw_data, 
    '06_Schweizerischer Baupreisindex - Durchschnittliche Einheitspreise in der Schweiz und in den Grossregionen',
    'cc-t-05.05.02.xlsx'
)

1.6.3 Data Extraction & Cleansing

Code

dataframe_60 = pd.read_excel(
    io=rDataSet_60,
    sheet_name='BAP_PCO',
    skiprows=9,
)

# drop unused columns
dataframe_60 = dataframe_60.drop(
    columns=[
        'Unnamed: 0',
        'Unnamed: 1',
        'Unnamed: 2',
        'Einheit',
        'Menge',
        'Unnamed: 4',
        'Unnamed: 15',
        'Unnamed: 16'
    ]
)

# rename columns
dataframe_60 = dataframe_60.rename(
    columns={
        'NPK Position': 'Category',
        'Schweiz': 'Switzerland',
        'Genferseeregion (VD,VS,GE)': 'Lake Geneva Region',
        'Espace Mittelland (BE, FR, SO, NE, JU)': 'Espace Midland',
        'Nordwestschweiz (BS, BL, AG)': 'Northwestern Switzerland',
        'Zürich (ZH)': 'Zurich',
        'Ostschweiz (GL, SH, AR, AI, SG, GR, TG)': 'Eastern Switzerland',
        'Zentralschweiz (LU, UR, SZ, OW, NW, ZG)': 'Central Switzerland',
        'Tessin (TI)': 'Ticino'
    }
)

def assignCategory(data: pd.DataFrame) -> pd.DataFrame:
    categories = []
    for index, row in data.iterrows():
        if type(row['Category']) != float:
            category = ' '.join(row['Category'].split(' ')[1:])
        categories.append(category)

    current_category = 'none'
    for i in range(len(categories)):
        if categories[i] != '':
            current_category = categories[i]
        categories[i] = current_category

    data['Category'] = categories
    return data

dataframe_61 = assignCategory(dataframe_60)
dataframe_61.head()

	Category	Switzerland	Lake Geneva Region	Espace Midland	Northwestern Switzerland	Zurich	Eastern Switzerland	Central Switzerland	Ticino
0	Erdarbeiten	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	Erdarbeiten	6.5806	9.649058	5.1200	5.0588	5.8022	5.3222	6.5600	8.9343
2	Erdarbeiten	6.3088	8.066795	5.2969	5.5250	6.2000	5.7944	5.3686	8.2543
3	Erdarbeiten	52.3058	55.199414	47.0450	46.8138	52.0889	57.5089	58.3843	47.9086
4	Erdarbeiten	12.4308	17.165547	11.8777	10.9088	11.7300	10.3056	9.4986	11.9183

1.6.4 Handling `NaN` values

Code

dataframe_61 = dataframe_61.replace('...', np.NaN)
dataframe_61 = dataframe_61.dropna()

print(dataframe_61.isna().sum())
print(dataframe_61.dtypes)

Category                    0
Switzerland                 0
Lake Geneva Region          0
Espace Midland              0
Northwestern Switzerland    0
Zurich                      0
Eastern Switzerland         0
Central Switzerland         0
Ticino                      0
dtype: int64
Category                     object
Switzerland                 float64
Lake Geneva Region          float64
Espace Midland              float64
Northwestern Switzerland    float64
Zurich                      float64
Eastern Switzerland         float64
Central Switzerland         float64
Ticino                      float64
dtype: object

1.6.5 Normalization

Code

dataframe_62 = dataframe_61.groupby('Category').sum()

def normalize_rows(data: pd.DataFrame) -> pd.DataFrame:
    regions = list(data.columns)
    categories = data.index
    data2 = pd.DataFrame(columns = regions)
    
    for index, row in data.iterrows():
        row_list_floats = [float(x) for x in row]
        ch = float(row['Switzerland'])
        normalized_row = [x * 100 / ch - 100 for x in row_list_floats]
        data2.loc[len(data2.index)] = normalized_row
    data2['Category'] = categories
    data2 = data2.set_index('Category')
    return data2

dataframe_63 = normalize_rows(dataframe_62)
dataframe_63 = dataframe_63.drop(columns=['Switzerland'])

dataframe_63.head()

	Lake Geneva Region	Espace Midland	Northwestern Switzerland	Zurich	Eastern Switzerland	Central Switzerland	Ticino
Category
Allgemeine Schreinerarbeiten	9.782592	-5.054522	-5.332594	-1.278220	-3.929714	-2.317067	10.482749
Aufzüge	-1.834763	5.337382	-1.834763	-1.834763	-1.834763	1.768710	-0.301372
Baureinigung	12.377640	10.988312	-0.829673	-8.518666	-17.082390	-12.249670	6.612687
Bodenbeläge	6.990752	-4.967931	-3.518762	2.796378	-9.766969	1.671983	12.403766
Dichtungsbeläge	6.729232	-5.094807	1.559295	-6.350533	2.841165	-5.195145	10.569095

Sorting columns by the most positive values (above swiss average)

Code

dataframe_63_T = dataframe_63.T
dataframe_63_T['POS_COUNT'] = dataframe_63_T.select_dtypes(include='float64').gt(0).sum(axis=1)

dataframe_64 = dataframe_63_T.sort_values(by='POS_COUNT', ascending=False)
dataframe_64 = dataframe_64.drop(columns=['POS_COUNT'])

dataframe_64.head()

Category	Allgemeine Schreinerarbeiten	Aufzüge	Baureinigung	Bodenbeläge	Dichtungsbeläge	Erdarbeiten	Estriche	Fenster	Gerüste	Gipserarbeiten	Gärtnerarbeiten	Kücheneinrichtungen	Malerarbeiten	Maurer- und Stahlbetonarbeiten	Metallbauarbeiten	Plattenarbeiten	Spenglerarbeiten	Tiefbauarbeiten	Zimmerarbeiten
Lake Geneva Region	9.782592	-1.834763	12.377640	6.990752	6.729232	18.853844	9.125194	-2.855986	30.495326	-2.540170	17.343409	5.541763	26.377162	-0.587649	1.882927	-11.743498	34.513863	-1.217680	20.001063
Ticino	10.482749	-0.301372	6.612687	12.403766	10.569095	4.854349	14.092483	0.948539	-13.272724	-10.468717	17.892175	-11.455716	12.276155	-8.746794	-20.442963	-4.932700	-20.601170	-11.603154	1.526116
Espace Midland	-5.054522	5.337382	10.988312	-4.967931	-5.094807	-12.021344	1.678070	-1.247265	-5.394688	-9.314736	-3.830083	6.225370	-0.782745	9.283176	1.162569	6.901373	-3.566494	-3.394399	-0.242437
Central Switzerland	-2.317067	1.768710	-12.249670	1.671983	-5.195145	1.162903	-5.680765	-0.498641	-7.814889	1.363496	-18.516207	6.864781	-3.704139	-8.262848	8.587475	7.402684	-23.364883	-2.173099	-13.500970
Northwestern Switzerland	-5.332594	-1.834763	-0.829673	-3.518762	1.559295	-11.036378	-6.413027	3.134207	-16.413265	4.357343	-2.070326	-5.279719	-21.140344	2.573407	-4.320370	8.691597	-0.342758	12.987236	-13.228093

1.6.6 English translation mapping

Code

translation_dict = {
    'Allgemeine Schreinerarbeiten': 'General carpentry work',
    'Aufzüge': 'Elevators',
    'Baureinigung': 'Construction cleaning',
    'Bodenbeläge': 'Floor coverings',
    'Dichtungsbeläge': 'Sealing coatings',
    'Erdarbeiten': 'Earthworks',
    'Estriche ': 'Attic',
    'Fenster': 'Windows',
    'Gerüste': 'Scaffolding',
    'Gipserarbeiten': 'Plastering work',
    'Gärtnerarbeiten': 'Gardening work',
    'Kücheneinrichtungen': 'Kitchen fittings',
    'Malerarbeiten': 'Painting work',
    'Maurer- und Stahlbetonarbeiten': 'Masonry and reinforced concrete work',
    'Metallbauarbeiten': 'Metal construction work',
    'Plattenarbeiten': 'Tiling work',
    'Spenglerarbeiten': 'Sheet metal work',
    'Tiefbauarbeiten': 'Civil engineering works',
    'Zimmerarbeiten': 'Carpentry work'
}

dataframe_64 = dataframe_64.rename(columns=translation_dict)

dataframe_64['region'] = dataframe_64.index
dataframe_64.reset_index(drop=True, inplace=True)
dataframe_64.head()

Category	General carpentry work	Elevators	Construction cleaning	Floor coverings	Sealing coatings	Earthworks	Attic	Windows	Scaffolding	Plastering work	Gardening work	Kitchen fittings	Painting work	Masonry and reinforced concrete work	Metal construction work	Tiling work	Sheet metal work	Civil engineering works	Carpentry work	region
0	9.782592	-1.834763	12.377640	6.990752	6.729232	18.853844	9.125194	-2.855986	30.495326	-2.540170	17.343409	5.541763	26.377162	-0.587649	1.882927	-11.743498	34.513863	-1.217680	20.001063	Lake Geneva Region
1	10.482749	-0.301372	6.612687	12.403766	10.569095	4.854349	14.092483	0.948539	-13.272724	-10.468717	17.892175	-11.455716	12.276155	-8.746794	-20.442963	-4.932700	-20.601170	-11.603154	1.526116	Ticino
2	-5.054522	5.337382	10.988312	-4.967931	-5.094807	-12.021344	1.678070	-1.247265	-5.394688	-9.314736	-3.830083	6.225370	-0.782745	9.283176	1.162569	6.901373	-3.566494	-3.394399	-0.242437	Espace Midland
3	-2.317067	1.768710	-12.249670	1.671983	-5.195145	1.162903	-5.680765	-0.498641	-7.814889	1.363496	-18.516207	6.864781	-3.704139	-8.262848	8.587475	7.402684	-23.364883	-2.173099	-13.500970	Central Switzerland
4	-5.332594	-1.834763	-0.829673	-3.518762	1.559295	-11.036378	-6.413027	3.134207	-16.413265	4.357343	-2.070326	-5.279719	-21.140344	2.573407	-4.320370	8.691597	-0.342758	12.987236	-13.228093	Northwestern Switzerland

1.7 Dataset R7: Taxes & Public Finances

1.7.1 Details Dataset

The source of these comprehensive data is Statista, renowned for its quality and reliability. Researchers, analysts, and interested individuals can rely on this data to conduct informed studies and analyses.

The files provided by Statista are preprocessed datasets, typically requiring minimal treatment of missing values or NaNs. The current datasets are cleaned.

The primary objective of processing this dataset is to extract the necessary columns and rows for subsequent analysis and visualization.

File Schweiz - Einkommenssteuersätze nach Kantonen 2022_Statista.csv:
- Description of columns:
  - KANTON: represents the canton names of Switzerland
  - PROZENT: represents the income tax rates in percentage
File Schweiz - Gewinnsteuersätze nach Kantonen 2022 _ Statista.csv
- Description of columns:
  - KANTON: represents the canton names of Switzerland
  - PROZENT: represents the profit tax rates in percentage
File Schweiz - Vermögenssteuersätze nach Kantonen 2018 _ Statista.csv
- Description of columns:
  - KANTON: represents the canton names of Switzerland
  - PROZENT: represents the wealth tax rates in promille

Code

rDataSet_70 = Path(
    raw_data,
    '07_Steuern_&_Staatsfinanzen',
    'Schweiz - Einkommenssteuersätze nach Kantonen 2022 _ Statista.csv'
)
rDataSet_71 = Path(
    raw_data,
    '07_Steuern_&_Staatsfinanzen',
    'Schweiz - Gewinnsteuersätze nach Kantonen 2022 _ Statista.csv'
)
rDataSet_72 = Path(
    raw_data,
    '07_Steuern_&_Staatsfinanzen',
    'Schweiz - Vermögenssteuersätze nach Kantonen 2018 _ Statista.csv'
)

1.7.2 Data Extraction & Cleansing

Code

dataframe_70 = pd.read_csv(
    rDataSet_70,
    sep=';',
    encoding='latin1'
)

dataframe_71 = pd.read_csv(
    rDataSet_71,
    sep=';',
    encoding='latin1'
)

dataframe_72 = pd.read_csv(
    rDataSet_72,
    sep=';',
    encoding='latin1'
)

# join dataframes
dataframe_73 = pd.merge(
    dataframe_70,
    dataframe_71,
    on='KANTON'
)

dataframe_73 = pd.merge(
    dataframe_73,
    dataframe_72,
    on='KANTON'
)

# rename columns
dataframe_73 = dataframe_73.rename(
    columns={
        'KANTON': 'canton',
        'PROZENT_x': 'income_tax',
        'PROZENT_y': 'profit_tax',
        'Promille': 'wealth_tax'
    }
)

dataframe_73.head()

	canton	income_tax	profit_tax	wealth_tax
0	Genf	44.75	14.00	10.1
1	Basel-Landschaft	42.17	17.97	7.6
2	Waadt	41.50	14.00	7.9
3	Bern	41.04	21.04	5.8
4	Basel-Stadt	40.34	13.04	8.0

2 Processed Data

The processed data sets can all be found in the directory data/02_processed.

Code

# save R01 to csv
dataframe_10.to_csv(
    Path(prc_data, 'P01_price_indices.csv'),
    index=False
)

# save R02 to csv
dataframe_20.to_csv(
    Path(prc_data, 'P02_ratio_homeowners_eu.csv'),
    index=False
)

# save R03 to csv
dataframe_30.to_csv(
    Path(prc_data, 'P03_avg_price_smeter.csv'),
    index=False
)

# save R04 to csv
dataframe_40.to_csv(
    Path(prc_data, 'P04_buildings_by_canton.csv'),
    index=False
)

# save R05 to csv
dataframe_50.to_csv(
    Path(prc_data, 'P05_population_by_canton.csv'),
    index=False
)

# save R06 to csv
dataframe_64.to_csv(
    Path(prc_data, 'P06_construction_prices.csv'),
    index=False
)

# save R07 to csv
dataframe_73.to_csv(
    Path(prc_data, 'P07_taxes.csv'),
    index=False
)

Number	Dataset Name	Source
P01	P01_price_indices.csv	R01
P02	P02_ratio_homeowners_eu.csv	R02
P03	P03_avg_price_smeter.csv	R03
P04	P04_buildings_by_canton.csv	R04
P05	P05_population_by_canton.csv	R05
P06	P06_construction_prices.csv	R06
P07	P07_taxes.csv	R07

3 Exploratory Data Analysis

3.1 EDA P01: Price Indices

Code

dataframe_10 = pd.read_csv(
    Path(prc_data, 'P01_price_indices.csv')
)

display(dataframe_10.describe())
dataframe_10.info()

	rental_index	condos	houses
count	101.000000	101.000000	101.000000
mean	261.415050	7392.592772	6478.379406
std	3.221622	518.387040	495.529958
min	256.620000	6856.950000	5940.580000
25%	259.100000	7024.970000	6057.130000
50%	260.820000	7157.740000	6315.930000
75%	263.460000	7617.250000	6786.410000
max	271.970000	8589.250000	7479.460000

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 101 entries, 0 to 100
Data columns (total 4 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   date          101 non-null    object 
 1   rental_index  101 non-null    float64
 2   condos        101 non-null    float64
 3   houses        101 non-null    float64
dtypes: float64(3), object(1)
memory usage: 3.3+ KB

3.2 EDA P02: Ratio Homeowners EU

Code

dataframe_20 = pd.read_csv(
    Path(prc_data, 'P02_ratio_homeowners_eu.csv')
)

display(dataframe_20.describe())
dataframe_20.info()

	2018
count	36.000000
mean	75.527778
std	12.510170
min	42.500000
25%	69.800000
50%	74.800000
75%	84.100000
max	96.400000

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36 entries, 0 to 35
Data columns (total 2 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   country  36 non-null     object 
 1   2018     36 non-null     float64
dtypes: float64(1), object(1)
memory usage: 704.0+ bytes

3.3 EDA P03: Average Price per Square Meter

Code

dataframe_30 = pd.read_csv(
    Path(prc_data, 'P03_avg_price_smeter.csv')
)

display(dataframe_30.describe())
dataframe_30.info()

	avg_price
count	18.000000
mean	2841.213889
std	1518.824498
min	1246.500000
25%	1687.625000
50%	2377.000000
75%	3908.875000
max	7126.350000

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18 entries, 0 to 17
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   country    18 non-null     object 
 1   avg_price  18 non-null     float64
dtypes: float64(1), object(1)
memory usage: 416.0+ bytes

3.4 EDA P04: Buildings by Canton

Code

dataframe_40 = pd.read_csv(
    Path(prc_data, 'P04_buildings_by_canton.csv')
)

display(dataframe_40.describe())
dataframe_40.info()

	buildings_total	single_family_home
count	26.000000	26.000000
mean	68236.961538	38733.923077
std	64860.525980	35848.970768
min	5299.000000	2686.000000
25%	17153.000000	9472.250000
50%	55635.500000	29065.500000
75%	103385.750000	62761.500000
max	238111.000000	118612.000000

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26 entries, 0 to 25
Data columns (total 3 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   canton              26 non-null     object
 1   buildings_total     26 non-null     int64 
 2   single_family_home  26 non-null     int64 
dtypes: int64(2), object(1)
memory usage: 752.0+ bytes

3.5 EDA P05: Population by Canton

Code

dataframe_50 = pd.read_csv(
    Path(prc_data, 'P05_population_by_canton.csv')
)

display(dataframe_50.describe())
dataframe_50.info()

	inhabitants
count	2.600000e+01
mean	3.361073e+05
std	3.613990e+05
min	1.636000e+04
25%	7.634725e+04
50%	2.408105e+05
75%	4.035468e+05
max	1.564662e+06

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26 entries, 0 to 25
Data columns (total 2 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   canton       26 non-null     object 
 1   inhabitants  26 non-null     float64
dtypes: float64(1), object(1)
memory usage: 544.0+ bytes

3.6 EDA P06: Construction Prices

Code

dataframe_60 = pd.read_csv(
    Path(prc_data, 'P06_construction_prices.csv')
)

display(dataframe_60.describe())
dataframe_60.info()

	General carpentry work	Elevators	Construction cleaning	Floor coverings	Sealing coatings	Earthworks	Attic	Windows	Scaffolding	Plastering work	Gardening work	Kitchen fittings	Painting work	Masonry and reinforced concrete work	Metal construction work	Tiling work	Sheet metal work	Civil engineering works	Carpentry work
count	7.000000	7.000000	7.000000	7.000000	7.000000	7.000000	7.000000	7.000000	7.000000	7.000000	7.000000	7.000000	7.000000	7.000000	7.000000	7.000000	7.000000	7.000000	7.000000
mean	0.336175	-0.076333	-1.243108	0.801317	0.722615	-0.434642	0.486336	0.331205	-3.709846	-0.392602	-0.171934	-1.510563	-0.836175	-1.656109	-1.725674	0.385366	-5.061521	-0.898923	-1.994070
std	6.846241	2.742609	11.704542	7.542107	6.546152	10.466442	8.952986	2.116255	15.607533	7.361430	13.288935	7.577881	16.526478	7.031241	9.314879	8.503175	19.327410	8.449973	11.387849
min	-5.332594	-1.834763	-17.082390	-9.766969	-6.350533	-12.021344	-11.037287	-2.855986	-16.413265	-10.468717	-18.516207	-11.455716	-21.140344	-8.746794	-20.442963	-11.743498	-23.364883	-11.603154	-13.500970
25%	-4.492118	-1.834763	-10.384168	-4.243347	-5.144976	-6.850601	-6.046896	-0.872953	-10.543806	-5.927453	-6.011251	-6.697025	-11.012066	-8.350783	-3.688942	-6.690155	-16.402130	-5.672421	-9.860529
50%	-2.317067	-1.834763	-0.829673	1.671983	1.559295	-2.191048	1.639683	0.208580	-7.063719	1.363496	-3.830083	-4.356089	-0.782745	-0.587649	1.162569	4.825717	-9.866113	-2.173099	-2.021207
75%	4.252186	0.733669	8.800500	4.893565	4.785198	3.008626	5.401632	1.788769	-5.949827	4.734577	7.636541	5.883567	5.858418	2.580034	2.995544	7.152028	-1.954626	2.920700	0.641840
max	10.482749	5.337382	12.377640	12.403766	10.569095	18.853844	14.092483	3.134207	30.495326	8.742762	17.892175	6.864781	26.377162	9.283176	8.587475	8.691597	34.513863	12.987236	20.001063

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7 entries, 0 to 6
Data columns (total 20 columns):
 #   Column                                Non-Null Count  Dtype  
---  ------                                --------------  -----  
 0   General carpentry work                7 non-null      float64
 1   Elevators                             7 non-null      float64
 2   Construction cleaning                 7 non-null      float64
 3   Floor coverings                       7 non-null      float64
 4   Sealing coatings                      7 non-null      float64
 5   Earthworks                            7 non-null      float64
 6   Attic                                 7 non-null      float64
 7   Windows                               7 non-null      float64
 8   Scaffolding                           7 non-null      float64
 9   Plastering work                       7 non-null      float64
 10  Gardening work                        7 non-null      float64
 11  Kitchen fittings                      7 non-null      float64
 12  Painting work                         7 non-null      float64
 13  Masonry and reinforced concrete work  7 non-null      float64
 14  Metal construction work               7 non-null      float64
 15  Tiling work                           7 non-null      float64
 16  Sheet metal work                      7 non-null      float64
 17  Civil engineering works               7 non-null      float64
 18  Carpentry work                        7 non-null      float64
 19  region                                7 non-null      object 
dtypes: float64(19), object(1)
memory usage: 1.2+ KB

3.7 EDA P07: Taxes

Code

dataframe_70 = pd.read_csv(
    Path(prc_data, 'P07_taxes.csv')
)

display(dataframe_70.describe())
dataframe_70.info()

	income_tax	profit_tax	wealth_tax
count	26.000000	26.000000	26.000000
mean	33.516154	14.683462	4.665385
std	6.507109	2.548612	2.353711
min	22.220000	11.850000	1.300000
25%	30.015000	12.815000	2.650000
50%	33.250000	13.935000	4.400000
75%	39.447500	15.822500	6.250000
max	44.750000	21.040000	10.100000

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26 entries, 0 to 25
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   canton      26 non-null     object 
 1   income_tax  26 non-null     float64
 2   profit_tax  26 non-null     float64
 3   wealth_tax  26 non-null     float64
dtypes: float64(3), object(1)
memory usage: 960.0+ bytes

4 Preparation for Visualization

To use the data correctly in our visualizations, we exported the specific needed data to .js constants variables. These files can be found in the directory data/03_exports.

Chart Name	Used Dataframes	Exported File
S1	P02	s1Data.js
S2	P03	s2Data.js
S3	P01	s3Data.js
S4	P04, P05	s4Data.js
S5	P06	s5Data.js
S6	P07	s6Data.js

Export Function

Code

def generate_data_js(df: pd.DataFrame, js_file_name: str, variable_name: str, orient: str) -> None:
    """
    Generates a JS file with a variable name and the data from the dataframe
    -------------------------------------------------------------------------
    :param df: dataframe to export
    :param js_file_name: name of the JS file
    :param variable_name: name of the variable
    :param orient: orientation of the dataframe
    :return: None
    """
    json_data = df.to_json(orient=orient, force_ascii=False)
    with open(Path(exp_data, js_file_name), 'w', encoding='utf-8') as f:
        f.write(f"export const {variable_name} = {json_data};")

4.1 S1 Chart

Code

s1_country_filter = [
    'Switzerland',
    'Norway',
    'United Kingdom',
    'Austria',
    'France',
    'Netherlands',
    'Denmark',
    'Czechia',
    'Slovenia',
    'Belgium',
    'Spain',
    'Italy',
    'Croatia',
    'Poland',
    'Hungary',
    'Portugal',
    'Serbia',
    'Latvia',
]

s1_data = pd.read_csv(
    Path(prc_data, 'P02_ratio_homeowners_eu.csv')
)

# filter out countries
s1_data = s1_data[s1_data['country'].isin(s1_country_filter)]

# sort data descending
s1_data = s1_data.sort_values(by='2018', ascending=False)

# export data
generate_data_js(
    df=s1_data,
    js_file_name='s1Data.js',
    variable_name='s1Data',
    orient='records'
)

s1_data.head()

	country	2018
10	Croatia	90.1
16	Hungary	86.0
34	Serbia	84.4
20	Poland	84.0
13	Latvia	81.6

4.2 S2 Chart

Code

s2_data = pd.read_csv(
    Path(prc_data, 'P03_avg_price_smeter.csv')
)

# sort data descending
s2_data = s2_data.sort_values(by='avg_price', ascending=False)

# export data
generate_data_js(
    df=s2_data,
    js_file_name='s2Data.js',
    variable_name='s2Data',
    orient='records'
)

s2_data.head()

	country	avg_price
17	Switzerland	7126.35
4	Norway	4286.00
0	UK	4235.00
1	Austria	4205.00
2	France	4012.00

4.3 S3 Chart

Code

s3_data = pd.read_csv(
    Path(prc_data, 'P01_price_indices.csv')
)

# sort data descending
s3_data = s3_data.sort_values(by='date', ascending=True)

# filter out data before 2020-01-01
s3_data = s3_data[s3_data['date'] >= '2020-01-01']

# calculate relative differences
s3_data['prc_rental'] = (100/258.26)*s3_data['rental_index']
s3_data['prc_condos'] = (100/7157.74)*s3_data['condos']
s3_data['prc_houses'] = (100/6308.97)*s3_data['houses']

# export data
generate_data_js(
    df=s3_data,
    js_file_name='s3Data.js',
    variable_name='s3Data',
    orient='records'
)

s3_data.head()

	date	rental_index	condos	houses	prc_rental	prc_condos	prc_houses
61	2020-01-31	258.26	7157.74	6308.97	100.000000	100.000000	100.000000
62	2020-02-29	258.80	7245.38	6316.64	100.209092	101.224409	100.121573
63	2020-03-31	258.43	7366.09	6398.48	100.065825	102.910835	101.418774
64	2020-04-30	256.67	7383.67	6384.32	99.384341	103.156443	101.194331
65	2020-05-31	258.01	7342.29	6434.78	99.903198	102.578328	101.994145

4.4 S4 Chart

Code

region_mapping_s4 = {
    'VD': 'Lake Geneva Region',
    'VS': 'Lake Geneva Region',
    'GE': 'Lake Geneva Region',
    'BE': 'Espace Midland',
    'FR': 'Espace Midland',
    'SO': 'Espace Midland',
    'NE': 'Espace Midland',
    'JU': 'Espace Midland',
    'BS': 'Northwestern Switzerland',
    'BL': 'Northwestern Switzerland',
    'AG': 'Northwestern Switzerland',
    'ZH': 'Zurich',
    'GL': 'Eastern Switzerland',
    'SH': 'Eastern Switzerland',
    'AR': 'Eastern Switzerland',
    'AI': 'Eastern Switzerland',
    'SG': 'Eastern Switzerland',
    'GR': 'Eastern Switzerland',
    'TG': 'Eastern Switzerland',
    'LU': 'Central Switzerland',
    'UR': 'Central Switzerland',
    'SZ': 'Central Switzerland',
    'OW': 'Central Switzerland',
    'NW': 'Central Switzerland',
    'ZG': 'Central Switzerland',
    'TI': 'Ticino'
}

s4_data_1 = pd.read_csv(
    Path(prc_data, 'P04_buildings_by_canton.csv')
)

s4_data_2 = pd.read_csv(
    Path(prc_data, 'P05_population_by_canton.csv')
)

# merge dataframes
s4_data = pd.merge(
    s4_data_1,
    s4_data_2,
    on='canton'
)

s4_data['prc_single_family_home'] = s4_data['single_family_home'] / s4_data['buildings_total'] * 100
s4_data['region'] = s4_data['canton'].map(region_mapping_s4)

# export data
generate_data_js(
    df=s4_data,
    js_file_name='s4Data.js',
    variable_name='s4Data',
    orient='records'
)

s4_data.head()

	canton	buildings_total	single_family_home	inhabitants	prc_single_family_home	region
0	AG	153894	102206	703086.0	66.413245	Northwestern Switzerland
1	AI	5299	2917	16360.0	55.048122	Eastern Switzerland
2	AR	16323	9251	55585.0	56.674631	Eastern Switzerland
3	BE	238111	114053	1047473.0	47.899089	Espace Midland
4	BL	67390	46632	292817.0	69.197210	Northwestern Switzerland

4.5 S5 Chart

Code

s5_data = pd.read_csv(
    Path(prc_data, 'P06_construction_prices.csv')
)

# move last column to first position
cols = list(s5_data.columns)
cols = [cols[-1]] + cols[:-1]
s5_data = s5_data[cols]

# transform dataframe
regions = s5_data['region'].to_list()
s5_data = s5_data.T
s5_data.columns = regions

# reset & rename index
s5_data.reset_index(inplace=True)
s5_data.rename(columns={'index': 'work_category'}, inplace=True)

# drop first row
s5_data = s5_data.iloc[1:].copy()

# export data
generate_data_js(
    df=s5_data,
    js_file_name='s5Data.js',
    variable_name='s5Data',
    orient='records'
)

s5_data.head()

	work_category	Lake Geneva Region	Ticino	Espace Midland	Central Switzerland	Northwestern Switzerland	Zurich	Eastern Switzerland
1	General carpentry work	9.782592	10.482749	-5.054522	-2.317067	-5.332594	-1.27822	-3.929714
2	Elevators	-1.834763	-0.301372	5.337382	1.76871	-1.834763	-1.834763	-1.834763
3	Construction cleaning	12.37764	6.612687	10.988312	-12.24967	-0.829673	-8.518666	-17.08239
4	Floor coverings	6.990752	12.403766	-4.967931	1.671983	-3.518762	2.796378	-9.766969
5	Sealing coatings	6.729232	10.569095	-5.094807	-5.195145	1.559295	-6.350533	2.841165

4.6 S6 Chart

Code

region_mapping_s6 = {
    'Genf': 'Lake Geneva Region',
    'Waadt': 'Lake Geneva Region',
    'Wallis': 'Lake Geneva Region',
    'Bern': 'Espace Midland',
    'Freiburg': 'Espace Midland',
    'Solothurn': 'Espace Midland',
    'Neuenburg': 'Espace Midland',
    'Jura': 'Espace Midland',
    'Basel-Stadt': 'Northwestern Switzerland',
    'Basel-Landschaft': 'Northwestern Switzerland',
    'Aargau': 'Northwestern Switzerland',
    'Zürich': 'Zurich',
    'Glarus': 'Eastern Switzerland',
    'Schaffhausen': 'Eastern Switzerland',
    'Appenzell Ausserrhoden': 'Eastern Switzerland',
    'Appenzell Innerrhoden': 'Eastern Switzerland',
    'St. Gallen': 'Eastern Switzerland',
    'Graubünden': 'Eastern Switzerland',
    'Thurgau': 'Eastern Switzerland',
    'Luzern': 'Central Switzerland',
    'Uri': 'Central Switzerland',
    'Schwyz': 'Central Switzerland',
    'Obwalden': 'Central Switzerland',
    'Nidwalden': 'Central Switzerland',
    'Zug': 'Central Switzerland',
    'Tessin': 'Ticino'
}

s6_data = pd.read_csv(
    Path(prc_data, 'P07_taxes.csv',
    encoding='latin1')
)

# sort data descending
s6_data = s6_data.sort_values(by='canton', ascending=False)
s6_data['region'] = s6_data['canton'].map(region_mapping_s6)

# group data by region
s6_data = s6_data.groupby('region').mean().reset_index()

# export data
generate_data_js(
    df=s6_data,
    js_file_name='s6Data.js',
    variable_name='s6Data',
    orient='records'
)

s6_data.head()

	region	income_tax	profit_tax	wealth_tax
0	Central Switzerland	25.516667	12.575000	2.066667
1	Eastern Switzerland	30.308571	13.455714	3.614286
2	Espace Midland	37.384000	15.954000	5.280000
3	Lake Geneva Region	40.916667	15.040000	8.100000
4	Northwestern Switzerland	38.963333	16.143333	6.666667

1 Raw Data

1.1 Dataset R01: Swiss Real Estate Offer Index

1.1.1 Details Dataset

1.1.2 Data Quality

1.1.3 Data Extraction & Cleansing

1.1.4 Data merging

1.2 Dataset R02: Distribution of population by tenure status

1.2.1 Details Dataset

1.2.2 Data Quality

1.2.3 Data Extraction & Cleansing

1.3 Dataset R03: Average price per square meter in european countries

1.3.1 Details Dataset

1.3.2 Data Quality

1.3.3 Data Extraction & Cleansing

1.4 Dataset R04: General overview “Buildings” by cantons 2021

1.4.1 Details Dataset

1.4.2 Data Quality

1.4.3 Data Extraction & Cleansing

1.5 Dataset R05: Structure of the permanent resident population by canton, 1999-2021

1.5.1 Details Dataset

1.5.2 Data Quality

1.5.3 Data Extraction & Cleansing

1.6 Dataset R06: Swiss Construction Price Index - Average unit prices in Switzerland and the major regions

1.6.1 Details Dataset

1.6.2 Data Quality

1.6.3 Data Extraction & Cleansing

1.6.4 Handling NaN values

1.6.5 Normalization

1.6.6 English translation mapping

1.7 Dataset R7: Taxes & Public Finances

1.7.1 Details Dataset

1.7.2 Data Extraction & Cleansing

2 Processed Data

3 Exploratory Data Analysis

3.1 EDA P01: Price Indices

3.2 EDA P02: Ratio Homeowners EU

3.3 EDA P03: Average Price per Square Meter

3.4 EDA P04: Buildings by Canton

3.5 EDA P05: Population by Canton

3.6 EDA P06: Construction Prices

3.7 EDA P07: Taxes

4 Preparation for Visualization

4.1 S1 Chart

4.2 S2 Chart

4.3 S3 Chart

4.4 S4 Chart

4.5 S5 Chart

4.6 S6 Chart

1.6.4 Handling `NaN` values