# Altro case: Fjordbank

Welcome. You have 2–4 hours to build a small data platform for a fictional Norwegian retail bank called **Fjordbank**. We use [havn](https://github.com/chraltro/havn) — a self-hosted data platform built on DuckDB. It's designed to be friendly; if you're used to dbt or Databricks, you'll be productive in minutes.

There is no single right answer. We're interested in how you think, the trade-offs you make, and how you communicate them. Ship what you can in the time you have — partial-but-thoughtful beats complete-but-shallow.

---

## The scenario

Fjordbank's data team has asked Altro for help building a reporting layer. Today, their data sits raw in a Postgres read-replica. Their analysts write direct SQL against it and keep running into the same problems: wrong totals, stale dimensions, timezone confusion, duplicate rows.

They want a clean **bronze → silver → gold** warehouse with documented models and a few key marts. You are the data engineer they hired for two weeks. This is your first three hours.

## What we give you

A Postgres endpoint with five tables in the `raw` schema:

| Table | Rows | What it is |
|---|---|---|
| `raw.customers` | ~50k | Customer master. Includes Norwegian fødselsnummer, contact info, segment. |
| `raw.accounts` | ~80k | Accounts per customer. Includes account type, currency, branch. |
| `raw.transactions` | ~6M | The fact. 3 years of card payments, transfers, salaries, fees, interest, etc. |
| `raw.fx_rates` | ~3k | Daily FX rates to NOK. |
| `raw.branches` | ~150 | Branch master. |

Credentials and the single-command ingest are in the "Getting started" section below.

## What we want you to build

### 1. Ingest (provided, but you may improve)

We ship an ingest script that copies `raw.*` into `landing.*` in your local `warehouse.duckdb`. It works. If you see a way to make it better — incremental loads, schema validation, anything — feel free. But this is not where we want you to spend most of your time.

### 2. Bronze (required)

Typed, deduplicated, lightly cleaned versions of each source table. Rules:

- Dates and timestamps must be proper types.
- Amounts must be `DECIMAL`, not float.
- Remove exact duplicates; document what you did with near-duplicates.
- Do not lose data — if you drop rows, log why.

### 3. Silver (required)

Business-meaningful models. At minimum:

- **`silver.dim_customer`** — conformed customer dimension. Handle PII sensibly. havn supports Python macros that auto-register as DuckDB UDFs — a simple masking function in `macros/pii.py` is a nice way to demonstrate this. Example:

  ```python
  # macros/pii.py
  from havn import macro

  @macro
  def mask_email(email: str) -> str:
      if "@" not in email:
          return ""
      local, domain = email.split("@", 1)
      # local[:1] rather than local[0]: avoids an IndexError on an empty local part
      return f"{local[:1]}***@{domain}"
  ```

  Callable directly in SQL: `SELECT mask_email(email) FROM bronze.customers`.

- **`silver.dim_account`** — conformed account dimension. Think about how you'd represent account attributes that change over time.
- **`silver.fct_transaction`** — transaction fact, with FX applied, reversals handled, and a stable grain you can defend.

### 4. Gold (required — pick at least two)

- **`gold.agg_customer_monthly`** — customer × month: income, spend, net flow, top merchant category.
- **`gold.agg_daily_balance`** — running balance per account per day.
- **`gold.suspicious_activity`** — transactions that look anomalous for a given customer. Definition is yours to justify.
- **`gold.branch_performance`** — revenue, active accounts, and customer count per branch per month.

### 5. Documentation (required)

A `README.md` in the project root explaining:

- Your grain and SCD decisions
- What you found wrong with the source data and how you handled it
- What you'd do differently with more time

### 6. Stretch (optional, signals seniority)

- A Python macro that adds domain value (masking, IBAN validation, fnr validation, etc.)
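
  As a sketch of the fnr option: the two check digits of a fødselsnummer are standard mod-11 checksums over the first nine and ten digits, so validation needs only pure Python. The function name, file name, and the `fnr` column below are hypothetical; registering it as a UDF would mirror the `@macro` decorator shown in the `mask_email` example.

  ```python
  # macros/fnr.py (sketch; add @macro as in mask_email to expose it in SQL)

  # Published mod-11 weights for the two fødselsnummer check digits.
  K1_WEIGHTS = [3, 7, 6, 1, 8, 9, 4, 5, 2]
  K2_WEIGHTS = [5, 4, 3, 2, 7, 6, 5, 4, 3, 2]

  def _check_digit(digits: list[int], weights: list[int]) -> int:
      rest = sum(d * w for d, w in zip(digits, weights)) % 11
      # A result of 10 can never match a single digit, so such numbers are invalid.
      return 0 if rest == 0 else 11 - rest

  def is_valid_fnr(fnr: str) -> bool:
      """True if fnr is 11 digits and both check digits are correct.
      Checks digits only; it does not validate the embedded birth date."""
      if len(fnr) != 11 or not fnr.isdigit():
          return False
      d = [int(c) for c in fnr]
      return (_check_digit(d[:9], K1_WEIGHTS) == d[9]
              and _check_digit(d[:10], K2_WEIGHTS) == d[10])
  ```

  Once registered, something like `SELECT * FROM bronze.customers WHERE NOT is_valid_fnr(fnr)` would surface bad rows during profiling.
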
- An export script that pushes one of your gold tables somewhere useful (a CSV drop, a webhook, back into Postgres)
- Tests for at least one model
- A short reflection on what you'd monitor in production

---

## Getting started

```bash
# 1. Install havn
pip install havn

# 2. Scaffold a project
havn init fjordbank
cd fjordbank

# 3. Drop in the ingest script (provided below)
#    It pulls from our Postgres and loads landing.*
cp /ingest/fjordbank.py ingest/

# 4. Set the connection string we sent you
echo "FJORDBANK_DB_URL=postgresql://..." > .env

# 5. Pull the data (~2-3 minutes)
havn run ingest/fjordbank.py

# 6. Confirm
havn tables
havn query "SELECT COUNT(*) FROM landing.transactions"

# 7. Start the web UI if you want
havn serve   # open http://localhost:3000
```

From here: write SQL in `transform/bronze/`, `transform/silver/`, `transform/gold/`. Run `havn transform` to build the DAG. Run `havn query` for ad-hoc SQL. See `havn --help` for the rest.

---

## How you'll be evaluated

We look at four dimensions, weighted roughly equally:

1. **Modeling judgment.** Did you pick a grain you can defend? Did you conform dimensions? Did you represent change over time where it matters?
2. **Data quality instinct.** Did you notice the things that are wrong with the source data? How you decided to handle them matters more than whether you "caught them all."
3. **Communication.** Is your README useful to a colleague? Are your SQL files readable? Can someone else pick up your work on Monday?
4. **Engineering craft.** Is it idempotent? Is it efficient? Would you be comfortable putting this in front of a client?

We do not evaluate based on:

- Number of models (more is not better)
- Exotic features of havn (we'd rather you use it simply and well)
- Whether you "finished" — we've seen great three-hour solutions with two gold tables, and weak ones with six

## Tips

- **Profile the data first.** Run `havn query` on each source table. See what's in there.
  `COUNT`, `MIN`, `MAX`, `COUNT DISTINCT`, look at nulls. Twenty minutes of profiling saves two hours of rework.
- **Write down what you find.** Keep a `NOTES.md` as you go. Paste it into your README at the end.
- **Explain your grain in a comment at the top of each fact.** "One row per settled transaction, reversals netted, FX in NOK." This is the kind of thing we expect from Altro engineers.
- **If you get stuck, ship what works and document the blocker.** We'd rather see honest scope cuts than silent failures.
- **Git commits are nice-to-have.** A clean history tells us how you think.

## Delivery

Zip the project directory (excluding `warehouse.duckdb` — it's big and we'll rebuild from your SQL) and send it back to us with your `README.md`. We'll schedule a 45-minute review call where you'll walk us through your decisions.

Good luck. See you on the other side.

— Altro Engineering