A Simple Apartment Scraper, and Why Stupid Is Smart

When my family and I needed a new apartment almost a year ago, I wrote a scraper for finding apartments on kvalster.se. The venture was successful.

I just refactored this project.

At first, I wrote a 'smart' version that made a request every hour using setInterval. The plan was to run the program as a background process and mail the result to myself.
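
The interval part amounted to roughly this (getApartments being the main function shown at the end of this post):

// Sketch of the discarded 'smart' version: keep the process alive
// and re-run the scrape once per hour from inside the program itself.
const ONE_HOUR = 60 * 60 * 1000;
setInterval(getApartments, ONE_HOUR);

Running that as a background process and mailing the output to myself would then have looked like this: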

nohup node scrape-kvalster.js | mutt -s "new apartments" user@domain.org &

This would have been an improvement: the earlier version used the Mailgun API. However, I wanted the new version to be more aligned with how Unix applications should be written.

Nonetheless, I did not like this solution either. I wanted to make the application even dumber.

Mind you, you can still produce the same effect.

This version scrapes kvalster.se and, if it finds new apartments matching the requirements stated in a config file, prints them to standard out. The final version also lets you show the database (which stores all entries).
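
Mailing new hits is then still a one-liner, composed at the shell instead of baked into the program:

node scrape-kvalster.js | mutt -s "new apartments" user@domain.org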

The main benefit of a dumb application is its limitations. From a Unix perspective, a dumb application ideally does one thing well and does nothing more.

The idea behind dumb applications is smart composition: each program's 'logic' stays dumb, and the smartness comes from how the programs are combined.

Perhaps I want to scrape at intervals? Run crontab -e and add a specification for the job using this format:

* * * * * command to be executed
- - - - -
| | | | |
| | | | +----- day of week (0 - 6) (Sunday=0)
| | | +------- month (1 - 12)
| | +--------- day of month (1 - 31)
| +----------- hour (0 - 23)
+------------- min (0 - 59)
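
For example, this entry would run the scraper at minute 0 of every hour and collect any new hits in a log file (the path is just a placeholder):

0 * * * * cd /path/to/scraper && /usr/bin/node scrape-kvalster.js >> new-apartments.log 2>&1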

Perhaps I want to print the last three entries in the database?

node show-db | tail -n 3

Or make a text file containing a sorted list of the saved entries?

node show-db | sort > sorted-entries.db

Smart compositions are made possible by acknowledging that text is a universal interface. All Unix applications that follow this principle add up to the library of the system.

A physical library is only as powerful as the use its visitors make of it; its power derives from the users. Or rather, a library enhances the cognitive power of its users by giving them tools to solve their tasks. And when they are done with the books they needed (perhaps the best symbol of knowledge), they put the books back; the expansion of their abilities is only ever provisional.

Unix tools handle complexity by avoiding it. Imagine the unnecessary complexity of a single program that handled mail, sorting, filtering, and intervals, to mention only the examples I have given here. For every new feature, the application becomes more complex, harder to build, and more prone to bugs.

This scraper would gain nothing from being contained in a large application. My main takeaway from the Unix philosophy, as a junior frontend developer, is an ambition to only build a 'scaled' application when it is actually needed. Keep It Simple, Stupid.

// show-db.js
//-----------------------

#!/usr/bin/node

// Print all saved apartment entries, one per line
const db = require('./db.json');
console.log(db.apartments.join('\n'));


// scrape-kvalster.js
//-----------------------

#!/usr/bin/node

const fetch = require('node-fetch');
const { JSDOM } = require('jsdom');

const { getDb, writeDb } = require('./db.js');
const config = require('./config.js');

// Load db
const lazyDb = getDb();

// Make search string
const { rooms, showNotSpecified, rent, entryAge, city } = config;
const query = `L%C3%A4genheter?Rum=${rooms}${showNotSpecified}&maxHyra=${rent}&maxListad=${entryAge}`;
const api = `https://kvalster.se/${city}/Uthyres`;

// Small helpers
const extractHref = (el) => el && el.href;
const isEqual = (str1) => (str2) => str1 === str2;
// An entry is new if it is not already stored in the db
const isNewEntry = (entry) => !lazyDb.apartments.some(isEqual(entry));

// Scrape kvalster.se with settings from config. Returns hits
async function request(url) {
  const response = await fetch(url);
  const html = await response.text();
  const dom = new JSDOM(html);
  const result = Array.from(dom.window.document.body.querySelectorAll('span a'));
  return [...new Set(result.map(extractHref))];
}

// Main: make request, if new entries, add to db and print
async function getApartments() {
  const result = await request(`${api}/${query}`);
  const newEntries = result.filter(isNewEntry);
  if (newEntries.length > 0) {
    lazyDb.apartments.push(...newEntries);
    console.log(newEntries.join('\n'));
    writeDb(lazyDb);
  }
}
getApartments();


// db.js
//-----------------------

const { readFileSync, writeFileSync } = require('fs');

function getDb() {
  return JSON.parse(readFileSync('./db.json', 'utf8'));
}

function writeDb(db) {
  writeFileSync('./db.json', JSON.stringify(db, null, 2));
}

module.exports = { getDb, writeDb };


// config.js
//-----------------------

// Kvalster query settings:
const rooms = 4;
// `showNotSpecified`: set to an empty string to exclude apartments without a specification.
const showNotSpecified = '&OSpec=0';
const rent = 15000;
const entryAge = 65;
const city = 'Helsingborg';

module.exports = {
  rooms, showNotSpecified, rent, entryAge, city
}


// db.json
//-----------------------

{
  "apartments": []
}