# Data Processing with Bash

7 mins

Inspired by this post, I built a pipeline that extracts facilities from ~700K hotels, merges the repeated ones, and ranks them by number of occurrences, all in Bash.

TL;DR commands:

echo "supplier_id,supplier_value,mapping_type" > header.csv

cat hotel-facility-dump |                   # read from file
rg "facility:" |                            # find relevant logs
awk -F'facility: ' '{print $2}' |           # extract content in the form of 'facility name,facility code'
sort |                                      # sort for the next step
uniq -c |                                   # get all unique entries with count
sort --numeric-sort --reverse |             # get entries with count in descending order
head -n 500 |                               # take top 500 entries
awk -F ' "' '{print 12345",""\42"$2}' \
> hotel-processed                           # remove count, put back double quote, write to output file
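To see what each stage contributes, the same chain can be tried on a few inline lines. This is only a sketch: the sample log lines are made up (the real dump format is not shown here), `grep` stands in for `rg` so that nothing extra needs installing, and `12345` is kept as the same placeholder value used above.

```shell
# Hypothetical sample input; the real hotel-facility-dump format is an assumption.
sample='log facility: "Pool,101"
log facility: "WiFi,102"
log facility: "Pool,101"
unrelated line
log facility: "Pool,101"
log facility: "WiFi,102"
log facility: "Gym,103"'

ranked=$(
  printf '%s\n' "$sample" |
  grep 'facility: ' |                      # grep stands in for rg in this sketch
  awk -F'facility: ' '{print $2}' |        # keep only the quoted "name,code" part
  sort |                                   # uniq only merges adjacent duplicates
  uniq -c |                                # prefix each distinct entry with its count
  sort -nr |                               # short form of --numeric-sort --reverse
  head -n 2 |                              # top 2 instead of top 500
  awk -F ' "' '{print 12345",""\42"$2}'    # drop the count, restore the opening quote
)

printf '%s\n' "$ranked"
```

With this sample, "Pool,101" (3 occurrences) is ranked ahead of "WiFi,102" (2 occurrences), and "Gym,103" is cut off by `head`.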



# Everything I Know About Development in 2019

13 mins

I’m starting this series so that I can track how far I’ve come in my career as an engineer.

In true Dan Abramov style, I’ll be listing what I don’t know as well, so that I can concentrate on learning those things in the future.

# SBT Tricks

2 mins

I was recently upgrading a library at work from Scala 2.11 to 2.12. Here are some sbt tricks that I picked up during the migration.

# Scala, Six Months In

5 mins

After switching jobs, I was introduced to Scala at my new workplace. I’ve compiled a list of things that I have learned so far. I’ve limited the comparisons to the languages themselves, leaving out functionality that can be provided by libraries.

# The in keyword of Python explained

6 mins

When learning Python, one of the first things we do with lists, dictionaries and other iterables is something like the following:

(All examples are in Python 3.6.6)

for i in mylist:
    do_something(i)


This loops through each element in mylist and calls the do_something function on it.