Data Processing with Bash

7 mins

Inspired by this post, I built a pipeline that extracts facilities from ~700K hotels, merges repeated facilities, and ranks them by number of occurrences, all in Bash.

TL;DR commands:

echo "supplier_id,supplier_value,mapping_type" > header.csv

cat hotel-facility-dump |                   # read from file
rg "facility:" |                            # find relevant logs
awk -F'facility: ' '{print $2}' |           # extract content in the form of 'facility name,facility code'
sort |                                      # sort so that uniq sees duplicates on adjacent lines
uniq -c |                                   # collapse duplicates, prefixing each entry with its count
sort --numeric-sort --reverse |             # order entries by count, descending
head -n 500 |                               # take the top 500 entries
awk -F ' "' '{print 12345",""\42"$2}' > hotel-processed   # drop the count, put back the double quote, write to the output file

cat header.csv hotel-processed > hotel-facilites.csv
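
To sanity-check the counting stage on a small sample, the same logic can be sketched in Python. This is only a rough cross-check, not part of the Bash pipeline: the `top_facilities` helper and the sample log lines are made up for illustration, and the real dump's format may differ. Ties may also come out in a different order than `sort --numeric-sort --reverse` would give.

```python
from collections import Counter


def top_facilities(lines, limit=500):
    """Count the text after 'facility: ' on matching lines, most frequent first."""
    marker = "facility: "
    counts = Counter(
        line.split(marker, 1)[1].strip()  # keep only what follows the marker
        for line in lines
        if marker in line                 # skip unrelated log lines
    )
    return counts.most_common(limit)      # list of (value, count), descending


# Hypothetical log lines, assumed for this sketch only.
sample = [
    'hotel 1 facility: "Free WiFi,101"',
    'hotel 2 facility: "Pool,102"',
    'hotel 3 facility: "Free WiFi,101"',
    'hotel 3 checked in',
]

for value, count in top_facilities(sample):
    print(count, value)
```

Comparing a handful of these counts against the `uniq -c` output is a cheap way to confirm that the filter and the field separator behave as expected.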

Everything I Know About Development in 2019

13 mins

I’m starting this series so that I can track how far I’ve gone in my career as an engineer.

In true Dan Abramov style I’ll be listing what I don’t know as well, so that I can concentrate on learning those things in the future.

SBT Tricks

2 mins

I was recently upgrading a library at work from Scala 2.11 to 2.12. Here are some sbt tricks that I picked up during the migration.

Scala, Six Months In

5 mins

After switching jobs, I was introduced to Scala at my new workplace. I’ve compiled a list of things that I have learned so far. I’ve kept the comparisons to the languages themselves, rather than functionality that libraries can provide.

The in keyword of Python explained

6 mins

When learning Python, one of the first things we do with lists, dictionaries and other iterators is something like the following:

(All examples are in Python 3.6.6)

for i in mylist:
    do_something(i)

This loops through each element in mylist and calls the do_something function on it.
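
As a rough sketch of what that loop does under the hood, the version below spells out the steps with the built-in `iter()` and `next()`. The `do_something` body, the `results` list, and the contents of `mylist` are placeholders assumed here for illustration.

```python
results = []


def do_something(item):
    # Hypothetical stand-in for the function named in the text.
    results.append(item * 2)


mylist = [1, 2, 3]

# Approximately equivalent to: for i in mylist: do_something(i)
iterator = iter(mylist)      # ask the list for an iterator
while True:
    try:
        i = next(iterator)   # fetch the next element
    except StopIteration:    # raised once the elements run out
        break
    do_something(i)

print(results)
```

The `for` statement hides the `iter()` call and the `StopIteration` handling, which is why the same syntax works for lists, dictionaries, and any other object that can produce an iterator.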

