So I was inspired to create this blog to document my mission to become a data scientist. Initially this arose from a shocking discovery of how some fundamental data concepts had been lost to me over the years due to my over reliance on cool reporting tools. Its like forgetting how to do long division because you have a calculator on your phone.
During an interview I was asked what the difference between an Inner Join and Outer Join was and I was embarrassed to realize, how I couldn’t clearly articulate what I knew. I also forgot the name of the Levenshtien’s distance algorithm when talking about a machine learning project, but that is a topic for another day.
Lets call the company I interviewed with Mission Impossible. I didn’t have the email of the person I interviewed with and wanted to thank him for giving me the hour. I collected all the emails I had received from “MEI” in one form or the other to see if I could get a pattern for the company emails. I saved all the data in two tables. One for Contacts and the other for Emails.
Joins basically allow you to merge multiple tables into one result set. An inner join ignores no matches (doesn’t deal with null values) while outer joins work even with no match.
So in my example, an inner join would bring back only the the rows that match the primary key/foreign key but ignore records that don’t. An outer join however would return all the records and show null for the data that didn’t exist in the other table. For this example we will use a left outer join to get all the records from the Contacts table and all the records from the email table even those with null values, and for extra credit we will limit the result to only emails that have “missionimpossible” in them so we can see the format of the company emails.
So based on the results and the format of the emails my best guess would be to email email@example.com and if it goes to the wrong person…. this message will self destruct in 5 seconds….5..4…3..2…
Wish me luck!