Latest news about Bitcoin and all cryptocurrencies. Your daily crypto news habit.
Data Mastery: SQL — Aggregation
There are so many exciting projects out there in the Data World. Artificial Intelligence, Machine Learning, Neural Nets, Blockchain, and more are sweeping the technology industry. In order to get to the cutting-edge stuff, first and foremost, data needs to be stored, evaluated, and tested. The best place to do that is SQL (or a library that operates with SQL-like commands, see my article on Python’s Pandas library).
This series Data Mastery: SQL, will teach you the essential subjects. These are not exhaustive tutorials. Instead they are focused preparation guides — with brevity and efficiency in mind. It is meant for:
- Software Engineers who want to analyze their creation’s data
- Product Managers, Marketers, and others who want to be data-driven
- Beginning Data Scientists, Data Engineers, Data Analysts, or Business Intelligence Developers preparing for interviews
See my resource list of the books I used to prepare for my big interview
Each article will contain a brief technical explanation of the subject, an example question, and an answer. Follow up articles will contain challenging questions so you can evaluate your learning.
This series does not come with accompanying data sets. The advantage to this is when you are on the drawing board, whether in an interview or project design, you do not have test data to play with. You have to think abstract.
Aggregation
Often in our jobs we need to provide summary statistics of data. The SQL aggregation functions you need to know are:
- COUNT, SUM, AVG, MIN, MAX
COUNT is a special one because it can be used in conjunction with DISTINCT to find the number of unique values. Otherwise it counts the presence of non NULL values. All aggregation functions will go after the word SELECT in the comma separated list.
Revisiting our daily_user_score table:
date | userid | sessionid | score------------------------------------------2018–09–01 | 983489272 | 125 | 1122018–09–01 | 234342423 | 34 | 1122018–09–01 | 567584329 | 207 | 6182018–09–02 | 983489272 | 126 | 4102018–09–02 | 983489272 | 127 | 339
Say we want to find the number of users who played our game and the average, minimum, and maximum score. Here is our query:
SELECT COUNT(DISTINCT userid) AS number_of_users, AVG(score) AS average_score, MIN(score) AS minimum_score, MAX(score) AS maximum_scoreFROM daily_user_score;
This query returns:
number_of_users | average_score | minimum_score | maximum_score---------------------------------------------------------------3 | 318.2 | 112 | 618
Notice that the keyword AS allows you to pick a name for the column which is called an “alias”. It is good practice to use this when you use aggregation functions in the SELECT statement.
Try it yourself
Write a SQL query that sums up all points earned in all games ever played.
Answer
SELECT sum(score) AS total_pointsFROM daily_user_score;
This query returns:
total_points------------1591
Thanks for reading! If you have questions feel free to comment & I will try to get back to you.
Connect with me on Instagram @lauren__glass
Connect with me on LinkedIn
Check out my essentials list on Amazon
Search for me using my nametag on Instagram!
Data Mastery: SQL — Aggregation was originally published in Hacker Noon on Medium, where people are continuing the conversation by highlighting and responding to this story.
Disclaimer
The views and opinions expressed in this article are solely those of the authors and do not reflect the views of Bitcoin Insider. Every investment and trading move involves risk - this is especially true for cryptocurrencies given their volatility. We strongly advise our readers to conduct their own research when making a decision.