Latest news about Bitcoin and all cryptocurrencies. Your daily crypto news habit.
Pandas library is the de-facto standard tool for data scientists, nowadays. It is used widely by many data scientists around the globe. After being familiar with it I always use it for processing table-structured data whatever project I am working on. It works fast and reliable, supports CSV, Excel, JSON and so on.
However, as a person experienced in SQL, I had some difficulties and confusion with manipulating the tables (a.k.a. DataFrames) in the beginning. Eventually, I learned more APIs and ways of doing the things properly. I believe many people who do his/her first steps on Pandas may have the same experience. Some time ago I prepared the cheatsheet using SQL queries and their analogy in Pandas. And I am happy to share it with all of you.
In this post, I am sharing the queries using âSELECTâ. I will post other queries and their analogies to Pandas soon later. I would be happy to receive your feedback and wish-list regarding the cheatsheet.
Here we go.
We have three simple tables:
- users
- courses
- attendance
And the rest of the queries will be about data manipulations on them.
Lets start from basic SELECTÂ command.
SELECT * FROM users
SELECT * FROM users LIMIT 0,10
SELECT * FROM users WHERE email IS NULL
SELECT first_name, last_name FROM users
SELECT DISTINCT birth_year FROM users
Basic math and arithmetic operations.
SELECT AVG(points) FROM users;SELECT SUM(points) FROM users;
Conditional operations and LIKE.
SELECT * FROM users WHERE birth_year BETWEEN 1998 AND 2018
SELECT * FROM users WHERE first_name LIKE 'Ch%'
SELECT * FROM users WHERE first_name LIKE '%es'
SELECT * FROM users WHERE first_name LIKE '%on%'
SELECT first_name, last_name FROM users WHERE first_name LIKE '%on%'
I finalize this post with JOIN and ORDER BY related SQL queries. To make the code more simpler and easy to read I will assign the results into new variables and re-use them in next ones.
SELECT * FROM attendance atnLEFT JOIN users usr ON atn.user_id = usr.id
Now lets join the result above with course titles. The result it will be same as the result of following SQLÂ command.
SELECT * FROM attendance atnLEFT JOIN users usr ON atn.user_id = usr.idLEFT JOIN courses co ON co.id = atn.course_id
The query above will return the full table/dataframe with all columns included. To select necessary columns we can use the same method as we did before.
SELECT * FROM users ORDER BY first_name, last_name
SELECT * FROM users ORDER BY first_name, last_name DESC
SELECT first_name, last_name, birth_year,points, course_name, instructor FROM attendance atnLEFT JOIN users usr ON atn.user_id = usr.idLEFT JOIN courses co ON co.id = atn.course_idORDER BY first_name, last_name
Pandas cheatsheet for SQL people (part 1) was originally published in Hacker Noon on Medium, where people are continuing the conversation by highlighting and responding to this story.
Disclaimer
The views and opinions expressed in this article are solely those of the authors and do not reflect the views of Bitcoin Insider. Every investment and trading move involves risk - this is especially true for cryptocurrencies given their volatility. We strongly advise our readers to conduct their own research when making a decision.