Everything You Need for SQL Basics and SQL Exam Quick Review
What is SQL?
SQL (Structured Query Language) is a language for queries that allows you to access and manipulate databases. A query is a command that tells the database what to do with the data it has, and how you want to access it. In this post, we will learn some basic SQL commands as well as some intermediate ones so you have a basic understanding of what SQL can do. In the end, there will be a quick review of all the SQL commands covered here as an exam review. Enjoy!
Before we dive into SQL, here is the data we will be using:
regions
countries
locations
departments
employees (limiting to 10 rows)
(sample data was used from this link)
Basics SQL Commands
SELECT
WHERE
ORDER BY
LIMIT
GROUP BY/AGG FUNC
JOINS
SELECT
Used to select data from a database
Syntax
SELECT column1, column2, ...
FROM table_name
Example
SELECT location_id, street_address, postal_code
FROM locations
Output
If you want to select an entire table, use
SELECT * FROM table_name
WHERE
Used to filter rows based on a condition/conditions
Syntax
SELECT column1, column2, ...
FROM table_name
WHERE condition
Example
SELECT location_id, street_address, postal_code
FROM locations
WHERE location_id < 2000
Output
These are the operators you can use in your condition:
- = (Equal)
- > (Greater than)
- < (Less than)
- >= (Greater than or equal to)
- <= (Less than or equal to)
- != (Not equal)
- BETWEEN … AND … (between a range)
- LIKE (search for a pattern in a string)
- IN values (whether the value is inside values)
You can use multiple conditions by using AND or OR. For example,
WHERE location_id < 2000 AND postal_code > 50000
ORDER BY
Used to sort data in ascending/descending order
Syntax
SELECT column1, column2, ...
FROM table_name
ORDER BY column1, column2, ...#You can set it to descending order
ORDER BY column1 DESC
Example
SELECT location_id, street_address, postal_code
FROM locations
ORDER BY location_id DESC
Output
You can ORDER BY multiple columns at once. This means if you order by column1 and column2, you first sort by column1, and when two rows have the same value for column1, you sort by column2. For example:
SELECT location_id, department_id, department_name
FROM departments
ORDER BY location_id, department_id
Output
When location_id is equal to 1700, we go to department_id, and sort by department_id in ascending order
LIMIT
Used to limit the number of rows displayed. Sometimes we don’t want to load an entire dataset if it’s too large.
Syntax
SELECT *
FROM table_name
LIMIT number_of_rows
Example
SELECT *
FROM employees
LIMIT 10
Output
Only displaying 10 rows in employees
GROUP BY/AGG FUNC
Used to group rows with the same values. GROUP BY is used with aggregate functions, which are functions applied to each of the grouped values.
Syntax
SELECT column_name(s)
From table_name
GROUP BY column_name(s)
Example
SELECT department_id, COUNT(*)
FROM employees
GROUP BY department_id
Here, we are grouping by department_id in the employees dataset, and we want to see how many employees are in each department_id, so we can use the COUNT() aggregate function. Since one row represents one employee, and there are no duplicates, we can just pass * into COUNT(). (* represents an entire row). This is the output:
Example 2
SELECT department_id, AVG(salary) AS department_salary_average
FROM employees
GROUP BY department_id
ORDER BY department_salary_average DESC
Here, we are again grouping by the department_id, but instead of finding out the number of employees in each department_id, we want to find the average salary of employees in each department_id. We can use the AVG() aggregate function and pass in salary as its column to aggregate on. Here’s the output:
We are also sorting the dataset by the salary average, from the highest to the lowest salary average
Note: in the previous example, we cannot use AVG(*) in SELECT. This is because some of the columns in employees are not integers. For example, the first_name and last_name columns are strings. If we take all the first names and last names of each department_id, how do we take the average of these? Thus, doing so will result in an error like this:
Some basic aggregate functions
- COUNT() — returns the number of occurrences in the dataset
- MAX() -returns the maximum number in the aggregated values
- MIN() -returns the minimum number in the aggregated values
- SUM() -returns the sum of the aggregated values
- AVG() -returns the average of the aggregated values
JOINs
Used to combine tables, usually based on a common column they share.
Syntax
SELECT column_names(s)
FROM table1
JOIN table2
ON table1.column_name1 = table2.column_name1
Note that column_name1 for the two tables doesn’t necessarily need to share the same column name, but usually, it is best when they share common values.
Example
SELECT
employees.employee_id,
employees.first_name,
employees.last_name,
employees.department_id,
departments.department_name
FROM employees
JOIN departments
ON employees.department_id = departments.department_id
LIMIT 10
We are joining departments with employees on the column department_id. Note that when both tables have a department_id column, we need to specify which department_id we are calling by using table_name.column_id. For every row in employees, if there is a row in departments that has the same department_id as the department_id in this row of employees, we join the two rows. Here’s the output:
Output
In the first row of employees, Steven King is in department_id 9. In department, department_id 9 has a department_name of Executive. So we join Executive as the department_name for Steven King. The same goes for every other row.
Different Types of Joins:
- Join (Inner Join) — returns rows that have a match in both tables
- Left Join — returns all the rows from the left table and the matched rows from the right table. For the rows from the left table that did not match any of the rows in the right table, the corresponding values are left as null
- Right Join — returns all the rows from the right table and the matched rows from the left table. For the rows from the right table that did not match any of the rows in the left table, the corresponding values are left as null
- Full Join — returns all the rows from both tables. The rows that do not have a match for both rows are left as null. Also known as a cross-product.
Intermediate SQL Commands
CREATE TABLE
CREATE INDEX
Subquery
HAVING
EXISTS
PRIMARY/FOREIGN KEY
CREATE/DROP TABLE
Used to create a new table in a database.
Syntax
CREATE TABLE table_name (
column1 datatype,
column2 datatype,
...
)
Example
CREATE TABLE regions (
region_id SERIAL PRIMARY KEY,
region_name CHARACTER VARYING (25)
)
Output
You’ll see that there are no values in the table. This is because CREATE TABLE only creates the structure for the table. To actually input values to the table, you’ll need the INSERT INTO command:
INSERT INTO table_name(column1, column2) VALUES (value1, value2)
Example
INSERT INTO regions(region_id,region_name) VALUES (1,'Europe');
INSERT INTO regions(region_id,region_name) VALUES (2,'Americas');
INSERT INTO regions(region_id,region_name) VALUES (3,'Asia');
INSERT INTO regions(region_id,region_name) VALUES (4,'Middle East and Africa');#display regions
SELECT *
FROM regions;
Output
CREATE INDEX
Used to create indexes in tables.
Syntax
CREATE INDEX index_name
ON table_name (column1, column2, ...)
Example
CREATE INDEX phone_number_idx
ON employees (phone_number)
We won’t get into too many details about the knowledge behind indexing, but generally, we add indexes to allow faster retrieval of data from databases, thus improving the overall efficiency and performance of our query. If you’re interested, here’s a website that explains how indexing improves performance.
HAVING
Similar to the WHERE clause, but can be used on aggregate functions.
Syntax
SELECT column_name(s)
FROM table_name
GROUP BY column_name(s)
HAVING condition
Example
SELECT department_id, AVG(salary) AS department_salary_average
FROM employees
GROUP BY department_id
HAVING AVG(salary) > 8000
Here we are finding the average salary of each department_id that is higher than 8000. We need to compare the AVG(salary) in the having clause, and since it is an aggregate function, we cannot use the WHERE clause. Here’s the output:
Subquery
A query inside a query. Subqueries are usually within a WHERE clause. It’s useful when you need to compare a table to another table with specific conditions (where, group by, joins, etc.)
Syntax
SELECT column_names(s)
FROM table_name
WHERE condition (
SELECT column_name(s)
FROM table_name
WHERE/GROUP BY/JOIN
)
Example
SELECT *
FROM employees
WHERE salary > (
SELECT salary
FROM employees
WHERE first_name = 'Alexander' AND last_name = 'Hunold'
)
Here, we’re finding the employees that have a higher salary than Alexander Hunold. We need the subquery to get Alexander Hunold’s salary, and then place the result in the WHERE condition to compare it. Here’s the output:
EXISTS
Checks the existence of a row in a subquery.
Syntax
SELECT column_name(s)
FROM table_name
WHERE EXISTS (
SELECT column_name
FROM table_name
WHERE condition
)
Example
SELECT departments.department_name
FROM departments
WHERE EXISTS (
SELECT *
FROM employees
WHERE employees.department_id = departments.department_id AND employees.salary > 10000
)
Here, we’re trying to find the list of departments that have employees with salaries over 10000. The way this query works is: we compare each row in departments with every row in employees, and check if any row in employees matches the WHERE condition in the subquery. If that row exists, that row in departments will be returned.
In this example, we first compare the first row in departments with every row in employees, and we see that there is no employee with a department_id = 1 and has a salary > 10000. Therefore, Administration (the first department in departments) will not be returned. When we move on to the second row of departments, we see a match with Michael Hartstein, who is in department_id = 2 and has a salary > 10000. Since this row exists, Marketing (the second department in departments) will be returned, and vice versa. Here’s the output:
If we want to check the departments where no employees have a salary of over 10000, we can do the same thing as the previous example, but instead of EXIT, we do NOT EXIST
PRIMARY/FOREIGN KEY
You can set keys in a table as a primary key or a foreign key. They are useful when you want to establish a connection between two tables.
Primary key constraints:
- Must be unique
- Cannot contain null values
- A table can only have one primary key
A foreign key refers to another table’s primary key.
Syntax:
#Primary Key
CREATE TABLE table1 (
column1 column1_type,
column2 column2_type,
column3 column3_type,
PRIMARY KEY (column1)
)#Foreign Key
CREATE TABLE table2 (
column1 column1_type,
column2 column2_type,
column3 column3_type,
FOREIGN KEY (column1) REFERENCES table1(column1)
)
You can also drop a primary/foreign key. Here’s the syntax:
#Primary Key
ALTER TABLE table_name
DROP PRIMARY KEY#Foreign Key
ALTER TABLE table_name
DROP FOREIGN KEY foreign_key
Common Issue: Dangling Foreign Keys
Dangling foreign keys occur when a foreign key refers to a non-existent primary key. Therefore, you should always create the parent table (the table with the primary key) first, and then create the child table (the table with the foreign key).
But what if you want to update or delete an instance with a primary key? Without ON UPDATE CASCADE or ON DELETE CASCADE, the instance originally referring to the updated or deleted primary key will be a dangling foreign key. ON UPDATE CASCADE automatically updates the foreign key when the referenced primary key is updated, and ON DELETE CASCADE automatically deletes the foreign key when the referenced primary key is deleted. Here’s the syntax:
CREATE TABLE table2 (
column1 column1_type,
column2 column2_type,
column3 column3_type,
FOREIGN KEY (column1) REFERENCES table1(column1)
ON UPDATE CASCADE/ON DELETE CASCADE
)
Quick Note: SQL Query Order
The order you should write your query is:
- SELECT
- FROM
- JOIN
- WHERE
- GROUP BY
- ORDER BY
- LIMIT
Quick Recap/Exam Review
SELECT
WHERE
ORDER BY
LIMIT
GROUP BY/AGG FUNC
JOINS
CREATE TABLE
CREATE INDEX
Subquery
HAVING
EXISTS
PRIMARY/FOREIGN KEY
SELECT
SELECT column1, column2, ...
FROM table_name
WHERE
SELECT column1, column2, ...
FROM table_name
WHERE condition
ORDER BY
SELECT column1, column2, ...
FROM table_name
ORDER BY column1, column2, ...#You can set it to descending order
ORDER BY column1 DESC
LIMIT
SELECT *
FROM table_name
LIMIT number_of_rows
GROUP BY
SELECT column_name(s)
From table_name
GROUP BY column_name(s)
- COUNT() — returns the number of occurrences in the dataset
- MAX() -returns the maximum number in the aggregated values
- MIN() -returns the minimum number in the aggregated values
- SUM() -returns the sum of the aggregated values
- AVG() -returns the average of the aggregated values
JOIN
SELECT column_names(s)
FROM table1
JOIN table2
ON table1.column_name1 = table2.column_name1
CREATE TABLE
CREATE TABLE table_name (
column1 datatype,
column2 datatype,
...
)
CREATE INDEX
CREATE INDEX index_name
ON table_name (column1, column2, ...)
HAVING
SELECT column_name(s)
FROM table_name
GROUP BY column_name(s)
HAVING condition
Subquery
SELECT column_names(s)
FROM table_name
WHERE condition (
SELECT column_name(s)
FROM table_name
WHERE/GROUP BY/JOIN
)
EXISTS
SELECT column_name(s)
FROM table_name
WHERE EXISTS (
SELECT column_name
FROM table_name
WHERE condition
)
PRIMARY/FOREIGN KEY
#Primary Key
CREATE TABLE table1 (
column1 column1_type,
column2 column2_type,
column3 column3_type,
PRIMARY KEY (column1)
)#Foreign Key
CREATE TABLE table2 (
column1 column1_type,
column2 column2_type,
column3 column3_type,
FOREIGN KEY (column1) REFERENCES table1(column1)
)#DROP
#Primary Key
ALTER TABLE table_name
DROP PRIMARY KEY#Foreign Key
ALTER TABLE table_name
DROP FOREIGN KEY foreign_key
Written by Richard Wang, Data Peer Consultant