Everything You Need for SQL Basics and SQL Exam Quick Review

DSUS Data Peer Consulting
11 min readOct 12, 2022

What is SQL?

SQL (Structured Query Language) is a language for queries that allows you to access and manipulate databases. A query is a command that tells the database what to do with the data it has, and how you want to access it. In this post, we will learn some basic SQL commands as well as some intermediate ones so you have a basic understanding of what SQL can do. In the end, there will be a quick review of all the SQL commands covered here as an exam review. Enjoy!

Before we dive into SQL, here is the data we will be using:

regions

countries

locations

departments

employees (limiting to 10 rows)

(sample data was used from this link)

Basics SQL Commands

SELECT
WHERE
ORDER BY
LIMIT
GROUP BY/AGG FUNC
JOINS

SELECT

Used to select data from a database

Syntax

SELECT column1, column2, ...
FROM table_name

Example

SELECT location_id, street_address, postal_code
FROM locations

Output

If you want to select an entire table, use SELECT * FROM table_name

WHERE

Used to filter rows based on a condition/conditions

Syntax

SELECT column1, column2, ...
FROM table_name
WHERE condition

Example

SELECT location_id, street_address, postal_code
FROM locations
WHERE location_id < 2000

Output

These are the operators you can use in your condition:

  • = (Equal)
  • > (Greater than)
  • < (Less than)
  • >= (Greater than or equal to)
  • <= (Less than or equal to)
  • != (Not equal)
  • BETWEEN … AND … (between a range)
  • LIKE (search for a pattern in a string)
  • IN values (whether the value is inside values)

You can use multiple conditions by using AND or OR. For example,

WHERE location_id < 2000 AND postal_code > 50000

ORDER BY

Used to sort data in ascending/descending order

Syntax

SELECT column1, column2, ...
FROM table_name
ORDER BY column1, column2, ...#You can set it to descending order
ORDER BY column1 DESC

Example

SELECT location_id, street_address, postal_code
FROM locations
ORDER BY location_id DESC

Output

You can ORDER BY multiple columns at once. This means if you order by column1 and column2, you first sort by column1, and when two rows have the same value for column1, you sort by column2. For example:

SELECT location_id, department_id, department_name
FROM departments
ORDER BY location_id, department_id

Output

When location_id is equal to 1700, we go to department_id, and sort by department_id in ascending order

LIMIT

Used to limit the number of rows displayed. Sometimes we don’t want to load an entire dataset if it’s too large.

Syntax

SELECT *
FROM table_name
LIMIT number_of_rows

Example

SELECT *
FROM employees
LIMIT 10

Output

Only displaying 10 rows in employees

GROUP BY/AGG FUNC

Used to group rows with the same values. GROUP BY is used with aggregate functions, which are functions applied to each of the grouped values.

Syntax

SELECT column_name(s)
From table_name
GROUP BY column_name(s)

Example

SELECT department_id, COUNT(*)
FROM employees
GROUP BY department_id

Here, we are grouping by department_id in the employees dataset, and we want to see how many employees are in each department_id, so we can use the COUNT() aggregate function. Since one row represents one employee, and there are no duplicates, we can just pass * into COUNT(). (* represents an entire row). This is the output:

Example 2

SELECT department_id, AVG(salary) AS department_salary_average
FROM employees
GROUP BY department_id
ORDER BY department_salary_average DESC

Here, we are again grouping by the department_id, but instead of finding out the number of employees in each department_id, we want to find the average salary of employees in each department_id. We can use the AVG() aggregate function and pass in salary as its column to aggregate on. Here’s the output:

We are also sorting the dataset by the salary average, from the highest to the lowest salary average

Note: in the previous example, we cannot use AVG(*) in SELECT. This is because some of the columns in employees are not integers. For example, the first_name and last_name columns are strings. If we take all the first names and last names of each department_id, how do we take the average of these? Thus, doing so will result in an error like this:

Some basic aggregate functions

  • COUNT() — returns the number of occurrences in the dataset
  • MAX() -returns the maximum number in the aggregated values
  • MIN() -returns the minimum number in the aggregated values
  • SUM() -returns the sum of the aggregated values
  • AVG() -returns the average of the aggregated values

JOINs

Used to combine tables, usually based on a common column they share.

Syntax

SELECT column_names(s)
FROM table1
JOIN table2
ON table1.column_name1 = table2.column_name1

Note that column_name1 for the two tables doesn’t necessarily need to share the same column name, but usually, it is best when they share common values.

Example

SELECT 
employees.employee_id,
employees.first_name,
employees.last_name,
employees.department_id,
departments.department_name
FROM employees
JOIN departments
ON employees.department_id = departments.department_id
LIMIT 10

We are joining departments with employees on the column department_id. Note that when both tables have a department_id column, we need to specify which department_id we are calling by using table_name.column_id. For every row in employees, if there is a row in departments that has the same department_id as the department_id in this row of employees, we join the two rows. Here’s the output:

Output

In the first row of employees, Steven King is in department_id 9. In department, department_id 9 has a department_name of Executive. So we join Executive as the department_name for Steven King. The same goes for every other row.

Different Types of Joins:

  • Join (Inner Join) — returns rows that have a match in both tables
  • Left Join — returns all the rows from the left table and the matched rows from the right table. For the rows from the left table that did not match any of the rows in the right table, the corresponding values are left as null
  • Right Join — returns all the rows from the right table and the matched rows from the left table. For the rows from the right table that did not match any of the rows in the left table, the corresponding values are left as null
  • Full Join — returns all the rows from both tables. The rows that do not have a match for both rows are left as null. Also known as a cross-product.

Intermediate SQL Commands

CREATE TABLE
CREATE INDEX
Subquery
HAVING
EXISTS
PRIMARY/FOREIGN KEY

CREATE/DROP TABLE

Used to create a new table in a database.

Syntax

CREATE TABLE table_name (
column1 datatype,
column2 datatype,

...
)

Example

CREATE TABLE regions (
region_id SERIAL PRIMARY KEY,
region_name CHARACTER VARYING (25)
)

Output

You’ll see that there are no values in the table. This is because CREATE TABLE only creates the structure for the table. To actually input values to the table, you’ll need the INSERT INTO command:

INSERT INTO table_name(column1, column2) VALUES (value1, value2)

Example

INSERT INTO regions(region_id,region_name) VALUES (1,'Europe');
INSERT INTO regions(region_id,region_name) VALUES (2,'Americas');
INSERT INTO regions(region_id,region_name) VALUES (3,'Asia');
INSERT INTO regions(region_id,region_name) VALUES (4,'Middle East and Africa');#display regions
SELECT *
FROM regions;

Output

CREATE INDEX

Used to create indexes in tables.

Syntax

CREATE INDEX index_name
ON table_name (column1, column2, ...)

Example

CREATE INDEX phone_number_idx
ON employees (phone_number)

We won’t get into too many details about the knowledge behind indexing, but generally, we add indexes to allow faster retrieval of data from databases, thus improving the overall efficiency and performance of our query. If you’re interested, here’s a website that explains how indexing improves performance.

HAVING

Similar to the WHERE clause, but can be used on aggregate functions.

Syntax

SELECT column_name(s)
FROM table_name
GROUP BY column_name(s)
HAVING condition

Example

SELECT department_id, AVG(salary) AS department_salary_average
FROM employees
GROUP BY department_id
HAVING AVG(salary) > 8000

Here we are finding the average salary of each department_id that is higher than 8000. We need to compare the AVG(salary) in the having clause, and since it is an aggregate function, we cannot use the WHERE clause. Here’s the output:

Subquery

A query inside a query. Subqueries are usually within a WHERE clause. It’s useful when you need to compare a table to another table with specific conditions (where, group by, joins, etc.)

Syntax

SELECT column_names(s)
FROM table_name
WHERE condition (
SELECT column_name(s)
FROM table_name
WHERE/GROUP BY/JOIN
)

Example

SELECT *
FROM employees
WHERE salary > (
SELECT salary
FROM employees
WHERE first_name = 'Alexander' AND last_name = 'Hunold'
)

Here, we’re finding the employees that have a higher salary than Alexander Hunold. We need the subquery to get Alexander Hunold’s salary, and then place the result in the WHERE condition to compare it. Here’s the output:

EXISTS

Checks the existence of a row in a subquery.

Syntax

SELECT column_name(s)
FROM table_name
WHERE EXISTS (
SELECT column_name
FROM table_name
WHERE condition
)

Example

SELECT departments.department_name
FROM departments
WHERE EXISTS (
SELECT *
FROM employees
WHERE employees.department_id = departments.department_id AND employees.salary > 10000
)

Here, we’re trying to find the list of departments that have employees with salaries over 10000. The way this query works is: we compare each row in departments with every row in employees, and check if any row in employees matches the WHERE condition in the subquery. If that row exists, that row in departments will be returned.

In this example, we first compare the first row in departments with every row in employees, and we see that there is no employee with a department_id = 1 and has a salary > 10000. Therefore, Administration (the first department in departments) will not be returned. When we move on to the second row of departments, we see a match with Michael Hartstein, who is in department_id = 2 and has a salary > 10000. Since this row exists, Marketing (the second department in departments) will be returned, and vice versa. Here’s the output:

If we want to check the departments where no employees have a salary of over 10000, we can do the same thing as the previous example, but instead of EXIT, we do NOT EXIST

PRIMARY/FOREIGN KEY

You can set keys in a table as a primary key or a foreign key. They are useful when you want to establish a connection between two tables.

Primary key constraints:

  1. Must be unique
  2. Cannot contain null values
  3. A table can only have one primary key

A foreign key refers to another table’s primary key.

Syntax:

#Primary Key
CREATE TABLE table1 (
column1 column1_type,
column2 column2_type,
column3 column3_type,
PRIMARY KEY (column1)
)#Foreign Key
CREATE TABLE table2 (
column1 column1_type,
column2 column2_type,
column3 column3_type,
FOREIGN KEY (column1) REFERENCES table1(column1)
)

You can also drop a primary/foreign key. Here’s the syntax:

#Primary Key
ALTER TABLE table_name
DROP PRIMARY KEY#Foreign Key
ALTER TABLE table_name
DROP FOREIGN KEY foreign_key

Common Issue: Dangling Foreign Keys
Dangling foreign keys occur when a foreign key refers to a non-existent primary key. Therefore, you should always create the parent table (the table with the primary key) first, and then create the child table (the table with the foreign key).

But what if you want to update or delete an instance with a primary key? Without ON UPDATE CASCADE or ON DELETE CASCADE, the instance originally referring to the updated or deleted primary key will be a dangling foreign key. ON UPDATE CASCADE automatically updates the foreign key when the referenced primary key is updated, and ON DELETE CASCADE automatically deletes the foreign key when the referenced primary key is deleted. Here’s the syntax:

CREATE TABLE table2 (
column1 column1_type,
column2 column2_type,
column3 column3_type,
FOREIGN KEY (column1) REFERENCES table1(column1)
ON UPDATE CASCADE/ON DELETE CASCADE
)

Quick Note: SQL Query Order

The order you should write your query is:

  1. SELECT
  2. FROM
  3. JOIN
  4. WHERE
  5. GROUP BY
  6. ORDER BY
  7. LIMIT

Quick Recap/Exam Review

SELECT
WHERE
ORDER BY
LIMIT
GROUP BY/AGG FUNC
JOINS
CREATE TABLE
CREATE INDEX
Subquery
HAVING
EXISTS
PRIMARY/FOREIGN KEY

SELECT

SELECT column1, column2, ...
FROM table_name

WHERE

SELECT column1, column2, ...
FROM table_name
WHERE condition

ORDER BY

SELECT column1, column2, ...
FROM table_name
ORDER BY column1, column2, ...#You can set it to descending order
ORDER BY column1 DESC

LIMIT

SELECT *
FROM table_name
LIMIT number_of_rows

GROUP BY

SELECT column_name(s)
From table_name
GROUP BY column_name(s)
  • COUNT() — returns the number of occurrences in the dataset
  • MAX() -returns the maximum number in the aggregated values
  • MIN() -returns the minimum number in the aggregated values
  • SUM() -returns the sum of the aggregated values
  • AVG() -returns the average of the aggregated values

JOIN

SELECT column_names(s)
FROM table1
JOIN table2
ON table1.column_name1 = table2.column_name1

CREATE TABLE

CREATE TABLE table_name (
column1 datatype,
column2 datatype,

...
)

CREATE INDEX

CREATE INDEX index_name
ON table_name (column1, column2, ...)

HAVING

SELECT column_name(s)
FROM table_name
GROUP BY column_name(s)
HAVING condition

Subquery

SELECT column_names(s)
FROM table_name
WHERE condition (
SELECT column_name(s)
FROM table_name
WHERE/GROUP BY/JOIN
)

EXISTS

SELECT column_name(s)
FROM table_name
WHERE EXISTS (
SELECT column_name
FROM table_name
WHERE condition
)

PRIMARY/FOREIGN KEY

#Primary Key
CREATE TABLE table1 (
column1 column1_type,
column2 column2_type,
column3 column3_type,
PRIMARY KEY (column1)
)#Foreign Key
CREATE TABLE table2 (
column1 column1_type,
column2 column2_type,
column3 column3_type,
FOREIGN KEY (column1) REFERENCES table1(column1)
)#DROP
#Primary Key
ALTER TABLE table_name
DROP PRIMARY KEY#Foreign Key
ALTER TABLE table_name
DROP FOREIGN KEY foreign_key

Written by Richard Wang, Data Peer Consultant

--

--