I recently started developing an Android application and I needed to create more complicated SQL. The database for Android app is SQLite and I use Room library for the database access but I used SQL server to learn complicated SQL quickly.
You can clone my repository or just copy the docker-compose.yml file in order to run SQL server in your machine.
This article might be helpful to establish the environment.
What I want to achieve
What I want to do is to show a list like following.
item_name | value | start_time | end_time | type | unit | sort | category_name |
---|---|---|---|---|---|---|---|
weight | 50 | 2021-07-19 10:00:00.0000000 | NULL | 3 | kg | 2 | category-3 |
pushup | 30 | 2021-07-19 10:01:00.0000000 | 2021-07-19 10:05:00.0000000 | 2 | reps | 3 | category-2 |
water | 350 | 2021-07-19 10:00:00.0000000 | NULL | 1 | ml | 1 | category-1 |
salt | 5 | 2021-07-19 10:10:00.0000000 | 2021-07-19 10:15:00.0000000 | 4 | g | 4 | category-4 |
squat | 50 | 2021-07-19 10:20:00.0000000 | 2021-07-19 10:30:01.0000000 | 2 | reps | 5 | category-2 |
sugar | 10 | 2021-07-19 10:30:00.0000000 | 2021-07-19 10:35:01.0000000 | 4 | g | 6 | category-4 |
beer | 700 | 2021-07-19 10:40:00.0000000 | 2021-07-19 11:00:01.0000000 | 1 | ml | 7 | category-1 |
water | 200 | 2021-07-19 11:05:00.0000000 | 2021-07-19 11:10:01.0000000 | 1 | ml | 1 | category-1 |
The application records everything what I need and show them. I need to create some tables to do this work. Following ER diagram is what I created for the app.
I separated the table into two main tables, Entry and Item. There are also Unit and Category tables because I want to enable the app to create new unit and category when I need it. That’s why there are 4 tables here. If its feature is not necessary Category and Unit tables are not necessary.
Table creation and preparation of data
I created the tables and inserted the data with this SQL queries below.
Unit table
CREATE TABLE Unit(
uid int NOT NULL,
name nvarchar(50) NOT NULL UNIQUE,
PRIMARY KEY(uid)
);
INSERT INTO Unit VALUES
(1, 'ml'),
(2, 'g'),
(3, 'kg'),
(4, 'reps');
Category table
CREATE TABLE Category(
uid int NOT NULL,
name nvarchar(50) NOT NULL UNIQUE,
parent int,
PRIMARY KEY(uid)
);
INSERT INTO Category VALUES
(1, 'category-1', NULL),
(2, 'category-2', NULL),
(3, 'category-3', 1),
(4, 'category-4', 3);
Item table
CREATE TABLE Item(
uid int NOT NULL,
name nvarchar(50) NOT NULL UNIQUE,
type int,
unit int,
sort int UNIQUE,
category_id int,
PRIMARY KEY(uid),
FOREIGN KEY (unit) REFERENCES Unit(uid),
FOREIGN KEY (category_id) REFERENCES Category(uid),
);
INSERT INTO Item VALUES
(1, 'water', 1, 1, 1, 1),
(2, 'pushup', 2, 4, 3, 2),
(3, 'weight', 3, 3, 2, 3),
(4, 'salt', 4, 2, 4, 4),
(5, 'squat', 2, 4, 5, 2),
(6, 'sugar', 4, 2, 6, 4),
(7, 'beer', 1, 1, 7, 1);
Entry table
CREATE TABLE Entry(
uid int NOT NULL,
item int,
start_time datetime2,
end_time datetime2,
value int,
PRIMARY KEY(uid),
FOREIGN KEY (item) REFERENCES Item(uid),
);
INSERT INTO Entry VALUES
(1, 3, '2021-07-19 10:00:00.000', NULL, 50),
(2, 2, '2021-07-19 10:01:00.000', '2021-07-19 10:05:00.000', 30),
(3, 1, '2021-07-19 10:00:00.000', NULL, 350),
(4, 4, '2021-07-19 10:10:00.000', '2021-07-19 10:15:00.000', 5),
(5, 5, '2021-07-19 10:20:00.000', '2021-07-19 10:30:01.000', 50),
(6, 6, '2021-07-19 10:30:00.000', '2021-07-19 10:35:01.000', 10),
(7, 7, '2021-07-19 10:40:00.000', '2021-07-19 11:00:01.000', 700),
(8, 1, '2021-07-19 11:05:00.000', '2021-07-19 11:10:01.000', 200);
The necessary data are prepared correctly. Next step is to extract the data from the tables.
How to merge two tables by outer join
Let’s take a look how to write the query to merge two tables. Entry table is the main table and additional information needs to be added there. There are two ways to do this.
LEFT OUTER JOIN
RIGHT OUTER JOIN
The difference is which table is treated as main/base table. LEFT OUTER JOIN treats the first table as main/base table. Following is an example for it.
SELECT *
FROM Entry LEFT OUTER JOIN ON Item Entry.item = Item.uid
In this case, Entry table is treated as main/base table and Item info is added to the Entry table. We need to set on which condition the data are merged because there are many rows in Item table. We need to pick one of data to merge it to one of rows in the main table.
If the table name is a bit long we can specify alias by AS
keyword.
SELECT *
FROM Entry AS e LEFT OUTER JOIN ON Item AS i e.item = i.uid
Following is the result of the left outer join.
It looks good. Item name, unit and other items are added to the Entry table.
Multiple LEFT OUTER JOIN on multiple tables
Let’s check how to do left outer join on multiple tables before merging 4 tables in one query. The target tables are following.
The query is simple. I simply wrote two LEFT OUTER JOIN.
SELECT *
FROM Item
LEFT OUTER JOIN Unit ON Item.unit = Unit.uid
LEFT OUTER JOIN Category ON Item.category_id = Category.uid
The point is to write the merge condition right after the LEFT OUTER JOIN
keyword. Otherwise, it shows an error. In this example, Unit table info is added to the Item table’s rows that has the same unit id. Category table is merged in the same way. Following is the result.
It works as expected but there are unnecessary or unreadable columns. Let’s make it clean.
SELECT Item.name, Item.type, Item.sort, Unit.name as unit, Category.name as category
FROM Item
LEFT OUTER JOIN Unit ON Item.unit = Unit.uid
LEFT OUTER JOIN Category ON Item.category_id = Category.uid
ORDER BY Item.sort
LEFT OUTER JOIN with subquery
I explained how to use LEFT OUTER JOIN to merge two tables and three tables in the examples above. However, the goal is to merge them all in one query. How can we merge them all? LEFT OUTER JOIN requires only one table. If we need to merge multiple tables on different level we need to consider the order of the merge. The order is either top to down (blue square first) or down to top (red square first).
Which is better? Blue square first, red square first? We need subquery to create a merged table first that is used as a base table in the subsequent step.
Firstly, I wrote a query to create a merged table surrounded by a blue square first. And then, merge other two tables. Following is the query for it but it looks long and it has two subqueries.
SELECT tmp2.item_name, tmp2.unit_name, tmp2.value, tmp2.start_time, tmp2.end_time, tmp2.type, tmp2.unit, tmp2.sort, Category.name
FROM Category
RIGHT OUTER JOIN
(
SELECT tmp.*, Unit.name as unit_name
FROM Unit
RIGHT OUTER JOIN
(
SELECT Entry.*, Item.type, Item.name as item_name, Item.unit, Item.sort, Item.category_id
FROM Entry
LEFT OUTER JOIN Item ON Entry.item = Item.uid
) as tmp
ON Unit.uid = tmp.unit
) as tmp2
ON Category.uid = tmp2.category_id
ORDER BY tmp2.sort
It includes both LEFT OUTER JOIN and RIGHT OUTER JOIN in the query. When both of them are included in one query it is not readable because it is not consistent. The result looks following.
How does it look like when we create red squared table first?
SELECT *
FROM Entry
LEFT OUTER JOIN (
SELECT Item.uid, Item.name as item_name, Unit.name as unit, Category.name as category
FROM Item
LEFT OUTER JOIN Unit ON Item.unit = Unit.uid
LEFT OUTER JOIN Category ON Item.category_id = Category.uid
) as Item2
ON Entry.item = Item2.uid
It has only one subquery and it includes only LEFT OUTER JOIN. The query is simpler than the previous one. It is better to merge deep level tables first to make the code simple. The result is of course the same.
End
Database normalization is one of important techniques to make the data maintainable. However, it increases SQL complexity and execution time of the SQL due to merging tables. It is hard to change database scheme, so I recommend designing table well at start. We can do performance tuning afterwards if necessary. We can also merge those info in application side. If the SQL you require gets big you can consider splitting the SQL into multiple SQLs and merge them again in application side. It depends on how many entries there are in the tables.
Comments