Being able to view products and pages liked by other users is not only an interesting exercise, but also a nearly-ubiquitous marketing tool—and, it can be easily implemented with MySQL.
What is Collaborative Filtering?
Nearly every dynamic website uses collaborative filtering to maximize users' exposure to a wider range of potentially relevant content. You've seen it on Amazon:
...and you've certainly seen it on Facebook:
In fact, collaborative filtering is arguably part of the fabric of modern social networking, e-commerce, and news websites—even when you can't see it. Of course, especially on large sites, most of this filtering is now done using AI, but the fundamental framework has never changed. This article will walk you through a basic implementation of collaborative filtering using MySQL.
Getting Started: Database Creation and Data Entry
For this tutorial, we will consider a scenario in which you want to discover new products in an online store by analyzing records of products liked by specific users. For example, if you were to visit the page of an interesting product, you might also want to explore the other products liked by users who liked this specific product—just in case your tastes happen to match.
In this case, MySQL is an effective database solution, so we'll begin by creating a database called mydb
:
CREATE SCHEMA IF NOT EXISTS `mydb`;
Inside mydb
, we can now create a table product
, where products with likes can be stored:
CREATE TABLE IF NOT EXISTS `mydb`.`product` (
`id` INT NOT NULL AUTO_INCREMENT,
`name_product` VARCHAR(45) NOT NULL,
`price_product` DECIMAL NOT NULL,
`amount_product` INT NOT NULL,
PRIMARY KEY (`id`))
ENGINE = InnoDB;
We also need to create a table for users. In this case, we'll call it user
and give it three columns: id
, user_name
, and user_pass
:
CREATE TABLE IF NOT EXISTS `mydb`.`user` (
`id` INT NOT NULL,
`user_address` VARCHAR(45) NOT NULL,
`user_pass` VARCHAR(45) NOT NULL,
PRIMARY KEY (`id`))
ENGINE = InnoDB;
Of course, in most cases, a table for user information may contain tens—or even hundreds—of columns to store all kinds of data about a user, but for the sake of simplicity, we'll keep it to just three for now.
Finally, we will need a table that keeps track of who likes what. We'll call it likes
and give it two columns—storing both the user's id and the product id for the "liked" item:
CREATE TABLE IF NOT EXISTS `mydb`.`likes` (
`user_id` INT NOT NULL,
`product_id` INT NOT NULL,
INDEX `fk_likes_user1_idx` (`user_id` ASC),
INDEX `fk_likes_product1_idx` (`product_id` ASC),
CONSTRAINT `fk_likes_user1`
FOREIGN KEY (`user_id`)
REFERENCES `mydb`.`user` (`id`)
ON DELETE NO ACTION
ON UPDATE NO ACTION,
CONSTRAINT `fk_likes_product1`
FOREIGN KEY (`product_id`)
REFERENCES `mydb`.`product` (`id`)
ON DELETE NO ACTION
ON UPDATE NO ACTION)
ENGINE = InnoDB;
As you can see, each column gets its own index, and in order to ensure that invalid data doesn't get added to the table (thereby breaking the relationship between likes
and our other tables), we have also added foreign key constraints for both columns. This way, any new "like" is checked to make sure that:
user_id
matches an id
in user
product_id
matches an id
in product
Now let's fill in the table with products. For this example, we'll use cosmetic products:
INSERT INTO product (id, name_product, price_product, amount_product) VALUES
(1, 'Shampoo perfumes "Clear"', 130, 100),
(2, 'Eyeshadow "Makeup"', 245, 104),
(3, 'Face Lotion "Gigi"', 120, 99),
(4, 'Lipstick Set "LP"', 250, 165),
(5, 'Highlighter "NAC"', 175, 111),
(6, 'Powder "Pow"', 190, 205),
(7, 'Toner "Tonn"', 160, 198),
(8, 'Blush "Gigi"', 135, 57),
(9, 'Highlighter "Lily"', 200, 30),
(10, 'Shampoo "Lux"', 160, 11),
(11, 'Lipstick "Gigi"', 90, 225);
This is what your product
table should look like:
Now, we can add some sample users to our user
table:
INSERT INTO `mydb`.user (`id`, `user_address`, `user_pass`) VALUES
(1,'ala.l.e.xa@gmail.com','2139ujsaduif'),
(2,'ni.c.k@gmail.com','9543fjkshfei23'),
(3,'a.u.r.ora@gmail.com','324GFGuur'),
(4,'vic.t.oria@gmail.com','342geuir77'),
(5,'1.234.5@gmail.com','234jfie4Yt');
It should look something like this:
And finally, we can fill in the likes
table with some sample like data:
INSERT INTO `mydb`.likes (`user_id`, `product_id`) VALUES (4,4),(4,1),
(1,6),(3,8),(2,3),(5,10),(3,10),(3,6),(3,7),(1,2),(1,5),(2,8),
(2,9),(5,7),(5,5),(5,1),(2,10),(2,4),(3,4),(3,5),(3,9),(2,1);
Because one user can like more than one product, this table can ultimately have thousands of rows, but for our scenario, we'll stop at 22:
Two Methods for Implementing Collaborative Filtering
With all of this data now stored in our tables, we are now ready for collaborative filtering. In this post, we will achieve this using two different methods.
Method #1: Subqueries
For this first method, our main query selects all rows from likes
where the liked product is NOT our specified ("base") product. Our subquery then selects all users from likes
who DID like our "base" product, in order to create a list against which we can check all user_id
s in our main query:
SELECT product.name_product AS "Product",
COUNT(likes.user_id) AS "Also liked by __ users" FROM `likes`
JOIN product ON product.id = likes.product_id
WHERE likes.user_id IN
(
SELECT user_id
FROM `likes`
WHERE product_id = 10
)
AND likes.product_id <> 10
GROUP BY 1
ORDER BY 2 DESC;
As you can see, we actually did also use JOIN here, but only in order to display the product names—rather than just their IDs—in the result:
Method #2: JOIN
For this method, we first select all rows from likes
where product_id
matches our specified product ID as a list called did_like
. Then, we'll use JOIN
to add all rows from likes
to did_like
(our joined list is assigned the alias also_like
), checking user_id
and product_id
against did_like
to find all other products liked by the same user:
SELECT product.name_product AS "Product",
COUNT(also_like.product_id) AS "Also liked by __ users"
FROM likes AS did_like
JOIN likes AS also_like ON
also_like.user_id = did_like.user_id
AND also_like.product_id != did_like.product_id
JOIN product ON product.id = also_like.product_id
WHERE did_like.product_id = 10
GROUP BY 1
ORDER BY COUNT(also_like.product_id) DESC;
Just like the first method, we used an additional join
to make the results look pretty here:
As you can see, the results of both queries are identical, displaying all products also liked by people who liked "Lux
" Shampoo (product_id
= 10). This script can be easily modified to display similar results for all other products in the product
table.
Of course, most well-known implementations of collaborative filtering are substantially more complex than our simple examples, but the fundamental logic behind all of these systems boils down to the same principles used here.