Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / database / MySQL

Collaborative Filtering in MySQL: A Tutorial

5.00/5 (1 vote)
22 Apr 2021CPOL4 min read 4.6K  
Simple examples of collaborative filtering with fundamental logic behind all of these systems boiling down to same principles used here
Being able to view products and pages liked by other users is not only an interesting exercise, but also a nearly-ubiquitous marketing tool—and, it can be easily implemented with MySQL.

Collaborative Filtering in MySQL: A Tutorial

What is Collaborative Filtering?

Nearly every dynamic website uses collaborative filtering to maximize users' exposure to a wider range of potentially relevant content.  You've seen it on Amazon:

Collaborative Filtering in MySQL: A Tutorial

...and you've certainly seen it on Facebook:

Collaborative Filtering in MySQL: A Tutorial

In fact, collaborative filtering is arguably part of the fabric of modern social networking, e-commerce, and news websites—even when you can't see it. Of course, especially on large sites, most of this filtering is now done using AI, but the fundamental framework has never changed. This article will walk you through a basic implementation of collaborative filtering using MySQL.

Getting Started: Database Creation and Data Entry

For this tutorial, we will consider a scenario in which you want to discover new products in an online store by analyzing records of products liked by specific users. For example, if you were to visit the page of an interesting product, you might also want to explore the other products liked by users who liked this specific product—just in case your tastes happen to match.

In this case, MySQL is an effective database solution, so we'll begin by creating a database called mydb:

SQL
CREATE SCHEMA IF NOT EXISTS `mydb`;

Inside mydb, we can now create a table product, where products with likes can be stored:

SQL
CREATE TABLE IF NOT EXISTS `mydb`.`product` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `name_product` VARCHAR(45) NOT NULL,
  `price_product` DECIMAL NOT NULL,
  `amount_product` INT NOT NULL,
  PRIMARY KEY (`id`))
ENGINE = InnoDB;

We also need to create a table for users. In this case, we'll call it user and give it three columns: id, user_name, and user_pass:

SQL
CREATE TABLE IF NOT EXISTS `mydb`.`user` (
  `id` INT NOT NULL,
  `user_address` VARCHAR(45) NOT NULL,
  `user_pass` VARCHAR(45) NOT NULL,
  PRIMARY KEY (`id`))
ENGINE = InnoDB;

Of course, in most cases, a table for user information may contain tens—or even hundreds—of columns to store all kinds of data about a user, but for the sake of simplicity, we'll keep it to just three for now.

Finally, we will need a table that keeps track of who likes what. We'll call it likes and give it two columns—storing both the user's id and the product id for the "liked" item:

SQL
CREATE TABLE IF NOT EXISTS `mydb`.`likes` (
  `user_id` INT NOT NULL,
  `product_id` INT NOT NULL,
  INDEX `fk_likes_user1_idx` (`user_id` ASC),
  INDEX `fk_likes_product1_idx` (`product_id` ASC),
  CONSTRAINT `fk_likes_user1`
    FOREIGN KEY (`user_id`)
    REFERENCES `mydb`.`user` (`id`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION,
  CONSTRAINT `fk_likes_product1`
    FOREIGN KEY (`product_id`)
    REFERENCES `mydb`.`product` (`id`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION)
ENGINE = InnoDB;

As you can see, each column gets its own index, and in order to ensure that invalid data doesn't get added to the table (thereby breaking the relationship between likes and our other tables), we have also added foreign key constraints for both columns. This way, any new "like" is checked to make sure that:

  • user_id matches an id in user
  • product_id matches an id in product

Now let's fill in the table with products. For this example, we'll use cosmetic products:

SQL
INSERT INTO product (id, name_product, price_product, amount_product) VALUES 
(1, 'Shampoo perfumes "Clear"', 130, 100),
(2, 'Eyeshadow "Makeup"', 245, 104),
(3, 'Face Lotion "Gigi"', 120, 99),
(4, 'Lipstick Set "LP"', 250, 165),
(5, 'Highlighter "NAC"', 175, 111),
(6, 'Powder "Pow"', 190, 205),
(7, 'Toner "Tonn"', 160, 198),
(8, 'Blush "Gigi"', 135, 57),
(9, 'Highlighter "Lily"', 200, 30),
(10, 'Shampoo "Lux"', 160, 11),
(11, 'Lipstick "Gigi"', 90, 225);

This is what your product table should look like:

Collaborative Filtering in MySQL: A Tutorial

Now, we can add some sample users to our user table:

SQL
INSERT INTO `mydb`.user (`id`, `user_address`, `user_pass`) VALUES 
(1,'ala.l.e.xa@gmail.com','2139ujsaduif'),
(2,'ni.c.k@gmail.com','9543fjkshfei23'),
(3,'a.u.r.ora@gmail.com','324GFGuur'),
(4,'vic.t.oria@gmail.com','342geuir77'),
(5,'1.234.5@gmail.com','234jfie4Yt');

It should look something like this:

Collaborative Filtering in MySQL: A Tutorial

And finally, we can fill in the likes table with some sample like data:

SQL
INSERT INTO `mydb`.likes (`user_id`, `product_id`) VALUES (4,4),(4,1),
(1,6),(3,8),(2,3),(5,10),(3,10),(3,6),(3,7),(1,2),(1,5),(2,8),
(2,9),(5,7),(5,5),(5,1),(2,10),(2,4),(3,4),(3,5),(3,9),(2,1);

Because one user can like more than one product, this table can ultimately have thousands of rows, but for our scenario, we'll stop at 22:

Collaborative Filtering in MySQL: A Tutorial

Two Methods for Implementing Collaborative Filtering

With all of this data now stored in our tables, we are now ready for collaborative filtering. In this post, we will achieve this using two different methods.

Method #1: Subqueries

For this first method, our main query selects all rows from likes where the liked product is NOT our specified ("base") product. Our subquery then selects all users from likes who DID like our "base" product, in order to create a list against which we can check all user_ids in our main query:

SQL
SELECT product.name_product AS "Product", 
COUNT(likes.user_id) AS "Also liked by __ users" FROM `likes` 
JOIN product ON product.id = likes.product_id
WHERE likes.user_id IN
(
 SELECT user_id
 FROM `likes`
 WHERE product_id = 10 -- insert "base" product id here
)
AND likes.product_id <> 10 -- insert "base" product id here
GROUP BY 1
ORDER BY 2 DESC;

As you can see, we actually did also use JOIN here, but only in order to display the product names—rather than just their IDs—in the result:

Collaborative Filtering in MySQL: A Tutorial

Method #2: JOIN

For this method, we first select all rows from likes where product_id matches our specified product ID as a list called did_like. Then, we'll use JOIN to add all rows from likes to did_like (our joined list is assigned the alias also_like), checking user_id and product_id against did_like to find all other products liked by the same user:

SQL
SELECT product.name_product AS "Product", 
COUNT(also_like.product_id) AS "Also liked by __ users"
FROM likes AS did_like
JOIN likes AS also_like ON
   also_like.user_id = did_like.user_id
   AND also_like.product_id != did_like.product_id
JOIN product ON product.id = also_like.product_id
WHERE did_like.product_id = 10 -- insert "base" product id here
GROUP BY 1
ORDER BY COUNT(also_like.product_id) DESC;

Just like the first method, we used an additional join to make the results look pretty here:

Collaborative Filtering in MySQL: A Tutorial

As you can see, the results of both queries are identical, displaying all products also liked by people who liked "Lux" Shampoo (product_id = 10). This script can be easily modified to display similar results for all other products in the product table.

Of course, most well-known implementations of collaborative filtering are substantially more complex than our simple examples, but the fundamental logic behind all of these systems boils down to the same principles used here.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)