MySQL: ORDER BY RAND() – a case study of alternatives

I finally had the time to translate my post from german to english – so here we go :)

Das Problem

Many of you reading this post might already stopped by this problem – you just want a few random rows from your database and most of the times you might have used “ORDER BY RAND()” and not thought about, what’s happening there. Just add all the columns you want to select, add JOINS, WHERE, etc. and add ORDER BY RAND() and LIMIT.

For small applications this might even scale well, but what happens in large applications with hundreds, thousands or millions of rows in a table?

I want to look a little closer into this. While developing my deprecated rhCMS (a content management system), I asked myself the same question.

Problem Analysis

First of all you got to ask “What happens with ORDER BY RAND() in a Query?” and understand it.

The MySQL Database will query for all the rows you requested and filter out those that don’t match from WHERE, JOIN conditions etc. Then all rows will be in memory or in a temporary table. Then the RAND() will be replaced with a random number and attached to each row. After this, the set can be sorted ascending or descending. And then the LIMIT will be used. Using temporary tables is really bad because this causes a lot of problems and is not performant at all.

Solutions

Let’s take a look how we can solve this problem. First of all I want to show you a PHP solution:

A PHP solution

Query for the total number of rows:

<?php echo $mysql_hl->highlight('SELECT COUNT(*) FROM table'); ?>

and then query for a random number between 1 and the number of rows:

<?php echo $php_hl->highlight('<?php
  $random_id = mt_rand(1, [ COUNT(*) - Ergebnis ]);
?>'); ?>

Afterwards you can query for the ID:

<?php echo $mysql_hl->highlight('SELECT colA, colB FROM table WHERE idCol = ...'); ?>

But as you probably already asked yourself:

  1. What happens when I delete rows?
  2. What if I want more than one row?

The first question seems rather easy, right? Just query as long as you don’t get a row. But this isn’t the best solution!

And how about multiple rows? The easiest way could be put another loop around it and query for random IDs the number of times you want. This seems to be acceptable but it still has some weaknesses. Let’s have a look at a MySQL based solution:

MySQL solution

The trick is to create a MySQL PROCEDURE that fills a temporary table with rows containing random IDs that are returned to the user. But then we would have the same problem: What if the ID didn’t exist or it exists twice in our temporary table?

The final solution looks like this:

  1. Create a temporary table to save random IDs and make the ID a UNIQUE KEY.
  2. Query the maximum ID
  3. Put the IDs into the table
  4. Return the dataset

You might ask: What happens if I insert an ID twice? The solution is simple: The error handler will recursively call our procedure. This way the UNIQUE KEY will make sure no ID is inserted twice and another ID is selected in this case.

And this is what the code looks like:

<?php

echo $mysql_hl->highlight("DROP PROCEDURE IF EXISTS proc_get_random_post_id;

DELIMITER //

CREATE PROCEDURE proc_get_random_post_id (IN cnt INT)
BEGIN
  DECLARE max_id INTEGER;

  DECLARE EXIT HANDLER FOR SQLEXCEPTION
  BEGIN
    CALL proc_get_random_post_id(cnt);
  END;

  IF cnt >= 1 THEN
    DROP TEMPORARY TABLE IF EXISTS random;
    CREATE TEMPORARY TABLE random ( post_id INTEGER(11) unsigned, UNIQUE KEY(post_id) );

    insert_loop : LOOP
      IF cnt < 1 THEN
        LEAVE insert_loop;
      END IF;

      SELECT MAX(post_id) FROM posts INTO max_id;

      INSERT INTO random
        SELECT
         p.post_id
        FROM
          posts AS p
        JOIN
          (SELECT (RAND() *  max_id) AS random_id) AS r
        WHERE
          p.post_id >= r.random_id
        ORDER BY
          p.post_id
        LIMIT
          1;

      SET cnt = cnt - 1;
    END LOOP insert_loop;

    SELECT post_id FROM random;
  END IF;
END//

DELIMITER ;");
?>

Looks pretty much to get a few rows, huh? But is this really better than a PHP based solution? Let’s have a look at some benchmarks:

A problem could be the maximum recursion depth that is 0 by default. So make sure you increase it:

<?php echo $mysql_hl->highlight('SET max_sp_resursion_depth = 2;'); ?>

Note: “2″ means the number of collisions here. So make sure to increase this value for larger tables.

With a  few rows (~ 25000) the procedure wasn’t that much larger but with ~230.000 rows the procedure was almost 10 times faster! And then I queried a forum with ~4500000 posts for a random post id. The results looks like this:

The first query with the following code took about 72 seconds:

<?php echo $mysql_hl->highlight('SELECT post_id FROM posts ORDER BY RAND() LIMIT 20'); ?>
mysql&gt; SELECT post_id FROM posts ORDER BY RAND() LIMIT 20;
+---------+
| post_id |
+---------+
| 2013034 |
| 3579000 |
| 3316421 |
| 4093120 |
| 2359496 |
| 1885511 |
| 3835027 |
| 1046873 |
| 1185652 |
| 4331911 |
| 4890819 |
|  240006 |
| 2437795 |
| 1712680 |
|  868551 |
| 2096902 |
| 3083704 |
|  856132 |
| 2190370 |
| 4862958 |
+---------+
20 rows in set (52.66 sec)

The second one 35 seconds and the average was at about 25 seconds.

Note: When querying an ID with another column (e.g. a DATETIME field) the query only took about 5 seconds. And now let’s have a look into our procedure:

<?php echo $mysql_hl->highlight('CALL proc_get_random_post_id(20);'); ?>
mysql&gt; CALL proc_get_random_post_id(20);
  +---------+
  | post_id |
  +---------+
  |  354319 |
  |  436352 |
  |  762503 |
  |  924832 |
  | 1395933 |
  | 1515758 |
  | 1633515 |
  | 1731376 |
  | 1789180 |
  | 2512748 |
  | 2549002 |
  | 2854479 |
  | 2863860 |
  | 3103881 |
  | 3233293 |
  | 4337101 |
  | 4560893 |
  | 4596010 |
  | 4609112 |
  | 4960802 |
  +---------+
  20 rows in set (0.04 sec)

This small piece of code now calls our procedure in the background and only takes about 0,034 seconds!

This small example shows you how to quickly query for rows. And it is very easy to extend this to query for even more columns.

Another PHP solution

Someone asked me for a pure PHP solution and that’s what I came up with:

<?php echo $php_hl->highlight('<?php
  /**
   * Default Random ID Class
   *
   * @author Robert Hartung
   * @copyright www.roberthartung.de, 2014
   */

  class randomIdList
  {
    /**
     * MySQLI instance
     *
     * @var mysqli
     */

    protected $mysqli;

    /**
     * Maximum duplicate key errors
     *
     * @var integer
     */

    private $max_errors = 10;

    /**
     * Random ID Count
     *
     * @var integer
     */

    private $count;

    /**
     * Table to select from
     *
     * @var string
     */

    private $table;

    /**
     * Column that contains the ID
     *
     * @var string
     */

    private $column_id;

    /**
     * Constructor
     */

    public function __construct(mysqli $mysqli, $table, $column_id)
    {
      $this->mysqli = $mysqli;
      $this->table = $table;
      $this->column_id = $column_id;
    }

    /**
     * Returns a result set of random IDs
     *
     * @param boolean $return Query for the results?
     *
     * @return mysqli_result
     */

    public function getIDs($count, $return = true)
    {
      $this->count = $count;

      /* Do some preparing stuff */

      $this->mysqli->multi_query("DROP TEMPORARY TABLE IF EXISTS random;
      CREATE TEMPORARY TABLE random ( random_id INTEGER(11) unsigned, UNIQUE KEY(random_id) );
      SELECT @max_id := MAX(post_id) FROM pmcms_forum_posts");

      /* Fetch the results otherwise: Command out of sync error */

      while($this->mysqli->more_results())
      {
        $this->mysqli->next_result();
        $this->mysqli->use_result();
      }

      /* Prepare the statement so the query needs to be parsed only once! */

      $stmnt = $this->mysqli->prepare("INSERT INTO random
          SELECT
           ".$this->column_id."
          FROM
            ".$this->table." AS p
          JOIN
            (SELECT (RAND() * @max_id) AS random_id) AS r
          WHERE
            ".$this->column_id." >= r.random_id
          ORDER BY
            ".$this->column_id."
          LIMIT
            1");

      /* Get Random IDs */

      while($this->count > 0 && $this->max_errors > 0)
      {
         $stmnt->execute();

         /* if there was an error - go on */
         if($this->mysqli->errno === 1062)
         {
           $this->max_errors -= 1;
         }
         else
         {
           $this->count -= 1;
         }
      }

      /* If requested, return the result otherwise null*/

      if($return)
        return $this->mysqli->query("SELECT random_id FROM random");

      return null;
    }
  }
?>');
?>

The class expects a MySQLi instance as a parameter as well as the table name and your ID’s column name. Using the method getIDs($count, [ $return = true ]) you can ask for rows. The function uses the same concept as our MySQL procedure

  1. Create a temporary table
  2. Get maximum
  3. Find random IDs
  4. Return resulös

This object oriented solution has another benefit: You don’t have to create a procedure for every table in your database. You could easily extend this class to query for more than this column:

<?php echo $php_hl->highlight('<?php
  /**
   * Example Class for Posts
   *
   * @author Robert Hartung
   * @copyright www.roberthartung.de, 2014
   */

  final class randomPosts extends randomIdList
  {
    public function __construct(mysqli $mysqli)
    {
      parent::__construct($mysqli, \'forum_posts\', \'post_id\');
    }

    /**
     * Returns a different result set
     *
     * @return mysqli_result
     */

    public function getIDs($count)
    {
      parent::getIDs($count, false);

      return $this->mysqli->query("SELECT post_id FROM random JOIN forum_posts ON post_id = random_id");
    }

    public function getRandomPosts($count)
    {
      parent::getIDs($count, false);

      return $this->mysqli->query("SELECT post_id, post_datetime, post_topic_id FROM random JOIN forum_posts ON post_id = random_id");
    }
  }
?>');
?>

We could use this class like this:

<?php echo $php_hl->highlight('<?php
  /** Basic */

  $random = new randomIdList($mysqli, \'pmcms_forum_posts\', \'post_id\');
  $qry_random = $random->getIDs(5);

  while($number = $qry_random->fetch_assoc())
  {
    /* Do something */
  }

  /** Basic 2 */

  $qry_random = $random->getIDs(10);

  while($number = $qry_random->fetch_assoc())
  {
    /* Do something */
  }

  /** Inherited class */

  $random = new randomPosts($mysqli);
  $qry_random = $random->getIDs(5);

  while($number = $qry_random->fetch_assoc())
  {
    /* Do something */
  }

  $qry_random = $random->getRandomPosts(5);

  while($post = $qry_random->fetch_assoc())
  {
    /* Do something */
  }
?>');
?>

This should give you a good idea on how to avoid using MySQL’s ORDER BY RAND(). I appreciate any kind of feedback.

[Dart+Polymer] Styling Elements

Styling custom elements is sometimes tricky. In general the website styles do NOT apply to the custom element. There is a special attribute called “apply-author-styles” that can be set to “true” to apply your website’s styles to your custom element.

In general each custom element should have its  styles separated from the website so it can be reused easily. Let’s have a look at a simple example:

dart_polymer_styling.html

<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>Dart polymer styling</title>
    <link rel="import" href="external-styles.html">
    <link rel="import" href="internal-styles.html">
    <style>
      .styled {
        background-color: yellow;
        font-weight: bold;
        font-style: italic;
      }
    </style>
  </head>
  <body>
    <external-styles></external-styles>
    <internal-styles></internal-styles>

    <script type="application/dart" src="dart_polymer_styling.dart"></script>
    <script src="packages/browser/dart.js"></script>
  </body>
</html>

This simple HTML file imports two custom elements from two separate files and defines a small CSS rule.

dart_polymer_styling.dart

import 'package:polymer/polymer.dart';

void main() {
  initPolymer();
}

The corresponding .dart-file is rather small and could be avoided if the init script from polymer is used directly

The two custom elements look like this:

external-styles.dart

<polymer-element name="external-styles">
  <template>
    <div class="styled">Externally Styled</div>
  </template>
  <script type="application/dart" src="external-styles.dart"></script>
</polymer-element>

external-styles.html

import 'package:polymer/polymer.dart';

@CustomTag('external-styles')
class ExternalStylesElement extends PolymerElement {
  bool get applyAuthorStyles =&gt; true;

  ExternalStylesElement.created() : super.created() {
    print(this.toString() + " created");
  }
}

internal-styles.dart

<polymer-element name="internal-styles">
  <template>
    <style>
      .styled {
        color: #f00;
      }
    </style>
    <div class="styled">Internally Styles</div>
  </template>
  <script type="application/dart" src="internal-styles.dart"></script>
</polymer-element>

internal-styles.html

import 'package:polymer/polymer.dart';

@CustomTag('internal-styles')
class InternalStylesElement extends PolymerElement {
  // applyAuthorStyles is false by default

  InternalStylesElement.created() : super.created() {
    print(this.toString() + " created");
  }
}

As you can see here the trick with polymer elements in Dart is to define/overwrite the getter for applyAuthorStyles. Also make sure the <style>-tag is inside of your <template>-tag.

Animatr – GUI & Structure: Nesting Polymer Elements

The first step of rebuilding animatr was to remove all code and totally rebuild the GUI using a single custom element: <animatr-gui> using polymer. The GUI itself has three nested custom elements:

<animatr-titlebar> - Containins the windows’ title, close button, …
<animatr-toolbars> – Wrapper for the toolbars
<animatr-panels> – A wrapper to create a GUI using pannels that can be dragged.

The HTML code of the root element looks like this:

<head>
  <link href="packages/animatr_gui/elements/animatr-ui.html" rel="import" />
</head>
<body>
  <animatr-ui></animatr-ui>
</body>

The corresponding element HTML code is:

<polymer-element name="animatr-ui">
  <meta charset="utf-8"/>
  <link rel="import" href="animatr-titlebar.html">
  <link rel="import" href="animatr-toolbars.html">
  <link rel="import" href="animatr-panels.html">
  <link rel="import" href="animatr-panel-....html">
  <template>
      <animatr-titlebar id="animatr-titlebar"></animatr-titlebar>
      <animatr-toolbars id="animatr-toolbars"></animatr-toolbars>
      <animatr-panels id="animatr-panels"></animatr-panels>
  </template>
  <script type="application/dart" src="animatr-ui.dart"></script>
</polymer-element>
 

The Dart code for the custom element is:


import 'package:polymer/polymer.dart';

import 'dart:html';

import 'package:animatr_gui/elements/animatr-titlebar.dart';
import 'package:animatr_gui/elements/animatr-toolbars.dart';
import 'package:animatr_gui/elements/animatr-panels.dart';

import 'animatr-panel-....dart';

@CustomTag('animatr-ui')
class AnimatrUiElement extends PolymerElement {
  bool get applyAuthorStyles =&gt; true;

  ...PanelElement panel_...;

  AnimatrUiElement.created() : super.created() {
    panel_... = document.createElement('animatr-panel-...');
  }

  AnimatrTitlebarElement titlebar;

  AnimatrToolbarsElement toolbars;

  AnimatrPanelsElement panels;

  @observable Project currentProject;

  @override
  void enteredView() {
    super.enteredView();

    titlebar = $['animatr-titlebar'];
    toolbars = $['animatr-toolbars'];
    panels = $['animatr-panels'];

    titlebar.ui = this;
    panels.ui = this;

    panels.addPanel(panel_..., 'north/east/south/west/center');
   }
}

Nesting elements was complicated first. Assuming you have your project’s root file (mine is called animatr.html), you could import all elements in this file, at a central place. But this had some strange behaviour to me:
The outer elements get created first and thus, before all nested elements. Thus I wasn’t able to correctly query for the nested elements. Because the custom element wasn’t registered Google Dart had some type errors.

The solution was to import other elements within the custom element itself, right before the <template> tag.

I had another problem using special characters. It turned out that the charset settings (e.g. by using <meta charset=”utf-8″/>) is not propagated to custom elements. Thus make sure to include a <meta>-tag in your custom elements if you want to use custom elements.

Animatr – A first introduction

In this post I want to briefly introduce you to my most recent project called “animatr”. Animatr is a tool to create canvas animations. I will focus on advertisements but in theory you can create all types of animations and sizes. Even playing a video within the animation is possible.

The first version was developed about a year ago and was based on fabric.js But very soon I felt the framework limited me to their features. So I quickly decided to rewrite it as a web app on my own. Due to some specific webkit/chrome features I decided to finally rewrite the project using Google’s Dart and to make an app out of it using node-webkit.

The latest and third version is based on Google Dart, Node-Webkit and Polymer. While polymer is the replacement for web-ui.

I will write follow up blogs about concepts and code I use while developing animatr and give you examples. So far I can tell that detailed examples are rare. That’s why I felt to blog about it myself.

My blog

I have been thinking for years now if I should start blogging or not. Finally I decided to do so :)

I will be blogging about all web topics I will deal with. Nevertheless I want to focus on Dart + Polymer I have been working on recently.