<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Electricmonk.nl weblog &#187; sql</title>
	<atom:link href="http://www.electricmonk.nl/log/category/programming/sql/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.electricmonk.nl/log</link>
	<description>Ferry Boender&#039;s ramblings</description>
	<lastBuildDate>Mon, 16 Jan 2012 15:23:04 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Comment your MySQL schema</title>
		<link>http://www.electricmonk.nl/log/2010/07/05/comment-your-mysql-schema/</link>
		<comments>http://www.electricmonk.nl/log/2010/07/05/comment-your-mysql-schema/#comments</comments>
		<pubDate>Mon, 05 Jul 2010 05:15:02 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[sql]]></category>

		<guid isPermaLink="false">http://www.electricmonk.nl/log/?p=4483</guid>
		<description><![CDATA[Many people may not know, but you can comment your MySQL schema: SQL: good comments conventions]]></description>
			<content:encoded><![CDATA[<p>Many people may not know, but you can comment your MySQL schema:</p>
<p><a href="http://code.openark.org/blog/mysql/sql-good-comments-conventions">SQL: good comments conventions</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.electricmonk.nl/log/2010/07/05/comment-your-mysql-schema/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Maatkit: Tools for MySQL</title>
		<link>http://www.electricmonk.nl/log/2010/06/18/maatkit-tools-for-mysql/</link>
		<comments>http://www.electricmonk.nl/log/2010/06/18/maatkit-tools-for-mysql/#comments</comments>
		<pubDate>Fri, 18 Jun 2010 10:04:18 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[sql]]></category>
		<category><![CDATA[sysadmin]]></category>

		<guid isPermaLink="false">http://www.electricmonk.nl/log/?p=4479</guid>
		<description><![CDATA[Maatkit is a suite of command-line tools for MySQL. It contains some rather nifty things for query analyses, replication, and other stuff. Some of the more interesting highlights: mk-deadlock-loggerExtract and log MySQL deadlock information. mk-log-playerSplit and play MySQL slow logs. mk-error-logFind new and different MySQL error log entries. mk-index-usageRead queries from a log and analyze [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.maatkit.org/">Maatkit</a> is a suite of command-line tools for MySQL. It contains some rather nifty things for query analyses, replication, and other stuff. Some of the more interesting highlights:</p>
<ul>
<li><b><a href="http://www.maatkit.org/doc/mk-deadlock-logger.html">mk-deadlock-logger</a></b><br />Extract and log MySQL deadlock information.</li>
<li><b><a href="http://www.maatkit.org/doc/mk-log-player.html">mk-log-player</a></b><br />Split and play MySQL slow logs.</li>
<li><b><a href="http://www.maatkit.org/doc/mk-error-log.html">mk-error-log</a></b><br />Find new and different MySQL error log entries.</li>
<li><b><a href="http://www.maatkit.org/doc/mk-index-usage.html">mk-index-usage</a></b><br />Read queries from a log and analyze how they use indexes.</li>
<li><b><a href="http://www.maatkit.org/doc/">And many more&#8230;</a></b></li>
</ul>
<p>Found via <a href="http://www.databasejournal.com">databasejournal.com</a>, which has two articles on Maatkit:</p>
<p><a href="http://www.databasejournal.com/features/mysql/article.php/3882031/The-Wonders-of-Maatkit-for-MySQL.htm">The Wonders of Maatkit for MySQL</a> and<br />
<a href="http://www.databasejournal.com/features/mysql/article.php/3886636/article.htm?utm_source=feedburner&#038;utm_medium=feed&#038;utm_campaign=Feed%25253A+DatabaseJournalNews+%252528Database+Journal+News%252529&#038;utm_content=Google+Reader">Even more Maatkit for MySQL</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.electricmonk.nl/log/2010/06/18/maatkit-tools-for-mysql/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SQuirreL SQL database browser</title>
		<link>http://www.electricmonk.nl/log/2010/03/19/squirrel-sql-database-browser/</link>
		<comments>http://www.electricmonk.nl/log/2010/03/19/squirrel-sql-database-browser/#comments</comments>
		<pubDate>Fri, 19 Mar 2010 13:26:01 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[libre software]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[sql]]></category>
		<category><![CDATA[tech]]></category>

		<guid isPermaLink="false">http://www.electricmonk.nl/log/?p=4460</guid>
		<description><![CDATA[I finally found a decent replacement for the MySQLcc database browser: SQuirreL SQL SQuirreL SQL Client is a graphical Java program that will allow you to view the structure of a JDBC compliant database, browse the data in tables, issue SQL commands etc It&#039;s Java, so it&#039;s slow, but it does everything I want, and [...]]]></description>
			<content:encoded><![CDATA[<p>I finally found a decent replacement for the MySQLcc database browser:</p>
<p><a href="http://www.squirrelsql.org/">SQuirreL SQL</a></p>
<blockquote><p>SQuirreL SQL Client is a graphical Java program that will allow you to view the structure of a JDBC compliant database, browse the data in tables, issue SQL commands etc</p></blockquote>
<p>It&#039;s Java, so it&#039;s slow, but it does everything I want, and more:</p>
<ol>
<li>Syntax highlighting</li>
<li>Multiple query tabs</li>
<li>Multiple queries in the same tab (select the query and press ctrl-enter to run it)</li>
<li>Export results</li>
</ol>
<p>It has tons of options you can tweak, and it&#039;s got plugins if you want to extend it. It supports just about every relational (and some non-relational) database out there. </p>
<p>Awesome.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.electricmonk.nl/log/2010/03/19/squirrel-sql-database-browser/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Filling gaps in data when using aggregates</title>
		<link>http://www.electricmonk.nl/log/2009/05/12/filling-gaps-in-data-when-using-aggregates/</link>
		<comments>http://www.electricmonk.nl/log/2009/05/12/filling-gaps-in-data-when-using-aggregates/#comments</comments>
		<pubDate>Tue, 12 May 2009 20:26:55 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[sql]]></category>

		<guid isPermaLink="false">http://www.electricmonk.nl/log/?p=4363</guid>
		<description><![CDATA[SQL&#039;s aggregate functions (COUNT, MIN, MAX, etc) in combination with GROUP BY are great for generating statistics from a database. It&#039;s quite easy to retrieve a count of the rows in a database grouped by week number or month. When there are no data points in the table for a particular week or month, however, [...]]]></description>
			<content:encoded><![CDATA[<p>SQL&#039;s aggregate functions (COUNT, MIN, MAX, etc) in combination with <tt>GROUP BY</tt> are great for generating statistics from a database. It&#039;s quite easy to retrieve a count of the rows in a database grouped by week number or month. When there are no data points in the table for a particular week or month, however, you will see gaps in the statistics. Take this example:</p>
<p>We create a table that will have an entry for every download done from a site. Also stored is whether the download was done by a registered user or not:</p>
<pre>
CREATE TABLE downloads (
  id integer primary key auto_increment,
  date datetime,
  registered BOOLEAN
);

INSERT INTO downloads VALUES (NULL, '2009-01-01', FALSE);
INSERT INTO downloads VALUES (NULL, '2009-01-01', TRUE);
INSERT INTO downloads VALUES (NULL, '2009-01-01', FALSE);
INSERT INTO downloads VALUES (NULL, '2009-01-02', FALSE);
INSERT INTO downloads VALUES (NULL, '2009-01-02', FALSE);
INSERT INTO downloads VALUES (NULL, '2009-01-03', TRUE);
INSERT INTO downloads VALUES (NULL, '2009-01-05', FALSE);
INSERT INTO downloads VALUES (NULL, '2009-01-05', FALSE);
</pre>
<p>The table data shows us there were three downloads on the first of January, two on the second, one on the third, etc.</p>
<p>Now we can gather the total number of downloads per day with the following aggregation query:</p>
<pre>
SELECT
  DATE(date) AS day,
  COUNT(id) AS downloads
FROM downloads
GROUP BY DATE(date);

+------------+-----------+
| day        | downloads |
+------------+-----------+
| 2009-01-01 |         3 |
| 2009-01-02 |         2 |
| 2009-01-03 |         1 |
| 2009-01-05 |         2 |
+------------+-----------+
</pre>
<p>As you can see from the results, there are no downloads for the fourth of January, and this gap is reflected in the aggregated result. So if you wanted to use this data to generate a chart, you&#039;d have to fill in the gaps somehow. Doing this using a script isn&#039;t that hard, but that has the disadvantage of having to create a new script (not to mention a new data table) which then needs to run periodically. It can also quickly become a pain in the ass when data also needs to be presented grouped by week, month, year, etc.</p>
<p>A simple solution to this is to create a filler table that contains zero-valued data (actually no data at all) for every data point you&#039;d want to show. In our case we&#039;ll need a table with an entry for every day that our downloads span:</p>
<pre>
CREATE TABLE dates (
  date datetime
);

INSERT INTO dates VALUES ('2009-01-01');
INSERT INTO dates VALUES ('2009-01-02');
INSERT INTO dates VALUES ('2009-01-03');
INSERT INTO dates VALUES ('2009-01-04');
INSERT INTO dates VALUES ('2009-01-05');
[etcetera]
INSERT INTO dates VALUES ('2009-01-14');
INSERT INTO dates VALUES ('2009-01-15');
</pre>
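<p>Typing one INSERT per day gets tedious for longer ranges. As a minimal sketch, assuming MySQL 8.0 or later (which added recursive common table expressions; older versions would need a script or a helper numbers table), the rows could be generated with a single statement instead of the INSERTs above. The CTE name <tt>seq</tt> and its column <tt>d</tt> are made up for this example:</p>
<pre>
-- Generate one row per day from 2009-01-01 through 2009-01-15.
-- Assumes MySQL 8.0+; 'seq' and 'd' are illustrative names.
INSERT INTO dates (date)
WITH RECURSIVE seq (d) AS (
  SELECT CAST('2009-01-01' AS DATETIME)
  UNION ALL
  SELECT d + INTERVAL 1 DAY FROM seq WHERE d &lt; '2009-01-15'
)
SELECT d FROM seq;
</pre>
<p>For ranges longer than 1000 days, the <tt>cte_max_recursion_depth</tt> session variable would have to be raised first.</p>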
<p>We can now perform a LEFT JOIN of the downloads table on our dates table, and we can be sure that no dates will be missing. Any dates for which there is no entry in the downloads table will get a row filled with NULLs. We can filter the dates we wish to see by putting a WHERE clause on our dates table.</p>
<pre>
SELECT
  DATE(dates.date) AS day,
  COUNT(downloads.id) AS downloads
FROM dates
LEFT JOIN downloads ON downloads.date = dates.date
WHERE dates.date BETWEEN '2009-01-01' AND '2009-01-06'
GROUP BY DATE(dates.date);

+------------+-----------+
| day        | downloads |
+------------+-----------+
| 2009-01-01 |         3 |
| 2009-01-02 |         2 |
| 2009-01-03 |         1 |
| 2009-01-04 |         0 |
| 2009-01-05 |         2 |
| 2009-01-06 |         0 |
+------------+-----------+
</pre>
<p>As you can see, the gaps are nicely filled with 0 values now.</p>
<p>Care must be taken when filtering on columns of the table that holds the actual data. Suppose we want to see the number of downloads per day, but only those done by unregistered users. Normally, we'd do:</p>
<pre>
SELECT
  DATE(date) AS day,
  COUNT(id) AS downloads
FROM downloads
WHERE
  downloads.registered = 0 AND
  downloads.date BETWEEN '2009-01-01' AND '2009-01-06'
GROUP BY DATE(date);

+------------+-----------+
| day        | downloads |
+------------+-----------+
| 2009-01-01 |         2 |
| 2009-01-02 |         2 |
| 2009-01-05 |         2 |
+------------+-----------+
</pre>
<p>But with our new filler table, we can't do that, or the empty days will get filtered out again. So instead, we must put the filters not in the WHERE clause, but in the LEFT JOIN clause like so:</p>
<pre>
SELECT
  DATE(dates.date) AS day,
  COUNT(downloads.id) AS downloads
FROM dates
LEFT JOIN downloads ON downloads.date = dates.date AND downloads.registered = 0
WHERE
  dates.date BETWEEN '2009-01-01' AND '2009-01-06'
GROUP BY DATE(dates.date);

+------------+-----------+
| day        | downloads |
+------------+-----------+
| 2009-01-01 |         2 |
| 2009-01-02 |         2 |
| 2009-01-03 |         0 |
| 2009-01-04 |         0 |
| 2009-01-05 |         2 |
| 2009-01-06 |         0 |
+------------+-----------+
</pre>
<p>And presto! We have gap-fillers once again. Now it's easy to change the way our data is grouped. For example, if you want to group by week instead of day:</p>
<pre>
SELECT
  WEEK(dates.date, 3) AS week,
  COUNT(downloads.id) AS downloads
FROM dates
LEFT JOIN downloads ON downloads.date = dates.date AND downloads.registered = 0
WHERE WEEK(dates.date, 3) BETWEEN 1 AND 3
GROUP BY WEEK(dates.date, 3)

+--------+-----------+
| week   | downloads |
+--------+-----------+
|      1 |         4 |
|      2 |         2 |
|      3 |         0 |
+--------+-----------+
</pre>
<p>(Note: in mode 3 weeks start on Monday, so week 1 of 2009 ends on Sunday January 4)</p>
<p><b>UPDATE:</b> Make sure you always perform the <tt>COUNT()</tt> on a field from the <i>actual</i> data (the table you're LEFT JOINing), or you will get counts of 1 for data that actually has no rows!</p>
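<p>To illustrate the difference, here is a small sketch that puts both counts side by side, using the tables from this post (the <tt>all_rows</tt> alias is just a name picked for the example). <tt>COUNT(*)</tt> counts the joined row itself, including the NULL-filled filler row, while <tt>COUNT(downloads.id)</tt> skips NULLs:</p>
<pre>
SELECT
  DATE(dates.date) AS day,
  COUNT(*) AS all_rows,            -- counts the NULL-filled filler row too
  COUNT(downloads.id) AS downloads -- NULLs are not counted
FROM dates
LEFT JOIN downloads ON downloads.date = dates.date
WHERE dates.date BETWEEN '2009-01-01' AND '2009-01-06'
GROUP BY DATE(dates.date);

+------------+----------+-----------+
| day        | all_rows | downloads |
+------------+----------+-----------+
| 2009-01-01 |        3 |         3 |
| 2009-01-02 |        2 |         2 |
| 2009-01-03 |        1 |         1 |
| 2009-01-04 |        1 |         0 |
| 2009-01-05 |        2 |         2 |
| 2009-01-06 |        1 |         0 |
+------------+----------+-----------+
</pre>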
]]></content:encoded>
			<wfw:commentRss>http://www.electricmonk.nl/log/2009/05/12/filling-gaps-in-data-when-using-aggregates/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>MySQL WEEK() function weirdness</title>
		<link>http://www.electricmonk.nl/log/2009/04/13/mysql-week-function-weirdness/</link>
		<comments>http://www.electricmonk.nl/log/2009/04/13/mysql-week-function-weirdness/#comments</comments>
		<pubDate>Mon, 13 Apr 2009 07:04:56 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[sql]]></category>

		<guid isPermaLink="false">http://www.electricmonk.nl/log/?p=4318</guid>
		<description><![CDATA[The other day I had to gather some statistics from a database, and the statistics needed to be grouped and filtered by week numbers. I ran into something a bit unexpected. Let&#039;s say we have the following table definition: CREATE TABLE test ( start DATETIME NOT NULL ); We insert the following dates for testing purposes: [...]]]></description>
			<content:encoded><![CDATA[<p>The other day I had to gather some statistics from a database, and the statistics needed to be grouped and filtered by week numbers. I ran into something a bit unexpected.</p>
<p>Let&#039;s say we have the following table definition:</p>
<pre>
CREATE TABLE test (
  start DATETIME NOT NULL
);
</pre>
<p>We insert the following dates for testing purposes:</p>
<pre>
INSERT INTO test VALUES ('2008-12-27');
INSERT INTO test VALUES ('2008-12-28');
INSERT INTO test VALUES ('2008-12-29');
INSERT INTO test VALUES ('2008-12-30');
INSERT INTO test VALUES ('2008-12-31');
INSERT INTO test VALUES ('2009-01-01');
INSERT INTO test VALUES ('2009-01-02');
INSERT INTO test VALUES ('2009-01-03');
INSERT INTO test VALUES ('2009-01-04');
INSERT INTO test VALUES ('2009-01-05');
</pre>
<p>Those dates span the last week of last year, and the first week of the new year. Now, let&#039;s see what happens when we select the weeknumber from this data using MySQL&#039;s <tt>WEEK()</tt> function:</p>
<pre>
mysql> <b>SELECT start, WEEK(start) FROM test;</b>
+---------------------+-------------+
| start               | WEEK(start) |
+---------------------+-------------+
| 2008-12-27 00:00:00 |          51 |
| 2008-12-28 00:00:00 |          52 |
| 2008-12-29 00:00:00 |          52 |
| 2008-12-30 00:00:00 |          52 |
| 2008-12-31 00:00:00 |          52 |
| 2009-01-01 00:00:00 |           0 |
| 2009-01-02 00:00:00 |           0 |
| 2009-01-03 00:00:00 |           0 |
| 2009-01-04 00:00:00 |           1 |
| 2009-01-05 00:00:00 |           1 |
+---------------------+-------------+
10 rows in set (0.00 sec)
</pre>
<p>As you can see, we get four different weeks for a timespan of only ten days! Apparently, MySQL counts the first days of the year that do not belong to week 1 as week 0. This was certainly not what I expected, as I&#039;m used to calendars that display the last days of the previous year and the first days of the new year as week 1.</p>
<p>MySQL&#039;s default <tt>WEEK()</tt> function could have caused serious data skew for me in this case. Fortunately, using the WEEK() function was too slow (as it had to calculate the weeknumber for each row in the result due to a <tt>WHERE WEEK(column) BETWEEN x AND y</tt> clause in my query), so we calculated the weeknumber using Unix timestamps ourselves. That&#039;s when we found the error. When using the WHERE clause mentioned, and the above data, we could have gotten:</p>
<pre>
mysql> <b>SELECT * FROM test WHERE WEEK(start) BETWEEN 1 AND 52;</b>
+---------------------+
| start               |
+---------------------+
| 2008-12-27 00:00:00 |
| 2008-12-28 00:00:00 |
| 2008-12-29 00:00:00 |
| 2008-12-30 00:00:00 |
| 2008-12-31 00:00:00 |
| 2009-01-04 00:00:00 |
| 2009-01-05 00:00:00 |
+---------------------+
7 rows in set (0.00 sec)
</pre>
<p>Only 7 rows are returned, while we should have gotten 10.</p>
<p>It turns out that MySQL&#039;s <tt>WEEK()</tt> function can operate in eight different ways. From <a href="http://dev.mysql.com/doc/refman/5.1/en/date-and-time-functions.html#function_week">the manual</a>:</p>
<blockquote><p>
<tt>WEEK(date[,mode])</tt></p>
<p>This function returns the week number for date. The two-argument form of WEEK() allows you to specify whether the week starts on Sunday or Monday and whether the return value should be in the range from 0  to 53 or from 1 to 53. If the mode argument is omitted, the value of the default_week_format system variable is used.</p>
<p>The following table describes how the mode argument works.</p>
<pre>
+--------------------------------------------------------------------+
| Mode | First day of week | Range | Week 1 is the first week ...    |
|------+-------------------+-------+---------------------------------|
| 0    | Sunday            | 0-53  | with a Sunday in this year      |
|------+-------------------+-------+---------------------------------|
| 1    | Monday            | 0-53  | with more than 3 days this year |
|------+-------------------+-------+---------------------------------|
| 2    | Sunday            | 1-53  | with a Sunday in this year      |
|------+-------------------+-------+---------------------------------|
| 3    | Monday            | 1-53  | with more than 3 days this year |
|------+-------------------+-------+---------------------------------|
| 4    | Sunday            | 0-53  | with more than 3 days this year |
|------+-------------------+-------+---------------------------------|
| 5    | Monday            | 0-53  | with a Monday in this year      |
|------+-------------------+-------+---------------------------------|
| 6    | Sunday            | 1-53  | with more than 3 days this year |
|------+-------------------+-------+---------------------------------|
| 7    | Monday            | 1-53  | with a Monday in this year      |
+--------------------------------------------------------------------+
</pre>
</blockquote>
<p>So what we really wanted was the <tt>WEEK(column, 3);</tt> mode:</p>
<pre>
mysql> <b>SELECT start, WEEK(start, 3) FROM test;</b>
+---------------------+----------------+
| start               | WEEK(start, 3) |
+---------------------+----------------+
| 2008-12-27 00:00:00 |             52 |
| 2008-12-28 00:00:00 |             52 |
| 2008-12-29 00:00:00 |              1 |
| 2008-12-30 00:00:00 |              1 |
| 2008-12-31 00:00:00 |              1 |
| 2009-01-01 00:00:00 |              1 |
| 2009-01-02 00:00:00 |              1 |
| 2009-01-03 00:00:00 |              1 |
| 2009-01-04 00:00:00 |              1 |
| 2009-01-05 00:00:00 |              2 |
+---------------------+----------------+
10 rows in set (0.00 sec)
</pre>
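<p>One related caveat: a bare week number doesn&#039;t say which year it belongs to, and in mode 3 the last days of December can fall in week 1 of the next year. MySQL also has a <tt>YEARWEEK()</tt> function that accepts the same mode argument and returns the year and week combined, which may be a safer thing to group on when data spans more than one year. A small sketch against the test table above:</p>
<pre>
-- YEARWEEK() returns year and week as one number (YYYYWW).
SELECT start, YEARWEEK(start, 3) FROM test;

-- For example:
--   2008-12-28  ->  200852
--   2008-12-29  ->  200901
--   2009-01-05  ->  200902
</pre>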
<p>So, take care when using MySQL&#039;s <tt>WEEK()</tt> function, and always make sure to read the manual on all the functions you use at least once.</p>
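<p>As for the speed problem mentioned earlier: wrapping the column in a function forces the week number to be calculated for every row. A possible alternative, sketched here under the assumption that the week boundaries are worked out beforehand (by hand or in application code) and that the column has an index (the test table above doesn&#039;t define one), is to filter on a plain date range instead:</p>
<pre>
-- Week 1 of 2009 in mode 3 runs from Monday 2008-12-29
-- through Sunday 2009-01-04.
SELECT * FROM test
WHERE start &gt;= '2008-12-29' AND start &lt; '2009-01-05';
</pre>
<p>With the test data above this returns the seven rows from 2008-12-29 through 2009-01-04.</p>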
]]></content:encoded>
			<wfw:commentRss>http://www.electricmonk.nl/log/2009/04/13/mysql-week-function-weirdness/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>MyQryReplayer</title>
		<link>http://www.electricmonk.nl/log/2009/03/29/myqryreplayer/</link>
		<comments>http://www.electricmonk.nl/log/2009/03/29/myqryreplayer/#comments</comments>
		<pubDate>Sun, 29 Mar 2009 10:33:21 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[libre software]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[projects]]></category>
		<category><![CDATA[sql]]></category>

		<guid isPermaLink="false">http://www.electricmonk.nl/log/?p=4138</guid>
		<description><![CDATA[I&#039;ve written a tool called MyQryReplayer: MyQryReplayer is a tool which can read the MySQL query log and replay an entire session&#039;s worth of queries against a database (SELECT queries only by default). While doing so, it records the time each query took to run, and any queries that failed including their error messages. MyQryReplayer [...]]]></description>
			<content:encoded><![CDATA[<p>I&#039;ve written a tool called MyQryReplayer:</p>
<blockquote><p>
MyQryReplayer is a tool which can read the MySQL query log and replay an entire session&#039;s worth of queries against a database (SELECT queries only by default). While doing so, it records the time each query took to run, and any queries that failed including their error messages. MyQryReplayer can be used to inspect query performance, and to check a log of queries against a database for possible errors (when upgrading to a new version of MySQL for example).
</p></blockquote>
<p>Get version 0.1 <a href="http://www.electricmonk.nl/Programmings/MyQryReplayer">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.electricmonk.nl/log/2009/03/29/myqryreplayer/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SQL: Remove duplicate rows</title>
		<link>http://www.electricmonk.nl/log/2008/01/22/sql-remove-duplicate-rows/</link>
		<comments>http://www.electricmonk.nl/log/2008/01/22/sql-remove-duplicate-rows/#comments</comments>
		<pubDate>Tue, 22 Jan 2008 13:21:45 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[sql]]></category>

		<guid isPermaLink="false">http://www.electricmonk.nl/log/2008/01/22/sql-remove-duplicate-rows/</guid>
		<description><![CDATA[In the post SQL: Find duplicate rows I explained how you can find duplicate rows in a table. However, if you want to delete them, you&#039;ll run into a little problem as you have to delete all the duplicate rows except for one. Lots of solutions are floating around on the Internet, but most either [...]]]></description>
			<content:encoded><![CDATA[<p>In the post <a href="http://www.electricmonk.nl/log/2007/07/24/sql-find-duplicate-rows/">SQL: Find duplicate rows</a> I explained how you can find duplicate rows in a table. However, if you want to delete them, you&#039;ll run into a little problem as you have to delete all the duplicate rows except for one. Lots of solutions are floating around on the Internet, but most either don&#039;t work at all or don&#039;t work with MySQL. </p>
<p><b>Temporary table</b></p>
<p>One solution that works with MySQL is to create a temporary table that holds the data without duplicates, using the SELECT described in the post mentioned above (without the HAVING clause), and then to drop the original table and recreate it from the temporary one.</p>
<p>Suppose we have a table &#039;Foo&#039; with three columns:</p>
<pre>
Foo:
id int(8) auto_increment
field1 int(8)
field2 int(8)
</pre>
<p>&#039;field1&#039; and &#039;field2&#039; don&#039;t have a unique constraint, so they might contain duplicates. To remove these duplicates, we do:</p>
<pre>
CREATE TEMPORARY TABLE tmp
  SELECT MIN(id) AS id, field1, field2
  FROM Foo
  GROUP BY Foo.field1, Foo.field2;

DROP TABLE Foo;

CREATE TABLE Foo
  SELECT * FROM tmp;
</pre>
<p>This will fill the &#039;tmp&#039; table with unique rows from table &#039;Foo&#039;, using the lowest &#039;id&#039; value for each row. So if we have these rows:</p>
<pre>
id  field1  field2
------------------
1   hi      bye
2   hey     see ya
3   hello   goodbye
4   hi      bye
</pre>
<p>We will end up with:</p>
<pre>
id  field1  field2
------------------
1   hi      bye
2   hey     see ya
3   hello   goodbye
</pre>
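<p>One thing to watch out for: a table made with <tt>CREATE TABLE ... SELECT</tt> doesn&#039;t get the indexes or the auto_increment attribute of the original, so those have to be added back afterwards. A variation that avoids this, sketched here with the made-up table names <tt>Foo_dedup</tt> and <tt>Foo_old</tt>, is to copy the table definition first and swap the tables at the end:</p>
<pre>
-- Foo_dedup and Foo_old are illustrative names.
-- Copy the definition of Foo, including indexes and auto_increment.
CREATE TABLE Foo_dedup LIKE Foo;

-- Fill it with one row per unique (field1, field2) pair.
INSERT INTO Foo_dedup (id, field1, field2)
  SELECT MIN(id), field1, field2
  FROM Foo
  GROUP BY field1, field2;

-- Swap the tables and get rid of the old data.
RENAME TABLE Foo TO Foo_old, Foo_dedup TO Foo;
DROP TABLE Foo_old;
</pre>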
<p><b>In place</b></p>
<p>The above solution works well, but for some reason I don&#039;t like it. I don&#039;t like making temporary tables. I want to do it with a single query, modifying the table in place. You might call it pigheaded, as the solution presented below doesn&#039;t work as well on large datasets as the one above, but still, I found a way.</p>
<p>Another way to remove duplicates in MySQL is by using the following query:</p>
<pre>
DELETE
  Foo
FROM
  Foo, Foo t2
WHERE
  Foo.field1=t2.field1 AND
  Foo.field2=t2.field2 AND
  Foo.id < t2.id
</pre>
<p>There we go. Explanation: This first does a Cartesian join on every row where the two fields 'field1' and 'field2' are the same. So if we start with this data:</p>
<pre>
1 A B  <- duplicate
2 A B  <- duplicate
3 A C
</pre>
<p>we get the Cartesian product (all possible combinations) for the 'id' column for every unique set of the 'field1' and 'field2' columns:</p>
<pre>
1 A B 1 A B
1 A B 2 A B
2 A B 1 A B
2 A B 2 A B
3 A C 3 A C
</pre>
<p>Next we say 'AND Foo.id < t2.id'. This will leave us with the following row, which is the only one where the left id is smaller than the right id:</p>
<pre>
1 A B 2 A B
</pre>
<p>It then DELETEs that row from Foo (that is, the row '1 A B' - remember that the two rows with left id '2' are actually one and the same row), leaving:</p>
<pre>
2 A B
3 A C
</pre>
<p>There we go. If you want to keep the smallest 'id' for each unique set of 'field1' and 'field2', simply use 'AND Foo.id > t2.id'. It works the same way.</p>
<p>Remember that this is incredibly slow for large tables due to the Cartesian join, unless you've got indexes on 'field1' and 'field2' and don't have a lot of duplicates. Otherwise, use the temporary table solution above.</p>
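<p>Once the duplicates are gone, a unique index can keep new ones from creeping back in. A minimal sketch, where the index name <tt>uniq_field1_field2</tt> is just an example:</p>
<pre>
-- Fails if duplicates still exist, so run it after cleaning up.
ALTER TABLE Foo ADD UNIQUE INDEX uniq_field1_field2 (field1, field2);
</pre>
<p>Keep in mind that a unique index in MySQL still allows multiple rows with NULL in the indexed columns.</p>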
]]></content:encoded>
			<wfw:commentRss>http://www.electricmonk.nl/log/2008/01/22/sql-remove-duplicate-rows/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Unexpected SQL Injection</title>
		<link>http://www.electricmonk.nl/log/2007/09/29/4001/</link>
		<comments>http://www.electricmonk.nl/log/2007/09/29/4001/#comments</comments>
		<pubDate>Sat, 29 Sep 2007 08:07:07 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[link]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[sql]]></category>

		<guid isPermaLink="false">http://www.electricmonk.nl/log/2007/09/29/4001/</guid>
		<description><![CDATA[Something every PHP developer should be reading: The Unexpected SQL Injection &#8211; When Escaping Is Not Enough The conclusions: Write properly quoted SQL: Single quotes around values (string literals and numbers) Backtick quotes around identifiers (databases, tables, columns, aliases) Properly escape the strings and numbers: mysql_real_escape_string() for all values (string literals and numbers) intval() for [...]]]></description>
			<content:encoded><![CDATA[<p>Something every PHP developer should be reading: </p>
<p><a href="http://webappsec.org/projects/articles/091007.shtml">The Unexpected SQL Injection &#8211; When Escaping Is Not Enough</a></p>
<p>The conclusions: </p>
<blockquote>
<ul style="list-style-type:lower-roman;">
<li>Write properly quoted SQL:
<ol>
<li>Single quotes around values (string literals and numbers)</li>
<li>Backtick quotes around identifiers (databases, tables, columns, aliases)</li>
</ol>
</li>
<li>Properly escape the strings and numbers:
<ol>
<li>mysql_real_escape_string() for all values (string literals and numbers)</li>
<li>intval() for all number values and the numeric parameters of LIMIT</li>
<li>Escape wildcard/regexp metacharacters (addcslashes(&#039;%_&#039;) for LIKE, and you better avoid REGEXP/RLIKE)</li>
<li>If identifiers (columns, tables or databases) or keywords (such as ASC and DESC) are referenced in the script parameters, make sure (and force) their values are chosen only as one of an explicit set of options</li>
<li>No matter what validation steps you take when processing the user input in your scripts, always do the escaping steps before issuing the query. Validation is not a substitute for escaping!</li>
</ol>
</li>
</ul>
</blockquote>
<p>Like my rule #1 of what I like to call Defensive Coding: <i>Don&#039;t be implicit, be explicit</i>. In other words, don&#039;t try to escape the things you <i>don&#039;t</i> want in your strings; instead, allow only the things you <i>do</i> want. A column name in an ORDER BY clause should only consist of A-Z, a-z and 0-9. Anything else in the string invalidates that string.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.electricmonk.nl/log/2007/09/29/4001/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Very slow subquery due to HAVING</title>
		<link>http://www.electricmonk.nl/log/2007/09/12/very-slow-subquery-due-to-having/</link>
		<comments>http://www.electricmonk.nl/log/2007/09/12/very-slow-subquery-due-to-having/#comments</comments>
		<pubDate>Wed, 12 Sep 2007 11:30:23 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[sql]]></category>

		<guid isPermaLink="false">http://www.electricmonk.nl/log/2007/09/12/very-slow-subquery-due-to-having/</guid>
		<description><![CDATA[Today, I was quite mystified by a very slow running query. There was a table named &#039;bar&#039; with about 3000 rows. I wanted to list all the rows that had a duplicate value for a certain field (&#039;foo&#039;), and only those rows. The solution was to build a query that selected the rows where the [...]]]></description>
			<content:encoded><![CDATA[<p>Today, I was quite mystified by a <em>very</em> slow running query. There was a table named &#039;bar&#039; with about 3000 rows. I wanted to list all the rows that had a duplicate value for a certain field (&#039;foo&#039;), and only those rows. The solution was to build a query that selected the rows where the value of the &#039;foo&#039; field was in the results of a subquery that selected &#039;foo&#039; for duplicate values of foo. The query finally looked something like this:</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">SELECT</span> 
foo
<span style="color: #993333; font-weight: bold;">FROM</span> bar
<span style="color: #993333; font-weight: bold;">WHERE</span> foo <span style="color: #993333; font-weight: bold;">IN</span> <span style="color: #66cc66;">&#40;</span>
  <span style="color: #993333; font-weight: bold;">SELECT</span> 
  foo
  <span style="color: #993333; font-weight: bold;">FROM</span> bar  
  <span style="color: #993333; font-weight: bold;">GROUP</span> <span style="color: #993333; font-weight: bold;">BY</span> foo
  <span style="color: #993333; font-weight: bold;">HAVING</span> <span style="color: #993333; font-weight: bold;">COUNT</span><span style="color: #66cc66;">&#40;</span>foo<span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&gt;</span> <span style="color: #cc66cc;">1</span>
<span style="color: #66cc66;">&#41;</span></pre></div></div>

<p>The inner query (explained <a href="http://www.electricmonk.nl/log/2007/07/24/sql-find-duplicate-rows/">in this post</a>) was very fast, and returned only two rows. The outer query, when I ran it like this:</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">SELECT</span> foo <span style="color: #993333; font-weight: bold;">FROM</span> bar <span style="color: #993333; font-weight: bold;">WHERE</span> foo <span style="color: #993333; font-weight: bold;">IN</span> <span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">,</span> <span style="color: #cc66cc;">2</span><span style="color: #66cc66;">&#41;</span>;</pre></div></div>

<p>also ran very fast. However, the combination of the two was extremely slow. I thought this was weird, since the inner query returned only two results. A colleague and I took a look at the EXPLAIN of the query and found that it was actually doing a full join of 3000&#215;3000 rows. The use of HAVING threw me off because it appeared in the inner query. But HAVING is always applied very late in the execution process, just before the results are sent to the client. This means MySQL doesn&#039;t even look at the HAVING to optimize queries. From the manual:</p>
<blockquote><p>
The HAVING clause is applied nearly last, just before items are sent to the client, with no optimization.
</p></blockquote>
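<p>For reference, the check described above amounts to prefixing the slow query with EXPLAIN, roughly as sketched below. No output is shown because it depends on the MySQL version and the data. As a general note that is not from the original investigation: on MySQL versions of that era, a subquery that is re-evaluated for every outer row is typically reported with a <tt>select_type</tt> of <tt>DEPENDENT SUBQUERY</tt>, and the <tt>rows</tt> column gives the estimated number of rows examined per table.</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;">-- Ask MySQL how it plans to execute the slow query.
-- Look at the select_type, key and rows columns of the result.
EXPLAIN
SELECT foo
FROM bar
WHERE foo IN (
  SELECT foo
  FROM bar
  GROUP BY foo
  HAVING COUNT(foo) &gt; 1
);</pre></div></div>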
<p>Putting an index on the &#039;foo&#039; column solved the speed problem, though it&#039;s still not as fast as it could be: it&#039;s still doing a JOIN of &#039;foo&#039; with itself, only this time over 2&#215;3000 rows instead of 3000&#215;3000.</p>
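<p>A minimal sketch of the fix, plus one commonly suggested alternative. The index name <tt>idx_bar_foo</tt> and the alias <tt>dup</tt> are made up for illustration. The second statement is not what was used here: it is a rewrite that moves the duplicate lookup into a derived table in the FROM clause, which MySQL evaluates once before joining, and I have not timed it against this table.</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;">-- The fix described above: index the column being compared.
-- (idx_bar_foo is a hypothetical index name.)
ALTER TABLE bar ADD INDEX idx_bar_foo (foo);

-- Possible alternative: join against a derived table instead of using IN.
-- GROUP BY makes every foo in dup unique, so the join cannot multiply rows.
SELECT bar.foo
FROM bar
JOIN (
  SELECT foo
  FROM bar
  GROUP BY foo
  HAVING COUNT(foo) &gt; 1
) AS dup ON dup.foo = bar.foo;</pre></div></div>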
]]></content:encoded>
			<wfw:commentRss>http://www.electricmonk.nl/log/2007/09/12/very-slow-subquery-due-to-having/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Vim and PHP: tips</title>
		<link>http://www.electricmonk.nl/log/2007/08/31/vim-and-php-tips/</link>
		<comments>http://www.electricmonk.nl/log/2007/08/31/vim-and-php-tips/#comments</comments>
		<pubDate>Fri, 31 Aug 2007 15:01:42 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[link]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[sql]]></category>
		<category><![CDATA[vim]]></category>

		<guid isPermaLink="false">http://www.electricmonk.nl/log/2007/08/31/vim-and-php-tips/</guid>
		<description><![CDATA[I&#039;ve been using Vim for years now, but there&#039;s still new stuff to learn. Check out this page for the PDF version of the slides of a talk given by Andrei Zmievski on editing PHP with Vim. His configuration files are also available. Here&#039;s my favourite list of tips: Add the following text to your [...]]]></description>
			<content:encoded><![CDATA[<p>I&#039;ve been using <a href="http://www.vim.org">Vim</a> for years now, but there&#039;s still new stuff to learn. Check out <a href="http://www.gravitonic.com/blog/archives/000357.html">this page</a> for the <a href="http://www.gravitonic.com/do_download.php?download_file=talks/vancouver-2007/vim-for-php-programmers.pdf">PDF version of the slides</a> of a talk given by Andrei Zmievski on editing <a href="http://www.php.net">PHP</a> with Vim. His <a href="http://www.gravitonic.com/do_download.php?download_file=other/andrei-vim-files.tar.gz">configuration files</a> are also available.</p>
<p>Here&#039;s my favourite list of tips:</p>
<p>Add the following text to your <tt>~/.vim/ftplugin/php.vim</tt> file:</p>
<pre>
setlocal formatoptions+=tcqlro
let php_sql_query=1
let php_htmlInStrings=1
let php_folding = 1
</pre>
<p>This will:</p>
<ul>
<li>Turn on automatic text formatting for PHP so that, for instance, Vim automatically inserts a &#039;*&#039; if you press enter inside a <tt>/* */</tt> comment.</li>
<li>Make Vim highlight SQL queries in strings.</li>
<li>Make Vim highlight HTML in strings.</li>
<li>Allow folding on PHP classes and functions. With the cursor on the first line of a function, press <tt>zc</tt> to hide the function and <tt>zo</tt> to show it again (<b>C</b>lose and <b>O</b>pen the fold).</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.electricmonk.nl/log/2007/08/31/vim-and-php-tips/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

