Ranked Searches with SQL
Date : 2007 10 24 Category : Design & UsabilityCreating some sort of search functionality is something we all have to do at one point or another. For the good cases, we have fulltext search or an external program available to us. But sometimes, getting down and dirty with plain SQL is the only option. When going the SQL route, I often see most of the ranking logic going on in the PHP still, like so:
$sql = 'SELECT * FROM Posts WHERE Body LIKE "%term%"'; $rs = $db->query($sql); while($row = mysql_fetch_array($rs, MYSQL_ASSOC)) { $count = substr_count($row['Body'], "term"); // push count into an array } // array has all records. order by count then displayOver the weekend I was playing around with doing all of this in SQL. This immediately became a pain because there is no easy way to count substrings, but what we can do is:
count = (strlen(Body) - strlen(str_replace(Body, term)))/strlen(term)
Let’s plug in some numbers.
Body: Hello, he said. What’s her name? Term: he Count: (32 - 26)/2 = 6/2 = 3And then we can translate that to SQL:
SELECT SUM(((LENGTH(Body) - LENGTH(REPLACE(Body, 'term', '')))/4)) AS OccurrencesNow that we know how to find the amount of occurrences of one string within another, we can build up basic rankings by including groupings, multiple terms, and joins. Below is an example of a query that searches one row across multiple columns and returns them from best match to worst.
SELECT SUM(((LENGTH(Body) - LENGTH(REPLACE(Body, 'term', '')))/4) + ((LENGTH(Body) - LENGTH(REPLACE(Body, 'search', '')))/6)) AS Occurrences FROM Posts GROUP BY PostId ORDER BY Occurrences DESC