Archive for the 'Database' Category

PHP Tag Cloud Remove Common Words

Friday, July 11th, 2008

If you have an application that allows a user to enter arbitrary tags for an entity, you might want to filter their tag input based on certain criteria. This tutorial assumes that you will be working with a space-separated list of tags, e.g. from a form input field. If your tag input is coming from an array, you can try $tags = implode(' ', $tag_array); to prepare it for the rest of the code presented here.

Firstly, we’ll cover the tag filter. A list of the most common English words was compiled by merging data from the following resources:
http://en.wikipedia.org/wiki/Most_common_words_in_English
http://www.askoxford.com/oec/mainpage/oec02/?view=uk
http://esl.about.com/library/vocabulary/bl1000_list1.htm
http://www.deafandblind.com/word_frequency.htm

We can look at the whole list later for reference, but to make an effective tag filter, many had to be removed by hand. Additionally, this script disregards words with less than three letters, so those were removed as well. Many search applications, including the one that inspired this coding, won’t mess with those. Here is a space-separated list of the common English-language words to be filtered:

able about after again all also and any are bad been before being between but came can cause change come could did differ different does don't down each end even every far few for form found four from get good great had has have her here him his how into its just keep let many may might more most much must near need never new next not now off one only other our out over part put said same say seem set should side some still such take than that the their them then there these they thing this three through too two upon use very was way went were what when where which while who will with would you your

Let’s use the following example for a space-separated tag string:

fluffy is freeze you Rocket don't cute boulder fry

The following PHP code will filter out the common words as well as words that contain less than three letters. It only pulls strings with alpha characters, and additionally converts all tags to lowercase:

$tag_filter = array('able', 'about', 'after', 'again', 'all', 'also', 'and', 'any', 'are', 'bad', 'been', 'before', 'being', 'between', 'but', 'came', 'can', 'cause', 'change', 'come', 'could', 'did', 'differ', 'different', 'does', 'don', 'down', 'each', 'end', 'even', 'every', 'far', 'few', 'for', 'form', 'found', 'four', 'from', 'get', 'good', 'great', 'had', 'has', 'have', 'her', 'here', 'him', 'his', 'how', 'into', 'its', 'just', 'keep', 'let', 'many', 'may', 'might', 'more', 'most', 'much', 'must', 'near', 'need', 'never', 'new', 'next', 'not', 'now', 'off', 'one', 'only', 'other', 'our', 'out', 'over', 'part', 'put', 'said', 'same', 'say', 'seem', 'set', 'should', 'side', 'some', 'still', 'such', 'take', 'than', 'that', 'the', 'their', 'them', 'then', 'there', 'these', 'they', 'thing', 'this', 'three', 'through', 'too', 'two', 'upon', 'use', 'very', 'was', 'way', 'went', 'were', 'what', 'when', 'where', 'which', 'while', 'who', 'will', 'with', 'would', 'you', 'your', );

$tags = 'fluffy is freeze you rocket don\'t cute boulder fry';
preg_match_all('/([a-zA-Z]{3,})/', $tags, $matches);
$matches[0] = array_map('strtolower', $matches[0]);
$tags = array_diff($matches[0], $tag_filter);

The $tags array would then be filtered and lowercased, producing the following output with print_r($tags):

Array
(
    [0] => fluffy
    [1] => freeze
    [3] => rocket
    [5] => cute
    [6] => boulder
    [7] => fry
)

If you need to convert it from the array back to a space-separated string, try $tags = implode(' ', $tags);. You may also of course add more words to the word list — that could come in handy with other application-specific functions such as cursing filters.

Here is the full list of common words merged from the above-stated resources, separated by spaces, including those with less than three letters:

a able about act add after again air all also am an and animal answer any are as ask at back bad be been before being between big boy build but by call came can case cause change child city close come company could country cover cross day did differ different do does don't down draw each early earth end even every eye fact far farm father feel few find first follow food for form found four from get give go good government great group grow had hand hard has have he head help her here high him his home hot house how i if important in into is it its just keep kind know land large last late learn leave left let life light like line little live long look low made make man many may me mean men might more most mother move mr mrs much must my name near need never new next night no north not now number of off office old on one only or other our out over own page part people person picture place plant play point port press problem public put read real right round run said same saw say school sea see seem self sentence set she should show side small so some sound spell stand start state still story study such sun take tell than that the their them then there these they thing think this thought three through time to too tree try turn two under up upon us use very want was water way we week well went were what when where which while who why will with woman word work world would write year you young your

Great Circle Distance Function for Oracle

Thursday, March 29th, 2007

The Distance Learning folks wanted a provision to allow users to search for site locations based on the distance from a specified zip code. I did some research and found a few resources. Most of things I found use a single SQL statement, but that didn’t suit our needs so I made it into an Oracle function:

create or replace function gc_dist (lat1 IN number, lon1 IN number, lat2 IN number, lon2 IN number) RETURN number

is

A_lat number;
A_lon number;
B_lat number;
B_lon number;
delta_lat number;
delta_lon number;
pi number;
earth_radius number;
gcdist number;
distance number;

begin

pi := 3.14159265358979323;
earth_radius := 3963.189;

A_lat := (lat1 * pi) / 180;
A_lon := (lon1 * pi) / 180;
B_lat := (lat2 * pi) / 180;
B_lon := (lon2 * pi) / 180;

delta_lat := A_lat - B_lat;
delta_lon := A_lon - B_lon;

-- find great circle distance
gcdist := power(sin(delta_lat / 2), 2) + cos(A_lat) * cos(B_lat) * power(sin(delta_lon / 2), 2);

distance := ROUND(earth_radius * 2 * atan2(sqrt(gcdist),sqrt(1 - gcdist)), 1);

return distance;

end;

I found the “exact” Earth radius on Wikipedia. phpZipLocator was used to get it working with the site. I had to modify the functions a little to suit the specific needs of the application.

Using something like this requires a database of zip codes along with the corresponding latitude and longitude for each. One can be found on the phpZipLocator site, but I don’t think I ended up using that one. I did a lot of research to find the best free one, but unfortunately I don’t remember which one was used (here’s another). As far as commercial solutions, I thought this one looks like one of the better ones. I’ll likely get back into it at some point and look this stuff up again.

Here’s an example usage of the function:

  1. Extract the latitude and longitude from the given zipcode:SELECT lat, lon FROM zipcodes WHERE zip = :ZIP“:ZIP” is the specified zipcode and “zipcodes” is the name of the zipcode table.
  2. Build the first part of the query sans the WHERE clause. In the WHERE clause we’ll specify the radiusSELECT gc_dist(zipcodes.lat, zipcodes.lon, :LAT, :LON) AS distance FROM zipcodes“:LAT” and “:LON” represent the returned values from the first query
  3. Use phpZipLocator’s inradius function to build the WHERE clause that will restrict the results within the specified radius of the specified zipcode. Here is what’s inside the function (I don’t remember what I modified in the function so I’ll just paste the main part):WHERE (POWER((69.1 * (lon - :LON) * COS(:LAT / 57.3)), 2) + POWER((69.1 * (lat - :LAT)), 2)) < (:RADIUS * :RADIUS);“:LON” and “:LAT” were found in the first query, and “:RADIUS” represents the specified radius in miles.
  4. Put them together and run the query!
NOTE: This post is for my IT450 journal; we are required to write database-related journals and submit the URL at the end of the semester.

I Got to Help Install ODU’s Google Mini!

Friday, March 23rd, 2007

So ODU recently purchased a shiny new Google Search Appliance. We’re talking $10,000 worth of web-searching power. It came in a box with the Google logo on the site. Most rack server machines look the same — plain black rectagles in a shape that could probably be used as a food tray. If you look through the racks in the ODU server room you won’t see much variance in the mounted machines.

In comes the Google Mini. Needless to say, its shade of cerulean blue really sticks out in the racks. When you first fire it up it takes about 10 minutes to boot. Once it’s booted you interface it with a laptop using a crossover cable (it comes with some cables). We used my work laptop to set it up. I got to type in the configuration settings for it, but unfortunately I had to pass the computer to David when it came time to set the password :-(
The Google Appliance will soon replace the current Google Custom Search the site is using. Having an in-house search appliance will exponentially increase the flexibility, alowing us to tightly integrate current databases such as those for faculty directories, courses and calenders. The possibilities are endless. The operating system internals of the Google Mini are locked down pretty tightly; they still don’t want people finding out how their actual search algorithms work or what kind of database system they are using.

I took some pictures of the installation process and will probably post them at some point.

It came with a Google T-Shirt! At the next Web Group meeting they’re going to announce who gets it. *crosses fingers*

Google Mini
NOTE: This post is for my IT450 journal; we are required to write database-related journals and submit the URL at the end of the semester.

SQL Injection

Monday, March 5th, 2007

A major concern when developing database-driven websites is SQL Injection. Poorly written code could result in a malicious user running custom queries from his or her browser! Obviously the effects of that sort of security hole could be devasting for a web application.

Since I deal mostly with PHP/MySQL/Oracle, I’ll just be talking about PHP/MySQL/Oracle.

PHP has a built-in MySQL function called mysql_real_escape_string(). If that’s the injection-prevent method you go with, any time user-provided data from a URL query string is going into a database query it should be passed through that function.

If you’re using Oracle, the best and most elegant bet is to take advantage of the query binding features.

An example from the Wikipedia article on SQL injection (modified to fit the Oracle syntax):

$query = $sql->prepare("select * from users where name = :NAME");
$query->execute($user_name);

The execute member function here takes the value of $user_name and “binds” it to “:NAME” in the query. In other words it replaces the data stored in $user_name with the bind variable “:NAME.” However, before dumping it into the query it makes sure that the data in $user_name is not SQL. Regardless of the contents it treats it as query-safe data. When using native Oracle binds there is no need to use other functions such as mysql_real_escape_string().

NOTE: This post is for my IT450 journal; we are required to write database-related journals and submit the URL at the end of the semester.

Google Base

Wednesday, February 28th, 2007

Google to me is practically synonymous with innovation. It’s as if they ignore what other companies are doing and just constantly try to roll out new ideas. Give it about a few months later and other sites like Yahoo! and MSN will be trying to keep up. Point in case: Google Maps comes out (followed by a sudden exitement about AJAX techonology). Now have a look at Yahoo!Maps and Microsoft’s Live.com Maps. It’s kind of upsetting that rather than doing much innovating these companies are just copying Google’s ideas. Microsoft went so far as to use a very similar AJAX system! Yahoo!’s uses flash (which I don’t think is much fun or as impressive as Google’s using AJAX).

Anyway, the point of this post is Google Base. I guess the aforementioned things will eventually be related since it’s only a matter of time before the concept is copied by another company.

It’s a simple database that anyone can utilize via the web! With Google’s philosophy and goal of cataloging knowledge and data, it’s no surprise that the user-provided data can in some cases be searched via certain Google searches. Users can enter any kind of data about things. First you choose a category, then you add items and choose your own attributes! It’s just like creating a regular database but it’s set up in a way where the average user can easily figure it out. If a predefined category is chosen then attributes are automatically added which makes it even easier. You can add images and AdWords as well.

I posted a “Swingline Stapler” as a test!

ALL YOUR GOOGLE BASE ARE BELONG TO US!

NOTE: This post is for my IT450 journal; we are required to write database-related journals and submit the URL at the end of the semester.

SQL Developer

Tuesday, January 23rd, 2007

If you’ve worked with MySQL, you’ve probably worked with phpMyAdmin. If you’ve dug around for other tools, you may have worked with the MySQL Query Browser or even MySQL Administrator. I’m going to go out on a limb here and say if you have, you probably haven’t wanted to go back to CLI. These tools are all high-quality and free, and that certainly adds to their appeal.

So what is an Oracle administrator to do?

SQL Developer. I’m fairly confident that there are no remotely comparable tools available for free. It uses the JAVA VM and runs on Windows, Mac, and Linux.

It turns a mess like this…

CREATE TABLE employees_demo
    ( employee_id    NUMBER(6)
    , first_name     VARCHAR2(20)
    , last_name      VARCHAR2(25)
         CONSTRAINT emp_last_name_nn_demo NOT NULL
    , email          VARCHAR2(25)
         CONSTRAINT emp_email_nn_demo     NOT NULL
    , phone_number   VARCHAR2(20)
    , hire_date      DATE  DEFAULT SYSDATE
         CONSTRAINT emp_hire_date_nn_demo  NOT NULL
    , job_id         VARCHAR2(10)
       CONSTRAINT     emp_job_nn_demo  NOT NULL
    , salary         NUMBER(8,2)
       CONSTRAINT     emp_salary_nn_demo  NOT NULL
    , commission_pct NUMBER(2,2)
    , manager_id     NUMBER(6)
    , department_id  NUMBER(4)
    , dn             VARCHAR2(300)
    , CONSTRAINT     emp_salary_min_demo
                     CHECK (salary > 0)
    , CONSTRAINT     emp_email_uk_demo
                     UNIQUE (email)
    )
   TABLESPACE example
   STORAGE (INITIAL     6144
            NEXT        6144
            MINEXTENTS     1
            MAXEXTENTS     5 );

…into a few clicks and column naming. It visualizes pretty much every aspect of your databases without having to worry about endless CLI commands. It’s got a SQL editor so you can test your statements before implementing them in your applications. Here’s a screenshot (click to enlarge):

NOTE: This post is for my IT450 journal; we are required to write database-related journals and submit the URL at the end of the semester.

Fairly Complex SQL Query

Tuesday, January 16th, 2007

This is probably one of the most complex SQL statement I’ve written. It’s used by ODU Distance Learning to generate a CSV report of programs and locations for admissions. Some of these operations could have been done in PHP… that would have been easier, but I think this way is neat because it returns everything exactly as it needs to go in the report, “just add commas” (with PHP)

SELECT s.name AS site,
       s.site_code,
       (m.name ||
        CASE WHEN p.concentration IS NOT NULL
             THEN ' - ' || p.concentration
             END ||
        CASE WHEN d.abbr IS NOT NULL
             THEN ' (' || CASE WHEN d2.abbr IS NULL
                               THEN d.abbr
                               ELSE d.abbr || '/' || d2.abbr
                          END || ')'
        END) AS prog_name,
       pd.banner_code AS pd_code,
       pl.name AS program_level,
       (CASE WHEN d.abbr IS NOT NULL
             THEN CASE WHEN d2.abbr IS NULL
                       THEN d.abbr
                       ELSE d.abbr || '/' || d2.abbr
                  END
        END) AS degree,
       m.name AS major,
       m.banner_code AS maj_code,
       p.concentration
FROM db.program_data pd
     LEFT OUTER JOIN db.programs p ON p.id = pd.programs_id
                                  AND pd.type = :SYS
     LEFT OUTER JOIN db.degrees d ON d.id = p.degrees_id
     LEFT OUTER JOIN db.programs p2 ON p2.id = pd.programs_id2
     LEFT OUTER JOIN db.degrees d2 ON d2.id = p2.degrees_id
     INNER JOIN db.program_levels pl ON p.program_levels_id = pl.id
     INNER JOIN db.majors m ON p.majors_id = m.id
     INNER JOIN db.r_programs2sites p2s ON pd.id = p2s.programs_id
     INNER JOIN db.sites s ON p2s.sites_id = s.id
ORDER BY site, program_level ASC, major ASC, concentration ASC

Here’s the PHP code that formats the data as Excel-compatible CSV. There are a couple things about it that could stand to be changed, but it works:

echo 'SITE NAME, SITE CODE, PROGRAM NAME, PROGRAM BANNER CODE, LEVEL, DEGREE, MAJOR, MAJOR BANNER CODE, CONCENTRATION' . "\\n";
while (!$rs->EOF) {
	for ($i = 0; $i < 10; $i++) {
		if (strpos($rs->fields[$i], ',') > 0) {
			if (strpos($rs->fields[$i], '"') > 0) {
				$rs->fields[$i] = str_replace('"', '""', $rs->fields[$i]);
			}
			$rs->fields[$i] = '"' . $rs->fields[$i] . '"';
		}
		echo $rs->fields[$i] . ($i != 9 ? ',' : "\\n");
	}
	$rs->MoveNext();
}
NOTE: This post is for my IT450 journal; we are required to write database-related journals and submit the URL at the end of the semester.