Monday, May 4, 2009

Ruby Hashes and Default Values

I’m currently experimenting with MacJournal, so this will be a short entry to test the publishing functionality and blogger integration.

Ruby’s Hash (associative array) has some great ways to handle default values. All of this is done in the Hash constructor:

First, we create a new hash (h) that returns nil for the default value. This is done by passing nothing to the Hash constructor.
>> h = Hash.new
=> {}
>> h[5]
=> nil
>> h["eggs"]
=> nil

No surprises here. Now let’s give the Hash a real default value. We’ll start by creating the Hash and setting the default value to an empty list ([]).
>> h = Hash.new []
=> {}
>> h[5]
=> []
>> h["eggs"]
=> []

Now the ‘nugget’.
>> h
=> {}

h is STILL empty. The Hash only provides us the default value; it does not insert it back into the Hash for us.
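A quick way to confirm this (continuing the same irb session) is Hash#key?, which only reports keys that are actually stored:
>> h.key?(5)
=> false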

‘Nugget’ 2.
The default value that is returned is NOT cloned.
>> h[5] << "spam"
=> ["spam"]
>> h["eggs"]
=> ["spam"]

Our default value has changed! We can also demonstrate this with a string object:
>> h = Hash.new "spam"
=> {}
>> h["eggs"]
=> "spam"
>> h["eggs"] << " is yummy!"
=> "spam is yummy!"
>> h["shampoo"]
=> "spam is yummy!"

You can force a Hash to create a new default value every time if you pass a block to the constructor. The block’s parameters are the Hash itself and the key for which a default value needs to be created. This form is by far the most flexible, as the default value can even be customized for the key; it doesn’t have to be the same one every time.
>> h = Hash.new {|h, k| h[k] = []}
=> {}
>> h[5]
=> []
>> h["eggs"]
=> []
>> h
=> {5=>[], "eggs"=>[]}
>> h[5] << "spam"
=> ["spam"]
>> h["eggs"]
=> []
>> h
=> {5=>["spam"], "eggs"=>[]}

Hope that saves you the 10 minutes it took me to figure this out. It is also detailed in the ruby doc for Hash. Guess I get the RTFRD award for the evening.

Saturday, April 11, 2009

red_eye


I was going to write about OAuth for this entry, but turns out... it's broken.
So, next best thing. I stumbled across the Google Maps Geocoding API a few weeks ago. This is a REST service that is responsible for taking the text you type in the Google Maps search field and making sense of it. Google makes it clear that this service is to be used in conjunction with Google Maps:

Note: the geocoding service may only be used in conjunction with displaying results on a Google map; geocoding results without displaying them on a map is prohibited. For complete details on allowed usage, consult the Maps API Terms of Service License Restrictions.

Before the actual API is detailed, there are some terms that should be known:

Geocoding

The process of taking a human readable address and transforming it into one or more latitude, longitude coordinates.

Reverse Geocoding

The process of taking a latitude, longitude coordinate and transforming it into one or more human readable addresses.

Viewport Biasing

Instructs the geocoder to place a bias on addresses within a provided bounding box. This can help sort out ambiguous addresses like Winnetka (Chicago, IL or Los Angeles, CA?).

Country Code Biasing

Instructs the geocoder to place a bias on addresses within a country. This can help sort out addresses like Toledo (Ohio or Spain?).


The REST interface is easy to understand. The base address is:
http://maps.google.com/maps/geo?

It supports the following query parameters (a sketch of building a request by hand follows the list):

q

The address or latitude/longitude pair that you want to geocode. "1600 Amphitheatre Parkway, Mountain View, CA" and "40.7142298,-73.9614669" are both valid.

key

Your Google Maps API key. You can get one for free here.

sensor

Does this request come from a device with a location sensor? Unless you are using a mobile device, this is probably false.

output

The format that your client wishes the response to be in. Valid values are: xml, kml, csv and json (default).

oe

The encoding format for the response. In general, this should be 'utf8'.

ll

Used for viewport biasing. This is the center (latitude, longitude) of the bounding box the geocoder is to give preference to. e.g. ll=34.197605,-118.546944

span

Another comma-separated pair that specifies the lengths of the sides of the bounding box used for viewport biasing. This only has meaning if the ll parameter is used. I'm not 100% sure what the units are, but the example I have is span=0.247048,0.294914

gl

The ccTLD country code to use for country code biasing. See this article for legal values.
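
To make the parameters concrete, here is a minimal sketch of building a request URL by hand (the API key is a placeholder; the parameter values are taken from the descriptions above):

require 'cgi'

# Assemble a geocoding request by hand (placeholder key).
params = {
  "q"      => "1600 Amphitheatre Parkway, Mountain View, CA",
  "key"    => "YOUR_API_KEY",
  "sensor" => "false",
  "output" => "json",
  "oe"     => "utf8"
}
query = params.map { |name, value| "#{name}=#{CGI.escape(value)}" }.join("&")
puts "http://maps.google.com/maps/geo?#{query}"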


I wrote a Ruby wrapper around this called red_eye that can be found on GitHub.
Nothing super fancy. It just parses the JSON response and returns Ruby structs for the results. It also provides some smarts around parameter handling and makes the parameters more human readable (e.g. :country vs. :gl). Here is an example of how to use it. (Note: this assumes that you have red_eye installed in an accessible place; I've never tried to install it as a gem. I usually just fire up irb with a -I./lib from the red_eye root directory.)

require 'red_eye'

# Some useful methods for printing out results
def print_placemark placemark, header = "=========="
  puts header
  puts "#{placemark.address}"
  puts "#{placemark.point.lat},#{placemark.point.lng}"
  puts "Accuracy: #{placemark.accuracy}"
end

def print_result result
  puts "Success: #{result.success?}"
  result.placemarks.each_with_index do |placemark, idx|
    print_placemark placemark, "=========#{idx}========="
  end
end

def perform_query geocoder, query, options = {}
  puts "Query URL:\n#{geocoder.to_url(query, options)}"
  print_result geocoder.geocode(query, options)
end

# create the work horse
geocoder = RedEye::GeoCoder.new

# let's find google HQ
perform_query geocoder, "1600 Amphitheatre Parkway, Mountain View, CA"
# reverse geocode
perform_query geocoder, RedEye::LatLng.new(:lat => 40.714224, :lng => -73.961452)

# setup a bias for viewport bias
center = RedEye::LatLng.new :lat => 34.197605, :lng => -118.546944
bias = RedEye::Bias.new :center => center, :span => [0.247048, 0.294914]

# viewport bias geocoding
perform_query geocoder, "Winnetka", :bias => bias
# country bias geocoding
perform_query geocoder, "Toledo", :country => 'es'

Sunday, April 5, 2009

Shrewdness?

Some of you may be wondering about the title of the blog. I'm not trying to be boastful and make up some 'ness'-ized version of shrewd as it pertains to me. It is actually an aggregation of apes... no, really, see shrewdness and flock. Buy me a few beers and I 'might' be willing to tell you how this pertains to me.

Thursday, April 2, 2009

Back to the Basics

Over the past several weeks of droning over Jira issues and implementing some custom extension points for IBM WebSphere Integration Developer it has become apparent that I have drifted away from some of the fundamentals that make up Computer Science 101. Over the next few months I'll attempt to blend a review of some of these core concepts with some actual 'contemporary' content.

The first thing I'm going to cover is some basic sorting algorithms. Sorting is one of those things that gets covered up nicely in Java development with methods like Collections.sort(...). All the gory details remain hidden and you just get back your list sorted according to its elements' natural ordering.

Some things to consider when choosing a sorting algorithm:

1.  Computational Complexity
Big-O notation. Most common sorts run in O(n log n) in the average case. There are a few O(n^2) sorts, but those should probably only be considered for small data sets or academic reasons.

2.  Stability
The term stability as it applies to sorting refers to the final ordering of sorted elements that are equal. If the ordering remains the same, the sort is said to be stable. For example, the following x, y points are going to be sorted based on their x value:

Original:
(8, 3) (2, 5) (3, 7) (8, 9) (4, 4)

Sorted (Stable):
(2, 5) (3, 7) (4, 4) (8, 3) (8, 9)

Sorted (Unstable):
(2, 5) (3, 7) (4, 4) (8, 9) (8, 3)

The list is sorted, but there are two different correct answers! (A small Ruby illustration of forcing a stable result follows this list.)

3.  Memory Usage
This of course depends on the implementation of your sort. Oftentimes it is intuitively easier to comprehend and read the code by storing results in separate structures and returning those. It is, however, possible to implement most of the major sorts using only the original list or minimal external structures. These types of sorts are called "in-place". The 3 sorts presented in code are all implemented as in-place sorts.
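
As a side note, Ruby's built-in Array#sort is not guaranteed to be stable. A common trick when stability matters is to include the original index in the sort key as a tie-breaker; here is a rough sketch using the points above:

points = [[8, 3], [2, 5], [3, 7], [8, 9], [4, 4]]

# Sorting by x alone may reorder the two points whose x is 8.
# Including the original index as a tie-breaker forces a stable result.
stable = points.each_with_index.sort_by { |(x, _y), i| [x, i] }.map { |point, _i| point }
puts stable.inspect   # [[2, 5], [3, 7], [4, 4], [8, 3], [8, 9]]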

The Sorts:

Bubble Sort
The bubble sort is a very simple sort and is probably only valuable for academic purposes or very small data sets. In a bubble sort, the list is repeatedly iterated; consecutive items are compared and swapped if they are out of order. When no swaps are made on an iteration, the sort is complete, which makes it very fast on an already sorted list.

Something to note is that the largest unsorted element will be swapped into place with each pass. A slight performance improvement would be to decrement the end index of the main loop on each pass, since the last item is already sorted. The bubble sort is stable, but on average runs in O(n^2) time: if there are 10 items in the list, on the order of 100 comparisons are needed to sort it.
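
Not the source the timing results below came from, just a rough in-place Ruby sketch of the idea:

# In-place bubble sort: keep passing over the list until a pass makes no swaps.
def bubble_sort!(list)
  loop do
    swapped = false
    (list.length - 1).times do |i|
      if list[i] > list[i + 1]
        list[i], list[i + 1] = list[i + 1], list[i]
        swapped = true
      end
    end
    break unless swapped
  end
  list
end

bubble_sort! [5, 1, 4, 2, 8]   # => [1, 2, 4, 5, 8]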

Merge Sort
The merge sort is another stable sort (in most implementations) and on average runs in O(n log n) time. It works by taking the input list, splitting it in half, recursively applying the merge sort to each sublist and merging the results back together. The merge step simply walks down the two provided lists (a, b) and 'zips' them together in ascending order into a single list of length a.length + b.length. There are both recursive and iterative versions of this sort; the example provided is recursive and in-place.
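
A simple recursive Ruby sketch of the idea (this one merges into a new array rather than working strictly in place):

# Recursive merge sort: split, sort each half, then 'zip' the sorted halves together.
def merge_sort(list)
  return list if list.length <= 1
  mid   = list.length / 2
  left  = merge_sort(list[0...mid])
  right = merge_sort(list[mid..-1])

  merged = []
  until left.empty? || right.empty?
    # Taking from the left on ties keeps the sort stable.
    merged << (left.first <= right.first ? left.shift : right.shift)
  end
  merged + left + right
end

merge_sort [8, 2, 3, 8, 4]   # => [2, 3, 4, 8, 8]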

Quick Sort
The quick sort is another popular sort that runs between O(n^2) worst case and O(n log n) average case.  It is not a stable sort.  The quick sort works by choosing a value called the pivot value.  The list is then broken into two separate lists, values <= pivot and values > pivot.  The quick sort is then applied to the two sublists with the final result being [LT] + [PIVOT] + [GT].  Selection of the pivot value can affect the timing results.  In general a pivot that represents the median value of all the elements in the input list is desirable.  This helps split the [LT] and [GT] lists evenly.  The tradeoff is that determining an optimal pivot value involves more computational cycles.  In the source provided, a random pivot value is selected from the input list.
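
A rough Ruby sketch of the same scheme with a random pivot (not in-place, purely for illustration):

# Quick sort: pull out a random pivot, partition into <= pivot and > pivot, recurse, concatenate.
def quick_sort(list)
  return list if list.length <= 1
  list   = list.dup                            # work on a copy so the caller's array is untouched
  pivot  = list.delete_at(rand(list.length))   # random pivot, removed from the working list
  lt, gt = list.partition { |x| x <= pivot }
  quick_sort(lt) + [pivot] + quick_sort(gt)
end

quick_sort [3, 9, 1, 7, 5]   # => [1, 3, 5, 7, 9]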
 
Results:

Three different data sets of 10000 items were used during the testing: a reverse-ordered list, an ordered list and a list of random data (note: the same random data is used for all tests). Each sort was run on each list 100 times. The following results were obtained:

Average Bubble Sort on Reverse Ordered List with 10000 items: 6372 ms.
Average Bubble Sort on Ordered List with 10000 items: 0 ms.
Average Bubble Sort on Random List with 10000 items: 5870 ms.

Average Merge Sort on Reverse Ordered List with 10000 items: 1427 ms.
Average Merge Sort on Ordered List with 10000 items: 2 ms.
Average Merge Sort on Random List with 10000 items: 725 ms.

Average Quick Sort on Reverse Ordered List with 10000 items: 9 ms.
Average Quick Sort on Ordered List with 10000 items: 8 ms.
Average Quick Sort on Random List with 10000 items: 10 ms.

This is where this entry ends. Analysis of the results would be prudent at some point, but it should be obvious that quick sort just wins, even with non-optimized pivot selection. I tried to make a handy graph for this, but effectively comparing 10 ms to 6372 ms proved a bit more than my Google Chart API skills could handle.
