Archive for March, 2009

Inspired by Dave Thomas’ “A First Erlang” post, I decided to use Erlang to retrieve stock quotes from Google’s Finance API. I wrote a simple Erlang program that let me explore third-party JSON libraries and Erlang’s HTTP library.

In Erlang, the -module directive defines a module, which is how code is organized, and the -export directive tells Erlang which functions in the module to expose. My module is called quote, and I’m storing it in the file quote.erl. It exposes a single function, get_stock_quote, which accepts a single parameter.

-module(quote).
-export([get_stock_quote/1]).

Next I’ll define an Erlang macro called BASE_URL which contains the base URL of the Google Finance API. The function get_google_url builds the full URL by appending the symbol to the base URL.

-define(BASE_URL, "http://www.google.com/finance/info?client=ig&q=").
get_google_url(Symbol) ->
  ?BASE_URL ++ Symbol.

After retrieving a stock quote for MSFT in my browser, I noticed that the data returned from Google Finance is a JSON(ish) object surrounded by some extraneous text, which has to be removed before we can do anything else. Later I’ll use an external library to parse the string as JSON and extract the price and date.

get_stock_quote(Symbol) ->
  %% Don't know why I need the following line
  %% mentioned in inets documentation
  inets:start(),
  URL = get_google_url(Symbol),
  { ok, {_Status, _Headers, Body }} = http:request(URL),
  PureData = lists:subtract(lists:subtract(Body, "// [ "), "] ").

The code to make an HTTP request is simple: http:request() returns the HTTP status, headers and body. The call was throwing an exception on my machine, though; after reading the http documentation included with Erlang, I learned that inets:start() has to be called before making an HTTP request. The variable PureData contains the string that will be transformed into our JSON object. I had to use the lists:subtract function to remove the extra characters from the beginning and end of the string.
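One caveat about lists:subtract/2: it does not strip a prefix or suffix. For each character in its second argument it deletes the first matching character of the first list, wherever that character occurs, so the result depends on those characters appearing in the wrapper before they appear in the payload. A minimal sketch of a position-based alternative follows, assuming the payload sits inside exactly one outer pair of square brackets. The module name quote_strip and the function strip_wrapper/1 are hypothetical and not part of the original program.

```erlang
-module(quote_strip).
-export([strip_wrapper/1]).

%% Return the text between the first "[" and the last "]" of Body.
%% Assumes Body is a string (list of characters) that contains both
%% brackets; if either is missing the pattern match fails with badmatch.
strip_wrapper(Body) ->
    %% Drop everything before the first "[", then the bracket itself.
    [$[ | Rest] = lists:dropwhile(fun(C) -> C =/= $[ end, Body),
    %% Reverse, drop everything after the last "]", then the bracket.
    [$] | RevInner] = lists:dropwhile(fun(C) -> C =/= $] end,
                                      lists:reverse(Rest)),
    lists:reverse(RevInner).
```

With this helper, the PureData line could become PureData = quote_strip:strip_wrapper(Body), leaving the rest of the function unchanged.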

After some searching (and cursing) I found a third-party library, json_parser, on Process One’s Comprehensive Erlang Archive Network. I downloaded a copy of the development version, renamed the source file to json_parser.erl, compiled json_parser and referenced it from quote.erl with the -import(json_parser) directive.

The documentation for json_parser is fairly sparse. The dvm_parser function in the library returns a tuple with more data than I need; I’m only interested in the actual JSON data, which I bind to the RealData variable and pass on to the parse_json_tuple function, which extracts the fields I’m interested in.

{_,{_,RealData},_} = json_parser:dvm_parser(list_to_binary(PureData)),
parse_json_tuple(list_to_tuple(RealData)).

Finally, I parse the data in the RealData tuple and return the current price and quote time:

parse_json_tuple(RealData) ->
  %% this is ugly and needs to be refactored
  {_,_,_,_,{_,CurrentPrice},_,{_,CurrentTime},_,_,_} = RealData,
  {CurrentPrice, CurrentTime}.
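The positional match above only works while the response has exactly ten fields in exactly this order. A hedged sketch of a lookup by key is shown here instead. It assumes RealData is a list of {Key, Value} pairs, which is what the {_,CurrentPrice} patterns suggest. Since json_parser’s key representation is not documented here, the keys are passed in by the caller rather than hard-coded. The module quote_fields and the function extract/3 are hypothetical names.

```erlang
-module(quote_fields).
-export([extract/3]).

%% Fields: list of {Key, Value} pairs (assumed shape of the parsed quote).
%% PriceKey, TimeKey: the keys to look up, in whatever term type the
%% parser produces (strings, binaries or atoms).
%% A missing key yields the atom undefined instead of a badmatch error.
extract(Fields, PriceKey, TimeKey) ->
    {proplists:get_value(PriceKey, Fields),
     proplists:get_value(TimeKey, Fields)}.
```

It would be called on the list directly, without the list_to_tuple conversion, for example quote_fields:extract(RealData, PriceKey, TimeKey).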

To use the code, compile the quote.erl file by calling c(quote) in the Erlang shell, then call the get_stock_quote function with a stock symbol: quote:get_stock_quote("MSFT").

This isn’t the best use of Erlang, but I wanted to ease into the language before exploring its more complicated recursive and distributed functionality. Later I’ll refactor the code to be more Erlang-y and modify the program to retrieve quotes in parallel using Erlang’s concurrency magic.

*Edit* I just discovered that Google Finance returns a slightly different dataset when the market is closed. I’ll update the parse_json_tuple function with the changes soon.

Full Listing

-module(quote).
-export([get_stock_quote/1]).
-import(json_parser).

-define(BASE_URL, "http://www.google.com/finance/info?client=ig&q=").

get_google_url(Symbol) ->
    ?BASE_URL ++ Symbol.

parse_json_tuple(RealData) ->
    %% this is ugly and needs to be refactored
    {_,_,_,_,{_,CurrentPrice},_,{_,CurrentTime},_,_,_} = RealData,
    {CurrentPrice, CurrentTime}.

get_stock_quote(Symbol) ->
    %% Don't know why I need the following line
    %% mentioned in inets documentation
    inets:start(),
    URL = get_google_url(Symbol),
    { ok, {_Status, _Headers, Body }} = http:request(URL),
    PureData = lists:subtract(lists:subtract(Body, "// [ "), "] "),
    {_,{_,RealData},_} = json_parser:dvm_parser(list_to_binary(PureData)),
    parse_json_tuple(list_to_tuple(RealData)).


Source control is the bedrock of your development effort. It contains the history of your entire project, and it’s too important to trust to VSS. Visual SourceSafe is a horrible source control system, and it keeps you from being productive.

Teams using VSS love to use exclusive locks on source files, which only creates contention for files and kills the team’s productivity. Even worse, VSS doesn’t support branching, which makes refactoring your code doubly difficult: all your changes have to be made offline and manually merged into the “trunk.” Not only is VSS insanely slow, it also has no concept of transactions: if a bug fix required you to change and check in multiple files at once, there’s no way to figure out which files were checked in together using VSS’s history feature.

SVN is a much better, free alternative to VSS that supports the features you expect from a modern source control system: branching, merging, and transactional (atomic) commits.

If you think your project’s history is stuck in VSS, it’s not. There’s a great migration tool called vss2svn which lets you migrate your entire VSS repository, along with the project history, to SVN.

You’ll first need to download and set up an SVN server and create a repository and users for your projects; I recommend using VisualSVN Server if you’re on Windows. What follows is a painfully long migration, but it’s well worth it in the end.

  1. Download and extract vss2svn to a hard drive which has plenty of space. As a general rule, you should have enough space to hold three copies of your VSS repository.
  2. After making sure that your repository does not have any files checked out, take your VSS repository offline by disabling all access to it, then create a backup of your VSS directory. At this time, you should also delete/destroy any files/projects you do not wish to migrate to SVN.
  3. Run the analyze command on your VSS repository. You’ll have to run analyze several times if the utility encounters errors in your repository. This is very time-consuming but an important step.
  4. Now you’re ready to run the migration utility from the command prompt. You’ll need to use the following command: vss2svn.exe --vssdir <dir>, where <dir> is the directory which contains the srcsafe.ini file. vss2svn.exe will generate an SVN dump file which you’ll later import into your SVN repository.
  5. Finally, import the dump file generated by vss2svn by using the following command on the machine where your SVN repository resides: svnadmin load /path/to/repository < dumpfile.txt

Introduction to Data Mining by Pang-Ning Tan, Michael Steinbach, Vipin Kumar

I’ve always maintained that stocks follow repeatable patterns in the short term and a computer should be able to exploit that fairly easily, which is why I’ve always preferred the idea of using pattern recognition over technical analysis in my next automated trading platform.

I needed to have a grasp of the basics of machine learning and data mining which is why I started reading Programming Collective Intelligence.

There are a couple of ideas that I’ve been throwing around in my head, namely applying pattern recognition and machine learning to financial data. I’m going to use Orange to play with these ideas a little further and, I hope, eventually develop a trading strategy.

Having finished PCI, I’m also ready for something more theoretical, so I’ve bought a copy of Introduction to Data Mining. I’ll read the book over the next few weeks while following along with Stanford’s Stats 202 course syllabus and watching their videos, and hopefully form a deeper understanding of which machine learning algorithms to apply to this domain.
