Monday, April 30, 2007

Google Linq

The ultimate way to understand LINQ, of course, is to write your own IQueryable magic. I have chosen Google Image search as my first pet project for this. The reason for selecting Image over Web search is that Image results are more structured: for example, we can filter on size, type and color. Implementing all of these would give a more complete view of the inner workings of LINQ/IQueryable.

Based on the principles of TDD, I shall first define the test cases for the project, for example:

var test = from img in MChen.Linq.Google.ImageSearch.Instance
           where img.Description.Contains("linq")
               && img.Size == ImageSize.Small
           select img;

I intend to define ImageSearch as a singleton because, well, there is only one Google… Obviously this can be further extended to Web, Group and Froogle in the future.

As the very first step, I have put together a crude implementation of IQueryable that simply dumps the Expression tree in the CreateQuery function. The output for the query expression above looks like:
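For readers who want to produce such a dump themselves, here is a minimal sketch. It is not the implementation from this post: it targets the released System.Linq API, where CreateQuery lives on IQueryProvider, and the names Image, DumpingProvider, DumpingQuery and Demo are hypothetical stand-ins. Nothing is ever sent to Google; the provider only prints the tree it receives.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical stand-ins for the types used in the test query.
public enum ImageSize { Small, Medium, Large }

public class Image
{
    public string Description { get; set; }
    public ImageSize Size { get; set; }
}

// Provider that prints every expression tree it is handed.
public class DumpingProvider : IQueryProvider
{
    public IQueryable<T> CreateQuery<T>(Expression expression)
    {
        // Expression.ToString() gives a compact, readable dump of the tree.
        Console.WriteLine(expression);
        return new DumpingQuery<T>(this, expression);
    }

    public IQueryable CreateQuery(Expression expression)
    {
        throw new NotSupportedException("Only the generic overload is sketched.");
    }

    // Executing the query is out of scope for this sketch.
    public T Execute<T>(Expression expression) { throw new NotImplementedException(); }
    public object Execute(Expression expression) { throw new NotImplementedException(); }
}

public class DumpingQuery<T> : IQueryable<T>
{
    public DumpingQuery(IQueryProvider provider, Expression expression)
    {
        Provider = provider;
        // The root of a query is a constant node that refers to the source itself.
        Expression = expression ?? System.Linq.Expressions.Expression.Constant(this);
    }

    public Type ElementType { get { return typeof(T); } }
    public Expression Expression { get; private set; }
    public IQueryProvider Provider { get; private set; }

    // No results are produced; enumeration yields an empty sequence.
    public IEnumerator<T> GetEnumerator()
    {
        return Enumerable.Empty<T>().GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class Demo
{
    public static IQueryable<Image> Run()
    {
        var source = new DumpingQuery<Image>(new DumpingProvider(), null);
        // The where clause compiles to Queryable.Where, which calls
        // CreateQuery on the provider and thereby triggers the dump.
        return from img in source
               where img.Description.Contains("linq")
                   && img.Size == ImageSize.Small
               select img;
    }
}
```

Calling Demo.Run() prints the method-call expression built for the where clause; the trailing select img is a degenerate projection, so the compiler emits no Select call for it.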

Some references that are very useful along the way:

Thursday, April 26, 2007

More on DLINQ and Projection Operator

I posted my observations on DLINQ and the projection operator on the MSDN Forum and soon received a response from Keith J. Farmer at Microsoft.

The answer was somewhat expected: DLINQ intends to translate everything to SQL and avoid any client-side query evaluation, for performance reasons. I can understand this for the where and orderby clauses, but not for the select clause and the projection operator I was using.

On the other hand, Keith did recommend a practical solution: put the .NET method in SQL/CLR so that it can be called as a user-defined function (UDF) in SQL. Then map the UDF in DLINQ so that it can be used from the query:

  • First, create a SQL Server Project in Visual Studio and create the UDF. This is really nothing new and can be done in VS2005.

        [Microsoft.SqlServer.Server.SqlFunction]
        public static SqlString StringFormat(
            SqlString format,
            SqlString o1,
            SqlMoney o2)
        {
            return new SqlString(string.Format(
                format.Value, o1.Value, o2.Value)
            );
        }

  • Deploy the UDF in SQL Server. This can be done inside VS2005 (Build -> Deploy [project name]) or manually (see this post for steps).
  • You probably don’t want to manually write the mapping code for the UDF this time. So use SqlMetal, which is provided as part of .NET 3.5 Beta, to generate the database mapping file:

    sqlmetal /server:localhost /database:northwind
             /code:northwind.cs /functions
             /namespace:MyDLinqTest

  • Now that we have mapped the entire Northwind database with all its functions, we can use its tables and UDFs in a DLINQ query like this:

    Northwind db = new Northwind(ConnStr);

    var list = from p in db.Products
               where p.UnitPrice > 30
               select new {
                  p.ProductID,
                  Description
                    = db.StringFormat("{0} - {1}",
                      p.ProductName, p.UnitPrice)
               };

Wednesday, April 25, 2007

DLINQ and Projection Operator

Seems I was impressed too early. The projection operation in DLINQ is giving me a bit of grief:

var list = from p in prods
           where p.UnitPrice > 30
           select new {
              p.ProductID,
              Description = string.Format("{0} - {1}",
                  p.ProductName, p.UnitPrice)
           };

I got the following exception:

  

Yeah, I know String.Format can’t be translated to SQL, but I’m expecting it to be executed as .NET code. DLINQ is already using CodeDom to generate database access code anyway. Why can’t it generate SQL that retrieves only the directly mapped columns/fields, then produce the remaining fields in the anonymous type, like this:

class _AnonymousType
{
    //map to database columns
    private int     _productID;
    private string  _productName;
    private decimal _unitPrice;

    //...
    //generate Description as
    public string Description {
        get {
            return string.Format("{0} - {1}",
              _productName,
              _unitPrice);
        }
    }
}

Sure, this would require the projection operator to be part of the expression tree and the anonymous type to be generated at runtime rather than by the compiler, as it currently is. But limiting projections in DLINQ to only those operations that can be translated to SQL is probably too much of a sacrifice… In the example above, we can change string.Format to:

     Description = p.ProductName + "-" + p.UnitPrice

 But what if the projection involves more complicated function calls?
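One possible answer, sketched here as an assumption rather than something verified against the DLINQ beta: keep the filter on the IQueryable side and switch to LINQ to Objects with AsEnumerable before calling arbitrary .NET methods. The ProjectionDemo.Describe helper is an illustrative name, and in the usage shown afterwards an in-memory list stands in for the database table.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Product
{
    public int ProductID { get; set; }
    public string ProductName { get; set; }
    public decimal UnitPrice { get; set; }
}

public static class ProjectionDemo
{
    // 'prods' stands in for the DLINQ table; any IQueryable<Product> works here.
    public static List<string> Describe(IQueryable<Product> prods)
    {
        return prods
            // Still part of the IQueryable expression tree.
            .Where(p => p.UnitPrice > 30)
            // Ask the query provider only for the plain columns.
            .Select(p => new { p.ProductID, p.ProductName, p.UnitPrice })
            // Everything after this point runs in memory as LINQ to Objects.
            .AsEnumerable()
            .Select(p => string.Format("{0} - {1}", p.ProductName, p.UnitPrice))
            .ToList();
    }
}
```

With an in-memory source, `ProjectionDemo.Describe(products.AsQueryable())` exercises the same pipeline. The trade-off is that anything placed after AsEnumerable can no longer be pushed down to the database.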

Digging DLINQ

Just gave DLINQ a quick shot and it indeed is quite impressive. The way Table and Column attributes are used for database mapping is similar to that of XML Serialization:

    [Table(Name="Products")]
    public class Product
    {
        [Column]
        public int ProductID { get; set; }

        [Column]
        public string ProductName { get; set; }

        [Column]
        public decimal UnitPrice { get; set; }
    }

Now you can use LINQ syntax to perform strongly typed database queries:

public static void DLinqQuery()
{
    string ConnStr =
        @"Server=localhost;Database=Northwind;";

    DataContext ctx = new DataContext(ConnStr);
    Table<Product> prods = ctx.GetTable<Product>();

    //this is COOL!
    var list = from p in prods
               where p.UnitPrice > 30
               select p;

    foreach (var p in list)
    {
        Console.WriteLine("{0,5}\t{1,-32}\t{2,-6}",
            p.ProductID, p.ProductName, p.UnitPrice);
    }
}


Pretty neat, agree? I only wish this had been available earlier so that I didn’t have to deal with typed datasets. That was awful… Some further points of interest:

  • Is the underlying SQL generated by DLINQ efficient? (IQueryable and the expression tree do deserve their own blog entries.)
  • Mapping foreign key relationships to object graphs.
  • Stored procedure and transaction support.

Tuesday, April 24, 2007

How to price a CDS


As with any other derivative contract, the valuation of a CDS is (was?) primarily based on replicating-portfolio and no-arbitrage arguments. The picture below (from the Merrill Lynch Credit Derivative Handbook 2003) depicts a portfolio that replicates a credit default swap:

  • A fixed-rate corporate bond is acquired by the investor. This exposes the investor to two major risks: interest rate risk and credit risk. The bond pays the treasury rate plus a spread (Sc).

  • The bond purchase is repo financed. This is necessary because a par CDS is an unfunded transaction.

  • An interest rate swap is then arranged. The investor pays the fixed leg and receives LIBOR plus a spread (Ss). This effectively eliminates the interest rate risk component.

  • A single counterparty would package the bond and swap into a single asset swap to minimize counterparty risk.

  • The investor is now holding only the credit risk of the bond issuer, which is the same as for a CDS contract. Hence the "price" of this portfolio should be the same as that of a CDS with the same maturity (with adjustments for coupon interval, day count, yada yada...).

One thing to note is that this pricing framework is now mostly theoretical. With exponential growth in the credit market, CDS contracts have become more liquid than corporate bonds. The CDS is considered a plain vanilla product, and its spread quote is used to derive default probabilities and to price other credit derivative products.

Monday, April 23, 2007

How CDS Works

A basic introduction on the mechanics of CDS (credit default swaps).

Breaking Changes in Linq, Syntactic Sugar? Syntactic Heroin?

Breaking changes in Linq

Two days into the Linq jungle, I have noticed quite a few breaking changes in the C# 3.0 Beta compared to the CTP release last year. The majority of sample code off the web no longer compiles under the new version. The poor documentation (well, to be fair, it's documentation for a Beta...) made it a lot worse:

  • System.Query namespace is now System.Linq.
  • System.Expressions is now System.Linq.Expressions
  • Extension methods ToQueryable and ToEnumerable are now AsQueryable and AsEnumerable
  • The Sequence class has been replaced by Enumerable, although this likely has no impact on code compilation.

What's "Syntactic Sugar" anyway?

All the new C# 3.0 features have been described as "Syntactic Sugar" by someone at some point. This leads me to wonder what exactly Syntactic Sugar is: is OOP just a web of Syntactic Sugar (think Objective C, for example)? A "Member Method" covers up the underlying function with an added this pointer, and the almighty "Polymorphism" is just shorthand for function pointer tables... So where do we end with this stripping process? "MOV EBP, ESP"? Oh, wait, that's just "Syntactic Sugar" for a bunch of 0s and 1s...

Syntactic Heroin

Some have called operator overloading "Syntactic Heroin" because of its potential for abuse. I have the same concern about extension methods: I see the legitimate use in certain situations, but it's so easy to abuse the concept by using it to quickly "hack up" an incorrectly designed class contract. It's not hard to see that code would become virtually unmaintainable when extension methods are overused.

Ways to skin a LINQ query

Quite naturally, the very first step to understanding LINQ is to go behind the curtain and see how the magic select/where statement works. Although this has been explained many, many, many times, I still couldn't resist the temptation:

    public class Customer
    {
        public string Name { get; internal set; }
        public int Age { get; internal set; }
    }

    //...
            List<Customer> list = ...;
            var ret = from s in list
                      where s.Name == "Ming"
                      orderby s.Age descending
                      select s;
First, let's take the syntactic sugar out of the query. It's then equivalent to:
            Func<Customer, bool> p1
                = c => c.Name == "Ming";
            Func<Customer, int> p2
                = c => c.Age;
            var ret = list.Where(p1).OrderByDescending(p2);
Still, this uses lambda expressions. Taking those out, we have:
            var ret = list.Where<Customer>(
                new Func<Customer, bool>(
                    delegate(Customer c) {
                        return c.Name == "Ming";
                    }
            ));
            ret = ret.OrderByDescending<Customer, int>(
                new Func<Customer, int>(
                    delegate(Customer c) { return c.Age; }
            ));
And finally, without the Extension Method and var magic, the C# 2.0 way to write the query:
            IEnumerable<Customer> ret =
                System.Linq.Enumerable.Where<Customer>(
                    list,
                    new Func<Customer, bool>(
                        delegate(Customer c) {
                            return c.Name == "Ming";
                        }
            ));
            ret = System.Linq.Enumerable.
                OrderByDescending<Customer, int>(
                    ret,
                    new Func<Customer, int>(
                        delegate(Customer c) {
                            return c.Age;
                        }
            ));
It's now clear that LINQ is a compiler feature plus corresponding library support (System.Linq.Enumerable, for example). There really isn't anything new from the CLR's point of view.

Things to look at in C# 3.0

I'm actually a bit worried about the readability of MSIL code, which has been the primary method for me to understand what really happens behind the scenes.

With all the compiler-generated members (for generics, iterators and now lambda expressions, automatic properties and LINQ), the MSIL code generated for C# now tastes awfully like Managed C++...

By the next release, there probably is no need for an Obfuscator.

Friday, April 20, 2007

Under the Hood of Iterators and yield statement

Took a brief look at the mechanics of C# iterators and the yield statement, more in terms of the MSIL-level implementation than just how to use them.

  • The compiler generates a state machine implementation of the IEnumerator interface internally.
  • Each yield return statement produces a separate state, while each yield break moves the machine to its terminal state.

For example, for the function here:
        private string[] _names = { "Foo", "Bar", "Poo"};

        public IEnumerator GetEnumerator() {
            for (int i = 0; i < _names.Length; ++i) {
                yield return _names[i];
            }
        }
The compiler produces a state machine with one working state and a member variable that keeps track of the current index into _names. But for the following code:
        public IEnumerator GetEnumerator() {
            yield return "Foo";
            yield return "Bar";
            yield return "Poo";
        }
The compiler would generate a state machine with 3 working states.

It seems Reflector chokes on compiler-produced iterator code. I don't expect it to reverse engineer the state machine into yield statements, but the C# code it produces is totally out of whack. I ended up having to read the MSIL to understand what's going on.
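To make the three-state machine concrete, here is a hand-written approximation of it. This is a sketch, not the compiler's actual output: the class name NamesEnumerator and the state numbering are my own assumptions, and the real generated class has a mangled name and more plumbing.

```csharp
using System;
using System.Collections;

// Hand-written approximation of the enumerator generated for three yield returns.
public class NamesEnumerator : IEnumerator
{
    // 0 = nothing returned yet, 1..3 = after the 1st..3rd yield, -1 = finished.
    private int _state;
    private object _current;

    public object Current { get { return _current; } }

    public bool MoveNext()
    {
        switch (_state)
        {
            case 0: _current = "Foo"; _state = 1; return true;
            case 1: _current = "Bar"; _state = 2; return true;
            case 2: _current = "Poo"; _state = 3; return true;
            default: _state = -1; return false;   // terminal state
        }
    }

    // Compiler-generated iterators do not support Reset either.
    public void Reset() { throw new NotSupportedException(); }
}
```

Each call to MoveNext resumes from the stored state, which is what allows the original method body to "pause" at every yield return.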

The Orcas is here

  • Visual Studio "Orcas" (Beta) is now available for download (need MSDN Subscription).
  • Or you can download "Orcas" express editions here (Community Technology Preview).

Thursday, April 19, 2007

CME Credit Derivative Future Contracts

CME is tapping into the $30 trillion credit market with its single-name and index credit default swap futures contracts. Eurex is also set to launch futures contracts on iTraxx CDS indices. This is hardly surprising given the hyper growth in the OTC credit derivative market in the last couple of years (also in the news today: the overall size of the credit derivative market has been growing at 100% three years in a row). It would be interesting to see the impact of exchange involvement on the overall credit market (e.g. on counterparty credit risk, liquidity, etc.).

Wednesday, April 18, 2007

Visual Studio Tools for Office Second Edition

Technically speaking, VSTO 2005 SE is not really the second edition of VSTO. While VSTO 2005 is an independent version of Visual Studio 2005, VSTO SE is more of an add-on to different editions of Visual Studio. For example, VSTO 2005 SE can be installed on top of VSTO 2005. In addition, VSTO SE doesn't really cover all VSTO features. Notably, document-level customization and the actions pane are not available in SE. On the other hand, it does add support for Custom Task Panes and Ribbons.

And here is a decent overview of the mechanics of VSTO (Outlook is used as an example, but I imagine it should be almost identical for other Office products). To my surprise, VSTO still works its way through the good (bad? evil?) old COM interfaces (IDTExtensibility2, etc.), with AddInLoader.DLL wrapping around the managed code. It seems .NET is not treated as a first-class citizen even in Office 2007 (to be fair, VSTO does add some nice touches of the managed world, such as added security with a separate AppDomain).

Tuesday, April 17, 2007

Agile Software Development with SCRUM

This book has been my bedtime reading for the last couple of days. The authors' critique of traditional software development methodology and their comments on modeling software development as a dynamic process hit the bull's eye. In fact, back in college, when I was taught the waterfall approach in software engineering class, I had my doubts: there are just too many uncertainties in the process to make it a static assembly line. The Scrum methodology makes sense because it recognizes and utilizes the dynamics embedded in the software development process, instead of trying to control them. I might get Scrumatized some time after I get this CFA stuff over with...

Thursday, April 12, 2007

Wednesday, April 11, 2007

The top 10 traders by Trader Daily

The top 10 traders with the highest estimated income, according to Trader Daily. This is also mentioned in today's WSJ. Interestingly, the top trader, John Arnold, got to the top by taking the opposite position to Brian Hunter, the trader who is responsible for the collapse of the hedge fund Amaranth. Hunter himself was on the same top trader list last year...

Name              City                     Firm                            Age  Est. Income
John Arnold       Houston                  Centaurus Energy                33   $1.5-2B
James Simons      East Setauket, New York  Renaissance Technologies Corp.  68   $1.5-2B
Eddie Lampert     Greenwich, Connecticut   ESL Investments                 44   $1-1.5B
T. Boone Pickens  Dallas                   BP Capital                      78   $1-1.5B
Stevie Cohen      Stamford, Connecticut    SAC Capital Advisors            50   $1B
Stephen Feinberg  New York                 Cerberus Capital                47   $800-900M
Paul Tudor Jones  Greenwich, Connecticut   Tudor Investment Corp.          53   $700-800M
Bruce Kovner      New York                 Caxton Associates               62   $700-800M
Israel Englander  New York                 Millennium Management           58   $600-700M
David Shaw        New York                 D.E. Shaw & Co.                 55   $600-700M

Tuesday, April 10, 2007

Legacy Code = Code without tests

Saw this definition in a blog entry here. It originates from the book "Working Effectively with Legacy Code". This makes perfect sense in most cases: the fear of making changes in "legacy code" largely derives from the fact that the system is untestable. Before TDD became a rather common software development practice, "legacy code" normally referred to code without detailed documentation. I tend to think of unit/integration tests as a type of documentation under TDD. After all, there is no better way to describe the specification of an application than actually using it.

Thursday, April 5, 2007

MSBuild, NUnit and CruiseControl.Net

I thought this would be the de facto standard setup for most .Net projects these days. But it turns out that making the whole thing work is far from "standard". Here are the steps I took, for what it's worth:

  • MSBuild is installed with .Net Framework 2.0. Install CruiseControl from here, and NUnit from here.
  • Install MSBuild Community Tasks, which includes an NUnit task for MSBuild.
  • Now you can create an NUnit task in your MSBuild project like this:

       <Import Project="$(MSBuildExtensionsPath)\MSBuildCommunityTasks\MSBuild.Community.Tasks.Targets"/>
       <NUnit Assemblies="myTest.dll"
           ToolPath="C:/Program Files/NUnit 2.4/bin/"
           OutputXmlFile="TestResult.XML"/>
  • Set up the MSBuild task in CruiseControl. Strangely, the custom logger used in the example is NOT part of the CruiseControl installation. You need to download it here and put it into the CruiseControl installation folder.
  • Use NUnit with CruiseControl. Since NUnit is set up as part of MSBuild, the only thing left to do is to merge the test result output into the build log.
  • Finally, if CruiseControl is set up as a Windows service and you're compiling a C/C++ project with MSBuild, you might get this error:

          Fatal Error C1902: Program database manager mismatch; please check your installation.

    Believe me, this has nothing to do with the pdb files. Unfortunately, the link to the hotfix mentioned in the KB article is broken. An alternative is to set your CruiseControl service to run under a different user account than the default SYSTEM. (Yeah, I know...)

Wednesday, April 4, 2007

C-Runtime Libraries, DLL Hell and Side-by-Side Deployment

Starting with Windows XP, Microsoft has created a new "side-by-side" deployment model for native libraries as its latest attempt to solve the "DLL Hell" mess. The concept is pretty much the same as versioning in the managed world: libraries are versioned and signed, then installed in a global cache (the WinSxS folder). Different versions of the same library can co-exist in the cache. A manifest is embedded in executable files to identify the version of the library the application was built with.

The C runtime libraries (MSVCRxx.DLL, MSVCPxx.DLL, etc.) are treated the same way. The process will throw a nasty error message if the executable or any DLL doesn't have a manifest embedded. If a third-party library or an old DLL lying around doesn't have a manifest built in, MT.EXE can be used to embed the manifest XML into the DLL directly:

    mt /outputresource:yourlib.dll;#2 -manifest yourlib.dll.manifest
More references in MSDN:

Hedge Fund and Managed Account

A good discussion on managed accounts in the hedge fund industry. I can understand why managed accounts can become an operational and logistical hassle, at least from an IT infrastructure point of view.

Monday, April 2, 2007

Wireless Break Throughs on April 1st (And 4/1st only)

First, free broadband access through local municipal sewage lines (A.K.A toilets) for the pigeon cluster search company:

In case you don't have a bathroom at home, don't worry: there is the more widely applicable RFC that transmits IP packets through the Semaphore Flag Signaling System (A.K.A. hand-waving signals). Try saying "Hi" to your neighbour like this:
           SFS        0     __0      \0      |0
                     /||      ||      ||      ||
                     / \     / \     / \     / \
                      A       B       C       D
           IP-SFS    0x00    0x01    0x02    0x03
           -----------------------------------------
           SFS        0/      0__     0     __0
                     ||      ||      ||\     /|
                     / \     / \     / \     / \
                      E       F       G       H
           IP-SFS    0x04    0x05    0x06    0x07
           -----------------------------------------
           SFS       \0      |0__     0|      0/
                     /|       |      /|      /|
                     / \     / \     / \     / \
                      I       J       K       L
           IP-SFS    0x08    0x09    0x0A    0x0B
           -----------------------------------------
           SFS        0__     0     _\0     __0|
                     /|      /|\      |       |
                     / \     / \     / \     / \
                      M       N       O       P
           IP-SFS    0x0C    0x0D    0x0E    0x0F

              Figure 3: IP-SFS Data Signals.