Best Programming Language for Algorithmic Trading Systems?

By QuantStart Team

One of the most frequent questions I receive in the QS mailbag is "What is the best programming language for algorithmic trading?". The short answer is that there is no "best" language. Strategy parameters, performance, modularity, development, resiliency and cost must all be considered. This article will outline the necessary components of an algorithmic trading system architecture and how decisions regarding implementation affect the choice of language.

Firstly, the major components of an algorithmic trading system will be considered, such as the research tools, portfolio optimiser, risk manager and execution engine. Subsequently, different trading strategies will be examined and how they affect the design of the system. In particular the frequency of trading and the likely trading volume will both be discussed.

Once the trading strategy has been selected, it is necessary to architect the entire system. This includes choice of hardware, the operating system(s) and system resiliency against rare, potentially catastrophic events. While the architecture is being considered, due regard must be paid to performance, both for the research tools and for the live execution environment.

What Is The Trading System Trying To Do?

Before deciding on the "best" language with which to write an automated trading system it is necessary to define the requirements. Is the system going to be purely execution based? Will the system require a risk management or portfolio construction module? Will the system require a high-performance backtester? For most strategies the trading system can be partitioned into two categories: Research and signal generation.

Research is concerned with evaluating a strategy's performance over historical data. The process of evaluating a trading strategy over prior market data is known as backtesting. The data size and algorithmic complexity will have a big impact on the computational intensity of the backtester. CPU speed and concurrency are often the limiting factors in optimising research execution speed.
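As a concrete illustration of what a backtester computes, the sketch below runs a long-only moving-average crossover over a list of closing prices. The strategy and the function names are assumptions chosen for brevity, not a recommended system; a realistic backtester would also model transaction costs, slippage and position sizing.

```python
from typing import Sequence


def sma(prices: Sequence[float], window: int, end: int) -> float:
    """Simple moving average of the `window` prices ending at index `end` (inclusive)."""
    return sum(prices[end - window + 1 : end + 1]) / window


def backtest_ma_crossover(prices: Sequence[float], fast: int, slow: int) -> float:
    """Cumulative return of a long-only moving-average crossover strategy.

    The position held over bar t -> t+1 is decided using prices up to and
    including bar t only, which avoids look-ahead bias.
    """
    if not 0 < fast < slow:
        raise ValueError("require 0 < fast < slow")
    equity = 1.0
    for t in range(slow - 1, len(prices) - 1):
        # Long (1) when the fast average is above the slow average, otherwise flat (0).
        position = 1 if sma(prices, fast, t) > sma(prices, slow, t) else 0
        bar_return = prices[t + 1] / prices[t] - 1.0
        equity *= 1.0 + position * bar_return
    return equity - 1.0
```

A call such as `backtest_ma_crossover(closes, 10, 50)` scores a single parameter pair; repeating it over many pairs and many years of data is where the computational cost builds up.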

Signal generation is concerned with generating a set of trading signals from an algorithm and sending the resulting orders to the market, usually via a brokerage. For certain strategies a high level of performance is required. I/O issues such as network bandwidth and latency are often the limiting factor in optimising execution systems. Thus the choice of language for each component of your entire system may be quite different.

Type, Frequency and Volume of Strategy

The type of algorithmic strategy employed will have a substantial impact on the design of the system. It will be necessary to consider the markets being traded, the connectivity to external data vendors, the frequency and volume of the strategy, the trade-off between ease of development and performance optimisation, as well as any custom hardware, including co-located custom servers, GPUs or FPGAs that might be necessary.

The technology choices for a low-frequency US equities strategy will be vastly different from those of a high-frequency statistical arbitrage strategy trading on the futures market. Prior to the choice of language, the various data vendors relevant to the strategy at hand must be evaluated.

It will be necessary to consider connectivity to the vendor, structure of any APIs, timeliness of the data, storage requirements and resiliency in the face of a vendor going offline. It is also wise to possess rapid access to multiple vendors! Various instruments all have their own storage quirks, examples of which include multiple ticker symbols for equities and expiration dates for futures (not to mention any specific OTC data). This needs to be factored into the platform design.
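To illustrate one such quirk, futures contracts are commonly identified by a root symbol plus a month code and year digits, so a December 2024 E-mini S&P 500 contract is often written ESZ4. The sketch below assumes that convention; vendors differ on the number of year digits and on separators, so the exact format is an assumption to check against each feed. Because a one-digit year repeats every decade, storing the full expiry date alongside the symbol is the safer design.

```python
from datetime import date

# Widely used futures month codes: January = F, February = G, ..., December = Z.
MONTH_CODES = "FGHJKMNQUVXZ"


def futures_symbol(root: str, expiry: date, year_digits: int = 1) -> str:
    """Build a contract symbol such as 'ESZ4' from a root and an expiry date."""
    month_code = MONTH_CODES[expiry.month - 1]
    year = str(expiry.year)[-year_digits:]
    return f"{root}{month_code}{year}"


def parse_month_code(code: str) -> int:
    """Inverse mapping: month code letter -> calendar month (1-12)."""
    return MONTH_CODES.index(code) + 1
```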

Frequency of strategy is likely to be one of the biggest drivers of how the technology stack will be defined. Strategies employing data more frequently than minutely or secondly bars require significant consideration with regards to performance.

A strategy exceeding secondly bars (i.e. tick data) leads to a performance driven design as the primary requirement. For high frequency strategies a substantial amount of market data will need to be stored and evaluated. Software such as HDF5 or kdb+ is commonly used for these roles.
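Many strategies never consume raw ticks directly but aggregate them into bars first. A minimal sketch, assuming time-ordered `(timestamp, price, size)` tuples with timestamps in seconds; the `Bar` type and `ticks_to_bars` function are hypothetical names, not part of any library.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Bar:
    start: int  # bar start time, in seconds
    open: float
    high: float
    low: float
    close: float
    volume: int


def ticks_to_bars(ticks: Iterable[Tuple[float, float, int]], seconds: int = 1) -> List[Bar]:
    """Aggregate time-ordered (timestamp, price, size) ticks into fixed-interval OHLCV bars."""
    bars: List[Bar] = []
    for ts, price, size in ticks:
        start = int(ts // seconds) * seconds
        if bars and bars[-1].start == start:
            # Same interval: update the current bar in place.
            bar = bars[-1]
            bar.high = max(bar.high, price)
            bar.low = min(bar.low, price)
            bar.close = price
            bar.volume += size
        else:
            # New interval: open a new bar with this tick.
            bars.append(Bar(start, price, price, price, price, size))
    return bars
```

Intervals that contain no ticks produce no bar here; whether to forward-fill them is a design decision that depends on the strategy.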

In order to process the extensive volumes of data needed for HFT applications, an extensively optimised backtester and execution system must be used. C/C++ (possibly with some assembler) is likely to be the strongest language candidate. Ultra-high frequency strategies will almost certainly require custom hardware such as FPGAs, exchange co-location and kernel/network interface tuning.

Research Systems

Research systems typically involve a mixture of interactive development and automated scripting. The former often takes place within an IDE such as Visual Studio, MatLab or R Studio. The latter involves extensive numerical calculations over numerous parameters and data points. This calls for a language that provides a straightforward environment in which to test code, but that also offers sufficient performance to evaluate strategies over multiple parameter dimensions.

Typical IDEs in this space include Microsoft Visual C++/C#, which contains extensive debugging utilities, code completion capabilities (via "Intellisense") and straightforward overviews of the entire project stack (via the database ORM, LINQ); MatLab, which is designed for extensive numerical linear algebra and vectorised operations, but in an interactive console manner; R Studio, which wraps the R statistical language console in a fully-fledged IDE; Eclipse IDE for Linux Java and C++; and semi-proprietary IDEs such as Enthought Canopy for Python, which include data analysis libraries such as NumPy, SciPy, scikit-learn and pandas in a single interactive (console) environment.

For numerical backtesting, all of the above languages are suitable, although it is not necessary to utilise a GUI/IDE as the code will be executed "in the background". The prime consideration at this stage is that of execution speed. A compiled language (such as C++) is often useful if the backtesting parameter dimensions are large. If that is the case, remember to be wary of overfitting: the more parameter combinations that are tested, the more likely it is that the best-looking one simply fits noise in the historical data!
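The cost of a parameter sweep grows multiplicatively with each dimension. The sketch below evaluates every combination in a grid; the toy `toy_objective` stands in for a full backtest and, like the parameter names, is purely hypothetical. Ten fast windows, ten slow windows and three stop levels already mean 300 backtests, and every extra combination tested is another chance to fit noise.

```python
import itertools
from typing import Any, Callable, Dict, Sequence, Tuple


def grid_search(objective: Callable[..., float],
                grid: Dict[str, Sequence[Any]]) -> Tuple[Dict[str, Any], float, int]:
    """Evaluate `objective` on every parameter combination in `grid`.

    Returns (best_params, best_score, number_of_evaluations).
    """
    names = list(grid)
    best_params: Dict[str, Any] = {}
    best_score = float("-inf")
    evaluations = 0
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(**params)  # in practice: one full backtest
        evaluations += 1
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, evaluations


def toy_objective(fast: int, slow: int, stop: float) -> float:
    """Hypothetical stand-in for a backtest score, peaking at fast=5, slow=100."""
    return -(fast - 5) ** 2 - (slow - 100) ** 2 - stop
```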

Interpreted languages such as Python often make use of high-performance libraries such as NumPy/pandas for the backtesting step, in order to maintain a reasonable degree of competitiveness with compiled equivalents. Ultimately the language chosen for the backtesting will be determined by specific algorithmic needs as well as the range of libraries available in the language (more on that below). However, the language used for the backtester and research environments can be completely independent of those used in the portfolio construction, risk management and execution components, as will be seen.

Portfolio Construction and Risk Management

The portfolio construction and risk management components are often overlooked by retail algorithmic traders. This is almost always a mistake. These tools provide the mechanism by which capital will be preserved. They not only attempt to reduce the number of "risky" bets, but also minimise churn of the trades themselves, reducing transaction costs.

Sophisticated versions of these components can have a significant effect on the quality and consistency of profitability. It is straightforward to create a stable of strategies as the portfolio construction mechanism and risk manager can easily be modified to handle multiple systems. Thus they should be considered essential components at the outset of the design of an algorithmic trading system.

The job of the portfolio construction system is to take a set of desired trades and produce the set of actual trades that minimise churn, maintain exposures to various factors (such as sectors, asset classes, volatility etc) and optimise the allocation of capital to various strategies in a portfolio.
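One simple way to turn desired positions into actual trades while limiting churn is to skip small adjustments. The sketch below assumes integer share positions keyed by symbol and a hypothetical `min_trade` threshold; it always permits full exits so that small residual positions are not left open. A production system would more typically solve an optimisation with transaction-cost and exposure constraints.

```python
from typing import Dict


def target_to_orders(current: Dict[str, int], target: Dict[str, int],
                     min_trade: int = 0) -> Dict[str, int]:
    """Signed order quantities (+buy / -sell) that move `current` towards `target`."""
    orders: Dict[str, int] = {}
    for symbol in sorted(set(current) | set(target)):
        wanted = target.get(symbol, 0)
        delta = wanted - current.get(symbol, 0)
        if delta == 0:
            continue
        # Always allow full exits; otherwise skip trades too small to justify their cost.
        if wanted == 0 or abs(delta) >= min_trade:
            orders[symbol] = delta
    return orders
```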

Portfolio construction often reduces to a linear algebra problem (such as a matrix factorisation) and hence performance is highly dependent upon the effectiveness of the numerical linear algebra implementation available. Common libraries include uBLAS, LAPACK and NAG for C++. MatLab also possesses extensively optimised matrix operations. Python utilises NumPy/SciPy for such computations. A frequently rebalanced portfolio will require a compiled (and well optimised!) matrix library to carry this step out, so as not to bottleneck the trading system.
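As one example of the linear algebra involved, the fully invested minimum-variance portfolio has weights w = C^-1 1 / (1^T C^-1 1), where C is the covariance matrix of asset returns and 1 is a vector of ones. Minimum-variance is chosen here purely for illustration. The small pure-Python solver keeps the sketch dependency-free; in practice one would call an optimised routine such as NumPy's `numpy.linalg.solve`, which is backed by LAPACK.

```python
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy; inputs untouched
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def min_variance_weights(cov: List[List[float]]) -> List[float]:
    """Fully invested minimum-variance weights w = C^-1 1 / (1^T C^-1 1)."""
    # Solve C z = 1 rather than inverting C explicitly, then normalise to sum to 1.
    z = solve(cov, [1.0] * len(cov))
    total = sum(z)
    return [zi / total for zi in z]
```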

Risk management is another extremely important part of an algorithmic trading system. Risk can come in many forms: Increased volatility (although this may be seen as desirable for certain strategies!), increased correlations between asset classes, counter-party default, server outages, "black swan" events and undetected bugs in the trading code, to name a few.

Risk management components try to anticipate the effects of excessive volatility and correlation between asset classes and their subsequent effect(s) on trading capital. Often this reduces to a set of statistical computations such as Monte Carlo "stress tests". This is very similar to the computational needs of a derivatives pricing engine and as such will be CPU-bound. These simulations are highly parallelisable (see below) and, to a certain degree, it is possible to "throw hardware at the problem".
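A minimal Monte Carlo value-at-risk sketch, assuming independent, normally distributed daily log-returns with constant drift `mu` and volatility `sigma`. That model is a deliberate simplification (real stress tests usually use fat-tailed or historical scenarios and a full correlation structure), and the function name and parameters are hypothetical. Each simulated path is independent of every other, which is what makes the computation easy to parallelise.

```python
import math
import random
from typing import Optional


def monte_carlo_var(value: float, mu: float, sigma: float, horizon_days: int,
                    n_paths: int = 100_000, confidence: float = 0.99,
                    seed: Optional[int] = None) -> float:
    """Monte Carlo value-at-risk, returned as a positive currency loss.

    Assumes i.i.d. normal daily log-returns N(mu, sigma^2), for illustration only.
    """
    rng = random.Random(seed)
    losses = []
    for _ in range(n_paths):
        # Each path is simulated independently of all other paths.
        log_return = sum(rng.gauss(mu, sigma) for _ in range(horizon_days))
        losses.append(value * (1.0 - math.exp(log_return)))
    losses.sort()
    # The loss exceeded in only (1 - confidence) of the simulated paths.
    index = min(int(confidence * n_paths), n_paths - 1)
    return losses[index]
```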

Execution Systems

The job of the execution system is to receive filtered trading signals from the portfolio construction and risk management components and send them on to a brokerage or other means of market access. For the majority of retail algorithmic trading strategies this involves an API or FIX connection to a brokerage such as Interactive Brokers. The primary considerations when deciding upon a language include quality of the API, language-wrapper availability for an API, execution frequency and the anticipated slippage.
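For the FIX route, messages are sequences of `tag=value` fields separated by the SOH (0x01) character, with a BodyLength (tag 9) and a three-digit CheckSum (tag 10) computed over the message. The sketch below only assembles a message; it is not a FIX engine. A real connection also needs the session layer (logon, heartbeats, sequence-number management and resends), usually provided by an engine such as QuickFIX, and each broker documents its own required fields. The FIX 4.2 version, CompIDs and field values below are placeholders.

```python
from typing import List, Tuple

SOH = "\x01"  # FIX field delimiter


def fix_message(msg_type: str, fields: List[Tuple[int, str]],
                begin_string: str = "FIX.4.2") -> str:
    """Assemble a FIX tag=value message, filling in BodyLength (9) and CheckSum (10)."""
    body = f"35={msg_type}{SOH}" + "".join(f"{tag}={value}{SOH}" for tag, value in fields)
    # BodyLength counts the bytes after the 9= field up to and including the SOH before 10=.
    header = f"8={begin_string}{SOH}9={len(body.encode('ascii'))}{SOH}"
    # CheckSum is the sum of all preceding bytes modulo 256, written as three digits.
    checksum = sum((header + body).encode("ascii")) % 256
    return f"{header}{body}10={checksum:03d}{SOH}"


# Hypothetical NewOrderSingle (35=D): buy (54=1) 100 AAPL at market (40=1).
new_order = fix_message("D", [
    (49, "MYFIRM"), (56, "BROKER"), (34, "2"), (52, "20240102-14:30:00"),
    (11, "ORD0001"), (21, "1"), (55, "AAPL"), (54, "1"), (38, "100"),
    (40, "1"), (60, "20240102-14:30:00"),
])
```

Printing `new_order.replace(SOH, "|")` gives the pipe-delimited form commonly seen in FIX logs.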

The "quality" of the API refers to how well documented it is, what sort of performance it provides, whether it needs standalone software to be accessed or whether a gateway can be established in a headless fashion (i.e. no GUI). In the case of Interactive Brokers, the Trader Workstation (TWS) tool needs to be running in a GUI environment in order to access their API. I once had to install a Desktop Ubuntu edition onto an Amazon cloud server to access Interactive Brokers remotely, purely for this reason!

Most APIs will provide a C++ and/or Java interface. It is usually up to the community to develop language-specific wrappers for C#, Python, R, Excel and MatLab. Note that with every additional plugin utilised (especially API wrappers) there is scope for bugs to creep into the system. Always test plugins of this sort and ensure they are actively maintained. A worthwhile gauge is to see how many new updates to a codebase have been made in recent months.

Execution frequency is of the utmost importance in the execution algorithm. Note that hundreds of orders may be sent every minute and as such performance is critical. Slippage will be incurred through a badly-performing execution system and this will have a dramatic impact on profitability.
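Slippage can only be managed if it is measured. A minimal sketch, assuming slippage is defined against the price at the moment the signal was generated (one common convention, often called arrival-price or implementation-shortfall slippage) and expressed in basis points, with positive values meaning a worse fill. The tuple layout and function names are hypothetical.

```python
from typing import List, Tuple


def slippage_bps(side: str, decision_price: float, fill_price: float) -> float:
    """Slippage of one fill versus the decision price, in basis points (positive = worse)."""
    if side not in ("BUY", "SELL"):
        raise ValueError(f"unknown side: {side!r}")
    sign = 1.0 if side == "BUY" else -1.0  # paying up hurts buys; selling lower hurts sells
    return sign * (fill_price - decision_price) / decision_price * 10_000


def weighted_slippage_bps(fills: List[Tuple[str, float, float, int]]) -> float:
    """Quantity-weighted average slippage over (side, decision_price, fill_price, qty) fills."""
    total_qty = sum(qty for _, _, _, qty in fills)
    return sum(slippage_bps(side, d, f) * qty for side, d, f, qty in fills) / total_qty
```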

Statically-typed languages (see below) such as C++/Java are generally optimal for execution, but there is a trade-off in development time, testing and ease of maintenance. Dynamically-typed languages, such as Python and Perl, are now generally "fast enough". Always make sure the components are designed in a modular fashion (see below) so that they can be "swapped out" as the system scales.

Architectural Planning and Development Process

The components of a trading system, its frequency and volume requirements have been discussed above, but system infrastructure has yet to be covered. Those acting as a retail trader or working in a small fund will likely be "wearing many hats". It will be necessary to cover the alpha model, risk management and execution parameters, and also the final implementation of the system. Before delving into specific languages the design of an optimal system architecture will be discussed.

Separation of Concerns

One of the most important decisions that must be made at the outset is how to "separate the concerns" of a trading system. In software development, this essentially means how to break up the different aspects of the trading system into separate modular components.

By exposing interfaces at each of the components it is easy to swap out parts of the system for other versions that aid performance, reliability or maintenance, without modifying any external dependency code. This is the "best practice" for such systems. For strategies at lower frequencies such practices are advised. For ultra high frequency trading the rulebook might have to be ignored in favour of tweaking the system for even more performance, and a more tightly coupled system may be desirable.
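A minimal sketch of exposing an interface at a component boundary, using Python's `abc` module. The `ExecutionHandler`, `SimulatedExecutionHandler` and `Strategy` names are hypothetical; a live implementation wrapping a broker API would subclass the same interface, and the strategy code would not need to change.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Order:
    symbol: str
    quantity: int  # positive = buy, negative = sell


class ExecutionHandler(ABC):
    """The only contract the rest of the system relies on."""

    @abstractmethod
    def submit(self, order: Order) -> str:
        """Send an order and return an order identifier."""


class SimulatedExecutionHandler(ExecutionHandler):
    """Records every order as filled immediately; suitable for backtests."""

    def __init__(self) -> None:
        self.filled: List[Order] = []

    def submit(self, order: Order) -> str:
        self.filled.append(order)
        return f"SIM-{len(self.filled)}"


class Strategy:
    """Depends only on the ExecutionHandler interface, never on a concrete broker."""

    def __init__(self, execution: ExecutionHandler) -> None:
        self.execution = execution

    def on_signal(self, symbol: str, quantity: int) -> str:
        return self.execution.submit(Order(symbol, quantity))
```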

Creating a component map of an algorithmic trading system is worth an article in itself. However, an optimal approach is to make sure there are separate components for the historical and real-time market data inputs, data storage, data access API, backtester, strategy parameters, portfolio construction, risk management and automated execution systems.

For instance, if the data store currently being used is underperforming, even at significant levels of optimisation, it can be swapped out with minimal rewrites to the data ingestion or data access API. As far as the backtester and subsequent components are concerned, there is no difference.
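The sketch below shows the idea for the data store: the consumer depends only on a `PriceStore` protocol, so an in-memory store and a SQLite-backed store (via the standard-library `sqlite3` module) are interchangeable. The class names and the `bars(symbol, ts, close)` table schema are assumptions for illustration.

```python
import sqlite3
from typing import Dict, List, Protocol


class PriceStore(Protocol):
    """The data access API the backtester depends on."""

    def closes(self, symbol: str) -> List[float]: ...


class InMemoryPriceStore:
    def __init__(self, data: Dict[str, List[float]]) -> None:
        self._data = data

    def closes(self, symbol: str) -> List[float]:
        return list(self._data.get(symbol, []))


class SQLitePriceStore:
    """Assumes a table bars(symbol TEXT, ts INTEGER, close REAL)."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def closes(self, symbol: str) -> List[float]:
        rows = self._conn.execute(
            "SELECT close FROM bars WHERE symbol = ? ORDER BY ts", (symbol,)
        ).fetchall()
        return [row[0] for row in rows]


def average_close(store: PriceStore, symbol: str) -> float:
    """A backtester-side consumer: it never knows which backend it is using."""
    prices = store.closes(symbol)
    return sum(prices) / len(prices)
```

Moving to a different backend would mean writing one more class with a `closes` method; `average_close` stays untouched.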

Another benefit of separated components is that it allows a variety of programming languages to be used in the overall system. There is no need to be restricted to a single language if the communication method of the components is language independent. This will be the case if they are communicating via TCP/IP, ZeroMQ or some other language-independent protocol.
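ZeroMQ takes care of message framing for you, but the principle can be shown with the standard library alone. The sketch below sends JSON messages over a socket, each prefixed by its length as a 4-byte big-endian integer. That framing convention is an assumption rather than a standard; the point is that any language with sockets and a JSON parser, C++ included, can speak it.

```python
import json
import socket
import struct


def send_message(sock: socket.socket, message: dict) -> None:
    """Send one JSON message prefixed by its byte length (4-byte big-endian unsigned int)."""
    payload = json.dumps(message).encode("utf-8")
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def _recv_exactly(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; recv() may return fewer bytes than requested."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        buf += chunk
    return buf


def recv_message(sock: socket.socket) -> dict:
    (length,) = struct.unpack(">I", _recv_exactly(sock, 4))
    return json.loads(_recv_exactly(sock, length).decode("utf-8"))
```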

As a concrete example, consider the case of a backtesting system being written in C++ for "number crunching" performance, while the portfolio manager and execution systems are written in Python using SciPy and IBPy.

Performance Considerations

Performance is a significant consideration for most trading strategies. For higher frequency strategies it is the most important factor. "Performance" covers a wide range of issues, such as algorithmic execution speed, network latency, bandwidth, data I/O, concurrency/parallelism and scaling. Each of these areas is individually covered by large textbooks, so this article will only scratch the surface of each topic. Architecture and language choice will now be discussed in terms of their effects on performance.

The prevailing wisdom as stated by Donald Knuth, one of the fathers of Computer Science, is that "premature optimisation is the root of all evil". This is almost always the case, except when building a high frequency trading algorithm! For those who are interested in lower frequency strategies, a common approach is to build a system in the simplest way possible and only optimise as bottlenecks begin to appear.

Profiling tools are used to determine where bottlenecks arise. Profiles can be made for all of the factors listed above, either in a MS Windows or Linux environment. There are many operating system and language tools available to do so, as well as third party utilities. Language choice will now be discussed in the context of performance.
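In Python, the standard-library `cProfile` and `pstats` modules are one such tool. The sketch profiles a deliberately naive moving average, which recomputes every window sum, alongside an O(n) running-sum version; the function names are hypothetical.

```python
import cProfile
import pstats
from typing import Callable, List


def slow_moving_average(prices: List[float], window: int) -> List[float]:
    """O(n * window): recomputes each window sum from scratch."""
    return [sum(prices[i - window + 1 : i + 1]) / window for i in range(window - 1, len(prices))]


def fast_moving_average(prices: List[float], window: int) -> List[float]:
    """O(n): maintains a running sum (may accumulate tiny floating-point error)."""
    running = sum(prices[:window])
    out = [running / window]
    for i in range(window, len(prices)):
        running += prices[i] - prices[i - window]
        out.append(running / window)
    return out


def profile(func: Callable, *args) -> pstats.Stats:
    """Run func(*args) under cProfile and return stats sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    return pstats.Stats(profiler).sort_stats("cumulative")
```

Calling `profile(slow_moving_average, prices, 50).print_stats(5)` prints the five most expensive entries by cumulative time, which is often enough to locate the bottleneck.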

C++, Java, Python, R and MatLab all contain high-performance libraries (either as part of their standard or externally) for basic data structure and algorithmic work. C++ ships with the Standard Template Library, while Python contains NumPy/SciPy. Common mathematical tasks are to be found in these libraries and it is rarely beneficial to write a new implementation.

One exception is if highly customised hardware architecture is required and an algorithm is making extensive use of proprietary extensions (such as custom caches). However, often "reinvention of the wheel" wastes time that could be better spent developing and optimising other parts of the trading infrastructure. Development time is extremely precious, especially for sole developers.

Latency is often an issue for the execution system, as the research tools are usually situated on the same machine as their data. For execution, latency can occur at multiple points along the path. Databases must be consulted (disk/network latency), signals must be generated (operating system and kernel messaging latency), trade signals sent (NIC latency) and orders processed (exchange systems' internal latency).

For higher frequency operations it is necessary to become intimately familiar with kernel optimisation as well as optimisation of network transmission. This is a deep area and is significantly beyond the scope of the article, but if a UHFT algorithm is desired then be aware of the depth of knowledge required!

Caching is very useful in the toolkit of a quantitative trading developer. Caching refers to the concept of storing frequently accessed data in a manner which allows higher-performance access, at the expense of potential staleness of the data. A common use case occurs in web development when taking data from a disk-backed relational database and putting it into memory. Any subsequent requests for the data do not have to "hit the database" and so performance gains can be significant.

For trading situations caching can be extremely beneficial. For instance, the current state of a strategy portfolio can be stored in a cache until it is rebalanced, such that the list doesn't need to be regenerated upon each loop of the trading algorithm. Such regeneration is likely to be a high CPU or disk I/O operation.
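A minimal sketch of that pattern, assuming a hypothetical `build_snapshot` callable that performs the expensive regeneration (for instance a database query or a full recalculation). The cache is simply cleared after each rebalance so that the next read rebuilds it.

```python
from typing import Any, Callable

_MISSING = object()  # sentinel, so a snapshot that is legitimately None still counts as cached


class PortfolioStateCache:
    """Caches an expensive portfolio snapshot until the next rebalance."""

    def __init__(self, build_snapshot: Callable[[], Any]) -> None:
        self._build = build_snapshot
        self._snapshot: Any = _MISSING
        self.rebuilds = 0  # counts expensive regenerations, useful for monitoring

    def get(self) -> Any:
        if self._snapshot is _MISSING:
            self._snapshot = self._build()
            self.rebuilds += 1
        return self._snapshot

    def invalidate(self) -> None:
        """Call after a rebalance so that the next read rebuilds the snapshot."""
        self._snapshot = _MISSING
```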

However, caching is not without its own issues. Regenerating cache data all at once, due to the volatile nature of cache storage, can place significant demand on infrastructure. Another issue is dog-piling, where multiple processes simultaneously try to regenerate the same cache entry under extremely high load, which can lead to cascade failure.
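One common defence against dog-piling is to let a single caller regenerate a missing entry while concurrent callers wait for its result. The sketch below does this with a lock and a re-check inside it (a pattern sometimes called single-flight); the class name and `regenerate` callable are hypothetical, and the single global lock is a simplification that serialises regeneration across all keys.

```python
import threading
from typing import Any, Callable, Dict, Hashable


class SingleFlightCache:
    """Only one thread regenerates a missing entry; concurrent callers wait and reuse it."""

    def __init__(self, regenerate: Callable[[Hashable], Any]) -> None:
        self._regenerate = regenerate
        self._data: Dict[Hashable, Any] = {}
        # One lock for all keys keeps the sketch short; per-key locks would let
        # different keys regenerate concurrently.
        self._lock = threading.Lock()

    def get(self, key: Hashable) -> Any:
        if key in self._data:  # fast path: cache hit, no locking
            return self._data[key]
        with self._lock:
            # Re-check: another thread may have filled the entry while we waited.
            if key not in self._data:
                self._data[key] = self._regenerate(key)
            return self._data[key]
```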

Dynamic memory allocation is an expensive operation in software execution. Thus it is imperative for higher performance trading applications to be well aware of how memory is being allocated and deallocated during program flow. Languages such as Java, C# and Python all perform automatic garbage collection, which refers to the deallocation of dynamically allocated memory once objects are no longer referenced.
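A common technique in latency-sensitive code is to preallocate objects and recycle them instead of allocating one per message. The sketch below shows the pattern in Python with hypothetical `Order` and `OrderPool` classes. In CPython it mostly illustrates the idea, since assigning new field values still creates objects; the real payoff comes in Java (fewer garbage-collection pauses) and C++ (no heap allocation on the hot path).

```python
from typing import List


class Order:
    """A mutable, reusable order record; __slots__ avoids a per-instance __dict__."""

    __slots__ = ("symbol", "quantity", "price")

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        self.symbol = ""
        self.quantity = 0
        self.price = 0.0


class OrderPool:
    """Preallocates Order objects up front and recycles them."""

    def __init__(self, size: int) -> None:
        self._free: List[Order] = [Order() for _ in range(size)]

    @property
    def available(self) -> int:
        return len(self._free)

    def acquire(self) -> Order:
        if not self._free:
            # Fail loudly instead of silently allocating on the hot path.
            raise RuntimeError("order pool exhausted")
        return self._free.pop()

    def release(self, order: Order) -> None:
        order.reset()
        self._free.append(order)
```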

Garbage collection is extremely useful during development as it reduces errors and aids readability. However, it is often sub-optimal for certain high frequency trading strategies. Custom garbage collection is often desired for these cases. In Java, for instance, by tuning the garbage collector and heap configuration, it is possible to obtain high performance for HFT strategies.

C++ doesn't provide a native garbage collector and so it is necessary to handle all memory allocation/deallocation as part of an object's implementation. While this is error prone (mistakes can lead to dangling pointers or memory leaks), it is extremely useful to have fine-grained control of how objects appear on the heap for certain applications. When choosing a language make sure to study how the garbage collector works and whether it can be modified to optimise for a particular use case.

Many operations in algorithmic trading systems are amenable to parallelisation. This refers to the concept of carrying out multiple programmatic operations at the same time, i.e. in "parallel". So-called "embarrassingly parallel" algorithms include steps that can be computed fully independently of other steps. Certain statistical operations, such as Monte Carlo simulations, are a good example of embarrassingly parallel algorithms as each random draw and subsequent path operation can be computed without knowledge of other paths.
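A Monte Carlo simulation can be split into independent chunks, each with its own random-number generator, and the chunks run concurrently. The sketch below simulates terminal prices under geometric Brownian motion (an assumed model) and uses `ThreadPoolExecutor` by default so that it runs anywhere. Because of CPython's global interpreter lock, pure-Python threads will not speed up CPU-bound work, so for a real speed-up pass `executor_cls=ProcessPoolExecutor` from within an `if __name__ == "__main__":` guard. Seeding chunks with consecutive integers keeps results reproducible and identical to a sequential run; for statistically rigorous independent streams, NumPy's `SeedSequence.spawn` is the more careful choice.

```python
import math
import random
from concurrent.futures import Executor, ThreadPoolExecutor
from functools import partial
from typing import Callable, List


def simulate_chunk(seed: int, n_paths: int, s0: float, mu: float,
                   sigma: float, t: float) -> List[float]:
    """Terminal prices of geometric Brownian motion; each chunk owns its own RNG."""
    rng = random.Random(seed)
    drift = (mu - 0.5 * sigma ** 2) * t
    vol = sigma * math.sqrt(t)
    return [s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0)) for _ in range(n_paths)]


def parallel_simulation(n_chunks: int, paths_per_chunk: int, s0: float, mu: float,
                        sigma: float, t: float,
                        executor_cls: Callable[[], Executor] = ThreadPoolExecutor) -> List[float]:
    """Run independent chunks concurrently; chunks never need to communicate."""
    work = partial(simulate_chunk, n_paths=paths_per_chunk, s0=s0, mu=mu, sigma=sigma, t=t)
    with executor_cls() as pool:
        chunks = pool.map(work, range(n_chunks))  # results come back in seed order
        return [price for chunk in chunks for price in chunk]
```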

Other algorithms are only partially parallelisable. Fluid dynamics simulations are such an example,