Python - log parsing, statistical analysis, and performance graphs

Over the past few years, Python has become my favorite language to program in. I do tools development and work regularly on a variety of platforms, so I try to stay versatile with respect to languages. I've written lots of code in many languages over the years (C, C++, Scheme, Pascal, Java, C#, Perl, Python, etc.). For a long time, Perl and Java were my go-to languages. For any quick script or heavy text slinging, I would reach for Perl. For any larger project that needed an organized class structure, I would reach for Java. Then a few years back I started banging around with Python and learned what a fantastic programming language it can be for _anything_.

The past few days I have been working on a tool for analyzing performance data that an application logs during load tests.

I needed a working version quickly that I could show as a proof of concept. One of the requirements is that the tool must integrate well with a .NET environment and be maintainable and extensible by people in the .NET shop. Many people claim Python is a great prototyping language. It has clean syntax and structure and is great for quickly building class libraries. I started out thinking I would create something quick in Python and then maybe port it to C# later. Well, it turned out so nice and so easy to work with that I can't imagine using anything but Python for it now. (IronPython perhaps?)

So I ended up creating a generic framework in Python and exposing a scriptable API.

It can:
- parse MS Event Logs
- slice the data up into a time series
- run some statistical calculations on the time series
- output graphs (to gif/png images for web display, or to a GUI with more powerful viewing)

It is built with:
- Python
- Matplotlib
- MS Log Parser 2.2

I had to do a lot of work crunching data sequences and slicing up time series data. Python's dynamic typing and simple data structures made it flexible enough to handle all the data processing with a minimal amount of code. The most useful feature was Python's list comprehensions. They are powerful constructs for list processing that let you do heavy lifting on numeric sequences in a very concise way.
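
To give a feel for the kind of slicing I mean, here is a minimal sketch rather than the framework's actual code. The sample data and the names slice_into_buckets and bucket_averages are made up for illustration. It groups timestamped values into fixed-width time buckets with a nested list comprehension, then averages each bucket with another one.

```python
from datetime import datetime, timedelta

# Hypothetical sample data: (timestamp, response time in seconds) pairs.
samples = [
    (datetime(2008, 1, 1, 12, 0, 5), 0.25),
    (datetime(2008, 1, 1, 12, 0, 40), 0.75),
    (datetime(2008, 1, 1, 12, 1, 10), 0.5),
    (datetime(2008, 1, 1, 12, 2, 30), 1.0),
]


def slice_into_buckets(samples, start, interval, num_buckets):
    """Group values into consecutive fixed-width time buckets."""
    return [
        [value for (ts, value) in samples
         if start + i * interval <= ts < start + (i + 1) * interval]
        for i in range(num_buckets)
    ]


def bucket_averages(buckets):
    """Average each bucket; an empty bucket yields None."""
    return [sum(b) / len(b) if b else None for b in buckets]


start = datetime(2008, 1, 1, 12, 0, 0)
buckets = slice_into_buckets(samples, start, timedelta(minutes=1), 3)
averages = bucket_averages(buckets)
print(buckets)   # [[0.25, 0.75], [0.5], [1.0]]
print(averages)  # [0.5, 0.5, 1.0]
```

The nested comprehension rescans the whole sample list once per bucket, which is fine for a quick prototype but worth replacing with a single pass if the logs get large.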

MS Log Parser 2.2
Log Parser is a tool from Microsoft that lets you query log files with an SQL dialect. I built a wrapper around it using Python's popen methods.
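
A wrapper along these lines can be quite small. The sketch below is not my actual wrapper: it uses the subprocess module in place of the popen functions, and the function names are hypothetical. It assumes LogParser.exe is on the PATH and that the -i:EVT, -o:CSV, and -stats:OFF switches behave as described in the Log Parser help, so check them against your installed version.

```python
import csv
import io
import subprocess


def build_command(query, exe="LogParser.exe"):
    """Build the argument list for one query (event log in, CSV out)."""
    # Assumed switches: EVT input format, CSV output, no trailing statistics.
    return [exe, query, "-i:EVT", "-o:CSV", "-stats:OFF"]


def parse_csv_output(text):
    """Turn CSV text (header row first) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def run_query(query, exe="LogParser.exe"):
    """Run the query and return its rows. Needs Log Parser installed."""
    result = subprocess.run(
        build_command(query, exe),
        capture_output=True, text=True, check=True,
    )
    return parse_csv_output(result.stdout)


# Parsing can be tried without Log Parser by feeding in CSV text directly.
rows = parse_csv_output(
    "TimeGenerated,EventID\n2008-01-01 12:00:05,1000\n"
)
print(rows[0]["EventID"])  # 1000
```

On a machine with Log Parser installed, a call such as run_query("SELECT TimeGenerated, EventID FROM Application") would return one dict per event record.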

Matplotlib
Matplotlib is a 2D plotting library written in Python. I created a graphing API for my framework that uses this. It was simple to create and the graphs look great.
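
As a rough idea of what a thin graphing layer over Matplotlib might look like, here is a small sketch. The plot_series function and its parameters are hypothetical, not the API from my framework. It uses the non-interactive Agg backend to render a line graph as PNG, which suits the "images for web display" case.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend that renders to image data
import matplotlib.pyplot as plt


def plot_series(values, title, ylabel, out):
    """Plot one series as a line graph and write it to `out` as PNG.

    `out` may be a file name or a writable binary file object.
    """
    fig, ax = plt.subplots(figsize=(8, 3))
    ax.plot(range(len(values)), values, marker="o")
    ax.set_title(title)
    ax.set_xlabel("interval")
    ax.set_ylabel(ylabel)
    ax.grid(True)
    fig.savefig(out, format="png")
    plt.close(fig)  # release the figure's memory once it is written


# Render into memory here; pass a file name instead to write to disk.
buffer = io.BytesIO()
plot_series([0.5, 0.5, 1.0], "Average response time", "seconds", buffer)
png_bytes = buffer.getvalue()
```

For the GUI case, the same plotting code can be used with an interactive backend and plt.show() instead of savefig.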

corestats.py
One of the classes I wrote was for doing simple statistical calculations. You can grab a copy here if you are interested: http://www.goldb.org/corestats.html
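
For readers who just want the flavor of these calculations, here is a minimal standalone sketch. It is not the code from corestats.py. The function names are my own for this example, stdev uses the sample (n - 1) form, and percentile uses the nearest-rank method, which is only one of several common definitions.

```python
import math


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def median(values):
    """Middle value, or the average of the two middle values."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def stdev(values):
    """Sample standard deviation (n - 1 denominator); needs 2+ values."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def percentile(values, pct):
    """Nearest-rank percentile for pct in the range (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


timings = [1.0, 2.0, 3.0, 4.0, 10.0]
print(mean(timings))            # 4.0
print(median(timings))          # 3.0
print(percentile(timings, 90))  # 10.0
```

Note how the one slow sample pulls the mean above the median, which is why percentiles tend to be more useful than averages when looking at response times.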

-Corey Goldberg
www.goldb.org