03.14.09
Posted in Python at 10:20 pm by Twm
I recently attended a lunchtime lecture at Gresham College. If you are in London, you may be interested in their work-friendly 1pm and 6pm lectures.
The title of this one was “How to be a Winner: The maths of race fixing and money laundering” by Professor John D Barrow FRS.
There were a couple of interesting topics covered in the talk, including the obligatory Monty Hall problem (covered on my blog a while back).
One which I like is the ‘how to deal with a weighted coin’ scenario.
If you suspect that a coin is throwing up heads or tails too often, then it’s still possible to use it as a fair coin (so long as you are willing to throw away some throws).
Basically, rather than throwing a coin once – you throw it in pairs (discarding throws where the coins are both heads or both tails).
To generate a “heads or tails” decision you read the value of the first throw of the pairs.
- if you throw HT – then you record an outcome of heads
- if you throw TH – then you record an outcome of tails.
- But if it’s HH or TT, then you throw again until you get one of the above.
To model this using probabilities, imagine you have a coin which has a 0.4 probability of throwing heads and a 0.6 probability of throwing tails.
The following probabilities exist:
P(H) = 0.40
P(T) = 0.60
P(HH) = 0.4*0.4 = 0.16
P(TT) = 0.6*0.6 = 0.36
P(HT) = 0.4*0.6 = 0.24
P(TH) = 0.6*0.4 = 0.24
You can use the fact that P(HT) and P(TH) are equally likely (regardless of the bias) to make a weighted coin act fair.
More generally:
P(H) = p
P(T) = 1-p
P(HH) = p^2
P(TT) = (1-p)^2 = 1 - 2p + p^2
P(HT) = p(1-p) = p - p^2
P(TH) = (1-p)p = p - p^2
Or, modelling the problem in Python:
import random

HEAD, TAIL = "H", "T"

def tossWeightedCoin():
    # Biased coin: heads with probability 0.40, tails otherwise.
    if random.random() < 0.40:
        return HEAD
    else:
        return TAIL

def tossFairFromWeighted():
    # Throw in pairs until the two throws differ,
    # then report the first throw of that pair.
    toss1 = HEAD
    toss2 = HEAD
    while toss1 == toss2:
        toss1 = tossWeightedCoin()
        toss2 = tossWeightedCoin()
    return toss1
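A quick way to convince yourself that the trick works is to simulate it. The following is only a sketch: the function names, the seed and the trial count are my own choices for illustration. It also tracks how many pairs get thrown per decision, which for a coin with P(H) = p should average 1/(2p(1-p)), since each pair is usable with probability P(HT) + P(TH) = 2p(1-p).

```python
import random

def toss_weighted(p_heads, rng):
    """One throw of a biased coin: 'H' with probability p_heads."""
    return "H" if rng.random() < p_heads else "T"

def toss_fair(p_heads, rng):
    """Throw in pairs until the pair differs.

    Returns (first throw of the deciding pair, number of pairs thrown).
    """
    pairs = 0
    while True:
        pairs += 1
        first = toss_weighted(p_heads, rng)
        second = toss_weighted(p_heads, rng)
        if first != second:
            return first, pairs

def simulate(p_heads=0.4, trials=100000, seed=1):
    """Return (fraction of heads decisions, average pairs per decision)."""
    rng = random.Random(seed)
    heads = 0
    total_pairs = 0
    for _ in range(trials):
        outcome, pairs = toss_fair(p_heads, rng)
        if outcome == "H":
            heads += 1
        total_pairs += pairs
    return heads / trials, total_pairs / trials
```

Calling simulate() with the 0.4 coin from above should give a heads fraction close to 0.5 and an average close to 1/0.48, i.e. a little over two pairs per decision. The price of fairness is the discarded throws.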
09.23.08
Posted in Python at 12:24 am by Twm
This article discusses the practice of screenscraping (web scraping), and includes some advice on screen scraping troublesome ASP web pages.
Screenscraping, the art of automatically grabbing pages off the internet and extracting useful data for your own use, often offers a compelling problem to solve.
No one would argue that screen scraping occupies some revered place on the computer science mantelpiece: screenscraping is the pragmatist’s tool for getting stuff done today, knowing that the solution may break tomorrow.
One of the biggest rewards I gained from learning Python was the ability to quickly grab data from any old source and make it useful now. Within a big company or organisation which has dozens of disparate tools and databases, the skill of scripting and scraping can save you hours a week in routine admin work or report generation.
Read the rest of this entry »
03.24.08
Posted in Python at 3:50 am by Twm
For reasons I’m not going to divulge, I needed a timer that beeped every 2 mins to remind me to take a measurement.
I had a program running in 3 mins.
import winsound
import time

while True:
    winsound.Beep(2000, 200)  # 2000 Hz tone for 200 ms
    time.sleep(120)           # wait two minutes
Thanks to single use, throw away programs; my life has become so much better.
When I have data in a form that’s difficult to work with, I think nothing of writing a script to transform it. If I have a repetitive task which I’ve performed at least three times, I’m always looking for ways of automating it.
I use the Python interpreter as my calculator now. With a few stock stats routines, it’s easy to crunch some data from the interpreter prompt. (IPython makes it even more of a pleasure http://www.scipy.org/).
It’s very liberating to take charge of data and program with a tool which just seems so intuitive and expressive. It reminds me why I got into programming: because computers are supposed to make our lives easier.
11.26.07
Posted in AJAX, Python, javascript at 5:00 am by Twm
Due to popular demand, I’m publishing my iTunes remote source code here
Read the rest of this entry »
07.20.07
Posted in Python, maths at 10:25 pm by Twm
Most of us have faked data in columns at some point in our lives, be it the results from the GCSE lab experiment which went wrong or some time sheet for the week before the holiday. It’s not hard to imagine that humans in general are pretty bad at making up convincing numbers; we are especially bad at picking extremes, which makes us bad at generating random numbers.
There are several tests for determining the randomness of data, but there is a surprising amount of data which isn’t random. Quantitative measurements (e.g. the height of a building, the lengths of phalli or the value of the stocks on the stock exchange), although they may seem random, exhibit an interesting property called Benford’s Law which may help in detecting made-up values in non-random data.
Benford’s law is a curious little observation about non-random numbers which states that for a set of numbers, the leading digit will be a ‘1’ around 30% of the time, a ‘2’ around 18% of the time, and so on in ever decreasing proportion until ‘9’, which will be the first digit only about 5% of the time.
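Those percentages come from a simple formula. The post above doesn't state it, but the usual statement of Benford's law is that a leading digit d turns up with frequency log10(1 + 1/d). A small sketch (the function name is mine) reproduces the figures quoted:

```python
import math

def benford_expected(digit):
    """Frequency Benford's law predicts for a leading digit 1..9."""
    return math.log10(1 + 1 / digit)

for d in range(1, 10):
    print(d, round(benford_expected(d), 2))
```

Rounded to two places this gives 0.3 for ‘1’, 0.18 for ‘2’ and 0.05 for ‘9’, and the nine frequencies add up to 1.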
The classic example is river lengths: if you measure the rivers of the world and count the number of times each of the digits 1-9 is the leading digit, then Benford’s observation can be confirmed. Since the law is ‘scale-invariant’ (it’s not affected by multiplying or dividing the numbers), it applies whether you measure the length of the rivers in meters, feet or inches.
See the graph below, which compares the expected frequency of leading digits in truly random numbers (all roughly 0.1) with Benford’s numbers.

For any set of data which is known to follow Benford’s law, it makes sense that significant deviation from the frequencies of leading digits predicted by Benford could indicate foul play in the data.
The IRS are thought to be running this sort of algorithm in order to filter out suspect tax claims for closer scrutiny.
In the absence of adequate measurements of penis sizes (sample size = 1), I thought I would try with some more readily available data: the size of files on my hard disk. I tried with both c:\windows and the temp directory in my user area.
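The script I actually used is linked at the end of this post. The sketch below is not that script, just a minimal illustration of the idea under a few assumptions of my own: it looks only at files directly inside the directory (no recursion), it skips empty files because a size of zero has no leading digit between 1 and 9, and it takes the Benford frequency to be log10(1 + 1/d). All function names are made up for the example.

```python
import math
import os
from collections import Counter

def leading_digit(n):
    """First decimal digit of a positive integer."""
    return int(str(n)[0])

def file_sizes(directory):
    """Sizes in bytes of the non-empty files directly inside `directory`."""
    sizes = []
    for entry in os.scandir(directory):
        if entry.is_file():
            size = entry.stat().st_size
            if size > 0:
                sizes.append(size)
    return sizes

def benford_report(values):
    """Return (rows, summed difference).

    Each row is (digit, expected, actual, difference), matching the
    E, A and D columns used in the tables.
    """
    counts = Counter(leading_digit(v) for v in values)
    total = len(values)
    rows = []
    for d in range(1, 10):
        expected = math.log10(1 + 1 / d)
        actual = counts[d] / total if total else 0.0
        rows.append((d, expected, actual, abs(expected - actual)))
    return rows, sum(row[3] for row in rows)
```

Something like benford_report(file_sizes(r"c:\windows")) then gives one row per digit plus the sum of the differences, where a smaller sum means a closer fit to Benford.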
Windows directory (sample size = 41)
Digit  E     A     D
1      0.3   0.21  0.09
2      0.18  0.13  0.05
3      0.12  0.11  0.01
4      0.1   0.27  0.17
5      0.08  0.08  0
6      0.07  0.05  0.02
7      0.06  0.05  0.01
8      0.05  0.03  0.02
9      0.05  0.08  0.03
E = Expected frequency (according to Benford)
A = Actual observed frequency
D = The difference (i.e. zero means no difference)

The sample size is pretty small, but already this shows signs of the decreasing Benford curve. Note that the digit five is the only one which matches perfectly.
The thing to note is how close the overall shape of the curve is compared to the random case where each digit’s probability is 0.1. So the digit ‘1’ looks way off at 0.21, but it is still twice as frequent as in the random case (0.1).
The sum of the differences is 0.45.
Temp directory (sample size = 143)
Digit  E     A     D
1      0.3   0.45  0.15
2      0.18  0.17  0.01
3      0.12  0.08  0.04
4      0.1   0.09  0.01
5      0.08  0.08  0
6      0.07  0.04  0.03
7      0.06  0.02  0.04
8      0.05  0.03  0.02
9      0.05  0.03  0.02
E = Expected frequency (according to Benford)
A = Actual observed frequency
D = The difference (i.e. zero means no difference)

With a sample size of over a hundred, the Benford curve reveals itself quite nicely. Again note that 5 is spot on, but this time the sum of the differences is 0.32, which is closer to zero, meaning that the sample data is a closer match to the prediction than the previous sample.
Made up data (sample size = 36)
Finally, here are the frequencies of leading digit when using a list of made up file sizes (I made them up myself).
Digit  E     A     D
1      0.3   0.14  0.16
2      0.18  0.11  0.06
3      0.12  0.22  0.1
4      0.1   0.19  0.1
5      0.08  0.03  0.05
6      0.07  0.06  0.01
7      0.06  0.06  0
8      0.05  0.08  0.03
9      0.05  0.11  0.07
E = Expected frequency (according to Benford)
A = Actual observed frequency
D = The difference (i.e. zero means no difference)

With a sum difference of 0.6, the data represents a lousy fit and inspection of the graph confirms that the pattern has been disrupted significantly. It looks as if I have been caught.
Link: The source
07.14.07
Posted in Python, Symbian, c++ at 9:57 pm by Twm
I got absolutely sick of looking up Symbian leave codes and was glad to come across a list on newlc.
http://newlc.com/Symbian-OS-Error-Codes.html
To ease the lookup further, here is a Python script which scrapes the website into a database and allows you to look up error codes from the command line.
e.g:
C:\kerrwhat -12
KErrPathNotFound [Unable to find the specified folder]
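The linked source is the real thing. For flavour, here is a rough sketch of just the lookup half, assuming the scraped codes end up in an SQLite table. The table layout, the function names and the handful of hard-coded rows are mine for illustration (only the -12 description is taken from the output above), and the scraping step is left out entirely.

```python
import sqlite3

# A few well-known Symbian error codes standing in for the scraped list.
# Descriptions other than the one for -12 are illustrative wording.
SAMPLE_CODES = [
    (0, "KErrNone", "No error"),
    (-1, "KErrNotFound", "Unable to find the specified object"),
    (-4, "KErrNoMemory", "Not enough memory"),
    (-12, "KErrPathNotFound", "Unable to find the specified folder"),
]

def build_database(rows, path=":memory:"):
    """Create (or update) the codes table and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS codes "
        "(code INTEGER PRIMARY KEY, name TEXT, description TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO codes VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

def lookup(conn, code):
    """Return 'Name [description]' for a code, or a fallback message."""
    row = conn.execute(
        "SELECT name, description FROM codes WHERE code = ?", (code,)
    ).fetchone()
    if row is None:
        return "Unknown error code %d" % code
    return "%s [%s]" % row
```

A command-line wrapper only needs to pass int(sys.argv[1]) to lookup() and print the result.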
Link: The Source
09.26.06
Posted in AJAX, Python, javascript at 8:52 pm by Twm
My first AJAX app
I’m now officially Twm2.0 and I feel great.
What is iTunes Remote?
Let’s say you have a nice PC hooked up to your swish hi-fi. It’s very inelegant and not very modern to have to go over to the PC every time you want to change the music or mute. Wouldn’t it be great if you could use your mobile phone to view playlists, skip tracks and mute, all from the comfort of your armchair or bed?
Enter iTunes remote. It turns any device with a web browser into an iTunes remote control. This could be a slim laptop, or a modern wifi enabled cell phone.
What are the main features of iTunes remote?
- Log into iTunes running on a remote PC in the house with any web enabled device
- Click on the arrow next to a playlist to play it
- Browse playlists and select an individual song. One click will cause the host PC to start playing.
- Free text search for song or artist. One click on a result causes the song to play
- Stop/mute whatever is currently playing on iTunes
- Display of album art for the current song, retrieved from the iTunes store.
- Valuation of music – it displays how much money you have spent in the iTunes store (assuming 79p per track)
- Wake up timer – register a wakeup time and a playlist to gently ease you out of bed in the morning
The Web UI can be served in two flavours:
- Desktop grade which uses advanced browser technologies to emulate the iTunes UI.
- Simple device grade which uses simpler technology to deliver a multi page interface for devices with less power.
What use is it?
It was developed while I had broken my ankle and femur and getting up onto crutches to change the tune was a bit of a pain. So the driving use case is for immobile people to make use of their mobile in a local context.
But, what is great for the immobile is fantastic for the rest of us.
Technical notes
The aim of the design was to provide an excellent remote control on a device without having the user install anything. The chief advantages of this approach are:
- No client install. The user just fires up a bookmark in the mobile web browser
- Target multiple devices – even ones which haven’t been invented yet
- Almost zero upgrade cost for the user, e.g. a new cell phone just needs a new bookmark in the web browser
Implementation details
The iTunes remote solution is authored in two parts following the “AJAX” model.
- Separation of content and display – DHTML + CSS takes care of the layout client side, Python produces XML data.
- Currently uses the Apache web server to handle multiple requests
- Apache serves a single AJAX front page for the UI; there are no page reloads.
- The Python scripts use the COM interface exposed by iTunes to control the application
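To give a feel for the server side described above, here is a stripped-down sketch, not the published source. The command names ("play", "mute" and so on) and the XML layout are invented for the example. Play, Pause, NextTrack, Mute, CurrentTrack, Name and Artist are names from the iTunes COM interface as I remember them, so check them against the SDK documentation before relying on them. The iTunes object is passed in as a parameter, so the functions can be exercised with a stand-in object on a machine without iTunes.

```python
import xml.etree.ElementTree as ET

def connect_to_itunes():
    """Windows only: needs the win32com package and a local iTunes install."""
    import win32com.client  # imported lazily so the rest runs anywhere
    return win32com.client.Dispatch("iTunes.Application")

def status_xml(itunes):
    """Describe the player state as a small XML document for the AJAX page."""
    root = ET.Element("status")
    ET.SubElement(root, "muted").text = "true" if itunes.Mute else "false"
    track = itunes.CurrentTrack  # None when nothing is loaded
    if track is not None:
        node = ET.SubElement(root, "track")
        ET.SubElement(node, "name").text = track.Name
        ET.SubElement(node, "artist").text = track.Artist
    return ET.tostring(root, encoding="unicode")

def handle_command(itunes, command):
    """Apply one (hypothetical) remote command and return the new status."""
    if command == "play":
        itunes.Play()
    elif command == "pause":
        itunes.Pause()
    elif command == "next":
        itunes.NextTrack()
    elif command == "mute":
        itunes.Mute = not itunes.Mute
    elif command != "status":
        raise ValueError("unknown command: %r" % command)
    return status_xml(itunes)
```

The browser side then only has to request a command and redraw itself from the XML that comes back, which is what keeps the UI free of page reloads.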
Challenges
A key challenge in creating a consumer grade remote control is to ensure a trouble-free install. Bluetooth is one approach. BT is clearly designed for local area connection of two devices and has concepts such as one-time pairing for exactly this sort of use case. BT, however, requires an application to be written for the target device, which contradicts the design goal.
Superficially, the web server based approach seems simple enough in a wifi household. The user installs a service on the PC and then points the mobile web browser to the URL of the server. The problem is that the host PC may be hidden behind a router, and complex configurations of router ports and firewalls.
I’d like to know what the best way of doing this is and will be looking to UPnP for the future. I only want a clean easy way of exposing a web server to another device on the same network (of course without running any code on the device).
Nokia 9500 iTunes Remote photos
The search query page:

The result of selecting a song from the search results:

09.18.06
Posted in Development, Python at 12:48 pm by Twm
[Update Jan 2008] Flickr have changed the system for protecting images and this hack no longer works.
Flickr is a photo sharing web site. It has changed the way I take photographs and is a part of my daily internet life.
I release all my shots under a Creative Commons license and offer the highest quality images for download. The license means that people can use my images for non-commercial projects free of charge, without my explicit permission, so long as I’m credited as the creator.
I have learnt so much about what makes a good photo by studying others’ work as they grow and evolve, but one thing which always bugged me was people who keep their large images to themselves. As part of the learning process I like to be able to check on some detail in the images, or to see if the photograph is really as sharp as it initially appears in the small version.
Fortunately (or unfortunately for Flickr users), you don’t have to be Quincy to deduce the URL of the ‘private’ image. There exists a hack to build up the URL from searching the HTML source for the image page. This hack (which doesn’t work as described) is documented here.
I wrote a Python script which maps from a flickr photo URL to the URL of the originally uploaded JPEG. The script retrieves the HTML source via HTTP and parses out the interesting bits using regular expressions.
I used a marvelous regex debugging tool for Python called Kodos. Here is a screenshot of Kodos in action.

Link: The code
Posted in Development, Python at 2:29 am by Twm
I was eager to start using Python to manipulate my music collection and to do some nifty remote control stuff. So I downloaded the iTunes COM SDK from Apple and was pleasantly surprised by the comprehensive COM model which iTunes exports.
To make use of COM, the ActivePython distribution includes win32com, allowing Python programs to create and interrogate Windows-based COM objects such as XML parsers or Office applications such as Excel. When the iTunes application is loaded, it exposes a COM type library called “iTunes Type Library” which can be used to control the app. There is no need to be scared by all the COM terminology; programming for iTunes is pretty intuitive. Once the COM connection is made, simple tasks like setting the volume or playing a playlist can be done with a single command.
I couldn’t get my noddy example to work, so I tried running one of the example JavaScript files which come with the SDK to verify that it should work. This also failed, so I repaired the installation of iTunes from the control panel. On running it a second time the JavaScript executed fine and greeted me with a dialog: “created 600 playlists”. Bugger! It made a huge mess of my iTunes.
The SDK example I chose to run creates a playlist for every album in the library, and it’s not thoughtful enough to create them in a separate folder. This was distressing since I have numerous lovingly crafted playlists for occasions such as “Sunday morning”, “been dumped” and “getting ready to go out”, and they were now drowned by useless album playlists.
So naturally my first iTunes Python script is to undo the damage made by the example code. Scripting got me into this mess and scripting will set me free. It took maybe an hour to do, which is pretty good considering I have no COM experience.
So a couple of take home messages
- If an application installer tells you to restart your machine now or later, it’s probably worth doing now if you are a COM hacker.
- It’s wise to have some notion of undo/confirmation in your script. E.g. if your script deletes duplicate entries, then why not provide two scripts: one which creates a new playlist “duplicates”, which allows the user to scan through it to check that all is well, and a second script which clears the duplicates playlist from the library for good.
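The two-script idea in the last point can be sketched without touching iTunes at all. In this illustration tracks are plain dictionaries with made-up "id", "name", "artist" and "album" keys; a real script would read and delete them through the COM interface instead. The first function only collects candidates (the “duplicates” playlist), and the second removes whatever the user has left in that list after checking it.

```python
def find_duplicates(tracks):
    """Step 1: collect later copies of tracks already seen. Deletes nothing."""
    seen = set()
    duplicates = []
    for track in tracks:
        key = (
            track["name"].lower(),
            track["artist"].lower(),
            track["album"].lower(),
        )
        if key in seen:
            duplicates.append(track)
        else:
            seen.add(key)
    return duplicates

def remove_reviewed(tracks, reviewed):
    """Step 2: drop only the tracks still in the reviewed duplicates list."""
    doomed = {track["id"] for track in reviewed}
    return [track for track in tracks if track["id"] not in doomed]
```

Because nothing is deleted until the second step, a false positive costs you a moment's review rather than a song.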
Link: The code