Marginal solution

November 27, 2013

Okey dokey. Better reveal the solution to the stat-geek marginalisation quiz. There were sixty-two votes.

The popular winner was the economics/marginal profits idea, with 31 votes. Plausible but wrong.

The second most popular was the “marginal interest” idea. Well… this is what the term has more or less drifted into meaning, because (almost) everybody has forgotten the true origin. So… wrong.

Nobody voted for “lost in the mists of time”, which proves you all care. How nice.

Only two people voted for EB Margin being the pseudonym of WS Gosset. This disappointed me, both because it is funny and because it was supposed to be a cunning false trail. WS Gosset in fact published his papers under the name of “Student”, which is why we have “Student’s t test”.

So of course the correct answer was “other”. Sorry if that was an annoying tactic, but I think if I’d made the right answer one of the choices, it would have been too obvious. Amongst the six suggestions, two were for our amusement:

“I’d write the reason here but there’s not enough room in the margin”
“To marginalise those who don’t know”

and four were spot on, or more or less right:

“Refers to margins of a contingency table”
“thought it was to do with averaging rows, with answer stuck in the margin”
“you are projecting the 2D pdf onto the “margin” of the plot”
“Sweeping the probability to the edge (=margin) of the paper?”

Sounds like the first two people knew, and the second two deduced the right answer. If you were one of those people, award yourself an extra biscuit at coffee time, and feel free to announce yourself.

Just to spell it out... As physicists, we nearly always think in abstract mathematical terms, so we think of “marginalisation” as a calculus problem – an integral. Even when thinking visually we picture a joint probability distribution as a smooth surface in three dimensions. But early statisticians were often concerned with tables of numbers, and worked on paper. Think of a joint frequency distribution as a grid of numbers in cells, with one row for each value of y. Then add up a row, and write the answer in the margin. When you have done this for all the rows, read down that margin, and – voila – the marginal distribution for y.
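A minimal sketch of that paper-and-pencil procedure, assuming a small made-up table of counts whose rows are values of y and whose columns are values of x. Summing each row gives the numbers you would write in the right-hand margin; dividing by the grand total turns those counts into the marginal distribution for y. The bottom margin, from the column sums, is the marginal for x.

```python
# A small joint frequency table (made-up counts, purely for illustration).
# Rows are values of y, columns are values of x.
table = [
    [3, 5, 2],  # y = 0
    [4, 1, 6],  # y = 1
]

# Add up each row and "write the answer in the margin".
row_margin = [sum(row) for row in table]

# Add up each column for the bottom margin (the distribution of x).
col_margin = [sum(col) for col in zip(*table)]

# Read down the right-hand margin and normalise: the marginal distribution for y.
total = sum(row_margin)
marginal_y = [count / total for count in row_margin]

print(row_margin, col_margin, marginal_y)
```

Both margins add up to the same grand total, which was also the old hand-calculator's check that nothing had been dropped.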

Don’t start me on regression…


A quiz of marginal interest

November 20, 2013

Two things we know.

(1) Scientific terminology is burdened with the baggage of history, and often now makes no logical sense. So… early type galaxies are the ones with late type stars? Errr… And which of these terms relates to a sequence in time? Neither. Right. Very helpful.

(2) When you have to teach something, you finally figure out things that have been bugging you for years.

(3) Nobody expects the … oh. Anyway. Often (1) and (2) combine to make a particularly thick fog.

For some time the term “marginalisation” had been nagging at me, but I ignored it because I had other stuff to get on with. I am referring to the term in statistics. You have a probability density function of two variables, f(x,y), but decide that y is “interesting” and x is “uninteresting”. You then integrate over x to get a PDF p(y) for y alone. This is known as “marginalising over x”.
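For the continuous version, here is a rough numerical sketch. It assumes, purely for illustration, that f(x,y) is a standard bivariate Gaussian with correlation rho = 0.6; integrating over x with the trapezoidal rule should then recover p(y), which for this particular choice is just a unit Gaussian in y. The function names, grid limits and step count are arbitrary choices for the example, not anything canonical.

```python
import math


def joint_pdf(x, y, rho=0.6):
    """Assumed example f(x, y): standard bivariate normal with correlation rho."""
    norm = 1.0 / (2.0 * math.pi * math.sqrt(1.0 - rho ** 2))
    q = (x ** 2 - 2.0 * rho * x * y + y ** 2) / (1.0 - rho ** 2)
    return norm * math.exp(-0.5 * q)


def marginal_y(y, x_min=-8.0, x_max=8.0, n=2000):
    """Approximate p(y) = integral of f(x, y) dx using the trapezoidal rule."""
    h = (x_max - x_min) / n
    total = 0.5 * (joint_pdf(x_min, y) + joint_pdf(x_max, y))
    for i in range(1, n):
        total += joint_pdf(x_min + i * h, y)
    return total * h


def standard_normal(y):
    """Exact marginal for this particular f(x, y): a unit Gaussian in y."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


for y in (-2.0, 0.0, 1.5):
    print(y, marginal_y(y), standard_normal(y))
```

The sum over the x grid is the continuous cousin of adding up a row of a table.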

So here is the quiz. My guess is only about seven people will want to take it, but I can’t resist it.

Rule (a) Andrew Liddle is not allowed because I already told him the answer in the pub. Rule (b) No Googling. Rule (c) Never talk about Stats Club.


Truth, Belief, and Action

August 27, 2013

My daughter is doing a medical degree. At dinner the other day, I mentioned that a few years back everybody seemed to think that doctors would be replaced by expert systems. Did that happen? Oh no, she said, that’s never going to happen. It’s the doctor’s job to decide. Hmm. I see a scientist’s job, much of the time, as a dogged persistence in avoiding deciding, as you hunt down the sometimes stubborn truth. You have to steer carefully between the Scylla of shallow herd fashion and the Charybdis of renegade self-delusion, but the aim is constant – to discover what really is the case.

Of course we have statistical methods for dealing with uncertainty, whether it be missing information or true randomness. But even here, as scientists, we avoid jumping to a conclusion, as a fundamentally unsound thing to do. All I can do is tell you that on Hypothesis A, you would have been pretty unlikely to get that measurement. Doesn’t necessarily mean it’s wrong though… (Pour beer. Cue usual frequentist vs Bayesian argument. Fail to come to conclusion. Drink more beer.)
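As a sketch of what “pretty unlikely” usually means in practice, assume Hypothesis A predicts a value with Gaussian errors of known width sigma (a big assumption, as the non-parametric fans will point out). The two-sided p-value is then the probability of landing at least as far from the prediction as you actually did. The function name is hypothetical. Note that a small number here tells you nothing directly about whether Hypothesis A is true, which is exactly where the beer comes in.

```python
import math


def two_sided_p_value(measured, predicted, sigma):
    """Probability, assuming Gaussian errors of width sigma, of a result at
    least as far from the prediction as the one measured (either side)."""
    z = abs(measured - predicted) / sigma
    # For a unit Gaussian, P(|Z| >= z) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))


print(two_sided_p_value(1.96, 0.0, 1.0))  # roughly the familiar 5 per cent
print(two_sided_p_value(3.0, 0.0, 1.0))   # a "three sigma" result
```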

But for much of our worldly lives, it’s not about truth, and it’s not about decision – it’s about action. You can see this trio as a chain. You cannot take a sensible action unless you have made a wise decision. You cannot take a wise decision unless you know what is and what is not. Each step limits the landscape for the next, but does not fix the path. Well that’s what Hume said, which is good enough for me, as he is an Edinburgh Local Hero. Got a statue on the High Street and everything.

We see this every day in public policy – should we punish Assad? Should we allow fracking? Anybody care to postulate the relevant probability distributions in the Syrian case? Thought not. What makes these debates so difficult is not just that we have to act before all the options or their consequences are clear; or that we have to decide what’s going on before we know all the facts; it’s that different people are not even trying to achieve the same ends; and sometimes they don’t even realise this.

A curious and frustrating example is racial profiling. If your aim is to maximise the number of terrorists you stop, regardless of anything else, it’s hard to deny the statistical fact that if you randomly stop young Asian-looking men with beards you will do better than if you randomly stop middle-aged white women. But if your aim is to minimise the number of terrorists you create over a period of years, you could be making a big mistake.

A few days back, I followed a Twitter link to this beautiful little video. A black American woman explains how she was asked out of the blue for two types of ID, and looked up in a bad-check book, at a supermarket checkout. Her white sister-in-law, immediately in front of her, was not asked for ID. The sister used her white privilege to step in and address the inequity, which is the political point of the story.

However, what I found intriguing is that the woman telling this depressingly normal story is so clearly middle class, articulate, intelligent and trustworthy. It sounds like the checkout girl was not being mean, but dim. At the back of her head was not necessarily emotional dislike, but instinctive statistical reasoning – if I stop black people, I will find more bad checks. Well this is probably true, but it’s a bit like the old gag about the price of fish in Billingsgate market being correlated with the size of women’s feet in China. Most bad checks will be written by members of the impoverished underclass. Due to hundreds of years of social, economic, and political repression, black people in the USA make up a larger than average fraction of the underclass. But the woman in that video is patently not a member of the impoverished, stressed-out underclass. So what’s depressing is that this isn’t obvious to a supermarket checkout girl. Why can’t she read the signals?

So.. I guess education, in the largest sense, is the answer. Maybe we can’t avoid profiling. We just want better profiling. Academic readers can draw the analogy with citation statistics and divert the conversation as they wish.

Anyway. Got some grant applications to re-read.


The pale cast of thought

October 7, 2007

Discovered a great new blog last week – the AstroStat Slog. It reminded me how Astronomy has some things in common with social science. Pause for spluttering by hard-nosed physical scientists. What I have in mind is that it’s hard to do controlled experiments. You have to let Nature do the experiments for you, and then work out what’s going on. This causes all sorts of problems, but in particular you often need to be good at statistics, to stop yourself being a sucker for all those fascinating flukes.

Even worse, often you don’t even know the underlying probability distributions of your variables. Assuming everything is a Gaussian is a dangerous mistake. Cue non-parametric statistics. Years back in the 80s, chum Martin turned me on to the Mann-Whitney U-test, the Spearman rank correlation coefficient, and other delights. I was hooked. Martin’s secret weapon was a book called “Nonparametric Statistics for the Behavioral Sciences” by Sidney Siegel. It was full of arcane secrets but still written in plain English. Impressive. An updated version with John Castellan is available.
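For the curious, here is a bare-bones sketch of those two tests using only the Python standard library. It computes the test statistics only (Spearman’s rho, and the Mann-Whitney U for each sample); turning them into significance levels needs the tables in books like Siegel’s, or a library such as SciPy, which provides `scipy.stats.spearmanr` and `scipy.stats.mannwhitneyu`. The helper names are my own. Tied values get the average of the ranks they span, the usual convention.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the average of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def mann_whitney_u(a, b):
    """Mann-Whitney U for samples a and b, computed from the pooled ranks."""
    pooled = ranks(list(a) + list(b))
    n_a, n_b = len(a), len(b)
    r_a = sum(pooled[:n_a])
    u_a = r_a - n_a * (n_a + 1) / 2.0
    u_b = n_a * n_b - u_a
    return u_a, u_b


print(spearman_rho([1, 2, 3, 4, 5], [2, 4, 6, 8, 100]))
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))
```

The first example shows the appeal: the relation is wildly non-linear, but perfectly monotonic, so rho is 1. Only the order matters, not the shape of the distribution.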

Why is doing statistical analysis so addictively satisfying? I see it as a kind of merger of thought and action.

As an Experimental Scientist, you need to think hard and slow; to be absurdly rigorous and avoid the traps set by sloppy thought; to root out the assumptions you didn’t realise you were making; to experiment and test in a carefully controlled manner, and conclude only when you are sure. Don’t jump in. Stop, Think, Test.

The Busy Executive (or the Neolithic Hunter..) laughs. Pause and you are dead. Sicklied o’er wi’ the pale cast of thought you do not want to be. Sure, you don’t want to be dumb. But over the years, you build up your experience, knowledge, and instinct. It waits inside you, a coiled spring, ready for release when you need a Decision. Even if you have plenty of time, you never have all the information you need. Assemble what you have, run it through your judgement mill, and go for the best bet.

The Astronomer’s instinct is to be just like the experimental scientist, but it’s impossible. You just can’t do the experiments you want. You have to make do with the ragbag of facts Nature has provided. Like the executive you need to take a decision with incomplete information, but you want it to be somehow impartial and rigorous. Well tough. You can’t absolutely and safely decide. You just can’t. But given any possible decision, you CAN say how likely it is you are being fooled. I can’t tell you which horse will win, but I can tell you whether I will accept that wager.

I’d better stop now or I am in danger of drifting into the sticky morass that is the Frequentist-Bayesian debate.

Once I almost read a paper called “Are probabilities propensities?” but it was lunchtime so I decided not to.