# Sexy stats and vegetable soup

*Trigger warning: Any statisticians out there, look away now.*

There’s a new kid on the block in the world of statistics, and it’s called regression. Actually, it’s not that new. The newest, sexiest statistical whizz-bangery is something called structural equation modelling, but I don’t understand that, so you’ll have to make do with regression. Welcome to part three in my series on understanding scientific papers.

Nine out of ten of the papers I read these days use regression to explore their data. Statistically, it’s pretty complicated, but conceptually, it’s not that difficult to understand. And I’m going to explain it to you with the help of some vegetable soup. If you can follow a recipe, you can glance at a regression table in a scientific paper and grasp in an instant what’s happening. Just to reiterate, my Soup Statistics Method is scientifically and mathematically unsound. If you know a thing or two about statistics, this will probably give you apoplexy. But for the rest of us, I find it works well enough and gets the job done. So let’s start cooking!

We’re going to make soup. Three of them actually. Vegetable soup, French onion soup, and carrot and lentil soup. Here, take a look at the recipes:

So you can see, stock is in all the soups, although you put a little less into the carrot and lentil because you want it to be thicker; there are a couple of onions in the vegetable soup, but there are obviously a lot more of them in the French onion soup, so onions are more important to the flavour of the French onion soup than they are to the flavour of the vegetable soup and so on.

Now, imagine you’ve tasted the perfect vegetable soup. In fact, you’ve got a bowl of the stuff in front of you and it’s amazing. But you don’t have the recipe, so you’re going to try and recreate it using what you know about the kinds of things that go into vegetable soup and a little bit of experimentation. For the first run, you make a batch using just stock, onions and carrots. You try ten cups of stock, two onions and two carrots. It’s not half bad — in fact, it’s probably around 60% of the flavour of the perfect soup, but it’s clearly not all the way there. So you try again, and this time you add three sticks of celery. Better. Now you reckon you’re about 70% of the way there, but still not perfect. You decide to add a leek. It’s nice, but doesn’t make a whole lot of difference because the soup already has two onions in it, and they have a more powerful flavour. You decide you’re only up to about 73% of the perfect flavour.

If you started over with stock, carrots and leeks, without the onions, the leeks might seem more influential. But once you’ve got the onions in there, they don’t make as much of an additional contribution. You noticed the original had peas in it, so you throw in a cup of peas. That takes you to 75% of the flavour of the perfect soup. So with stock, onions, carrots, celery, leeks and peas, you have managed to explain three-quarters of the perfect-flavoured soup, but there’s still another 25% that you can’t explain. There are obviously some other ingredients you don’t know about — a few more vegetables, some kick-ass seasoning, and so on. These things make the difference between “very nice” and “yep, got it!” But you decide that’s probably enough work for one day. You’ll settle at 75%, eat your soup, and maybe in a future experimental cookery session, you’ll try and figure out what else goes into the perfect vegetable soup. On the other hand, if you were on a budget and trying to make the best soup from the fewest ingredients, you’d probably stop after stock, carrots, onions, and celery. Yes, the leek and the peas improve the soup a bit, but not enough to warrant the extra cost.

Now you’re going to have a go at the carrot and lentil soup. You start the same way you did with the vegetable soup — ten cups of stock, two onions and two carrots. The onions are very overpowering so you decide to take them out (minus two onions), and add more carrots, which takes you to 70% of the flavour you’re after. Obviously, you need to add lentils, and you throw in five cups. Almost perfect. 95% of what you wanted to achieve. If only you’d known the magic ingredients were a teaspoon of tarragon and a dash of nutmeg, then you’d have got it 100%. But 95% is still pretty impressive and uses only three ingredients.

This is what we’re trying to do when we do regression — kind of. Seriously, statisticians, if you’re reading this, I did warn you!
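If it helps to see the soup experiment as numbers, here is a minimal sketch of the vegetable soup steps. The percentages are the ones from the story above; the names `steps` and `gains` are made up for illustration and have nothing to do with any real statistics package.

```python
# Each step of the vegetable soup experiment: what was in the pot,
# and roughly how much of the "perfect" flavour it captured (in percent).
steps = [
    ("stock, onions, carrots", 60),
    ("+ celery", 70),
    ("+ leek", 73),
    ("+ peas", 75),
]

def gains(steps):
    """Return how many percentage points each step added over the previous one."""
    previous = 0
    result = []
    for name, percent in steps:
        result.append((name, percent - previous))
        previous = percent
    return result

for name, gain in gains(steps):
    print(f"{name}: {gain} points")

# Whatever is left over is the part of the flavour the recipe cannot explain.
unexplained = 100 - steps[-1][1]
print(f"Still unexplained: {unexplained}%")
```

The small gains for the leek and the peas are the reason the budget cook stops after the celery.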

To the right are some numbers from one of my own studies — don’t be scared, I’ll walk you through it step by step. I was interested in how weight stigma affects binge eating frequency, amongst other things. So I gathered a lot of data from nearly 400 people (if you were one of them, thanks so much) and then crunched some numbers.

First, I did a regression where I included things I expected to affect binge eating. I chose gender, age, BMI and whether or not you were dieting. This is equivalent to the basic “stock, carrots and onions.” Looking at the table, the B column gives me the amounts of each “ingredient” that best explains binge eating, when I use only these four ingredients. Ignore the top line (constant) — it serves a mathematical purpose and you don’t need to worry about what it means. I’ve put it in here because papers you look at will probably include it also.

The p column tells you the statistical significance; sort of like saying, how probable is it that this result is a total fluke.* For no really good reason, traditionally, anything less than .05 is considered to be significant (in the statistical sense); that is, you’re probably safe believing it’s not a fluke. The *smaller* the p value, the less and less likely that the result is a fluke and the more likely it is meaningful.

Right, let’s look at age first. It has a negative B value. What this means is that there is a negative (or opposite) relationship, so as age increases, binge eating frequency decreases. The B value actually tells you exactly how much it decreases. For every one unit increase in age (measured in years), binge eating frequency (measured in times per week) goes down by 0.003. So if person A is 30 years old, and person B is 40 years old (ten unit change in age), you’d predict that the older one would binge eat 0.03 (ten times the B value) fewer times per week than the younger one. This is a very small number. The impact of age isn’t big. It’s there, but it’s not big. In contrast, BMI has a positive relationship with binge eating frequency. In other words, as BMI goes up, so does binge eating. It’s also a small effect, but it is there. And both of these have p values under .05, so these do seem to be making a significant contribution to the “recipe.” I’ll come back to this.
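The age sum above can be written out as a tiny sketch. The B value of -0.003 is the one from my table; the function name `predicted_difference` is just something I made up for this example.

```python
def predicted_difference(b, change_in_predictor):
    """Predicted change in the outcome when one ingredient changes,
    with every other ingredient in the recipe held fixed."""
    return b * change_in_predictor

B_AGE = -0.003  # binges per week, per extra year of age (from the post)

# Person A is 30, person B is 40: a ten-unit change in age.
diff = predicted_difference(B_AGE, 40 - 30)
print(f"Ten years older: {diff:+.3f} binges per week")
```

The minus sign is the "negative relationship": older means slightly fewer binges.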

Gender is slightly more complicated because it’s not measured on a scale (usually). For calculation purposes, men were coded as “0” and women as “1” (no offence fellas, it doesn’t mean I don’t like you). So the negative relationship means that as gender “goes up” (i.e., gets closer to 1, or more female), binge eating goes down. This is a bit surprising because although increasing in men, disordered eating tends to be more common in women. Having said that, -0.039 is a really small number, so the effect isn’t big. And if you look at the second column, the p value is huge! Remember, we want small p values not big ones. In other words, there is no real relationship between gender and binge eating (in this group of people, using only these four ingredients). For all intents and purposes, the effects for men and women were the same.

Dieting status also had a negative B value. You need to know that there were three options on the dieting question I asked people. People who were dieting to lose weight were coded as “1,” people who were watching what they ate so as not to gain weight were coded “2,” and people who were not dieting in any way were coded as “3.” So the negative relationship tells us that as dieting status “goes up” (i.e., gets closer to 3, or not dieting), binge eating goes down. Or think of it the other way: weight loss dieters binge more than non-dieters, which fits with most of our personal experiences.** And if you look, the p value is way less than .05, so it tells you that dieting is an important ingredient in this particular recipe.

The final column tells you the R-squared value. This tells you how close to “perfect” your recipe is. Think of it like the 50%, 60%, etc. in getting the vegetable soup right. It’s given as a proportion between 0 and 1, where 0 is completely wrong and 1 is equal to 100% right. Multiplying the R-squared by 100 (move the decimal point two places to the right) gives you the actual percentage. So this first recipe (model 1) accounts for 5.4% of the differences in binge eating frequency between my participants. Which is actually not much; there is still almost 95% you can’t explain.
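For the curious, here is a rough sketch of where an R-squared comes from: compare how far off your recipe's guesses are with how far off you would be if you just guessed the average for everybody. The five "actual" and "guessed" numbers are entirely made up for illustration; only the 0.054 at the end comes from my table.

```python
def r_squared(actual, predicted):
    """1 minus (leftover error of the recipe / error of always guessing the average)."""
    mean_actual = sum(actual) / len(actual)
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - ss_residual / ss_total

# Hypothetical binges per week for five people, and a recipe's guesses for them.
actual = [0, 1, 2, 3, 4]
guesses = [1, 1, 2, 3, 3]

r2 = r_squared(actual, guesses)
print(f"R-squared: {r2:.2f}, or {r2:.0%} of the differences explained")

# Model 1 from the post: move the decimal point two places to the right.
print(f"Model 1: {0.054:.1%}")
```

Perfect guesses give 1, and guessing the average for everyone gives 0, which matches the "completely wrong" to "100% right" scale.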

And this is where statistical significance can differ from real, actual life significance. Statistically, this model is “significant,” and statistically, age, BMI and dieting status are significant (gender isn’t), but as we saw above, each individual contribution to the effect on binge eating is really small and, overall, the entire recipe is only good for around 5% of the differences in binge eating. So based on just knowing somebody’s age, gender, BMI and dieting status, yes, you could make an informed guess about their frequency of binge eating. And yes, your guess would be better than if you didn’t know any of those things about them, but overall, in real-life terms, not a heck of a lot better.

But that was just the basics — the ingredients I knew from previous experience that were likely to be important. What I was really interested in was stigma. So in the next step, I added in the scores related to how often subjects had experienced stigma or discrimination. This is model 2, or the second recipe. This whole model now explains 6.3% of the variation in binge eating frequency (from the R-squared value). So it’s a little better than the last recipe, but not a lot. Think about adding leeks to your vegetable soup. Looking at the B value for stigma, it’s positive, which you might expect. The more stigma you experience, the more you binge eat. Again, the effect is small, and it doesn’t quite reach statistical significance, but it’s there.

One interesting thing here, though, is that adding new ingredients can change the overall contribution of other ingredients already in there. It’s not just additive. The main change here is that BMI has become non-significant. Its p value has gone above .05. So if you didn’t know to look at stigma (model 1), you’d think that fat people binge eat more. If you were a modern-day obesity scientist, you’d say that there is a causal relationship between fatness and binge eating, but you’d be wrong. Once you account for experiences of stigma, BMI doesn’t contribute much any more. Again, think leeks and onions. What seems to be happening here is that fat people are stigmatised more, and stigma is what’s driving the binge eating. If you compared my 400 fat people to a different 400 fat people who had never been stigmatised (good luck finding them), you’d probably find that even though they were fat, not having been stigmatised meant that there didn’t seem to be a relationship between BMI and binge eating. It’s the stigma that’s driving it, not the weight. But I didn’t do that so I’m just speculating here.

For my third and final attempt at getting this vegetable soup right, I also included scores on questions designed to see how much you believed bad things about yourself because you were fat (internalised stigma). Things like “I don’t know why anybody attractive would want to date me because of my weight,” and so on. This is where it gets really interesting. If you look at model 3, it now explains nearly 19% of the variance in binge eating frequency. That may not seem a lot compared to our 70% vegetable soup recipes, but in real-life terms it’s a lot! And it’s a bloody big jump from model 2.

Internalised stigma seems to be a really, really important ingredient in this recipe. Even though the actual B value is still quite small, its p value is below .001, and the software doesn’t report numbers smaller than that. But the really cool thing is what happened to the p values of the other ingredients when you added internalised stigma. Age has become non-significant, dieting status has become really, really non-significant, and even experienced stigma is now heftily above the .05 level. In other words, before you thought to look at internalisation (or before you included it in your soup recipe), it seemed like age and dieting were important. But now you see they’re not. They were really capturing some of the effects of internalised stigma.

As you get older, you tend to have less internalised stigma, and so you are less likely to binge eat. If you have a recipe with just age, you’ll see that older people binge eat less. But you wouldn’t have been able to tell that this was actually because they had lower levels of internalised stigma. Once you look at the stigma, that actually explains the effect and age becomes non-important.

Another example: there is a strong relationship between shoe size and spelling ability! In general, as your feet get bigger, the better you can spell. This is true. But once you take into account how old people are, the shoe size drops out of the equation. It was acting as a proxy for age. Toddlers tend not to be able to spell very well. As they get older, they get better at spelling (hopefully). That their feet get bigger at the same time as the rest of them does is just a by the by. We say age is a “confounding factor” — if you only looked at shoe size and spelling, you’d find a relationship, but age was the confounding factor. By not including it, you’re not really getting the true story.
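You can watch the shoe size effect vanish with some invented data. This is a sketch only: the children, their feet and their spelling scores are all simulated, the numbers (0.8, 5.0 and so on) are arbitrary assumptions, and `fit` is a little helper I wrote around NumPy's `np.linalg.lstsq`, not a proper statistics package.

```python
import numpy as np

def fit(predictors, outcome):
    """Ordinary least squares. Returns (B values with the constant first, R-squared)."""
    X = np.column_stack([np.ones(len(outcome))] + list(predictors))
    coefs, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    residuals = outcome - X @ coefs
    r2 = 1 - residuals.var() / outcome.var()
    return coefs, r2

rng = np.random.default_rng(0)
n = 500
age = rng.uniform(3, 12, n)                    # made-up children aged 3 to 12
shoe_size = 0.8 * age + rng.normal(0, 1.0, n)  # feet grow with age
spelling = 5.0 * age + rng.normal(0, 5.0, n)   # spelling depends on age ONLY

# Recipe 1: shoe size on its own looks like a great predictor of spelling.
shoe_only, r2_shoe = fit([shoe_size], spelling)

# Recipe 2: add age, and shoe size has almost nothing left to explain.
with_age, r2_both = fit([shoe_size, age], spelling)

print(f"B for shoe size alone:    {shoe_only[1]:.2f} (R-squared {r2_shoe:.2f})")
print(f"B for shoe size with age: {with_age[1]:.2f} (R-squared {r2_both:.2f})")
```

The data were built so that shoe size has no effect of its own on spelling, which is why its B collapses towards zero once age joins the recipe.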

In the same way, dieting status also became non-significant once we looked at internalisation. Think of dieting as shoe size and binge eating as spelling. What’s happening here is the more you believe the crap about fat people and think you are worthless, the more likely you are to binge eat. And, by the by, the more likely you are to try and lose weight. So again, without looking at stigma, we seem to be able to predict binge eating from dieting, but really, internalised stigma is responsible for both the dieting and the binge eating. In fact, once we looked at those negative self-beliefs, ALL of the other included ingredients became unimportant. So the take home message is this: don’t believe the hype. Other people might assume we are worth nothing, but you’ll be a lot better off if you don’t agree with them!

Right, fancy a go at reading an actual paper? This is a link to a paper on the prevalence of eating disorders in 1998 versus 2008 in Australia. It’s a free paper, so you can read the whole thing. Or, if you only want to look at the regression, you can just click on Table 4. The title of the table tells you what the outcomes are: physical health and mental health quality of life (as opposed to my example, where the outcome was binge eating frequency). There are a lot of extra columns***, but you can get all you need to know by looking at the “ingredients”: the B column, the p value, and the R-squared. Also, they used a slightly different regression method so they’ve only printed the final recipe (no steps). But one glance, and you should have a pretty good idea of what they found. Enjoy. And if you have any questions, pop them in the comments.

——————————————————————-

* Technically, this is not what the p value means, but for the lay person, it’ll do.

** Technically, if you know your stats, I’m describing correlation, not B coefficients, but again, close enough for the lay person.

*** If you’re a total nerd (or a bit more advanced) and want to know what all those columns actually mean, read on. If you are a sane person, seriously, don’t bother! The B value (first column) is the regression coefficient. If you drew a graph of, for example, objective binge eating and MCS (the mental health quality of life score; these examples are from the paper), the B value would be the slope of the line. For every one point increase in objective binge eating, you’d get a 7.27 point decrease in mental health quality of life. For every one year increase in age, you’d get a .04 increase in MCS, and so on. Binge eating is measured by how often people self-reported bingeing (how they measure things is explained in the methods, but the paper doesn’t tell us what numerical scale is used, although they do tell you the name of the questionnaire they used so you’d have to look it up from there), so let’s say hypothetically from 0 (never) to 5 (multiple times a day). Because age is measured in years (say from zero to 100), is a one year increase more or less important than a one point increase in the bingeing scale? We don’t know because they’re in different units. Think of the standard error (SE) of the coefficient as the variability between different people (again, close enough), and it has the same units as the B column (e.g., age, binge frequency). The 95% confidence interval (CI) is how confident you can be in the number it came up with, and it tells you that if you ran the experiment 100 times (or in my case, if I collected data 100 times over), 95 times out of 100, you’d get a value between -9.58 and -4.96. These numbers are both negative, so 95% of the time, you would get a negative number and you can be pretty sure that the -7.27 is a reasonable estimation. *If* the 95% confidence interval ranged from a negative to a positive (if it included zero somewhere in the middle), the estimation of that negative 7.27 would be very unreliable.
You want your confidence intervals to be either both negative or both positive, but not one of each. The *Beta* (*B*) value is the B rescaled by how spread out each measure is (multiplied by the standard deviation of the ingredient and divided by the standard deviation of the outcome), and this serves to cancel out the units. So while you can’t compare one unit of binge eating frequency with one year of age to see which is more important, the *Beta* values are comparable. They have no units. They’re called “standardised coefficients,” in case you care. The bigger the number, the more important. Whether it’s positive or negative doesn’t matter; that tells you whether MCS goes up or down, but the number itself is the size of the change. Big numbers mean big change. So now you can see that binge eating was more important than age, although they work in opposite directions. The t statistic (the B divided by its SE) is a test of whether the coefficient is significantly different from zero, and the P value is actually a measure of the significance of the t-test. Aren’t you glad you know this? Now you can go back to looking at just the B (the regular, unstandardised, no italics version) and p and understanding equally well. You’re welcome 🙂
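For the total nerds only, here is a small sketch of three of those extra columns, using the usual textbook definitions: a standardised coefficient is the B multiplied by the standard deviation of the ingredient and divided by the standard deviation of the outcome, and the t statistic is the B divided by its standard error. The B of -7.27 and the interval from -9.58 to -4.96 are the paper's numbers quoted above. The standard deviations and the standard error below are invented purely so the code has something to chew on, and the function names are my own.

```python
def standardised_beta(b, sd_predictor, sd_outcome):
    """Rescale B so that ingredients measured in different units can be compared."""
    return b * sd_predictor / sd_outcome

def t_statistic(b, standard_error):
    """How many standard errors the B value sits away from zero."""
    return b / standard_error

def interval_excludes_zero(lower, upper):
    """True when both ends of the confidence interval have the same sign."""
    return (lower > 0 and upper > 0) or (lower < 0 and upper < 0)

# Confidence interval for binge eating, as quoted from the paper's table.
print(interval_excludes_zero(-9.58, -4.96))   # both negative, so True

# A made-up interval that straddles zero, for contrast.
print(interval_excludes_zero(-1.2, 0.8))      # one of each, so False

# HYPOTHETICAL spreads and standard error, NOT taken from the paper.
beta = standardised_beta(b=-7.27, sd_predictor=0.5, sd_outcome=10.0)
t = t_statistic(b=-7.27, standard_error=1.2)
print(f"Standardised beta: {beta:.3f}, t statistic: {t:.2f}")
```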

Totally just turned in an assignment where I said a small p-value=no statistical significance. I guess it’s a good thing I already have extra credit 😉 Thanks for breaking this down; it’s really helpful!

Oops 🙂

LOVE, LOVE, LOVE this article.