Randomized Controlled Trials for Dummies (and for smart people like you too!)

January 20, 2014

In my previous post, I talked about why “normal” people shouldn’t be afraid of science and that you don’t need a specialist degree to follow along with the majority of papers in the fields most likely to interest readers of this blog. This time, I’m going to talk you through a published paper, showing you what to look out for and how to interpret it. I’m going to start with an experimental study – that’s one where different groups of people did different things. If there’s interest, I’ll write some more posts about different types of studies.

Quick note, in most of the scientific literature, weight categories are split by body mass index (BMI), and the word “overweight” is usually used for individuals with a BMI between 25 and 29.9, and “obese” for individuals with a BMI of 30 or higher. I don’t use these words when I blog because they medicalise body size, but I do use them when I discuss papers that also used them. If it makes you feel any better, when I talk to scientific audiences about my own work, I use the word “fat.” It totally freaks them out!

OK, off we go.

A paper will most likely be made up of an Introduction, a Methods section, Results, and a Discussion.

Introduction — The Introduction section will be background material: obesity is terrible blah blah blah. The Introduction is also supposed to identify why there is a gap in the literature and what question has yet to be answered adequately. This then provides the rationale for the present study. So the most useful part of the Introduction is the last couple of sentences, where they explain what the purpose of the study is and what they expect to find. You can read the whole Introduction if you’re interested, but if you want to know about the particular study, you can skip this bit for now and dive right in.

Methods — The Methods section is where the nuts and bolts are. The quality (or otherwise) of the study is predominantly in here. The main things you need to look at in an experimental study are in the acronym PICO: Population, Intervention, Comparator/Control, and Outcome. These should all be covered in the Methods section. I’ll take them one at a time.

Population — Who was being studied? Men, women, or both? Children or adults? How old were they? Was it a wide range, for example, or just elderly people, or people in their 20s, etc.? In the case of a study on overweight/obese people, how did they define overweight/obese? By BMI? If so, did obese mean anybody with a BMI over 30, or did they limit it to a certain range (e.g., 30 to 40 only)? Did they just use “overweight” for anybody with a BMI over 25 (not so common these days, but you do still see it sometimes)?

The next question is a biggie: where did the subjects come from? How did the researchers find these people? This can make a really, really big difference. If they advertised for people in local papers, what did the advertisement say? They don’t always tell you this, but it’s worth bearing in mind. Did they advertise for overweight people to take part in a study of exercise? If they did, then the people who responded to the ad are likely to be more interested in exercise than people who didn’t respond, and this might mean the results of the study aren’t as applicable to other types of people. Did they advertise through local weight loss groups? If they did, then the people they get are overweight and unhappy about it. These people might not be the same as fatter people who are not going to weight loss groups. Did they get them from a diabetes service at a hospital? This means these people already have diabetes (obviously). Is it in a wealthy area or a deprived one? Did they come from doctor referral? In which case, what instructions were the doctors given? Were they asked to refer everybody who met certain criteria, or was there some subjective input from the doctor?

For example, in a study of Polycystic Ovarian Syndrome (PCOS), if you only take people who have been referred to a specialist for PCOS, referral bias might be a problem. Doctors are more likely to refer fatter people to a specialist for PCOS, even if they don’t have it, and less likely to refer thinner people, even if they do. So analysis of a PCOS clinic that found a majority of the patients were fat does NOT prove that fat people are at higher risk of PCOS – it only shows that fat people are more likely to be sent to a PCOS clinic. This was actually demonstrated in a recent study, if you’d like to read it.

And so on. None of these are necessarily problematic per se – you have to get your participants somehow – but they are really important to consider when you interpret the results.

Intervention — What did they do to them? This should normally be explained in quite a lot of detail. The questions you need to ask will vary a lot depending on the type of study it was. But, generally speaking, is it clear what they did? Is there anything they didn’t explain that you’d like to know? Can you see any obvious problems? Do you think the intervention would be a good way of testing what they wanted to know? You’d hope the answer would be yes, but sometimes it might not be!

Control or Comparison group — Who did they compare their intervention group to? There are a few options here. If there was just one group of people and they were measured before and after, then those people act as their own control group. For example, if you wanted to see if diet XYZ caused weight loss, you could take 10 people, weigh them, put them on the diet for 12 weeks, and weigh them again at the end to see if they lost weight. This is a before and after study. It gives you some information, but there’s a lot you don’t know. Was there something special about this group of people that influenced how much weight they lost? Would these people have lost weight even if they hadn’t gone on the diet? Just thinking they were taking part in a weight loss study might influence their choices in small ways and lead to weight loss, and it might have nothing to do with diet XYZ – this is related to the placebo effect (when the change comes from simply knowing you are being studied, it is often called the Hawthorne effect). You can’t answer any of these questions from a before and after study.

Usually, people will conduct a study with an intervention group and a control group. For example, you might have 10 people on diet XYZ and another 10 people who are on a waiting list for treatment, but who don’t get to go on the diet until after the study (because it wouldn’t be fair to deprive them of the wonderful treatment that the other group got just because they’d been “unlucky” enough to be allocated to the control group).

An alternative would be to have what’s called an active control. In an active control, the control group does something, as opposed to nothing, but it’s a different something from the first group. So you might compare two diets – one group would be on diet XYZ, the other group on diet ABC. Or it might be an alternative to this diet, usually something that is currently ‘standard care’: one group goes on diet XYZ with diet sheets and other resources, while the other group may just get a handout about healthy eating. Sometimes, you’d get more than two groups, and these may be combinations of the above.

Ideally, you want the two groups to be as identical as possible in every way except the intervention. For example, if you put all the women on diet XYZ and all the men into a control group, that’s not a good comparison. One way to achieve the similarity you need is to take a group of people and then randomise them into one of the groups. This means that when person A walks through the door, there is an equal chance of them ending up in group 1 or group 2. Nowadays, randomisation is usually done by computer, but sometimes people will use other methods (some of which are rather less random than others!). They might allocate the first 10 people to group 1 and the next 10 to group 2. But the first 10 people who responded might have been especially motivated, making them a bit different to the next 10 people. If you ever see the words “quasi-randomisation,” take it with a very large pinch of salt. Anyway, they should report how they did the randomisation.
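To make "randomisation by computer" concrete, here is a minimal sketch in Python. It is only an illustration, not the method of any particular study: the participant IDs, the group sizes, and the seed are all made up. It shuffles the list of participants and splits it in half, so every person has the same chance of landing in either group and the groups end up the same size.

```python
import random

def randomise(participants, seed=None):
    """Shuffle the participants and split them into two equal-sized groups."""
    rng = random.Random(seed)      # a seed makes the allocation reproducible
    shuffled = list(participants)  # copy, so the original list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical participant IDs, in the order people signed up
participants = [f"P{i:02d}" for i in range(1, 21)]

diet_group, control_group = randomise(participants, seed=2014)
print("Diet XYZ:", sorted(diet_group))
print("Control: ", sorted(control_group))
```

Compare this with "first 10 to group 1, next 10 to group 2": here the order in which people signed up has no influence on which group they end up in.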

So to sum up, a controlled study is better than an uncontrolled study (if the control is appropriate), and a randomised controlled study is the best kind. You might wonder why people don’t always use this method. Answer: resources. You need more people, it’s more complicated, there’s more organisation involved, and it’s more expensive. Just going from a before and after to a controlled study doubles the number of people you need. If you wanted an intervention group, an active control, and a wait-list control (probably the best combination), now you’re up to three times as many.

Plus, there’s the issue of “blinding.” This has nothing to do with vision, as such. Blinding is when people don’t know which group they are in. So in a drug study, where you have an active drug and a placebo, the placebo should look and taste exactly like the active drug, be in the same kind of package, same size, same colour, etc. Both drugs should only be identified by a code number (so the placebo doesn’t say “placebo” on it!). The patient wouldn’t know which group they were in (single blind). Even better is when neither the patient nor the doctor/scientist knows either (double-blind), that way subtle, non-verbal signals can’t be picked up by the patient. Somewhere, there will be a list of what each ID was and whether subjects were on the active drug or the placebo. This list must be kept safe until the very end of the study, after all of the data have been collected.

Sometimes it’s impossible to blind a study. For example, if half of your participants are on diet XYZ and the other half get a healthy eating handout, it’s pretty obvious which group they are in. The best thing to do here is, first, not tell participants about the groups in the study – they only know they are taking part in a study on “nutrition,” for example. They don’t know they’re on a diet and some other people are just getting a handout. Also, the handout people don’t know there is another group of people who are being put on a full-blown diet. Not surprisingly, people don’t like to think that they’re in the placebo group!

The other thing to do is to have separate people to measure the outcomes who weren’t involved in the rest of the study, and blind them to who was in which group. For example, you could get a nurse who wasn’t involved in the study in any way and had no previous contact with the participants and have her do all the weighing at the end of the diet study. The nurse wouldn’t be told which participants were in which group, so she couldn’t make little rounding up or rounding down judgment calls, either intentionally or subconsciously, based on what she thought the outcome would or should be for each group. And that brings us to our final letter.

Outcome — What outcome measure was used? Was it a reasonable choice? Can you think of something better that could have been chosen? Why didn’t they use that? It’s not always practical to use the best method though. For example, if you’re interested in whether an intervention increases heart muscle, the most objective test would be to cut out the heart and measure it! Alternatively, you might think underwater weighing is more accurate than weighing on a scale, but that requires people to get into a swimsuit and be dunked under water in a giant tub while blowing all the air out of their lungs. (Sidenote: not all fat people submerge easily – I don’t. No matter what they tried when we did that module during my master’s, they couldn’t get me completely underwater. I float. Natural buoyancy!) Or perhaps a BodPod or a DEXA scan would have been better. But that’s a $30,000 piece of equipment.

Did they do the next best thing, and how might that compromise have affected the results? How did they measure the outcomes? How reliable was this method? For example, did they weigh people in a lab or did they ask people to measure their weight at home and just call it in? Was everybody weighed in the same way? All on the same scale by the same person? Was the scale calibrated beforehand to make sure it was accurate? Was this repeated at regular intervals if it was a big study? Was there a protocol for clothing? If some people had their shoes on and were wearing three pounds of bling and other people weren’t, that would affect the results.

If you read my previous post and recall the example of the newspaper story about British men being the best lovers, questions that could come to mind would be:

  • How do they know?
  • Who did they ask? How did they measure this? Did they get a team of volunteers to sleep with men from different countries? (Probably not!)
  • Or did they just do a survey of some kind? How many people answered the survey? 10? 100? 1,000?
  • Where did they find these people? Random dialing phone interviews? Ads in adult magazines? How might that have biased the results?
  • What questions were on the survey? How did they define “best”? The most foreplay? The highest percentage of orgasms given? The willingness to try new things? And so on and so on.

The final thing in the Methods section is an explanation of what statistical analysis was done. Unless you have studied stats, you probably won’t understand it. Don’t worry, just skip right over it. See my previous post for why this shouldn’t worry you too much.

Results — And that finally brings us to what the study found and how it was reported. There will usually be some text, some tables and/or some graphs. The graphs should give you a relatively straightforward overview of the main findings in an easy-to-understand format. Tables are also worth looking over, even if they include stats that you don’t understand. The text shouldn’t duplicate results presented in tables and graphs, so it’s not just the same thing in written form. They are complementary, and you’ll need to read/look at both.

So, what did they tell you and what did they miss out? Did they measure four different outcomes but only report one? Why? Did they recruit 300 people, but only 120 are in the results? What happened to the other 180? Did they drop out? Authors don’t always tell you this stuff, and you need to keep an eye on the numbers. Use a calculator if you have to – you’d be amazed how often numbers in Table 1 or Figure 2 don’t match what’s in the text. Sometimes that’s because something is going on that you haven’t been told about. Sometimes it’s just typos and sloppy proofreading. Both of these possibilities tell you something about the study and the authors.
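If you would rather not reach for a calculator, the same check takes a few lines of Python. This is just a sketch using the 300-recruited, 120-analysed example above; the function name is made up for illustration.

```python
def attrition(recruited, analysed):
    """Return how many people are missing from the results, and what percentage that is."""
    missing = recruited - analysed
    percent_missing = 100 * missing / recruited
    return missing, percent_missing

missing, percent = attrition(recruited=300, analysed=120)
print(f"{missing} people unaccounted for ({percent:.0f}% of those recruited)")
```

If the paper doesn’t say what happened to a gap of that size, that is worth noting before you read any further.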

Going back to dropout rates, diet studies have notoriously high rates of people dropping out – they get fed up because they haven’t lost as much weight as they’d like. Or they splurge and are embarrassed because they haven’t stuck to their diet, so they don’t come back. And then there’s normal levels of attrition when life intervenes – sometimes people just have to drop out for a variety of reasons (e.g., they lost their job, moved house, had to go and take care of their sick mother). This happens – not much you can do about it, other than recruit enough people at the start to give you a bit of wiggle room if you lose some so that your results are still useful. The question for the reader is how much dropout occurred, was it the same in both groups, and what did the researchers do about it?

Going back to diet studies, if 5% of the people in the diet group dropped out, but 30% of the control group did, that suggests a problem. If it was about the same in both groups, that’s better. Were the people who dropped out different in some way from the people who didn’t? Were they heavier to start with? Or did more women than men drop out? Researchers should look at this and report it. And what did they do with the numbers? Let’s say you start with 300 people in a diet study; 150 of them are on the diet and another 150 are in the control group. Let’s say, 70 of the dieters dropped out and only 80 were left at the end of the study. These 80 lost an average of four kilos (nine pounds) each. Can we say that the diet causes four kilos of weight loss in most people? Perhaps the 70 who dropped out didn’t lose any weight, or maybe gained weight. By only looking at the people still in the study at the end, the results could be misleading.

There’s a name for this. It’s called a per protocol (PP) analysis. It means only analysing the people who stuck to the plan. The alternative is called an intention-to-treat (ITT) analysis. This means that you analyse everybody who started the study and who you intended to be included. Obviously, if they don’t turn up at the end of the study and you don’t have their final weight, you don’t know how much weight they lost. So how do researchers deal with that? Often in a diet study, they will use the last weight they had a reading for (this is known as “last observation carried forward”). So if people are weighed at week 6 and week 12, but they dropped out of the study after week 6, the researchers might use the week 6 number. While this is better than ignoring them altogether (as presumably it will show a smaller weight loss at week 6 than at week 12, so the average for the group as a whole would be lower than if they’d been excluded totally), it might still be misleading. Suppose these people were struggling with this type of diet, being on it for six weeks increased their cravings, and they actually gained a huge amount of weight over the next few weeks, which was why they didn’t come back to be weighed again. If the analysis doesn’t take into account that diet XYZ makes some people gain a lot of weight, it will overestimate the effectiveness of the diet. (Of course, I’m talking short-term effectiveness here. We all know how effective diets are over the long-term. Not.)
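A bit of arithmetic shows how much the choice of analysis can matter. The sketch below reuses the example of 150 dieters, of whom 80 finished with an average loss of four kilos. The numbers for the 70 people who dropped out are pure assumptions, invented only to show the effect: in one scenario their last recorded loss was 1 kg, in the other they had actually gained 2 kg.

```python
def group_average(completers, completer_loss, dropouts, dropout_loss):
    """Average weight loss (kg) across everyone who started, completers and dropouts alike."""
    total_loss = completers * completer_loss + dropouts * dropout_loss
    return total_loss / (completers + dropouts)

completers, dropouts = 80, 70   # from the example in the text
completer_loss = 4.0            # kg, from the example in the text

# Per protocol: only the people who finished are counted
per_protocol = completer_loss

# Intention-to-treat, using the last recorded weight for dropouts.
# ASSUMPTION: dropouts had lost 1 kg on average at their last weigh-in.
itt_last_reading = group_average(completers, completer_loss, dropouts, 1.0)

# ASSUMPTION: what if the dropouts had in fact gained 2 kg by the end?
itt_if_regained = group_average(completers, completer_loss, dropouts, -2.0)

print(f"Per protocol:             {per_protocol:.1f} kg")
print(f"ITT, last reading:        {itt_last_reading:.1f} kg")
print(f"ITT, if dropouts gained:  {itt_if_regained:.1f} kg")
```

Same diet, same people, three quite different headline numbers, which is why it is worth checking which analysis a paper actually used.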

Next question: what do the results MEAN? What numbers are reported? Is there any difference between the groups (or between the start and the finish)? This is usually shown with a statistical significance test, or P value. You’ll probably see something like this:

Diet group XYZ lost 3.2 kg, diet group ABC lost 2.5 kg, and the control group lost 0.4 kg (p < .05).

Don’t be scared. Look at the numbers. Ignoring the P for a minute, it looks like the two diets produced a small amount of weight loss, and the control group didn’t lose much. The XYZ group seems to have lost a little bit more than the ABC group, but it’s not a lot. Is it an important difference? Is XYZ “better” than ABC?  This is what a statistical significance test will tell you. Is there a difference between the groups that is big enough that it probably wasn’t a fluke? At a glance, I’d say diet versus control, yes; diet XYZ vs. ABC, probably not.

The main statistic is that P value. By convention (and for no really good reason), most people use a cut-off value of .05. If the P value is less than .05, that constitutes a significant result. In normal English, the difference probably wasn’t a fluke. It’s not 100% guaranteed, but probably. If there were only two groups (diet versus control) and P < .05, then you know that there was a difference between these groups. If there were three groups, like in my example above, you have to read more carefully. It could mean that somewhere in the three groups there was a statistically significant difference, but it doesn’t mean all three groups were different from each other. It could mean the two diet groups were different from the control group, but not necessarily from each other. You have to read carefully to see what they did, or this information might be in the small print in a table, for example. It’ll be in there somewhere.
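One way to get a feel for “probably wasn’t a fluke” is a permutation test, sketched below in Python. This is not necessarily the test a given paper will have used, and the weight-loss figures are invented for illustration. The idea: if group membership made no difference, shuffling the labels should produce a gap as big as the real one fairly often. The P value here is roughly the share of shuffles that do.

```python
import random

def average(values):
    return sum(values) / len(values)

def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Estimate how often random relabelling gives a difference at least as big as the observed one."""
    rng = random.Random(seed)
    observed = abs(average(group_a) - average(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_big = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # pretend the group labels were handed out at random
        diff = abs(average(pooled[:n_a]) - average(pooled[n_a:]))
        if diff >= observed:
            at_least_as_big += 1
    # Adding 1 to top and bottom avoids reporting a P value of exactly zero
    return (at_least_as_big + 1) / (n_shuffles + 1)

# Made-up weight losses in kg, 10 people per group
diet_xyz = [3.1, 4.0, 2.5, 3.8, 2.9, 3.6, 4.2, 3.3, 2.7, 3.5]
diet_abc = [2.9, 3.9, 2.4, 3.9, 3.0, 3.4, 4.1, 3.2, 2.8, 3.6]
control = [0.2, 0.9, -0.3, 0.5, 1.1, 0.0, 0.6, -0.1, 0.8, 0.4]

p_diet_vs_control = permutation_p_value(diet_xyz, control)
p_xyz_vs_abc = permutation_p_value(diet_xyz, diet_abc)
print(f"XYZ vs control: p = {p_diet_vs_control:.4f}")
print(f"XYZ vs ABC:     p = {p_xyz_vs_abc:.4f}")
```

With these invented numbers, the diet-versus-control comparison comes out well under .05 and the XYZ-versus-ABC comparison well over it, which is the same pattern as in the three-group example: a significant result somewhere does not mean every pair of groups differs.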

The other thing to remember is that statistical significance doesn’t always mean the results are “significant” in the everyday sense of the word. As study size increases, you are more likely to get a statistically significant finding with a smaller group difference. For example, a big randomized controlled trial of a new expensive hypertension drug found that the drug was more effective than the existing commonly-used medication. If the difference between the two groups was 3 mmHg (BP units), that isn’t a lot, even if it was statistically significant. “Normal” blood pressure is about 120/80. High blood pressure is 160/100. If your blood pressure is 160 and this expensive new drug reduces it to 157, it’s still not worth spending the money on, even if it was statistically significant. It won’t make a difference to your health.

Because of the study size effect, you will sometimes see, especially in a systematic review that combines data from multiple studies, that the 3,000 people in the diet-plus-exercise group lost an average of 4.5 kg in a year, and the 3,000 people in the diet-only group lost an average of 2.5 kg in a year, and there’s a P value of < .05. That’s a statistically significant result, “proving” that diet-plus-exercise is more effective for weight loss than diet alone. These numbers are fairly accurate: adding exercise to a diet study tends to give roughly 2 kg (4 lb) extra weight loss a year. But really? An extra 2 kg might be statistically significant, but what’s the impact in real life? (BTW, to hell with dieting, but don’t give up on exercise. It’s fab.)
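The study size effect is easy to see with a rough calculation. The sketch below keeps the same 2 kg difference between groups and changes only the number of people. It uses a simple normal approximation for comparing two averages, and it assumes a standard deviation of 8 kg for individual weight loss, a figure picked purely for illustration.

```python
import math

def two_sided_p(mean_diff, sd, n_per_group):
    """Approximate two-sided P value for a difference between two group means (normal approximation)."""
    standard_error = sd * math.sqrt(2 / n_per_group)
    z = abs(mean_diff) / standard_error
    return math.erfc(z / math.sqrt(2))

# Same 2 kg difference every time; only the study size changes.
# ASSUMPTION: individual weight loss has a standard deviation of 8 kg.
for n in (30, 300, 3000):
    p = two_sided_p(mean_diff=2.0, sd=8.0, n_per_group=n)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{n:>5} per group: p = {p:.2g} ({verdict})")
```

The extra 2 kg is exactly as big, or as small, in all three cases. What changes with study size is only how confident you can be that it isn’t a fluke, not whether it matters in real life.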

And this is why you need to be very careful about headlines, and even paper titles and abstracts. Often they will suggest there was a difference between treatment groups, and there was. But it doesn’t mean it was particularly impressive. A trial of Weight Watchers found that WW participants lost an average of five pounds in two years. The study was flawed for lots of reasons, but Ragen Chastain’s response is my favourite: “I could lose five pounds in two years just by exfoliating regularly!”

One of my own particular bug-bears is the National Weight Control Registry, who are experts at this kind of misleading crap. One of their published papers has the title “Weight loss maintenance in successful weight losers: surgical versus non-surgical methods” and found no difference between the groups. The conclusion (with much fanfare and publicity) was that maintenance of weight loss after dieting was as good as after surgery. But if you scratch the surface, yes, the results were similar in both groups, but they were shitty results – both groups were regaining (big surprise). I blogged about this one here.

Conclusion — This section can be very interesting, but is open to biases in interpretation, where the authors plug their own agenda. Reading between the lines is key, but at least this bit should be written in normal sentences and easy-ish to understand. This is where the authors should explain the implications of their study — the bigger picture. What do their results mean? Are they consistent with what other people have found? If not, why not? Do their results suggest we should be doing something different with patients? Do their results raise new questions? (They always do.) What kind of studies could answer these questions? And then, the obligatory, “More research is needed.”

Acknowledgements and Conflicts of Interest — This section should tell you if any of the authors are affiliated with institutions who might have a commercial interest in the outcomes, or if the study was funded by such an organisation. Theoretically, if science is truly unbiased, this shouldn’t matter. But it does seem to have an effect.

Title and Abstract — Which brings me to my final point. The first thing you will see if you look at the study is the title. The second is the abstract, which should summarize all those key PICO points above. This is a good way of getting a grasp of what is being done and why, but if you really want to know what happened in the study, you have to read the paper. It’s like buying a car. First glance at what it looks like will give you some ideas of what to expect, but you really need to go a little deeper before making that decision!

So, fancy putting all that to the test? I’m linking to a randomized controlled diet study from 1999. This link will take you to the abstract, where you can have a quick search for the PICO information. Then, at the top right hand side of the page, there is an icon saying “Free in PMC.” Click on that to get to the full report. If there’s anything you don’t understand, just put it in the comments and I’ll try and answer them. Have fun!

11 Comments leave one →
  1. Feminist Cupcake permalink
    January 20, 2014 10:49 am

    Awesome Post!

  2. January 20, 2014 3:26 pm

    I’ve taken numerous science classes and learned the “scientific method” ad nauseam. However, I’ve never seen it explained in detail this great!

  3. Mich permalink
    January 20, 2014 7:02 pm

    This was never explained to me in courses about the “scientific method”. Only how classical studies were done, and a general overview of how it should be done today. But not a run-down of what a published article is about.


  4. Leila Haddad permalink
    January 21, 2014 7:00 am

    Thank you so much for this! This will clearly help me in future discussions with my scientist parents…

  5. January 21, 2014 2:46 pm

    Thanks everyone, glad it’s useful. Leila – can’t your scientist parents help you in future discussions with themselves?!

    Would anyone like to see more posts like this on other types of papers/studies? Or is it a bit dry?

    • January 22, 2014 1:21 pm

      “Dry” isn’t always a bad thing. I’m picking at this piece little by little. Concentration issues, doncha’ know, but I’ll keep working at it. Yes, more like this would be welcome, at least by me. 🙂

    • Nof permalink
      January 23, 2014 4:56 pm

      So much more like this!

  6. Leila Haddad permalink
    January 21, 2014 5:41 pm

    I would love to see more of these, please, keep them coming

  7. Judybat25 permalink
    January 24, 2014 3:50 am

    This is wonderful! Thank you so much for this!!

  8. August 1, 2017 6:28 am

    yay very helpful and concise, thank you so much for writing this
