P-Hacking: Crash Course Statistics #30

Today we're going to talk about p-hacking (also called data dredging or data fishing). P-hacking is when data is analyzed to find patterns that produce statistically significant results, even if there really isn't an underlying effect. It has become a huge problem in science, since many scientific theories rely on p-values as proof of their existence! We're going to talk about a few ways researchers have hacked their data, and give you some tips for identifying and avoiding these types of problems when you encounter stats in your own life. See also: XKCD's comic on p-hacking.
Date: 2022-04-04

Comments and reviews: 10

MrShysterme
How to get a tiny p-value every time? It's very easy conceptually: increase your sample size. In fact, in areas of science where very large sample sizes are the norm, the p-values are almost always less than 0.05. Hmm. So if p-values are kind of pointless with very large studies, does this mean that p-values are kinda dumb in general? Yes, and many, many people think that and write about it.
So how did p-value type analyses become so popular? Easy. They provide a veneer of objectivity when scientific decision making is not totally objective. You usually can't have a machine spit out a yes or no. Also, with moderate sample sizes, a low p-value indicates a relatively strong effect. Hmm. So why not just look at the strength of the effect directly and lay everything out so that individuals can decide for themselves? Well, that's a better approach.
Cue first-year math students, people who have taken one stats course, or a semi-competent stats teacher/researcher to call me crazy.
Look into null hypothesis significance testing and the controversy around it. There are hundreds of resources.
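
To illustrate the sample-size point above, here is a minimal sketch. It assumes a one-sample, two-sided z-test with known standard deviation and pretends the observed standardized effect is exactly 0.02 at every sample size; the function name `two_sided_p` and those numbers are illustrative choices, not something from the video or the comment.

```python
import math

def two_sided_p(effect_size, n):
    """Two-sided p-value of a one-sample z-test.

    effect_size is the observed mean difference in standard-deviation
    units (assumed fixed here purely for illustration); n is the sample size.
    """
    z = effect_size * math.sqrt(n)          # test statistic grows with sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2)) # P(|Z| >= |z|) for standard normal Z

# Same tiny effect, growing sample size: only the p-value changes.
for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9,}  effect = 0.02  p = {two_sided_p(0.02, n):.3g}")
```

The effect size stays at 0.02 in every row, while the p-value drops below 0.05 once n is large enough, which is why reporting the effect size alongside (or instead of) the p-value is often recommended.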

Patrick
So, in order to believe any statistic we see (and even then the statistic might itself be a chance result), we need Ph.D.s in statistical analysis, an army of scientist employees, and sophisticated lab and computer equipment to verify claims. Oh, and access to the journals the statistic was published in, each of which will cost a pretty penny.
Not just CC, but in general, the media and experts do a fine job of outlining the problem but rarely give solutions and even when they do, the solutions are impractical or theoretical.
The same media and experts, due to shenanigans like p-hacking, have broken the public's trust in what they have to offer.
My solution: mandatory statistics classes beginning in Kindergarten. This will take forty years for any critical mass of trust to return to the societal influencers because in forty years, today's kindergarteners will have the power to influence and the old guard of prevaricating and unethical doyens will have died or be too old to dictate the direction of society in any meaningful way.
Or, wait for the Matrix plugin seats. They're coming!

CanuckMonkey13
I just want to say, when you calculated that with twenty tests all looking for p < 0.05 there is a 65% chance that we will get a false positive, you said that this "might be higher than you would expect". My statistically challenged brain sees it the other way, though: if each test has a 5% chance of giving a false positive and we run 20 tests, then we get a 100% chance of a false positive because 20 × 5% = 100%!
(Of course I understand why this is wrong, and you've explained it very well over the course of this series--I just wanted to point out what seemed to me to be the obvious mistake to make)
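
The arithmetic behind the two numbers in this comment can be sketched as follows, assuming the 20 tests are independent and every null hypothesis is true. The function name `familywise_error_rate` is just an illustrative label.

```python
def familywise_error_rate(alpha, num_tests):
    """Chance of at least one false positive across independent tests.

    Each test avoids a false positive with probability (1 - alpha), so all
    of them avoid one with probability (1 - alpha) ** num_tests.
    """
    return 1 - (1 - alpha) ** num_tests

alpha, num_tests = 0.05, 20

# The tempting but wrong shortcut: add the per-test rates together.
naive_sum = alpha * num_tests

print(f"naive sum:        {naive_sum:.2f}")
print(f"actual (1 - 0.95^20): {familywise_error_rate(alpha, num_tests):.4f}")
```

Adding the rates gives 1.00, while the complement rule gives roughly 0.64, close to the figure quoted above. The sum overcounts because it treats false positives on different tests as if they could never happen together.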

Martin
Actually, there is a small misconception in this video (the table in the beginning). When we reject the alternative hypothesis, that does not mean that the null hypothesis is true. It simply means that we do not have statistical evidence to reject the null hypothesis; we cannot say with 100% statistical certainty that H0 is true. Andy Fields writes in 'Discovering Statistics in SPSS': "If the p-value is greater than 0.05 you can decide to reject the alternative hypothesis but that is not the same as the null hypothesis being true" (Fields, 2018: 76).

daddyleon
During my BA and MA I certainly did feel the strong push to "just try this or that" so the p-values would be a little bit more acceptable. I never really got a straight and clear answer as to how that wasn't similar to cheating. But everyone seemed to think it was quite okay. That was a weird experience.
That's called gaslighting, right?

NaoTa
How to get p-hacking out of "for profit" research?
1. Make only "no profit" and massively peer-reviewed research significant enough for laws.
2. Implement UBI and/or have a "national scientific research fund" that allows scientists to follow through on their research under ethical conditions for themselves and their results.

John
Which p-hacked study contributed to people not vaccinating? Because Dr. Wakefield's paper was only a case study. Or are you referring to Dr. William Thompson's divergence from the proposed analysis plan at the CDC, and thus p-hacking to REMOVE the significance? Gotta keep our facts straight from the "lies, damn lies".

ncooty
All of that and you didn't mention the common terms "fishing" and "exploratory analyses", nor how such approaches can be used ethically to generate new hypotheses, or be methodologically accommodated, such as with screening and hold-out samples (cross-validation).

TimeWizud
But that information is useless in that it does not offer any details about which variables (jelly beans) are significant, or even how many! By adjusting your p-values you've CHANGED your null hypothesis. There are better ways of doing this!
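
For context on the adjustment being criticized here, this is a minimal sketch of one common correction, the Bonferroni correction. Treating that as the adjustment in question is an assumption, as are the 20 independent tests and the function names.

```python
def bonferroni_threshold(alpha, num_tests):
    """Per-test significance cutoff under a Bonferroni correction."""
    return alpha / num_tests

def familywise_error_rate(per_test_alpha, num_tests):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - per_test_alpha) ** num_tests

alpha, num_tests = 0.05, 20  # e.g. 20 jelly bean colors (illustrative)
cutoff = bonferroni_threshold(alpha, num_tests)

print(f"per-test cutoff:      {cutoff:.4f}")
print(f"family-wise rate now: {familywise_error_rate(cutoff, num_tests):.4f}")
```

Each individual test now has to clear a much stricter cutoff, which keeps the chance of any false positive across the whole family just under 0.05. The price is lower power for each individual comparison, which is part of why alternative procedures exist.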

Diego
Hi! I really like your videos, and I'll eventually go through all of them. Please make a video on multivariate analysis. I'm beginning to understand these methods but would really like for you to explain them (PCA, CVA).