Umm, isn't that the topic of the article in the first place? It may not be a widespread or long-standing stereotype, but any study that generalizes from a sample of 5,000 people to the whole population is making a stereotype, not drawing a solid conclusion. That's my opinion, anyhow. I'm not a fan of statistics because they can be made to say whatever you want them to.
That's an opinion. Mine is that you can't. Or, more to the point, that with 5,000 people you CAN make statistical statements, but they say whatever you want them to. Say we're doing a medical survey about some new drug. We want to see if there are side effects, so we test a broad spectrum of people: half male, half female, with equal numbers Hispanic, Black, White, Asian, and Indian. Now we have 500 in each sub-category, which still ain't bad if you're just testing for racial or sex effects. But what if there are side effects based on age, activity level, sun exposure, pre-existing conditions, pregnancy, or something else? Once you get down to that granularity, you may have only five people or fewer. So your SINGLE affected diabetic pregnant Mexican woman is suddenly 20% of your sample, which isn't realistic, since your "sample" is now 5 people.
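The shrinking-cell arithmetic above is easy to sketch. The category counts below are invented for illustration, not taken from any actual study:

```python
# How subgroup cells shrink as you stratify a fixed sample further.
# All category counts here are illustrative assumptions.
n = 5000

# Two sexes times five ethnic groups: 10 cells of 500 each.
per_cell = n // (2 * 5)
print(per_cell)  # 500

# Now also split by pregnancy (2), diabetes (2), age band (5),
# and activity level (5): 1000 cells in total.
per_cell = n // (2 * 5 * 2 * 2 * 5 * 5)
print(per_cell)  # 5 -- one person is now 20% of her cell
```

And that assumes a perfectly even spread; in a real convenience sample, some cells end up even smaller.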
And that's if you ARE willing to go to the effort to get equal numbers over all sub-sub-categories, which nobody actually does. Convenience is a huge factor in statistics the way it's done today. Proximity to the testing centers is critical, which already biases by income level, age, social class (depending on the survey), activity level, and a host of other things. Internet surveys are even worse because you have to have a computer, internet access, and time, which blocks out a lot of people. There are a bunch of other factors that bias these tests as well. And sure, you can "correct" for them by using weighting factors, but those weighting factors are practically voodoo. They are like what cosmologists use to balance their equations when the equations don't match reality. And the sad part is that each weighting factor requires ANOTHER weighting factor to correct for the prior one's errors. If I remember correctly, cosmologists have three weighting factors now, IN ONE EQUATION, and none of those factors have any "real" units.
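For what it's worth, the weighting correction mentioned above isn't mysterious in the simple case. Here's a minimal sketch of a single post-stratification weight, with made-up population and sample shares:

```python
# Post-stratification weighting: each respondent is weighted by
# (population share of their group) / (sample share of their group),
# so the weighted sample matches the known population mix.
# The shares below are invented for illustration.
population_share = {"urban": 0.80, "rural": 0.20}
sample_share = {"urban": 0.95, "rural": 0.05}  # convenience sample near test centers

weights = {g: population_share[g] / sample_share[g] for g in population_share}
print(weights)  # a rural respondent counts 4x; an urban one a bit less than 1x
```

The trouble pointed at above is real, though: the weights are only as good as the assumed population shares, and weighting up a tiny cell multiplies its noise right along with its signal.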
It's my opinion that statistics become valuable and realistic models of a given population only when you surpass somewhere between 100,000 and 1,000,000 data points. But then again, that's one opinion. You'd need at least another 99,999 to get an accurate picture of what "people" think about statistics. :)
You know, I actually took statistics when I was in college, for my major, which was a social science. Yes, it's possible to misrepresent data, but it's up to the reader of each study to look at the study's methodology and understand for him- or herself exactly how the study was put together and why; that is, what its null hypothesis was.
I'm surprised that, as a scientist, you would put so much doubt into a branch of mathematics.
For the record, it takes about 400 responses to have a statistically sound study. Over 1000 will bring the stats to a better p value. 5000 is incredibly strong.
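Those sample-size figures line up roughly with the standard 95% margin of error for a sampled proportion, which shrinks with the square root of n. (Strictly, that's the precision of the estimate rather than a p-value, but the shrinking-with-sqrt(n) intuition is the same.) A quick sketch:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a sampled proportion p."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (400, 1000, 5000):
    print(n, f"+/-{100 * margin_of_error(n):.1f}%")
# 400  -> +/-4.9%
# 1000 -> +/-3.1%
# 5000 -> +/-1.4%
```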
Polls have validity. Statistics do point to data points in the world. That is not just my opinion; it's something I've been schooled in. I don't doubt what you say about what you study; I take you on your word that you have been taught by scholars who also have been taught by scholars. If you wish to dismiss the study of statistics just because you have an opinion that it may not be valid, that's on you. But your opinion does not make the whole of statistical study invalid.
Science is BUILT on doubt. Without doubt, science is just a mutual admiration society. Most of the work scientists do is proving one another wrong. The paper I'm working on will, if successful, prove everyone before me wrong. So, yes, I do have doubts about entire branches of science and math. In particular, I think that cosmology is one (short) step away from a religion, and large portions of space science are just wrong. In fact, some things I was taught in classes as recently as two years ago have since been proven to be absolute crap, and yet at the time they were taught as established facts. I WANT you to doubt what I say! You should. Doubt demands better proofs.
The problem I have with statistics in general is that it's probably the most misused branch of math OR science in the world. Advertisers use statistics to sell their products and aren't held to any ethical standards in their so-called studies (four out of five dentists, etc.). Politicians use studies to "prove" that the people are behind them. It becomes hard to trust ANY statistical study when so many are obviously biased and thus meaningless. On top of all of this, I hear stories from my friends who work in biology, medicine, and other fields that use statistics heavily, about studies tainted by biased sample sets, thrown-out "bad" samples, and the like. On the whole, the technique is just too subjective for me to put much faith in.
Too much of science is about publishing, no matter whether you have good results or bad, just to stay afloat. This is why I'm a bad scientist; I want results that mean something, not just the first, easiest thing I can find, and that's apparently not how things are done. This exact quality is what makes me a good engineer. In engineering, something either works or it doesn't, and you can characterize the ways in which it fails until you find something that works. In science, you come up with some kind of idea, "prove" it in any way possible, and then wait for someone to disprove it. Which, since you practically EXPECT the counter-argument, leads to sloppy methods. HUGE sample sets tend to offset that sloppiness, because it's a lot harder to discount 50,000 "bad" results than it is to throw out the seventeen you don't like and pretend they didn't exist.
I'm not going to argue this with you. Your stance is well outside the bounds of everything I've been taught about statistics as a science and a method. If you want to argue with me, fine; you'll also have to argue with the vast majority of social scientists who, without the use of statistics, would not be able to make any statements about societal movements and progressions at all.
It's fine to denigrate social science as a soft science. But by saying that statistics have no validity whatsoever, you're saying that social science is not scientific at all.
You seem to want a straight tally of data instead of statistics, which would invalidate statistics altogether. Statistics is based on sampling. This seems to be the crux of what you refuse to accept, but it is also the crux of what the whole field is built upon.
I'm not trying to make a "your science vs. my science" fight here. I'm not the one using terms like "soft science". Firstly, I didn't even KNOW that social science was "your science" until you mentioned it. Secondly, my science SUCKS; it's not where I shine. And thirdly, I judge each field based on its own merits AS I SEE THEM, not based on other fields or how others tell me I should see them. If that offends you, I'm sorry, but that's just how it is. I see statistics and, by extension I suppose, social science as far too easy to abuse for partisan purposes. CAN it be a useful tool? Of course it can! But it's far more prone to data-shaping than any other field I can think of.
One of the biggest problems I had with ANY science lab was that we were doing "experiments" that had pre-determined outcomes. For example, in physics lab we were supposed to "prove" things like conservation of energy or momentum, demonstrate inelastic collisions, etc. In chemistry lab we had to show that chemical X mixed with chemical Y turned some given color, or had to be raised to temperature T before boiling. So, if we wanted to get good grades, our results had damned well better match up with the expected values to a certain tolerance, not paying any attention to what actually happened. In my opinion, that's not science, that's babysitting with technology. Basically everyone had to "chem-lab" their data to match the expected results and nobody learned anything. The reason I bring this up is that I've not only heard of but SEEN a lot of this sort of thing in statistical studies, where the pre-conceived notion influences either the methodology (e.g. sampling choice, sample culling, etc.) or the results obtained from the sampling. It's too open to interpretation, which is by definition subjective. A Republican and a Democrat looking at the same sample data may come up with different interpretations because their purposes are different, and that's unlikely in most other fields from what I've seen.
The upshot of this is that any statistical survey absolutely REQUIRES some way to determine any pre-conceived notions of the researcher. A statement of purpose, a resume and/or bio, whatever. When reading one of these you have to try to de-filter the biases of the researcher. I'm sure there are loads of reputable researchers who do unbiased studies that give real, useful results, but there are so many disreputable ones that it takes loads of work to cull out the crap. And, since the results can be stated in very simple terms (again, four out of five dentists, or 64% of democrats) those numbers are waved around as if they were facts without taking into account all of what went into getting them. I'm too much of an engineer to be a good scientist, and too hard of a scientist to implicitly trust statistics.
Not to belabor the point, but I'm too rusty with the info myself. So here are some primers recently fed into DailyKos (not for hard scientists but for political junkies, so the info will be soft):
(no subject)
Date: 2006-06-13 06:37 pm (UTC)
This headline amuses me greatly.
Part 1: Measures of Central Tendency
Part 2: Measures of Spread
Part 3: Evaluation of Arguments
Part 4: Polls and Surveys