Saturday, February 9, 2013

Replication of Experiments

A recurring question is whether to run lab or field experiments. Both methods have their merits, but one merit of lab experiments does not seem to get enough attention: they are very easy to replicate!

Replication seems to me an important part of research. In economics, I feel replication may be a bit more common than in psychology: economists can have a great career even if they never come up with their own bias or their own game. While we do not have many papers that are pure replications of previous results, many papers have a control treatment that essentially amounts to a replication of a previous result.

For this reason I am also not completely in favor of unifying the software and the exact procedures we use in a specific experiment. We often have to make many small choices, and the results should not depend on them. But if we all use the same software, we won't easily find out...

Psychology seems to have a somewhat bigger problem with replication; at least, it has been in the news a lot lately. I'd welcome suggestions and comments on the hypothesis that economists may have an easier time with replication because other researchers often replicate the main treatment of a new result when they use it as a control in their own papers. I don't think this is standard practice in psychology.

Here is Ed Yong's feature for Nature on the problems of replication in psychology: "Replication studies: Bad copy -- In the wake of high-profile controversies, psychologists are facing up to problems with replication."

Here's a recent Chronicle article by Tom Bartlett: "Power of Suggestion: The amazing influence of unconscious cues is among the most fascinating discoveries of our time—that is, if it's true"

"Psychology may be simultaneously at the highest and lowest point in its history. Right now its niftiest findings are routinely simplified and repackaged for a mass audience; if you wish to publish a best seller sans bloodsucking or light bondage, you would be well advised to match a few dozen psychological papers with relatable anecdotes and a grabby, one-word title. That isn't true across the board. Researchers engaged in more technical work on, say, the role of grapheme units in word recognition must comfort themselves with the knowledge that science is, by its nature, incremental. But a social psychologist with a sexy theory has star potential. In the last decade or so, researchers have made astonishing discoveries about the role of consciousness, the reasons for human behavior, the motivations for why we do what we do. This stuff is anything but incremental."

and

"Fairly or not, social psychologists are perceived to be less rigorous in their methods, generally not replicating their own or one another's work, instead pressing on toward the next headline-making outcome."

That's why, when we teach Experimental Economics, we often emphasize series of experiments.

And that's also why I am very grateful to everyone who ran gender and competition experiments, which provided a wealth of replications of our, perhaps surprising, findings on gender differences in competitive attitudes. Thank you all!

Friday, February 8, 2013

Fabricated data

More from Uri Simonsohn:

"Just post it: The lesson from two cases of fabricated data detected by statistics alone."
The abstract reads:
"I argue that requiring authors to post the raw data supporting their published results has, among many other benefits, that of making fraud much less likely to go undetected. I illustrate this point by describing two cases of fraud I identified exclusively through statistical analysis of reported means and standard deviations. Analyses of the raw data behind these provided invaluable confirmation of the initial suspicions, ruling out benign explanations (e.g., reporting errors, unusual distributions), identifying additional signs of fabrication, and also ruling out one of the suspected fraudster’s explanations for his anomalous results."

In the introduction he writes:

"I illustrate how raw data can be analyzed for such purposes through two case studies. Each began by noting that summary statistics reported in a published paper were too similar across conditions to have originated in random samples, an approach to identifying problematic data that has been employed before (Carlisle, 2012; Fisher, 1936; Gaffan & Gaffan, 1992; Kalai, McKay, & Bar-Hillel, 1998; Roberts, 1987; Sternberg & Roberts, 2006). These preliminary analyses of excessive similarity motivated me to contact the authors and request the raw data behind their results. Only when the raw data were analyzed did these suspicions rise to a level of confidence that could trigger the investigations of possible misconduct that were eventually followed by the resignation of the researchers in question."

I really like this figure: I guess making up data isn't as easy as it sounds...
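To make the idea of summary statistics that are "too similar across conditions to have originated in random samples" concrete, here is a minimal Python sketch. It assumes normally distributed data, a common true standard deviation in every condition, and equal sample sizes, and it uses the spread of the reported standard deviations as the similarity statistic. This is a simplified illustration in the spirit of the approach, not necessarily the exact procedure in the paper; the function names and the reported SDs are hypothetical.

```python
import math
import random


def sample_sd(values):
    """Sample standard deviation (n - 1 denominator)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))


def similarity_p_value(reported_sds, n_per_condition, n_sims=10_000, seed=1):
    """Share of simulated studies whose condition SDs are at least as similar
    to one another as the reported SDs are.

    Assumes normal data, a common true SD in every condition (estimated by
    pooling the reported SDs), and equal sample sizes across conditions.
    """
    rng = random.Random(seed)
    # Pooled SD: square root of the average variance (valid for equal group sizes).
    pooled_sd = math.sqrt(sum(sd ** 2 for sd in reported_sds) / len(reported_sds))
    observed_spread = sample_sd(reported_sds)

    as_similar = 0
    for _ in range(n_sims):
        # One simulated study: draw each condition, record its sample SD.
        simulated_sds = [
            sample_sd([rng.gauss(0.0, pooled_sd) for _ in range(n_per_condition)])
            for _ in reported_sds
        ]
        if sample_sd(simulated_sds) <= observed_spread:
            as_similar += 1
    return as_similar / n_sims


# Hypothetical reported SDs for five conditions with 15 subjects each.
print(similarity_p_value([2.00, 2.01, 2.00, 1.99, 2.01], n_per_condition=15))
```

A value near zero means SDs this similar almost never arise by chance under these assumptions. As the paper stresses, such a flag only motivates a request for the raw data; it is the raw data that can rule out benign explanations.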


Thursday, February 7, 2013

False-Positives


Uri Simonsohn has several papers on scientific conduct whose recommendations should become the rule, and not only in psychology.

The first paper, "False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant," is by Joseph P. Simmons, Leif D. Nelson, and Uri Simonsohn, in Psychological Science 22(11), pp. 1359-1366.


Their abstract reads:
"In this article, we accomplish two things. First, we show that despite empirical psychologists’ nominal endorsement of a low rate of false-positive findings (≤ .05), flexibility in data collection, analysis, and reporting dramatically increases actual false-positive rates. In many cases, a researcher is more likely to falsely find evidence that an effect exists than to correctly find evidence that it does not. We present computer simulations and a pair of actual experiments that demonstrate how unacceptably easy it is to accumulate (and report) statistically significant evidence for a false hypothesis. Second, we suggest a simple, low-cost, and straightforwardly effective disclosure-based solution to this problem. The solution involves six concrete requirements for authors and four guidelines for reviewers, all of which impose a minimal burden on the publication process."

A striking figure from the paper is the following:

"[...] Figure 1 shows the false-positive rates from additional simulations for a researcher who has already collected either 10 or 20 observations within each of two conditions, and then tests for significance every 1, 5, 10, or 20 per-condition observations after that. The researcher stops collecting data either once statistical significance is obtained or when the number of observations in each condition reaches 50. Figure 1 shows that a researcher who starts with 10 observations per condition and then tests for significance after every new per-condition observation finds a significant effect 22% of the time."
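The optional-stopping result is easy to reproduce in spirit. Below is a simplified Python simulation, assuming both conditions are drawn from the same standard normal distribution (so every significant result is a false positive). To stay within the standard library it uses a two-sided z-test with known variance instead of the t-test used in the paper, so its numbers should not be expected to match the paper's 22% exactly. The function name and parameters are illustrative.

```python
import math
import random
from statistics import NormalDist

CRITICAL_Z = NormalDist().inv_cdf(0.975)  # two-sided test at alpha = .05


def false_positive_rate(start_n=10, step=1, max_n=50, n_sims=2000, seed=1):
    """Share of simulated studies that ever reach p < .05 although there is no effect.

    Both conditions are drawn from N(0, 1). The researcher first tests after
    start_n observations per condition, then after every `step` additional
    ones, and stops at significance or once max_n per condition is reached.
    A z-test with known variance 1 stands in for the paper's t-test.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        sum_a = sum_b = 0.0
        next_test = start_n
        for n in range(1, max_n + 1):
            sum_a += rng.gauss(0.0, 1.0)
            sum_b += rng.gauss(0.0, 1.0)
            if n == next_test or n == max_n:
                # Difference in means divided by its standard error sqrt(2 / n).
                z = (sum_a - sum_b) / n / math.sqrt(2.0 / n)
                if abs(z) > CRITICAL_Z:
                    hits += 1
                    break
                next_test += step
    return hits / n_sims


print("peek after every observation:", false_positive_rate())
print("single test at n = 50:", false_positive_rate(start_n=50))
```

The second call tests only once, at 50 observations per condition, and should land near the nominal 5%; the first, which peeks after every new observation, comes out well above it.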



Stay tuned for more...

Wednesday, February 6, 2013

Designing Reputation Mechanisms

Designing feedback mechanisms seems like a big area for market design where theory and experiments can go hand in hand, and where there are still lots of open questions.

Gary Bolton, Ben Greiner and Axel Ockenfels have a new paper "Engineering Trust - Reciprocity in the Production of Reputation Information" that is forthcoming in Management Science.
The abstract reads:
"Reciprocity in feedback giving distorts the production and content of reputation information in a market, hampering trust and trade efficiency. Guided by feedback patterns observed on eBay and other platforms we run laboratory experiments to investigate how reciprocity can be managed by changes in the way feedback information flows through the system, leading to more accurate reputation information, more trust and more efficient trade. We discuss the implications for theory building and for managing the redesign of market trust systems."


Tuesday, February 5, 2013

Knowledge, or Theory and Practice

Today, just a quick piece of wisdom from the kid of some friends of mine:
Question: "Hey, do you know how to swim?"
Answer: "Yes. But only out of the water."



Monday, February 4, 2013

Gender and Generosity


Stefano DellaVigna, John A. List, Ulrike Malmendier, and Gautam Rao have a new working paper on "The Importance of Being Marginal: Gender Differences in Generosity"
The abstract reads:
"Do men and women have different social preferences? Previous findings are contradictory. We provide a potential explanation using evidence from a field experiment. In a door-to-door solicitation, men and women are equally generous, but women become less generous when it becomes easy to avoid the solicitor. Our structural estimates of the social preference parameters suggest an explanation: women are more likely to be on the margin of giving, partly because of a less dispersed distribution of altruism. We find similar results for the willingness to complete an unpaid survey: women are more likely to be on the margin of participation."

In their conclusion they write:
"This study uncovers an important relationship between gender and giving patterns: there are gender differences in social preferences, but it is important to go beyond considering differences in means – important gender differences may be at the margin. This leads women to give more in certain situations, but not in others, and also to be more sensitive to social cues."
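The idea that a less dispersed distribution of altruism puts more women "on the margin" can be illustrated with a toy threshold model. This is not the paper's structural model: the Python sketch below assumes, purely for illustration, that a person gives whenever their normally distributed altruism exceeds a threshold, that men's and women's altruism have the same mean but women's is less dispersed, and that making the solicitor easy to avoid raises the threshold. All parameter values are hypothetical.

```python
from statistics import NormalDist

# Hypothetical altruism distributions: same mean, women less dispersed.
WOMEN = NormalDist(mu=0.0, sigma=0.5)
MEN = NormalDist(mu=0.0, sigma=1.5)


def share_giving(altruism: NormalDist, threshold: float) -> float:
    """Share of people whose altruism exceeds the giving threshold."""
    return 1.0 - altruism.cdf(threshold)


# Hypothetical thresholds: 0.0 when avoiding the solicitor is hard, 0.5 when easy.
for label, threshold in [("hard to avoid", 0.0), ("easy to avoid", 0.5)]:
    print(f"{label}: women {share_giving(WOMEN, threshold):.2f}, "
          f"men {share_giving(MEN, threshold):.2f}")
```

Both groups give at the same rate (0.50) when the threshold is 0, but when it rises to 0.5 the women's share falls to about 0.16 while the men's only falls to about 0.37: with more of the women's mass near the threshold, the same change in the situation moves more of them.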

This reminds me of the paper by Jim Andreoni and Lise Vesterlund (2001), “Which is the fair sex? Gender differences in altruism,” Quarterly Journal of Economics 116(1): 293-312, which DellaVigna et al. cite.


Their abstract reads:
"We study gender differences in altruism by examining a modified dictator game with varying incomes and prices. Our results indicate that the question “which is the fair sex?” has a complicated answer—when altruism is expensive, women are kinder, but when it is cheap, men are more altruistic. That is, we find that the male and female “demand curves for altruism” cross, and that men are more responsive to price changes. Furthermore, men are more likely to be either perfectly selfish or perfectly selfless, whereas women tend to be “equalitarians” who prefer to share evenly."

Their conclusion makes a similar point:
"This study finds that, depending on the price of giving, either sex can be found to be more altruistic. When the price of giving is low, men appear more altruistic, and when the price is high, women are more generous. Stated differently, men are more likely to be either perfectly selfish or perfectly selfless, whereas women care more about equalizing payoffs. This leads to demand curves for altruism that cross and those for men are more price-elastic.
        There are several important consequences of this result. First, this finding can potentially unify a literature that has thus far been fractured by inconsistent findings. By showing that differences in altruism depend on the price, we can begin to organize studies that sometimes found men to be more altruistic and sometimes women. Second, this indicates a need for more attention to sex differences in experimental economics. If differences appear with respect to altruism, they may appear in other behavior as well. This, in turn, means that researchers would be wise to assure that their experimental findings are the result of economic incentives and not of varying sex compositions of their control and treatment groups."
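The mechanism in that conclusion (perfectly selfish and perfectly selfless types among men, equalitarians among women) is enough on its own to make aggregate demand curves cross. The Python sketch below is a toy under explicitly hypothetical assumptions: each token kept is worth 1 to the dictator, each token passed is worth 1/p to the recipient (so p is the price of giving), the endowment is 100 tokens, and the type shares are made up for illustration rather than taken from the paper.

```python
from fractions import Fraction

TOKENS = 100  # hypothetical endowment


def recipient_payoff(kind: str, price: Fraction) -> Fraction:
    """Payoff a dictator of the given type passes on at price `price`.

    A kept token is worth 1 to the dictator; a passed token is worth
    1/price to the recipient.
    """
    value_passed = 1 / price
    if kind == "selfish":  # keeps everything
        return Fraction(0)
    if kind == "selfless":  # weights both payoffs equally: passes all iff that raises the total
        return TOKENS * value_passed if price < 1 else Fraction(0)
    if kind == "equalitarian":  # keeps k tokens so that k = (TOKENS - k) / price
        return TOKENS / (price + 1)
    raise ValueError(kind)


# Hypothetical type shares, for illustration only.
MEN = {"selfish": Fraction(1, 2), "selfless": Fraction(1, 2)}
WOMEN = {"selfish": Fraction(1, 5), "equalitarian": Fraction(4, 5)}


def average_given(shares: dict, price: Fraction) -> Fraction:
    """Average payoff passed to recipients by a population with these type shares."""
    return sum(share * recipient_payoff(kind, price) for kind, share in shares.items())


for price in (Fraction(1, 2), Fraction(2)):
    print(f"price {price}: men {float(average_given(MEN, price)):.1f}, "
          f"women {float(average_given(WOMEN, price)):.1f}")
```

With these made-up shares, men pass on more when giving is cheap (100.0 versus 53.3 at price 1/2) and less when it is expensive (0.0 versus 26.7 at price 2), so the two demand curves cross and the men's curve is the more price-elastic one.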

It's good that those papers are consistent with each other...

Sunday, February 3, 2013

Coffee and Compliments

I knew there was an excellent reason for early morning coffee:

Scientific American writes about a study in PLOS ONE:


"Scientists assigned 66 subjects to one of two groups. Half got a 200-milligram caffeine tablet, a dose equal to almost three cups of coffee. The other half received a sugar tablet. Thirty minutes later the volunteers were shown strings of letters, and had to decide as fast as they could if a string formed a word or was just gibberish. The volunteers recognized words with positive associations much faster than either negative or neutral words. Other studies have shown that positive words tend to be recognized more quickly, but the caffeine increases the gap.

So next time you wake up with a grumpy sweetheart, your compliments might be appreciated more if they have a cup of coffee."

In that spirit, I'm off to get the best coffee in Palo Alto.

Friday, February 1, 2013

Gender and Test Taking

Katie Baldiga has a paper on gender differences in test taking, "Gender Differences in Willingness to Guess." The abstract reads:
"Multiple-choice tests play a large role in determining academic and professional outcomes. Performance on these tests hinges not only on a test-taker’s knowledge of the material but also on his willingness to guess when unsure about the answer. In this paper, we present the results of an experiment that explores whether women skip more questions than men. The experimental test consists of practice questions from the SAT II subject tests; we vary the size of the penalty imposed for a wrong answer and the salience of the evaluative nature of the task. We find that when no penalty is assessed for a wrong answer, all test-takers answer every question. But, when there is a small penalty for wrong answers, women answer significantly fewer questions than men. We see no differences in knowledge of the material or confidence in the test-takers, and differences in risk preferences fail to explain all of the observed gap. We show that, conditional on their knowledge of the material, test-takers who skip questions do significantly worse on our experimental test, putting women and other test-takers that are less willing to guess at a disadvantage."
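Why skipping can cost points, conditional on knowledge, comes down to simple arithmetic. The Python sketch below assumes an SAT-style scoring rule (one point for a correct answer, a quarter-point penalty for a wrong one, zero for a skip, five answer choices); these values are illustrative assumptions, not necessarily the parameters of the experiment. Under that rule a blind guess has an expected value of exactly zero, and eliminating even one wrong option makes guessing strictly better than skipping.

```python
from fractions import Fraction


def expected_guess_value(choices: int, penalty: Fraction, eliminated: int = 0) -> Fraction:
    """Expected score from guessing uniformly among the options not yet eliminated.

    A correct answer scores 1, a wrong one loses `penalty`, a skip scores 0.
    """
    remaining = choices - eliminated
    p_correct = Fraction(1, remaining)
    return p_correct - (1 - p_correct) * penalty


QUARTER = Fraction(1, 4)  # assumed SAT-style penalty for a wrong answer
for eliminated in range(4):
    print(eliminated, expected_guess_value(5, QUARTER, eliminated))
```

This prints expected values of 0, 1/16, 1/6 and 3/8 for zero to three eliminated options, so a test-taker who skips whenever unsure gives up points as soon as they can rule out anything. With no penalty, even a blind guess is worth 1/5 of a point, consistent with everyone answering every question in that condition.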