437 points Vinnl | 30 comments
1. explodes ◴[] No.43984193[source]
Wouldn't it be nice if policy changes were accompanied by an A/B testing plan to evaluate their impact? I have always thought so. I have also seen a major pitfall of A/B testing that real humans can hand-pick and slice data to make it sound as positive or negative as wanted. Nonetheless, the more data the better.
replies(9): >>43984911 #>>43984957 #>>43985195 #>>43985261 #>>43985635 #>>43986010 #>>43986433 #>>43990930 #>>43996358 #
2. ◴[] No.43984911[source]
3. jeffbee ◴[] No.43984957[source]
Unfortunately, the possibility exists that the moment of introducing the A/B test requirement will be strategically chosen to freeze the status quo in the way the chooser prefers.
4. Calwestjobs ◴[] No.43985195[source]
test A - before

test B - after

what are you talking about?

replies(3): >>43985239 #>>43985257 #>>43985425 #
5. shermantanktop ◴[] No.43985239[source]
“A/B in time” suffers from inability to control for other factors that might vary over time. In this case, that could be the economy or other transit policies.

But sometimes it’s the only possible approach.

6. shadowgovt ◴[] No.43985257[source]
Generally, that's considered to introduce confounding factors on the time axis ("did we see improvement because we changed something or because flu season hit and people stayed home") that you'd prefer to mitigate by running your A and B simultaneously.

But in the absence of the ability to run them simultaneously, "A is before and B is after" can be a fine proxy. Of course, if B is worse, it'd be nice if you could only subject, say, 5% of your population to it before you just slam the slider to 100% and hit everyone with it.
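
To make the time-axis confound concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration only (the baseline of 50, the noise level, an `EFFECT` of +2, a `SHIFT` of -5 standing in for something like flu season); none of it comes from the congestion pricing data.

```python
import random
from statistics import mean

def outcome(rng, treated, after, effect, shift):
    # Hypothetical outcome: baseline + noise + treatment effect + period-wide shift.
    y = 50.0 + rng.gauss(0, 10)
    if treated:
        y += effect
    if after:
        y += shift
    return y

def before_after_estimate(rng, n, effect, shift):
    # "A is before, B is after": treatment and time period change together.
    before = [outcome(rng, False, False, effect, shift) for _ in range(n)]
    after = [outcome(rng, True, True, effect, shift) for _ in range(n)]
    return mean(after) - mean(before)

def simultaneous_estimate(rng, n, effect, shift):
    # A and B run in the same period, so the shift hits both arms equally.
    treated, control = [], []
    for _ in range(2 * n):
        is_treated = rng.random() < 0.5
        y = outcome(rng, is_treated, True, effect, shift)
        (treated if is_treated else control).append(y)
    return mean(treated) - mean(control)

rng = random.Random(0)
EFFECT, SHIFT = 2.0, -5.0
ba = before_after_estimate(rng, 20000, EFFECT, SHIFT)
sim = simultaneous_estimate(rng, 20000, EFFECT, SHIFT)
print(f"before/after estimate: {ba:+.2f} (true effect {EFFECT:+.1f})")
print(f"simultaneous estimate: {sim:+.2f} (true effect {EFFECT:+.1f})")
```

Under these assumptions the before/after number lands near `EFFECT + SHIFT`, so it can even have the wrong sign, while the simultaneous comparison lands near `EFFECT`.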

replies(1): >>43986182 #
7. sc68cal ◴[] No.43985261[source]
We already had A/B testing of congestion pricing. The A test was without congestion pricing in NYC, and has been tested for decades.
replies(3): >>43985550 #>>43985682 #>>43990138 #
8. Ntrails ◴[] No.43985425[source]
"before" and "after" introduces a large axis of noise

The problem is that for A/B testing to really work you need the groups' outcomes to be independent. As soon as there is any bias in group selection or any cross-group effect it's very hard to unpick.

9. bunderbunder ◴[] No.43985550[source]
That's not an A/B test because it has no way of controlling for broader economic trends over time. How do you figure out if what you're seeing is because of that one thing that changed, or the enormous list of other things that also changed around the same time?

A more valid design would be randomly assigning some cities to institute congestion pricing, and other cities to not have it. Obviously not feasible in practice, but that's at least the kind of thing to strive toward when designing these kinds of studies.

replies(3): >>43985801 #>>43991066 #>>43994453 #
10. aredox ◴[] No.43985635[source]
Yeah, let's do that for everything: safety belts, safety on gun triggers, melamine in milk, etc...

Do you A/B test your comments too?

11. s1artibartfast ◴[] No.43985682[source]
An important part of testing is establishing assessment criteria and collecting data.

I wish more laws would pre-state what their intended outcome and success would look like.

12. jannyfer ◴[] No.43985801{3}[source]
That would be a bad design for an A/B study (and NYC congestion pricing is not a “study” anyway), because cities are few and not alike and have an enormous list of other things that are different. What NYC equivalent would you pick?

In any case, not every policy change needs to be an academic exercise.

replies(1): >>43986301 #
13. aidenn0 ◴[] No.43986010[source]
> Wouldn't it be nice if policy changes were accompanied by an A/B testing plan to evaluate their impact? I have always thought so. I have also seen a major pitfall of A/B testing that real humans can hand-pick and slice data to make it sound as positive or negative as wanted. Nonetheless, the more data the better.

Policies have different effects depending on how likely people judge them to be long-term changes. Construction along a route will cause people to temporarily use alternative forms of transportation, but not e.g. sell their car or buy a long-term bus pass.

Yes, the inability to know counterfactuals will make judging policies more subjective than we might like. The closest we get to A/B testing is when different jurisdictions adopt substantially similar policies at different times. For example, this was done to judge improvements from phasing out leaded gasoline, since it was done at different times and rates in different areas.
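
One common way to exploit staggered adoption is a difference-in-differences comparison (my label, not necessarily what the leaded gasoline studies used). A minimal sketch with made-up numbers: region A adopts the policy, region B has not yet, and both share the same background trend, which is the key assumption.

```python
import random
from statistics import mean

def region_mean(rng, base, trend, effect, n=5000):
    # Hypothetical average outcome for one region in one period.
    return mean(base + trend + effect + rng.gauss(0, 10) for _ in range(n))

rng = random.Random(2)
TREND, EFFECT = -4.0, -6.0  # assumed shared time trend and true policy effect

a_before = region_mean(rng, 80.0, 0.0, 0.0)
a_after = region_mean(rng, 80.0, TREND, EFFECT)  # A adopted the policy
b_before = region_mean(rng, 60.0, 0.0, 0.0)
b_after = region_mean(rng, 60.0, TREND, 0.0)     # B has not adopted yet

naive = a_after - a_before            # mixes the trend with the effect
did = naive - (b_after - b_before)    # subtracts the trend seen in B
print(f"naive before/after in A: {naive:+.2f}")
print(f"difference-in-differences: {did:+.2f} (true effect {EFFECT:+.1f})")
```

This only works if the two regions really would have moved in parallel without the policy, which is itself not testable from these four numbers.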

replies(1): >>43986399 #
14. Calwestjobs ◴[] No.43986182{3}[source]
yes, but how the hell does he propose to A/B test a "whole Manhattan policy"? build another Manhattan just for the test? makes no sense. the whole of Manhattan matters, not 5% of it, so there is no 5% rollout. an a/b test can only be done for things that affect one person at a time, like a GUI: a big group under test, but the effect is on individuals.

in such big scale a/b test is tool to deceive, not to get to right conclusion

replies(1): >>43987238 #
15. bunderbunder ◴[] No.43986301{4}[source]
Yup, that is indeed a part of the problem. You'll notice I did say, "Obviously not feasible in practice."

I've got a textbook on field experiments that refers to these kinds of questions as FUQ - acronym for "Fundamentally Unanswerable Questions". You can collect suggestive evidence, but firmly establishing cause and effect is something you've just got to let go of.

16. stemlord ◴[] No.43986399[source]
please don't quote the entire comment you're replying to
replies(1): >>43988799 #
17. aclatuts ◴[] No.43986433[source]
The real world isn't A/B tests. No government is going to spend millions on equipment and infrastructure on a congestion zone because some engineers are like "Let's just test this out. I have done zero research on what could possibly happen, but it would be fun to see what the results are."
replies(1): >>43989611 #
18. shadowgovt ◴[] No.43987238{4}[source]
It is, indeed, much easier to do A/B testing online in environments you control than IRL.

(Purely hypothetically: one could identify 10% of the island as operating under the new rules and compare outcomes. This is politically fraught on multiple levels and also gives messy spatial results.)

19. sc68cal ◴[] No.43988799{3}[source]
sometimes people edit their post after the fact. It is important sometimes to quote it, to ensure that context is preserved
20. listenallyall ◴[] No.43989611[source]
When you write it out like that, it seems to make total sense! But then you read grant proposals that get funded - in things like the social sciences and humanities, and even conventional science and health - millions of dollars essentially just throwing darts to see what sticks.
replies(1): >>43989863 #
21. czzr ◴[] No.43989863{3}[source]
Surely you see the difference between working in a development environment and working in production?
replies(1): >>43992193 #
22. tonymet ◴[] No.43990138[source]
you haven't described the observations or the sample
replies(1): >>43990587 #
23. ericpauley ◴[] No.43990587{3}[source]
TFA describes this extensively. The observations are traffic speed, bus timeliness, and over a dozen other metrics. The samples are sub-areas of NYC.
replies(1): >>43991948 #
24. notatoad ◴[] No.43990930[source]
unfortunately, building a second NYC for the purposes of A/B testing isn't feasible.

but we have before and after data to compare - that's what this article is about. and the congestion pricing plan included requirements to publish data specifically for the purposes of comparison between last year and this year.

25. JumpCrisscross ◴[] No.43991066{3}[source]
> randomly assigning some cities to institute congestion pricing, and other cities to not have it

Cities are stupidly heterogeneous. These data wouldn't be more meaningful than comparing cities with congestion pricing to those without. (And comparing them from their congestion eras.)

replies(1): >>44006238 #
26. tonymet ◴[] No.43991948{4}[source]
Anything negative?
27. listenallyall ◴[] No.43992193{4}[source]
The comment to which I replied was referring to the cost, not to the implementation
28. sorcerer-mar ◴[] No.43994453{3}[source]
Everyone knows how you can conduct good experiments in a land of frictionless spherical cows.
29. rsynnott ◴[] No.43996358[source]
What a good idea. Simply build another Manhattan for the purpose.
30. bunderbunder ◴[] No.44006238{4}[source]
What you're telling me here is that you aren't aware of what the randomization is for in randomized controlled trials.

"Our treatment units are stupidly heterogeneous" is exactly the problem it solves. A century's worth of developing increasingly sophisticated statistical techniques for making do without random assignment has thus far failed to accomplish anything other than provisional mitigations that are notoriously easy to use incorrectly in practice.
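
A rough simulation of what randomization buys you with heterogeneous units. All numbers are invented: 40 hypothetical "cities" with widely varying baselines, a true effect of -10, and a self-selection rule where the most congested half adopts the policy.

```python
import random
from statistics import mean

TRUE_EFFECT = -10.0  # assumed effect of the policy on the outcome

def estimate(rng, baselines, treated_idx):
    # Difference in mean outcome between treated and control units.
    treated, control = [], []
    for i, base in enumerate(baselines):
        y = base + rng.gauss(0, 5)
        if i in treated_idx:
            treated.append(y + TRUE_EFFECT)
        else:
            control.append(y)
    return mean(treated) - mean(control)

def one_trial(rng, n=40):
    # Heterogeneous units: baselines vary a lot from city to city.
    baselines = [rng.gauss(100, 30) for _ in range(n)]
    # Self-selection: the half with the highest baseline adopts the policy.
    ranked = sorted(range(n), key=lambda i: baselines[i], reverse=True)
    self_selected = set(ranked[: n // 2])
    # Randomization: a coin decides, independent of the baseline.
    randomized = set(rng.sample(range(n), n // 2))
    return (estimate(rng, baselines, self_selected),
            estimate(rng, baselines, randomized))

rng = random.Random(1)
results = [one_trial(rng) for _ in range(2000)]
avg_selected = mean(r[0] for r in results)
avg_random = mean(r[1] for r in results)
print(f"self-selected, average estimate: {avg_selected:+.1f}")
print(f"randomized, average estimate:    {avg_random:+.1f} (true {TRUE_EFFECT:+.1f})")
```

In this toy setup the randomized estimate is right on average, while the self-selected one is badly off. Any single randomized trial with 40 such units is still noisy; randomization removes the bias, not the variance.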