Changes since congestion pricing started in New York

1. explodes ◴[14 May 25 13:18 UTC] No.43984193[source]▶

Wouldn't it be nice if policy changes were accompanied by an A/B testing plan to evaluate their impact? I have always thought so. I have also seen a major pitfall of A/B testing that real humans can hand-pick and slice data to make it sound as positive or negative as wanted. Nonetheless, the more data the better.

replies(9): >>43984911 #>>43984957 #>>43985195 #>>43985261 #>>43985635 #>>43986010 #>>43986433 #>>43990930 #>>43996358 #

2. ◴[14 May 25 14:23 UTC] No.43984911[source]▶

>>43984193 (TP) #

3. jeffbee ◴[14 May 25 14:26 UTC] No.43984957[source]▶

>>43984193 (TP) #

Unfortunately, the possibility exists that the moment of introducing the A/B test requirement will be strategically chosen to freeze the status quo in the way the chooser prefers.

4. Calwestjobs ◴[14 May 25 14:47 UTC] No.43985195[source]▶

>>43984193 (TP) #

test A - before

test B - after

what are you talking about?

replies(3): >>43985239 #>>43985257 #>>43985425 #

5. shermantanktop ◴[14 May 25 14:51 UTC] No.43985239[source]▶

>>43985195 #

“A/B in time” suffers from inability to control for other factors that might vary over time. In this case, that could be the economy or other transit policies.

But sometimes it’s the only possible approach.

6. shadowgovt ◴[14 May 25 14:53 UTC] No.43985257[source]▶

>>43985195 #

Generally, that's considered to introduce confounding factors on the time axis ("did we see improvement because we changed something or because flu season hit and people stayed home") that you'd prefer to mitigate by running your A and B simultaneously.

But in the absence of the ability to run them simultaneously, "A is before and B is after" can be a fine proxy. Of course, if B is worse, it'd be nice if you could only subject, say, 5% of your population to it before you just slam the slider to 100% and hit everyone with it.
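A minimal sketch of that time-axis confound, using made-up numbers (the baseline, the policy effect, and the flu-season shock are all assumptions for illustration, not figures from the article). It compares a before/after estimate with one where A and B run in the same period:

```python
# Hypothetical numbers: trips per day into a zone.
BASELINE = 1000
POLICY_EFFECT = -100      # assumed true effect of the change
FLU_SEASON_EFFECT = -150  # assumed unrelated shock that only hits period 2


def outcome(period, treated):
    """Trips observed in a period, with or without the change applied."""
    shock = FLU_SEASON_EFFECT if period == 2 else 0
    return BASELINE + shock + (POLICY_EFFECT if treated else 0)


# "A is before, B is after": the shock gets folded into the estimate.
before_after = outcome(2, treated=True) - outcome(1, treated=False)

# A and B run simultaneously in period 2: the shock cancels out.
concurrent = outcome(2, treated=True) - outcome(2, treated=False)

print(before_after, concurrent)
```

Under these assumptions the before/after comparison attributes the flu-season drop to the policy, while the concurrent comparison recovers only the policy effect.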

replies(1): >>43986182 #

7. sc68cal ◴[14 May 25 14:53 UTC] No.43985261[source]▶

>>43984193 (TP) #

We already had A/B testing of congestion pricing. The A test was without congestion pricing in NYC, and has been tested for decades.

replies(3): >>43985550 #>>43985682 #>>43990138 #

8. Ntrails ◴[14 May 25 15:06 UTC] No.43985425[source]▶

>>43985195 #

"before" and "after" introduces a large axis of noise

The problem is that for A/B testing to really work you need the groups' outcomes to be independent. As soon as there is any bias in group selection or any cross-group effect, it's very hard to unpick.

9. bunderbunder ◴[14 May 25 15:14 UTC] No.43985550[source]▶

>>43985261 #

That's not an A/B test because it has no way of controlling for broader economic trends over time. How do you figure out if what you're seeing is because of that one thing that changed, or the enormous list of other things that also changed around the same time?

A more valid design would be randomly assigning some cities to institute congestion pricing, and other cities to not have it. Obviously not feasible in practice, but that's at least the kind of thing to strive toward when designing these kinds of studies.
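To make the role of random assignment concrete, here is a minimal sketch with made-up numbers (the six baseline scores and the constant effect of -5 are assumptions for illustration only). It enumerates every way of choosing three treated cities out of six and looks at the treated-minus-control difference in means for each:

```python
from itertools import combinations
from statistics import mean

# Hypothetical baseline congestion scores for six very different cities.
BASELINES = [10.0, 25.0, 40.0, 90.0, 150.0, 300.0]
TRUE_EFFECT = -5.0  # assumed constant effect of the policy on every city


def estimate(treated_idx):
    """Difference in mean outcomes, treated minus control, for one assignment."""
    treated = [BASELINES[i] + TRUE_EFFECT for i in treated_idx]
    control = [b for i, b in enumerate(BASELINES) if i not in treated_idx]
    return mean(treated) - mean(control)


def all_estimates(n_treated=3):
    """Estimates for every equally likely way to pick the treated cities."""
    picks = combinations(range(len(BASELINES)), n_treated)
    return [estimate(set(pick)) for pick in picks]


estimates = all_estimates()
# Averaged over all possible random assignments, the estimate is unbiased.
average_estimate = mean(estimates)
# Any single assignment can still land far from the true effect.
worst_case_error = max(abs(e - TRUE_EFFECT) for e in estimates)

print(average_estimate, worst_case_error)
```

In this toy setup the average over all assignments matches the assumed effect, but a single unlucky draw can be off by far more than the effect itself. That is the tension with having only a handful of very different cities: randomization removes the bias, not the noise.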

replies(3): >>43985801 #>>43991066 #>>43994453 #

10. aredox ◴[14 May 25 15:20 UTC] No.43985635[source]▶

>>43984193 (TP) #

Yeah, let's do that for everything: safety belts, safety on gun triggers, melamine in milk, etc...

Do you A/B test your comments too?

11. s1artibartfast ◴[14 May 25 15:25 UTC] No.43985682[source]▶

>>43985261 #

An important part of testing is establishing assessment criteria and collecting data.

I wish more laws would pre-state what their intended outcome and success would look like.

12. jannyfer ◴[14 May 25 15:34 UTC] No.43985801{3}[source]▶

>>43985550 #

That would be a bad design for an A/B study (and NYC congestion pricing is not a “study” anyway), because cities are few and not alike and have an enormous list of other things that are different. What NYC equivalent would you pick?

In any case, not every policy change needs to be an academic exercise.

replies(1): >>43986301 #

13. aidenn0 ◴[14 May 25 15:53 UTC] No.43986010[source]▶

>>43984193 (TP) #

> Wouldn't it be nice if policy changes were accompanied by an A/B testing plan to evaluate their impact? I have always thought so. I have also seen a major pitfall of A/B testing that real humans can hand-pick and slice data to make it sound as positive or negative as wanted. Nonetheless, the more data the better.

Policies have different effects depending on how likely people judge them to be long-term changes. Construction along a route will cause people to temporarily use alternative forms of transportation, but not e.g. sell their car or buy a long-term bus pass.

Yes, the inability to know counterfactuals will make judging policies more subjective than we might like. The closest we get to A/B testing is when different jurisdictions adopt substantially similar policies at different times. For example, this was done to judge improvements from phasing out leaded gasoline, since it was done at different times and rates in different areas.
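One common way to use that kind of staggered adoption is a difference-in-differences comparison. A minimal sketch, assuming two hypothetical jurisdictions that would have followed the same trend without the policy (the numbers below are invented, and the function name is just for illustration):

```python
def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Change in the adopting area minus the change in the not-yet-adopting area.

    Only meaningful under the assumption that both areas would have
    moved in parallel if neither had adopted the policy.
    """
    return (treated_after - treated_before) - (control_after - control_before)


# Hypothetical measurements (say, some pollution index) in two areas.
# Area 1 adopts the policy between the two measurements; area 2 has not yet.
effect = diff_in_diff(
    treated_before=100, treated_after=80,
    control_before=100, control_after=95,
)
print(effect)
```

Here the adopting area dropped by 20, but 5 of that also happened where nothing changed, so the estimate attributed to the policy is the remaining 15.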

replies(1): >>43986399 #

14. Calwestjobs ◴[14 May 25 16:07 UTC] No.43986182{3}[source]▶

>>43985257 #

Yes, but how the hell does he propose to A/B test a "whole Manhattan policy"? Build another Manhattan just for the test? It makes no sense. The whole of Manhattan matters, not 5% of it, so a 5% rollout is out. An A/B test can only be done for things that affect one person at a time, like a GUI: a big group under test, but the effect is on individuals.

At such a large scale, an A/B test is a tool to deceive, not a way to reach the right conclusion.

replies(1): >>43987238 #

15. bunderbunder ◴[14 May 25 16:17 UTC] No.43986301{4}[source]▶

>>43985801 #

Yup, that is indeed a part of the problem. You'll notice I did say, "Obviously not feasible in practice."

I've got a textbook on field experiments that refers to these kinds of questions as FUQ - acronym for "Fundamentally Unanswerable Questions". You can collect suggestive evidence, but firmly establishing cause and effect is something you've just got to let go of.

16. stemlord ◴[14 May 25 16:25 UTC] No.43986399[source]▶

>>43986010 #

please don't quote the entire comment you're replying to

replies(1): >>43988799 #

17. aclatuts ◴[14 May 25 16:28 UTC] No.43986433[source]▶

>>43984193 (TP) #

The real world isn't A/B tests. No government is going to spend millions on equipment and infrastructure on a congestion zone because some engineers are like "Let's just test this out. I have done zero research on what could possibly happen, but it would be fun to see what the results are."

replies(1): >>43989611 #

18. shadowgovt ◴[14 May 25 17:40 UTC] No.43987238{4}[source]▶

>>43986182 #

It is, indeed, much easier to do A/B testing online in environments you control than IRL.

(Purely hypothetically: one could identify 10% of the island as operating under the new rules and compare outcomes. This is politically fraught on multiple levels and also gives messy spatial results.)

19. sc68cal ◴[14 May 25 20:25 UTC] No.43988799{3}[source]▶

>>43986399 #

sometimes people edit their post after the fact. It is important sometimes to quote it, to ensure that context is preserved

20. listenallyall ◴[14 May 25 21:54 UTC] No.43989611[source]▶

>>43986433 #

When you write it out like that, it seems to make total sense! But then you read grant proposals that get funded - in things like the social sciences and humanities, and even conventional science and health - millions of dollars essentially just throwing darts to see what sticks.

replies(1): >>43989863 #

21. czzr ◴[14 May 25 22:31 UTC] No.43989863{3}[source]▶

>>43989611 #

Surely you see the difference between working in a development environment and working in production?

replies(1): >>43992193 #

22. tonymet ◴[14 May 25 23:09 UTC] No.43990138[source]▶

>>43985261 #

you haven't described the observations or the sample

replies(1): >>43990587 #

23. ericpauley ◴[15 May 25 00:24 UTC] No.43990587{3}[source]▶

>>43990138 #

TFA describes this extensively. The observations are traffic speed, bus timeliness, and over a dozen other metrics. The samples are sub-areas of NYC.

replies(1): >>43991948 #

24. notatoad ◴[15 May 25 01:26 UTC] No.43990930[source]▶

>>43984193 (TP) #

unfortunately, building a second NYC for the purposes of A/B testing isn't feasible.

but we have before and after data to compare - that's what this article is about. and the congestion pricing plan included requirements to publish data specifically for the purposes of comparison between last year and this year.

25. JumpCrisscross ◴[15 May 25 01:47 UTC] No.43991066{3}[source]▶

>>43985550 #

> randomly assigning some cities to institute congestion pricing, and other cities to not have it

Cities are stupidly heterogeneous. These data wouldn't be more meaningful than comparing cities with congestion pricing to those without. (And comparing them from their congestion eras.)

replies(1): >>44006238 #

26. tonymet ◴[15 May 25 04:50 UTC] No.43991948{4}[source]▶

>>43990587 #

Anything negative?

27. listenallyall ◴[15 May 25 05:47 UTC] No.43992193{4}[source]▶

>>43989863 #

The comment to which I replied was referring to the cost, not to the implementation

28. sorcerer-mar ◴[15 May 25 12:47 UTC] No.43994453{3}[source]▶

>>43985550 #

Everyone knows how you can conduct good experiments in a land of frictionless spherical cows.

29. rsynnott ◴[15 May 25 15:57 UTC] No.43996358[source]▶

>>43984193 (TP) #

What a good idea. Simply build another Manhattan for the purpose.

30. bunderbunder ◴[16 May 25 14:54 UTC] No.44006238{4}[source]▶

>>43991066 #

What you're telling me here is that you aren't aware of what the randomization is for in randomized controlled trials.

"Our treatment units are stupidly heterogeneous" is exactly the problem it solves. A century's worth of developing increasingly sophisticated statistical techniques for making do without random assignment has thus far failed to accomplish anything other than provisional mitigations that are notoriously easy to use incorrectly in practice.