The tests I'd delete are the ones that only check that the code is written in a particular way instead of testing the expected behaviour of the code.
A couple of years ago I helped bring a project back on track. They had a notoriously flakey part of the test suite, which turned out to be caused by a race condition. They also had a very puzzling case of occasional data corruption, which turned out to be caused by the same race condition.
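To make that distinction concrete, here is a minimal sketch in Python. The `Cart` class and both tests are hypothetical, made up purely for illustration: the first test pins down how `total()` is written, the second only checks what it returns.

```python
from unittest import mock


class Cart:
    """Hypothetical class under test."""

    def __init__(self):
        self.items = []

    def add(self, price):
        self.items.append(price)

    def _sum(self):
        # Private helper: an implementation detail of total().
        return sum(self.items)

    def total(self):
        return self._sum()


def test_implementation_detail():
    # Brittle: passes only while total() happens to delegate to _sum().
    cart = Cart()
    cart.add(2)
    cart.add(3)
    with mock.patch.object(Cart, "_sum", return_value=5) as spy:
        cart.total()
    spy.assert_called_once_with()


def test_behaviour():
    # Robust: survives any refactor that keeps the result correct.
    cart = Cart()
    cart.add(2)
    cart.add(3)
    assert cart.total() == 5


test_implementation_detail()
test_behaviour()
```

If someone inlines `_sum()` into `total()`, the first test breaks even though nothing a user can observe has changed. That is the kind of test I would delete.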
Often a flakey test is not flakey because the test is well written and something else strange is failing. More often, the test reveals something about the system that is somewhat non-deterministic, but not in a detrimental way. When you have multiple levels of abstraction, parallelization, and interdependent behavior, fixing a single test becomes a time-consuming process, and it is made harder because the test is flakey, so you can't always reproduce the failure.
If a test fails in CI and the traceback is unclear, many people will re-run it once and let it continue to flake. Obvious flakes around time and other dependencies are much easier to spot and fix, so they get fixed. It's only the weird ones that lead to pain and regret.
The most common type I see is one where, instead of waiting with a suitable timeout until the expected behaviour is exhibited, the test sleeps for some shorter period and then checks whether the behaviour was exhibited.
These tests not only flake occasionally when the CI server or dev laptop is under unusual load, but worse, they accumulate until the test suite is so full of "short" sleeps that the full set of tests takes half an hour to run.
Often the sleeps were seen as being acceptable because the plan was to run the tests in parallel, but then the increased load results in the tests becoming flakey.
Once you have dozens of these flaking tests for this or other reasons, it becomes a project in itself to refactor them back to something sane.
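For what it's worth, the "wait until the behaviour shows up, with a timeout" approach can be sketched roughly like this in Python. The `wait_until` helper, its parameters, and the timer-driven `done` event are assumptions for illustration, not something from a particular test framework.

```python
import threading
import time


def wait_until(predicate, timeout=5.0, interval=0.01):
    """Poll predicate until it returns True or timeout seconds elapse.

    Returns True as soon as the predicate holds, False if the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical background work that finishes after some delay.
done = threading.Event()
worker = threading.Timer(0.05, done.set)
worker.start()

# Fragile version: time.sleep(0.1) followed by assert done.is_set().
# Polling version: returns as soon as the condition holds and only
# fails once the generous timeout has fully elapsed.
assert wait_until(done.is_set, timeout=5.0)
worker.join()
```

The timeout can be generous because it is only paid in full when the test is actually failing. A passing test costs roughly as long as the behaviour takes to appear, so the suite does not slowly fill up with fixed sleeps.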
Flakey tests should always be fixed immediately unless you're in the middle of an incident or something.
Most flakiness ends up being a bug in the test or nondeterminism in the code that users don't actually care about.
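One example of nondeterminism users usually don't care about is result ordering from concurrent work. A small Python sketch, where `fetch_all` is a hypothetical function invented for illustration and the assumption is that callers don't depend on order:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_all(ids):
    """Hypothetical function: squares each id in a worker thread.

    Results come back in completion order, which depends on scheduling.
    """
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(pow, i, 2) for i in ids]
        return [f.result() for f in as_completed(futures)]


results = fetch_all([1, 2, 3, 4])

# Flakey: assert results == [1, 4, 9, 16] depends on thread scheduling.
# Stable: compare contents only, since ordering is not part of the contract here.
assert sorted(results) == [1, 4, 9, 16]
```

If ordering does matter to users, then the flake is pointing at a real bug and the fix belongs in `fetch_all`, not in the assertion.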