192 points by beedeebeedee | 15 comments
1. aimazon ◴[] No.41901475[source]
The context is here: https://github.com/JusticeFighterDance/JusticeFighter110
replies(2): >>41901534 #>>41901598 #
2. yapyap ◴[] No.41901534[source]
What's this mean for us non-Chinese folk?
replies(1): >>41901589 #
3. xvector ◴[] No.41901589[source]
Translated by ChatGPT.

Summary:

10/18:

Title: Urgent Warning

The “reputation washing” behavior of Tian Keyu has been extremely harmful

For the past two months, Tian Keyu has maliciously attacked the cluster code, causing significant harm to nearly 30 employees of various levels, wasting nearly a quarter’s worth of work by his colleagues. All records and audits clearly confirm these undeniable facts:

1. Modified the PyTorch source code of the cluster, including random seeds, optimizers, and data loaders.

2. Randomly killed multi-machine experiment processes, causing significant experiment delays.

3. Opened login backdoors through checkpoints, automatically initiating random process terminations.

4. Participated in daily troubleshooting meetings for cluster faults, and kept modifying his attack code in response to colleagues’ troubleshooting ideas.

5. Altered colleagues’ model weights, rendering experimental results unreproducible.

It’s hard to imagine how Tian Keyu could sustain attacks with such malice: watching colleagues’ experiments get inexplicably interrupted or fail, listening to their debugging strategies and modifying his attack code specifically in response, and seeing colleagues work overnight with nothing to show for it. After being dismissed by the company, he received no penalties from his school or advisors and even began to whitewash his actions on various social media platforms. Is this the school’s and advisors’ tolerance of Tian Keyu’s behavior? We hope this evidence disclosure attracts the attention of the relevant parties and that definitive penalties are imposed on Tian Keyu, reflecting the social responsibility of higher education institutions to educate and nurture.

We cannot allow someone who has committed such serious offenses to continue evading justice, let alone begin to distort the facts and whitewash his wrongdoing! Therefore, we have decided to stand up on behalf of all justice advocates and reveal the evidence of Tian Keyu’s malicious cluster attacks!

Tian Keyu, if you deny any part of these malicious attacks, or think the content here smears you, present credible evidence! We are willing to disclose more evidence as the situation develops, along with your shameless, ongoing attempts at whitewashing. We guarantee the authenticity and accuracy of all evidence and accept legal responsibility for its content. If necessary, we are willing to disclose our identities and confront Tian Keyu face-to-face.

To those justice advocates: thank you. You do not need to apologize; you are heroes who dared to speak out.

Link to the inquiry recording of Tian Keyu: https://www.youtube.com/watch?v=nEYbYW--qN8

Personal homepage of Tian Keyu: https://scholar.google.com/citations?user=6FdkbygAAAAJ&hl=en

GitHub homepage of Tian Keyu: https://github.com/keyu-tian

10/19:

Clarification Regarding the “Intern Sabotaging Large Model Training” Incident

Recently, some media reported that “ByteDance’s large model training was attacked by an intern.” After internal verification by the company, it was confirmed that an intern from the commercial technology team committed a serious disciplinary violation and has been dismissed. However, the related reports also contain some exaggerations and inaccuracies, which are clarified as follows:

1. The intern involved maliciously interfered with the model training tasks of the commercial technology team’s research project. This did not affect official commercial projects or online operations, nor did it involve ByteDance’s large models or other businesses.

2. Rumors on the internet about “involving over 8,000 cards and losses of millions of dollars” are greatly exaggerated.

3. Upon verification, it was confirmed that the individual in question interned with the commercial technology team and never interned at AI Lab. His social media bio and some media reports are incorrect on this point.

The intern was dismissed by the company in August. The company has also reported his behavior to the industry alliance and to his school, leaving further action to be handled by the school.

replies(1): >>41901668 #
4. ◴[] No.41901598[source]
5. theginger ◴[] No.41901668{3}[source]
Hanlon's razor comes to mind

https://en.m.wikipedia.org/wiki/Hanlon%27s_razor

replies(2): >>41901750 #>>41904478 #
6. xvector ◴[] No.41901750{4}[source]
No. This isn't a Hanlon's Razor scenario.

If you look at what he did, it was definitely, 100% actively malicious. For instance, his attack only executed when running on more than 256 GPUs. He inserted random sleeps to slow down training and was knowledgeable enough to know how to break various aspects of the loss function.

He then sat in meetings and adjusted his attacks when people were getting close to solving the problem.
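
To make that concrete: a trigger like this can be just a few lines. A hypothetical Python sketch of the "only fires at scale, then silently stalls training" pattern described above (the hook, threshold, and sleep range are my assumptions, not the actual code):

    # Hypothetical illustration only -- not the code from the incident.
    import random
    import time

    import torch
    import torch.distributed as dist

    def sabotage_pre_step(optimizer, args, kwargs):
        # Fire only on large multi-node jobs (> 256 ranks), so small
        # debug runs look healthy and the slowdown looks like a
        # scale-dependent infrastructure problem.
        if dist.is_available() and dist.is_initialized() and dist.get_world_size() > 256:
            time.sleep(random.uniform(0.05, 0.5))  # random stall, no error raised

    model = torch.nn.Linear(8, 8)
    opt = torch.optim.AdamW(model.parameters())
    opt.register_step_pre_hook(sabotage_pre_step)  # runs before every optimizer step

Something like this never throws, never logs anything, and vanishes the moment anyone tries to reproduce the problem on a small allocation.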

replies(1): >>41901890 #
7. ninjin ◴[] No.41901890{5}[source]
Certainly looks malicious, but what on earth would be his motive? He is an MSc student, for heaven's sake, and this tarnishes his entire career. Heck, he has published multiple first-author, top-tier papers (two at NeurIPS and one at ICLR) and is on par with a mid-stage PhD student who would be considered to be doing outstandingly well. The guy would likely be (is?) on track for a great job and career. I am not saying he did not do what was claimed, but I am unsure of any motive that fits other than "watching the world burn".

Also, what kind of outfit is ByteDance if an intern can modify (and attack) runs that are on the scale of 256 GPUs or more? We are talking at least ~USD 8,000,000 in terms of the hardware cost to support that kind of job and you give access to any schmuck? Do you not have source control or some sort of logging in place?

replies(2): >>41902125 #>>41902734 #
8. rfoo ◴[] No.41902125{6}[source]
> but what on earth would be his motive

Rumor has it that his motivation was simply to sabotage colleagues' work: managers decided to give priority on GPU resources to those working on DiT models, while he works on autoregressive image generation. I don't know exactly what his thinking was; maybe he figured that by destroying internal competitors' work he could get his GPU quota back?

> Also, what kind of outfit is ByteDance if an intern can modify (and attack) runs that are on the scale of 256 GPUs or more?

Very high. These research labs are basically run on interns (not by interns, but a lot of the ideas come from interns and a lot of the experiments are executed by interns), and I actually mean it.

> Do you not have source control or some sort of logging in place?

Again, rumors said that he gained access to prod jobs by inserting RCE exploits (via unsafe pickle, yay, in 2024!) into foundation model checkpoints.
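
For those unfamiliar with the vector: pickle's __reduce__ hook lets an object specify an arbitrary callable to run at load time, so a poisoned checkpoint executes code the moment anyone loads it. A minimal sketch of that class of exploit, with a harmless payload (this is the generic technique, not the actual exploit used):

    # Classic pickle deserialization gadget -- loading runs the payload.
    import os
    import pickle

    class Payload:
        def __reduce__(self):
            # Whatever is returned here gets called during unpickling.
            return (os.system, ("echo pwned",))

    blob = pickle.dumps(Payload())
    pickle.loads(blob)  # prints "pwned"

torch.load() uses pickle underneath, so a .pt checkpoint has the same exposure unless you load with weights_only=True (or avoid pickle entirely).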

replies(1): >>41903025 #
9. runeblaze ◴[] No.41902734{6}[source]
A lot of ML outfits are staffed with ML experts and people who care about chasing results fast. Security, in too many senses of the word, is usually an afterthought.

Also, as you hinted, you can't exactly lump these top-conference-publishing, PhD-student equivalents in with typical "interns". Many are extremely capable. ByteDance wants to leverage their capabilities, and likely wants to leverage them fast.

replies(1): >>41903048 #
10. ninjin ◴[] No.41903025{7}[source]
Thanks, that is at least plausible (but utterly stupid if true) and tells me why I would not be a good cop. Holding off further judgement on the individuals involved until we have more details.

I do understand that interns (who are MSc and PhD students) are incredibly valuable, as they drive progress in my own world too: academia. But my point was not so much about access to the resources as the fact that they were apparently able to manipulate data, code, and jobs from a different group. Looking forward to future details. Maybe we have a mastermind cracker on our hands? But my bet is rather on awful security and infrastructure practices on the part of ByteDance, for a cluster that allegedly cost in the range of ~USD 250,000,000.

replies(1): >>41905340 #
11. ninjin ◴[] No.41903048{7}[source]
Basic user separation is not asking much, though. Or are we expected to believe that at ByteDance everyone has the wheel bit on a cluster worth many, many millions? Let us see what turns up. Maybe they had a directory of Python pickles that was writeable by everyone? But even that is silly on a very basic level. As I said in another comment, I could be wrong and we have a mastermind cracker of an intern. But I somewhat doubt it.
replies(2): >>41905741 #>>41906101 #
12. 93po ◴[] No.41904478{4}[source]
This is closer to Occam's razor: I think the most likely scenario here is malicious reputation damage. It's more likely that someone has it out for this intern than that the intern actually did any of the things he's accused of.
13. rfoo ◴[] No.41905340{8}[source]
Agree on this being stupid.

> my bet is rather on awful security and infrastructure practices

For sure. As far as I know ByteDance does not have an established culture of always building secure systems.

You don't need to be a mastermind cracker. I've used and built several systems for research computing, and the defaults are always... less than ideal. Without a beefier budget and a lot of luck (because you need the right people), it's hard to have a secure system while maintaining a friendly, open atmosphere, which, as you know, is critical to a research lab.

Also,

> from a different group

Sounds like it was more like a different sub-team of the same group.

From what I heard, I'd also argue that this could be told as a weak supply-chain-attack story. Like, if someone you know from your school re-trained a CLIP with private data, would you really think twice and say "safetensors or I'm not going to use it"?
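
For what it's worth, the safer habit costs almost nothing. A minimal sketch of both loading paths (file names are hypothetical):

    import torch
    from safetensors.torch import load_file

    # safetensors stores raw tensors plus metadata -- nothing executable.
    weights = load_file("clip_retrained.safetensors")

    # If you must take a pickle-based checkpoint, refuse arbitrary objects:
    state = torch.load("clip_retrained.pt", weights_only=True)

The hard part is social: nobody wants to be the one refusing a colleague's checkpoint.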

14. xvector ◴[] No.41905741{8}[source]
That level of security is true of most big tech companies :) You're mistaken in thinking that large and well-funded = secure. They clearly have an audit trail but no preventative controls, which is sadly the standard for move-fast environments in big tech.
15. runeblaze ◴[] No.41906101{8}[source]
I think we are converging on an opinion. Internal actors can be hard to detect, and honestly there is a reason that at places like Google interns are subject to heightened security checks (my guess: they learned to do so after some years).

Btw, one of the rumors has it that it is difficult even to hire engineers to do training/optimization infra at one of these ML shops -- all they want to hire are pure researcher types. You can imagine how hard it is to ask for resources to tighten up security (without one of these incidents).