
1015 points QuinnyPig | 20 comments
1. consumer451 ◴[] No.44564348[source]
Important details from the FAQ, emphasis mine:

> For users who access Kiro with Pro or Pro+ tiers once they are available, your content is not used to train any underlying foundation models (FMs). AWS might collect and use client-side telemetry and usage metrics for service improvement purposes. You can opt out of this data collection by adjusting your settings in the IDE. For the Kiro Free tier and during preview, your content, including code snippets, conversations, and file contents open in the IDE, unless explicitly opted out, may be used to enhance and improve the quality of FMs. Your content will not be used if you use the opt-out mechanism described in the documentation. If you have an Amazon Q Developer Pro subscription and access Kiro through your AWS account with the Amazon Q Developer Pro subscription, then Kiro will not use your content for service improvement. For more information, see Service Improvement.

https://kiro.dev/faq/

replies(4): >>44565507 #>>44565980 #>>44567912 #>>44569042 #
2. srhngpr ◴[] No.44565507[source]
To opt out of sharing your telemetry data in Kiro, use this procedure:

1. Open Settings in Kiro.

2. Switch to the User sub-tab.

3. Choose Application, and from the drop-down choose Telemetry and Content.

4. In the Telemetry and Content drop-down field, select Disabled to disable all product telemetry and user data collection.

source: https://kiro.dev/docs/reference/privacy-and-security/#opt-ou...

replies(1): >>44566830 #
3. lukev ◴[] No.44565980[source]
This brings up a tangential question for me.

Clearly, companies view the context fed to these tools as valuable. And it certainly has value in the abstract, as information about how they're being used or could be improved.

But is it really useful as training data? Sure, some new codebases might be fed in... but after that, the way context works and the way people are "vibe coding", 95% of the novelty being input is just the output of previous LLMs.

While the utility of synthetic data proves that context collapse is not inevitable, it does seem to be a real concern... and I can say definitively based on my own experience that the _median_ quality of LLM-generated code is much worse than the _median_ quality of human-generated code. Especially since this would include all the code that was rejected during the development process.

Without substantial post-processing to filter out the bad input code, I question how valuable the context from coding agents is for training data. Again, it's probably quite useful for other things.
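
For what it's worth, even a crude filter shows how much post-processing that would take. A minimal sketch in Python, assuming the captured snippets arrive as plain strings; the checks and thresholds are placeholder heuristics, not anything any vendor has described:

    import ast

    def is_plausible_training_sample(snippet: str, min_lines: int = 3) -> bool:
        # Crude pre-filter for captured Python snippets: drop anything that
        # is trivially short or does not even parse.
        if len(snippet.splitlines()) < min_lines:
            return False
        try:
            ast.parse(snippet)  # discard code that never compiled
        except SyntaxError:
            return False
        return True

    samples = [
        "def f(:\n    pass",                                 # rejected: syntax error
        "x = 1",                                             # rejected: too short
        "def add(a, b):\n    '''Sum.'''\n    return a + b",  # kept
    ]
    print([s for s in samples if is_plausible_training_sample(s)])

And that still says nothing about whether the surviving code is any good, only that it compiles.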

replies(4): >>44566597 #>>44566992 #>>44567646 #>>44568596 #
4. consumer451 ◴[] No.44566597[source]
There is a company, maybe even a YC company, which I saw posting about wanting to pay people for private repos that died on the vine and were never released as products. I believe they were asking for pre-2022 code to avoid LLM taint. This was to be used as training data.

This is all a fuzzy memory, I could have multiple details wrong.

5. m0llusk ◴[] No.44566830[source]
Is there a way to confirm this works or do we just have to trust that settings will be honored?
replies(3): >>44566856 #>>44567741 #>>44568474 #
6. consumer451 ◴[] No.44566856{3}[source]
You could place some unique strings in your code, and test it to see if they appear as completions in future foundation models? Maybe?
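
Something like this, maybe - a minimal Python sketch of the planting half; actually probing future models for the token, with enough statistical care to mean anything, is the hard part:

    import uuid

    def make_canary(project: str) -> str:
        # Generate a unique, unguessable marker to plant in a private codebase.
        # If the exact token ever shows up verbatim in a model's completions,
        # the code very likely leaked into training data.
        return f"CANARY_{project.upper()}_{uuid.uuid4().hex}"

    # Drop it somewhere harmless, e.g. a comment or an unused constant:
    canary = make_canary("myrepo")
    print(f"# {canary}  <- do not remove; used to detect training-data leakage")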

I am nowhere near being a lawyer, but I believe the promise would be more legally binding, and more likely to be adhered to, if money was exchanged. Maybe?

The "Amazon Q Developer Pro" sub they mention appears to be very inexpensive. https://aws.amazon.com/q/pricing/

7. janstice ◴[] No.44566992[source]
I suspect the product telemetry would be more useful - things like success of an interaction vs. requiring subsequent editing, success from tool use, and success from context & prompt tuning parameters would be far more valuable to the product than just feeding more bits into the core model.
8. recursivecaveat ◴[] No.44567646[source]
The human/computer interaction is probably more valuable than any code they could slurp up. It's basically CCTV of people using your product and live-correcting it, in a format you can feed back into the thing to tell it to improve. Maybe one day they will even learn to stop disabling tests to get them to pass.
replies(1): >>44569522 #
9. Waterluvian ◴[] No.44567741{3}[source]
Just like using an AI model, you can’t actually know for sure that it won’t do anything malicious with what interfaces you give it access to. You just have to trust it.
replies(1): >>44569703 #
10. metadat ◴[] No.44567912[source]
This is the inevitable decline where we all eventually don't care about the source code instructions anymore, just like the transition from assembly to C. Sorry in advance, I'm a privacy holdout too, but this isn't the interesting part of what's happening. I tried Kiro and it is on par with Claude or Crystal, nothing special at all.

Within the next couple of years there's going to be a 4-for-1 discount on software engineers. Welcome to The Matrix. You'd best find Morpheus.

Check out the comments on https://news.ycombinator.com/item?id=44567857 and tell me what the alternative future is. Best wishes and good luck.

11. pmontra ◴[] No.44568474{3}[source]
As for everything else: trust, possibly enhanced by the fear of consequences for the other party.

How do we know if random internet service sells our email / password pair? They probably store the hashed password because it's easier (libraries) than writing their own code, but they get it as cleartext every time we type it in.
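
To make the cleartext point concrete, here is a minimal Python sketch of the honest version (standard-library scrypt, not any particular service's code) - note that store_password still sees the raw password even though it only keeps a hash:

    import hashlib, hmac, os

    def store_password(password: str) -> tuple[bytes, bytes]:
        # The raw password arrives here in cleartext on every signup/login;
        # only the salted hash is persisted, but nothing technical stops a
        # dishonest service from logging the cleartext first.
        salt = os.urandom(16)
        digest = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
        return salt, digest

    def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
        candidate = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
        return hmac.compare_digest(candidate, digest)

    salt, digest = store_password("hunter2")
    print(verify_password("hunter2", salt, digest))  # True
    print(verify_password("wrong", salt, digest))    # False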

replies(2): >>44568663 #>>44569692 #
12. nicewood ◴[] No.44568596[source]
I think it's less about the code output, but about the process of humans iterating and adjusting the LLM-drafted requirements and design. Claude Code et al. are good enough, the bottleneck is IMO usually the context and prompt by now. So further improving that by optimizing for and collecting data about the human interaction seems like a good strategy to me.

Essentially, the user labels (accept/edit) the data (design documents) for the agent (Amazon).
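
A sketch of what one such labelled interaction record could look like - the field names are illustrative guesses, not anything Amazon has documented:

    from dataclasses import dataclass

    @dataclass
    class InteractionRecord:
        # One human-labelled step in an agent session.
        prompt: str       # what the user asked the agent for
        model_draft: str  # e.g. the generated design document or diff
        action: str       # "accepted", "edited", or "rejected"
        final_text: str   # what the user actually kept

    record = InteractionRecord(
        prompt="Draft requirements for the login flow",
        model_draft="Users sign in with email + password ...",
        action="edited",
        final_text="Users sign in with SSO only ...",
    )
    print(record.action)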

13. Quekid5 ◴[] No.44568663{4}[source]
> How do we know if random internet service sells our email / password pair? They probably store the hashed password because it's easier (libraries) than writing their own code, but they get it as cleartext every time we type it in.

For that, we can just use a unique password per service. That's not really a thing for code.

14. anonnon ◴[] No.44569042[source]
> your content is not used to train any underlying foundation models (FMs).

This implies your "content" may be used for anything else, including training non-foundation LLMs. Frankly, even if their disclaimer were broader, I'd still probably not trust them.

replies(1): >>44569386 #
15. 0xEF ◴[] No.44569386[source]
As you shouldn't! Only a rube trusts a rule for which there is no real enforcement, or for which the punishment is a mere fine. If the erosion of consumer privacy has not taught us that simply stating "we won't use/sell your data" is the biggest lie of the 21st century, then I don't know what will.
replies(1): >>44576466 #
16. rusk ◴[] No.44569692{4}[source]
> How do we know if random internet service

Audits. Obviously not every service is going to be in a jurisdiction that proactively audits data processors and controllers. Another thing to consider before you hand over your data.

17. dkga ◴[] No.44569703{4}[source]
Well, you can at least check if there is network traffic to AWS or something similar.
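
One way to do that check from the outside - a rough Python sketch using the third-party psutil package; the "kiro" process-name fragment is a guess, and matching on an amazonaws.com reverse-DNS suffix is a heuristic, not an audit:

    import socket
    import psutil  # third-party: pip install psutil

    def aws_connections(name_fragment: str = "kiro") -> None:
        # List outbound connections from processes whose name contains the
        # fragment, flagging remote hosts that reverse-resolve to AWS.
        # (May need elevated privileges to see other processes' sockets.)
        for conn in psutil.net_connections(kind="inet"):
            if not conn.raddr or conn.pid is None:
                continue
            try:
                proc_name = psutil.Process(conn.pid).name().lower()
            except (psutil.NoSuchProcess, psutil.AccessDenied):
                continue
            if name_fragment not in proc_name:
                continue
            try:
                host = socket.gethostbyaddr(conn.raddr.ip)[0]
            except OSError:
                host = conn.raddr.ip
            flag = "  <-- AWS" if host.endswith("amazonaws.com") else ""
            print(f"{proc_name} -> {host}:{conn.raddr.port}{flag}")

    aws_connections()
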
replies(2): >>44569735 #>>44570311 #
18. yurishimo ◴[] No.44569735{5}[source]
But wouldn't that look the same as actually querying the model? Or am I missing the joke?
19. Waterluvian ◴[] No.44570311{5}[source]
There’s always ways to mitigate malicious behaviour once it’s already happening.
20. anonnon ◴[] No.44576466{3}[source]
The well-poisoning effect is especially strong in the AI space, based on how blatant the big players have been in disregarding intellectual property law and how their crawlers behave like DDoS bot farms.