194 points kbumsik | 51 comments
1. mdaniel ◴[] No.41889095[source]
All this mocking when moto exists is just :-( https://github.com/awslabs/git-remote-s3/blob/v0.1.19/test/r...

Actually, moto is just one band-aid for that problem - there are SO MANY S3 storage implementations, including the pre-license-switch Apache 2 version of minio (one need not use a bleeding-edge version for something as relatively stable as the S3 API)

replies(3): >>41889194 #>>41889525 #>>41896008 #
2. SahAssar ◴[] No.41889194[source]
Do you mean boto (the python SDK for AWS)?

EDIT: They probably do not, I'm guessing they mean https://docs.getmoto.org/en/latest/index.html ?

replies(2): >>41889381 #>>41890011 #
3. flakes ◴[] No.41889381{3}[source]
moto server for testing S3 is pretty great. It’s about the same experience as using a minio container to run integration tests against.

I use this, and testing.postgresql for unit testing my api servers with barely any mocks used at all.
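
For the curious, a minimal sketch of that setup (assuming `pip install "moto[server]" boto3`; moto's ThreadedMotoServer and port 5000 are just convenient defaults here, not anything this project requires):

    # Start a local moto S3 server and point boto3 at it instead of AWS.
    import boto3
    from moto.server import ThreadedMotoServer

    server = ThreadedMotoServer(port=5000)
    server.start()
    try:
        s3 = boto3.client(
            "s3",
            endpoint_url="http://127.0.0.1:5000",
            aws_access_key_id="testing",      # moto accepts any credentials
            aws_secret_access_key="testing",
            region_name="us-east-1",
        )
        s3.create_bucket(Bucket="test-bucket")
        s3.put_object(Bucket="test-bucket", Key="hello.txt", Body=b"hi")
        body = s3.get_object(Bucket="test-bucket", Key="hello.txt")["Body"].read()
        assert body == b"hi"
    finally:
        server.stop()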

replies(1): >>41890282 #
4. notpushkin ◴[] No.41889525[source]
> there are SO MANY s3 storage implementations

I suppose given this is under the AWS Labs org, they don’t really care about non-AWS S3 implementations.

replies(1): >>41889981 #
5. philsnow ◴[] No.41889926[source]
I'm surprised they just punt on concurrent updates [0] instead of locking with something like dynamodb, like terraform does.

[0] https://github.com/awslabs/git-remote-s3?tab=readme-ov-file#...
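
For reference, the Terraform-style DynamoDB lock mentioned above boils down to a conditional PutItem; a hedged sketch (the table name and key schema here are placeholders, not anything from this project):

    import boto3
    from botocore.exceptions import ClientError

    ddb = boto3.client("dynamodb")

    def acquire_lock(lock_id: str) -> bool:
        """Create the lock row only if nobody else holds it."""
        try:
            ddb.put_item(
                TableName="git-remote-locks",          # placeholder table
                Item={"LockID": {"S": lock_id}},
                ConditionExpression="attribute_not_exists(LockID)",
            )
            return True
        except ClientError as e:
            if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
                return False  # someone else already holds the lock
            raise

    def release_lock(lock_id: str) -> None:
        ddb.delete_item(
            TableName="git-remote-locks",
            Key={"LockID": {"S": lock_id}},
        )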

replies(3): >>41890047 #>>41890892 #>>41890916 #
6. mdaniel ◴[] No.41889981{3}[source]
Well, I look forward to their `docker run awslabs/the-real-s3:latest` implementation then. Until such time, monkeypatching api calls to always give the exact answer the consumer is looking for is damn cheating
replies(2): >>41890151 #>>41890250 #
7. mdaniel ◴[] No.41890011{3}[source]
Happy 10,000th Day to you :-D Yes, moto and its friend localstack are just fantastic for being able to play with AWS without spending money, or to reproduce kabooms that only happen once a month with the real API

I believe moto has an "embedded" version such that one need not even have it listen on a network port, but I find it much, much less mental gymnastics to just supersede the "endpoint" address in the actual AWS SDKs to point to 127.0.0.1:4566 and be off to the races. The AWS SDKs are even so friendly as to not mandate TLS or have allowlists of endpoint addresses, unlike their misguided Azure colleagues.
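
Concretely, the endpoint override is a one-liner with boto3 (127.0.0.1:4566 being localstack's default port; the dummy credentials are whatever the emulator will accept):

    import boto3

    # Point the real AWS SDK at a local emulator instead of the real service.
    s3 = boto3.client(
        "s3",
        endpoint_url="http://127.0.0.1:4566",  # localstack default
        aws_access_key_id="test",
        aws_secret_access_key="test",
        region_name="us-east-1",
    )
    print(s3.list_buckets()["Buckets"])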

replies(1): >>41890155 #
8. mdaniel ◴[] No.41890047[source]
I thank goodness I have access to a non-stupid Terraform state provider[1], so I've never tried that S3+dynamodb setup. But if I understand the situation correctly, introducing Yet Another AWS Service ™ into the mix would mandate that callers also be given a `dynamo:WriteSomething` IAM perm, which differs from S3, where one can -- at their discretion -- set policies on the bucket itself so that it works without any explicit caller IAM.

1: https://docs.gitlab.com/ee/user/infrastructure/iac/terraform...
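
A rough sketch of that bucket-policy approach, for illustration only (the bucket name, principal ARN, and action list below are placeholders, not anything Terraform or git-remote-s3 prescribes):

    import json
    import boto3

    # Grant the needed S3 actions on the bucket itself, so callers in the same
    # account don't need a matching identity (IAM) policy of their own.
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": "arn:aws:iam::123456789012:role/ci-runner"},
                "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
                "Resource": [
                    "arn:aws:s3:::my-state-bucket",
                    "arn:aws:s3:::my-state-bucket/*",
                ],
            }
        ],
    }

    boto3.client("s3").put_bucket_policy(
        Bucket="my-state-bucket", Policy=json.dumps(policy)
    )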

9. chrsig ◴[] No.41890151{4}[source]
it wouldn't be unprecedented. dynamodb-local exists.
10. SahAssar ◴[] No.41890155{4}[source]
> Happy 10,000th Day to you :-D

Sorry, not sure what you mean?

replies(1): >>41890180 #
11. mdaniel ◴[] No.41890180{5}[source]
https://xkcd.com/1053/
replies(1): >>41895121 #
12. notpushkin ◴[] No.41890250{4}[source]
Agreed, haha. Well, I think it should work with Minio & co. just as well, but be prepared to have your issues closed as unsupported. (Personally, I might give it a go with Backblaze B2 just to play around, yeah)
13. neeleshs ◴[] No.41890282{4}[source]
There is also testcontainers. Supports multiple languages. Uses containers though.

https://testcontainers-python.readthedocs.io/en/latest/

14. fortran77 ◴[] No.41890687[source]
Amazon has deprecated AWS CodeCommit, so this may be an interesting alternative.
replies(1): >>41894558 #
15. ncruces ◴[] No.41890892[source]
Google Cloud Storage is good enough to implement locks all by itself: https://reddit.com/r/golang/comments/t52d4f/gmutex_a_global_...

Doesn't S3 provide primitives to do the same? At least since moving to strong read-after-write consistency?

PS: I wrote the above package. Happy to answer questions about it.

replies(1): >>41892398 #
16. noctune ◴[] No.41890916[source]
S3 recently got conditional writes, so you can do locking entirely in S3 - I don't think they are using this though. Must be too recent an addition.
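
Roughly how that could look, assuming a boto3/botocore new enough to expose the conditional-write parameter on put_object (the lock key below is a placeholder):

    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")

    def try_acquire(bucket: str, key: str = "repo.lock") -> bool:
        """Create the lock object only if it doesn't already exist."""
        try:
            s3.put_object(Bucket=bucket, Key=key, Body=b"locked", IfNoneMatch="*")
            return True
        except ClientError as e:
            status = e.response["ResponseMetadata"]["HTTPStatusCode"]
            if status in (409, 412):  # object already there / concurrent write
                return False
            raise

    def release(bucket: str, key: str = "repo.lock") -> None:
        s3.delete_object(Bucket=bucket, Key=key)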
17. Scribbd ◴[] No.41890917[source]
This is something I was trying to implement myself. I am surprised it can be done with just an s3 bucket. I was messing with API Gateways, Lambda functions and DynamoDB tables to support the s3 bucket. It didn't occur to me to implement it client side. I might have stuck a bit too much to the lfs test server implementation. https://github.com/git-lfs/lfs-test-server
replies(1): >>41891514 #
18. tonymet ◴[] No.41891227[source]
how does it handle incremental changes? If it’s writing your entire repo on a loop, I could see why AWS would promote it.
replies(1): >>41894524 #
19. chx ◴[] No.41891514[source]
Client side is, while interesting, of limited use, as every CI and similar tool won't work with this. This seems like a sort of automation of wormhole, which I guess is neat: https://github.com/cxw42/git-tools/blob/master/wormhole
20. Evidlo ◴[] No.41891680[source]
For the LFS part there is also dvc which works better than git-lfs and natively supports S3.
replies(3): >>41892558 #>>41896488 #>>41898528 #
21. kbumsik ◴[] No.41892398{3}[source]
Conditional writes were only added to S3 two months ago: https://aws.amazon.com/about-aws/whats-new/2024/08/amazon-s3...
replies(1): >>41898472 #
22. bagavi ◴[] No.41892558[source]
DVC is a great tool!
replies(1): >>41892610 #
23. lenova ◴[] No.41892610{3}[source]
I haven't heard of dvc, so I had to google it, which took me to: https://dvc.org/

But I'm still confused as to what dvc is after a cursory glance at their homepage.

replies(1): >>41892904 #
24. x3n0ph3n3 ◴[] No.41892890[source]
Wow, AWS really wants to get rid of CodeCommit.
25. chatmasta ◴[] No.41892904{4}[source]
It was on the front page contemporaneously with the comment recommending it, so you know it was an unbiased recommendation. :)
26. zmmmmm ◴[] No.41894393[source]
Just remember, the minimum billing increment for file size is 128KB in real AWS S3. So your Git repo may be a lot more expensive than you would think if you have a giant source tree full of small files.
replies(3): >>41894455 #>>41896027 #>>41896785 #
27. afro88 ◴[] No.41894455[source]
Looks like it uses bundles rather than raw files: https://github.com/awslabs/git-remote-s3?tab=readme-ov-file#...
28. afro88 ◴[] No.41894524[source]
Looks like it uses bundles, which handle incremental changes: https://github.com/awslabs/git-remote-s3?tab=readme-ov-file#...
replies(1): >>41907242 #
29. adobrawy ◴[] No.41894558[source]
In what use case would it be an interesting alternative?

Limited access control (e.g. CI pass required), so not very useful for end users. For machine-to-machine it's an additional layer of abstraction when a regular tarball is fine.

30. milkey_mouse ◴[] No.41894598[source]
You can also do this with Cloudflare Workers for fewer setup steps/moving parts:

https://github.com/milkey-mouse/git-lfs-s3-proxy

31. xena ◴[] No.41895023[source]
How do you install this? Homebrew broke global pip install. Is there a homebrew package or something?
replies(1): >>41896032 #
32. misnome ◴[] No.41895121{6}[source]
How do you know they are in the US?
33. CGamesPlay ◴[] No.41895203[source]
If you are interested in using S3 as a git remote but are concerned with privacy, I built a tool a while ago to use S3 as an untrusted git remote using Restic. https://github.com/CGamesPlay/git-remote-restic
34. mattxxx ◴[] No.41895254[source]
This seems wrong, since you can't push transactionally + consistently in S3.

They address this directly in their section on concurrent writes: https://github.com/awslabs/git-remote-s3?tab=readme-ov-file#...

And in their design: https://github.com/awslabs/git-remote-s3?tab=readme-ov-file#...

But it seems like this is just the wrong tool for the job (hosting git repos).

35. WhyNotHugo ◴[] No.41895724[source]
git-annex also has native support for s3.
replies(1): >>41896403 #
36. remram ◴[] No.41896008[source]
Unfortunately there have been a few vulnerabilities since that old Minio release. For something you expose to users, that's a problem.
replies(1): >>41896068 #
37. justin_oaks ◴[] No.41896027[source]
That 128KB only applies to non-standard S3 storage tiers (glacier, infrequent access, one zone, etc)

S3 standard, which is likely what people would use for git storage, doesn't have that minimum file size charge.

See the asterisk sections in https://aws.amazon.com/s3/pricing/

replies(1): >>41898990 #
38. mdaniel ◴[] No.41896032[source]
FWIW, their helpers make it pretty cheap to create a new Formula yourself:

    $ brew create --python --set-license Apache-2 https://github.com/awslabs/git-remote-s3/archive/refs/tags/v0.1.19.tar.gz
    Formula name [git-remote-s3]:
    ==> Downloading https://github.com/awslabs/git-remote-s3/archive/refs/tags/v0.1.19.tar.gz
    ==> Downloading from https://codeload.github.com/awslabs/git-remote-s3/tar.gz/refs/tags/v0.1.19
    ##O=-#   #
    Warning: Cannot verify integrity of '84b0a9a6936ebc07a39f123a3e85cd23d7458c876ac5f42e9f3ffb027dcb3a0f--git-remote-s3-0.1.19.tar.gz'.
    No checksum was provided.
    For your reference, the checksum is:
      sha256 "3faa1f9534c4ef2ec130fac2df61428d4f0a525efb88ebe074db712b8fd2063b"
    ==> Retrieving PyPI dependencies for "https://github.com/awslabs/git-remote-s3/archive/refs/tags/v0.1.19.tar.gz"...
    ==> Retrieving PyPI dependencies for excluded ""...
    ==> Getting PyPI info for "boto3==1.35.44"
    ==> Getting PyPI info for "botocore==1.35.44"
    ==> Excluding "git-remote-s3==0.1.19"
    ==> Getting PyPI info for "jmespath==1.0.1"
    ==> Getting PyPI info for "python-dateutil==2.9.0.post0"
    ==> Getting PyPI info for "s3transfer==0.10.3"
    ==> Getting PyPI info for "six==1.16.0"
    ==> Getting PyPI info for "urllib3==2.2.3"
    ==> Updating resource blocks
    Please run the following command before submitting:
      HOMEBREW_NO_INSTALL_FROM_API=1 brew audit --new git-remote-s3
    Editing /usr/local/Homebrew/Library/Taps/homebrew/homebrew-core/Formula/g/git-remote-s3.rb
They also support building from git directly, if you want to track non-tagged releases (see the "--head" option to create)
39. mdaniel ◴[] No.41896068{3}[source]
I would hope my mentioning moto made it clear my comment was about having an S3 implementation for testing. Presumably one should not expose moto to users, either
40. doctorpangloss ◴[] No.41896121[source]
https://alanedwardes.com/blog/posts/serverless-git-lfs-for-g...

I’ve used this guy’s CloudFormation template since forever for LFS on S3.

GitHub has to lower its egregious LFS pricing.

41. matrss ◴[] No.41896403[source]
I think this is more about storing the entire repository on s3, not just large files as git-lfs and git-annex are usually concerned with. But coincidentally, git-annex somewhat recently got the feature to use any of its special remotes as normal git remotes (https://git-annex.branchable.com/git-remote-annex/), including s3, webdav, anything that rclone supports, and a few more.
42. matrss ◴[] No.41896488[source]
There is also git-annex, which supports S3 as well as a bunch of other storage backends (and it is very easy to implement your own, it just has to loosely resemble a key-value store). Git-annex can use any of its special remotes as git remotes, like what the presented tool does for just S3.
43. chrsig ◴[] No.41896785[source]
also the puts are 5x as expensive as the get operations
44. laurencerowe ◴[] No.41898472{4}[source]
Unfortunately this functionality is much more limited in S3 as you can only use `If-None-Match: *` to prevent overwrites. https://docs.aws.amazon.com/AmazonS3/latest/userguide/condit...

GCS also allows for conditional overwrites using `If-Match: <etag>` which means you can do optimistic concurrency control. https://cloud.google.com/storage/docs/request-preconditions
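
For comparison, a hedged sketch of that read-modify-write pattern with the google-cloud-storage client (bucket and object names are placeholders):

    from google.cloud import storage
    from google.api_core.exceptions import PreconditionFailed

    client = storage.Client()
    blob = client.bucket("my-bucket").blob("state.json")

    blob.reload()                      # fetch current metadata, incl. generation
    generation = blob.generation
    data = blob.download_as_bytes(if_generation_match=generation)

    new_data = data + b"\n"            # ... modify ...
    try:
        # Only succeeds if nobody wrote a newer generation in the meantime.
        blob.upload_from_string(new_data, if_generation_match=generation)
    except PreconditionFailed:
        pass  # lost the race: re-read and retry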

replies(1): >>41902206 #
45. kernelsanderz ◴[] No.41898504[source]
I’ve been using https://github.com/jasonwhite/rudolfs - which is written in rust. It’s high performance but doesn’t have all the features (auth) that you might need.
46. kernelsanderz ◴[] No.41898528[source]
Also worth checking out https://github.com/jasonwhite/rudolfs

Been using it to store datasets via lfs. Written in rust and has been very reliable.

47. zmmmmm ◴[] No.41898990{3}[source]
Thank you for highlighting that, I had remembered it wrongly.
48. ncruces ◴[] No.41902206{5}[source]
Yeah, it might still be possible to implement a mutex based on just the existence of an object, but it'll be harder to add expiration/liveness which I find essential.
49. Havoc ◴[] No.41903288[source]
Does this work with other s3 implementations like minio?
50. tonymet ◴[] No.41907242{3}[source]
Thanks, that's why I enjoy Hacker News. Great help.