
707 points namukang | 1 comment
jokethrowaway No.29256434
Really neat; this is the kind of tool I've always wanted someone to build. I think a marketplace of workflows would be a great next step, so that someone else can maintain the flows for you.

I build tons of scrapers and things that pretend to be a browser (hand-coded, not recorded from the browser, but much lighter than spinning up a real one), and the hardest part is keeping the flows maintained. Some websites are particularly annoying to work with because of random captchas jumping in your face, but you can handle that by coding captcha support into the flow and presenting the captcha to a real user.
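The captcha-handoff idea above boils down to detecting the interstitial page and escalating to a human instead of failing. A minimal sketch in Python, assuming a stdlib-only fetch; the marker strings and header values are illustrative, not an exhaustive list:

```python
import urllib.request

# Headers that make a plain HTTP client look more like a browser.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept-Language": "en-US,en;q=0.9",
}

# Substrings that commonly appear on captcha interstitials (illustrative).
CAPTCHA_MARKERS = ("g-recaptcha", "h-captcha", "showcaptcha", "cf-challenge")

def looks_like_captcha(html: str) -> bool:
    """Return True if the page appears to be a captcha interstitial."""
    lowered = html.lower()
    return any(marker in lowered for marker in CAPTCHA_MARKERS)

def fetch(url: str) -> str:
    """Fetch a page while pretending to be a browser (no JS execution)."""
    req = urllib.request.Request(url, headers=BROWSER_HEADERS)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

A flow runner could call `looks_like_captcha` on every response and, when it returns True, pause the flow and surface the page to a real user rather than retrying blindly.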

One problem with logging in from the cloud is IP checks: when the IP changes, you may be asked to confirm it's really you.

If you want to look into these issues, I'd recommend scraping Yandex to deal with captchas being thrown in your face, and authed Google or Facebook for IP restrictions and weird authentication requests.
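A common mitigation for those IP checks is to rotate proxies freely for anonymous scrapes but pin each logged-in session to one exit IP, so the site never sees the address jump mid-session. A rough sketch, assuming a pool of placeholder proxy URLs:

```python
import itertools

class ProxyPool:
    """Round-robin proxy rotation with sticky sessions.

    Anonymous requests rotate through the pool; requests tagged with a
    session_id keep whatever proxy they were first assigned, to avoid
    triggering "confirm it's you" IP checks on authed flows.
    """

    def __init__(self, proxies):
        self._cycle = itertools.cycle(proxies)
        self._sticky = {}  # session_id -> pinned proxy URL

    def get(self, session_id=None):
        if session_id is None:
            return next(self._cycle)           # rotate freely
        if session_id not in self._sticky:
            self._sticky[session_id] = next(self._cycle)
        return self._sticky[session_id]        # stay pinned
```

In practice you would also evict a pinned proxy when it starts failing, but the pinning itself is the part that keeps authed sessions stable.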

Again, I think a marketplace could outsource these problems to a community of developers maintaining flows.

Security could be another concern, but you always have the option of running things locally.

replies(1): >>29256827 #
1. dkthehuman No.29256827
For sure! I'll definitely be exploring the marketplace idea. Currently, you can share flows and import flows that others have shared, but there isn't (yet) a nice way to discover other people's flows or charge for your own.

Maintaining flows as sites change is definitely a drawback for any scraping solution, so I built features like generating selectors by pointing and clicking to make updating them as easy as possible.
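Point-and-click selector generation typically walks up from the clicked element and emits the shortest reasonably stable path. Browserflow's actual algorithm isn't shown in the thread; a simplified sketch, where the ancestry format (a root-to-target list of dicts) is an assumption for illustration:

```python
def build_selector(ancestry):
    """Build a CSS selector from a clicked element's ancestry.

    ancestry: list of dicts from root to target, each with 'tag',
    optional 'id', and 'nth' (1-based position among same-tag siblings).
    An id is used as an anchor when present, since ids tend to be more
    stable across page changes than positional paths.
    """
    parts = []
    for node in reversed(ancestry):            # walk target -> root
        if node.get("id"):
            parts.append(f'#{node["id"]}')     # anchor here and stop
            break
        parts.append(f'{node["tag"]}:nth-of-type({node["nth"]})')
    return " > ".join(reversed(parts))
```

For example, a link inside `<div id="main">` yields `#main > a:nth-of-type(3)` rather than a brittle full path from `<html>` down.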

Browserflow Cloud has built-in support for rotating proxies and solving CAPTCHAs to get around the issues you mentioned. (They're currently in private beta.)