Connect is an API you can use in your own products to either pull your customers’ data into your own systems, or push data from your own systems to your customers’, or both. We have first-class support for data warehouses too. And there's a self-hosted deployment option.
You can set up automatic data syncs in any direction between data warehouses, databases (including CDC streaming), cloud applications (e.g. Salesforce, Zendesk), spreadsheets, cloud storage (S3 etc.), and arbitrary APIs and webhooks. We take care of authentication, automatically pushing updates, type conversions, rate limits, scaling to handle large volumes, monitoring, and alerting.
You can also sync updates from custom queries powered by data models you define, as well as join data across disparate systems (e.g. HubSpot + Stripe + Airtable) to sync fields from them in one payload to any destination.
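As a rough illustration of what "one payload" means, here is a small Python sketch. It is not Polytomic's API or engine: `join_on_key`, the source names, and the sample fields are all hypothetical, and it assumes every system shares a common key such as an email address.

```python
def join_on_key(key, *sources):
    """Merge records from several systems into one payload per key value.

    Each source is a (name, records) pair. Field names are prefixed with
    the source name so that fields from different systems cannot collide.
    """
    merged = {}
    for source_name, records in sources:
        for record in records:
            # Start a payload the first time a key value is seen.
            payload = merged.setdefault(record[key], {key: record[key]})
            for field, value in record.items():
                if field != key:
                    payload[f"{source_name}.{field}"] = value
    return list(merged.values())


# Hypothetical sample data standing in for three separate systems.
hubspot = [{"email": "ann@example.com", "name": "Ann"}]
stripe = [{"email": "ann@example.com", "plan": "pro"}]
airtable = [{"email": "ann@example.com", "owner": "Sam"}]

payloads = join_on_key(
    "email", ("hubspot", hubspot), ("stripe", stripe), ("airtable", airtable)
)
```

Each entry in `payloads` is then a single record carrying fields from all three systems, which is the shape a destination would receive.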
You can see demo videos and code examples for a small subset of use cases here: https://apidocs.polytomic.com/guides/code-examples/overview.
We’d love to take your comments and feedback! Happy to answer questions too.
Amazing work! Two questions:
1. What is the best avenue for potential customers to express interest in the addition of a new data source or sink?
2. I assume this runs on Polytomic-owned compute? Is there a way to bring your own compute, or is it on the roadmap?
Thanks!
They've been really great to work with and they've been able to handle our data as we've scaled at Vercel.
Congrats on the launch!
1. All integrations are built on request (see the instructions at the top of this page to submit one: https://www.polytomic.com/integrations). We're able to turn them around very quickly (our record is two hours from request submission time).
2. Yes, though you can bring your own compute too. We have a self-hosted distribution that you can deploy to your own VPC using Docker and our Terraform module (see details here https://docs.polytomic.com/docs/on-premise-setup). Customers that are public/large enterprises often go for this.
If you want to chat over email shoot me a message at ghalib@polytomic.com.
We automatically do the right thing depending on the situation. The type engine is highly precise when we generate our own schema. For example, JSON objects and arrays get converted to native struct and array types in data warehouses (e.g. Databricks, BigQuery) rather than strings.
But we're forgiving in other situations where we don't control schemas. For example, we'll automatically convert a source string type to a date when mapping to a Salesforce or Zendesk (or other cloud app) date field.
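The two modes described above could be sketched roughly like this in Python. This is only an illustration of the idea, not the actual type engine: the function names, the type strings, and the list of accepted date formats are all assumptions.

```python
from datetime import date, datetime


def infer_warehouse_type(value):
    """Strict mode: derive a column type from a JSON value when we own the schema."""
    if isinstance(value, bool):  # check bool before int, since bool is an int subclass
        return "BOOLEAN"
    if isinstance(value, int):
        return "INT64"
    if isinstance(value, float):
        return "FLOAT64"
    if isinstance(value, dict):
        fields = ", ".join(f"{k} {infer_warehouse_type(v)}" for k, v in value.items())
        return f"STRUCT<{fields}>"
    if isinstance(value, list):
        # Assumption: arrays are homogeneous, so the first element decides the type.
        inner = infer_warehouse_type(value[0]) if value else "STRING"
        return f"ARRAY<{inner}>"
    return "STRING"


def coerce_to_date(value):
    """Lenient mode: the destination field is already a date, so fit the source to it."""
    if isinstance(value, datetime):  # check datetime before date, it is a date subclass
        return value.date()
    if isinstance(value, date):
        return value
    if isinstance(value, str):
        # Assumed set of formats; a real system would likely accept more.
        for fmt in ("%Y-%m-%d", "%m/%d/%Y", "%Y-%m-%dT%H:%M:%S"):
            try:
                return datetime.strptime(value, fmt).date()
            except ValueError:
                continue
    raise ValueError(f"cannot interpret {value!r} as a date")
```

For instance, `infer_warehouse_type({"id": 1, "tags": ["a"]})` yields a struct type with a nested array instead of a plain string column, while `coerce_to_date("03/05/2024")` accepts a string because the destination field dictates the type.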
If I'm pushing and pulling customers' data into my own system (e.g. postgres), what sort of customer data are we talking about? Can it sync with data on their own filesystem? That would be neat. Or is it just for e.g. syncing a customer's google drive with my postgres type of thing? Looks interesting and like something I may want to use but not sure I entirely grok it yet.
For example, maybe your product needs to process data from your customers' sales and customer support systems. In that case you can have automatic updates show up in Postgres from your customers' Salesforce and Zendesk instances.
Alternatively, many products want the ability to push data to their customers' systems. For example, suppose you're a billing software vendor showing data in your own portal that customers log in to. Your customers may also want you to automatically push this data to their accounting systems or data warehouses so that they can do their own analysis.
Does that make sense?
Our clients come in multiple languages. You can see the list here: https://apidocs.polytomic.com/guides/native-clients.
You can see here for sample code: https://apidocs.polytomic.com/guides/code-examples/overview.
The full API reference is here: https://apidocs.polytomic.com/api-reference.
Each sync in Polytomic is one-way so we're not forced to deal with collisions all the time.
But you can, on a per-field basis within a sync config, declare that field not to be synced if the destination system already has a value in the corresponding field.
This setting is a proxy for deciding where your source of truth is for each field if you are indeed setting up two-way syncs.
Most customers are pulling data into their data warehouse, then syncing from queries that generate other values back into other systems. This issue doesn't come up there. But customers of ours doing two-way syncs between, say, HubSpot and Airtable or such do need to decide where the source of truth is for each field.
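To make the per-field setting concrete, here is a minimal Python sketch of a one-way sync that skips fields the destination already has a value for. It is an illustration only, not Polytomic's sync config format: `apply_sync`, `skip_if_present`, and the sample records are hypothetical.

```python
def apply_sync(source_record, destination_record, skip_if_present=()):
    """Return the destination record after a one-way sync from the source.

    Fields named in `skip_if_present` are only written when the destination
    has no value yet, so the destination stays the source of truth for them.
    """
    result = dict(destination_record)
    for field, value in source_record.items():
        has_value = result.get(field) not in (None, "")
        if field in skip_if_present and has_value:
            continue  # destination wins for this field
        result[field] = value
    return result


# Two one-way syncs that together behave like a two-way sync.
hubspot = {"email": "ann@example.com", "phone": "555-0100", "stage": "lead"}
airtable = {"email": "ann@example.com", "phone": "", "stage": "customer"}

# HubSpot -> Airtable: Airtable owns "stage", HubSpot fills in the rest.
new_airtable = apply_sync(hubspot, airtable, skip_if_present={"stage"})
# Airtable -> HubSpot: HubSpot owns "phone", Airtable drives "stage".
new_hubspot = apply_sync(new_airtable, hubspot, skip_if_present={"phone"})
```

After both passes the two records agree, with `stage` taken from Airtable and `phone` from HubSpot, because each field has exactly one system that wins.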
> But customers of ours doing two-way syncs
Could you clarify where two-way sync fits into the picture?
I don't really follow how this approach avoids having to deal with collisions all the time.