←back to thread

36 points dataflowmapper | 3 comments | | HN request time: 0.617s | source

Hi all,

I’m curious how folks handle the prep work for data imports/uploads into systems like Salesforce, Workday, NetSuite, or really any app that uses template based import for data loading, migration, or implementation.

Specifically: - How do you manage conversions/transformations like formatting dates, getting everything aligned with the templates, mapping old codes to new ones, etc.

- Are you primarily using Excel, custom scripts, Power Query or something else?

- What are the most tedious/painful parts of this process and what have you found that works?

Really appreciate any insights and am curious to learn from everyone's experience.

Show context
PaulHoule ◴[] No.43986506[source]
"Scripts" in Python, Java and other conventional programming languages (e.g. whatever it is you already use)

Not Bash, not Excel, not any special-purpose tool because the motto of those is "you can't get there from here". Maybe you can get 80% of the way there, which is really seductive, but that last 20% is like going to the moon. Specifically, real programming languages have the tools to format dates correctly with a few lines of code you can wrap into a function, fake programming languages don't. Mapping codes is straightforward, etc.

replies(3): >>43986637 #>>43987179 #>>43987624 #
1. dataflowmapper ◴[] No.43987179[source]
Yeah programming definitely offers most flexibility if you have that skillset. I'm particularly interested in your 'last 20% is like going to the moon' analogy for special-purpose tools or even Excel/Bash. Do you have any examples off the top of your head of the kinds of transformation or validation challenges that you find fall into that really difficult 20%, where only a 'real programming language' can effectively get the job done?
replies(1): >>43988303 #
2. PaulHoule ◴[] No.43988303[source]
For one thing a lot of tools like Excel do unwanted data transformations, such as importing from a CSV they try to guess whether text is meant to be a string or a number and sometimes guess wrong. You can spend a lot of time disabling this behavior or preventing it or fixing it up afterwards, but frequently bad data just plain corrupts the analysis or target system.

You can't really trust the time handling functions on any platform which is some of the reason why languages like Python, and Java, might have two or three libraries for working with dates and times in the standard library because people realized the old one was unfixable. Plenty of Javascript date handling libraries have told people "look, this is obsolete, it's time to move on" not just because that's the Javascript way, but because the libraries really were error prone.

In a real programming language it's straightforward to fix those problems, in a fake programming language it is difficult or impossible.

If you've got a strange index structure in the source or destination data, for instance, many tools fall down. For instance if you want to convert nested set trees

https://blog.uniauth.com/nested-set-model

to something more normal like an adjacency list (or vice versa) a lot of simple tools are going to fall down.

replies(1): >>43991614 #
3. dataflowmapper ◴[] No.43991614[source]
Gotcha, totally agree on those points. I think everyone's dealt with the Excel typing crap. My team uses Workato for some integrations and we use scripts any time we need math because of questionable precision so I see your take on the unreliable functions part.