←back to thread

36 points dataflowmapper | 1 comments | | HN request time: 0.198s | source

Hi all,

I’m curious how folks handle the prep work for data imports/uploads into systems like Salesforce, Workday, NetSuite, or really any app that uses template based import for data loading, migration, or implementation.

Specifically: - How do you manage conversions/transformations like formatting dates, getting everything aligned with the templates, mapping old codes to new ones, etc.

- Are you primarily using Excel, custom scripts, Power Query or something else?

- What are the most tedious/painful parts of this process and what have you found that works?

Really appreciate any insights and am curious to learn from everyone's experience.

Show context
nonameiguess ◴[] No.43988155[source]
This was roughly half my job a decade ago. Literally because I split my time 50/50 between two separate projects, one of which was a business analytics automation project that was designed for exactly this purpose. Take all of the data exported from the various project management and issue tracking systems used by different product teams, transform as necessary, then import to a common corporate analytics system upper management used to track progress metrics.

The idea was to replace all of the bespoke custom Excel workflows that various program management teams used to do it themselves, so we were self-funding based on the estimated cost savings of replacing manual work, meaning I had pretty much free reign to do it however I wanted as long as I could get people to use it. I did it entirely in Python, making heavy use of Pandas, creating a toolchain that ingested mapping definitions from the users and relied on fuzzy matching (i.e. stemming, de-capitalization, recognizing word roots) to categorize field names into a common definition set used by the corporate system.

It wasn't a 100% solution, more like 95%, but the reality of this kind of work is we had a real garbage-in problem anyway that no software could solve. Extracting data from an issue tracker relies upon developers using it correctly in the first place. If you have a bunch of workflow transitions but whoever works an issue just transitions them all at once when work is complete, there's no way to know how long everything actually took. You can only extract information that was actually recorded. Coercing dates into a single format standard is fairly simple to do, but if the dates were wrong in the original format, they'll still be wrong in the corrected format.

replies(1): >>43991742 #
1. dataflowmapper ◴[] No.43991742[source]
Thanks for sharing, good point about garbage in and garbage out. I mostly work with software implementation so this use case is pretty insightful.