Are You a Data Scientist or Data Janitor?

Jul 17, 2019 2:45:14 PM / by Robert Baer

In 2016 “Data Scientist” was touted as the year’s hottest job. Glassdoor named the role one of the “best jobs in America.”[1] Computerworld[2] quoted an analyst as saying this position had the power to create a better future for big data. However, three years later, there is still an ongoing conversation about what a data scientist’s role actually entails.

Does a data scientist focus on the big picture, managing and analyzing large, complex data sets and developing tools that allow employers to use that information to drive their businesses forward? Or are they focused on the minuscule, such as day-to-day planning, implementing, coding and database cleaning?

Some in the industry argue that when a data scientist devotes the majority of their time to database cleansing and management, they are effectively a “digital janitor” and not a true data scientist. The rationale is that someone who is passively scrubbing a database instead of proactively innovating cannot effectively usher in the future of big data.

Does Your Data Have Structure?

No business can thrive on bad data. Bad data affects every aspect of the project lifecycle, from initiation and planning to execution and closure. Inaccurate data drastically slows down projects and costs valuable time and money.

Analysts who spend a large share of their time vetting and validating ingested data aren’t able to provide the kind of analysis needed to drive the business forward. This can also be frustrating for talented data scientists, who want to be challenged and to contribute to the company’s financial success, not scrub data sets.

Businesses grow and change every day; with growth come new objectives, ideas and formats. This is where a solid data structure comes into play. Data needs structure to be effective and to be updated easily and quickly. Without structure, old records typically are not filled in with the same fields being collected today, and over time the data becomes outdated, incomplete and fragmented, not only at the point of entry but also across different systems within a company.

Cost

It’s no surprise that bad data costs companies money. However, the actual amount is mind-blowing. Bad data costs U.S. companies three trillion dollars per year, according to IBM. A study by Gartner found that most organizations surveyed estimate they lose $14.2 million annually.

That’s a lot of cash. But it makes sense if you look at it through the lens of the 1-10-100 rule. It costs $1 to verify a record as it’s entered, $10 to fix it later and $100 if you do absolutely nothing. Being passive in the short run will cost you a lot more in the long run.
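
To make the arithmetic concrete, here is a minimal Python sketch of the 1-10-100 rule. The per-record costs come from the rule as stated above; the function name, the strategy labels and the count of 5,000 flawed records are hypothetical and chosen only for illustration.

```python
# Per-record dollar costs taken from the 1-10-100 rule described above.
# The strategy labels are illustrative names, not standard terminology.
COST_PER_RECORD = {"verify_at_entry": 1, "fix_later": 10, "do_nothing": 100}


def total_cost(bad_records: int, strategy: str) -> int:
    """Return the dollar cost of handling `bad_records` under one strategy."""
    return bad_records * COST_PER_RECORD[strategy]


# Hypothetical example: 5,000 flawed records.
for strategy in COST_PER_RECORD:
    print(f"{strategy}: ${total_cost(5_000, strategy):,}")
```

Under these assumptions, the same 5,000 records cost $5,000 to verify at entry, $50,000 to fix later and $500,000 if left alone.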

Missed Opportunities

Unstructured data leaves a company vulnerable to making poor business decisions and missing out on lucrative opportunities. Without processes in place to enhance existing data sets or fill in their blanks, the problem will grow exponentially over time, as the 1-10-100 rule illustrates.

Good data provides businesses with accurate, detailed prospect profiles. These profiles are crucial to zeroing in on potential new customers, who can easily be missed if your sales and marketing teams aren’t proactively pursuing them. You need accurate customer insights to build a picture of your ideal prospect and drive sales.

Milkshake marketing is the best example of how leveraging data to accurately target your most likely customers increases sales. In his book, Competing Against Luck, Harvard Business School professor Clayton Christensen discusses how McDonald’s had to change their milkshake marketing strategy when they discovered that their target demographic for milkshakes wasn’t children, it was adult commuters. McDonald’s was able to alter their product and their marketing strategy to better serve the early morning commuters who needed a sweet treat to get them excited for the workday. The change in marketing strategy resulted in almost immediate success for the ubiquitous fast food chain.

Data Doesn’t Need to be Bad

Businesses today can capture more information than ever before in multiple ways. Digital interactions with customers on mobile and stationary devices provide a footprint of interactions and inquiries. The Internet of Things has provided for billions of connected devices and objects fitted with tools to measure, record and transmit information. Customer purchases, responses to ad campaigns and marketing messages can be captured and analyzed.

The technology exists, but not all data scientists are using the resources available in the market to capture and manage clean data.

Fill in the Gaps

Turning big data into valuable insights requires data analytics programs that today can easily capture, store, analyze, display and report on information from myriad sources. These internal insights provide business leaders with the information needed in real time to make better decisions.

There are solution providers that offer to help “fill in the gaps” of unknowns in your data sets. Choosing a trusted vendor can be difficult and will require due diligence. However, in order to successfully fill in the gaps, most companies need to partner with a provider.

Large companies that are heavily investing in data science are also looking into technology options that can take a vendor’s business file and use it as a base over which to overlay their unique data sets.

Infogroup’s base business file makes daily or monthly updates available to baseline data. This keeps data current and frees data scientists to focus on ‘big picture’ problems, relieving them of the day-to-day issues of database maintenance.

Utilizing Infogroup’s business file can also help to reduce the data points needed during initial entry. In fact, several large insurance companies use Infogroup’s base file along with their own data and data from other sources such as OSHA to reduce the number of questions asked of a business owner during the initial risk analysis phase.
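
The overlay idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Infogroup’s actual file format or matching logic: the `overlay` function, the record keys and every field name below are hypothetical. The company’s own non-empty values take precedence, and the base file fills in whatever is missing.

```python
def overlay(base_file: dict, own_data: dict) -> dict:
    """Overlay a company's records on a vendor base file.

    Both arguments map a business ID to a dict of fields. Own values win
    unless they are empty (None or ""), in which case the base value stays.
    """
    merged = {}
    for business_id in base_file.keys() | own_data.keys():
        # Start from the vendor's baseline record, if there is one.
        record = dict(base_file.get(business_id, {}))
        # Lay the company's own non-empty fields on top.
        for field, value in own_data.get(business_id, {}).items():
            if value not in (None, ""):
                record[field] = value
        merged[business_id] = record
    return merged


# Hypothetical records: the base file already knows the employee count,
# so that question does not need to be asked again at intake.
base = {"B001": {"name": "Acme Corp", "employees": 120, "industry": "Manufacturing"}}
own = {"B001": {"name": "Acme Corp", "employees": None, "policy_id": "P-77"}}
print(overlay(base, own))
```

In this sketch the merged record keeps the company’s own policy ID while the employee count and industry come from the base file, which is the kind of gap-filling that reduces questions during initial entry.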

Creating a Sustainable Data Ecosystem

In layman's terms, Customer Relationship Management Solutions (CRMs) are built to give businesses clear insights into their customers, as well as the relationships that they hold with them at any given time. The data that CRMs contain should be used to enhance marketing messages, personalize customer service experiences, and direct sales strategies. The ability to achieve these tasks is intrinsically tied to the quality of the data.

Unfortunately, far too many businesses have unsustainable data ecosystems. Take, for example, email addresses. Digital marketing success relies on delivering messages to the right email addresses. Undeliverable email addresses can not only skew the ROI of an entire campaign but also damage a company's sender reputation. To make matters worse, the average email list will decay at an estimated 22 percent per year, which means that you could be sending your messages into a void and then interpreting the results based on inaccurate data.
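
A rough Python sketch shows how quickly that decay adds up. It assumes the 22 percent rate compounds once per year and that no new addresses are added, which is a simplification; the function name and the starting list size of 10,000 are hypothetical.

```python
ANNUAL_DECAY = 0.22  # estimated yearly list decay cited above


def valid_addresses(list_size: int, years: int, decay: float = ANNUAL_DECAY) -> int:
    """Estimate deliverable addresses left, assuming yearly compounding decay."""
    return round(list_size * (1 - decay) ** years)


# Hypothetical list of 10,000 addresses with no verification or new signups.
for year in range(4):
    print(f"year {year}: {valid_addresses(10_000, year):,} valid addresses")
```

Under these assumptions, fewer than half of the original addresses are still deliverable after three years.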

The good news is that by implementing email verification into your data ecosystem, you can increase the quality of your data. The next steps will require you to refine, match, append, and filter your gathered data to ensure that your business decisions are based on accurate data. With high-quality data, you can elevate your marketing efforts, gain 360-degree insights into the customer's journey, and turn real-time alerts into higher conversions. In short, leveraging the full capabilities of your CRM will streamline your sales process, marketing efforts, and business directives to effectively deliver higher ROIs across marketing channels.

Conclusion

Data scientists need the proper tools in place in order to distill the value from big data sets and turn it into actionable knowledge for their companies. Without the right resources, data scientists will turn into digital janitors, endlessly scrubbing data sets.

Don’t let bad data sink your project before it gets off the ground. Infogroup can structure your data so it is updated in real time to help you achieve your business objectives. If you want to maximize the potential of your CRM, you will need to replace bad data with clean, verified data as it changes. Infogroup is the trusted solution in the market to help you match and append your data in real time. Through proven CRM integration solutions, Infogroup can help your business define a winning strategy that leverages the power of a sustainable data ecosystem. By using your CRM to create an intersection between the customer-facing front end of your business and the operational back end, you can gain a true competitive advantage. To discover yours, contact a member of the Infogroup team to learn more about their data-as-a-service and software-as-a-service (DaaS and SaaS) solutions.

 


[1] https://www.glassdoor.com/Award/Best-Places-to-Work-LST_KQ0,19.htm

[2] https://www.computerworld.com/article/3025440/why-data-scientist-is-this-years-hottest-job.html

Topics: big data, database, structured data, data science, clean data
