Justifying a Big Data Project – Good Math – Bad Presentation

IT investment decisions are easy. Right? If your projections show that you’ll get back more than you spend, in either cost savings or increased revenue, you do it. Sounds easy. It usually is not. But it needs to be.

Let’s say you’re exploring a big data or a master data project. You know you should do it. You know that it makes business sense. But, the finance guys want hard numbers, and that’s often not easy to get.

CFOs often demand to see some pretty complicated numbers, such as ROI, NPV, IRR and payback period. So, business folks go to great lengths to come up with those calculations.

But I’ve found that, in the end, to get the sale, your analysis and presentation need to be brain-dead simple and a no-brainer. If you aren’t comfortable with the numbers, if you don’t fully understand them and can’t explain them really easily and convincingly, don’t even bother to go to the top to ask for the budget approval. The answer should slap you in the face. “Yes, of course, we have to do it,” must be the obvious conclusion. And remember, it all has to be measurable.

The most important thing I learned in business school was how to do analysis on the back of a napkin. Literally, you should be able to outline the ROI for a business project on a napkin. I’ve done it before. I once helped convince the management team of a startup to sell the company and lock in a good return, rather than continue to invest for another three years in the hopes of a higher return, by scribbling a few numbers on my coffee-stained napkin (I drink a lot of coffee) in a staff meeting.

A. Bird-in-hand return now: $10/share offered by a potential acquirer.


B. Potential return in three years: about $15/share.

  • Revenue would be 70% higher (20% per year increase target)
  • Stock price of 4X revenue. (Typical for a company growing 20%)
  • Stock dilution of 25% because we’d need to raise $10 million

= $10 * (1 + (0.7 * 0.75)) = $15.25, or roughly $15 per share (potentially)

The simple result was a modest potential upside. I did not bother to risk adjust anything or do NPV with a fancy calculation. My colleagues knew the incredibly high market risks in their minds. We were in a very competitive market and needed to make significant product enhancements to remain competitive.
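For readers who would rather check the napkin in code, here is a minimal Python sketch of the same arithmetic. It reproduces the napkin formula exactly as written above (the growth scaled by the share that survives dilution). The function and variable names are mine and purely illustrative, and nothing here is risk-adjusted.

```python
def napkin_share_value(price_now, revenue_growth, dilution):
    """Napkin estimate: today's price, grown by the part of the
    revenue growth that existing shareholders keep after dilution."""
    return price_now * (1 + revenue_growth * (1 - dilution))


offer_now = 10.00        # A. bird-in-hand offer, $/share
revenue_growth = 0.70    # revenue roughly 70% higher in three years
dilution = 0.25          # 25% dilution from raising $10 million

potential = napkin_share_value(offer_now, revenue_growth, dilution)
upside = potential / offer_now - 1

print(f"Potential value in three years: ${potential:.2f} per share")
print(f"Upside over taking the offer now: {upside:.1%}")
```

Evaluated exactly, the formula comes to $15.25 per share, which is the "about $15" that ended up on the napkin.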

The decision was a no-brainer. We took the deal.

Yes, we put the whole thing in a fancy spreadsheet later, but that was really all a formality. The real decision had been made in that conference room on that napkin.

You should apply a similar approach when you’re trying to get buy-in for a Big Data Analytics or Master Data Management or other strategic data project.

Let’s look at two manufacturing companies. Both make, or have made, acquisitions fairly regularly. Both IT departments knew they needed to handle their master data better. They had all the usual problems – data silos, incomplete data, quality problems, imperfect customer service, etc. Both had lots of inefficiencies because various groups didn’t know what other groups were doing. Both companies had the idea to integrate big data across their various divisions so they could run more analytics to optimize their businesses.

While their challenges were similar, each company took a different approach to justifying the project. One tried to justify the project via increased sales. The other through reduced costs.

1. Industrial materials manufacturer – ROI would come from increased sales – better cross-selling and thus higher productivity for the telesales staff.

2. Air conditioner manufacturer – ROI would come from the cost savings derived from reducing the cost of maintaining master data across multiple systems and divisions. For example, it would be much easier to enter new customers or modify customer information enterprise-wide.

One was way easier to calculate and measure than the other. Guess which one got funded much faster.

Company 1 stated that increasing telesales productivity by 15% would way more than pay for the project. It got funded right away. They also projected a variety of cost savings. But, the obvious advantage of the increased sales was the most convincing number. The rest was gravy. The project is implemented and the results are exceeding their expectations.

Company 2 collected a lot of data and wrote a 10-page report and 15-slide presentation basing their justification on reduced data maintenance costs of IT and LOB personnel. They calculated that they spent tens of thousands of IT man-hours per year in master data related activities, and significantly more with LOB personnel in the business units. By making those processes and those employees more efficient, they estimated $5 million in annual savings, far more than the cost of the project. They calculated an NPV of the savings of $10 million and IRR of 170%. But, it took 10 pages and 30 minutes to explain.

Working with outside consultants deeply knowledgeable and experienced in master data and data quality projects, they came up with twelve ways to save money across a variety of groups and processes, totaling many hundreds of employees. For each of the twelve different processes and types of personnel, they estimated different productivity improvement coefficients, ranging from 5 to 25%. They calculated that they’d save millions from reducing both master data maintenance and data errors. They built a big spreadsheet to calculate the savings. They transposed the spreadsheet into a few PowerPoint slides, each with about 40 or 50 numbers on them.

Great analysis.

Bad presentation.

They are still working towards getting approval. They need to simplify their approach. They also need to make sure the results are clearly measurable. It’s hard to track man-year savings across many divisions and job functions. Perhaps, they should concentrate on one major group and apply the average of all the productivity coefficients and come up with a few simple measures that justify the project and can be measured. All the detail is great, but present it in a highly simplified way.
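To show how small the simplified version could be, here is a rough Python sketch of the "one group, one average coefficient" idea. Every input below (the coefficients, headcount, hours, hourly cost and project cost) is a hypothetical placeholder I made up for illustration, not a figure from Company 2. The only thing taken from their analysis is that the coefficients ranged from 5 to 25%.

```python
from statistics import mean


def annual_savings(headcount, hours_per_person, hourly_cost, productivity_gain):
    """Hours freed up per year, priced at the loaded hourly cost."""
    return headcount * hours_per_person * hourly_cost * productivity_gain


def payback_months(project_cost, savings_per_year):
    """Simple payback period, ignoring discounting."""
    return 12 * project_cost / savings_per_year


# Hypothetical coefficients within the 5-25% range mentioned above.
coefficients = [0.05, 0.10, 0.15, 0.20, 0.25]
avg_gain = mean(coefficients)

# Hypothetical single group: 200 people, 2,000 hours each, $50 per hour.
savings = annual_savings(200, 2000, 50.0, avg_gain)
months = payback_months(project_cost=1_500_000, savings_per_year=savings)

print(f"Average productivity gain: {avg_gain:.0%}")
print(f"Estimated annual savings: ${savings:,.0f}")
print(f"Simple payback: {months:.1f} months")
```

Three numbers on one slide, each of which can be tracked after go-live, are easier to defend than forty or fifty.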

I have a background in statistics and math. I’m somewhat of a geek. I like numbers. But, first and foremost, I’m a businessman. I have a steadfast belief that when you are making business decisions, throwing more math, and especially throwing higher-level math, at decision-making can easily result in diminishing returns. If you can’t very easily and quickly explain the numbers to your bosses with full confidence, then don’t even bother. Simplify it all first.

Build an ROI-based business case for your big-data project.

Big Data Analytics are increasingly essential to gaining competitive advantage.  But, where do you start?

Intelligently analyzing more data results in better business decisions. Right? I should just dig in and do it. Right? Well, not necessarily. As the volume of structured, unstructured and semi-structured data accelerates, you should start by answering a few business-oriented questions:

  • Where, when and how do I make big data a strategic advantage?
  • Which of my business processes will benefit the most from big data analytics?

After you answer those strategic, business-driven questions, ask the technical and project questions:

  • How will I deal with large and rapidly growing data volumes and poor performance?
  • How do I integrate and analyze new data sources, such as unstructured data?
  • What tools do I need to achieve this?
  • How do I get there?
  • How big an effort will it be?

So, you are asking, “How can Big Data technologies, tools and processes transform my organization with game-changing capabilities?”

Your approach to big data analytics should start with business strategy.  Target business processes where a data-centric approach can drive significant improvements.   What data, analytics and KPIs will provide a significant business ROI?  Before you can accurately determine ROI, your first technical step should be to evaluate your data quality and completeness.  You need to know how much work you have to do in terms of data cleaning, ERP systems enhancements and how much new data you are going to have to collect.  For example, you might have to alter your business systems to make sure you are collecting good data on an ongoing basis.  Once you know the amount of work needed, you can build an accurate ROI-based business case.

Once the business case is made, you’ll dive into choosing specific technologies.  There are lots of choices to make, including analytics, business intelligence, data visualization, in-memory technologies, columnar and MPP databases, Hadoop-based systems, data warehouse appliances, big data integration and cloud storage platforms.  Make your choices with sustainability and evolution at the center of your thinking, so that you can continue to benefit from, and expand, your investments, building on them, as opposed to building a one-off.

Evaluating, installing, configuring and implementing cutting-edge in-memory database appliances or real-time data warehousing solutions is exciting. They promise high-capacity, parallel computing performance for your big data endeavor. But remember, these technology decisions are not made in a vacuum. They are made with business process and ROI at the forefront. Also make sure your solution is designed to be flexible and scalable, so that you can add capacity later instead of paying unnecessary up-front costs for over-provisioning. Keep your eye on the ROI ball.


Improve Your Data Quality, A starting point for your Data Management strategy – A Business Focused Approach

Click the link/title to see the video.


Optimizing the SAP BW Solution using SAP Data Services 4.0 and Preparing for In Memory Database Solution such as HANA.

Expedien and Kennametal Present – Optimizing SAP BW Solution using SAP DS 4.0 and In Memory DB solution such as HANA


HANA – Next Wave of High Performance BI

HANA stands for High-Performance Analytic Appliance. HANA is an in-memory appliance: it loads substantial amounts of data from traditional disk storage into main memory (RAM), so data retrieval and logical processing happen in memory instead of waiting on disk reads, which is far faster. SAP’s In-Memory Appliance (SAP HANA) enables organizations to explore and analyze their transactional and analytical data from virtually any data source in near real-time. Delivered on optimized hardware, HANA processes and analyzes massive amounts of data by packaging SAP’s in-memory technology, columnar database design, data compression and massively parallel processing together with essential tools and functionality (e.g. data replication and analytic modeling), business content, and the SAP BusinessObjects Business Intelligence (SAP BusinessObjects BI) solutions. HANA, which SAP launched last year, can tap data from both SAP and other sources, and the company has also started rolling out a series of specialized applications aimed at specific business problems.

How can you leverage HANA?

  • For organizations that do not want to disrupt their existing Business Intelligence setup, HANA can be deployed as a high-performance “side-by-side” data mart to provide “real-time” reporting and analytics. Existing BW customers can deploy HANA (with its BAE component) in a “BWA for BW” mode and receive in-memory acceleration features equivalent to a BWA appliance.
  • HANA can be leveraged as a high performance “side-by-side” data mart to your existing data warehouse or can take the place of a data warehouse (DW).
  • HANA can be deployed as the in-memory acceleration engine for BusinessObjects Explorer Accelerated version.
  • HANA can replace your existing backend data warehouse database system.

The benefits of HANA to the SAP customer base are significant. HANA will allow real-time decision making in areas where it was never possible before. One example given: the CEO of a big company like SAP can get any information about any SAP sales pursuit, in any part of the globe or for any product, almost in real time (in seconds) on his iPad.


A glance at SAP data migration methods….

What are the various methods available for SAP data migration? I studied a few prominent ongoing SAP data migration projects and had a discussion with our Data Migration team. As I understand it, there are three popular methods for migrating data from legacy systems and/or old SAP R/3 systems to a new SAP ECC system.

  • SAP Best Practices – Prebuilt content based on SAP Data Services (ETL) that primarily uses IDOCs to load data into SAP.
  • LSMW – A utility from SAP that uses flat files to load data into SAP.
  • Custom Developed Programs – These use SAP BDC programs and flat files.

Each method has its advantages and disadvantages. I will discuss what I know about these methods, the advantages and disadvantages of one method versus another, and the challenges clients face when using them. In this post, I will talk about SAP Best Practices. In subsequent posts, I will discuss LSMW and Custom Developed Programs, along with their advantages, disadvantages and challenges.

SAP Best Practices Method

Let’s talk about data migration from legacy (non-SAP) systems to an SAP system. This includes new SAP customers as well as current customers who are bringing in new plants, new business units, etc., and need to convert data to an SAP ECC system. SAP Information Lifecycle Management (ILM) is used for system decommissioning or data retention and archival. It is beyond the scope of this discussion.

This method loads data into SAP primarily through IDOCs. SAP acquired Business Objects tools such as the Business Objects Data Integrator ETL and Data Quality (Firstlogic) and bundled them together under a new name, “SAP Data Services”. The core strength of Business Objects Data Services, earlier known as Business Objects Data Integrator ETL or Acta ETL, has been its tight integration with SAP. The ETL tool has been used primarily for SAP data extraction since its inception in 1998 or so. I have seen the tool evolve from Acta 1.1 to SAP Data Services XI 4.x. Some other Business Objects software is also used in migration, such as Data Insight (a data profiling tool) and Metadata Manager (the two are now known as Information Steward), along with some reports, but SAP Data Services is where the bulk of the work takes place. For those who don’t know the history: Business Objects acquired a company called Acta Technology in 2002 or so, and SAP acquired Business Objects in 2007. Business Objects renamed the Acta ETL to Business Objects Data Integrator after the Acta acquisition, and SAP later renamed it SAP Data Services.

Acta also offered SAP Rapid Marts. Rapid Marts are out-of-the-box, pre-packaged Acta ETL code and target database schemas, based on Oracle or SQL Server databases, for extracting data from various SAP modules such as SD, IM, FI, CO, GL, HR and so on. The value proposition of Rapid Marts has been that they give SAP customers a jump start in getting data out of SAP quickly. Customers are generally able to use 65-70% of the out-of-the-box Rapid Mart content as is. The remaining content can be easily customized based on the customer’s SAP configuration, which generally entails adding or deleting fields in Rapid Mart tables, extracting any SAP custom tables, and so on. Rapid Marts are now standard data mart offerings from SAP, based on SAP Data Services.

SAP has developed similar out-of-the-box SAP Data Services ETL code for data migration to SAP, based on standard SAP ECC master data structures. This is called Best Practice (BP) Content for Data Migration. It is also known as SAP AIO BP, which simply stands for “SAP Business All-in-One” Best Practices. It is confusing to see so many new SAP terms, but don’t let that scare you. SAP is a pioneer in coming up with new buzzwords; however, the core content remains more or less the same behind the scenes.

The BP content for Data Migration can be found under Data Migration, Cross-Industry Packages, in the Best Practices section of the HELP portal. This content has everything you need to get started on migrating non-SAP data to an SAP system. It includes guides for installing SAP Data Services and the other components required for the migration, the actual load content (including jobs that load data into SAP via IDOCs), mapping tools to help you map the non-SAP data to the IDOC structure, and some reports. It includes IDOC mappings and structures for objects such as Material Master, Vendor Master, Customer Master, Pricing, BOM, Cost Element, Payables and Receivables. There are detailed Word documents on each piece of content. For example, the document on Material is 39 pages long and covers the IDOC structures, what you need to know, and how to map data to the structure.

SAP also provides a standard data migration methodology, framework and templates based on SAP Best Practices and SAP Data Services. The methodology has six components for bringing legacy data into an SAP ERP environment: Analyze, Extract, Cleanse, Validate, Upload and Reconcile.

This method of data migration using SAP Best Practices and IDOCs works very well when no customization is required. In other words, if a customer has a standard, vanilla SAP ECC implementation, this method works just GREAT. For example, the prebuilt SAP Best Practices job for material master loads the data according to the standard ECC Material Master IDOC structure. If the customer needs more fields, or a custom table is to be loaded in Material Master, it is easy to modify or add to the SAP Best Practices ETL code; however, modifying the BP code alone will not suffice. The corresponding SAP IDOCs need to be modified or extended as well, which may or may not be allowed by the customer’s SAP Basis team. The customer will also need SAP ABAP/IDOC expertise on the project to modify the IDOC structure. Many customers prefer not to modify standard IDOCs.

Another scenario where SAP Best Practices will not work is when there is no one-to-one mapping between the input and output data. In other words, if a master data element to be converted into SAP ECC depends on more than one dimension of the input data, SAP Best Practices will not work. Let’s take an example. If sales org A in the legacy system is to be converted into sales org B in SAP ECC, SAP Best Practices will work great. However, if there are three sales orgs A, B and C in the legacy systems and only one sales org D is needed in SAP ECC, with its value dependent on three dimensions of the legacy source data such as sales org, plant and country code, SAP Best Practices can’t handle this conversion scenario, at least as of today. In this case, a good amount of customization needs to be done in the SAP Best Practices code, tables, scripts, etc., which may not be worth the effort and may impact the integrity of other SAP Best Practices content that depends on the modified content or code.
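To make the difference concrete, here is a small sketch of the two kinds of lookup. It is plain Python for illustration only, not SAP Data Services or Best Practices code, and the plant and country codes are hypothetical values I invented for the example.

```python
# One-to-one: the target depends on a single source field.
SIMPLE_MAP = {"A": "B"}

# Multi-dimensional: the target depends on sales org + plant + country.
# Plant and country codes below are hypothetical.
COMPOSITE_MAP = {
    ("A", "P100", "US"): "D",
    ("B", "P200", "US"): "D",
    ("C", "P300", "DE"): "D",
}


def map_sales_org(legacy_sales_org, plant, country_code):
    """Look up the SAP sales org from three legacy dimensions.

    Raises ValueError for combinations with no mapping, so unmapped
    records are caught before the load instead of during it.
    """
    key = (legacy_sales_org, plant, country_code)
    if key not in COMPOSITE_MAP:
        raise ValueError(f"No SAP sales org mapping for {key}")
    return COMPOSITE_MAP[key]


print(SIMPLE_MAP["A"])                   # single-field lookup
print(map_sales_org("A", "P100", "US"))  # three-field lookup
```

The first lookup is the case described above as working well out of the box. The second is the shape of logic that would have to be added as customization.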

A similar approach is taken for data migration from one or many SAP systems, or from legacy systems, to an SAP ECC system. For example, you may have multiple SAP systems on different releases, such as one on 4.6C and one on 4.7, and want to consolidate them into a single ECC 6.0 system. You can use SAP Data Services to extract data from the old SAP and non-SAP systems and use the same methodology, framework and SAP Best Practices to load the data into SAP ECC, as discussed above.


Poor Data – Don’t Just Treat Symptoms, Treat The Cause.

The Data Warehousing Institute estimates that data quality problems currently cost U.S. businesses over $600 billion annually. Even with these figures to guide us, it is still very difficult to use metrics to determine the cost of poor data quality and its effects on your organization. This is because the point where a mistake is made may be far removed from the point where it is recognized. Errors are very hard to repair, especially when systems extend far across the enterprise, and the final impact is very unpredictable.

Have you ever considered how much time and resources your organization spends on correcting, fixing and analyzing corrupted or erroneous data? What about the cost of delayed information exchange or lost revenue due to misplaced data or incorrect input? Evaluating data and determining errors is a time consuming process, not to mention the time needed to correct them. In a time of decreased budgets, some organizations may not have the resources for such projects and may not even be aware of the problem. Others may be spending all their time fixing problems leaving no time to work on preventing them.

According to several leading data quality managers, the cost of poor data quality may be expressed as a simple formula:

Cost of Poor Data Quality  = Lost Business Value + Cost to Prevent Errors + Cost to Correct Errors + Cost of Validation

Loss of Business Value can be HUGE and can lead to business interruptions as well. Let’s use an example to illustrate the cost of fixing an element of poor data.

  1. A staff person spends about 40% of each 8-hour day (3.2 hours) fixing the bad data.
  2. There are five people performing this operation (5 x 3.2 hours = 16 staff hours per day).
  3. Accounting tells you that these people cost $45 per hour (payroll + benefits).
  4. Total cleanup time is 4,000 hours annually (16 staff hours x 250 annual working days).

This means the annualized cost to fix the known poor data is $180,000 (4,000 hours x $45 per hour).
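The same arithmetic can be kept in a few lines of Python so you can plug in your own numbers. This is only a sketch of the "Cost to Correct Errors" term. The function name is mine, and the 8-hour working day is the assumption implied by 40% of a day being 3.2 hours.

```python
def annual_cleanup_cost(staff, share_of_day, hours_per_day, working_days, hourly_cost):
    """Return (annual cleanup hours, annual cleanup cost)."""
    daily_hours = staff * share_of_day * hours_per_day
    annual_hours = daily_hours * working_days
    return annual_hours, annual_hours * hourly_cost


hours, cost = annual_cleanup_cost(
    staff=5,            # people performing the cleanup
    share_of_day=0.40,  # about 40% of each day
    hours_per_day=8,    # assumed 8-hour working day
    working_days=250,   # annual working days
    hourly_cost=45,     # payroll + benefits, $ per hour
)

print(f"Annual cleanup hours: {hours:,.0f}")
print(f"Annual cost to fix known poor data: ${cost:,.0f}")
```

Remember that this covers only one of the four terms in the formula. Lost business value, prevention and validation costs come on top of it.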

The cost of poor data quality extends far beyond the cost to fix it. It spreads through and across the business enterprise like a virus, affecting systems from shipping and receiving to accounting and customer service. Eventually, your customers may lose patience with you, and you may lose their business.

Let’s look at the traditional approach to cleansing data whenever a data quality issue is recognized in a business. The traditional approach fixes bad data that has already been created, using data quality or ETL tools. This generally happens when there is an urgent need to fix the bad data, arising either from a data migration effort or from a business problem.

This approach suffers from three problems:

First, data cleansing and repository building are almost always carried out on a project-by-project basis. Even if the project is successful, and bad data is transformed into good data, the repository starts to degrade in the absence of any ongoing data quality sustenance measures. More and more newly created bad data will creep into the system. And the data already cleansed starts getting stale. Data has a shelf life and needs constant care and feeding. Without addressing how bad data is created, these solutions are costly and unsustainable.

Second, it’s difficult to get the business side fully committed to and involved in these projects. Without a change of mindset, data continues to be seen as IT’s responsibility. And to exacerbate the problem, the software tools used were meant for an IT user base, which leaves the business without a way to directly participate in the process. Without full and sustained business engagement, these projects often do not yield anticipated benefits.

Third, it is very, very hard to fix bad data using technical tools alone. A computer algorithm for data cleansing, no matter how cleverly constructed, can only address a very small subset of data problems. For the rest, a data cleansing package would not even be able to detect that there is a problem, let alone fix it.

By and large, however, these efforts treat the symptoms of the disease that has surfaced rather than addressing the root cause. Strictly speaking, these projects represent a cost of bad data in addition to the degradation of business performance. An organization can treat such a project as an opportunity to find the root causes of bad data and identify the people, process or technology issues behind them. Once the root causes are identified, there MUST be a data governance strategy sponsored, implemented and owned by the business.

The bottom line is data ownership and data contents shouldn’t be IT’s responsibility. With data volume and complexity exploding, the treadmill is spinning faster than the traditional approach’s ability to keep up.