Form initial hypotheses (IHs) that you can prove or disprove with the data. I’d encourage you to come up with a few primary hypotheses to test, and to be creative about additional ones. These IHs will form the basis of the tests you will analyze in later phases, and serve as the basis for additional deliberate learning. Hypothesis testing will be covered in greater detail in Module 3.
As part of this initial work, identify the kinds of data you will need to solve the problem. Consider the volume, type, and time span of the data you will need to test the hypotheses. Also keep in mind the data sources you will need, and make sure you get access to more than just aggregated data. In most cases you will need the raw data in order to run it through the models. Determine whether you have access to the data you need, since this will become the basis for the experiments and tests. Recalling the characteristics of big data from Module 1, assess which of them your data has, with regard to its structure, volume, and velocity of change.
A thorough diagnosis of the data situation will inform the kinds of tools and techniques to use in phases 2-4. In addition, performing data exploration in this phase will help you determine how much data you need, including how much historical data to pull and in what structure and format. Develop an idea of the scope of the data and validate it with the domain experts on the project.
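As a rough illustration of this kind of data diagnosis, the sketch below profiles a small raw extract for volume, fields, missing values, and time span. It is only a minimal example under stated assumptions: the sample data, the field names (customer_id, order_date, amount), and the helper profile_records are all hypothetical and do not come from the course material.

```python
import csv
import io
from datetime import datetime


def profile_records(rows, timestamp_field):
    """Summarize volume, fields, missing values, and time span of raw records.

    `rows` is a list of dicts (one per raw record); `timestamp_field` names
    the column holding ISO-formatted dates. Both are illustrative assumptions.
    """
    timestamps = [
        datetime.fromisoformat(r[timestamp_field])
        for r in rows
        if r.get(timestamp_field)
    ]
    fields = sorted({key for r in rows for key in r})
    return {
        "row_count": len(rows),
        "fields": fields,
        "earliest": min(timestamps) if timestamps else None,
        "latest": max(timestamps) if timestamps else None,
        # Count empty or absent values per field as a first data-quality check.
        "missing_by_field": {
            f: sum(1 for r in rows if not r.get(f)) for f in fields
        },
    }


# Hypothetical raw (non-aggregated) extract, inlined so the sketch is self-contained.
SAMPLE_CSV = """customer_id,order_date,amount
101,2023-01-15,25.00
102,2023-03-02,
103,2024-02-20,310.50
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
profile = profile_records(rows, "order_date")
span_days = (profile["latest"] - profile["earliest"]).days

print("rows:", profile["row_count"])
print("fields:", profile["fields"])
print("time span (days):", span_days)
print("missing values:", profile["missing_by_field"])
```

A summary like this is something you could bring to the domain experts when validating the scope of the data, for example to confirm whether the available history is long enough to test your initial hypotheses.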
The chess pieces shown above are a reference to these Harvard Business Review articles. These articles describe how people become experts in various fields, and specifically the amount of practice needed to do so. The relevant point in this context is about Deliberate Learning. For building expertise, it is critical to design experiments by first considering possible answers to a question before asking for the answer. In this way, you will come up with additional possible solutions to problems. Likewise, if you spend time formulating several initial hypotheses at the outset of a project, you will be able to generate more conclusions and more expansive findings after executing an analytic model than you would if you began your interpretation only after receiving the model’s results.
You can move to the next phase when…
…you have enough information to draft an analytic plan and share it for peer review. This is not to say you need to actually conduct a peer review of your analytic plan, but it is a good test to gauge whether you have a clear grasp of the business problem and have mapped out your approach to addressing it. This also involves a clear understanding of the domain area, the problem to be solved, and the scope of the data sources to be used. As part of this discussion, you may want to identify success criteria for the project. Establishing these criteria up front will make the problem definition even clearer, and will help you when it comes time to make choices about the analytical methods used in later phases.