Okay, here's my first blog post, and I hope it's a good start. I'm happy to get your feedback, whether positive or constructive; my objective here is to share my ideas and communicate with you regardless of where you are located. The topic is analytics and data storytelling.
Like any data analyst or business intelligence analyst, you encounter Microsoft Excel daily; it's a necessity. I bet if you ask anyone on your team (or yourself) to produce a report or narrate a number story, the work would commence in Excel. That doesn't mean you are doing it wrong; that's just how accessible and user-friendly Excel is, isn't it? But as with any analyst's toolset, one tool doesn't solve all your problems. People often assume a single tool can fix every data issue. It's like expecting a 1.6L car engine to perform like a Maserati. Highly unrealistic, isn't it? So here's a list of lessons I compiled that I thought would be useful. I have done my best to keep it simple and understandable, not just for data science fanatics but for anyone who wishes to dive into data science or begin as an analyst. I'd encourage you to share it across platforms, networks and forums to help drive a data-driven landscape in your job.
I believe many companies don't actually have the data we keep talking about. Remember the buzzword "big data"? It so happens that companies often don't understand what data really is. When we talk about surveys (random sampling, stratified sampling, etc.), it is important to note that surveys are purely primary research and opinion-related pieces; I wouldn't count that as real data when it comes to analyzing and arriving at data-driven insights and decisions. Every company today has its data layer in the cloud, separate from the application. Everybody you talk to, from a large aircraft manufacturer to a retailer, is creating a data lake or a data pond, some form of water body, but there's data in it. And the intent of these data lakes is: "let's bring all the variables together, and then let's figure out insights to drive our business."
1) When it comes to that insights layer, very few companies have really figured out how to mine this data. It's not about putting data into an enterprise ERP solution and then accessing it. It's about actually looking at the business problem and framing the right business question. And yes, it has to be a smart one, because only then can a solution be framed correctly. I'm putting a lot of emphasis on this because, in the case studies I read, 95% of failed projects failed because the problem was framed incorrectly. It's like having a plan that is perfect in theory while its practical application fails miserably; or there have been scenarios of going for the BIG WIN rather than picking the low-hanging fruit for quick wins.
It's not about "Do I have all the data in one place?" It's about "Do I have the right data?" Personally, before carrying out my analysis, I always remind myself: I don't need big data, I'm okay with small data, but do I have the right data? And if not, what algorithmic techniques am I going to use to analyze that low-density data, or to figure out where the holes are in the data and fill them? The second step is to put some sort of MVP (Minimum Viable Product) out there, then collect more data and create more hooks for that data.
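To make the "find the holes and fill them" idea concrete, here is a minimal sketch. The field names, records and median-imputation choice are my own illustrative assumptions; the right filling technique always depends on the data at hand.

```python
from statistics import median

def fill_holes(rows, field):
    """Report the missing-rate for one field, then fill gaps with the median.

    `rows` is a list of dicts (e.g. records exported from Excel); a hole is
    a None value. Median imputation is just one simple technique.
    """
    values = [r[field] for r in rows if r.get(field) is not None]
    missing_rate = 1 - len(values) / len(rows)
    fill = median(values)
    for r in rows:
        if r.get(field) is None:
            r[field] = fill
    return missing_rate

# Small but "right" data: three of four rows are complete.
orders = [
    {"order_id": 1, "amount": 120.0},
    {"order_id": 2, "amount": None},   # the hole we need to fill
    {"order_id": 3, "amount": 80.0},
    {"order_id": 4, "amount": 100.0},
]
rate = fill_holes(orders, "amount")   # 25% of rows had a hole
```

Knowing the missing-rate before filling is the point: it tells you whether you have the right data or are papering over too many holes.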
2) Another very important piece that most companies miss is interaction data. Think about an autonomous car: before it was put out on the street, the amount of data you could have would be a third of what we have today. Once it's on the street, you can start gathering more data and use it to make the car's driving far more precise. The same applies to your applications. You don't have to wait until you have everything. Remember, you started at zero, so any progress, even 50%, is pretty awesome. If it's 50%, gather more data and keep enriching it.
3) Most of the time, people think that AI is technology. AI is not technology; AI is math, a different way of doing math. (I am still exploring the deep learning side of AI myself.) If you do that math wrong, you get the business wrong. So the key is that the business has to own it. That is rule number one.
4) The technology organization becomes a big pillar for collaboration, because they are the ones who provide you access to the data. But then one would ask: do I need tons and tons of data scientists or AI experts to analyze this data? I would say you don't need tons; a core of data analysts and AI experts is enough. The key trait to look for in an employee who will produce enriched analytics is scalable skills.
5) Once you start solving one problem, think about creating the data pipeline. What I mean by pipeline is this: let's say you're analyzing an invoice to match quantity and price. There are 17 other elements on that invoice: timestamps, the date stamp, location, carrier, the trade lane, etc. Whatever they are, you should capture all of that data; that is your data pipeline. The next time you want to tackle, say, a logistics planning problem, you already have those data sets. In addition, it's all about models, and you should maintain a library of those models so that next time you're not reinventing the wheel. Scalability is what you seek.
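One lightweight way to sketch this idea in code: capture every invoice element, not just the two the current analysis needs, and register analyses in a reusable model library. The record fields and the registry pattern here are hypothetical illustrations, not a prescribed design.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class InvoiceRecord:
    """Capture all invoice elements, even those today's analysis ignores."""
    invoice_id: str
    quantity: int
    unit_price: float
    extras: dict = field(default_factory=dict)  # timestamps, carrier, trade lane, ...

# A tiny "model library": named, reusable analyses over the same pipeline,
# so the next problem doesn't start from scratch.
MODEL_LIBRARY: Dict[str, Callable] = {}

def register_model(name):
    def wrap(fn):
        MODEL_LIBRARY[name] = fn
        return fn
    return wrap

@register_model("quantity_price_match")
def quantity_price_match(inv: InvoiceRecord, expected_total: float) -> bool:
    """Today's problem: does quantity x price match the invoiced total?"""
    return abs(inv.quantity * inv.unit_price - expected_total) < 0.01

inv = InvoiceRecord("INV-001", quantity=10, unit_price=2.5,
                    extras={"carrier": "ACME", "trade_lane": "EU-US"})
ok = MODEL_LIBRARY["quantity_price_match"](inv, 25.0)
```

Because the `extras` were captured up front, a future logistics-planning model can be added to the same library without re-collecting the data.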
6) If you do not have the ability to put these algorithms into production, you will never succeed. So whenever you plan to solve these different business problems, think about how you are going to put the solution into production. It may require a different architecture, an entirely different way of feeding in the data and taking the insights out, or a very different way of adopting it from a process standpoint.
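One simple discipline that makes production feasible is separating the offline analysis step from a pure serving function that depends only on an exported artifact. The threshold rule and numbers below are made up for illustration; this is a sketch of the separation, not a real architecture.

```python
import json

# Offline step: analyze history and export a plain artifact --
# no notebook state, nothing that can't ship to production.
historical_amounts = [120.0, 80.0, 100.0, 95.0]
artifact = {"anomaly_threshold": max(historical_amounts) * 1.5}
serialized = json.dumps(artifact)  # what actually gets deployed

# Serving step: load the artifact and score new records with a pure
# function, the same way a production service would.
def is_anomalous(amount: float, model: dict) -> bool:
    return amount > model["anomaly_threshold"]

model = json.loads(serialized)
flagged = is_anomalous(500.0, model)
```

Because the serving side knows nothing about how the artifact was produced, the offline analysis can change freely without touching production code.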
7) Last but not least, keep it very simple, understandable and user-friendly. Do not try to capture everything when you already have the low-hanging fruit that could be your quick wins. The fact is, technology is only an enabler; human intellect will always be the driving force. A machine can only do what you ask it to do, but it's you who is in the driver's seat.
Food for thought: do you need more data scientists and analysts who do all the technical work, or people who can interpret the results and narrate the story? I fear we may end up with an unbalanced see-saw, churning out lots of data analysts while individuals who can interpret results remain scarce. Something I plan to explore in my next article. Your views?