
Process-centric data integration to break down silos & hack process automation

Below is a high-level summary of this discussion, followed by a transcript of the full discussion. Contributions have been anonymised (except with permission) and edited to adhere to our 'Chatham House Rule' policy.

Key points

Supply chain best practice discussions cover many different areas, but two recurrent themes tend to emerge regardless. One is ‘people’, shorthand for stakeholder buy-in, change management and process adherence. The other is ‘data’, which tends to mean the challenges of disconnected data siloed in multiple systems and of poor data quality.

We’ve debated the right balance between a data strategy that, in an ideal world, is the foundation for subsequent investments in technologies and processes relying on clean, integrated data, and an operationally driven approach where more immediate needs drive workable, but imperfect, solutions.

We’ve noticed a pattern: supply chains with relatively high levels of planning process and technological maturity, including those that have recently invested in ‘next gen’ planning platforms, are working on data lakes (or layers) both to extract maximum value from those platforms and to develop greater supply chain visibility and, ultimately, the control tower capabilities that support more agile responses to disruptions.

In this discussion, we explored how members are approaching data lakes, the challenges being encountered and alternative or complementary approaches that may offer a happier medium.

How are supply chains implementing data lakes?

Typically, data from multiple ERPs and other systems such as WMS, order management and, sometimes, customer/partner systems is consolidated into a data lake, often using Microsoft Azure with a business analytics tool such as Power BI for reporting.

Data lakes can have multiple layers with different levels of data harmonisation and end-user access. Some have started with an interim stage, creating a user interface that first collects the data before it is migrated into the data lake. This improves speed for the end user while the team learns about the complexities of the data lake, before putting a structured solution in place as a final step. Without this iterative stage, the risk is an ongoing loop of corrections to the structured solution, which often means starting over.
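
To make the layering idea concrete, here is a minimal, illustrative sketch in Python (pandas) of extracts from two ERPs flowing through a raw, a curated and a semantic layer. The file names, column names and harmonisation rules are hypothetical assumptions, not taken from any member's implementation.

```python
import pandas as pd

# Raw layer: land the source extracts as-is (hypothetical files and columns)
raw_erp_a = pd.read_csv("landing/erp_a_shipments.csv")   # e.g. columns: MATNR, KUNNR, QTY, GI_DATE
raw_erp_b = pd.read_csv("landing/erp_b_shipments.csv")   # e.g. columns: item, customer, quantity, shipped_on

# Curated layer: harmonise column names, types and units across sources
def curate(df: pd.DataFrame, mapping: dict) -> pd.DataFrame:
    out = df.rename(columns=mapping)[["part", "customer", "qty", "ship_date"]].copy()
    out["ship_date"] = pd.to_datetime(out["ship_date"])
    out["qty"] = pd.to_numeric(out["qty"], errors="coerce")
    return out.dropna(subset=["part", "customer"])

curated = pd.concat([
    curate(raw_erp_a, {"MATNR": "part", "KUNNR": "customer", "QTY": "qty", "GI_DATE": "ship_date"}),
    curate(raw_erp_b, {"item": "part", "quantity": "qty", "shipped_on": "ship_date"}),
])

# Semantic layer: pre-aggregated tables that reporting tools (e.g. Power BI) read
shipments_by_customer = (
    curated.assign(month=curated["ship_date"].dt.to_period("M").astype(str))
           .groupby(["customer", "month"])["qty"]
           .sum()
           .reset_index(name="shipped_qty")
)
shipments_by_customer.to_csv("semantic/shipments_by_customer.csv", index=False)
```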

Data lake projects are generally driven by the IT department with input from supply chain partners.

What are the common challenges related to data lakes?

  • Time: many data lake projects were started several years ago and have only just begun, if at all, to deliver tangible results. A large proportion of this time can be taken up with cleaning, structuring and stitching the data and managing bugs;
  • Latency: depending on how often the data lake is refreshed, there can be questions about how useful the reports are for operational purposes. One approach has been to refresh only the data that has changed, but users tend to want complete refreshes (a minimal delta-refresh sketch follows this list);
  • Integration: connectors that are supposed to facilitate connection to data lakes and other systems can be ‘black boxes’ that require a lot of trial and error to get working;
  • Volume of data: many businesses hold large volumes of stock and related history, and it is not always possible to migrate it all, even in batches;
  • Report proliferation: different users want to run different reports and analyses which impose an overhead on the system;
  • External / partner data: beyond simply obtaining the data, partner data may have been manipulated in some way before it is received, so getting hold of the core transactional data is important but can be challenging.
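
On the latency point above, a delta (incremental) refresh typically relies on a change timestamp or watermark in the source. The sketch below, in Python (pandas), is a minimal illustration of the idea under hypothetical table and column names, not a description of any specific connector's behaviour.

```python
import pandas as pd

def delta_refresh(source: pd.DataFrame, target: pd.DataFrame,
                  key: str = "order_id", changed_col: str = "last_changed") -> pd.DataFrame:
    """Upsert only the source rows changed since the target's high-water mark."""
    watermark = target[changed_col].max() if len(target) else pd.Timestamp.min
    delta = source[source[changed_col] > watermark]    # new or changed rows only
    kept = target[~target[key].isin(delta[key])]       # untouched rows already in the lake
    return pd.concat([kept, delta]).sort_values(key).reset_index(drop=True)

# Hypothetical usage: 'source' is today's extract, 'target' is the existing lake table
source = pd.DataFrame({
    "order_id": [1, 2, 3],
    "qty": [10, 5, 7],
    "last_changed": pd.to_datetime(["2024-01-01", "2024-01-03", "2024-01-03"]),
})
target = pd.DataFrame({
    "order_id": [1, 2],
    "qty": [10, 4],
    "last_changed": pd.to_datetime(["2024-01-01", "2024-01-02"]),
})
print(delta_refresh(source, target))  # order 2 is updated, order 3 inserted, order 1 left alone
```

A complete refresh is simpler, which is partly why users ask for it, but it recopies everything; the delta approach trades that simplicity for a lighter load and depends on a reliable change timestamp in the source.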

How else to automate supply chain processes?

A process-centric and trigger-driven approach avoids the data lake altogether by using triggers to initiate intelligent workflows that take only the data required, from wherever it sits, to automate the process. For example, if a daily task is to check the top ten stock deviations, the information required may already be available; the challenge is engaging the right people quickly enough to address the deviations. This is the kind of common operational challenge that isn’t going to be fixed by a data lake.
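
As a hedged illustration of that trigger-driven pattern, the Python sketch below computes the largest stock deviations from a daily snapshot and routes each one straight to its owner. The column names, owner mapping and notification hook are hypothetical placeholders for whatever systems and workflow tools are actually in place.

```python
import pandas as pd

def top_stock_deviations(stock: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Return the n SKUs whose counted stock deviates most from the system stock."""
    stock = stock.copy()
    stock["deviation"] = (stock["counted_qty"] - stock["system_qty"]).abs()
    return stock.nlargest(n, "deviation")

def notify(owner: str, row: pd.Series) -> None:
    # Placeholder for an email / chat / workflow-tool notification
    print(f"-> {owner}: check SKU {row['sku']} at {row['warehouse']} (deviation {row['deviation']})")

# Hypothetical daily snapshot pulled from wherever the data already sits
stock = pd.DataFrame({
    "sku": ["A1", "B2", "C3"],
    "warehouse": ["DC1", "DC1", "DC2"],
    "system_qty": [100, 50, 200],
    "counted_qty": [60, 49, 150],
    "owner": ["planner.a", "planner.b", "planner.a"],
})

for _, row in top_stock_deviations(stock).iterrows():
    notify(row["owner"], row)
```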

Data lakes are ‘must haves’ for strategic analysis and data science but, unless there is some kind of predictive purpose (e.g. predictive maintenance) to the machine learning, the data doesn’t necessarily need to be real-time.

A process-centric approach can also support a data lake initiative because it helps end users to challenge and report data accuracy issues. Those working with the data in processes every day know what they need and are able to spot issues. Unfortunately, inaccurate data too often goes unfixed because it is too difficult and time-consuming for users to get issues to those responsible for the data.

Process- and trigger-driven workflows for specific contexts create a continuous feedback loop: the data is used in context day to day, and there is a clear incentive to capture information and clean up data because doing so makes the process easier.
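
One way to picture that feedback loop is a data-quality flag raised from within the process step itself, carrying enough context for the data owners to act on. This is a hypothetical sketch rather than any member's or vendor's actual workflow; the field names and the in-memory queue stand in for whatever ticketing or master data process exists.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List

@dataclass
class DataIssue:
    """A data-quality flag raised by a user inside a day-to-day process step."""
    record_id: str      # e.g. SKU or order number
    field_name: str     # which attribute looks wrong
    observed: str       # what the user saw
    expected: str       # what they believe it should be
    raised_by: str
    raised_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

ISSUE_QUEUE: List[DataIssue] = []   # stand-in for the master data team's work queue

def flag_issue(issue: DataIssue) -> None:
    """Capture the issue in context instead of leaving it in someone's spreadsheet."""
    ISSUE_QUEUE.append(issue)

# Raised directly from a workflow like the stock-deviation check above
flag_issue(DataIssue(
    record_id="A1",
    field_name="lead_time_days",
    observed="2",
    expected="14",
    raised_by="planner.a",
))
print(f"{len(ISSUE_QUEUE)} issue(s) waiting for the master data team")
```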

Transcript (edited & anonymised)



JP 
We do a lot of these discussions and many of them, particularly around next gen planning, advanced planning or control towers, come back to data residing in multiple legacy systems as a major barrier to getting the most value from investments or improvement programs in other domains of supply chain. Data is the foundation for other transformational innovation initiatives. In response to that, a lot of the people who have been on the calls have been talking about projects to introduce a data layer, which seems to be quite a common approach and makes sense, but it's quite a large undertaking, as I'm sure some people will attest shortly. 

It got me thinking in terms of what other insights might there be? I had a conversation with J previously, and I'll ask J to say more a bit later on, but I think I can say he perhaps has a slightly different take on it, which I'll let him explain, as he's better qualified to do that than me. We'll just start by defining the scope of the challenge. I'll invite a few of you to just expand on the comments that some of you provided before going into more of an open discussion. I think, B, you are one of the people that inspired this session, because I know of your recent initiatives around planning, innovation and, as a consequence, focus on your data layer. 

I wonder if you could perhaps summarise the challenges that you encountered and therefore what you're trying to do with your data layer now. 

B
Sure…I could try. We operate across two SAP systems. Not a hugely complex landscape, but I'm sure, as everybody has, we've got lots and lots of reports that have been developed over the years. Some read directly from SAP, some were written previously to read from BW, and we've created a HANA database. These have got more and more complicated over the years. I'm not in the IS department, but I work closely with them. Last year they started a project to create an Azure data lake, which I thought, yeah, this is great, fantastic. I'm going to have all this data and I'm just going to be able to connect some Power BI reports to it or something and this is going to be brilliant. 

That's already been going on for about a year, that project, and I was expecting that simple things like the order book and the transaction history would just be quickly moved into it. With an Azure data lake, I thought I'd be able to just say, I've got a part number here, I want to look at this customer over here and connect that with this plant or whatever, and I could produce some fantastic analytical reports. It just seems to have taken a massive amount of time to extract this data and normalise it. I'm not actually that clear how far they've got, especially when I start asking questions about, say, the order book: creating some summarised data so I can look at total shipments to customers, my on-time delivery to those customers, how much we sell of this part and that kind of thing.

I need to be able to preprocess the data and have good summaries that I could use in my reporting. As I say, that just seems to be taking longer than if I had just written a report straight out of SAP. What we tend to find is people around the business just carry on using downloads in Excel and combining data because it's actually quicker for them to do that than it is to wait for the business to develop this huge data lake. Although it sounds simple, it actually seems to be an incredibly complicated process to get up and running. Then you get into how often it is refreshed and, if it's not refreshed regularly, can I actually produce any operational reports from it? If I want to see what's happened this morning but the Azure data lake is only refreshed once a day, then it's no use to me for operational reports. 

Then you get into what the load on our system is if you update with delta loads. People don't like delta loads; they want to do complete refreshes. Yeah, it just seems to be a very complicated process that we're going through as an organisation. 

JP 
Thank you very much B. I can see plenty of nodding. H, what's resonating with you there? 

H
Well, we're doing much the same. We've got SAP in the mix, we've got a WMS, we've got a load of customer systems, our website, all different platforms. What we're trying to do is get rid of our SAP BW platform, which is our main reporting tool at the moment, and replace it with our group data platform. We're getting similar challenges and I think a lot of the challenges we're getting with SAP are because SAP BW has a SAP connector that does a lot of the calculations, for stock particularly. It's obviously all based on movements and I think for some movements it needs to create some counter two records. I don't understand why. There is apparently another connector, purposely built to allow people to connect to data lakes and the like, and we're in the middle of testing that to see if it can solve two problems. 

One is to create these counter two records that we've been trying to create manually. Because you can't get at the SAP code behind it, it's all a black box, and you're having to guess where the counter two records exist and where they don't. If you get new movements that you develop in the future, you'd be missing those if you don't have the connectors. So that's one reason. The second reason is the sheer volume of data on the stock side that we've got to pull out of SAP. We can't do that because we've got so much history there, but I think the connector allows you to do it in batches, which you can't do with our Azure lake at the moment. Those are the two challenges we're trying to overcome. Yes, it's taken a long time and we also have challenges on the business side where we're just trying to migrate across and we've got new people coming along and saying, oh, this would be great, I'd like to be able to do this bit of analysis. The business is requiring those pieces of analysis to have business cases behind them to show the benefits of creating that report before we've even migrated the day-to-day stuff that you need, the hygiene factor. We've got a number of similar challenges that we're trying to work through now. So what should be simple isn't. I'll just share that your journey seems very similar to ours. 

JP 
Thank you, H. If I could turn to you quickly, R: from your input, you're implementing a company-wide data lake. Are you having similar challenges? Are you experiencing anything else? 

R
I think to a large extent it is similar. Our system landscape is very fragmented: we're into double-digit numbers of ERP systems, plus some other systems on top of that. We're also using a server as the cloud storage. We've been going at it since November 2019, more or less, so for quite a while as well. It took a long while to get the first reports out, let me put it that way. We've been struggling with the time to get things out as well. A couple of things we learned along the way: for the initiatives which were successful, we focused first on creating an interim-stage user interface and end-user product, using a server in between to collect the data on first and then later migrate it into the data lake. 

We talk about it as different modes. Mode two would be that approach, and then later we go in with a mode one approach where we migrate everything back to the database and clean up any technical debt that we've created along the way. That allows us at least to create speed for the user, learn about all the complexities of the data lake, and then put a structured solution in last, basically. We do both at the same time. I think that works a lot better for us: get a usable product to the planners or the users to begin with and then learn, as we do that, what it actually means to get the data and get the data standardised. Because what we also quite often found when we went in with mode one to begin with, the more structural approach, was that it created iterations of corrections. 

Every time we started to look at it and harmonise it, we found some other data issues, some other exceptions, and we would need to go back time after time and repeat the whole story again. Working first in an interim stage and then cleaning it up later seems to be more effective, both for getting the outputs out and for understanding what we actually need to do. 

B
Are you creating a very structured database where it's very relational and you can make sure if you're picking a customer or a part or something that you can connect through and get other information? Or is it just lots of tables and the users have to know how to connect them, or is it very well connected internally? 

R
It actually has three layers. We have what we call the L0 layer, which is more or less an unstructured data lake; the user would not have access to that. Then there's the L1 layer, which is curated: we've gone through the harmonisation, so it is structured. Users could have access to it, but normally they work from the L2 layer, the semantic layer, where we put the data in specific tables or cubes on which the reports run, so they have a dedicated table structure. From a user perspective, they would never touch the L0 layer. They normally work from L2 and might get access to L1. 

JP 
Okay, interesting. Thank you, R. K, you're also working with Azure, doing something similar. What has your experience been so far? 

K
Yeah, great. Fantastic! We did it within a month! No, I’m joking…of course we didn't. We started designing it last year. We have five SAP systems and two non-SAP systems. We wanted to combine it all together. We started with an Azure data lake. We're building a XXX layer, just like R’s layer one on top of that. On top of that, there is a beautiful Power BI dashboard that our companies can connect to and have a great view over the supply chain. Now, we started designing this probably a year and a half ago, and today, actually, or maybe tomorrow, we're feeding it live data. 

So this is a year after design. We're finally feeding it only stock inventory and availability data. There's nothing else…there's just the product and how much we have in stock of it where. This will be viewed within a Power BI dashboard and only for, I think, two parts of our business. We're taking it step by step, to say the least. It’s very much needed as well because our planning tool currently doesn't give us great visibility of the supply chain. We're really banking on this supply chain visibility tool to unlock a lot of benefits and to show where the exceptions are within the supply chain.

JP 
Just making a note to give you a call on Friday to see how well it's all gone! Thank you for that. I'm going to open up if anybody has any other reflections or particularly if anyone's taking a different approach to what's being described so far?

T
I have a question: the issue we're facing, in addition to all the data lake problems we just mentioned, is that from a transportation perspective, we are relying on external logistics providers and therefore we need to integrate their data to be able to have a full picture of our supply chain visibility. 

I am currently in charge of that and I am using different tools like XXX to consolidate all the data and harmonise it in a manageable way. I would say I'm interested to understand if anyone here is facing the same issue of getting data from multiple sources, not only from SAP or other systems, but also from external sources and how this is done and how this is managed?

H
From our point of view, we have a similar challenge. We've got some external sources, but we're doing much the same, using XXX to schedule workflows. The bit we've got to be careful about is the data that they send in: I want to get to the core transactional data, rather than a table that might have been manipulated before it came in. I'd rather do any manipulation once we've got the data. So we're taking much the same approach. Ideally, what we want is to get those feeds into our data lake directly from the source solution, rather than as a table of data, but that would take a lot longer. Short term, to get the data, we're doing much the same. 

T
Is connecting external data directly something you have in your plan? Did you receive positive feedback on it, or is it still just in the plan? 

H
It's in the plan, but we've acquired a number of businesses and we will probably be looking to put SAP in as the transactional platform there. We're just trying to work out what data we need now to run the business operationally, but it's going to take us longer to put SAP into those businesses. We'll probably end up doing a bit of both. We might just get a direct intake from their existing systems to start with; then, once our internal SAP system replaces them, we'll get the usual feeds that come from that. Short term, we're having to use XXX just so we've got the visibility to look at the whole supply chain. 

JP 
Thanks. Okay, so J, I'd like to bring you in now, if that's okay. What do you make of what you've heard so far? 

J
Well, it's fun because this is the same frustration I had years ago and still have. First, obviously, I've got a company and we have software approaching these things from another angle, so I might be a bit biased in what I'm going to say, but the point is exactly that slowness I was frustrated with. Five years ago I worked for P&G and then did a bit of consulting, and that led me to say, okay, this can't work. Everything you describe takes far too long. I worked every day with developers and I know how hard they work, but 80% of the time they spend is on cleaning, structuring and stitching the data and managing bugs. That takes a lot of time, obviously, and after maybe months, typically years, once it's ready, then you can start building your fancy reports. 

Still, once you have your beautiful reports, nothing really happens day to day in the execution, because you need the operational guy to go, okay, this morning I'm going to connect…I received this notification I need to check and, okay, what do I do with this data? I need to decide what to do next and determine that this is important, this is not important. Let me send an email to XYZ people. Let me do XYZ. There ought to be a better way. Basically, the idea was: can we actually remove that big blocker in between, which is the data lake? What if I simplify it and start with the processes? What needs to be done every day? Automate this and provide the data against these processes. If every day I need to check the top ten deviations in stock in each of my warehouses, why don't I just send this directly? 

All the information that I need, for example, the right people given the roles in the company and maybe which account actually has an order for the SKU that is at risk…all this information, we have. Why don't we start with this and just attach the data to the actual execution that needs doing, rather than starting from the data lake? I think one challenge, though, is that the data lake approach is great if you really want to do data science. When you think about long-term strategic things, okay, I need all of this data to be cleaned and aggregated and I can apply some fancy algorithms. But for the day to day, like the top ten deviations, to take a very stupid example, the challenge might not be in knowing that there is a deviation or risk but that I now need to engage with eight people in an organisation within 2 hours to solve this issue. That kind of scenario is the true operational challenge and that's never going to be fixed by a data lake. That was the thing we were wrestling with years ago and that's what we're working on now. 

So, my point of view is, we put in a data lake, which is a ‘must have’ for data science and for tactical and strategic analysis. In some ways, you don't really need real-time reporting for this, except if you're doing machine learning for predictive maintenance on shop floors and equipment, where you really do want it. But for supply chain planning, the design of your operations and all this, you don't need a data lake to have real-time data. Actually, I think there is another way to tackle real-time operational decision support rather than going through a data lake. Anyway, that's just what I'm putting out there and what I wanted to share.

JP 
Thanks for that J.

M
Hi everybody. Our organisation is also going through a transformation and we are currently looking to build a customer cockpit, also based on Microsoft Azure technology. What we are doing in parallel is bringing visibility by using SAP IBP, because on one hand we have an ERP system which is not state of the art anymore, but it's going to be heavily supported and even rolled out in other regions. We are using SAP APO for scheduling purposes and now we have SAP IBP, so two platforms: one for the financial world and one for demand and supply. When I started a customer project a few years ago, we brought some data visibility into XXX and the first finding I had was, oh my goodness…the data is crap. Everybody I'm listening to says I can't use X report because the data is not accurate. I do my own reports, and for me it's great to get an additional data lake and visibility, but I think the root cause which we need to tackle is the processes.

In my view, it is fundamental for any organisation to understand that process adherence as well as data governance is key in order to get exactly the visibility we need. Because if everybody continues to say I don't trust the data, well, you don't actually trust your people who are sitting in front of the keyboard. And that's currently what I'm facing. What I'm also addressing is, in my role as a demand manager, I don't always see what I need. At the same time, I'm part of the transformation and bringing additional reports in place, connecting with the different stakeholders in different departments. Training, which is also very important because it's great to build reports, but if you don't bring it to the organisation and they don't know actually what they should do with it, there's no need to spend time building reports. 

We always come down to the point that data accuracy is key, right? The ones who, at least in our organisation, play a very big role are customer service, because they do all the orders and the order tracking, shipments, notifications, invoicing etc., and those who develop items and are responsible for master data. For me, being typically German, that's 80%, 90% of the things I'm looking after. It's a question of what we can also do to bring better clarity to teams working day to day and to say, guys, don't fiddle around…even if you take a minute more, this helps you afterwards not to spend two or three hours cleaning up data. I hear what you said before. I feel that as well. I think we are going through the same thing, but it won't be easy here. For me, also bringing this up, not just down, that's really a challenge as well. 

JP 
Thank you M. A, if you can hear me, you were nodding quite a lot when J was talking about the importance of the data lake for the strategic data science perspective. Are you able to share anything about what you're doing at XXX? 

A
Yes, I think I have heard a couple of things that resonate with the journey we have been following. I think we have been embarking on this idea of having a data lake probably, I don't know, we are already three or four years into that space, it's already a big data lake. I don't come from the IT space, I come from the supply chain side so I think we came to the realisation that you need someone within the function that makes that link to that IT space in order for the function to really be able to make use of that. I think having said that, I liked a lot of what J said about the question about why we need this data lake in there and really understanding the reason behind it. It can help to bring this ideal of a single point of truth. In that process, I think we have come across challenges similar to what others were saying in terms of which are the processes that are needed in order to make this happen.

JP 
Thank you very much, A. P, it looks like you're taking perhaps a slightly different approach, in that you mentioned an event-based data flow to trigger subsequent process requirements. I wonder if you would be prepared just to tell us a bit more about that, please, P? 

P
Yes, we also have initiatives ongoing. We've been building our own data lake and the challenges that we're facing are similar to the ones that have been flagged already. Even our data lake is still connected to SAP BW. How we organise ourselves in setting up different landing zones in that data lake is, I think, the main question we have within our enterprise. The topic my team is focusing on more is the event-based data integration. From SAP or non-SAP systems, we can capture any event where a change is applied and transfer that to another target system to be used for logistics planning purposes. It's transactional data from SAP, but it can also be something like weather forecast data that can just change. 

Once that event is captured, it can translate into an input to our planning system. That's what we're looking into at the moment. I think we have an extractor called XXX but it's limited to SAP. I do know that our company is also looking at other possibilities. I was recently also in contact with a company called XXX, which provides an event-driven data architecture mesh structure, which was quite interesting to explore further. So I think we're following both routes. One is the data lake, where I think a lot of data analytics, machine learning and AI algorithms can be applied. Here I think the challenges we're facing are similar, like how we organise ourselves to capture that data and ensure we take the necessary information out of it to draw conclusions. The other part is more the operational planning, where event-based information is more important as each event can have an impact on how we adjust our planning output. 

JP 
Thank you P. J, is there a way in which the process based approach which you're talking about can help and contribute towards that strategic data lake objective?

J
I think one of the challenges with data lakes is, as someone said, you've got these IT people who are not, let's say, supply chain people or business people, and they're doing this hard work from a technical angle. You need constant collaboration between supply chain and IT to get it right. If you don't invest enough resources, you probably end up spending 90% of the time stitching data that cannot or won't be used every day and won't really make a difference. Because if it's just data that you need for tactical and strategic purposes, to do data science, in a way you could just prepare the data as a one-off every six months and off you go. Starting with the process instead, and maybe, P, this is what you're seeing: if you start from the processes and you stitch data, you provide only the data that you need for this exact process. 

For example, managing an order and the different things we need to do, say, when an order is late or on a payment block, or whatever you're interested in. You can trigger intelligent workflows that provide, for that one order, data from 20 different systems. I'm only going to get the data that I'm interested in, because the operational guys, the people doing the day-to-day work, know what data they need, because they extract it on a daily basis. When they're on holiday or they're too busy, these things are not done. Basically, if you start with the process, you can stitch only the data that you need.

If you realise that the data is not clean enough, then you can actually get feedback from this specific context and trigger another process which goes to the master data team. Basically, you've got this continuous loop where you use the data in context day to day, because you start from the processes, and therefore you can capture information so that you have this virtuous loop to go and clean the data lake, or clean things up where it really matters, but it starts from the people who really use it. That's just one way that this can contribute to the data lake. 

Things will happen every day without people having to force themselves to go into the right views that they know how to navigate. With a process centric approach, you can as well say, hey, no, actually, I just need these three systems, and you do just one API or one connection with these systems to get only the data that you need for the specific issue you have today. 

JP 
Going back to the issue of trust. You're saying that when people are working through the processes they need to do their job, because that's what their focus is, there's a greater degree of trust around the data that comes through taking that approach, and that can contribute to building trust in a data layer approach as well. Is that what you're saying about how to tackle that issue of trusting the data? 

J
Yeah, I think people, day to day, know the issues. You extract this Excel file, or you take three extracts, you do your VLOOKUPs and all of your work every day. You can see this data is wrong, but because you're so busy firefighting you don't have time to find out who to send that email to and tell them to fix it. That's one reason why this data is never cleaned up properly. All these things people know day to day; in your teams, they know every day what's going on. The issue is that all this knowledge doesn't typically travel to the person in charge of the data lake, so it's not how they prioritise the data lake approach. This is why you ask for something and then it takes two years to get it. 

There are many ways you can tackle that. I've done the operational work myself, and I knew all the issues with my own data on a day-to-day basis, but I never had a channel to tell people what was wrong with this data, and I didn't have time, to be honest. So, yeah, this is why I strongly believe in the process and trigger approach, where you get only the advice you need to do the one thing by exception, the one thing you need to do today, and the data you have for this context you can flag if it's bad. 

P
Yeah, I just wanted to comment on that because I also come more from an operational background, but then transitioned to the technical side. I understand what you're saying and I think that's also one of the reasons why, from our perspective at least, the data lake is really good, but really good for analytical work. It's not designed, at least within our enterprise, to support operational processes. Because the data lake, by definition, at least within our context, takes snapshots on a daily basis, maybe in some cases multiple times per day, but from an operational perspective, at least within logistics, those snapshots are not frequent enough to make decisions on. That's why for the operational processes we have a parallel route that looks at event-driven data flow. Here we are still looking at what the architecture should look like for the future. 

That's also why I was interested in the discussion here. Because I think the data lake, yes, is going to add value within our processes more from an analytical perspective, analysing what went wrong and how we can then improve our mid- to long-term plan, whereas event-driven is more about how we fix the short-term issues in our different operational cycles. 

H
I was just going to add some thoughts, if I may. For me it feels like there are a number of different projects we're talking about here. The data lake is about presenting data and letting people do analytics, and you have specific people who are experts in that. Master data or transactional data…I see this process impacting the transactional data, but they're sort of separate projects. You have to identify accountabilities and who owns that data. We're putting in a PIM system as well for the master data, which allows you to create rules and exceptions to see where there are gaps, and stops a new product going into our system unless all the component parts are completed, to make sure it's accurate. And that's a separate project. On the transactional stuff, once you've got the data, you can test a hypothesis; store stock for us, for example: when there's a stock take, we can see the drop, so we can understand and measure the availability drop, get some insight as to how accurate that data is and try to look at the root causes. 

That's a separate project for the operational team, or whoever in the business, to go and make right, not the data platform team: they're the ones that are going to create the report, and the report is only going to be as good as the data. Again, we wouldn't see it as operational; we would see it as strategic, or as an opportunity to bring multiple companies together. You've got the data for the supply chain all in one place and are then able to do some of that high-level, strategic planning, as you've described; the operational data should come out of the operational platform. If you've got an F&R tool, that should be where you're getting your exceptions from a day-to-day point of view, and then you just rely on your group data platform to give you that insight. It feels to me like there are multiple projects that all need to come together, and different business owners need to own those parts, because I think the risk is you try to do all of those and you end up failing at all of them. 

K
Yeah, but what if your F&R tool isn't able to give you those exceptions because it doesn't provide them at that level? You have to find a way of adding an additional exception layer, and that's when it becomes a control tower, I guess, and your functionality becomes more operational. 

H
Overnight data for some exceptions should be good enough. If it's something where you need to know, for example in a warehouse operation, how the shift earlier in the day performed, then that's got to come from your warehouse management system as an exception. If you're looking for patterns, trends, heat maps and things like that, those should come from your group data platform. It's an opportunity also to mix transactional order and stock data with your customer data, and that's where you're going to get most benefit out of a group data platform, not necessarily for operational exceptions, which should come from the transactional solutions. That would be my belief and that's what we would work towards. 

K
I think that is the ideal situation, isn't it? At the end of the day, your data display layer is just display. You can get all the great, amazing exceptions that you want, but it doesn't write back to the systems that we're talking about. I feel like one of the challenges we will have is that some of the directors need to understand that it's just a visualisation tool. 

H
I think, again, from a data scientist point of view, if you want, could you build some of the brains, the forecasting and replenishment, into the group data platform? Or should you go and buy an off-the-shelf platform from companies that have teams of data scientists and have spent years investing in the algorithms, slow-moving demand, how to get that right and things like that? Would you take that expertise or try to do it yourself? It becomes an expensive task because good data scientists are hard to come by and very expensive, but a necessity if you want to embark on a project like that. If you just want to pull data together and draw some basic analysis, you need a reasonable data scientist to pull that data together, I'd suggest. 

K
If we're just homing in on the supply chain, you need experts at a supply chain level as well as at the IT level, as well as in supply and purchasing. So, yeah, I can definitely see quite a big challenge in front of us around that topic. 

JP
Thank you, guys. So, just entering the last few minutes. Please, if anyone's got any other questions, comments, reflections, please add them.

J
I've got just one question. I think clearly you've got a good plan, H. People talk about visibility, but actually there are two kinds of things behind the scenes, so we need to be careful about the words we use and maybe educate stakeholders about what the difference is. Data lakes are one way, so we have control over data, but what about actually interacting with this data? It's almost that after visibility you have automation, call it RPA or similar, where I can take a decision from that control tower and it may write information back into three different systems, so that I only have one interface going forward when I take decisions. Is this part of it when you say ‘data lake’? Are you also considering these two-way integrations or not? 

H
Not at the moment, no, because for most of those we have developed an integration layer, so that when you do other projects, if you want that data, you don't have to go to the individual systems to get it. It will come out of your integration layer. That's the link, I guess, with the group data platform we would build in.

Your integration layer should filter the data going to whatever system. The group data platform is just another system you want that data to go to, and it depends on the project and on what other uses for that data are required. If it's another transactional system, it would go there, but it's the same data coming from a source system over here rather than it necessarily coming out of the group data platform. It would be via that integration layer.

R
Yeah, in our case, it would not only be an integration layer; we would also use the data lake layer as the source tool for the data. We're actually moving the source of data out of the ERP layers into the data layer and starting to use RPA, machine learning and AI technologies to create and maintain the data. Not only the data, also for other purposes, but the centre of gravity for data actually moves away from the ERPs, in our case, into that data layer. 

JP 
Okay. Thank you very much for joining and for contributing, especially J. Thank you for being able to share your experience and your approach. 




