As part of Agworld’s development team, I work on our products every day to continuously make them quicker, better and more user-friendly for all Agworld customers. It’s only the results that our team delivers that count, and most people don’t see what goes on behind the scenes.
To give you an insight into how we work and why Agworld employs a high-quality in-house development team, I’d like to show you a real situation that recently occurred. This ‘incident’ did not become an issue for our clients as we were able to deploy a fix before real problems arose across our customer base. We are always aiming to be out in front of any problems that may prevent us from maintaining our ‘up-time’ of 99.9%.
The screenshots you see below are from a product called Datadog, which allows us to constantly monitor all critical functions of our systems, and respond where necessary. The communication between our team below all happens via Slack, the messaging app we use internally.
Some of the language might be a bit technical for non-developers, but I think you’ll find it an interesting overview nonetheless.
An incident occurs
A new version of our iPad and iPhone apps was released on Thursday June 24th, which enabled additional weather recording features on Agworld actuals. After a user updates the app on their device to the latest version, all actuals have to sync for this new feature to work. A problem occurred when another unrelated fix on our main web product slowed down the fetching of those actuals.
Alerts fire and the team springs into action
DevOps Engineer, New Zealand, 9.35am:
“I’m seeing alerts around high database CPU usage today.”
Head of Engineering, Perth, WA, 9.40am:
“iOS tech lead says the new version of the iOS app has been released and requires a resync of a number of activities. This is likely impacting CPU usage.”
Senior Developer, Perth, WA, 9.45am:
“We’re investigating now.”
Head of Engineering, 10.00am:
“The increased load on the database is coming from checking the associated messages table. Is it possible some recent work has introduced a slowdown there? It looks like the change to weather observations has required more actuals to be synced and this causes the extra messages’ work to be done.”
Developer, Perth, WA, 10.15am:
“The call would be from the activity serializer. It looks as though we’ve missed an index there.”
Senior Developer, 10.40am:
“I’m on it, almost done.”
Senior Developer, 11.00am:
“Work complete and fix is now getting deployed to our test servers.”
Senior Developer, 11.50am:
“Testing is complete, we’re rolling the change into the deploy queue.”
Senior Developer, 12.50pm:
“Latency has dropped back down to normal, we can close the book on this issue.”
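For the developers reading along, here is a minimal sketch of why a missing index can push database CPU up the way it did here. It is not our production code: the table name `messages`, the column `activity_id` and the index name are hypothetical, and an in-memory SQLite database stands in for our real database. It simply asks the database how it plans to run the same lookup before and after an index is added.

```python
import sqlite3

# Hypothetical schema: an in-memory SQLite table standing in for the real
# messages table. All names here are assumptions made for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    "id INTEGER PRIMARY KEY, activity_id INTEGER, body TEXT)"
)
conn.executemany(
    "INSERT INTO messages (activity_id, body) VALUES (?, ?)",
    [(i % 1000, "note") for i in range(10000)],
)

# The kind of lookup a serializer might run once per synced activity.
QUERY = "SELECT COUNT(*) FROM messages WHERE activity_id = ?"


def query_plan(connection):
    """Return SQLite's description of how it would execute QUERY."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, (42,)).fetchall()
    # The fourth column of each row holds the human-readable plan detail.
    return " ".join(row[3] for row in rows)


# Without an index, every lookup has to scan the whole table.
plan_before = query_plan(conn)

# Adding an index on the lookup column lets the database jump to matching rows.
conn.execute("CREATE INDEX idx_messages_activity_id ON messages (activity_id)")
plan_after = query_plan(conn)

message_count = conn.execute(QUERY, (42,)).fetchone()[0]

print("Before index:", plan_before)
print("After index: ", plan_after)
```

The first plan reports a full table scan and the second reports a search using the new index. When thousands of devices resync at once, that per-lookup difference adds up quickly, which is consistent with the CPU alerts we saw.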
The response to this incident really shows the power of good analysis tools, like Datadog, and the advantage of doing all our technical work in-house. If there is a problem, we can fix it ourselves; we don’t rely on any third parties to help us out. Our monitoring and alerting keeps us ahead of problems and lets us respond quickly.
I’m sure that growers who read the story above will be able to draw parallels with their latest John Deere tractors, where the local dealership can dial into the digital systems of the tractor remotely to help diagnose and fix issues before they become bigger problems. We all know it’s better to find an issue with your engine through digital diagnostics rather than wait until you see a piston sticking out the side of the hood! In the same way, behind the scenes we’re constantly monitoring the engines at Agworld to make sure everything’s running smoothly.