Square’s App Visibility Principles


Background

Over the past five years, Square has changed significantly. Our technology stack has grown from a monolithic application into a microservices architecture. This growth in the number of services has brought new challenges for application visibility. In this post we share some guiding principles and describe the technology we use to monitor and visualize our service ecosystem. Starting today, we are also open-sourcing parts of our service monitoring and visualization technology.

Principles

Our guiding principles are as follows:

  • Focus on usability as early as possible. With a microservices architecture it is very easy to collect a large number of signals. A good user interface must be able to extract information from those signals.

  • Identify and present the most relevant aspects of the metrics. Humans can only handle a few things effectively at a time, so any question about the metrics should be answerable by looking at a limited number of things. For example:

    • Top-N API metrics, sorted by an underlying factor or by week-over-week change.

    • Automatic problem detection. The inspect tool can reveal obvious system problems.

  • Applications should have good metrics by default from the start. We build good monitoring into our standard application setup, and our dashboards include the following metrics:

    • Databases

    • Hosts and containers

    • Performance metrics for HTTP/REST endpoints

    • JVM metrics for services that run on the JVM

  • Alerts should be concise and relevant. We monitor a large number of metrics but alert on only a few of them, with the goal of keeping alerts useful and making sure no alert goes without a corresponding response.

    • Alerts should be timely and should be responded to immediately.

    • Alerts should be rare; when one fires, it should be treated as an unusual event.

    • All alerts should require intelligence to handle, not a rote response.

    • All alerts should be reproducible.
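
As a rough illustration of the Top-N idea in the principles above, the following Python sketch ranks API metrics by their relative week-over-week change. The function name, the endpoint names, and the latency figures are all hypothetical; this is not how Square's dashboards are implemented, only one minimal way to reduce many signals to a short list.

```python
from typing import Dict, List, Tuple


def top_n_by_weekly_change(
    last_week: Dict[str, float],
    this_week: Dict[str, float],
    n: int = 3,
) -> List[Tuple[str, float]]:
    """Return the n metrics with the largest relative week-over-week change.

    Each result is (metric_name, relative_change), where relative_change is
    (this_week - last_week) / last_week. Metrics without a usable baseline
    (missing or zero last week) are skipped.
    """
    changes = []
    for name, current in this_week.items():
        previous = last_week.get(name)
        if not previous:  # no baseline, or a baseline of zero
            continue
        changes.append((name, (current - previous) / previous))
    # Rank by magnitude so large improvements and large regressions both surface.
    changes.sort(key=lambda item: abs(item[1]), reverse=True)
    return changes[:n]


# Hypothetical mean latencies in milliseconds per endpoint.
last_week = {"/pay": 100.0, "/refund": 200.0, "/login": 50.0}
this_week = {"/pay": 150.0, "/refund": 190.0, "/login": 51.0, "/new": 10.0}

for name, change in top_n_by_weekly_change(last_week, this_week, n=2):
    print(f"{name}: {change:+.0%}")
```

Ranking by absolute change is a design choice made for this sketch; a dashboard that only cares about regressions could sort by the signed value instead.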

Applications

Guided by these principles, we use the following applications at Square:

  • Appdash. Use this application to quickly get information about your application, including:

    • Operational information, such as which hosts it is running on and what has been released

    • The application's dependency graph

    • Events and exceptions from your application

    • Capacity modeling

  • MetricsDashboard. With this application you can view metrics for all platforms and applications. One example is the database dashboard in the MetricsDashboard UI.

  • Presidio. A log search application based on Elasticsearch. It gives application developers an interface for finding patterns that may be causing errors and for tracing a single event across multiple services.

  • Equilibrium. Our next-generation alerting system, which is rapidly replacing our Nagios infrastructure. Equilibrium is easier to use and offers better reliability and balance. Its design draws on our experience with Nagios and at other companies, and is in line with current open source trends.

Today we are open-sourcing a seemingly small but very important part of this system: inspect. inspect is a collection of libraries that we use to collect Linux, MySQL, and PostgreSQL metrics. The project also provides Linux command-line tools that can perform basic problem detection.
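
To make "basic problem detection" concrete, here is a minimal Python sketch. It is not inspect's actual code or API. It parses text in the format of Linux's /proc/loadavg and flags a 1-minute load average that exceeds a per-CPU threshold. The function names and the threshold are assumptions chosen for illustration.

```python
from typing import List, Tuple


def parse_loadavg(text: str) -> Tuple[float, float, float]:
    """Parse /proc/loadavg-style text into (1-min, 5-min, 15-min) load averages.

    The file looks like: "0.20 0.18 0.12 1/80 11206".
    Only the first three fields are used here.
    """
    fields = text.split()
    return float(fields[0]), float(fields[1]), float(fields[2])


def check_load(text: str, cpu_count: int, threshold_per_cpu: float = 1.0) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    one_min, _, _ = parse_loadavg(text)
    limit = cpu_count * threshold_per_cpu  # hypothetical rule of thumb
    if one_min > limit:
        return [f"1-minute load average {one_min:.2f} exceeds limit {limit:.2f}"]
    return []


# Sample inputs for a hypothetical 4-CPU host.
print(check_load("0.20 0.18 0.12 1/80 11206", cpu_count=4))
print(check_load("9.50 7.10 3.00 12/300 4242", cpu_count=4))
```

On a real Linux host, the same check could be fed with `open("/proc/loadavg").read()` and `os.cpu_count()`. Returning a list of problems rather than a boolean leaves room to add further checks without changing the caller.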

Conclusion

We hope you find inspect useful, and that this post has given you a good overview of the monitoring and alerting systems we use at Square. We will go into more detail about each system in subsequent blog posts. As always, please check back at https://corner.squareup.com/ for updates.
