Building scalable systems

With this article I want to shed more light on a vital aspect of any computer system: scalability. Why is scalability important? The answer is very simple – it gives the business that is built on or supported by the system the freedom to grow. An unscalable system is like a tree with very weak roots – as the load on it grows, it will eventually fall.

Before diving further into the topic let’s define the term “scalability” for a computing information system. 

I personally like this definition: scalability refers to a system’s ability to handle proportionally more load as more resources are added. Scalability of a system’s “information-exchange” infrastructure thus refers to the ability to take advantage of underlying hardware and networking resources, as well as the ability to support larger systems as more physical resources are added.

Here I need to mention that there are two types of scalability – horizontal and vertical. Vertical scalability means the ability to increase the capacity of an existing computing unit's hardware. This approach is limited and quickly becomes unacceptably expensive.

Horizontal scalability, by contrast, refers to a system’s ability to engage additional hardware computing units interconnected by a network.

But here is the catch: systems built using classic Object-Oriented methodologies and design approaches, which work superbly for local processing, begin to break down in distributed or decentralized environments.

Why? Because a distributed computing environment brings a whole new class of challenges to the scene. 

Distributed systems must deal with partial failures arising from the failure of independent components and/or communication links (in general, the failure of a component is indistinguishable from the failure of its connecting communication links). In such systems, there is no single point of resource allocation, resource consumption, synchronization, or failure recovery. Unlike local processes, a distributed system may simply not be in a consistent state after a failure. The “fallacies of distributed computing” [Van Den Hoogen 2004], summarized below, capture the key assumptions that break down (but are nonetheless still often made by architects) when building distributed systems.

  • The network is reliable. 
  • Latency is zero. 
  • Bandwidth is infinite. 
  • The network is secure. 
  • Topology doesn’t change. 
  • There is one administrator. 
  • Transport cost is zero. 
  • The network is homogeneous (it’s doubtful that anyone today could believe this)

I prefer to treat this list not as a set of fallacies but as challenges a software architect has to meet to create a horizontally-scalable system. As an architect who has had a chance to work with large-scale systems, I can attest that if one attacks those challenges directly and adds code that resolves the issues one by one, the result is a heap of wiring code which has nothing to do with the business idea. And that code can easily become more complex than the system itself! Implementing communication transactions, zipping/encoding/decoding data, tracking state machines, supporting asynchronous communication, handling network failures, creating and maintaining environment configuration and update scripts, and so on… all this stuff evokes despondency when it comes to maintainability.

So – is there any good solution to make a system easily scalable?

Luckily, yes. In three words: data-oriented programming.

The main idea of data-oriented programming is exposing the data structure as the universal API between system parts and then defining the roles of those parts as “data producer” and “data consumer”. Now, in order to make such a system scalable we just need to decouple data producers from data consumers in location, space, platform, and multiplicity. Here the trusty old “publish/subscribe” pattern comes in handy.

Here’s how it generally works: a data producer declares its intent to produce data of a certain type (let’s call it Topic-X) by creating a data writer for it; a data consumer registers interest in a topic by creating a data reader for it. The data bus in the middle manages these declarations and automatically routes messages from the publisher to all subscribers interested in Topic-X.
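To make the flow concrete, here is a minimal in-process sketch in Python. It is an illustration only – the `DataBus` and `DataWriter` names and the topic string are mine, not any particular middleware's API – but it shows the essential shape: producers and consumers declare interest in a topic, and the bus does the routing.

```python
from collections import defaultdict

class DataBus:
    """Minimal in-process data bus: routes messages by topic name."""
    def __init__(self):
        self._readers = defaultdict(list)

    def subscribe(self, topic, callback):
        # A "data reader": register interest in a topic.
        self._readers[topic].append(callback)

    def publish(self, topic, message):
        # Route the message to every reader of this topic.
        for callback in self._readers[topic]:
            callback(message)

class DataWriter:
    """A producer's declaration of intent to publish a given topic."""
    def __init__(self, bus, topic):
        self._bus, self._topic = bus, topic

    def write(self, message):
        self._bus.publish(self._topic, message)

# Producer and consumer know only the topic, never each other.
bus = DataBus()
received = []
bus.subscribe("Topic-X", received.append)   # consumer side
writer = DataWriter(bus, "Topic-X")         # producer side
writer.write({"price": 42})
# received now holds [{"price": 42}]
```

Note that neither side holds a reference to the other; swapping in a networked bus later would not change the producer or consumer code.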

It’s time to draw a picture illustrating how the classic client-server architecture would look had it been designed as a data-centric system.

As you can see all system components are isolated and have no knowledge of each other. They only know the data structure or “topic” they can consume or produce.

Now imagine that the number of clients consuming information from our system has increased to the point where the system can no longer resolve all requests in time. Let’s try to scale this system horizontally.

In the figure above you can see that I have increased the number of business logic processor units. This is easily done because the system doesn’t care which computing unit will do the job and doesn’t even need to know that the units exist. Each system unit simply waits for the data it can consume, or publishes the data it has declared. I’ve also decoupled client input from client output, spreading the burden across different servers. Since only the number of clients that want to consume information from our system has increased, we add more servers to handle read requests. And to avoid bottlenecks on the DB access side, I’ve decoupled DB writes from DB reads and allocated more computing power to the ‘read’ side. Of course, in reality these things are more complex, but this figure shows the basic principles of system scaling.
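The "add more units of the same type" step can be sketched in a few lines. This toy example assumes the bus load-balances each topic's messages across the processing units registered for it (as real middleware typically does for worker groups); the class and topic names are illustrative.

```python
from collections import defaultdict

class LoadBalancingBus:
    """Toy bus that round-robins each topic's messages across the
    processing units registered for it, so adding a unit adds capacity."""
    def __init__(self):
        self._units = defaultdict(list)
        self._cursor = defaultdict(int)

    def register_unit(self, topic, handler):
        self._units[topic].append(handler)

    def dispatch(self, topic, message):
        units = self._units[topic]
        units[self._cursor[topic] % len(units)](message)
        self._cursor[topic] += 1

bus = LoadBalancingBus()
handled = {"unit-a": [], "unit-b": []}
bus.register_unit("requests", handled["unit-a"].append)
bus.register_unit("requests", handled["unit-b"].append)  # scaling out is one line
for n in range(4):
    bus.dispatch("requests", n)
# Load is spread evenly: unit-a handled [0, 2], unit-b handled [1, 3]
```

The point is that neither the clients nor the other components changed when the second unit appeared – capacity was added purely by registering another consumer.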

There are several more important benefits of the data-oriented approach:
1) It’s easy to make the system more reliable by adding redundant processing power. If one of the business process units fails, nothing critical happens, because other units of the same type continue to handle requests.
2) The system becomes more flexible – new functionality can be added on the fly by adding new data producers/consumers.
3) Maintainability goes to a whole new level since components are very well isolated from one another.
4) It’s easy to work on the system in parallel.

You may say that this is all well and good – but what should I do with my existing system?

Fortunately, we can isolate all this data-centric publish/subscribe magic in a middleware layer that handles all communications. And there is a wide variety of such solutions:
http://en.wikipedia.org/wiki/Category:Message-oriented_middleware

What you need to do is define a system data model (most probably its entities will be very similar to the DB model you already have) and then create data readers/writers for each system component which will publish or consume data to/from the middleware.

In my opinion, the most prominent and promising messaging solutions that support the publish/subscribe model are:

1) http://kaazing.com/products/kaazing-websocket-gateway/ for web-based solutions

2) http://www.rti.com/products/index.html (or any other DDS implementation) for TCP/IP or in-memory real-time peer-to-peer communication. There are no brokers or servers in the middle; instead, it leverages TCP/IP and IP multicast for true peer-to-peer message transport.

But you are encouraged to conduct your own research. 

Practical hint: keep your messages small. Don’t try to push megabytes through your data bus in a single message. The data bus is a vital component, and big messages can turn it into a bottleneck, causing the whole system to struggle. If you need to transfer a significant amount of data from one system component to another, the data producer should prepare the data and publish a link to it, so that the data consumer can fetch it.
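This "publish a link, not the payload" hint is sometimes called the claim-check pattern. A minimal sketch, assuming some shared store both components can reach (the `blob_store` dict here stands in for S3, a file server, a database, etc.; the function names are mine):

```python
import uuid

blob_store = {}  # stand-in for shared storage reachable by both sides

def publish_large(publish, topic, payload):
    """Park the heavy payload in shared storage and publish only a
    small reference message (the 'claim check')."""
    key = str(uuid.uuid4())
    blob_store[key] = payload
    publish(topic, {"blob_key": key, "size": len(payload)})

def fetch(message):
    """Consumer side: resolve the reference out-of-band."""
    return blob_store[message["blob_key"]]

# Usage with any publish callable – here a list stands in for the bus:
sent = []
publish_large(lambda topic, msg: sent.append(msg), "reports",
              b"x" * 10_000_000)
# The bus carried only a tiny reference; the 10 MB payload went around it.
```

The data bus stays fast because it only ever routes the small reference message; the bulk transfer happens over a channel built for it.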

Happy data-oriented programming! 

User manual for distributed software development Part 2

Loosely-coupled design to the rescue!

This is the time to ask your in-house project leader, “Is it possible to split system development into independent chunks that could be implemented in parallel?”

If the answer is anything but “yes” – it’s a cause for concern. “No” likely means that the system’s components are heavily dependent on each other, making the system tightly-coupled. And a tightly-coupled system is unscalable, hard to maintain, and inflexible.

One of the key factors driving this grim reality is that Object-Oriented Programming is, by nature, tightly-coupled. To address this problem, the software system architect (project leader) first of all has to employ loosely-coupled design techniques to achieve system scalability, maintainability, flexibility and testability. Once this task is solved, incremental and independent development will follow by itself.

The main point is this: a properly architected system consists of fairly separate and independent modules or classes that have little to no knowledge of each other. Given such an architecture, it becomes easy to split the work by components and avoid interference amongst team members.

An optimized distributed team development process can be boiled down to the following 5 points:

1) Define task, describe, discuss and estimate it

2) Define team (project) roles and agree on formal communication paths

3) Balance implementation efforts of one portion of the team with code reviews from the other

4) Demonstrate (ongoing) results to the project stakeholders

5) Retrospect and review: what went well, what went wrong, identify points for improvements.

And there are many smaller, but still important points that will enhance the remote team’s output:

* The trusted engineer is invested in the remote team’s success

* Both sides understand and appreciate a transparent and tailorable development process

* The trusted engineer provides feedback to the remote team regularly

* Use technology to improve collaboration (screen sharing, video conferencing, etc.)

* Leaders of both teams meet in person to align their vision on project goals, create an achievement roadmap, and, ideally, build the project backlog together.

If you decide to use an “external muscle” to strengthen your product development, don’t forget to ask the remote team for their “user manual” and development process before things get going. Then make the investment to move your system towards loosely-coupled design principles and practices. If these things are done right, the “trust gap” will be bridged very soon, typically in 5 to 10 sprints. And it will result in a pleasant sensation as you lie down to sleep each night, knowing that your project keeps growing and moving in the right direction while you are sleeping.

User manual for distributed software development Part 1

Having worked as an offshore software development team leader for ten years I’ve often seen the same situation arise when engaging with new clients, and it’s no different at Waverley. It goes like this: a company (client) decides to hire an outsourcing company to help their internal team with product implementation. As business terms are ironed out, the client’s internal team checks the technology knowledge of the offshore team and if everything seems alright they start working together.

Almost immediately the problem of trust arises. In the first stage of building the relationship there is no trust for the new offshore team. This is absolutely normal, a given, a matter of human nature. To fill this “trust gap”, the client often names a trusted engineer as the intermediary between his company and the offshore team. Typically, this technical person is busy enough with tasks that pre-date the engagement of the offshore team, and has little idea how to manage a remote team or how to set up a productive distributed development process. Moreover, these “management” activities are just boring for an engineer (having been an engineer myself, I understand this perfectly). Now add a 7-12 hour time difference between the client team and the offshore team and you have a perfect recipe for disaster.

The question is how does one make the “Business owner <-> Trusted engineer <-> Remote team” model work effectively?

The short answer is: with the trusted engineer you have to introduce an Agile development process and the entire team needs to embrace loosely-coupled system design.

 Now to make a short answer longer…

When we buy something complex it typically comes with a user manual which explains how to use and troubleshoot it. And when you hire a remote team you are buying something complex. So you should check not just business terms, technical parameters and qualifications, but also ask to see the offshore team’s “user manual”. Any remote team that’s been on the market for more than a couple of years has its “client interaction patterns”. Understanding those patterns is a very good starting point for building a new relationship. The converse is also true!

Here are a few questions you might ask the remote team leader:

1) What will you do to build my confidence that you are going in the right direction and building the thing I need?

2) How can I know the current status of the project at any given time?

3) How can I know what you are working on right now?

4) By what procedure will we manage system changes if (when) we decide to make them?

 I’m not going to write another SCRUM handbook! But from my experience on the offshore side of the equation I can say that having a “Vision & Scope” document, a product (user story) backlog, sprint planning meetings, sprint backlog, daily standups, and demo and retrospective meetings helps a lot to make the development process transparent and predictable.

So the first thing to do with a remote team is align around a transparent and tailorable development process. This is a must – without a development process things will fall apart very soon.

 Now imagine you have that user manual: you’ve agreed on a development process, you’ve created a “Vision & Scope” document where you’ve captured your goals and metrics to understand which goals have been achieved, and you and your off-shore team have started moving toward those goals.

Here a second problem arises: working on the same project requires a lot of communication amongst members of distributed teams. While there are strategies for organizing this communication, there is also the question of how to work in a way that doesn’t require permanent communication. How, in the day-to-day, does a distributed team share a codebase in a way that keeps members from blocking each other?

To be continued in Part 2

Effective Management – the Carrot or the Stick?

I’ve always believed that there are three vital components to running a successful software team – obviously the talent of the developers is critical, but process and management are also essential. So what makes for effective management of a software team? There are many attributes, but here’s what I think is most important:

  • Motivation. Although both “tough” and “kind” have their place, the carrot is more important than the stick. Creating challenges to motivate people and making sure those challenges result in positive team thinking is critical. I’d rather spend my time coming up with appropriate motivational challenges (thinking positively) than ranking people (thinking negatively).
  • Active listening. This means listening carefully and reflecting back what you hear in an empathetic manner so that the speaker feels understood. Everyone talks about it. Not many people do it.
  • Get out of people’s way. Why do you have team members in the first place? Because you can’t do it all yourself. So let your team members do what you hired them to do, once you set goals and agree on how to measure results.
  • Provide clear and consistent direction and goals. This seems rather obvious, but again, not many do it. Work on your team’s goals and communicate them, then constantly work towards achieving those goals with periodic reviews to make necessary changes.
  • Be excited. I think this is important. Just being excited about what is going on will help everyone perform better.
  • Turn disappointments into learning opportunities. When things don’t work out, turn the disappointment into a lesson learned and an opportunity for growth. Remember when one door closes, another opens.
  • Understand needs and feelings in yourself and others. Understanding your own needs and feelings will pay huge dividends in motivation and effectiveness. Always go back to the practice of what universal human needs and feelings are alive in you and your team, especially when things are tense or there is conflict.
  • Know your own weaknesses and work on them.
  • Leaders serve their teams. Being a leader means you are serving your team members and enabling them to do the best they can do. Do whatever it takes to make things work.
  • Clear and decisive, but caring too. Sometimes the most caring thing you can do is make hard decisions. Don’t prolong the agony and remember you can’t do everything – just make your decisions.
  • Use good tools, so the organization collects and refines its knowledge. All of the above are only as good as your methods to disseminate, store, and develop information and best practices within your team. So find tools that work for everyone and use them. Email is a start, but there are many more sophisticated and effective tools available today.

Google engineers not smarter than Vietnamese 11th graders

A recent article about computer science education in Vietnam caught my attention, having invested a lot of effort in the last year to ramp up our office in Ho Chi Minh City. In addition to Vietnam’s commitment to produce software developers with a high level of skill, I think a critical reason for sourcing developers in Vietnam is a cultural bias towards coming through for the team and doing what is needed to follow through on commitments. This is a great attribute: one that naturally fits with Waverley’s vision for doing business. My personal experience of Vietnam is that its young people (85 percent of the population is under 40) are friendly, motivated, and helpful. And the food is excellent! We look forward to more great things to come from our office in Ho Chi Minh City and to having our Vietnamese colleagues contribute to our know-how and our client relationships.

Beauty or the Beast? Understanding Mobile Web and Native Application Development Tradeoffs

These days, when choosing a development strategy for your next mobile app, an essential question is whether to write it as a cross-platform hybrid mobile web app versus “going native”.

A hybrid mobile web app is an application written mostly in JavaScript/HTML5 and wrapped in a native shell using tools such as PhoneGap. A native app is written in a platform-specific programming language (Objective-C for iOS, Java for Android, etc.) and is able to take full advantage of all device-specific features. There are also “pure” mobile web apps that run in a browser, but they are not really apps per se because they cannot be placed in platform stores such as Apple’s App Store or Google Play.

There are many parameters to consider when deciding between hybrid and native app development. Many articles on the web provide “pros and cons” which aid analysis. But is there an easy way to understand the tradeoffs, as in the classic project management triangle? Here’s one: the choice is between a multiplatform solution, the beauty of an awesome user experience, and the beast: development cost.

Beauty or the Beast

Basically, when choosing how you will develop a mobile app, you have to settle for two out of three. Here’s how it works.

If you want the most elegant and beautiful app that runs on both iOS and Android, be prepared to reach deep into your pocket. That’s because you’ll have to build native apps for each platform. You could probably design your app so that some code is reusable, but the potential savings are quite limited. If you paid X dollars to design and develop your native app for one platform, be prepared to spend 70-80% of X for each additional platform.

If you have a limited budget, and still want to reach the maximum possible group of users across multiple platforms, be prepared to sacrifice some of the slickness of the user experience. Why? With HTML5 and JavaScript, it only costs 15-30% extra to support each additional platform. You could even afford to include Windows Phone 8, which is gaining momentum, and not break the bank. But complex animations, scrollable lists of transparent images, certain background processes like always-on location tracking: some of that stuff is going to have to stay home. JavaScript doesn’t have what it takes to pull these off smoothly.
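The cost difference compounds with each platform. A quick back-of-the-envelope sketch (the 75% and 22% figures are simply midpoints of the ranges above, and the $100k first build is an invented example):

```python
def native_cost(first_build, platforms, extra=0.75):
    """Each additional native platform costs ~70-80% of the first build."""
    return first_build * (1 + extra * (platforms - 1))

def hybrid_cost(first_build, platforms, extra=0.22):
    """Each additional hybrid platform costs ~15-30% extra."""
    return first_build * (1 + extra * (platforms - 1))

# For three platforms at a hypothetical $100k first build:
# native  -> $250k
# hybrid  -> $144k
```

So by the third platform, going native costs roughly 1.7x the hybrid route – which is exactly the budget lever the "two out of three" tradeoff is pulling on.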

But if you are developing an enterprise mobile app, then you can usually do it in a cross-platform hybrid and do it cheaply. Your audience may not need a top-notch user experience. After years of working with your current enterprise application on Windows, designed in say the early 2000′s, will your users really be that demanding for UX? Even if you believe they will, do you have the budget to address this perceived need? If you do, we’d like to hear from you!

Don’t get us wrong – it’s certainly possible to develop slick, beautiful apps in HTML5/JavaScript as we know from experience. Been there, done that. But be careful – you need to know the pitfalls and limitations of the technology stack you are choosing. You need to know what can be done and what can’t be done. Or you need a developer that knows, and can bring that knowledge to your project. At Waverley, we love AngularJS because it allows us to build really slick JavaScript apps that function well on all major platforms. More on that in another post.

What is your experience with choosing an approach to mobile development? What route have you chosen, what were the tradeoffs you had to make? Did the approach you chose meet your expectations? Share your story!

Hacking “Made in China”

At Waverley, I’m the lucky person who receives auto-generated emails regarding anything to do with our Web server. One of the messages I receive most frequently is entitled “Large Number of Failed Login Attempts”. These emails contain the “offending” IP address, the account that was targeted, reverse DNS info, a timestamp, etc. A simple action I’ve taken with each of these is to block access from the specific IP address so that our server can’t be reached again from the same address.

Since I go through these personally, over time I’ve started to notice a trend: the country with the largest share of attempted hacks on Waverley’s server is China. This week, I decided to build a spreadsheet to take a closer look at the geographic distribution of hacking attempts. Since the first of the year, Waverley’s server has logged 307 failed logins. Of this total, 124 (40.4%) originated in China. The second biggest offender was the United States, with 40 attempts (13%). Rounding out the top five are Korea, Germany and Brazil.

Interestingly, it is reported that “President Obama will confront Chinese president Xi Jinping next week over a spate of cyber-attacks on the US, including the latest allegation that Chinese hackers gained access to more than two dozen of America’s most advanced weapons systems.” I have no idea if the hacking attempts on our servers originating from China are coordinated and run by the military or if the Chinese people just have a lot of time on their hands to break into computer systems.

China, the largest country by population, has 19.1% of the world’s people; India is second at 17.1%. However, comparing these two countries of similar size shows a very different picture from the hacking perspective. As of May 30, 2013, China accounted for 124 hacking attempts compared to India’s 7 – almost 18 times as many.

Sometime during the last few months, Waverley began blocking entire subnets around offending Chinese IP addresses, masking off the least significant byte of the IP address. It’s hard to say how much of a difference this has made, but we still get lots of failed logins from China. Interestingly, almost all the “Made in China” hacks are directed at the account “root”, whereas access attempts from other countries are slightly more likely to use people’s names.
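Masking off the least significant byte is equivalent to blocking the /24 network an address belongs to. A small sketch of the idea using Python's standard `ipaddress` module (the addresses here are documentation examples, not real offenders):

```python
import ipaddress

def subnet_of(ip, prefix=24):
    """Mask off the least-significant byte: 203.0.113.57 -> 203.0.113.0/24."""
    return ipaddress.ip_network(f"{ip}/{prefix}", strict=False)

# Block the whole /24 around each offending address seen in the logs.
blocked = {subnet_of("203.0.113.57")}

def is_blocked(ip):
    return any(ipaddress.ip_address(ip) in net for net in blocked)

# Any neighbor in the same /24 is now rejected, e.g. 203.0.113.8.
```

The tradeoff, of course, is collateral damage: everyone else in that /24 is blocked too, which is acceptable for a private server but too blunt for a public-facing service.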
Hacking Attempts by Country

Partner Domain Experience

While working with prospective clients I am often asked, “Do you have experience with our particular industry?” This question is a part of a client’s standard due diligence on which vendor to hire. While many software development companies such as ours do end up building experience in certain industry verticals, to really be an expert in a large number of disciplines is nearly impossible. When searching for a development partner, looking only at providers that have experience in your domain seems foolish to me. You may be overlooking the relationships that could be most productive. Why?

At Waverley we believe that magic happens when buyers and sellers of a software development project integrate their knowledge to create combined know-how greater than the sum. Assuming the buyers, or clients, know their business well and that the sellers, or providers, know how to design, write and deliver great software, the magic happens when two parties figure out how to bring their experience and knowledge together to get the job done. They need to listen to one another and brainstorm to find the best possible approaches for solving problems, which technology to choose and what processes to employ in design and implementation. When a vendor has specific domain experience it can be reassuring for the client – but is domain experience really such a critical component of a successful project?

We feel there are several considerations that trump domain experience. Care, attention, intelligence and trust for starters. Granted, trust is something that results from successful outcomes so trust takes time. But the others are there from the very first phone call. Care and attention mean prompt and thoughtful responses: “digging down” to get to a deeper level of understanding. Intelligence means bringing smart thinking to a problem, sometimes by those who are looking at the problem in a new way, and are in a position to question assumptions, challenge supposed limitations and take a fresh look at how the problem is framed and then solved.

Naturally the process used to build the solution is important. Agile is great, we use it on every project, but Agile is only a process; it’s the people who implement the process that make it work. Creating an environment where the team works together and solves problems well is also crucial. Environment is perhaps an over-used term. At Waverley we believe it covers everything from the physical work environment to the tools used to communicate over distance and the quality and bandwidth of that communication, down to the accessibility and readability of documents used to share ideas.

So when shopping for a software developer, we suggest looking beyond domain experience. Otherwise you might miss the magic: a chance to build a great working relationship and the clean design, solid engineering and top results that come with it.

Notes from IAOP 2013 World Summit: Is Outsourcing Really Dead?

After attending the thought-provoking presentation “Is Outsourcing Dead?” by Cliff Justice of KPMG and Lee Coulter of Ascension Healthcare, I have some observations. It looks like the whole dynamics of the industry are changing, and this represents a fundamental shift in thinking rather than a temporary trend.

The leaders among business services organizations are embracing a new global delivery concept, which KPMG calls “Extended Global Enterprise” or “EGE”. EGE includes end-to-end processes, with internal resources and outsourced service providers working together towards the goal of delivering high-value services.

As a result of this paradigm shift, client expectations are changing. Service providers need to adjust or become irrelevant. Forget about transactional long-term mega-deals with the emphasis on back office and TCO metrics. Labor arbitrage is more and more out of the equation. Some argue that even location matters less. All of which begs the question of whether outsourcing is dead, or is instead evolving into something broader and wider than the traditional model. I think the latter is true.

The new EGE includes outsourced partners, not just vendors, where clear business and financial alignment are crucial, where value is expected at every customer touch point.  The focus is on smaller Agile partnerships that are “best of breed” and best of geographical locations, delivering solutions for the middle and front office.  Cloud, social, mobile, and big data change the rules of engagement fast. Amazon Web Services capitalized on these game changers and has emerged as the leading EGE.

We at Waverley believe that organizations should position themselves to benefit from the latest developments and the “new normal” dynamics of the industry. To us it means working smarter, not just faster and cheaper.

Declining Enthusiasm for Outsourced Agile?

VersionOne does a State of Agile Dev survey every year, which for 2012 they kicked off during the Agile 2012 conference in Dallas in August. They recently released their results, and it’s full, as always, of exciting findings pointing to the further increase in adoption of and passion for Agile across the industry. One statistic they revealed, though, definitely caught my attention.

According to the survey, support for Agile on outsourced projects has fallen significantly since last year: only 49% of respondents said they use, or plan to use, Agile methods – down from 77%! Given our success with Agile methods, this trend surprises me. Unfortunately, the report doesn’t go into any detail about why people are struggling in this area; I would have loved to dig into the details of this one.

We have found that Agile actually aids us in our success and could not imagine a better way to coordinate with our clients and make sure we are addressing their highest priority features and issues during development. Agile promotes close communication which is so vital in outsourced projects.  I can confirm that we are not part of that drop in outsourced Agile support!