I am currently reading Caitlin Sadowski and Thomas Zimmermann (eds.), "Rethinking Productivity in Software Engineering", Apress, 2019.
- Release date: 2019/05/08
- Media: Paperback
So far I have read through Part I: Introduction to Productivity. This is a memorandum of what I thought while reading it.
Background of their argument
This book doesn't discuss why we measure productivity in the first place. The answer to this question differs according to the business model and the type of application.
For example, you will want to measure the productivity of companies when you are going to procure a custom-made, one-off enterprise business application for a specific customer. You will contract with one of those companies at a firm fixed price *1. In this situation, you will use the measured productivity to select the most cost-effective company, because compressing cost is the only way to increase profit. Therefore the productivity metric should be comparable across companies *2.
You will also want to measure the productivity of the software developers in your own company, to compress the development cost of a web service your company offers. In this situation, however, you can also increase profit by acquiring more customers, not only by compressing development cost. So you can choose not to measure the productivity of your development team and instead measure the cost-effectiveness of your sales team. If you do prefer to measure productivity, you can use the velocity of the development team, because you don't have to compare the productivity of two or more teams. Also, it is relatively difficult to move this kind of development work to another team, so it is reasonable to prioritize the loyalty of the developers over their productivity.
Part I (chapters 1-3) of this book seems to assume you are developing the latter kind. If you are developing the former, you will have to devise your own way to measure developer productivity.
I work at a company developing the former kind. To adopt the book's proposals, we will have to answer many questions from executives who are used to a single metric (or a set of standard metrics) rather than custom metrics for measuring productivity. They have many success stories with those metrics, so I think it will be tough to change their beliefs.
I would like to list the questions we have to answer in the following sections.
1. The Mythical 10x Programmer
- "Do your best not to have very weak engineers on your team at all". How do you remove very weak engineers from your team? We are unlikely to be able to fire developers for their performance, at least in traditional Japanese corporate culture *3. All we can do is assign such developers different tasks. Commonly, finding a task suitable for each employee is considered the duty of the employer or manager. In practice, less-skilled engineers are often assigned tasks such as simply typing in code following a "detailed design" *4, manually testing programs following checklists that other developers wrote *5, or other tasks less related to programming skill. But such tasks are limited.
2. No Single Metric Captures Productivity
- "The management stopped asking his (Bill Atkinson's) LOC". What about the other developers?
- "... measuring productivity by LOC is clearly fraught". Even nowadays, measuring productivity by LOC is a common method in enterprise application development in Japan *6, because managers think ordinary (average-skilled) developers are not like Bill Atkinson and lack the skill to implement functional/non-functional requirements while deleting code (which is more or less true). Some of those developers don't even want to become highly skilled developers (I once heard one of them say, "I don't want to design anything. I just want to write code in peace."). Should those managers stop measuring productivity by LOC? Or is LOC still an appropriate productivity index for ordinary developers?
- In enterprise application development, architectural design and business logic design are easily separated. In this situation, code deletion is unlikely to be caused by business logic design. It is not appropriate to measure the productivity of architectural design by LOC, but what about business logic design?
- "developers do not like having metrics focused on identifying the productivity of individual engineers". What preconditions lie behind this? Suppose the workers' union is strong (so developers are unlikely to be fired for their performance and don't worry about keeping their jobs), individual developers don't have to worry about their performance grading (because performance evaluation is based on what the team, not the individual, achieved), and developers just want to know their aptitude for software development (because they think finding a task suitable for each employee is the employer's duty). Do developers dislike such metrics even in that situation?
- Some developers don't worry about privacy issues concerning their performance. They even think it is good for their performance to sign their name on their deliverables, because signing makes them think about their responsibility. What is the difference between those developers and Googlers?
- Is it worthwhile to build effective countermeasures against unnecessary code? There are tools that find "code clones" in a codebase, such as CCFinder *7.
- "managers (and peers) frequently already know who the low performers are" — but don't you have to show their performance quantitatively and compare it with other developers' when you move them to another team?
- "the type of productivity metric used for these scenarios is different". Of course, you can build custom metrics suited to your needs, but what if all stakeholders share a common understanding like "LOC is the easiest and most widely accepted way to measure productivity"? You will want to use LOC as the metric because you can easily explain productivity to the stakeholders. This situation is likely in some companies.
- "Developers engage in a variety of other development tasks beyond just writing code". That is exactly why developers record, classify, and report their workload by type of task: most of that noise can then be removed from the performance measurement. Why do the authors say it is a problem for measuring productivity?
- Japanese System Integrators (SIers) evaluate the quality of their deliverables as well as their productivity. The product owner can select developers according to common metrics such as defect density and LOC. In that situation, what is the problem with measuring productivity by a single standard metric?
- Confounding factors: Some companies define a standard development language and framework (e.g. terasoluna) to make it easier to move human resources between projects. Some organizations also define a standard framework and methodology (e.g. the AIST Framework) to avoid vendor lock-in (which means they can move to another, more productive vendor). In such a situation, the language, tools, libraries, culture, and type of project *8 can all be fixed. Is it still difficult and inappropriate to use a single metric (or a set of standard metrics) even then?
- To keep the noise in the metrics as small as possible, project managers and architects often write project-wide (or sometimes company-wide) standards that define the target code coverage, which functional/non-functional requirements the tests must cover, and which kinds of bugs should be reported. How much do these standards improve the reliability of the metrics? Is it really impossible to improve the reliability of a single metric and rely on it for a single purpose?
- Workload per function point seems to be a standard metric because it defines the investment value of software. In Japan, software developed in a company is treated as an intangible fixed asset, so there is an incentive to keep the investment needed to implement a business function as small as possible.
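One of the bullets above mentions finding "code clones" with CCFinder. As a rough illustration of the token-based idea behind such tools, here is a toy sketch of my own; it is NOT CCFinder's actual algorithm, and the sample code and window size are arbitrary:

```python
# Toy token-window clone detector, loosely inspired by token-based
# tools like CCFinder (this is a simplified sketch, not CCFinder itself).
import re
from collections import defaultdict

def tokenize(source: str) -> list:
    # Crude tokenizer: identifiers, numbers, and single punctuation marks.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)

def find_clones(source: str, window: int = 8) -> dict:
    """Map each token window that occurs more than once to its start offsets."""
    tokens = tokenize(source)
    seen = defaultdict(list)
    for i in range(len(tokens) - window + 1):
        seen[tuple(tokens[i:i + window])].append(i)
    return {k: v for k, v in seen.items() if len(v) > 1}

# Hypothetical snippet with an obvious copy-pasted statement pair.
code = """
total = price * qty + tax
print(total)
total = price * qty + tax
print(total)
"""
clones = find_clones(code, window=6)
print(len(clones) > 0)  # True: the duplicated statements share token windows
```

A real clone detector additionally normalizes identifiers and literals so that renamed copies still match; this sketch only catches verbatim duplication.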
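The standard metrics these bullets rely on (LOC-based productivity, defect density, and workload per function point) are all simple ratios. A minimal Python sketch; every project figure below is a hypothetical example, not real data:

```python
# Illustration of three standard metrics discussed above.
# All figures are made-up examples, not real project data.

def loc_productivity(loc_written: int, person_months: float) -> float:
    """LOC per person-month: the classic (and fraught) productivity metric."""
    return loc_written / person_months

def defect_density(defects_found: int, kloc: float) -> float:
    """Defects per KLOC: a common quality metric used alongside LOC."""
    return defects_found / kloc

def workload_per_fp(person_hours: float, function_points: int) -> float:
    """Person-hours per function point: ties effort to delivered business function."""
    return person_hours / function_points

# Hypothetical project: 12,000 LOC in 6 person-months,
# 30 defects found, 80 function points, 960 person-hours.
print(loc_productivity(12_000, 6.0))  # 2000.0 LOC/person-month
print(defect_density(30, 12.0))       # 2.5 defects/KLOC
print(workload_per_fp(960.0, 80))     # 12.0 hours/FP
```

The simplicity of these ratios is exactly why they are easy to explain to stakeholders, and also why they are easy to game.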
3. Why We Should Not Measure Productivity
- Measuring productivity can warp incentives *9. On the other hand, managers can prevent the measurements from being warped. You can run a code formatter before measuring LOC to remove unnecessary line breaks. You can also define another metric such as checklist density (items/LOC) that imposes additional, unwanted workload on unnecessary code. Isn't that enough for this kind of problem?
- Monitoring can unintentionally reduce the productivity of high-performing developers. On the other hand, can monitoring help low-performing developers? They could request more specific assistance based on their monitored behavior. From a Six Sigma perspective, that would be a desirable change for a development team.
- It is more likely that managers introduce a practice (or tool) that seems effective for improving productivity, and if a metric shows no effect, they will use another metric that confirms their assumption, or simply move on to the next practice. Whether the result is positive or negative, they will eventually introduce the practice. The adoption of practices or tools seems to be driven by more specific needs, for example, reducing the workload of preventing regressions.
- Managers are likely to decide whether to adopt a practice (or tool) using a decision methodology such as the Kepner-Tregoe method rather than "richer data" that is hard to obtain. Should we abandon such decision methods?
- Managers can report what was slowing developers down as a subjective report, but the report should be based on data from their team. A report not backed by numbers could be ignored by their bosses, or lead to a worse evaluation for the manager. At the very least, they have to show with numbers how performance improved thanks to the adopted practice/tool, to justify the investment in training for that practice or tool.
- How can we connect qualitative data from developers with quantitative data about productivity? Can all developers provide qualitative data related to productivity? For this purpose, the qualitative data should explain the quantitative data. We usually use qualitative data to explain reductions in productivity. This strategy seems to work fine, because it doesn't require self-explanation skills from the developers, and it gives them a chance to excuse the reduction.
- How can we observe productivity without measuring it quantitatively?
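The formatter-based countermeasure in the first bullet of this section can be sketched as follows. The trivial "formatter" and the figures here are placeholders for illustration; in practice you would run a real formatter (e.g. black or clang-format) before counting:

```python
# Sketch of the countermeasure described above: normalize formatting
# before counting LOC, so padded blank lines don't inflate the metric.
# The "formatter" here is deliberately trivial: it drops blank lines
# and comment-only lines before counting.

def normalized_loc(source: str) -> int:
    """Count lines after stripping blank and comment-only lines."""
    lines = [ln.strip() for ln in source.splitlines()]
    return sum(1 for ln in lines if ln and not ln.startswith("#"))

def checklist_density(checklist_items: int, loc: int) -> float:
    """items/LOC: extra code means extra review items, discouraging padding."""
    return checklist_items / loc

# Hypothetical padded submission: only two real statements survive.
padded = "x = 1\n\n\n# filler comment\n\ny = 2\n\n"
print(normalized_loc(padded))                         # 2
print(checklist_density(10, normalized_loc(padded)))  # 5.0
```

The point is that the normalization step removes the cheapest ways to game LOC, while checklist density makes padding actively costly for the developer.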
*1: In this kind of application, the application architecture and the business logic are likely to be loosely coupled, so you can choose or change development companies relatively easily.
*2: This means you cannot use velocity as the metric.
*3: The right of dismissal is very limited under Japanese labor law. On the other hand, the right of redeployment is not limited: see https://www.slideshare.net/TokorotenNakayama/dxdevopstechlive (Japanese), p. 62. Consequently, redeployment is sometimes used as a tool to get rid of a developer, but it is nothing more than a workaround.
*4: sometimes called "key-punchers"
*5: sometimes called "testers"
*6: Of course, they use LOC together with defect rate or checklist density to measure quality.
*7: Toshihiro Kamiya, Shinji Kusumoto, and Katsuro Inoue, "CCFinder: A Multilinguistic Token-Based Code Clone Detection System for Large Scale Source Code," IEEE Transactions on Software Engineering, vol. 28, no. 7, pp. 654-670, July 2002.
*8: The AIST Framework targets enterprise applications in local municipalities.