Code Review Metrics

OWASP Code Review Guide Table of Contents

Introduction
There are two distinct classes of software metrics: Relative and Absolute. Are numerical values that describe a trait of the code such as the number of references to a particular variable in an application, or the number of lines of code (LOC).Absolute metrics, such as the number of lines of code, do not involve subjective context but are material fact. Are a representation of an attribute that cannot be directly measured and are subjective and reliant on context of where the metric was derived. There is no definitive way to measure such an attribute. Multiple variables are factored into an estimation of the degree of testing difficulty, and any numeric representation or rating is only an approximation and is subjective.
 * Absolute metrics
 * Relative metrics

Some Metric benefits
The objective of code review is to detect development errors which may cause vulnerabilities and hence give rise to an exploit. Code review can also be used to measure the progress of a development team in their practice of secure application development. It can pinpoint areas where the development practice is weak, areas where secure development practice is strong and give a security practitioner the ability to address the root cause of the weaknesses within a developed solution. It may give rise to investigation into software development policies and guidelines and the interpretation of them by the users; communcation is the key.

Metrics can also be recorded relating to the performance of the code reviewers and the accuracy of the review process. The performance of the code review function and the efficiency and effectiveness of the code review function.



Secure Development Metrics

 * Defect Density:

The average occurance of programming faults per Lines of code (LOC). This gives a high level view of the code quality but not much more. Fault density on its own does not give rise to a pragmatic metric. Defect density would cover minor issues as well as major security flaws in the code; all are treated the same way. Security of code can not be judged accuratley using defect density alone.


 * Lines of code (LOC):

The count of executable lines of code. Commented-out code or spaces don't count. This is another metric in an attempt to quantify the size of the code. This gives a rough estimate but is not particularly scientific. Some circles of thinking believe that the estimation of an application size by virtue of LOC is professional malpractice!


 * Function Point:

The estimation of software size by measuring functionality. The combination of a number of statememts which perform a specific task. Independent of programming language used or development methodology.


 * Risk Density :

Similar to defect density but discovered issues are rated by risk (high, medium & low). In doing this we can give insight into the quality of the code being developed via a [X Risk / LoC] or [Y Risk / Function Point] value. (X&Y being high, medium or low risks) as defined by your internal application development policies and standards. Eg: 4 High Risk Defects per 1000 (Lines of Code) 2 Medium Risk Defects per 3 Function Points


 * Path complexity/complexity-to-defect/cyclomatic complexity

Cyclomatic complexity can help establish risk and stability estimations on an item of code such as a class or method or even a complete system. It was defined by Thomas McCabe in the 70's and it easy to calsulate and apply hence its usefullness.

CC = Number of decisions +1

A decision could be considered commands such as:

If....else switch case catch while do

and so on.....

As the decision count increases so does the complexity. Complex code leads to less stability and maintainability. The more complex the code the higher risk of defects.

One could establish tresholds for Cyclomatic complexity:

0-10: Stable code. Acceptable complexity 11-15: Medium Risk. More complex 16-20: High Risk code. Too many decisions for a unit of code.

Review Process Metrics

 * Inspection Rate

This metric can be used to get a rough idea of the required duration to perform a code review. The inspection rate is the rate of coverage a code reviewer can cover per unit of time. From experience, a rate of 250 lines per hour would be a baseline. This rate should not be used as part of a measure of review quality but simply to determine duration of the task.


 * Defect detection Rate

This metric measures the defects found per unit of time. Again can be used to measure perfromance of the code review team but not to be used as a quality measure. Defect detection rate would normally increase as the inspection rate (above) decreases.


 * Code Coverage

Measured as a % of LoC of function points, the code coverage is the proportion of the code reviewed. In the case of manual review we would aim for close to 100%, unlike automated testing wherein 80-90% is considered good.

The amount of time used to correct detected defects. This metric can be used to optimise a project plan within the SDLC. Average values can be measured over time, producing a measure of effort which must be taken into account in the planning phase.
 * Defect Correction Rate

The rate at which upon reinspection of the code more defects exist, some defects still exist or other defects manifest through an attempt to address previously discovered defects.
 * Reinspection defect Rate