Open Testware Reviews
Sclc Metrics Tool
Copyright 2003 by Tejas Software Consulting - All rights reserved.
Contents
Overview -- Maturity -- Project activity -- Platforms -- Support -- Documentation
-- Installation -- Implementation -- Performance -- Similar tools -- Limitations
-- Observations
Overview
Reviewed: 2003-April-18
Version reviewed: 1.23, 2003-April-15
Maintainer: Brad Appleton
URL: http://www.bradapp.net/clearperl/sclc-cdiff.html
Testingfaqs.org category: Static Analysis Tools
License: Perl license - Artistic or GNU GPL
User interface: command line
Sclc is a code size measurement tool that reports total lines, blank
lines, comment lines, assembly-equivalent source lines (AESL), and, most usefully,
non-comment source lines (NCSL). Sclc is an abbreviation for "source code
line counter." It can parse 13 different languages: Ada, Assembly, Awk,
C, C++, Eiffel, Java, makefiles, Lisp, Pascal, Perl, shell scripts, and
Tcl.
Sclc, and its cohort, cdiff, are housed as an afterthought on Brad Appleton's
"ClearPerl: Perl 5 modules for ClearCase" web page. I haven't used ClearCase
in years, but I've found that sclc is useful in many ways that have nothing
to do with ClearCase.
Maturity
4 - Beta (on a scale of 1-5)
With more feedback to the author from users, more testing (preferably
with test data added to the distribution package), and a bit of further
refinement, the tool could step up to production quality.
Project activity
3 - Stable (on a scale of 1-5)
It's hard to know how often the tool has been updated because until
recently there was no change log or version tracking. The user community
is not very active. Recent updates saved the tool from "Inactive" status.
Platforms
Sclc is portable to a wide range of platforms. The full range is
not documented. I used it successfully on Windows 2000 with ActiveState
Perl 5.8.0, Cygwin/Windows 2000 with Cygwin Perl 5.8.0, RedHat Linux
7.3 with Perl 5.6.1, MacOS X 10.1 with Perl 5.6.0, and HP-UX 9.05 with
Perl 5.5.2. It will very likely work on any Unix platform supported by
Perl. I tested sclc on the DejaGnu 1.4.3 source distribution, and verified
that I got the same results on each of these platforms. Verifying this
wasn't easy, because the order of the language summary at the bottom of
the output was different on each platform, and the order of the files
printed out in the detailed output was different between Windows and the
other platforms. There was a lot of output to stderr because sclc looked
at all files in the distribution, many of which were not programs. I had
to take care to separate the error output from the report because when I
captured stdout and stderr in the same file, the errors occasionally showed
up in the middle of one of the report lines.
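The order-insensitive comparison described above can be sketched in a few lines of Python. This is only an illustration and not part of sclc: the function names are invented here, and it assumes each platform's report was captured from stdout alone (for example, by redirecting stderr to a separate file with something like "2> errors.txt" in the shell).

```python
def normalize_report(text):
    """Collapse runs of whitespace and sort the non-empty lines."""
    lines = [" ".join(line.split()) for line in text.splitlines()]
    return sorted(line for line in lines if line)


def same_report(report_a, report_b):
    """True when both reports contain the same lines, in any order."""
    return normalize_report(report_a) == normalize_report(report_b)


# Two rows from the DejaGnu run, listed in a different order and with
# different spacing, as might happen on two platforms.
linux_report = (
    "   23      4     11      8  ./baseboards/a29k-udi.exp  (Tcl)\n"
    "   37      8     15     14  ./baseboards/arc-sim.exp  (Tcl)\n"
)
windows_report = (
    "37 8 15 14 ./baseboards/arc-sim.exp (Tcl)\n"
    "23 4 11 8 ./baseboards/a29k-udi.exp (Tcl)\n"
)
print(same_report(linux_report, windows_report))  # prints True
```

Sorting whole lines is crude but sufficient here, because every report line carries its own file name or language label.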
My compatibility test illustrates the fact that static analysis tools
like sclc don't necessarily have to run on your target platform or your
development platform. You can run the tool on any available system that
can access your code.
I did encounter a fatal error on Windows using an earlier version
of sclc, but that's fixed in the latest sclc version.
The modules sclc uses are all provided with the standard Perl distribution.
You'll need Perl version 5.4 or later.
Support
There is little in the way of support for sclc. There is no public
bug tracking database, and no public version control system. Only the
most recent version of the tool is available on the web page. The ClearPerl
web page (the parent page containing the sclc page) mentions a ClearPerl
mailing list, but the list is defunct. Sclc is mentioned infrequently on
the ClearCase International Users Group (CCIUG) mailing list; the mentions
include a recent announcement from the tool's author about a newly posted
version of sclc. The web page for the mailing list says that only Rational
customers may join, though the list archives are publicly available.
The author, Brad Appleton, was easily reachable via email during the course
of my review, and he was eager to address problems that I reported. So
it seems best to contact Brad directly with any questions you have.
Documentation
Sclc is sufficiently documented. There is some background information
on the web page. The script includes documentation in the Perl-standard
POD format, which can be accessed using the script's "-help" option, by
feeding the script directly to the perldoc program, or by accessing the html
version of the documentation. I had trouble with the formatting of the
documentation using the "-help" option on Cygwin and on one of two Linux platforms.
I suspect this is due to flakiness in perldoc rather than a problem with
sclc.
There is little documentation of the "AESL" metric, which stands
for "assembly-equivalent source lines." The comments in the script refer
to the "Programming Language Table" from Capers Jones' company Software
Productivity Research, Inc. The URL given is defunct - the correct URL
is http://www.spr.com/products/programming.shtm.
This page gives dire warnings about using the data from this table, though
it refers to function point estimates rather than using AESL to compare
the size of programs written in different languages. The report now costs
US$75.00, which is a barrier for anyone wanting to enhance sclc to calculate
AESL metrics for additional languages.
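For what it's worth, the sample run in the Observations section reports an NCSL of 1008 and an AESL of 15120.0 for the Perl script, a ratio of exactly 15. That is consistent with AESL being NCSL scaled by a per-language multiplier, though the review does not confirm that this is how sclc computes it. A minimal sketch under that assumption (the function name and the 900-line figure are hypothetical):

```python
def aesl(ncsl, multiplier):
    """Assembly-equivalent source lines, assumed to be NCSL times a
    per-language multiplier."""
    return ncsl * multiplier


# Multiplier inferred from the review's sample output, not taken from
# the sclc source or from the SPR table.
perl_multiplier = 15120.0 / 1008

print(perl_multiplier)              # 15.0
print(aesl(1008, perl_multiplier))  # 15120.0
# A hypothetical lower NCSL, as an option like -delim-ignore might yield,
# lowers the AESL in proportion.
print(aesl(900, perl_multiplier))   # 13500.0
```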
Installation
Sclc is available in either a zip file or a gzipped tar file. The
package includes a short README file, a copy of the main web page and the
html-formatted man page, plus the cdiff script. Cdiff is an add-on for
the ClearCase configuration management system, and sclc has hooks for integrating
with cdiff. There are slight differences between the zip and tar packages,
to suit Windows and Unix users, respectively: the sclc script has a .pl
extension in the Windows package, the README file differs slightly between
the two, and each package has platform-appropriate line endings.
Note that the html pages have links that only work if you're viewing the
pages live on the web site.
To execute the sclc script directly, you have to edit the first line
of the script to replace "#!/usr/misc/bin/perl5" with the path where
Perl is installed on your system, which is often "#!/usr/bin/perl". The
default path is reminiscent of the days when Perl users were transitioning
from Perl 4 to Perl 5, and is very unlikely to be right for systems installed
within the last several years.
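If you would rather not edit the file by hand, the change amounts to replacing the first line. The sketch below is a hypothetical helper, not something shipped with sclc; it assumes the interpreter line carries no extra switches worth preserving. Passing the script to the interpreter explicitly ("perl sclc ...") should also avoid the edit.

```python
def replace_shebang(script_text, perl_path="/usr/bin/perl"):
    """Return script_text with its first-line interpreter path replaced."""
    first, newline, rest = script_text.partition("\n")
    if not first.startswith("#!"):
        return script_text  # no interpreter line, so leave the text alone
    return "#!" + perl_path + newline + rest


original = "#!/usr/misc/bin/perl5\nuse strict;\n"
print(replace_shebang(original), end="")
```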
There is no automatic installer - you just extract sclc to a location
in your filesystem where you normally keep executable files.
Implementation
Sclc is implemented in a single Perl script. Run against itself, it counts
1008 non-comment source lines. There are 680 comment lines. Function
headers vary from a single line to a long standardized template. There
is a sprinkling of comments throughout the code that should help experienced
Perl programmers understand the script.
I like the modular design of the definitions for each language that sclc
parses. I tried once to add code to support the Limbo programming language,
but after several minutes of studying the code and the comments I wasn't
quite able to get the gist of it. I ended up using the -language option
to tell it that Limbo programs were shell code, because they both use the
same comment syntax, and I think the line count results ended up being
accurate except for the overall summary for shell code. If I had wanted
to spend more time teaching sclc to understand Limbo, I'm pretty sure I
could have done it with some experimentation and/or help from the author.
There are no test cases that ship with the tool. The script does use the
"use strict" mechanism to enforce good programming practices, but it
does not use "use warnings" to catch errors at run-time.
The author states that sclc is "ancient Perl 4 code," with only minimal
porting to use common Perl 5 coding standards. This should only be a
concern to Perl purists who both want to work with a newer style implementation
and aren't willing to help update the sclc code.
Performance
Parsing source code tends to be cpu-intensive. I tested sclc by running
it recursively on all files in the DejaGnu 1.4.3 source distribution
with 66,000 total lines of code. It took an average of 2 minutes on a
266 MHz Windows machine, 48 seconds on a 600 MHz Mac, and 27 seconds on
an i686 Linux machine of unknown origin.
Similar tools
Clc is a predecessor to sclc, also written by Brad Appleton. It works
for C, C++, and Perl and, unlike sclc, it counts source statements as
well as lines of code. When I used clc to measure sclc, it agreed with sclc's
assessment of total lines of code, but disagreed significantly on the number
of non-comment source lines: 1008 according to sclc, and 1656 according
to clc. I don't trust the numbers from clc - a simple grep shows that there
are only 1480 lines that don't start with a comment character. I think the
two tools tend to agree more on C code. The moral is that it doesn't hurt to
use two different tools as a sanity check.
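The grep check mentioned above is easy to reproduce in any language. Here is a rough Python equivalent; the exact grep pattern used in the review isn't given, so this sketch assumes "starts with a comment character" means the first non-blank character is "#". Note that blank lines are counted too, which is why the result is an upper bound on the code lines and not an NCSL figure.

```python
def count_non_comment_start(text, comment_char="#"):
    """Count lines whose first non-blank character is not comment_char.

    Blank lines are included in the count, so the result is only an
    upper bound on the number of code lines.
    """
    return sum(
        1
        for line in text.splitlines()
        if not line.lstrip().startswith(comment_char)
    )


sample = "#!/usr/bin/perl\n# a comment\nuse strict;\n\nprint 1; # inline\n"
print(count_non_comment_start(sample))  # prints 3
```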
Clc is dated February 14, 1995, a few months before sclc originally appeared.
It's on Chris Lott's "Metrics collection tools for C and C++ Source Code"
page, which includes several other
tools that are either abandoned or are outdated copies. With some detective
work you can find more recent copies of some of them.
One such tool listed on Lott's page is sloccount. You'll find the latest
version at http://www.dwheeler.com/sloccount/.
Sloccount can parse 27 languages, more than twice as many as sclc. It
probably has better heuristics for automatically determining the language
used within a file, and it seems to have a larger user community than
sclc. I compared the NCSL numbers from both sloccount and sclc after analyzing
a directory of Linux kernel source files, and the numbers were identical.
So why do I stick with sclc? I like sclc's user interface better. It's
more difficult to specify the files you want to process with sloccount,
especially if you just want to check one file. Sloccount's output is more
cluttered. It probably wouldn't take much effort to make sloccount easier
to use, but at first glance, I like sclc better.
Limitations
- Sclc counts lines of code, not source statements. Counting
source statements is probably more accurate when you're dealing with code
developed with a variety of coding styles, because raw line counts are
sensitive to coding style. Using the -delim-ignore option may make sclc's
NCSL numbers roughly similar to a source statement count.
- The tool often makes wrong guesses about language - for example,
it thought a Cascading Style Sheet (.css) file was C code, it counted
a makefile.in file as shell code rather than a makefile, it parsed an RTF
file as Pascal, and it counted Expect and incr Tcl files as shell code.
The workaround is to use the -language option to make it smarter about recognizing
file extensions, and exclude documentation and other files you don't want
it to count.
- The numbers are likely to be different if you're using the "-diff"
option and you use a context diff rather than the default diff format.
The sum of the inserted and deleted lines tends to stay the same, but the
individual counts vary.
- With the "-diff" option, sclc doesn't try to count changed lines.
All modifications are reported only in terms of additions and deletions.
- The tool gives no output if you give it the "-diff" option
and its input stream is empty. I'd prefer to get positive confirmation
that sclc ran.
- I had to go to the source code to find out that the regular expressions
given as arguments to "-name" and "-except" are sandwiched between ^ and
$ anchors automatically. The "filename must completely match" comment in
the man page didn't get this point across to me.
- Sclc doesn't run preprocessors on files. After macro expansion,
conditional compilation, etc., the number of lines of code that actually
get compiled can grow or shrink. That's probably okay, and probably the
way most metrics tools work, but it's something to be aware of.
- The AESL value changes when using the -delim-ignore option,
because this option changes the NCSL count and the AESL is based on NCSL.
It seems strange that the assembly equivalent for a file would change if
you change the way you measure the source code.
- The "Totals" and "LangTotals" arguments to the "-sections" option
are ignored if you're only analyzing one source file. This might make it
more difficult to parse the output with a script that doesn't know ahead
of time how many files will be analyzed. There is also no mention in the documentation
that these two sections are omitted by default if there's only one file.
- It would help if the column headings were repeated before the summary,
to help the user remember what each column represents. I once read the
AESL column thinking it was NCSL, which resulted in an order-of-magnitude
error in the numbers I reported, and I didn't catch the error for several
days.
- If you put the same filename on the command line twice, there
is no warning and the file is counted twice in the totals.
- If you ask sclc to process a file that doesn't exist, it gives
a misleading error such as: "sclc: Can't determine programming language
for nosuchfile."
- The man page is garbled with escape characters on my Cygwin/Windows
2000 configuration using TERM=cygwin or vt100. Probably not an sclc bug.
I also sometimes saw an error on Linux that resulted in the man page coming
out as raw POD markup.
- If there are 100,000 or more lines in a source file, the columns
in the output don't line up.
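The anchoring behavior noted above for "-name" and "-except" can be illustrated with a short sketch. It uses Python's re module as a stand-in for Perl's regular expressions, the function name is invented, and the file names are hypothetical. Whether sclc groups the pattern before adding the anchors, and whether it matches against the full path or just the base name, is not stated in the review.

```python
import re


def name_matches(pattern, filename):
    """Mimic a pattern sandwiched between ^ and $ anchors."""
    return re.search("^" + pattern + "$", filename) is not None


# The pattern from the DejaGnu example matches whole file names.
print(name_matches(r".*\.(rtf|in|am)", "Makefile.in"))   # True
print(name_matches(r".*\.(rtf|in|am)", "a29k-udi.exp"))  # False
# A pattern that only describes the extension does not match, because
# the anchors require the complete file name to match.
print(name_matches(r"\.in", "Makefile.in"))              # False
```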
These issues were found and addressed during the course
of the review. They don't affect version 1.23.
- Fatal error using ActiveState Perl on Windows to process a
file that doesn't have an extension - "Unmatched ) in regex" at line 1244.
- If I enter a bad command-line argument, a man page is printed
that obscures the error message.
- Reports erroneously that an empty file with no extension
is a binary file.
- Gives no output for an empty file with an extension (e.g. empty.c).
- Tries to process directories as source files when not using
-recurse. Either reports them as binary files or says it can't determine
the programming language.
- Silently ignores context diffs (diff -c) when using the -diff
option.
- The synopsis in the man page is wider than the standard 80 character
line on Linux and HP-UX.
Observations
Sclc is a tool that can do simple static analysis on 13 different programming
languages (if you count makefiles as a language). It uses a command-line
interface that looks like this when you ask it to analyze itself:
    $ sclc sclc
    Lines  Blank  Cmnts   NCSL        AESL
    =====  =====  =====  =====  ==========  =======================================
     1824    167    680   1008     15120.0  sclc (Perl)
It shows a total line count, the number of blank lines, the number of lines
containing either full-line or inline comments, and the number of non-comment
source lines (NCSL). Note that if your code has lines with inline comments,
those lines will be counted in both the comment and NCSL totals. The AESL
metric refers to "assembly-equivalent source lines," discussed further in
the Documentation section earlier. I don't use the AESL metric, and I could
suppress it from the output using the "-counts" option.
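In the run above, the Blank, Cmnts, and NCSL columns add up to 1855, which is 31 more than the 1824 total lines; that is consistent with lines carrying inline comments being counted twice. The counting rule can be sketched as follows. This is a naive illustration for languages with "#" comments, not sclc's actual parser: it would, for instance, misread a "#" inside a string literal.

```python
def count_lines(text, comment_char="#"):
    """Tally total, blank, comment, and non-comment source lines.

    A line with code followed by an inline comment is counted under
    both "Cmnts" and "NCSL".
    """
    counts = {"Lines": 0, "Blank": 0, "Cmnts": 0, "NCSL": 0}
    for line in text.splitlines():
        counts["Lines"] += 1
        stripped = line.strip()
        if not stripped:
            counts["Blank"] += 1
            continue
        code, marker, _comment = stripped.partition(comment_char)
        if marker:
            counts["Cmnts"] += 1  # full-line or inline comment
        if code.strip():
            counts["NCSL"] += 1   # something precedes the comment
    return counts


sample = "# header comment\n\nx = 1  # inline comment\ny = 2\n"
print(count_lines(sample))
# {'Lines': 4, 'Blank': 1, 'Cmnts': 2, 'NCSL': 2}
```

As with the sclc output, Blank + Cmnts + NCSL (5) exceeds Lines (4) by the number of lines with inline comments.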
These kinds of metrics, especially the raw line count (often called the
LOC or KLOC metric, for "lines of code" or "thousands of lines of code")
that you could generate with a simple tool like the Unix "wc" utility, have
been the cause of much academic hand-wringing because there are so many
ways to misuse the metrics. I'm going to presume that if you're considering
using a tool like sclc, you've done some background reading so that you
understand the limitations of these simplistic metrics. Here are a few
references to get you started: "Software Metrics: Successes, Failures and
New Directions" by Norman E. Fenton and, for an interesting but
hard-to-implement alternative, "Managing (the Size of) Your Projects: A
Project Management Look at Function Points" by Carol Dekkers.
Despite the shortcomings, LOC and NCSL metrics are very common software
metrics. In fact, for both of my previous reviews, I have reported NCSL
metrics for the tools I reviewed, using data generated by sclc. These metrics
give a very rough idea of how complex a program is. I've also used sclc
during a consulting engagement to show the client the magnitude of the
hundreds of thousands of lines of code they were trying to wrangle.
If you're analyzing more than one file, you get a column total at the
bottom of the output, plus a file count. And if you're dealing with more
than one programming language, you'll get a breakdown by language. Here's
an example of a further refinement of the DejaGnu analysis I mentioned earlier,
to help sclc figure out the programming language and to tell it to ignore
certain files.
    $ sclc.pl -language .exp=tcl -language .itcl=tcl -except '.*\.(rtf|in|am)' \
        -recurse -ignore -counts Lines+Blank+Cmnts+NCSL .
    Lines  Blank  Cmnts   NCSL
    =====  =====  =====  =====  ===================================================
       36      1      1     34  ./.clean (shell)
      901    134    314    453  ./aclocal.m4 (shell)
       23      4     11      8  ./baseboards/a29k-udi.exp (Tcl)
       37      8     15     14  ./baseboards/arc-sim.exp (Tcl)
    ...
    34277   3985   6952  23504  ----- Tcl ----- (256 files)
      409     63     38    310  ----- C++ ----- (2 files)
      345     44     16    289  ----- C ----- (6 files)
    24652   2588   2882  19291  ----- shell ----- (51 files)
     2223    214    606   1416  ----- Lisp ----- (1 file)
    61906   6894  10494  44810  ***** TOTAL ***** (316 files)
This still needs further tuning, because there are still documentation
files like ".clean" above that shouldn't be counted, m4 files that perhaps
should be counted but not as shell scripts, and all the ".in" and ".am"
files that I haven't figured out. So what counts as code that needs to be
counted? Tools aren't going to help much with that conundrum.
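If you post-process output like this with a script, one cheap sanity check is to confirm that the per-language rows add up to the TOTAL row. The sketch below does that for the summary shown above. It assumes the four-column layout produced by "-counts Lines+Blank+Cmnts+NCSL" as it appears in this review; the regular expression and function names are my own and may need adjusting for other option combinations.

```python
import re

SUMMARY = """\
34277 3985 6952 23504 ----- Tcl ----- (256 files)
409 63 38 310 ----- C++ ----- (2 files)
345 44 16 289 ----- C ----- (6 files)
24652 2588 2882 19291 ----- shell ----- (51 files)
2223 214 606 1416 ----- Lisp ----- (1 file)
61906 6894 10494 44810 ***** TOTAL ***** (316 files)
"""

# Four counts, a ----- or ***** marker, a label, a closing marker, and a
# file count. This layout is assumed from the sample output only.
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(-----|\*\*\*\*\*)\s+(.+?)\s+"
    r"(?:-----|\*\*\*\*\*)\s+\((\d+) files?\)\s*$"
)


def parse_summary(text):
    """Return ({language: [lines, blank, cmnts, ncsl, files]}, total_row)."""
    languages, total = {}, None
    for line in text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # skip per-file rows and anything unrecognized
        numbers = [int(match.group(i)) for i in (1, 2, 3, 4, 7)]
        if match.group(5) == "*****":
            total = numbers
        else:
            languages[match.group(6)] = numbers
    return languages, total


def totals_consistent(text):
    """True when the language rows sum, column by column, to the TOTAL row."""
    languages, total = parse_summary(text)
    column_sums = [sum(column) for column in zip(*languages.values())]
    return total is not None and column_sums == total


print(totals_consistent(SUMMARY))  # prints True
```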
Sclc is packaged with the cdiff tool, which is a wrapper on top of the
cleardiff command in the ClearCase configuration management system. Sclc
has a few command line options for interfacing with cdiff. It probably wouldn't
take much effort to enhance sclc to talk to other configuration management
systems as well. But sclc works just fine if you don't use ClearCase.
Note that the URL for sclc is a redirect to a different site. Be sure
to take note of the bradapp.net URL, because the underlying page has changed
recently, and it may change again.
There are many freeware metrics tools lurking around the Internet, especially
for C code, though most of them are orphaned. Sclc is slightly rough around
the edges, but it's my favorite among the tools I've tried.