I want to count the lines of code in a multi-file Python project as accurately as possible, but without including comments, docstrings or blank lines in the total.
I first tried using cloc, which is available as a Debian package. But cloc treated most docstrings as code, even though they are comments. (Update: this is no longer the case. Recent versions of cloc treat Python docstrings as comments.)
I notice some comments below saying that docstrings should be included in the total because they might be used by the code to influence behaviour at runtime, and hence count as part of the program's code/data/config. A prominent example of this is 'ply', which asks you to write functions with docstrings which, as I recall, contain grammar and regular expressions that are central to the program's operation. However, this seems to me to be very much a rare exception. Most of the time docstrings act just like comments. Specifically, I know for a fact that this is true for all the code I want to measure. So I want to exclude them from my line counts, just as I exclude comments.
asked Jan 31, 2012 at 08:01
It is probably correct to include Python docstrings in a "lines of code" count. Normally a comment would be discarded by the compiler, but docstrings are parsed:
See PEP 257 - Docstring Conventions:
A docstring is a string literal that occurs as the first statement in a module, function, class, or method definition. Such a docstring becomes the __doc__ special attribute of that object.
String literals occurring elsewhere in Python code may also act as documentation. They are not recognized by the Python bytecode compiler and are not accessible as runtime object attributes.
In other words, docstrings are compiled and constitute, in a very real way, the code of the program. Additionally, they're commonly used by the doctest module for unit testing, as usage strings for command line utilities, and so on.
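To make the difference concrete, here is a small sketch. The functions double and no_doc are made up for illustration. It shows that a docstring survives compilation as __doc__ while a comment does not, and that doctest can execute the example embedded in the docstring.

```python
import doctest


def double(x):
    """Return x doubled.

    >>> double(2)
    4
    """
    # This comment is discarded by the compiler.
    return x * 2


def no_doc(x):
    # Only a comment here, so this function has no docstring.
    return x


# The docstring is available at runtime; the comment is not.
has_doc = double.__doc__ is not None
missing_doc = no_doc.__doc__ is None

# Run the example embedded in double's docstring as a test.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(double, "double", module=False, globs={"double": double}):
    runner.run(test)
results = runner.summarize(verbose=False)
```

Here results.attempted counts the examples found in the docstring and results.failed counts those whose output did not match.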
Comment lines can be lines of code in Python. See doctest, for example.
Moreover, you will have trouble finding a sensible and reliable way to classify a case like this as either comment or code:
foo = ('spam', '''eggs eggs eggs''' '''more spam''', 'spam')
Just count the comment lines as well. I think most programmers will agree it is as good a measure as any for whatever you are actually trying to measure.
Tahar doesn't count the docstrings. Here's its count_loc function:
def count_loc(lines):
    nb_lines = 0
    docstring = False
    for line in lines:
        line = line.strip()

        if line == "" \
           or line.startswith("#") \
           or docstring and not (line.startswith('"""') or line.startswith("'''")) \
           or (line.startswith("'''") and line.endswith("'''") and len(line) > 3) \
           or (line.startswith('"""') and line.endswith('"""') and len(line) > 3):
            continue

        elif line.startswith('"""') or line.startswith("'''"):
            # this is either a starting or ending docstring
            docstring = not docstring
            continue

        else:
            nb_lines += 1

    return nb_lines
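A line-based check like the one above treats any line that starts with triple quotes as a docstring, so it can also skip multi-line string literals that are real data. As an alternative, here is a minimal sketch that is not Tahar's method: it uses the standard ast module to locate true docstrings and the tokenize module to find lines that carry code. The names docstring_lines and count_code_lines are hypothetical. It assumes Python 3.8+ (for end_lineno) and source that parses without errors.

```python
import ast
import io
import tokenize

# Token types that never make a line count as code.
_IGNORED = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}

_DOC_OWNERS = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def docstring_lines(source):
    """Return the set of line numbers occupied by real docstrings."""
    lines = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, _DOC_OWNERS) or not node.body:
            continue
        first = node.body[0]
        # A docstring is a bare string constant as the first statement.
        if (isinstance(first, ast.Expr)
                and isinstance(first.value, ast.Constant)
                and isinstance(first.value.value, str)):
            lines.update(range(first.lineno, first.end_lineno + 1))
    return lines


def count_code_lines(source):
    """Count lines holding at least one token that is not a comment,
    blank-line marker, or docstring."""
    doc_lines = docstring_lines(source)
    code_lines = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _IGNORED:
            continue
        if tok.type == tokenize.STRING and tok.start[0] in doc_lines:
            continue  # string token that belongs to a docstring
        # A multi-line token (e.g. a triple-quoted data string)
        # counts on every line it spans.
        code_lines.update(range(tok.start[0], tok.end[0] + 1))
    return len(code_lines)
```

For a whole project, one could read each .py file as text and sum count_code_lines over the files. Triple-quoted strings used as data, such as the foo tuple example earlier in this thread, are counted as code because they are not the first statement of a module, class, or function.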