TITLE: The H Number
AUTHOR: Eugene Wallingford
DATE: June 22, 2006 3:08 PM
DESC:
-----
BODY:
I ran across the concept of an "h number" again over
at
Computational Complexity.
In case you've never heard of this number, an author
has an h number of *h* if *h* of her
*Np* papers have ≥ *h* citations each,
and the rest of her (*Np* - *h*) papers
have ≤ *h* citations each.
It's a fun little idea with a serious idea behind it:
Simply counting publications or the maximum number of
citations to an author's paper can give a misleading
picture of a scientist's contribution. The h number
aims to give a better indication of an author's cumulative
effect and relevance.
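To make the definition concrete, here is a minimal Python sketch of the computation. The function name `h_number` and the sample citation counts are illustrative assumptions, not any real author's record. It sorts the counts from most to least cited and finds the largest rank *h* at which the *h*-th paper still has at least *h* citations.

```python
def h_number(citations):
    """Return the h number for a list of per-paper citation counts.

    Sketch only: assumes `citations` holds one non-negative integer
    per paper, in any order.
    """
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)

    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            # The top `rank` papers all have at least `rank` citations.
            h = rank
        else:
            # Counts only decrease from here, so no larger h is possible.
            break
    return h


# Hypothetical citation counts, made up for illustration.
sample = [10, 8, 5, 4, 3]
print(h_number(sample))
```

With these made-up counts, four papers have at least four citations each and the remaining paper has fewer than five, so the h number is 4.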
Of course, as Lance points out, the h number can mislead,
too. The number depends on the research community,
as some communities tend to publish more or less, and
cite more or less frequently, than others. It can reward
a "clique" of authors who generously cite each other's
work. Older authors have written more papers and so will
tend to be cited more often than younger authors. Still,
it does give us different information than raw counts,
and it is as much fun as a good baseball statistic.
Now someone has written an
h number calculator
that uses
Google Scholar
to track down papers for a specific researcher and then
compute the researcher's index. (Of course, this
introduces yet another sort of problem... How accurate
is Scholar? And do self-citations count?)
I love a good statistic and am
prone to vanity surf,
so I had to go compute
my h number:
The h-number of Eugene Wallingford is 5 (max citations = 22)

You can put that into perspective by checking out some
folks with
much larger numbers.
(Seventy?) I'm just surprised that I have a paper with 22
citations.
I also liked
one of the comments
to Lance's post. It suggests another potentially useful
index -- (*h* * *maxC*)/1000, where
*maxC* is the number of citations to the author's
most cited paper -- which seems to combine breadth of
contribution with depth. For the baseball fans among
you, this number reminds me of OPS, which adds on-base
percentage to slugging percentage. The analogy even
feels right. *h*, like on-base percentage,
reflects how the performer contributes broadly to the
community (team); *maxC*, like slugging percentage,
reflects the raw "power" of the author (batter).
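As a quick worked example of the commenter's index, here is a small Python sketch. The function name `breadth_depth_index` is a hypothetical label for illustration, not something the commenter used; the inputs are the two numbers the calculator reported for me above.

```python
def breadth_depth_index(h, max_c):
    """Compute (h * maxC) / 1000, the index suggested in the comment.

    Sketch only: `h` is the author's h number and `max_c` is the
    citation count of the author's most cited paper.
    """
    return (h * max_c) / 1000


# Values reported by the calculator: h = 5, max citations = 22.
print(breadth_depth_index(5, 22))  # 5 * 22 / 1000 = 0.11
```

Because the two factors are multiplied, a single heavily cited paper raises the index in proportion to *h*, and vice versa, which is roughly the sense in which it blends breadth with depth.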
The commenter then considers a philosophical question:
*
Lastly, it is not so clear that a person who has published
a thousand little theorems is truly a worse scientist than
one who has tackled two large conjectures. You don't agree?
Paul Erdos was accused of this for most of his life, yet
for the last two decades of his life it became very clear
that many of those "little theorems" were gateways to entire
areas of research.
*

Alan Kay doesn't publish a huge number of papers, but his
work has certainly had a great effect on computing over
the last forty years.
Baseball has lots of different statistics for comparing the
performance of players and teams. Having a large set of tools
can both be fun and give a more complete picture of the
world.
I suppose that I should get back to work beefing up my
h number, or at least doing something administrative...
-----