TITLE: The H Number
AUTHOR: Eugene Wallingford
DATE: June 22, 2006 3:08 PM
DESC:
-----
BODY:
I ran across the concept of an "h number" again over at Computational Complexity. In case you've never heard of this number, an author has an h number of h if h of her Np papers have ≥ h citations each, and the rest of her (Np - h) papers have ≤ h citations each. It's a fun little idea with a serious purpose behind it: Simply counting publications or the maximum number of citations to an author's paper can give a misleading picture of a scientist's contribution. The h number aims to give a better indication of an author's cumulative effect and relevance.

Of course, as Lance points out, the h number can mislead, too. This number is dependent on the research community, as some communities tend to publish more or less, and cite more or less frequently, than others. It can reward a "clique" of authors who generously cite each other's work. Older authors have written more papers and so will tend to be cited more often than younger authors. Still, it does give us different information than raw counts, and it is as enjoyable as a good baseball statistic.

Now someone has written an h number calculator that uses Google Scholar to track down papers for a specific researcher and then compute the researcher's index. (Of course, this introduces yet another sort of problem... How accurate is Scholar? And do self-citations count?) I love a good statistic and am prone to vanity surfing, so I had to go compute my h number:

The h-number of Eugene Wallingford is 5 (max citations = 22)
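
The definition above turns into code quite directly. Here is a minimal sketch in Python, assuming we already have a list of per-paper citation counts in hand. The function name `h_number` and the sample counts are made up for illustration; this is not the calculator's code, and the list is not real Scholar data.

```python
def h_number(citations):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            # The top `rank` papers all have at least `rank` citations.
            h = rank
        else:
            # Every later paper has even fewer citations, so we can stop.
            break
    return h


# Hypothetical citation counts, chosen only to illustrate the computation.
sample = [22, 9, 7, 6, 5, 3, 1, 0]
print(f"h = {h_number(sample)} (max citations = {max(sample)})")
```

Sorting first keeps the loop simple: once a paper's citation count falls below its rank, no later paper can raise h.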

You can put my 5 into perspective by checking out some folks with much larger numbers. (Seventy?) I'm just surprised that I have a paper with 22 citations.

I also liked one of the comments on Lance's post. It suggests another potentially useful index -- (h * maxC)/1000, where maxC is the number of citations to the author's most cited paper -- which seems to combine breadth of contribution with depth. For the baseball fans among you, this number reminds me of OPS, which adds on-base percentage to slugging percentage. The analogy even feels right. h, like on-base percentage, reflects how the performer contributes broadly to the community (team); maxC, like slugging percentage, reflects the raw "power" of the author (batter).

The commenter then considers a philosophical question:

Lastly, it is not so clear that a person who has published a thousand little theorems is truly a worse scientist than one who has tackled two large conjectures. You don't agree? Paul Erdos was accused of this for most of his life, yet for the last two decades of his life it became very clear that many of those "little theorems" were gateways to entire areas of research.

Alan Kay doesn't publish a huge number of papers, but his work has certainly had a great effect on computing over the last forty years. Baseball has lots of different statistics for comparing the performance of players and teams. Having a large set of tools can both be fun and give a more complete picture of the world.

I suppose that I should get back to work beefing up my h number, or at least doing something administrative...
-----