Cite Zeer

From CommerceNet Wiki

by Rohit Khare; see also Publications

A decentralized paper reading/ranking system?

I was thinking of bringing together aspects of zSearch and ARRESTED when I was leafing through the paper program for WOSS'04. At Hypertext'04, Jim Whitehead's students had the latest in a long run of bibliometric papers to recommend good 'background reading' by iteratively crawling the references of a given paper. I was struck by a possible typo that made me wonder if an author with the same last name in the same community was likely to be the one, or if this was a different, new principal investigator. To disambiguate that, I immediately found myself relying on subtle cues like the social connectivity of the program committee and even the grouping of papers in sessions -- information that's entirely lost in the ordinary bibliographic citation process.

  • Can we calculate the relevance of papers by computing links through "hidden" nodes such as PCs, timeslots, and other social cues we could crawl from the web?
  • Could we showcase a plethora of 'naive' Bayesian estimators (or, as Armando Fox has been generalizing it, statistical learning theory (SLT)) to weigh questions such as:
    • Is an initial the same as a first name, given the rarity of the last name, co-occurrence with other authors, discipline, prior probability of a typo on certain keyboards, etc.?
    • Is affiliation stable over the years? Can we attempt to infer affiliation at the PI level, not just the departmental level?
  • Could we recognize that the same paper can appear over the years due to publishing cycles -- that is, when grouping "2002" season papers, what publication date do you attribute to a paper submitted in '02, revised in '03, and published in '04?
  • Could we detect co-occurrence by grant numbers, granting agencies, even trace the impact of granting officers?
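
As a rough illustration of the naive Bayesian idea above, here is a minimal sketch of how independent cues might be combined to estimate whether two author strings refer to the same person. The function name, the cue names, and every probability below are hypothetical placeholders invented for illustration; none of them come from measured data or from any existing system.

```python
import math


def same_author_probability(prior, evidence):
    """Combine cues naive-Bayes style, treating them as independent.

    prior:    assumed P(same person) before looking at any cue.
    evidence: list of (P(cue | same person), P(cue | different person)).
    """
    # Work in log-odds so each cue simply adds its log-likelihood ratio.
    log_odds = math.log(prior / (1.0 - prior))
    for p_given_same, p_given_different in evidence:
        log_odds += math.log(p_given_same / p_given_different)
    # Convert log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))


# Made-up likelihoods for three of the cues mentioned in the list above.
cues = {
    "initial_matches_first_name": (0.95, 0.10),
    "shares_a_coauthor": (0.60, 0.05),
    "same_program_committee_circle": (0.70, 0.20),
}

p = same_author_probability(prior=0.5, evidence=list(cues.values()))
print(f"P(same author) = {p:.4f}")
```

The independence assumption is exactly what makes the estimator 'naive': cues such as shared co-authors and shared program committees are probably correlated, so a real system would likely need to discount overlapping evidence.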

Basically, as we keep looking into RSS and email as PersonalWeb/Fisher problem domains, new research papers are pretty similar to new mail -- and they definitely form an information/reputation market...

So the cool-demo-aspect would be to see if we could produce a much slicker UI than CiteSeer, something a lot more like Amazon and "Search Inside The Paper" with the encouragement of reviews and other-people-also-read-these -- even URL History Bloom Filters could be relevant.
