
Soft peer review? Social software and distributed scientific evaluation

February 21st, 2007
For an extended version of this post, see also:
D. Taraborelli (2008), Soft peer review. Social software and distributed scientific evaluation, Proceedings of the 8th International Conference on the Design of Cooperative Systems (COOP 08), Carry-Le-Rouet, France, May 20-23, 2008

Online reference managers are extraordinary productivity tools, but it would be a mistake to see this as their main value for the academic community. As is often the case with social software services, online reference managers are becoming powerful and virtually cost-free solutions for collecting large sets of metadata, in this case collaborative metadata on scientific literature. Taken at the individual level, such metadata (i.e. tags and ratings added by individual users) are hardly of interest, but on a large scale I suspect they will provide information capable of outperforming more traditional evaluation processes in terms of coverage, speed and efficiency. Collaborative metadata cannot offer the same guarantees as standard selection processes (insofar as they do not rely on expert review and are more exposed to biases and manipulation), but they are an interesting solution for producing evaluative representations of scientific content on a large scale.

I recently had the chance to meet one of the developers of CiteULike, which was a good occasion to think about the impact these tools may have in the long run on scientific evaluation and refereeing processes. My feeling is that academic content providers (including publishers, scientific portals and bibliographic databases) will come under pressure to integrate metadata from social software services as soon as the potential of such services is fully acknowledged. In this post I will try to unpack this idea.

Traditional peer review has been criticised on various grounds, but possibly the major limitation it currently faces is scalability, i.e. the ability to cope with an increasingly large number of submissions, which, given the limited number of available reviewers and the time constraints of the publication cycle, results in a relatively small acceptance rate for high-quality journals. Although I don't think social software will ever replace hard evaluation processes such as traditional peer review, I suspect that soft evaluation systems (such as those made possible by social software) will soon overtake them in terms of efficiency and scalability. The following is a list of areas in which I expect social software services targeted at the academic community to challenge traditional evaluation processes.

Semantic metadata

A widely acknowledged application of tags as collaborative metadata is their use as indicators of semantic relevance. Tagging is the most popular example of how social software, according to its advocates, has helped overcome the limits of traditional approaches to content categorization. Collaboratively produced tags can be used to extract similarity patterns or for automatic clustering. In the case of academic literature, tags can provide extensive lists of keywords for scientific papers, often more accurate and descriptive than those originally supplied by the author. The following is an example of tags applied by del.icio.us users to a popular article about tagging, ordered by the number of users who selected each tag.

[Screenshot: tags applied by del.icio.us users, ranked by number of users]

Similar lists can be found in CiteULike or Connotea, although neither of these services seems to have realized so far how important it is to rank tags by the number of users who applied them to a specific item. Services that allow tags for a given item to be aggregated across users are in the best position to become providers of reliable semantic metadata for large sets of scientific articles at virtually no cost.
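
To illustrate the kind of aggregation I have in mind, here is a minimal Python sketch with invented data: it ranks the tags attached to one item by the number of distinct users who applied them. The record format and identifiers are assumptions for illustration, not an actual CiteULike or Connotea data model.

```python
from collections import Counter

# Hypothetical bookmark records: (user, item_id, tag) triples, roughly the kind
# of data an online reference manager accumulates. Purely illustrative.
bookmarks = [
    ("alice", "doi:10.1000/xyz", "folksonomy"),
    ("bob",   "doi:10.1000/xyz", "folksonomy"),
    ("bob",   "doi:10.1000/xyz", "tagging"),
    ("carol", "doi:10.1000/xyz", "folksonomy"),
    ("carol", "doi:10.1000/xyz", "metadata"),
]

def ranked_tags(bookmarks, item_id):
    """Return the tags for one item, ranked by the number of distinct users."""
    # Count each (user, tag) pair only once, so a single user cannot inflate a tag.
    pairs = {(user, tag) for user, item, tag in bookmarks if item == item_id}
    counts = Counter(tag for _, tag in pairs)
    return counts.most_common()

print(ranked_tags(bookmarks, "doi:10.1000/xyz"))
# e.g. [('folksonomy', 3), ('tagging', 1), ('metadata', 1)] (tie order may vary)
```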

Popularity

Another fundamental type of metadata that can be extracted from social software is the popularity indicator. Looking at how many users have bookmarked an item in their personal reference library can provide a reliable measure of the popularity of that item within a given community. Understandably, academically oriented services (like CiteSeer, Web of Science and others) have so far focused on citations, which are the standard indicator of a paper's authority in the bibliometric tradition. My feeling is that popularity indicators from online reference managers will eventually become a factor as crucial as citations for evaluating scientific content. This may sound paradoxical if we consider that authority measures were introduced precisely to avoid the typical biases of popularity measurements. But insofar as popularity data are extracted from the natural behavior of a service's users (e.g. users bookmarking an item because they are genuinely interested in reading it, not to boost its popularity), they can provide fairly accurate information on which papers are frequently read and cited in a given area of science. It would actually be interesting to conduct a study on a representative sample of articles, comparing the distribution of citations with the distribution of popularity indicators (such as bookmarks in online reference managers), to see whether there is any significant correlation.
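
As a sketch of what such a study could look like, the following computes a Spearman rank correlation between citation counts and bookmark counts for a handful of articles. The numbers are invented and serve only to show the shape of the analysis; the choice of Spearman's rho and the use of scipy are my own assumptions.

```python
from scipy.stats import spearmanr

# Invented example data: for a sample of articles, the number of citations
# (e.g. from a citation database) and the number of bookmarks (e.g. from an
# online reference manager such as Connotea).
citations = [120, 45, 3, 67, 15, 0, 210, 32]
bookmarks = [300, 90, 10, 150, 20, 2, 400, 55]

# Spearman's rho compares the rankings rather than the raw counts, which is
# appropriate given how skewed both distributions tend to be.
rho, p_value = spearmanr(citations, bookmarks)
print(f"Spearman rank correlation: {rho:.2f} (p = {p_value:.3f})")
```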

del.icio.us has recently realized the strategic importance of redistributing the popularity data it collects: it now offers the possibility of displaying, on external websites, a popularity badge based on the number of users who added a specific URL to their bookmarks. Similar ideas have been in circulation for years (consider for example Google's PageRank indicator or Alexa's rank in their browser toolbars), but it seems that social software developers have only recently caught up with them. Connotea, CiteULike and similar services should consider giving back to content providers (from which they borrow metadata) the ability to display the popularity indicators they produce. When this happens, it is not unlikely that publishers will start displaying popularity indicators on their websites (e.g. "this article was bookmarked 10,234 times in Connotea") to promote their content.

Hotness

“Hotness” can be described as an indicator of short-term popularity, a useful measure for identifying emerging trends within specific communities. Mapping popularity distributions onto a temporal scale is actually a common practice: authoritative indicators such as the ISI Impact Factor take into account the frequency of citations that articles receive within specific timeframes. Similar criteria are used by social software services (such as del.icio.us, Technorati and others) to determine “what’s hot” over the last few days of activity.

Online reference managers have recently started to look at such indicators. In its current implementation, CiteULike extracts measures of hotness by explicitly asking users to vote for articles they like. The goal, CiteULike developer Richard Cameron explains, is to “catch influential papers as soon as possible after publication”. I think in this case they got it wrong. Relying on votes (whether or not they are combined with other metrics) is certainly not the best way of extracting meaningful popularity information from users, insofar as most users who use these services for work will never bother to vote, whereas a large part of those who do vote may do so for opportunistic reasons. I believe that in order to provide reliable indicators, popularity measures should rely on patterns implicitly generated by user behavior: the best way to know what users prefer is certainly not to ask them, but to extract meaningful patterns from what they naturally do when using a service. Hopefully online reference management services will soon realize the importance of extracting measures of recent popularity in an implicit and automatic way; most mature social software projects have addressed this issue by avoiding explicit votes.
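
To make the implicit approach concrete, here is a minimal sketch that derives a “hotness” ranking purely from timestamped bookmarking events, with no explicit voting involved. The event format and the seven-day window are assumptions for illustration, not features of any existing service.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical bookmarking events: (item_id, time at which some user added it).
events = [
    ("doi:10.1000/abc", datetime(2007, 2, 20, 14, 30)),
    ("doi:10.1000/abc", datetime(2007, 2, 21, 9, 10)),
    ("doi:10.1000/xyz", datetime(2007, 1, 5, 11, 0)),
    ("doi:10.1000/abc", datetime(2007, 2, 21, 17, 45)),
]

def hot_items(events, now, window=timedelta(days=7)):
    """Rank items by the number of bookmarks received within the recent window."""
    recent = Counter(item for item, ts in events if now - ts <= window)
    return recent.most_common()

print(hot_items(events, now=datetime(2007, 2, 22)))
# e.g. [('doi:10.1000/abc', 3)] — the older bookmark of the other item is ignored
```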

Collaborative annotation

One of the most underrated (and, in my opinion, most promising) aspects of online reference managers is the ability they give users to collaboratively annotate content. Users can add reviews to the items they bookmark, thus producing lists of collaborative annotations. This is interesting because adding annotations is something individual users naturally do when bookmarking references in their library. The problem with such reviews is that they can hardly be used to extract meaningful evaluative data on a large scale.

The obvious reason why collaborative annotation cannot be compared, in this sense, with traditional refereeing is that the expertise of the reviewer is questionable. Is there a viable strategy for making collaborative annotation more reliable while retaining the advantages of social software? One solution would be to start rating users as a function of their expertise. Asking users to rate each other is definitely not the way to go: as in the case of “hotness” measures based on explicit votes, mutual user rating is a strategy that is easy to bias.

The solution I’d like to suggest is that online reference management systems implement an idea similar to anonymous refereeing, while making the most of their social software nature. The most straightforward way to achieve this would be, I believe, a wiki-like system coupled with anonymous rating of user contributions. Each item in the reference database would be matched to a wiki page where users could freely contribute their comments and annotations. Crucially, each annotation would be displayed anonymously to other users, who would then have the option of saving it in their own library if they consider it useful. This behavior (i.e. importing useful annotations) could then be taken as a positive rating for the author of the annotation, whose overall score would result from the number of her anonymous contributions that other users imported. It is then easy to see how user expertise could be measured with respect to different topics: if user A received a large number of positive ratings for comments posted on papers heavily tagged with “dna”, this would be an indicator of her expertise on the “dna” topic within the user community. User A would have different degrees of expertise for topics “tag1”, “tag2” and “tag3”, as a function of how useful other users found her anonymous annotations to papers tagged respectively with “tag1”, “tag2” and “tag3”.
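
Here is a minimal sketch of how such per-topic expertise scores could be computed from import events; the data structures and field names are my own assumptions for illustration, not part of any existing service.

```python
from collections import Counter, defaultdict

# Hypothetical data: the tags users applied to each paper...
paper_tags = {
    "paper1": {"dna", "genetics"},
    "paper2": {"dna"},
    "paper3": {"tagging"},
}
# ...and import events, recorded whenever a reader saves someone's (anonymously
# displayed) annotation into their own library: (annotation_author, paper_id).
imports = [
    ("userA", "paper1"),
    ("userA", "paper1"),
    ("userA", "paper2"),
    ("userB", "paper3"),
]

def expertise_scores(imports, paper_tags):
    """Count, per author and per tag, how often their annotations were imported."""
    scores = defaultdict(Counter)
    for author, paper in imports:
        for tag in paper_tags.get(paper, ()):
            scores[author][tag] += 1
    return scores

scores = expertise_scores(imports, paper_tags)
print(scores["userA"]["dna"])      # 3 imports on dna-tagged papers
print(scores["userB"]["tagging"])  # 1
```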

This is just an example, and several variations of this scheme are possible. My feeling is that allowing indirect rating of metadata posted via anonymous contributions would make it possible to implement a sort of soft peer review process at scale, and would allow social software services to aggregate much larger sets of evaluative metadata about scientific literature than traditional reviewing models will ever be able to provide.

The place of social software in scientific knowledge production

I’ve suggested a few ways in which online reference management systems could be used to extract evaluative indicators of scientific literature from user behavior. In the long run, I expect these bottom-up, distributed processes to become more and more valuable to the academic community, and traditional publishers to become increasingly aware of the usefulness of metadata collected through social software.

This will be possible if online reference management services start developing facilities (ideally programmable interfaces, i.e. APIs) to expose the data they collect and feed them back to potential consumers (publishers, individual users or other services); a sketch of what such an interface might expose appears at the end of this post. The future place of online reference managers, the way I wish it to be, is that of intermediate providers of collaborative metadata, sitting between information producers and information consumers. To quote a recent post on the future of the mashup economy:

[Y]ou don’t have to have your own data to make money off of data access. Right now, there’s revenue to be had in acting as a one-stop shop for mashup developers, essentially sticking yourself right between data providers and data consumers.

I think a similar strategy could justify a strong presence of these services in the scientific arena. If they succeed in doing this, they will come to occupy a crucial function in the system of scientific knowledge production and challenge traditional processes of scientific content evaluation.
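
To make this concrete, here is a purely hypothetical sketch of the kind of aggregated metadata such a service could feed back to publishers and other consumers. The response format and field names are invented for illustration and do not describe any actual Connotea or CiteULike interface.

```python
import json

# Invented example of a JSON response that an online reference manager's
# metadata API might return for a single article.
example_response = """
{
  "doi": "10.1000/xyz",
  "bookmarks": 10234,
  "bookmarks_last_30_days": 312,
  "top_tags": [["folksonomy", 857], ["tagging", 644], ["metadata", 301]]
}
"""

def render_badge(raw_json):
    """Turn aggregated collaborative metadata into a publisher-side badge string."""
    data = json.loads(raw_json)
    tags = ", ".join(tag for tag, _ in data["top_tags"])
    return (f"This article was bookmarked {data['bookmarks']:,} times "
            f"({data['bookmarks_last_30_days']} in the last 30 days). "
            f"Readers tagged it: {tags}.")

print(render_badge(example_response))
```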

tags: social bookmarking
