Gabriella Kazai

Profile summary

I am currently working as a research consultant at Microsoft Bing and Microsoft Research. My background is in computer science, with over 8 years of research experience in information retrieval (IR). My work focuses on user-oriented aspects of IR and personal information management (PIM), with influences from HCI. My research interests include crowdsourcing, IR evaluation, social IR, information-seeking behaviour, activity-based PIM, book search and personal digital libraries. My current focus is on crowdsourcing and search engine evaluation.

I have been the founder and organizer of the INEX Book Track since 2007, and a co-founder and co-organizer of the TREC Crowdsourcing Track since 2011. I am currently working on a book, "Crowdsourcing for Search Engine Evaluation", with Omar Alonso and Stefano Mizzaro, to be published by Springer in 2012.

I hold a PhD in IR from Queen Mary University of London. My PhD work covered the evaluation of focused information retrieval approaches. I have published over 90 papers and have organized several workshops and an IR conference.

Projects

Crowdsourcing Search Relevance: My research in crowdsourcing, in the context of relevance data gathering, focuses on developing methods and metrics to measure how task design decisions influence the output quality of crowdsourcing engagements, and on the human factors that characterize the crowds. I work on methods for detecting spam workers and for analysing bias and noise in the resulting labels, using gold-set data derived from search engine click logs as well as behavioural observations such as personality traits and mouse movement logs.

IR Evaluation Measures: This line of research builds on my PhD and includes the development of algorithms and methods for evaluating the effectiveness of search systems, taking into account the user's browsing behaviour.

Social Information Retrieval: My work in this area focuses on models incorporating the notions of trust and reputation, authoritativeness and popularity to aid personalised retrieval or recommender systems.

Book Search: This work concerns the development of algorithms and systems for domain-specific searching and browsing of collections of digitized books (see www.booksearch.org.uk), as well as associated user studies investigating users' post-query browsing behaviour.

Research Desktop: Designing and developing technologies to aid the everyday work of knowledge workers, with a focus on four key areas: 1) support for activity-based computing, 2) pervasive research tools, 3) library, and 4) notes. See also the Colletta project.

ScholarLynk: Design and development of a cloud architecture and a desktop client prototype supporting the collaborative use and sharing of scholarly search results through reading lists.

INEX Book Track: I have been the founder and organizer of the Book Track at the INEX evaluation initiative since 2007. The track investigates techniques to support users in searching and navigating the full texts of digitized books and complementary social media. My current research here focuses on methodology and systems for evaluating book search engines and on crowdsourcing relevance judgements on parts of books. In this context, I developed a crowdsourcing system that collects relevance judgements for digitized books as part of a social game.

TREC Crowdsourcing Track: I am a co-founder and co-organizer of the TREC Crowdsourcing Track, together with Matthew Lease, Panagiotis G. Ipeirotis and Mark D. Smucker. The track investigates crowdsourcing techniques for IR evaluation. In 2012, the goal is to evaluate approaches to crowdsourcing high-quality relevance judgements for two types of media: 1) web pages and 2) images.

Selected Publications

  1. O. Alonso, G. Kazai, S. Mizzaro: Crowdsourcing for Search Engine Evaluation. Springer. 2012.
  2. G. Kazai, J. Kamps, N. Milic-Frayling: Human Factors and Label Accuracy in Crowdsourcing Relevance Judgements. IR Journal Special Issue on Crowdsourcing, 2012.
  3. G. Kazai, J. Kamps, M. Koolen, N. Milic-Frayling: Crowdsourcing for Book Search Evaluation: Impact of Quality on Comparative System Ranking. SIGIR 2011.
  4. G. Kazai: In Search of Quality in Crowdsourcing for Search Engine Evaluation. ECIR 2011: 165-176.
  5. G. Kazai, P. Manghi, K. Iatropoulou, T. Haughton, M. Mikulicic, A. Lempesis, N. Milic-Frayling, N. Manola: Architecture for a Collaborative Research Environment Based on Reading List Sharing. ECDL 2010: 294-306.
  6. G. Oleksik, M. Wilson, C. Tashman, E. Mendes Rodrigues, G. Kazai, N. Milic-Frayling, R. Jones: Lightweight Tagging Expands Information and Activity Management Practices. CHI 2009.
  7. C. T. Lee, E. Mendes Rodrigues, G. Kazai, N. Milic-Frayling, A. Ignjatovic: Model for Voter Scoring and Best Answer Selection in Community Q&A Services. WI 2009.
  8. G. Kazai, N. Milic-Frayling, J. Costello: Towards methods for the collective gathering and quality control of relevance assessments. SIGIR 2009.
  9. G. Kazai, N. Milic-Frayling: Effects of Social Approval Votes on Search Performance. 6th International Conf. on Information Technology: New Generations (ITNG’09), Social Computing Track, 2009.
  10. M. Koolen, G. Kazai, N. Craswell: Wikipedia Pages as Entry Points for Book Search. WSDM 2009.
  11. G. Kazai, N. Milic-Frayling: Trust, authority and popularity in social information retrieval. CIKM 2008: 1503–1504. Best poster award.
  12. S. Ali, M. Consens, G. Kazai, M. Lalmas: Structural relevance: A common basis for the evaluation of structured document retrieval. CIKM 2008. Best paper runner-up.
  13. G. Kazai, B. Piwowarski, S. Robertson: Effort-precision and gain-recall based on a probabilistic navigation model. In Studies in Theory of Information Retrieval (Proceedings of ICTIR 2007), pp. 23–36, Foundation for Information Society, Budapest, 2007.
  14. G. Kazai, M. Lalmas: eXtended cumulated gain measures for the evaluation of content-oriented XML retrieval. ACM Trans. Inf. Syst., vol. 24, no. 4, pp. 503-542, 2006.
  15. G. Kazai, M. Lalmas, A.P. de Vries: The overlap problem in content-oriented XML retrieval evaluation. SIGIR 2004.
More publications are listed on my DBLP and Google Scholar profiles.