Formal characterization of the Information Retrieval (IR) problem. IR models: Boolean, Vector and Probabilistic; Fuzzy Set, Extended Boolean, Generalized Vector Space models. Retrieval evaluation. Query languages. Pattern Matching. Query Protocols. Query operations: User Relevance Feedback, Query Expansion and Term Reweighting. Standards of document representation and metadata. Document indexing. Semantic Web principles.
R. Baeza-Yates, B. Ribeiro-Neto, "Modern Information Retrieval", Addison Wesley.
- Knowledge and understanding:
This course provides basic knowledge of Information Retrieval models and introduces the principles and standards of the Semantic Web.
- Applying knowledge and understanding: Particular attention is given to applying this knowledge to techniques for information discovery on the Web: assigning a relevance value to a document with respect to a query, building search engines, gathering and indexing information, and document standards.
Type of Assessment
Written test. There are no intermediate tests. The written test verifies the knowledge acquired about the IR models used to assign a relevance value to a document with respect to a query, and about the evaluation of IR models. The exam grade is based on the evaluation of the exercises included in the test.
Motivations. Basic concepts. The Information Retrieval process.
Information Retrieval (IR) models. Formal characterization of the Information Retrieval (IR) problem. IR models: Boolean Model, Vector Model, Probabilistic Model. Fuzzy Set, Extended Boolean, and Generalized Vector Space models.
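As an illustration of the Vector Model, the sketch below ranks documents against a query by the cosine of the angle between tf-idf weighted vectors. It is a minimal example under stated assumptions, not the course's reference formulation: the function names (`tfidf_vectors`, `cosine`, `rank`), the whitespace tokenizer, and the particular weighting (raw term frequency times log(N/df)) are choices made here for illustration only.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse tf-idf vectors (dicts) for a list of document strings.

    Assumption: tokens are lowercase whitespace-separated words and the
    weight is raw term frequency times log(N / document frequency).
    """
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n / df[term]) for term in df}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Return (score, document index) pairs sorted by decreasing similarity."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms that never occur in the collection get weight 0.
    q = {
        term: tf * idf.get(term, 0.0)
        for term, tf in Counter(query.lower().split()).items()
    }
    scores = [(cosine(q, vec), i) for i, vec in enumerate(vectors)]
    return sorted(scores, reverse=True)


docs = ["boolean retrieval model", "vector space model", "probabilistic ranking"]
for score, index in rank("vector model", docs):
    print(f"{score:.3f}  {docs[index]}")
```

Unlike the Boolean Model, which only separates matching from non-matching documents, this scoring yields a graded ranking, so a document sharing only one query term still receives a small non-zero score.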
Precision and Recall. Alternative measures.
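For a single query, precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. The following sketch computes both; the function name `precision_recall` and the document identifiers are hypothetical, chosen only for this example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: ids of the documents returned by the system
    relevant:  ids of the documents judged relevant for the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgments: 4 documents retrieved, 3 relevant, 2 in common.
p, r = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d2", "d4", "d7"],
)
print(f"precision = {p:.2f}, recall = {r:.2f}")
```

Here two of the four retrieved documents are relevant (precision 2/4) and two of the three relevant documents were found (recall 2/3).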
Keyword-Based Querying, Single-Word Queries, Context Queries, Boolean Queries, Natural Language Queries. Pattern Matching. Structural Queries. Query Protocols.
User Relevance Feedback. Query Expansion and Term Reweighting for the Vector Model. Term Reweighting for the Probabilistic Model. Automatic Local Analysis: Query Expansion Through Local Clustering, Query Expansion Through Local Context Analysis. Automatic Global Analysis: Query Expansion Based on a Keyword Thesaurus, Query Expansion Based on a Statistical Thesaurus.
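Query expansion and term reweighting for the Vector Model are commonly illustrated with the Rocchio formula, which moves the query vector toward the documents the user marked relevant and away from those marked non-relevant. The sketch below is one possible rendering under assumptions: the function name `rocchio`, the default values of `alpha`, `beta`, and `gamma`, and the choice to drop terms whose weight becomes non-positive are illustrative, not prescribed by the course.

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a modified query vector after one round of relevance feedback.

    query:       sparse query vector, as a dict term -> weight
    relevant:    list of sparse vectors for documents judged relevant
    nonrelevant: list of sparse vectors for documents judged non-relevant
    alpha, beta, gamma: tuning constants (default values are an assumption)
    """
    terms = set(query)
    for doc in relevant + nonrelevant:
        terms |= set(doc)

    new_query = {}
    for term in terms:
        weight = alpha * query.get(term, 0.0)
        if relevant:
            # Add the centroid of the relevant documents.
            weight += beta * sum(d.get(term, 0.0) for d in relevant) / len(relevant)
        if nonrelevant:
            # Subtract the centroid of the non-relevant documents.
            weight -= gamma * sum(d.get(term, 0.0) for d in nonrelevant) / len(nonrelevant)
        if weight > 0.0:  # assumption: non-positive weights are dropped
            new_query[term] = weight
    return new_query


# Hypothetical feedback: the user meant the car, not the animal.
expanded = rocchio(
    query={"jaguar": 1.0},
    relevant=[{"jaguar": 1.0, "car": 1.0}],
    nonrelevant=[{"jaguar": 1.0, "cat": 1.0}],
    alpha=1.0, beta=0.5, gamma=0.25,
)
print(expanded)
```

The term "car" enters the query (expansion), the weight of "jaguar" changes (reweighting), and "cat" is excluded because its weight would be negative.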
Standards for content representation
Metadata. Texts: formats, information theory, natural language modeling and processing. Markup languages: SGML, HTML, XML.