Master's Defense - 31/08 - 10:00 - Felipe Santiago Martins Coimbra de Melo

The Board of the Graduate Program in Computer Science of the Universidade Federal de Ouro Preto is pleased to invite you to attend the defense of the master's dissertation entitled "s-WIM - A SCALABLE WEB INFORMATION MINING TOOL", to be defended by the student Felipe Santiago Martins Coimbra de Melo on 31 August 2012, at 10:00, in the multimedia room of DECOM-UFOP. The examining committee will be composed of the following members:

Álvaro Pereira Jr. (Advisor)
Fabrício Benevenuto (DECOM - UFOP)
Joubert Lima (DECOM - UFOP)
Nívio Ziviani (DCC - UFMG)

ABSTRACT: Web mining can be seen as the process of discovering patterns from the Web by means of data mining techniques. Web mining is a computation-intensive task, and most mining software is developed ad hoc, which makes it hard to scale and hard to reuse for other mining tasks. Web mining is also an iterative process, in which prototyping plays an essential role both in experimenting with different alternatives and in incorporating knowledge acquired in previous iterations.

Web Information Mining (WIM) is a model for fast Web mining prototyping. The main motivation behind the development of WIM was that its conceptual model gives users a high level of abstraction, appropriate for prototyping and experimenting during mining tasks.

WIM is composed of a data model and an algebra. The WIM data model is a relational view of Web data. The three existing types of Web data, namely Web content, Web structure and Web usage, are represented by relations. The main input components for the WIM data model are the Web pages, the hyperlink structure linking those pages, and the query logs obtained from Web users' navigation. WIM derives a declarative programming language from its algebra. The WIM programming language is based on dataflows, in which sequences of operations are applied to relations. The operations are defined by the WIM algebra, which contains operators for data manipulation and for data mining.
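To make the idea of a dataflow over relations more concrete, the following is a minimal sketch in Python. It is not the WIM language or its actual operator set: the relation layouts, the operator names (`select`, `join`, `in_degree`) and the helper `run_dataflow` are all hypothetical, chosen only to illustrate how a sequence of operations might be applied to relational views of Web content and Web structure.

```python
from collections import Counter

# Hypothetical relations: each relation is a list of rows (dicts).
# "pages" stands in for Web content, "links" for Web structure.
pages = [
    {"url": "a.example", "text": "web mining tools"},
    {"url": "b.example", "text": "cooking recipes"},
    {"url": "c.example", "text": "scalable web mining"},
]
links = [
    {"src": "a.example", "dst": "c.example"},
    {"src": "b.example", "dst": "c.example"},
    {"src": "c.example", "dst": "a.example"},
]


def select(relation, predicate):
    """Data manipulation operator (assumed): keep rows matching a predicate."""
    return [row for row in relation if predicate(row)]


def join(left, right, key):
    """Data manipulation operator (assumed): equi-join two relations on `key`."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


def in_degree(link_relation):
    """Simple mining-style operator (assumed): count incoming links per page."""
    counts = Counter(row["dst"] for row in link_relation)
    return [{"url": url, "in_degree": n} for url, n in counts.items()]


def run_dataflow(relation, steps):
    """Apply a sequence of operations to a relation, one after another."""
    for step in steps:
        relation = step(relation)
    return relation


# Dataflow: filter pages by content, attach link statistics, rank by in-degree.
result = run_dataflow(pages, [
    lambda r: select(r, lambda row: "mining" in row["text"]),
    lambda r: join(r, in_degree(links), "url"),
    lambda r: sorted(r, key=lambda row: row["in_degree"], reverse=True),
])

for row in result:
    print(row["url"], row["in_degree"])
```

Each step takes a relation and returns a relation, which is what lets operations be chained freely while prototyping; in this sketch everything runs in memory on a single machine, so it says nothing about how such operators would be distributed.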

The objective of this work is the software design and development of Scalable Web Information Mining (s-WIM), given the data model and the algebra originally presented by WIM. To give the s-WIM operators, and consequently the programs built from them, the intended scalability, the operators were developed on top of Apache Hadoop and HBase, which provide linear scalability for both data storage and processing through the addition of hardware resources.

The main motivation for the development of s-WIM is the lack of a free platform offering both the high level of abstraction provided by the WIM algebra and the scalability necessary to operate on huge data volumes. Furthermore, the high level of abstraction provided by the WIM algebra allows users without expertise in programming languages such as Java or C++ to use s-WIM effectively.

This work presents the design and architecture of s-WIM on top of Hadoop and HBase, as well as details of the implementation of the most complex s-WIM operators. It also presents several experiments performed on s-WIM and their results, which attest to the scalability of s-WIM and, consequently, to its support for mining huge data volumes, including Web data sets.
