MEI dissertation proposal
Title: Benchmarking and Tuning of a Data Science Framework over Spark (Feedzai)
Proposer(s): Pedro Bizarro (feedzai.com)
Vitor Duarte
Credits: 42 ECTS
Scientific area: Computer Systems and Networks
Preferred start: Any semester
URL:
Preliminary work already under way by the student(s):
Short description: In data science applied to fraud detection, working with datasets that span several terabytes is routine. Feedzai has a Data Science Framework that integrates with Spark to process large volumes of data across many cluster nodes.
The main objective of this internship is to learn the most common workloads run with the Data Science Framework and to benchmark them on the cluster, in order to find the combination of Data Science Framework and Spark settings that minimizes job completion time. The student will tackle problems such as minimizing network I/O, garbage-collection overhead, and CPU bottlenecks.
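As a rough illustration of the benchmarking loop described above, the sketch below expands a small grid of Spark settings, times each configuration over several runs, and keeps the one with the lowest median completion time. It is a minimal sketch under stated assumptions: the configuration keys are real Spark properties, but the candidate values, the `job.py` script, and the helper names (`configurations`, `spark_submit_command`, `benchmark`) are hypothetical, and the proposal does not describe Feedzai's actual tuning method.

```python
import itertools
import statistics
import time
from typing import Callable, Dict, List, Optional, Tuple

Config = Dict[str, str]

# Hypothetical search space: the keys are real Spark properties, but the
# candidate values are illustrative assumptions, not Feedzai's settings.
SEARCH_SPACE: Dict[str, List[str]] = {
    "spark.executor.cores": ["2", "4"],                  # CPU parallelism per executor
    "spark.sql.shuffle.partitions": ["200", "800"],      # shuffle granularity
    "spark.serializer": [                                # affects shuffle/network volume
        "org.apache.spark.serializer.JavaSerializer",
        "org.apache.spark.serializer.KryoSerializer",
    ],
    "spark.executor.extraJavaOptions": [                 # executor JVM garbage collector
        "-XX:+UseG1GC",
        "-XX:+UseParallelGC",
    ],
}


def configurations(space: Dict[str, List[str]]) -> List[Config]:
    """Expand the search space into every combination (a full grid)."""
    keys = sorted(space)
    return [dict(zip(keys, values))
            for values in itertools.product(*(space[k] for k in keys))]


def spark_submit_command(config: Config, app: str) -> List[str]:
    """Build a spark-submit argument list passing each setting via --conf."""
    cmd = ["spark-submit"]
    for key, value in sorted(config.items()):
        cmd += ["--conf", f"{key}={value}"]
    return cmd + [app]


def benchmark(configs: List[Config],
              run_job: Callable[[Config], None],
              repeats: int = 3) -> Tuple[Optional[Config], float]:
    """Time each config `repeats` times; return the one with the lowest median."""
    best_config: Optional[Config] = None
    best_time = float("inf")
    for config in configs:
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_job(config)                      # blocks until the job finishes
            timings.append(time.perf_counter() - start)
        median = statistics.median(timings)     # damp noise from cluster contention
        if median < best_time:
            best_config, best_time = config, median
    return best_config, best_time
```

On a real cluster, `run_job` could be `lambda c: subprocess.run(spark_submit_command(c, "job.py"), check=True)`. Taking the median of repeated runs reduces the influence of one-off slowdowns; note that a full grid grows multiplicatively with every added setting, so larger searches may call for random or staged sampling instead.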
Notes: Topic proposed by the company Feedzai SA, to be carried out at the company.