Science depends on organized information within a specific field in order to generate new, structured knowledge. Experts produce large amounts of specialized text and databases, much of it unstructured. Text mining has been used to organize these documents because its goal is to discover knowledge in a text corpus: the documents are collected in a repository and classified, so that inferences can be generated from them automatically.
This project focuses on document classification using text mining, through a classification model generated with the open-source software Weka. Weka is a collection of machine learning algorithms for knowledge discovery, and it makes it easy to preprocess the training documents and compare different algorithm configurations. The accuracy of the resulting predictive model will be measured using a confusion matrix. The project will illustrate text mining preprocessing and classification in Weka. Its deliverables are a tool that generates the ARFF input data files and a video tutorial, in English and Spanish, on document classification in Weka.
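The ARFF generator is not described in detail here. What follows is a minimal Python sketch under assumed conventions, not the project's actual tool. Each labeled document becomes one instance with a string attribute `text` and a nominal attribute `class`. Weka's StringToWordVector filter can later turn that layout into word features. The function names `quote_arff`, `build_arff`, and `accuracy` are hypothetical, and the sample documents are invented for illustration. The `accuracy` helper shows how the accuracy figure follows from a confusion matrix: the correct predictions on the diagonal divided by the total number of predictions.

```python
import re
from typing import List, Tuple


def quote_arff(value: str) -> str:
    """Quote a value for ARFF (hypothetical helper).

    Whitespace runs (including newlines) are collapsed to one space, and
    backslashes and single quotes are escaped before wrapping in quotes.
    """
    cleaned = re.sub(r"\s+", " ", value).strip()
    cleaned = cleaned.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{cleaned}'"


def build_arff(relation: str, docs: List[Tuple[str, str]]) -> str:
    """Build ARFF text from (document_text, class_label) pairs (hypothetical helper).

    Assumed layout: one string attribute holding the raw text and one
    nominal class attribute whose values are all labels seen in the data.
    """
    classes = sorted({label for _, label in docs})
    lines = [
        f"@relation {quote_arff(relation)}",
        "",
        "@attribute text string",
        "@attribute class {" + ",".join(quote_arff(c) for c in classes) + "}",
        "",
        "@data",
    ]
    for text, label in docs:
        lines.append(f"{quote_arff(text)},{quote_arff(label)}")
    return "\n".join(lines) + "\n"


def accuracy(confusion: List[List[int]]) -> float:
    """Accuracy from a square confusion matrix: diagonal sum over total count."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


if __name__ == "__main__":
    # Invented sample documents, for illustration only.
    sample = [("Goal scored late", "sports"), ("Parliament votes", "politics")]
    print(build_arff("documents", sample))
```

Writing the returned string to a `.arff` file should give something Weka's Explorer can open. Collapsing whitespace is a simplifying choice that keeps each instance on one line. A real tool might instead escape newlines to preserve the original formatting.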
By Valeria Guevara