MULTIVARIATE ANALYSIS lecture 01 · Introduction • Multivariate Analysis (Chemometrics 1) •...

Multivariate Analysis

Prof. Dr. Anselmo E de Oliveira

anselmo.quimica.ufg.br

anselmo.disciplinas@gmail.com

Introduction

Overview

• Multivariate Analysis (Chemometrics 1)

• Lectures and practical classes – Rooms 304 and 104

• Schedule – Thursdays, 14:00 to 17:50

• My office: 209, IQ-1

• Software – Matlab, Octave, spreadsheets, R,...

• E-mail

– Subject: [Q1] ...

– anselmo.disciplinas@gmail.com

• Course page

– anselmo.quimica.ufg.br, Quimiometria 1 (Chemometrics 1)

– Course materials

– Syllabus

• Course description

• Assessment dates

• Bibliography

Overview

Assessment: Written Assignment

• Write an academic, scientific text on an application of the course content

• The application must be drawn from your graduate research or from a scientific article published in 2011 or later (QUALIS A1... B4)

• Due on the last day of class (26/11)

Introductions

• Who are you?

– Name

– Project in progress

– Academic background

• What do you expect from the course?

Chemometrics

Chemometrics is not a single tool but a range of methods, including basic statistics, signal processing, factorial design, calibration, curve fitting, factor analysis, detection, pattern recognition, and neural networks.

Source: http://www.decisioncraft.com/dmdirect/chemometrics.htm

Chemometrics

[Figure: chemometrics shown alongside related fields: Statistics, Food/Feed, Chemistry, Pharmacy, Engineering, Computing]

Exploratory Data Analysis

• Exploratory data analysis can reveal hidden patterns in complex data by reducing the information to a more comprehensible form

Exploratory Data Analysis

• It can expose possible outliers and indicate whether there are patterns or trends in the data.

Exploratory Data Analysis

• Exploratory algorithms such as principal component analysis (PCA) are designed to reduce large, complex data sets to a small number of optimized, interpretable views.
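As a rough illustration of the PCA idea above, the sketch below extracts only the first principal component of a small data table by power iteration on the covariance matrix. The data values, the function names, and the choice of power iteration are assumptions made for this example; in practice a numerical package (Matlab, Octave, R) would normally be used instead.

```python
import math

def mean_center(rows):
    """Subtract each column mean so every variable is centered at zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance_matrix(centered):
    """Sample covariance matrix (n - 1 denominator) of mean-centered rows."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def first_principal_component(cov, iterations=200):
    """Power iteration: returns (largest eigenvalue, unit loading vector)."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
    return eigenvalue, v

# Hypothetical data: 6 samples (rows) x 3 variables (columns).
data = [
    [2.0, 4.1, 1.0],
    [3.0, 6.0, 1.2],
    [4.0, 7.9, 0.9],
    [5.0, 10.2, 1.1],
    [6.0, 11.8, 1.0],
    [7.0, 14.1, 0.8],
]

centered = mean_center(data)
cov = covariance_matrix(centered)
eigenvalue, loadings = first_principal_component(cov)

# Scores: projection of each sample onto the first component.
scores = [sum(x * l for x, l in zip(row, loadings)) for row in centered]

# Fraction of the total variance captured by the first component.
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
print(f"PC1 loadings: {loadings}")
print(f"Explained variance: {explained:.3f}")
```

Because the first two variables in this made-up table are almost proportional, a single component captures nearly all of the variance, which is the kind of reduction the slide refers to.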

Regression Analysis

• The goal of chemometric regression analysis is to develop a model that relates the information in a set of known measurements to the desired property.

Godinho et al. Talanta 2014, 129, 143

Regression Analysis

• Chemometric algorithms for performing regression include partial least squares (PLS) and principal component regression (PCR).

• Chemometric regression is used extensively for product-quality decisions in on-line monitoring and process control, where fast and inexpensive test systems are needed.
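To make the PLS idea more concrete, here is a minimal sketch of PLS regression with a single response and a single latent component (often called PLS1). The noise-free "spectra", the function names, and the restriction to one component are assumptions for illustration only; real applications use more components, chosen by validation.

```python
import math

def pls1_one_component(X, y):
    """Fit a one-component PLS1 model; returns (coefficients, intercept)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(col) / n for col in zip(*X)]
    y_mean = sum(y) / n
    Xc = [[x - m for x, m in zip(row, x_mean)] for row in X]
    yc = [v - y_mean for v in y]

    # Weight vector: direction in X-space with maximum covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]

    # Scores of each sample on the latent component.
    t = [sum(xij * wj for xij, wj in zip(row, w)) for row in Xc]

    # Regress the centered response on the scores.
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)

    coef = [wj * q for wj in w]
    intercept = y_mean - sum(c * m for c, m in zip(coef, x_mean))
    return coef, intercept

def predict(coef, intercept, x):
    """Predict the property for one new measurement vector x."""
    return intercept + sum(c * v for c, v in zip(coef, x))

# Hypothetical calibration set: absorbance at 3 wavelengths, exactly
# proportional to concentration (no noise, for clarity).
pure_spectrum = [0.5, 1.0, 0.25]
concentrations = [1.0, 2.0, 4.0, 5.0]
X = [[c * s for s in pure_spectrum] for c in concentrations]
y = concentrations

coef, intercept = pls1_one_component(X, y)
new_spectrum = [3.0 * s for s in pure_spectrum]
print(predict(coef, intercept, new_spectrum))
```

With noise-free, perfectly proportional data, one component is enough to recover the concentration of the new sample; with real spectra the prediction would only be approximate.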

Classification Model

• It is used to predict a sample's class by comparing the sample to a previously analyzed experience set, in which categories are already known

Hyperspectral Imaging and Chemometric Modeling of Echinacea — A Novel Approach in the Quality Control of Herbal Medicines Sandassi et al. Molecules 2014, 19(9), 13104-13121

Classification Model

• k-nearest neighbour (k-NN) is primarily used in Chemometrics.

– This can be thought of as separating a chromatographic data set from a spectroscopic data set and then doing the analysis.
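The k-NN rule mentioned above can be sketched in a few lines: a new sample receives the majority class among its k closest training samples. The two-variable training set, the class labels "A" and "B", and the use of Euclidean distance are assumptions chosen for this illustration.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training samples."""
    # Pair every training sample's Euclidean distance to x with its label.
    distances = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical experience set: two measured variables per sample.
train_X = [
    [1.0, 1.1], [1.2, 0.9], [0.8, 1.0],   # class "A"
    [5.0, 5.2], [5.1, 4.8], [4.9, 5.0],   # class "B"
]
train_y = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_X, train_y, [1.1, 1.0]))
print(knn_predict(train_X, train_y, [4.7, 5.3]))
```

An odd k avoids tied votes when there are two classes; variables measured on very different scales would usually be scaled before computing distances.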

Classification Model

• When these techniques are used to create a classification model, the answers provided are more reliable and include the ability to reveal unusual samples in the data.

• Therefore, Chemometrics helps in standardizing data.

The Analytical Process

[Figure: the analytical process. Stages shown: system; samples; measurements (voltages, currents, volumes...); information (chemical concentrations...); knowledge of properties of system. Methods labeled on the diagram: sampling theory, experimental design, optimization, information/control theory, calibration/resolution, data mining, exploratory data analysis, luck.]

Source: M.A. Sharaf; D.L. Illman; B.R. Kowalski, Chemical Analysis: Chemometrics

Challenges with Data Analytics

Aggregating data from multiple sources

Cleaning data

Choosing a model

Moving to production

Multivariate Data

• Automated analysis: large amounts of data

• Chromatography and spectroscopy methods

• From one sample/analyte at a time to many samples/analytes

• Many variables are measured on each run

Univariate vs. Multivariate Data

Univariate data

• Product humidity monitored over one month

• Lunch time of an employee

• Calibration curve: peak intensity vs. analyte concentration

Multivariate data

• Data sheet including all quality control data

• Historical information regarding employees’ productivity

• Multivariate calibration: spectrum vs. analyte concentration
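The univariate calibration curve listed on this slide (peak intensity against analyte concentration) amounts to fitting a straight line by least squares and then inverting it for an unknown sample. The sketch below uses made-up, noise-free standards; the numbers and function names are assumptions for illustration.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept

def concentration_from_intensity(intensity, slope, intercept):
    """Invert the calibration line to estimate an unknown concentration."""
    return (intensity - intercept) / slope

# Hypothetical calibration standards (concentration, peak intensity).
concentration = [0.0, 1.0, 2.0, 3.0, 4.0]
intensity = [0.1, 2.1, 4.1, 6.1, 8.1]

slope, intercept = fit_line(concentration, intensity)
unknown = concentration_from_intensity(5.1, slope, intercept)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, unknown={unknown:.3f}")
```

Multivariate calibration replaces the single peak intensity with a whole spectrum per sample, which is why a different family of regression methods is needed.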

Univariate vs. Multivariate Data

[Figure: plots of variables V1, V2 and V3; label: covariance]
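The covariance mentioned on this slide measures how two variables vary together, and dividing it by the two standard deviations gives the Pearson correlation. A minimal sketch follows, with three made-up variables named after the figure labels (the values are assumptions):

```python
import math

def covariance(a, b):
    """Sample covariance of two equally long sequences (n - 1 denominator)."""
    n = len(a)
    a_mean = sum(a) / n
    b_mean = sum(b) / n
    return sum((x - a_mean) * (y - b_mean) for x, y in zip(a, b)) / (n - 1)

def correlation(a, b):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(a, b) / math.sqrt(covariance(a, a) * covariance(b, b))

# Hypothetical variables: V2 follows V1 exactly, V3 does not.
v1 = [1.0, 2.0, 3.0, 4.0, 5.0]
v2 = [2.0, 4.0, 6.0, 8.0, 10.0]
v3 = [1.0, -1.0, 0.0, -1.0, 1.0]

print(covariance(v1, v2), correlation(v1, v2))
print(covariance(v1, v3))
```

Univariate analysis looks at each variable on its own and therefore never sees these cross-terms; multivariate methods are built on them.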

Pattern Recognition

• Artificial Intelligence

• 2D and 3D pattern recognition:

– human vs. computer?

– Pattern recognition on measurement data tables/sheets containing lots of numbers and a huge amount of sample information.

Pattern Recognition

• Handwriting

• Printed alphanumeric characters

• Speech recognition

• Speaker recognition

• Fingerprint identification

• Radar

[Figure: an airborne image of an A-3 flight prior to automated motion compensation, image centering, and overlay fitting (a) and the image after automated processing (b)]

• Electrocardiogram

• Weather forecasting

• Stock market

Pattern Recognition

• 217 males

• Diagnosed with bipolar disorder, major depressive disorder, schizophrenia and other psychiatric issues.

• About 20 percent went from no suicidal thoughts to a high level of suicidal thoughts while they were being seen at a clinic at the university.

• Blood samples

• RNA biomarkers that appeared to predict suicidal thinking.

• It is unclear how well the biomarkers would work in the larger population, because the study was limited to high-risk males with psychiatric diagnoses, but the app is reportedly ready to be deployed and tested on a wider group in real-world settings such as emergency rooms.

Pattern Recognition

• Preprocessing techniques are designed to transform the data into the most informative representation for the goal of the study

[Figure: FT-NIR spectra of 268 mineral oil samples: (A) without pretreatment; (B) after baseline correction; (C) after multiplicative scatter correction; (D) after Savitzky–Golay smoothing.]
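Of the pretreatments listed in the caption, Savitzky-Golay smoothing is the easiest to sketch: each point is replaced by the value of a low-order polynomial fitted to a small window around it. The version below assumes a 5-point window and a quadratic polynomial, for which the standard convolution coefficients are (-3, 12, 17, 12, -3)/35; the window size, the handling of the edges, and the test signals are assumptions for this illustration.

```python
# Convolution weights for a 5-point quadratic Savitzky-Golay smoother.
SG_COEFFS = (-3, 12, 17, 12, -3)
SG_NORM = 35

def savgol_smooth_5pt(signal):
    """Smooth interior points; the two points at each edge are left as-is."""
    smoothed = list(signal)
    for i in range(2, len(signal) - 2):
        window = signal[i - 2:i + 3]
        smoothed[i] = sum(c * v for c, v in zip(SG_COEFFS, window)) / SG_NORM
    return smoothed

# A quadratic trend passes through unchanged...
quadratic = [float(x * x) for x in range(7)]
print(savgol_smooth_5pt(quadratic))

# ...while an isolated spike is attenuated (35 becomes 17 at the center).
spike = [0, 0, 35, 0, 0]
print(savgol_smooth_5pt(spike))
```

This is the property that makes the filter popular for spectra: smooth band shapes are largely preserved while point-to-point noise is reduced.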

Pattern Recognition

• Unsupervised learning refers to methods that make no a priori assumptions about the category membership of the samples, but rather assist the analyst in uncovering intrinsic clusters or other patterns in the data

Pattern Recognition

• In supervised (machine) learning, the computer “learns” to optimally classify the samples based on prior knowledge about their category membership.

Pattern Recognition

clustering
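As one possible illustration of clustering, the sketch below implements a basic k-means loop: samples are assigned to the nearest centroid, centroids are recomputed as cluster means, and the two steps repeat. k-means is only one of many clustering methods and is not named in the slides; the data, the fixed number of iterations, and the choice of initial centroids are all assumptions.

```python
import math

def nearest_centroid(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda c: math.dist(point, centroids[c]))

def kmeans(points, centroids, iterations=10):
    """Basic k-means; returns (final centroids, cluster label per point)."""
    centroids = [list(c) for c in centroids]
    for _ in range(iterations):
        # Assignment step: group points by their nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest_centroid(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        centroids = [
            [sum(col) / len(members) for col in zip(*members)]
            if members else old
            for members, old in zip(clusters, centroids)
        ]
    labels = [nearest_centroid(p, centroids) for p in points]
    return centroids, labels

# Hypothetical samples with two measured variables and no class labels.
points = [
    [1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
    [5.0, 5.0], [5.2, 4.9], [4.8, 5.1],
]

# Deterministic start: use the first and last samples as initial centroids.
centroids, labels = kmeans(points, [points[0], points[-1]])
print(labels)
```

No class information is supplied at any point, which is what distinguishes this from the supervised classification models discussed earlier.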
