Como NÃO fazer pesquisas usando LIKE - Eventials · Como NÃO fazer pesquisas usando LIKE Fabio...

Preview:

Citation preview

Como NÃO fazer pesquisas usando LIKE

Fabio Akita @akitaonrails

CODE MINER

Search está em todos os lugares

SELECT * FROM PRODUCTS WHERE NAME LIKE '%Camisetas%' AND DESCRIPTION LIKE '%Camisetas%' AND NAME NOT LIKE '%Calças%' AND DESCRIPTION NOT LIKE '%Calças%'

Camisetas INDEX SEEK Rápido

Camisetas INDEX SEEK Rápido

Camisetas% INDEX SCAN Quase Rápido

Camisetas INDEX SEEK Rápido

Camisetas% INDEX SCAN Quase Rápido

%Camisetas% TABLE SCAN Indo pra trás

Índices não vão te ajudar

wp-includes/taxonomy.php (1256 até 1545)

WordPress

<?php function get_terms($taxonomies, $args = '') {   ...   if ( !empty($name__like) ) {     $name__like = like_escape( $name__like );     $where .= $wpdb->prepare( " AND t.name LIKE %s",       '%' . $name__like . '%' );   }      if ( ! empty( $description__like ) ) {     $description__like = like_escape( $description__like );     $where .= $wpdb->prepare( " AND tt.description LIKE %s",       '%' . $description__like . '%' );   }   ...      if ( ! empty( $search ) ) {     $search = like_escape( $search );     $where .= $wpdb->prepare( ' AND ((t.name LIKE %s) OR (t.slug LIKE %s))',       '%' . $search . '%', '%' . $search . '%' );   }   ... } ?>

AbstractHelper.php

Magento

<?php

public function getCILike($field, $value, $options = array()) {

  $quotedField = $this->_getReadAdapter()->quoteIdentifier($field);   return new \Zend_Db_Expr($quotedField . ' LIKE ' .     $this->addLikeEscape($value, $options)); }

?>

Rankeamento, Relevância

Rankeamento, Relevância

Frases, Proximidade, Intervalos

Rankeamento, Relevância

Frases, Proximidade, Intervalos

Sinônimos, "Stemmer"

Rankeamento, Relevância

Frases, Proximidade, Intervalos

Sinônimos, "Stemmer"

“More Like This"

Rankeamento, Relevância

Frases, Proximidade, Intervalos

Sinônimos, "Stemmer"

“More Like This"

“Did you mean …?"

Rankeamento, Relevância

Frases, Proximidade, Intervalos

Sinônimos, "Stemmer"

“More Like This"

“Did you mean …?"

Faceting (Terms, Geolocation, etc)

Pesquisa Não-Estruturada

Pesquisa Não-EstruturadaSugestões

Pesquisa Não-Estruturada

Ordenação

Sugestões

Pesquisa Não-Estruturada

Terms Facet

Ordenação

Sugestões

Pesquisa Não-Estruturada

Terms Facet

Ordenação

Agregação

Sugestões

Pesquisa Não-Estruturada

Terms Facet

Ordenação

Paginação

Agregação

Sugestões

SELECT * FROM PRODUCTS WHERE MATCH (NAME, DESCRIPTION) AGAINST ('+Camisetas -Calças' IN BOOLEAN MODE)

CatalogSearch/Model/Resource/Helper.php

Magento

<?php

public function chooseFulltext($table, $alias, $select) {

  $field = new \Zend_Db_Expr(     'MATCH (' . $alias . '.data_index) AGAINST (:query IN BOOLEAN MODE)');

  $select->columns(array('relevance' => $field));   return $field; }

?>

SELECT * FROM PRODUCTS WHERE CONTAINS( (NAME, DESCRIPTION), 'Camisetas AND NOT Calças')

SELECT * FROM PRODUCTS WHERE TO_TSVECTOR(NAME || '' || DESCRIPTION) @@ TO_TSQUERY('Camisetas &! Calças')

Cadeias de Markov

Cadeias de Markov

Índices Invertidos

Cadeias de Markov

Índices Invertidos

Vector Space Model

Cadeias de Markov

Índices Invertidos

Vector Space Model

Okapi BM25

http://u.akita.ws/vsm_example (Exemplo Simplificado)

Vector Space Model

d1 “new york times"

d2 “new york post"

d3 “los angeles times"

angeles log2(3/1)=1.584

los log2(3/1)=1.584

new log2(3/2)=0.584

post log2(3/1)=1.584

times log2(3/2)=0.584

york log2(3/2)=0.584

angeles los new post times york

d1 0 0 1 0 1 1

d2 0 0 1 1 0 1

d3 1 1 0 0 1 0

angeles los new post times york

d1 0 0 0.584 0 0.584 0.584

d2 0 0 0.584 1.584 0 0.584

d3 1.584 1.584 0 0 0.584 0

angeles los new post times york

q 0 0 (2/2)*0.584= 0.584 0 (1/2)*0.584=

0.292 0

q = “new new times"

Distância d1 sqrt(0.584^2+0.584^2+0.584^2) 1.011

Distância d2 sqrt(0.584^2+1.584^2+0.584^2) 1.786

Distância d3 sqrt(1.584^2+1.584^2+0.584^2) 2.316

Distância q sqrt(0.584^2+0.292^2) 0.652

cosSim(d1,q) (0*0+0*0+0.584*0.584+0*0+0.584*0.292+0.584*0) / (1.011*0.652) 0.776

cosSim(d2,q) (0*0+0*0+0.584*0.584+1.584*0+0*0.292+0.584*0) / (1.786*0.652) 0.292

cosSim(d3,q) (1.584*0+1.584*0+0*0.584+0*0+0.584*0.292+0*0) / (2.316*0.652) 0.112

Douglass Cutting

LuceneNutch

Hadoop!

TikaSolr

ElasticSearch

Apache Lucene

150GB/hora 20%-30% tamanho do índice

HTML, XHTML, OOXML, ODF, XML, RSS, OLE2, iWorks (Pages, Numbers, Keynote), PDF, EPUB, RTF, Commons Compress (ar, cpio, Unix dump, tar, zip, gzip, XZ, Pack200, bzip2, 7z, arj e lzma), Audio (javax.sound, MIDI, Mp3), Image (javax.imageio, Tiff, Jpeg), Video (FLV, Flash), Mail (Mbox, RFC822), DWG, Font (TrueType), HDF, e plugins.

InputStream is = new BufferedInputStream( new FileInputStream( new File("sample.pdf"))); !Parser parser = new AutoDetectParser(); ContentHandler handler = new BodyContentHandler( System.out); !Metadata metadata = new Metadata(); !parser.parse(is, handler, metadata, new ParseContext()); !for (String name : metadata.names()) { String value = metadata.get(name); ! if (value != null) { System.out.println("Metadata Name: " + name); System.out.println("Metadata Value: " + value); } }

http://localhost:8983/solr/query?q=title:black

http://localhost:8983/solr/query? q=*:* &fl=id,title,series_s,pubyear_i &sort=pubyear_i desc &group=true &group.main=true &group.field=series_s &facet=true &facet.field=cat

curl "http://localhost:8983/solr/update/extract?literal.id=doc5&defaultField=text” --data-binary @tutorial.html -H 'Content-type:text/html'

Solr ElasticSearch

Solr ElasticSearch

Coordination ZooKeeper Zen Discovery

Solr ElasticSearch

Coordination ZooKeeper Zen Discovery

Shard Splitting Sim Não

Solr ElasticSearch

Coordination ZooKeeper Zen Discovery

Shard Splitting Sim Não

Automatic Shard Rebalancing Não Sim

Solr ElasticSearch

Coordination ZooKeeper Zen Discovery

Shard Splitting Sim Não

Automatic Shard Rebalancing Não Sim

Schema + / - Sim

Solr ElasticSearch

Coordination ZooKeeper Zen Discovery

Shard Splitting Sim Não

Automatic Shard Rebalancing Não Sim

Schema + / - Sim

Nested Typing Não Sim

Solr ElasticSearch

Coordination ZooKeeper Zen Discovery

Shard Splitting Sim Não

Automatic Shard Rebalancing Não Sim

Schema + / - Sim

Nested Typing Não Sim

Queries Key / Value JSON

Solr ElasticSearch

Coordination ZooKeeper Zen Discovery

Shard Splitting Sim Não

Automatic Shard Rebalancing Não Sim

Schema + / - Sim

Nested Typing Não Sim

Queries Key / Value JSON

Distributed Group By Sim Não

Solr ElasticSearch

Coordination ZooKeeper Zen Discovery

Shard Splitting Sim Não

Automatic Shard Rebalancing Não Sim

Schema + / - Sim

Nested Typing Não Sim

Queries Key / Value JSON

Distributed Group By Sim Não

Percolation Queries Não Sim

Setup

cd ~ sudo apt-get update

sudo apt-get install openjdk-7-jre-headless -y

### http://www.elasticsearch.org/download/

wget https://download.elasticsearch.org/elasticsearch/elasticsearch/

elasticsearch-0.90.7.deb

sudo dpkg -i elasticsearch-0.90.7.deb sudo service elasticsearch start

Setup

# Bonsai heroku addons:add bonsai heroku config:add ELASTICSEARCH_URL=`heroku config:get BONSAI_URL` !# Found heroku addons:add foundelasticsearch heroku config:add ELASTICSEARCH_URL=`heroku config:get FOUNDELASTICSEARCH_URL` !# SearchBox heroku addons:add searchbox:starter heroku config:add ELASTICSEARCH_URL=`heroku config:get SEARCHBOX_URL` !# reindex heroku run rake searchkick:reindex CLASS=Product

Setup

# Gemfile - bundle install gem "searchkick" !# app/models/product.rb class Product < ActiveRecord::Base searchkick end !# config/initializers/elasticsearch.rb ENV["ELASTICSEARCH_URL"] = "http://username:password@api.searchbox.io" !# no shell rails r "Product.reindex"

# Search simples products = Product.search "Camisetas" products.each do |product| puts product.name end

# Search simples products = Product.search "Camisetas" products.each do |product| puts product.name end

# Search com campos Product.search "Camisetas", fields: [:name, :description] where: { in_stock: true, expires_at: {gt: 1.week.from_now}, or: [ [{in_stock: true}, {backordered: true}] ] }, order: {_score: :desc}, # relevant first limit: 10, offset: 50 # , page: params[:page], per_page: 20

# Sinonimos class Product < ActiveRecord::Base searchkick synonyms: [ ["pc", "computador pessoal"], ["word", "microsoft office"] ] end

# Sinonimos class Product < ActiveRecord::Base searchkick synonyms: [ ["pc", "computador pessoal"], ["word", "microsoft office"] ] end

# Sugestões class Product < ActiveRecord::Base searchkick suggest: ["name"] end !products = Product.search "coldminer ", suggest: true products.suggestions # ["codeminer"]

class City < ActiveRecord::Base searchkick autocomplete: ["name"] end !City.search "Sao P", autocomplete: true

# app/controllers/cities_controller.rb class CitiesController < ApplicationController def autocomplete render json: City.search(params[:query], autocomplete: true, limit: 10).map(&:name) end end

# app/controllers/cities_controller.rb class CitiesController < ApplicationController def autocomplete render json: City.search(params[:query], autocomplete: true, limit: 10).map(&:name) end end

# partial <input type="text" id="query" name="query" /> !<script src="jquery.js"></script> <script src="typeahead.js"></script> <script> $("#query").typeahead({ name: "city", remote: "/cities/autocomplete?query=%QUERY" }); </script>

products = Product.search "GPS", facets: [:type, :brand, :screen_size] puts products.facets

class City < ActiveRecord::Base searchkick locations: ["location"] ! def search_data attributes.merge location: [latitude, longitude] end end !City.search "Codemi", where: { location: {near: [-23, -46], within: "10mi" } } # ou 16km

Próximos Capítulos

SELECT … LIKE ‘%'

SELECT … LIKE ‘%'

OBRIGADO!slideshare.net/akitaonrailscodeminer42.com@akitaonrails