Tags:

solr

How to change Solr standard tokenizer rules?

Although Solr comes with a standard tokenizer implementation that is well prepared to tokenize most texts, there are cases where it falls short. Imagine a document with many numbers, many of which are followed by a percentage sign. In certain contexts it is expected that queries referring to those percentages can be distinguished from queries for plain numbers. How to achieve that? We need a custom tokenizer.
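
The idea can be sketched in a few lines. This is an illustrative sketch only - Solr's real StandardTokenizer is generated from a JFlex grammar, not a regex - but it shows the point: treat a digit run with a trailing `%` as a single token, so "5%" and "5" stay distinguishable instead of both being indexed as "5".

```python
import re

# Alternation order matters: try the percentage rule before plain words,
# otherwise "5%" would match as the word "5" and the '%' would be dropped,
# which is what a standard word tokenizer effectively does.
TOKEN = re.compile(r"\d+%|\w+")

def tokenize(text):
    return TOKEN.findall(text)

print(tokenize("growth of 5% versus 5 units"))
# -> ['growth', 'of', '5%', 'versus', '5', 'units']
```

With this rule a query for "5%" no longer collides with a query for the plain number "5".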

Extract entities from document with Solr Text Tagger

Algorithms for recognizing entities in text are among the most crucial aspects of text analysis. They lead to a better understanding of the content, enable additional operations like filtering or grouping and - most importantly - allow data to be processed automatically. In a previous post I announced a combination of text indexing & such extraction, and in order to keep my promise I created a fork of Solr Text Tagger.
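
Solr Text Tagger matches dictionary entries against the token stream using a Lucene finite state transducer. This toy sketch replaces the FST with a plain dict (and invented example entries) but keeps the core longest-match idea: at each position, try the longest candidate phrase first.

```python
# Dictionary of known entities, keyed by token tuples (examples invented).
ENTITIES = {
    ("apache", "solr"): "PRODUCT",
    ("new", "york"): "CITY",
    ("london",): "CITY",
}
MAX_LEN = max(len(key) for key in ENTITIES)

def tag(text):
    tokens = text.lower().split()
    found, i = [], 0
    while i < len(tokens):
        # longest match wins: try spans of decreasing length at position i
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in ENTITIES:
                found.append((" ".join(key), ENTITIES[key]))
                i += n
                break
        else:
            i += 1  # no entity starts here, move on
    return found

print(tag("Apache Solr powers search in New York"))
# -> [('apache solr', 'PRODUCT'), ('new york', 'CITY')]
```

The FST buys the real tagger compact storage and fast prefix lookups over huge dictionaries; the matching logic is conceptually the same.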

How to add data to Solr document during indexing?

The process of indexing in Solr is an advanced topic covered by many publications. At the most basic level it can be described as putting data into previously prepared containers. But what if a user wants to perform additional data processing that depends on documents already in the index?
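
Conceptually this is what a Solr UpdateRequestProcessor does: it sits in the indexing chain and may alter a document before it is written. The sketch below is hypothetical - the in-memory list stands in for the index, and the field names and the derived field are invented for illustration.

```python
# Stand-in for documents already in the index (field names invented).
index = [
    {"id": "1", "category": "books", "price": 10},
    {"id": "2", "category": "books", "price": 30},
]

def enrich(doc):
    # Derive an extra field from documents already indexed: flag whether
    # the incoming document is cheaper than the average price in its category.
    same = [d["price"] for d in index if d["category"] == doc["category"]]
    if same:
        doc["below_avg_price"] = doc["price"] < sum(same) / len(same)
    return doc

print(enrich({"id": "3", "category": "books", "price": 15}))
# the books average is 20, so the new document gets below_avg_price = True
```

In real Solr the lookup against existing documents would be a query issued from inside the processor rather than a list comprehension.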

Compression modes in Solr 5.5

The ideal situation is when the whole index fits in memory, since disk operations are much slower than those in RAM. What’s more, companies often have to meet the requirements of a tender or reduce server costs, which puts pressure on developers to come up with a solution that makes the index smaller.
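
Solr 5.5 exposes Lucene's two stored-field compression modes: BEST_SPEED (LZ4) and BEST_COMPRESSION (DEFLATE). The sketch below uses zlib compression levels only as an analogy for the same size-versus-speed trade-off - it is not the actual Lucene code path.

```python
import zlib

# Some repetitive stored-field-like content to compress.
data = b"some repetitive stored field content " * 2000

fast = zlib.compress(data, 1)   # low effort, bigger output - like BEST_SPEED
small = zlib.compress(data, 9)  # high effort, smaller output - like BEST_COMPRESSION

print(len(data), len(fast), len(small))
```

The trade-off is the same one you tune in Solr: the smaller index costs more CPU at index (and sometimes query) time.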

Solr startup scripts on Windows

As we all know from Solr website:

Solr is the popular, blazing-fast, open source enterprise search platform built on Apache Lucene.

It’s all true, but on the other hand Solr is not known for its ease of use. In the fifth version the community decided to address one of the main issues that made Solr more demanding for newcomers than the competition, namely the simplicity of startup.

windows

Solr startup scripts on Windows

As we all know from Solr website:

Solr is the popular, blazing-fast, open source enterprise search platform built on Apache Lucene.

It’s all true, but on the other hand Solr is not known for its ease of use. In the fifth version the community decided to address one of the main issues that made Solr more demanding for newcomers than the competition, namely the simplicity of startup.

script

Solr startup scripts on Windows

As we all know from Solr website:

Solr is the popular, blazing-fast, open source enterprise search platform built on Apache Lucene.

It’s all true, but on the other hand Solr is not known for its ease of use. In the fifth version the community decided to address one of the main issues that made Solr more demanding for newcomers than the competition, namely the simplicity of startup.

visualvm

jmx

compression

Compression modes in Solr 5.5

The ideal situation is when the whole index fits in memory, since disk operations are much slower than those in RAM. What’s more, companies often have to meet the requirements of a tender or reduce server costs, which puts pressure on developers to come up with a solution that makes the index smaller.

indexing

Extract entities from document with Solr Text Tagger

Algorithms for recognizing entities in text are among the most crucial aspects of text analysis. They lead to a better understanding of the content, enable additional operations like filtering or grouping and - most importantly - allow data to be processed automatically. In a previous post I announced a combination of text indexing & such extraction, and in order to keep my promise I created a fork of Solr Text Tagger.

How to add data to Solr document during indexing?

The process of indexing in Solr is an advanced topic covered by many publications. At the most basic level it can be described as putting data into previously prepared containers. But what if a user wants to perform additional data processing that depends on documents already in the index?

Compression modes in Solr 5.5

The ideal situation is when the whole index fits in memory, since disk operations are much slower than those in RAM. What’s more, companies often have to meet the requirements of a tender or reduce server costs, which puts pressure on developers to come up with a solution that makes the index smaller.

codec

Compression modes in Solr 5.5

The ideal situation is when the whole index fits in memory, since disk operations are much slower than those in RAM. What’s more, companies often have to meet the requirements of a tender or reduce server costs, which puts pressure on developers to come up with a solution that makes the index smaller.

server side request forgery

What is server side request forgery and how to defend against it?

Server side request forgery occurs when an attacker enters one application and is able to use it to perform some activity against other applications. This can mean scanning an internal network, calling internal services or making a request to another website - our case. Note that the hacked application would be held responsible for the attack - as it produces the call! - not the hacker’s machine. More information can be found on numerous sites on the Internet, e.g. here.
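
One common line of defence can be sketched briefly: validate any user-supplied URL against an allow-list of hosts before the server fetches it, so the application cannot be steered at internal services. The host names below are hypothetical, and a real deployment would combine this with network-level egress controls.

```python
from urllib.parse import urlsplit

# Hosts the server is allowed to call on a user's behalf (hypothetical).
ALLOWED_HOSTS = {"api.example.com", "cdn.example.com"}

def is_safe_url(url):
    parts = urlsplit(url)
    # Reject non-HTTP schemes (file://, gopher://, ...) and unknown hosts.
    return parts.scheme in ("http", "https") and parts.hostname in ALLOWED_HOSTS

print(is_safe_url("https://api.example.com/data"))   # allowed
print(is_safe_url("http://169.254.169.254/latest"))  # cloud metadata host - blocked
```

Note the check runs on the server, on the exact URL about to be fetched; client-side validation alone defends nothing here.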

security

What is server side request forgery and how to defend against it?

Server side request forgery occurs when an attacker enters one application and is able to use it to perform some activity against other applications. This can mean scanning an internal network, calling internal services or making a request to another website - our case. Note that the hacked application would be held responsible for the attack - as it produces the call! - not the hacker’s machine. More information can be found on numerous sites on the Internet, e.g. here.

node.js

What is server side request forgery and how to defend against it?

Server side request forgery occurs when an attacker enters one application and is able to use it to perform some activity against other applications. This can mean scanning an internal network, calling internal services or making a request to another website - our case. Note that the hacked application would be held responsible for the attack - as it produces the call! - not the hacker’s machine. More information can be found on numerous sites on the Internet, e.g. here.

express

What is server side request forgery and how to defend against it?

Server side request forgery occurs when an attacker enters one application and is able to use it to perform some activity against other applications. This can mean scanning an internal network, calling internal services or making a request to another website - our case. Note that the hacked application would be held responsible for the attack - as it produces the call! - not the hacker’s machine. More information can be found on numerous sites on the Internet, e.g. here.

angularJS

What is server side request forgery and how to defend against it?

Server side request forgery occurs when an attacker enters one application and is able to use it to perform some activity against other applications. This can mean scanning an internal network, calling internal services or making a request to another website - our case. Note that the hacked application would be held responsible for the attack - as it produces the call! - not the hacker’s machine. More information can be found on numerous sites on the Internet, e.g. here.

lombok

What is Project Lombok and why should you use it?

Java is well known for the amount of code required to perform simple tasks: all those getter/setter methods handled nicely by competitors, the ubiquitous Problem Factories, the Calendar & Date or logging mumbo jumbo. As more languages with leaner syntax arise, sticking with the current approach seems a bit out of date. There are even some propositions to add a JavaScript-like val keyword to change things, but with Oracle’s lack of investment it is hard to believe that any changes will appear in finite time. On the other hand, the Java ecosystem is full of decent libraries that can fill this gap; one of these libraries is Project Lombok.

java

How to change Solr standard tokenizer rules?

Although Solr comes with a standard tokenizer implementation that is well prepared to tokenize most texts, there are cases where it falls short. Imagine a document with many numbers, many of which are followed by a percentage sign. In certain contexts it is expected that queries referring to those percentages can be distinguished from queries for plain numbers. How to achieve that? We need a custom tokenizer.

Extract entities from document with Solr Text Tagger

Algorithms for recognizing entities in text are among the most crucial aspects of text analysis. They lead to a better understanding of the content, enable additional operations like filtering or grouping and - most importantly - allow data to be processed automatically. In a previous post I announced a combination of text indexing & such extraction, and in order to keep my promise I created a fork of Solr Text Tagger.

How to add data to Solr document during indexing?

The process of indexing in Solr is an advanced topic covered by many publications. At the most basic level it can be described as putting data into previously prepared containers. But what if a user wants to perform additional data processing that depends on documents already in the index?

What is Project Lombok and why should you use it?

Java is well known for the amount of code required to perform simple tasks: all those getter/setter methods handled nicely by competitors, the ubiquitous Problem Factories, the Calendar & Date or logging mumbo jumbo. As more languages with leaner syntax arise, sticking with the current approach seems a bit out of date. There are even some propositions to add a JavaScript-like val keyword to change things, but with Oracle’s lack of investment it is hard to believe that any changes will appear in finite time. On the other hand, the Java ecosystem is full of decent libraries that can fill this gap; one of these libraries is Project Lombok.

clean code

What is Project Lombok and why should you use it?

Java is well known for the amount of code required to perform simple tasks: all those getter/setter methods handled nicely by competitors, the ubiquitous Problem Factories, the Calendar & Date or logging mumbo jumbo. As more languages with leaner syntax arise, sticking with the current approach seems a bit out of date. There are even some propositions to add a JavaScript-like val keyword to change things, but with Oracle’s lack of investment it is hard to believe that any changes will appear in finite time. On the other hand, the Java ecosystem is full of decent libraries that can fill this gap; one of these libraries is Project Lombok.

java internals

What is Project Lombok and why should you use it?

Java is well known for the amount of code required to perform simple tasks: all those getter/setter methods handled nicely by competitors, the ubiquitous Problem Factories, the Calendar & Date or logging mumbo jumbo. As more languages with leaner syntax arise, sticking with the current approach seems a bit out of date. There are even some propositions to add a JavaScript-like val keyword to change things, but with Oracle’s lack of investment it is hard to believe that any changes will appear in finite time. On the other hand, the Java ecosystem is full of decent libraries that can fill this gap; one of these libraries is Project Lombok.

asynchronous

spring

spring mvc

jmeter

performance

jetty

update request processor

Extract entities from document with Solr Text Tagger

Algorithms for recognizing entities in text are among the most crucial aspects of text analysis. They lead to a better understanding of the content, enable additional operations like filtering or grouping and - most importantly - allow data to be processed automatically. In a previous post I announced a combination of text indexing & such extraction, and in order to keep my promise I created a fork of Solr Text Tagger.

How to add data to Solr document during indexing?

The process of indexing in Solr is an advanced topic covered by many publications. At the most basic level it can be described as putting data into previously prepared containers. But what if a user wants to perform additional data processing that depends on documents already in the index?

entity recognition

Extract entities from document with Solr Text Tagger

Algorithms for recognizing entities in text are among the most crucial aspects of text analysis. They lead to a better understanding of the content, enable additional operations like filtering or grouping and - most importantly - allow data to be processed automatically. In a previous post I announced a combination of text indexing & such extraction, and in order to keep my promise I created a fork of Solr Text Tagger.

solr text tagger

Extract entities from document with Solr Text Tagger

Algorithms for recognizing entities in text are among the most crucial aspects of text analysis. They lead to a better understanding of the content, enable additional operations like filtering or grouping and - most importantly - allow data to be processed automatically. In a previous post I announced a combination of text indexing & such extraction, and in order to keep my promise I created a fork of Solr Text Tagger.

fst

Extract entities from document with Solr Text Tagger

Algorithms for recognizing entities in text are among the most crucial aspects of text analysis. They lead to a better understanding of the content, enable additional operations like filtering or grouping and - most importantly - allow data to be processed automatically. In a previous post I announced a combination of text indexing & such extraction, and in order to keep my promise I created a fork of Solr Text Tagger.

finite state transducer

Extract entities from document with Solr Text Tagger

Algorithms for recognizing entities in text are among the most crucial aspects of text analysis. They lead to a better understanding of the content, enable additional operations like filtering or grouping and - most importantly - allow data to be processed automatically. In a previous post I announced a combination of text indexing & such extraction, and in order to keep my promise I created a fork of Solr Text Tagger.

lucene

How to change Solr standard tokenizer rules?

Although Solr comes with a standard tokenizer implementation that is well prepared to tokenize most texts, there are cases where it falls short. Imagine a document with many numbers, many of which are followed by a percentage sign. In certain contexts it is expected that queries referring to those percentages can be distinguished from queries for plain numbers. How to achieve that? We need a custom tokenizer.

jflex

How to change Solr standard tokenizer rules?

Although Solr comes with a standard tokenizer implementation that is well prepared to tokenize most texts, there are cases where it falls short. Imagine a document with many numbers, many of which are followed by a percentage sign. In certain contexts it is expected that queries referring to those percentages can be distinguished from queries for plain numbers. How to achieve that? We need a custom tokenizer.

tokenization

How to change Solr standard tokenizer rules?

Although Solr comes with a standard tokenizer implementation that is well prepared to tokenize most texts, there are cases where it falls short. Imagine a document with many numbers, many of which are followed by a percentage sign. In certain contexts it is expected that queries referring to those percentages can be distinguished from queries for plain numbers. How to achieve that? We need a custom tokenizer.

payloads

How to change Solr standard tokenizer rules?

Although Solr comes with a standard tokenizer implementation that is well prepared to tokenize most texts, there are cases where it falls short. Imagine a document with many numbers, many of which are followed by a percentage sign. In certain contexts it is expected that queries referring to those percentages can be distinguished from queries for plain numbers. How to achieve that? We need a custom tokenizer.

genetic algorithms

Genetic Ranker - genetic algorithms for search

Working as a search engineer myself, I decided to develop a framework for finding optimal query weights for search engines like Elasticsearch or Solr. It is based on a branch of machine learning called genetic programming, inspired by the process of natural selection. In this post I’ll describe it and briefly discuss what a good process of building search quality should look like. Let’s start!
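
The evolutionary idea can be sketched in a few lines. Everything below is invented for illustration - in the real framework fitness would be a search-quality metric such as NDCG computed from judged queries, not distance to a made-up target weight vector for three hypothetical fields.

```python
import random

random.seed(42)

TARGET = [3.0, 1.0, 0.5]  # pretend these field weights give the best ranking

def fitness(weights):
    # higher is better: negative squared distance to the target
    return -sum((w - t) ** 2 for w, t in zip(weights, TARGET))

def mutate(weights):
    # small Gaussian perturbation of every weight
    return [w + random.gauss(0, 0.1) for w in weights]

# initial random population of candidate weight vectors
population = [[random.uniform(0, 4) for _ in range(3)] for _ in range(20)]

for generation in range(100):
    population.sort(key=fitness, reverse=True)
    parents = population[:5]                        # selection: keep the fittest
    offspring = [mutate(random.choice(parents)) for _ in range(15)]
    population = parents + offspring                # elitism + mutated offspring

best = max(population, key=fitness)
print(best)
```

Crossover between parents is omitted here for brevity; the selection-mutation loop already shows how natural selection drives the weights toward better-scoring rankings.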

search quality

Genetic Ranker - genetic algorithms for search

Working as a search engineer myself, I decided to develop a framework for finding optimal query weights for search engines like Elasticsearch or Solr. It is based on a branch of machine learning called genetic programming, inspired by the process of natural selection. In this post I’ll describe it and briefly discuss what a good process of building search quality should look like. Let’s start!

search

Genetic Ranker - genetic algorithms for search

Working as a search engineer myself, I decided to develop a framework for finding optimal query weights for search engines like Elasticsearch or Solr. It is based on a branch of machine learning called genetic programming, inspired by the process of natural selection. In this post I’ll describe it and briefly discuss what a good process of building search quality should look like. Let’s start!

python

Genetic Ranker - genetic algorithms for search

Working as a search engineer myself, I decided to develop a framework for finding optimal query weights for search engines like Elasticsearch or Solr. It is based on a branch of machine learning called genetic programming, inspired by the process of natural selection. In this post I’ll describe it and briefly discuss what a good process of building search quality should look like. Let’s start!

github

Genetic Ranker - genetic algorithms for search

Working as a search engineer myself, I decided to develop a framework for finding optimal query weights for search engines like Elasticsearch or Solr. It is based on a branch of machine learning called genetic programming, inspired by the process of natural selection. In this post I’ll describe it and briefly discuss what a good process of building search quality should look like. Let’s start!

open source

Genetic Ranker - genetic algorithms for search

Working as a search engineer myself, I decided to develop a framework for finding optimal query weights for search engines like Elasticsearch or Solr. It is based on a branch of machine learning called genetic programming, inspired by the process of natural selection. In this post I’ll describe it and briefly discuss what a good process of building search quality should look like. Let’s start!

string performance

java performance

memory

performance guide