
Posts

Showing posts from July, 2017

Parser Generator

If you need a parser generator for your applications, say a SQL parser or an expression parser to plug in somewhere, your obvious choices are ANTLR and JavaCC. ANTLR (ANother Tool for Language Recognition) is a parser generator developed by Terence Parr. It is extremely versatile and can generate parsers in multiple programming languages, but the generated parsers require a runtime library. The Definitive ANTLR 4 Reference is an excellent book for getting into ANTLR. JavaCC (Java Compiler Compiler) was originally written by Dr. Sriram Sankar and Sreenivasa Viswanadha. It can only generate parsers in Java, but it doesn't require a runtime library, and the LL(k) parsers it generates are very performant.

Useful references:
1. https://dzone.com/articles/antlr-and-javacc-parser-generators
2. Work on JavaCC
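To make the idea of an LL expression parser a little more concrete, here is a minimal hand-written recursive-descent sketch in plain Java. It is not the output of ANTLR or JavaCC and uses neither tool; the class name `ExprParserSketch`, the method `evaluate`, and the tiny arithmetic grammar in the comments are all assumptions chosen for illustration. A generated parser follows the same general top-down shape, one method per grammar rule, but with far more machinery around it.

```java
public class ExprParserSketch {
    private final String src; // input text being parsed
    private int pos = 0;      // index of the next unread character

    private ExprParserSketch(String src) {
        this.src = src;
    }

    // Hypothetical entry point: parses and evaluates an arithmetic expression.
    public static double evaluate(String text) {
        ExprParserSketch parser = new ExprParserSketch(text);
        double value = parser.expr();
        parser.skipSpaces();
        if (parser.pos != parser.src.length()) {
            throw new IllegalArgumentException("Unexpected input at position " + parser.pos);
        }
        return value;
    }

    // expr := term (('+' | '-') term)*
    private double expr() {
        double value = term();
        while (true) {
            if (accept('+')) {
                value += term();
            } else if (accept('-')) {
                value -= term();
            } else {
                return value;
            }
        }
    }

    // term := factor (('*' | '/') factor)*
    private double term() {
        double value = factor();
        while (true) {
            if (accept('*')) {
                value *= factor();
            } else if (accept('/')) {
                value /= factor();
            } else {
                return value;
            }
        }
    }

    // factor := NUMBER | '(' expr ')'
    private double factor() {
        if (accept('(')) {
            double value = expr();
            if (!accept(')')) {
                throw new IllegalArgumentException("Expected ')' at position " + pos);
            }
            return value;
        }
        skipSpaces();
        int start = pos;
        while (pos < src.length()
                && (Character.isDigit(src.charAt(pos)) || src.charAt(pos) == '.')) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Expected a number at position " + pos);
        }
        return Double.parseDouble(src.substring(start, pos));
    }

    // Consumes the character c if it is next (one character of lookahead).
    private boolean accept(char c) {
        skipSpaces();
        if (pos < src.length() && src.charAt(pos) == c) {
            pos++;
            return true;
        }
        return false;
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) {
            pos++;
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate("2 + 3 * (4 - 1)"));
    }
}
```

Each decision here is made by peeking at a single character, which is what makes this an LL(1)-style parser. A parser generator automates exactly this kind of code from a grammar file, which pays off once the language grows beyond a handful of rules.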

Searching for Search

Search

"Information retrieval is the activity of obtaining information resources (in the form of documents) relevant to an information need from a collection of information resources. Searches can be based on metadata or on full-text (or other content-based) indexing" - Wikipedia

Text search is a critical part of any company's success, and even more so in the Information Governance arena. Indexing hundreds of billions of documents and handling millions of complex queries day in and day out is the order of the day for anyone on a Compliance or Legal discovery team. The ability to reach the needed documents at lightning speed is essential for effective governance. Large-scale search engine deployments based on Lucene or its offshoots Solr and Elasticsearch are on the rise. The indexes built by these APIs and frameworks are "inverted indexes" - the most obvious example is the index in the back pages of a book, where every word listed gets associ...
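The book-index analogy can be sketched in a few lines of plain Java. This is only a toy illustration of the inverted-index idea, not how Lucene, Solr, or Elasticsearch are implemented; the class `InvertedIndexSketch` and its methods `add`, `search`, and `searchAll` are hypothetical names, and the tokenization (lowercasing and splitting on non-word characters) is an assumption made to keep the example short.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

public class InvertedIndexSketch {
    // Maps each term to the sorted ids of the documents that contain it,
    // much like a book index maps a word to page numbers.
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Tokenizes the text naively and records the document id under every term.
    public void add(int docId, String text) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) {
                continue;
            }
            postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
        }
    }

    // Returns the ids of documents containing the term (empty if none).
    public SortedSet<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptySortedSet());
    }

    // Returns the ids of documents containing every given term (an AND query),
    // computed by intersecting the per-term document sets.
    public SortedSet<Integer> searchAll(String... terms) {
        SortedSet<Integer> result = null;
        for (String term : terms) {
            SortedSet<Integer> docs = search(term);
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Legal hold notice for the discovery team");
        index.add(2, "Compliance notice about retention");
        System.out.println(index.search("notice"));
        System.out.println(index.searchAll("legal", "notice"));
    }
}
```

A lookup touches only the entries for the query terms rather than scanning every document, which is why this structure suits full-text search. Production engines add much more on top, such as text analysis, relevance scoring, and compressed on-disk storage.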