Posts

Information Governance vs Data Governance

Information Governance vs. Data Governance: Key Differences

While Information Governance (IG) and Data Governance (DG) are closely related, they focus on different aspects of managing organizational assets. Here’s a simple breakdown of their differences and how they work together:

1. Definitions

Data Governance (DG):
- Focuses on managing **data as an asset**.
- Ensures data is accurate, consistent, secure, and available for use.
- Example: Defining who can access customer data and how it’s stored.

Information Governance (IG):
- Broader than DG, focusing on managing **all forms of information** (structured data, unstructured data, documents, emails, etc.).
- Ensures information is used effectively, ethically, and in compliance with regulations.
- Example: Setting policies for retaining and disposing of emails and documents.

2. Scope

Data Governance:
- Primarily deals with **structured data** (e.g., databases, spreadsheets).
- Focuses on...

The mainframe - where it all started!

It is estimated that there are 10,000 mainframes (computer dinosaurs) that run many large-scale applications in some of the top Fortune 500 companies, banks and insurance companies in the world even today. Something that started maybe 5 or 6 decades ago is still in existence, and IBM is still the name to reckon with in the world of mainframes. Hitachi, Fujitsu and Unisys were some of the other names associated with the word mainframe. From vacuum tubes, magnetic core memory, magnetic drum storage, tape drives and punched cards to today's mega machines that can house up to 240 server-grade CPUs, 40 TB of RAM and petabytes of flash storage, the several decades of travel have been nothing short of a magical journey. With batch mode applications written in COBOL or PL/I and run through JCL jobs (think shell scripts), mainframes served the needs of many large-scale applications. The late 80s and early 90s saw CICS/DB2, a major revolution from traditionally batch mode applications backed...

Parser Generator

If you need a parser generator for your applications, say a SQL parser or an expression parser to plug in somewhere in your application, your obvious choices are ANTLR or JavaCC.

ANTLR (ANother Tool for Language Recognition) is a parser generator developed by Terence Parr. It is extremely versatile and can generate parsers in multiple programming languages, but it requires a runtime library. The Definitive ANTLR 4 Reference is an excellent book for getting into ANTLR.

JavaCC (Java Compiler Compiler) was originally written by Dr. Sriram Sankar and Sreenivasa Viswanadha. It can only generate parsers in Java, but it doesn’t require a runtime library, and the parsers it generates are very performant for LL(k) parsers.

Useful References:
1. https://dzone.com/articles/antlr-and-javacc-parser-generators
2. Work on JavaCC
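To make the "expression parser" use case concrete, here is a minimal hand-written sketch of a recursive-descent parser, the general top-down style that LL parser generators automate. It is not the output of ANTLR or JavaCC, and the grammar, the `Parser` class and the `tokenize` helper are all assumptions made up for this illustration.

```python
import re

# Hypothetical grammar for this sketch (not taken from ANTLR or JavaCC):
#   expr   := term (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NUMBER | '-' factor | '(' expr ')'
TOKEN_RE = re.compile(r"\s*(?:(\d+(?:\.\d+)?)|(.))")


def tokenize(text):
    """Split the input into ("NUM", value) and ("OP", symbol) tokens."""
    tokens = []
    for number, op in TOKEN_RE.findall(text):
        if number:
            tokens.append(("NUM", float(number)))
        elif op.strip():  # ignore stray whitespace matches
            tokens.append(("OP", op))
    return tokens


class Parser:
    """One method per grammar rule, with a single token of lookahead."""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("EOF", None)

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse(self):
        value = self.expr()
        if self.peek()[0] != "EOF":
            raise SyntaxError(f"unexpected token {self.peek()[1]!r}")
        return value

    def expr(self):
        value = self.term()
        while self.peek() in (("OP", "+"), ("OP", "-")):
            _, op = self.advance()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in (("OP", "*"), ("OP", "/")):
            _, op = self.advance()
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        kind, val = self.advance()
        if kind == "NUM":
            return val
        if (kind, val) == ("OP", "-"):
            return -self.factor()
        if (kind, val) == ("OP", "("):
            value = self.expr()
            if self.advance() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {val!r}")
```

Calling `Parser("(1 + 2) * 3").parse()` evaluates the expression while parsing it. A parser generator would produce comparable code from a grammar file, which pays off as the grammar grows towards something the size of SQL.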

Searching for Search

Search

“Information retrieval is the activity of obtaining information resources (in the form of documents) relevant to an information need from a collection of information resources. Searches can be based on metadata or on full-text (or other content-based) indexing” - Wikipedia

Text search is a critical part of any company's success, and even more so in the Information Governance arena. Indexing hundreds of billions of documents and dealing with millions of complex queries day in and day out is the order of the day for anyone in Compliance or Legal discovery teams. The ability to get to the needed documents at lightning speed is essential for effective governance. Large-scale search engine deployments based on Lucene or its offshoots Solr and Elasticsearch are on the rise. The indexes built by these APIs or frameworks are "Inverted Indexes" - the most obvious example is the page index on the back pages of a book where every word listed gets associ...
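The book-index analogy can be sketched in a few lines of Python. This is only a toy illustration of the core idea, a map from each word to the documents that contain it. It is not how Lucene, Solr or Elasticsearch are implemented, and the `tokenize`, `build_inverted_index` and `search` names and the sample documents are made up for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "Retention policy for email",
    2: "Email archive and legal hold",
    3: "Legal discovery search",
}
index = build_inverted_index(docs)

print(search(index, "email"))        # documents 1 and 2
print(search(index, "legal email"))  # only document 2
```

A lookup touches only the entries for the query terms rather than scanning every document, which is why the structure scales to very large collections. Real engines add much more on top, such as term positions, relevance scoring and compressed on-disk storage.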