Monday, 26 October 2015


A faster / cheaper / more accurate alternative to


 A Proposal submitted to

  *  Shri TCA Anant ,
      Chief Statistician ,
      Ministry of Statistics and Programme Implementation ,
      Government of India ,
Suggested By :

hemen  parekh
mumbai / (M) 0 - 98,67,55,08,08
27  Oct  2015

I have a database of over 5 million job advts , downloaded over the past 6 / 7 years from various job portals of India

Each job advt record in the database consists of :

Ø  Advt ID

Ø  Designation ( being advertised )

Ø  Company Name ( Advertiser )

Ø  Job Description

Ø  Desired Profile

Ø  Compensation Offered

Ø  Experience ( desired ) – Years

Ø  Industry Type

Ø  Education Quali ( Min )

Ø  Location ( Posting City )

Ø  Keywords

Ø  Advt Posting  Date

Ø  Expiry Date

Some years back ( when our website was up and running ) , we had developed a feature to analyze this database and display the findings visually , in different ways

We were displaying PIE-CHARTS of :

Ø  Industry-wise Jobs

Ø  City-wise Jobs

You will observe that , with a much larger database available now , it is possible to analyze / display the “ No of Jobs “ , in many more ways

Not only that , it should be possible to analyze this huge database to predict the future expected PATTERN of the occurrence of jobs , in many different ways !

At any given time , the number of jobs getting advertised , is an important Economic Indicator

If the economy is booming and company Order Books are getting fatter , then more jobs will get advertized , and vice-versa

Hence , a time-series of the no of new jobs getting posted on job portals can be expected to show a near straight-line relationship with the state of the economy ( a high co-efficient of correlation )
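A minimal sketch of how such a co-efficient of correlation could be computed, assuming the posting dates have first been counted month by month. The function names `monthly_counts` and `pearson` and every number below are made up for illustration; they are not findings from the actual database.

```python
from collections import Counter
from datetime import date
from math import sqrt


def monthly_counts(posting_dates):
    """Count adverts per (year, month) from a list of posting dates."""
    return Counter((d.year, d.month) for d in posting_dates)


def pearson(xs, ys):
    """Pearson co-efficient of correlation between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Illustrative values only (NOT real data): new jobs per month against
# some monthly economic indicator for the same months.
jobs_per_month = [100, 120, 140, 160]
indicator = [1.0, 1.2, 1.4, 1.6]
print(round(pearson(jobs_per_month, indicator), 3))
```

A value close to +1 would support the straight-line relationship; a value near 0 would not. Correlation alone does not say which series leads the other.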

Apart from that , can a Data mining of 5 million jobs , answer ( even partially ) , the following questions ?

Ø  Who ( which Companies ) are advertizing and when ?

Ø  What jobs / vacancies / positions are being advertized ?

Ø  What is the frequency with which a particular job gets advertized ? By entire industry ? By a given Company ?

Ø  Which regions / cities have max / min no of new jobs ?

Ø  What are regional disparities due to ?

Ø  Which Industries are advertising most – creating most jobs ?

Ø  What Edu Qualifications are in max demand ?

Ø  What kind of jobs demand what kind of Edu Qualifications ?

Ø  What is the level of co-relation between , Position and the years of Experience demanded ?

Ø  For identical positions being advertized , how much do “ Job Descriptions / Desired Profiles “ differ, from company to company ?

Ø  Are there significant differences in the “ No of years of Experience “ being demanded , for identical positions ?

Ø  What is the probability of finding the “ Keywords “ in “ Job Description / Desired Profile “ ?

Ø  What is the extent of duplication ( redundancy ? ) between , “ Job Description “ and “ Desired Profile “ ?

Ø  What percentage of Advts fail to make any mention of , Compensation Offered ?

Ø  When a company posts an advt for same / identical position , at different points of time , are there any differences in values ( fields ) ?

Ø  From an analysis of all the advts posted by a given Company ( over past 7 years ) , can any conclusion be reached as to the changing nature of that company’s business (by co-relating the “ Skills related Keywords “)?

Ø  Can the algorithm predict what job a company will advertize next – and when ?

Ø  Is there any correlation between , “ Designation / Position “ and the “ Keywords “ ?

Ø  From analyzing this huge data , can software auto-generate , a complete / editable job advt , as soon as a Recruiter simply types the “ Designation / Position “ ?
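Three of the questions above can be answered with very little code. The sketch below assumes each advt is a dictionary with hypothetical keys `compensation`, `job_description`, `desired_profile` and `keywords` ( comma-separated ); the real field names and formats in the database may differ.

```python
import re


def tokens(text):
    """Lower-cased set of words found in a piece of text."""
    return set(re.findall(r"[a-z0-9+#]+", text.lower()))


def share_without_compensation(adverts):
    """Percentage of adverts whose compensation field is empty."""
    if not adverts:
        return 0.0
    missing = sum(1 for a in adverts if not (a.get("compensation") or "").strip())
    return 100.0 * missing / len(adverts)


def overlap(job_description, desired_profile):
    """Jaccard overlap (0 to 1) between the words of the two fields."""
    a, b = tokens(job_description), tokens(desired_profile)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def keyword_hit_rate(advert):
    """Fraction of an advert's keywords that appear in its own text."""
    keywords = [k.strip().lower() for k in advert["keywords"].split(",") if k.strip()]
    if not keywords:
        return 0.0
    body = (advert["job_description"] + " " + advert["desired_profile"]).lower()
    hits = sum(1 for k in keywords if k in body)
    return hits / len(keywords)
```

Averaging `keyword_hit_rate` over all advts gives a rough estimate of the probability of finding the Keywords in the Job Description / Desired Profile. Jaccard overlap is only one possible measure of duplication, chosen here for its simplicity.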

I believe , so far , no one has undertaken such a Data mining project

If carried out diligently , I am sure , the outcome would be of immense benefit to :

Ø HR Managers

      for Manpower Planning / Compensation Planning

Ø Recruiting Managers

      for framing Man Specifications / Job Description Manuals

Ø Educationists

      for deciding what Edu Quali are in demand and tailoring the Courses accordingly

Ø Students

      to figure out what “ Skills “ are in demand by Industry and prepare

Ø Planning Commission ( NITI Aayog )

      for allocating Resources to States / Regions , based on imbalances

Ø  HRD Ministry

       For long term Macro-Planning in respect of Education

Ø National Skills Development Commission

      for chalking out Skills Development Programs in collaboration with Companies / Industries

If undertaken – and executed seriously – then this Data mining project has the potential to place

Ministry of Statistics and Programme Implementation ,

 on the Centre-Stage of National Education Planning Scenario

What can / will such a project yield ?

Without exaggerating , it would be safe to assume that , this vast database of job advts would contain :

Ø  50 million phrases / sentences

Ø  500 million words

Obviously , taken together , these words / phrases / sentences are nothing more than a

“ Database of Intentions “ of the Employer Companies

( to borrow from John Battelle’s well-researched book about Google )

Our goal shall be to make this ( Data mining Algorithm ) a dynamic / continuous “ Process “ , so that , we can measure the changing nature of these “ Intentions “ , over a long , long period

And we must enable a “ Researching Visitor ( of web site ) “, to benefit from these trends / patterns

Even though 5 million job advts may contain 500 million “ words “ , these are not Unique


Most of these are used again and again , hundreds or thousands of times


Thru data mining , it is not difficult to compute their “ Frequency of Usage “


And then , these frequencies can be graphically plotted against any particular time-period


Such Graphical Representations can be further broken up by ,


Ø  City Names


Ø  Company Names


Ø  Industry Names


Ø  Function Names


Ø  Designations ( Vacancy Names ).. etc


And such graphical analysis can be done , not only for “ Keywords “ but even for “ Key Phrases “ and “ Sentences “ !
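A minimal sketch of the " Frequency of Usage " computation, broken up by year and by any one other field. It assumes each advt is a dictionary with a hypothetical `posting_date` in ' YYYY-MM-DD ' form and a `job_description` ; it counts single words only, and phrases or sentences would need a further step.

```python
from collections import Counter, defaultdict
import re


def word_frequency_by_year(adverts, group_field=None):
    """Return {(year, group): Counter of words} for the given adverts.

    group_field is optional, e.g. "city" or "industry" (hypothetical keys).
    """
    table = defaultdict(Counter)
    for ad in adverts:
        year = int(ad["posting_date"][:4])  # assumes 'YYYY-MM-DD' strings
        group = ad.get(group_field) if group_field else None
        words = re.findall(r"[a-z]+", ad["job_description"].lower())
        table[(year, group)].update(words)
    return table


# Illustrative adverts, not taken from the real database
sample = [
    {"posting_date": "2014-03-01", "city": "Mumbai",
     "job_description": "Java developer with Java skills"},
    {"posting_date": "2015-06-10", "city": "Pune",
     "job_description": "Java tester"},
]
for key, counter in word_frequency_by_year(sample, "city").items():
    print(key, counter.most_common(2))
```

The resulting counts, one per year and group, are what a charting tool would then plot against the time-period.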

Take a look at this project paper ( NOT ENCLOSED )

It is all about data mining of some 150 million records ( location points ) and about uncovering “ trends / patterns “ of physical movements of 300 human volunteers , over a “  period of time  “

I quote from an article in Times of India ( 19 July 2013 ) :

“ ..the first system of its kind to predict long term human mobility in a unified way , parse the data. " Far Out " does not need to be told exactly what to look for  --- it automatically discovered regularities in the data “

“ Do you know precisely where you’ll be 285 days from now at 2 pm ?

Researchers have developed a new tracking software that can tell you exactly where you will be on a precise time and date , years into the future “

What we want to do with 5 million job advts database , is quite similar, viz ;
 predict ,

 WHO     ( which Company / Industry ) , will advertize

 WHAT   ( vacancies / positions / designations ),  and

 WHEN   ( time )

I am talking about developing an “ Expert System “ , thru discovery of specific “ Co-relations “ amongst various Data Fields of 5 million job advts

Eg :

Ø  What is the Co-relation between any given Designation / Vacancy-Name / Advertized Position , and the Educational Qualifications demanded ?

Here are some examples :

Ø  Any designation  such as “ Production Manager “ would call for an “ Engineering Degree / Diploma “ ( but never a CS / CA )

Ø  Any designation in “ Finance Function “ will require,
·       B Com
·       M Com
·       CA    etc
       But never a BE(M ) / BE (Chem )

Ø  Any designation at Manager level will call for a minimum experience of 5 years ( but never a Fresh Graduate with NIL experience )

Ø  MBA / BBA / MMS etc are the most preferred Edu Qualifications for positions in Marketing

Ø  No vacancy in an Automobile Manufacturing Company , will call for a degree in Pharmacy

Ø  No Electrical Machinery Manufacturing company will ever demand a Medical Degree (MBBS )

To a human mind , these ( rules ) are so obvious !

But , no human mind can write-down ALL of such RULES , in 2 minutes ! – something that your Data mining Software can – and will – do in 5 seconds !

All that you need , after computing “ Frequencies of Occurrences “ , is to :

Ø  Plot the Co-efficients of Co-relations between various Fields ( of job advts )

Ø  Compute Probabilities for each and create hundreds of Probability Tables
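One such Probability Table could be built roughly as below: for every value of one field ( say , Designation ) , the observed probability of each value of another field ( say , Education ). The dictionary keys `designation` and `education` are assumptions about how the advts are stored , and the sample rows are invented.

```python
from collections import Counter, defaultdict


def probability_table(adverts, given_field, target_field):
    """Return {given_value: {target_value: probability}} from observed counts."""
    counts = defaultdict(Counter)
    for ad in adverts:
        counts[ad[given_field]][ad[target_field]] += 1
    table = {}
    for given, counter in counts.items():
        total = sum(counter.values())
        table[given] = {value: n / total for value, n in counter.items()}
    return table


# Invented sample rows, for illustration only
sample = [
    {"designation": "Production Manager", "education": "BE"},
    {"designation": "Production Manager", "education": "BE"},
    {"designation": "Production Manager", "education": "BE"},
    {"designation": "Production Manager", "education": "Diploma"},
    {"designation": "Accounts Officer", "education": "B Com"},
]
print(probability_table(sample, "designation", "education"))
```

A value that never appears for a designation ( such as CA for a Production Manager in this sample ) is simply absent from its row , which is how the " but never " rules show up in the data.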

And , since a thousand new job advts are getting added to our Job Advt Database , daily , the SAMPLE SIZE is perpetually increasing – thereby , increasing the Accuracies of your Predictions !

Having done this , imagine the following scenario :

Recruitment Officer of Wipro , comes to our “ Post Job “ page and , in the field for “ Designation “ simply types ,

“ Business Analyst “

And Presto !

The entire Job Advt Form gets auto-filled , with MOST PROBABLE values !

Would not that amaze her ?

All that our software has done is analyzed job advts of all “ Software Companies “ ( an Industry ),– and of WIPRO – for the position of Business Analyst and filled in the most probable values
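A rough sketch of that auto-fill step , under some simplifying assumptions: the field names in `FIELDS` are hypothetical , the company's own advts for the designation are preferred , and all advts for that designation are used only as a fall-back. It picks the single most frequent value per field , which is the simplest reading of " most probable ".

```python
from collections import Counter

# Hypothetical field names; a real form would carry every field of the advt.
FIELDS = ("education", "experience_years", "location")


def most_probable_values(adverts, designation, company=None):
    """Most frequent value of each field among adverts for a designation."""
    wanted = designation.strip().lower()
    matches = [a for a in adverts if a["designation"].strip().lower() == wanted]
    own = [a for a in matches if company is not None and a["company"] == company]
    pool = own or matches  # fall back to all companies if none of its own
    if not pool:
        return {}
    return {f: Counter(a[f] for a in pool).most_common(1)[0][0] for f in FIELDS}


# Invented adverts, for illustration only
sample = (
    [{"designation": "Business Analyst", "company": "Wipro", "education": "MBA",
      "experience_years": 3, "location": "Bangalore"}] * 2
    + [{"designation": "Business Analyst", "company": "Infosys", "education": "BE",
        "experience_years": 5, "location": "Pune"}] * 3
)
print(most_probable_values(sample, "Business Analyst", company="Wipro"))
```

The recruiter would still be free to edit every auto-filled value before posting.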

This is no rocket science !

We had actually , partially attempted it ( albeit in a crude way ) in our earlier web site

What surprises me is , how come no one has attempted this so far !

Especially , Naukri / TimesJobs / MonsterIndia , who have accumulated millions of job advts !

Anyway , the fact that they have , so far , ignored this  Line of Examination , will work to the advantage of

Ministry of Statistics and Programme Implementation

 – making YOU the very first person in the entire world to come up with a PREDICTION MODEL in the area of JOBS

However , without applying some simple data mining tool , it would not be possible to answer the following questions :

Where is the greatest decline of jobs being advertized ?

How much is the percentage decline ?

Ø  In which Industry ?

Ø  In which Company ?

Ø  In which City ?

Ø  In which Region ?

Ø  In which Skills ?

Ø  For which Positions ?

Ø  For which Education Levels ? ………… etc

With a data mining tool , such individual graphs could emerge ( within fraction of a second ) at the click of a button !
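The arithmetic behind each such graph is simple. The sketch below assumes every advt carries a hypothetical `year` key plus the field being compared ( industry , city , company , etc ) and computes the percentage change in the number of advts between two years ; the sample rows are invented.

```python
from collections import Counter


def percentage_change(adverts, field, earlier_year, later_year):
    """Percentage change in advert counts per group between two years.

    Groups with no adverts in the earlier year are skipped, since a
    percentage change from zero is undefined.
    """
    earlier = Counter(a[field] for a in adverts if a["year"] == earlier_year)
    later = Counter(a[field] for a in adverts if a["year"] == later_year)
    return {
        group: 100.0 * (later.get(group, 0) - before) / before
        for group, before in earlier.items()
    }


def greatest_decline(changes):
    """Return the (group, percentage change) pair with the steepest fall."""
    return min(changes.items(), key=lambda kv: kv[1])


# Invented sample: IT falls from 4 adverts to 2, Pharma rises from 2 to 3
sample = (
    [{"year": 2014, "industry": "IT"}] * 4
    + [{"year": 2015, "industry": "IT"}] * 2
    + [{"year": 2014, "industry": "Pharma"}] * 2
    + [{"year": 2015, "industry": "Pharma"}] * 3
)
changes = percentage_change(sample, "industry", 2014, 2015)
print(changes, greatest_decline(changes))
```

Passing "city" , "company" or any other field name in place of "industry" answers the same question for that dimension.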

One could even co-relate these graphs with other ,
 publicly available statistical data such as :

Ø IIP   ( Index of Industrial Production )

Ø Stock Market Index

Ø Currency Exchange Rate ( eg; declining Rupee )

Ø Decline in GDP / Increasing Fiscal Deficit

Ø CAD ( Current Account Deficit )

Ø Foreign Investments

Ø  Primary Bank Rates of RBI…………………………….etc

With proper co-relations , one could even predict how much the job market will further shrink , over the next 6 months ! or grow ?
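One very simple way such a prediction could be attempted is sketched below: fit a straight line between an indicator and the number of jobs posted some months later , then apply that line to the latest indicator values. This is an assumption-laden illustration ( one indicator , a fixed lag , ordinary least squares ) and not a tested forecasting method ; the function names and numbers are invented.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("indicator is constant; no line can be fitted")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def forecast_jobs(indicator, jobs, lag):
    """Predict job counts for the next `lag` months.

    indicator and jobs are monthly series of equal length, oldest first.
    The line is fitted on pairs (indicator[t], jobs[t + lag]).
    """
    if lag < 1 or len(indicator) != len(jobs) or len(jobs) <= lag + 1:
        raise ValueError("need equal-length series longer than lag + 1")
    slope, intercept = fit_line(indicator[:-lag], jobs[lag:])
    return [intercept + slope * x for x in indicator[-lag:]]


# Invented numbers: jobs follow the indicator two months later
indicator = [1, 2, 3, 4, 5, 6]
jobs = [0, 0, 10, 20, 30, 40]
print(forecast_jobs(indicator, jobs, lag=2))
```

For a 6 month outlook , `lag` would be 6 , provided the chosen indicator really does lead the job market by that much ; that is something the data would have to confirm.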

Such a “ Predictive Model of the Job Market “ would be of immense interest , not only to economists but also to the HRD Ministry / Planning Commission / Educational Institutions and , of course , the students themselves


Make    Yourself    Heard

Dear Visitor :

It is time for YOU to speak up - and demand that YOU are heard

By emailing this suggestion - incorporating your OWN improvements - to the following Policy Makers

( Just copy all the following Email IDs into the Recipient column of your Outlook and copy / paste this suggestion in the Message Box ) :

Sunday, 25 October 2015


Hon PM , Shri Modiji has announced that , starting 01 Jan 2016 , there will be no interviews for recruitment in B / C / D categories of employees in Central Government

You may like to call this :

A / B / C / D of Interviewing  !  or , Non-Interviewing  ?

Henceforth , recruitment would be made only on the basis of marks scored by candidates in written exams , conducted by ,


Believe me , there will be 2589  applicants scoring 94 % marks !

In which case , who gets that job ?

>  Candidates with surnames starting with  " A / B / C " ?

>  Candidates belonging to OBC / SC / ST quota ?

>  State-wise / Population-wise Quota ?

>  Candidates having Ph D / Master degree ?

Last year , SSC recruited about 55,000 employees from 17 million applicants ( 309 applicants for each vacancy )

Now take a look at the following news ( DNA / 11 Sept 2015 ) :

Against 368 vacancies of " Peons " advertized by UP Govt , applications received were as follows :

*  Total Number of applicants.............. 20 lakh
    {  5,435 applicants for each post  ! }

*  No of PhDs....................................  201

*  No of Master Degree holders............ 20,056

*  No of Bachelor Degree holders.......... 1,23,000

*  No of 12th Standard pass................. 6,00,000

*  No of 10th Standard pass................. 8,20,000

   { Minimum prescribed Edu Qualification = 10th Standard }

I suppose the Edu Profile ( and Min Edu Quali ) of those applying for Central Govt jobs in B / C / D categories , is no different

If so , those applicants holding Ph D / MA qualifications can be expected to " top the lists " with high marks ! And corner all jobs ?

Will that lead to ,

>  Getting higher Educational Qualifications ?

>  Getting higher marks in 10th/12th Std for admissions to
    Colleges  ?

>  Bribing college examiners ?

>  Huge amount of frustration/resentment among those Ph Ds for
    taking up a job below their skills ?


What is the rationale behind the decision to do away with interviewing ? HT ( 26 Oct 2015 ) quotes :

*  Shri Modiji :

    I have never heard of a psychologist who can evaluate a person during an interview of one to two minutes

* Shri Shekhar Singh

   Interview did not serve a purpose for junior posts, particularly when the government had never defined the attributes of a civil servant for a particular job

No doubt , these are very valid points , which the " No Interview " decision seems to answer in the short term

But the long term answer is :

*  Preliminary elimination based on 10th/12th Std marks

*  Define those " attributes " for each job

*  Design  ONLINE " Psychometric Tests " around those

*  Cut-off percentile based on these  ONLINE  tests

*  Distributing Successful candidates for personal ONLINE
     Interviewing by hiring Government Departments , using a
     portal called,

     This  ONLINE  interviewing can also be Out-Sourced to
     reputable private sector Recruitment Companies
     ( Public / Private Collaboration ? )

 *  Skype based  ONLINE  Video Interviewing , thru

*  An exhaustive/objective " ASSESSMENT SHEET " ( with 5 point
    weighted average scale on 10 attributes ) , to be filled up
    ONLINE , in the portal by the interviewers, with final
    recommendations on the suitability of interviewed candidates

    This will be transparently available for viewing by ALL citizens
    No need for RTI application !
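The score on such an ASSESSMENT SHEET could be worked out roughly as follows. The attribute names and weights are purely hypothetical ( the attributes would first have to be defined for each job ) ; only the 5 point scale and the weighted average come from the suggestion above.

```python
def weighted_score(ratings, weights):
    """Weighted average of 1-to-5 ratings; the result is also on a 1-to-5 scale."""
    if ratings.keys() != weights.keys():
        raise ValueError("ratings and weights must cover the same attributes")
    for attribute, rating in ratings.items():
        if not 1 <= rating <= 5:
            raise ValueError(f"rating for {attribute!r} must be between 1 and 5")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must add up to a positive number")
    return sum(ratings[a] * weights[a] for a in ratings) / total_weight


# Hypothetical attributes and weights; only three shown for brevity,
# a real sheet would list all 10 attributes.
weights = {"communication": 2, "integrity": 3, "job knowledge": 5}
ratings = {"communication": 4, "integrity": 5, "job knowledge": 3}
print(weighted_score(ratings, weights))
```

Publishing the weights along with each filled-up sheet would let any citizen re-compute the final score.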

When implemented , this would become one of the finest examples of e-Governance thru DIGITAL INDIA

And , not at all difficult for India - the Software Superpower , if there is a Political Will

TCS / INFOSYS / WIPRO would do this in 3 months  !


hemen  parekh

26  Oct  2015


