Moscow City Hack easy recommending system in Flask

This is a write-up on the Moscow City Hack hackathon, which took place in Moscow, Russia, from 11 to 14 June. All of our team members were new to hackathons. Nevertheless, we had a great time exchanging ideas and coding, which took us to the finals, where we finished in 10th place.

To begin with, our track focused on developing a prototype for Netology, a Russia-based educational company. Most importantly, they asked for a system that would assess and validate users' skills, in addition to recommending courses, books and other valuable study resources.

Description of a task in Russian

The key idea behind the project

The idea for the project came from our product manager, Alexey Volkov. He advocated three pillars of education: learning, comprehension and practice. In such a system, the goal for each user is to keep the three pillars in balance.

Recommending System

We decided to build a custom scoring system to assess the candidate’s ability in each pillar.

First, we designed the output of our assessment and recommendation system:

Parser

Moreover, to better understand the data for training our machine learning model, we created a simple HeadHunter parser that calls the site's API to retrieve vacancies matching a given search text and additional parameters.

The parser relies on the following libraries:

import requests
import json
import time
import pandas as pd
import numpy as np
from tqdm import tqdm

The following method calls the API with the chosen parameters: the search text and the index of the page to start searching from:

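    def getPage(self, text, page=0):

        # Query parameters for the vacancy search
        params = {
            'text': f'NAME:{text}',  # search text, restricted to the NAME field
            'area': 113,             # region code
            'page': page,            # zero-based page index
            'per_page': 100          # vacancies per page
        }

        req = requests.get('https://api.hh.ru/vacancies', params=params)
        self.data = req.content.decode()  # keep the raw JSON text for later parsing
        req.close()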
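
To make the request concrete, the sketch below rebuilds the URL that getPage asks for without sending anything over the network. The helper name build_vacancy_url is hypothetical and not part of the original parser; the parameter values are copied from the method above.

```python
from urllib.parse import urlencode

API_URL = 'https://api.hh.ru/vacancies'  # endpoint used by getPage above


def build_vacancy_url(text, page=0):
    """Return the full search URL for a vacancy title (hypothetical helper)."""
    params = {
        'text': f'NAME:{text}',  # same NAME: prefix as in getPage
        'area': 113,             # same area code as in getPage
        'page': page,            # zero-based page index
        'per_page': 100,         # results per page
    }
    return f'{API_URL}?{urlencode(params)}'


print(build_vacancy_url('Python'))
```

Printing the URL this way is a cheap check that the colon in the NAME: prefix and characters such as the plus signs in C++ are percent-encoded before the request goes out.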

The next method parses the JSON stored by getPage and collects the parsed pages in a list:

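    def generate_pages(self, text, number=1):
        for page in tqdm(range(0, number)):
            # Fetch this page first so that self.data holds fresh results
            self.getPage(text, page)
            js = json.loads(self.data)
            self.jsObj.append(js)
            # Stop once the last available page has been stored
            if (js['pages'] - page) <= 1:
                break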
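
The stopping rule in the loop can be tried in isolation. The following sketch assumes, as the condition above does, that each response carries a 'pages' field with the total page count. The names collect_pages and fake_fetch are made up for illustration, and the fake stands in for the real API call.

```python
def collect_pages(fetch_page, number=1):
    """Fetch up to `number` pages, stopping early at the last available one."""
    pages = []
    for page in range(number):
        js = fetch_page(page)          # parsed JSON for this page
        pages.append(js)
        if js['pages'] - page <= 1:    # nothing left after this page
            break
    return pages


def fake_fetch(page):
    # Stand-in for a real API call: pretends the search has 3 pages in total
    return {'page': page, 'pages': 3, 'items': [f'vacancy-{page}']}


collected = collect_pages(fake_fetch, number=10)
print(len(collected))
```

Even though ten pages are requested, the loop ends after the third one, so asking for more pages than exist costs no extra requests.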

Once we have the list of JSON pages, we process it and make further calls to the HeadHunter API for more granular data on each vacancy. Finally, we store the result in a pandas DataFrame and save it as a CSV file:

    def generate_vacancies(self, name):

        IDs = []
        names = []
        snippet = []
        salary = []
        skills_name = []

        for page in self.jsObj:
            for item in tqdm(page['items']):

                IDs.append(item['id'])
                names.append(item['name'])
                snippet.append(item['snippet']['requirement'])

                # Fetch the full vacancy record, which carries the key_skills list
                req = requests.get(item['url'])
                jsVac = json.loads(req.content.decode())
                req.close()

                # Store the skills as one comma-separated string
                skills_name.append(','.join(skl['name'] for skl in jsVac['key_skills']))

                # Fall back to NaN when the vacancy has no salary information
                try:
                    salary.append(item['salary']['from'])
                except (TypeError, KeyError):
                    salary.append(np.nan)

                # Pause between requests to go easy on the API
                time.sleep(0.25)

        pd.DataFrame({'ids': IDs, 'names': names, 'skills': skills_name, 'salary': salary}).to_csv(f'{name}.csv', index=False)
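
The per-vacancy logic can also be pulled out into a small pure function, which makes it easy to check without calling the API. This is only a sketch: flatten_vacancy is a hypothetical helper, and the sample records are invented to match the fields accessed above.

```python
import math


def flatten_vacancy(item, details):
    """Combine a search result and its detail record into one flat row."""
    salary = item.get('salary') or {}      # treat a missing or None salary as empty
    salary_from = salary.get('from')
    return {
        'ids': item['id'],
        'names': item['name'],
        'skills': ','.join(s['name'] for s in details.get('key_skills', [])),
        'salary': math.nan if salary_from is None else salary_from,
    }


# Made-up records shaped like the fields accessed above
item = {'id': '1', 'name': 'Python developer', 'salary': None}
details = {'key_skills': [{'name': 'Python'}, {'name': 'SQL'}]}
row = flatten_vacancy(item, details)
print(row['skills'])
```

A list of such rows can be passed straight to pd.DataFrame, giving the same columns as the method above.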

Analysis

Since our time was limited, we decided to focus on three programming languages for our project: C++, PHP and Python. The mean and median salaries in Russian rubles per month were:

For example, the 20 core skills that employers most often ask for in job descriptions for the Python programming language:
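
Both summaries can be computed from the parser's output with the standard library alone. The rows below are invented and only mimic the 'skills' and 'salary' columns of the CSV; they are not the hackathon data, and the exact analysis code we used is not reproduced here.

```python
from collections import Counter
from statistics import mean, median

# Invented rows in the same shape as the parser's CSV columns
rows = [
    {'skills': 'Python,SQL,Git', 'salary': 100000},
    {'skills': 'Python,Django', 'salary': 150000},
    {'skills': 'Python,SQL', 'salary': None},   # salary not disclosed
    {'skills': '', 'salary': 80000},            # no key skills listed
]

# Salary statistics over the vacancies that state a salary
salaries = [r['salary'] for r in rows if r['salary'] is not None]
mean_salary = mean(salaries)
median_salary = median(salaries)

# Count how often each skill appears across all vacancies
skill_counts = Counter(
    skill for r in rows for skill in r['skills'].split(',') if skill
)
top_skills = skill_counts.most_common(2)   # use 20 for a top-20 list
print(mean_salary, median_salary, top_skills)
```

Vacancies without a salary are skipped rather than counted as zero, which would otherwise drag both statistics down.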

The mechanics of our simplest ML recommending system

We decided to ask users to rate their own ability from 1 to 5 in each of the skills most popular among recruiters. We then treat a rating of 1-3 as a sign that the user lacks the skill and 4-5 as a sign that he or she is proficient.
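
The thresholding step is simple enough to sketch directly. The function name ratings_to_flags and the sample survey are hypothetical; only the 1-3 versus 4-5 split comes from the description above.

```python
PROFICIENT_FROM = 4   # ratings 4-5 count as proficient, 1-3 as lacking


def ratings_to_flags(ratings):
    """Map 1-5 self-ratings to 1 (proficient) or 0 (lacking) per skill."""
    for skill, rating in ratings.items():
        if not 1 <= rating <= 5:
            raise ValueError(f'rating for {skill!r} must be between 1 and 5')
    return {skill: int(rating >= PROFICIENT_FROM) for skill, rating in ratings.items()}


survey = {'Python': 5, 'SQL': 3, 'Git': 4}   # made-up answers
flags = ratings_to_flags(survey)
print(flags)
```

The resulting 0/1 flags have the same shape as a skills-present/skills-absent feature row, which is what lets a survey be fed to a model trained on vacancies.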

We trained a Random Forest model on the 600 vacancies, using skills as features and the programming language as the target variable. The accuracy score on our test split was incredibly high at 0.98.
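
The exact encoding of the skills is not shown in this write-up. One plausible reading, sketched below as an assumption, is a multi-hot encoding in which each distinct skill becomes a 0/1 column. The function build_features and the three sample vacancies are invented for illustration.

```python
def build_features(skill_strings):
    """Turn comma-separated skill strings into a vocabulary and 0/1 rows."""
    vocabulary = sorted({s for row in skill_strings for s in row.split(',') if s})
    index = {skill: i for i, skill in enumerate(vocabulary)}
    matrix = []
    for row in skill_strings:
        vector = [0] * len(vocabulary)
        for skill in row.split(','):
            if skill:
                vector[index[skill]] = 1   # mark the skill as present
        matrix.append(vector)
    return vocabulary, matrix


skills = ['Python,SQL', 'PHP,SQL', 'C++']   # invented vacancies
languages = ['Python', 'PHP', 'C++']        # target labels, one per vacancy
vocabulary, X = build_features(skills)
print(vocabulary)
print(X)
```

With such a matrix, X and languages could be split into train and test parts and passed to a classifier such as scikit-learn's RandomForestClassifier.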

After the user completes the survey, the system returns an educational course for the programming language that the machine learning model considers the user's weakest.
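
How "weakest" is determined is not spelled out here. A minimal sketch, assuming the model yields one probability per language for the user's skill profile and the lowest one is picked, could look like this. The probabilities and the course catalogue are hypothetical placeholders, not output from our model.

```python
def weakest_language(probabilities):
    """Return the language with the lowest predicted probability."""
    return min(probabilities, key=probabilities.get)


# Hypothetical course catalogue and model output for one survey
COURSES = {'Python': 'Python course', 'PHP': 'PHP course', 'C++': 'C++ course'}
probabilities = {'Python': 0.7, 'PHP': 0.1, 'C++': 0.2}

language = weakest_language(probabilities)
recommendation = COURSES[language]
print(language, '->', recommendation)
```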

Deployment

To conclude, after tinkering with the file structure and adding the extra files that Heroku requires, we deployed our Flask project on the platform.

The repository on Github: https://github.com/Pfed-prog/Netology_Final/
