
FAQ & GLOSSARY

What is Headai’s SDG analysis?
For SDG analysis, we use data (texts) from the UN’s official Sustainable Development Goals (Agenda 2030) and other documents directly related to them. Headai’s machine has read the texts and identified meaningful words, word pairs, and the connections between them. The results can be visualized, for example, as a concept map.

 

How do I interpret an SDG Scorecard for an educational curriculum?
The SDG Scorecard visualization describes the intersection of the curriculum and the UN goals, giving an overall picture of how the curriculum meets the 17 SDGs. First, consider which UN goals are essential for your organization. Then look more closely at which important themes and concepts you are already dealing with, and which ones you might be missing.

 

How do you calculate the SDG score?
Our machine recognizes meaningful words and compounds (two or more words) in the UN texts as well as the customer’s texts and compares them with each other. The more words/compounds the texts have in common, the higher the score. In addition, every word gets a coefficient describing how descriptive the word/topic is, on a scale from 1 to 5: a small number means a generic/fuzzy topic, a bigger number means a descriptive topic. For example, “solar energy” would probably get a coefficient of 5, and “number” a coefficient of 1. If each is found only once, the score for this compound and word would be 5×1 + 1×1 = 6.
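
As a minimal sketch of this calculation (the terms, counts, and weights below are invented for the example, not taken from Headai’s data):

```python
# Toy version of the scoring described above: for each term found in both
# the UN text and the customer's text, add count * weight.
# term -> (count in the customer's text, descriptiveness weight 1..5)
shared_terms = {
    "solar energy": (1, 5),  # descriptive compound -> high weight
    "number": (1, 1),        # generic word -> low weight
}

score = sum(count * weight for count, weight in shared_terms.values())
print(score)  # 5*1 + 1*1 = 6, matching the worked example above
```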

 

Do you parse all the jobs listed on a job portal or just a percentage of them?
Our parser reads all job ads available on the site. In the analysis, however, we apply scientific sampling with a sample size of 3,000 in order to keep data sets of different sizes comparable. All the methods we use are scientifically grounded.
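
A minimal sketch of such a sampling step (illustrative only; the exact sampling method Headai applies is not described here):

```python
import random

def draw_sample(job_ads, n=3000, seed=42):
    """Draw a fixed-size random sample so that data sets of different
    sizes stay comparable. Illustrative sketch, not Headai's method."""
    if len(job_ads) <= n:
        return list(job_ads)
    return random.Random(seed).sample(job_ads, n)
```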

Should I define a character set when using Headai APIs?
When you call Headai APIs, you should specify the UTF-8 character set. Calls might work without it, but anomalies can occur.
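
For example, with an HTTP client you can declare UTF-8 explicitly in the Content-Type header. The endpoint URL below is a placeholder, not a real Headai API address:

```python
import requests

# Placeholder endpoint; substitute the actual Headai API URL.
url = "https://api.example.com/headai/parse"

# Encode the body as UTF-8 and say so in the header, so non-ASCII
# characters (e.g. "sähköinsinööri") survive intact.
payload = '{"text": "sähköinsinööri"}'.encode("utf-8")
headers = {"Content-Type": "application/json; charset=utf-8"}

response = requests.post(url, data=payload, headers=headers)
```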

 

Poor data, poor outcomes?
This is true for intelligent algorithms. Many things can make data poor in quality. Parts of the data might be missing, or it may simply contain errors. The data can be biased: since the people creating data can be unconsciously biased, this bias can sneak into the data as well. The data can also be fuzzy and non-descriptive, meaning there is not enough descriptive information for the algorithm to build understanding. Poor data leads algorithms to make poor suggestions or predictions. Quality data is vital.


If 30% of overall skill needs can be identified from job ads, where can I find the missing 70%?

About one third of current skill needs can be identified from job ads. Headhunters cover under 5% of recruiting, and their data is not easily available. Nearly half of all skills and competencies move inside organizations, which does not accumulate open data.

There are other open datasets that can help by offering different perspectives. While job ads and headhunters reflect current skill needs, investment data can give visibility into the upcoming 2 to 3 years, and research and development data can reflect the next 3 to 10 years. Sustainability topics like Agenda 2030 and the Green Deal can give visibility for more than 10 years ahead. There are many data sources you can follow: industry trend reports, governmental reports, foresight reports, patents, and startup databases, to name a few.

Does your analysis highlight the level of expertise needed?
We don’t assign a level to each skill. Instead, we are interested in how a skill is connected to other skills: “math” alone has no context. Based on the context, we infer both the professional domain and the likely level of the skill.

 

Your algorithm found me a word that doesn’t fit into my context, can you remove it?
Our machine recognizes common language; its purpose is to serve the identification of many different contexts. Sometimes a word that was found might not fit your particular context. However, the Headai language model is not modified based on individual user feedback. Token-specific filtering lists will be introduced in new APIs, but they will not be implemented in the old functionalities.

 

Is there IBM Watson or other third-party AI behind your solution?
No. Headai’s dynamic ontology and algorithms are 100% Headai IP and Headai-made.


What AI genre do you belong to?
Our technology is based on machine learning and neural computation. We combine Natural Language Processing (NLP), self-organizing maps (SOM), and reinforcement learning. Our technology is language-independent, which allows flexible scaling.

 

How does Headai’s ontology differ from traditional ontologies?
Headai’s ontology is dynamic: it learns all the time and gets better the more it is used and the more working-life-related textual data it reads. Whereas typical keyword-level ontologies are hierarchical, Headai uses self-organizing semantic neural networks in which all words are dynamically connected.
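
For intuition about the self-organizing principle, here is a minimal Kohonen-style SOM update loop (see SOM in the glossary below). It is a classroom-style sketch; Headai’s semantic networks are naturally far more elaborate:

```python
import numpy as np

def train_som(data, grid=10, epochs=100, lr0=0.5, sigma0=3.0, seed=0):
    """Minimal 1-D Kohonen SOM, for intuition only (not Headai's model).
    data: array of shape (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    weights = rng.random((grid, data.shape[1]))
    for epoch in range(epochs):
        lr = lr0 * (1 - epoch / epochs)                # decaying learning rate
        sigma = sigma0 * (1 - epoch / epochs) + 1e-3   # shrinking neighborhood
        for x in data:
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))  # best match
            dist = np.arange(grid) - bmu
            h = np.exp(-dist**2 / (2 * sigma**2))      # neighborhood function
            weights += lr * h[:, None] * (x - weights) # pull units toward x
    return weights
```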


What is Digital Self?
A machine-developed, interoperable skills profile that can represent a variety of things: an individual’s professional profile, a company’s skill assets, a region’s labor-market skill demand, an educational offering (skills supply), and many more. It provides an interoperable data format for predictive analytics and simulations.
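
Since a Digital Self can be exported as structured JSON (see the glossary below), a heavily simplified export might look like the sketch below. All field names are hypothetical and do not reflect Headai’s actual schema:

```python
import json

# Hypothetical shape of a Digital Self export; the field names are
# invented for illustration only.
digital_self = {
    "entity_type": "individual",   # or "organization", "labor_market", ...
    "skills": [
        {"concept": "solar energy", "count": 4, "weight": 5},
        {"concept": "project management", "count": 2, "weight": 3},
    ],
}
print(json.dumps(digital_self, indent=2))
```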

 

What are the benefits of simulating skills?
Simulations offer near-real-time tools for fast and efficient skills-based comparisons of different entities. This helps with, for example, decision-making, finding skills gaps, and spotting similarities (matches).
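
One way to picture such a comparison is as measuring the overlap between two Digital Selves. The toy similarity measure below (Jaccard overlap of plain skill sets) is only an illustration, not Headai’s simulation method:

```python
def skill_overlap(profile_a, profile_b):
    """Toy Jaccard similarity between two skill sets; illustrative only."""
    a, b = set(profile_a), set(profile_b)
    return len(a & b) / len(a | b) if a | b else 0.0

curriculum = {"solar energy", "wind power", "project management"}
labor_market = {"solar energy", "battery technology", "project management"}
print(skill_overlap(curriculum, labor_market))  # 0.5; the difference shows the gaps
```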

 

How does Headai’s “semantic” approach differ from keyword matching?
Headai learns context and similarity, so a match can be found without knowing the actual ‘keyword’, because the whole subject matter is taken into account.
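
To make the difference concrete, compare exact keyword matching with a context-aware lookup. The related-terms map below is hand-made for this toy example, whereas Headai learns such connections from data:

```python
# Hand-made toy vocabulary; Headai learns these connections from data.
related = {"photovoltaics": {"solar energy", "renewable energy"}}

query = "photovoltaics"
document_terms = {"solar energy", "grid integration"}

keyword_match = query in document_terms                            # False
semantic_match = bool(related.get(query, set()) & document_terms)  # True
print(keyword_match, semantic_match)
```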

 

What are Headai’s ML & NLP benefits compared to Deep Learning applications?

Green AI – Uses only a fraction of the energy of Deep Learning solutions.

Cognitive reasoning – Enables complex tasks like reasoning with conflicting and/or incomplete information.

Explainability – The AI results can easily be explained; there are no black boxes.

Ready to operate – Operates straight away, even with insufficient data and changing conditions. DL applications are sensitive to changes and require massive amounts of training.
 

GDPR compliance?
We strictly comply with the GDPR. In the best case, Headai keeps no personal data register at all and deals only with anonymized/pseudonymized data; the identities are then known only to the customer. The required security level is defined in the licensing agreement. The data is kept in the EU region (Finland). The customer always owns its data; we will not share it or distribute it to third parties. If needed, a DPA (Data Processing Agreement) can be made to define the personal data and the actions taken to protect it.


In an SDG analysis, are the source materials static or dynamic?
Static, so that we can be sure of what they contain. The materials can be updated at the desired interval.

How do you ensure that your AI does not discriminate?
Headai’s mechanics do not know vocabulary related to age, gender, religion, or ethnic background.

Can the results be manipulated?
A machine that is trained with data can always be manipulated. This makes it important to choose the training sets carefully.

GLOSSARY


Actor An organization or an individual performing one or more roles.

AI A buzzword, can be anything.

Compound word Two or more words joined together to create a new word with a more accurate meaning.

Count The number of times an item appears in different sentences in the whole text entity.

Data interoperability Enables AI operations like simulations and building scorecards between any entities that have a machine-built Digital Self. Data imports/exports are made in a structured format like JSON or xAPI through APIs.

Digital Self A detailed replica of a professional skill profile of any entity (e.g. labor market, organization, or individual).

Futureproof To possess the skill assets required to thrive amid future uncertainty.

Interoperable Data modeled with AI into a language model (Digital Self) that can be compared (simulated) with other Digital Selves. Data can be exported in JSON/xAPI.

JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate.

NLP Natural Language Processing. A sub-category of Artificial Intelligence that operates with natural language (text), not numbers.

Relevancy Count × Weight

Semantic Meaning-based

Skills ecosystem A new way to connect actors at different levels (individuals, companies, organizations, nations) in the skills domain. All are comparable with each other from the skills perspective. No system integrations are needed, just open interfaces.

SOM Self-organizing map, a type of artificial neural network (ANN) that is trained using unsupervised learning. Introduced by Finnish professor Teuvo Kohonen in the 1980s.

Weight How descriptive the word/topic is: a small number means a generic/fuzzy topic, a bigger number means a descriptive topic.

 
