A qualitative leap forward in Natural Language Processing for education
(09/02/2011) 'Natural language processing' (NLP) is a field of computer science and linguistics that is attracting interest from diverse academic, business and government sectors. NLP covers many applications: software to convert text into speech, automatic translation between languages or mining data for specific concepts. But it remains far from perfect.
Quantitative analysis tools have long featured in the fast-growing field of technology-enhanced learning, in which computers and software programs play a key role in how students gather information, study and communicate with teachers and among themselves. While quantitative analysis can reveal how much work a student is doing and how often they refer to educational materials, attend class or communicate with others, it can't measure the value of those interactions. That is where qualitative analysis comes in.
'Most analysis tools developed for the technology-enhanced learning community to date measure how or how much students do things. What's needed are tools and technologies that focus on "how well" students do things. They can then use that knowledge to help guide them in their progress,' explains Wolfgang Greller, Associate Professor for New Media Technologies and Knowledge Innovation at the Open University of the Netherlands.
Dr. Greller coordinated the 'Language technologies for lifelong learning' (LTfLL) project, in which 11 academic and business partners in seven European countries developed innovative solutions focused on the qualitative side of technology-enhanced learning.
Supported by EUR 2.85 million in research funding from the European Commission, the project developed and field-tested six tools to help learners and tutors in fields ranging from dialogue and text analysis to student positioning and conceptual development. In the process, the partners also made considerable technological progress in the area of NLP.
'If you have ever used Google Translate, Dragon speech recognition software or similar tools, you'll know that the accuracy is not 100 %. This is a relative issue, however, because if you expect 100 % and get 80 % the result is bad, but if you expect zero and the result is 80 % accurate, well, that is much better than nothing,' Dr. Greller notes.
In order to develop intelligent next-generation support and advice services and tools for individual and collaborative learning, the LTfLL researchers had to make progress on two key challenges facing NLP: accuracy and performance.
Assessing students accurately
To address accuracy, the researchers developed a technique for measuring the output of their tools against benchmarks set by human experts. It was employed, for example, to enhance LTfLL tools dealing with positioning - i.e. assessing at what level of a certain course a new student should be placed based on their existing knowledge of the subject - and conceptual development - how well a student is grasping the subject matter and progressing once they start attending classes.
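The article does not say which measure the researchers used to compare tool output with the expert benchmarks. As a rough illustration only, the sketch below assumes that both the tool and a human expert assign each student a level label, and computes two common agreement figures: raw agreement and Cohen's kappa, which corrects for agreement expected by chance. The function names and the sample labels are hypothetical and are not taken from the LTfLL project.

```python
from collections import Counter


def percent_agreement(tool_labels, expert_labels):
    """Share of students for whom the tool and the expert chose the same level."""
    if len(tool_labels) != len(expert_labels) or not tool_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(t == e for t, e in zip(tool_labels, expert_labels))
    return matches / len(tool_labels)


def cohens_kappa(tool_labels, expert_labels):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(tool_labels)
    observed = percent_agreement(tool_labels, expert_labels)
    tool_counts = Counter(tool_labels)
    expert_counts = Counter(expert_labels)
    # Chance agreement: probability that both raters pick the same label
    # if each labelled independently with their own label frequencies.
    expected = sum(
        tool_counts[label] * expert_counts[label] for label in tool_counts
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical placement levels for six students (illustrative data only).
tool_levels = ["beginner", "intermediate", "intermediate",
               "advanced", "beginner", "advanced"]
expert_levels = ["beginner", "intermediate", "advanced",
                 "advanced", "beginner", "intermediate"]

agreement = percent_agreement(tool_levels, expert_levels)
kappa = cohens_kappa(tool_levels, expert_levels)
print(f"agreement={agreement:.2f} kappa={kappa:.2f}")
```

A kappa well below the raw agreement figure signals that much of the apparent accuracy could have arisen by chance, which is why a chance-corrected measure is often preferred when the expert judgement is the benchmark.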
'For the positioning analysis, students were asked questions and had to respond with a free text. Their answers were then analysed by the tool against so-called text-book responses provided by experts and teachers, a procedure that rests heavily on the accuracy of NLP technology,' Dr. Greller explains.
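The comparison of a free-text answer against reference responses can be pictured with a deliberately simple sketch. It assumes a bag-of-words representation and cosine similarity, which is a much cruder technique than the NLP pipeline the project describes; the reference texts, level names and the `best_match` helper are all invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(vec_a[term] * vec_b[term] for term in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(answer, references):
    """Score the answer against every reference text; return the closest level."""
    answer_vec = Counter(tokenize(answer))
    scores = {
        level: cosine_similarity(answer_vec, Counter(tokenize(text)))
        for level, text in references.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical "text-book" responses for two course levels.
references = {
    "beginner": "a spreadsheet is a table where you type numbers",
    "advanced": ("a spreadsheet stores data in cells and uses formulas "
                 "and functions to calculate results automatically"),
}
answer = "I use formulas and functions in cells to calculate totals automatically"

level, scores = best_match(answer, references)
print(level, scores)
```

In keeping with the article, a score like this would serve only as a guide for the teacher, not as an automatic placement decision.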
The system is not designed to automatically allocate new students to a specific class or level, but rather to provide a guide for teachers about a student's existing level of knowledge on a subject in a much more time- and cost-effective manner than the multiple-choice tests and face-to-face interviews commonly used today.
Similarly, conceptual development analysis helps gauge students' performance by looking at the educational content they produce and measuring how closely the language they are using to define concepts matches the language used by the expert community.
'It is not just about looking at the number of keywords a student includes in an essay - if that were the case it would be very easy for a student to game the system - but rather determining how well their use of those keywords defines the concept they are trying to express,' the project coordinator notes.
That, in turn, requires NLP technology that is highly fault-tolerant: it must recognise that two words spelt, or misspelt, in different ways can mean the same thing, and that identical-looking words can have very different meanings depending on context.
'The system needs to understand that "webcam", "web cam" or "cyber cam" mean the same thing and be able to determine if the word "Java" refers to the island or the programming language. It needs to understand the context,' Dr. Greller notes.
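The two requirements in this quote, mapping spelling variants to one term and using context to pick a word sense, can be sketched in a toy form. The example below assumes a hand-written variant table and small sets of cue words per sense; a real system would learn these from data, and none of the names or word lists here come from the LTfLL tools.

```python
import re

# Hypothetical table mapping surface variants to one canonical term.
VARIANTS = {
    "web cam": "webcam",
    "web-cam": "webcam",
    "cyber cam": "webcam",
}

# Hypothetical context cues for the two senses of "Java".
SENSE_CUES = {
    "programming language": {"code", "compile", "class", "program", "software"},
    "island": {"indonesia", "jakarta", "island", "travel", "volcano"},
}


def normalize(text):
    """Lower-case the text and replace known variants with the canonical term."""
    text = text.lower()
    for variant, canonical in VARIANTS.items():
        text = re.sub(r"\b" + re.escape(variant) + r"\b", canonical, text)
    return text


def disambiguate_java(sentence):
    """Pick the sense whose cue words overlap most with the sentence.

    Returns None when no cue word appears, i.e. the context is uninformative.
    """
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {sense: len(words & cues) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(normalize("My Web Cam and my cyber cam are broken"))
print(disambiguate_java("I wrote a Java class and tested the code"))
print(disambiguate_java("We took a ferry to Java, the island east of Sumatra"))
```

Counting overlapping cue words is only the simplest possible stand-in for contextual understanding; it fails as soon as a sentence uses vocabulary outside the cue lists.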
Both the positioning and conceptual development analysis tools were tested by LTfLL project partner Bit Media. This Austrian e-learning solutions provider runs courses to help unemployed people return to work by training them in computer skills in order to obtain the European Computer Driving Licence, a certification in proficiency in information and communication technologies (ICT).
Improving NLP performance
With regard to improving the performance of NLP systems, the LTfLL researchers implemented latent semantic analysis and pre-processing techniques, and built contextual environments for different subjects.
'One of the main technical challenges in NLP is that it needs a lot of processing power. Each time a student inputs information or makes a query a lot of processing is required. Building the contextual environment beforehand and pre-processing many of the terms in a given field - whether it is medicine or computer science - means results can be returned in seconds rather than days,' Dr. Greller explains. 'This was a key issue for us as we wanted our tools to work on-demand and in as near real time as possible.'
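The idea of doing the expensive work beforehand can be illustrated with a small sketch. It assumes a TF-IDF weighting rather than the latent semantic analysis the project used (LSA would add a matrix factorisation to the offline step), and the `PrecomputedIndex` class and the sample subject texts are hypothetical. The point is only the split between a one-off offline build and a cheap per-query lookup.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class PrecomputedIndex:
    """Builds weighted, length-normalised vectors for a subject corpus once."""

    def __init__(self, documents):
        # Offline step: run once per subject area, before any student query.
        n_docs = len(documents)
        tokenized = {name: tokenize(text) for name, text in documents.items()}
        doc_freq = Counter(
            term for tokens in tokenized.values() for term in set(tokens)
        )
        # Smoothed inverse document frequency: rarer terms weigh more.
        self.idf = {
            term: math.log((1 + n_docs) / (1 + df)) + 1.0
            for term, df in doc_freq.items()
        }
        self.vectors = {
            name: self._unit_vector(tokens) for name, tokens in tokenized.items()
        }

    def _unit_vector(self, tokens):
        """Weight known terms by IDF and scale the vector to unit length."""
        weights = {
            term: count * self.idf[term]
            for term, count in Counter(tokens).items()
            if term in self.idf
        }
        norm = math.sqrt(sum(w * w for w in weights.values()))
        return {term: w / norm for term, w in weights.items()} if norm else {}

    def query(self, text):
        # Online step: only the query is vectorised; the corpus vectors
        # are reused, so each comparison is a short sparse dot product.
        query_vec = self._unit_vector(tokenize(text))
        scores = {
            name: sum(w * vec.get(term, 0.0) for term, w in query_vec.items())
            for name, vec in self.vectors.items()
        }
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical one-sentence "corpora" for two subject areas.
documents = {
    "medicine": "the heart pumps blood through arteries and veins",
    "computer science": "a compiler translates source code into machine code",
}
index = PrecomputedIndex(documents)          # built ahead of time
ranked = index.query("how does a compiler handle source code")
print(ranked)
```

With a realistic corpus the offline build dominates the cost, which is consistent with the article's point that preparing the contextual environment in advance lets results come back quickly at query time.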
The LTfLL researchers' work on enhancing NLP technologies puts their results at the forefront of the emerging field of 'learning analytics', which focuses on the measurement, collection, analysis and reporting of data about learners and their contexts in order to better understand and optimise learning and learning environments. A key aspect of this is the combination of NLP with other technologies dealing with sentiment analysis and social networking, with a view to making learning in any field and about any subject more fun, interactive and engaging.
LTfLL received research funding under the European Commission's Seventh Framework Programme (FP7).