How to program a neural network. Answered

I am trying to create a neural network to use in vocal translation software, which is currently completely inaccurate. There is a lack of actual code on the Internet about this, only abstract concepts. Anyone wanna help me out?



7 years ago

Looks like your goal is an accurate speech-to-text software app, not the creation of said application.

If so, I'd recommend obtaining a software package that allows customization / learning of your voice.  (Dragon package, for example).

However, if you are interested in learning / practicing how a neural network helps a voice-recognition (VR) program, get freeware from an academic source and tweak it, rather than creating your own.

Particularly if you have not experimented with neural nets before.  Smarter to extend / customize a proven solution than start from scratch.

That reduces your work to finding an application and customizing it to your voice.
Just customizing an application can be a significant project.

Wikipedia page on "Speech Recognition" has a nice summary of current methods (Neural Nets are not being investigated, Hidden Markov Models are currently en vogue) and some good links.  Looks like CMU's Sphinx package could be a great place to start tinkering.

Generally, machine learning algorithms need to be trained. This is where you can tweak an application to your voice.  A cursory glance at the Sphinx documentation seems to show the software allows custom training and explains how to do it.  Looks plenty tough.

If you do create your own software, don't forget to apply for a PhD!

The whole point is not just to make speech recognition that is tuned to my voice, but to make something that uses the unique ability of neural nets to recognize a great many people, translate that grammatically correctly into another language, and SAY it back. So, a speech-to-speech translator. Not some of that Dragon crap that I already have, which only gets 1/4 of the words I say.

A truly ambitious project: the universal communicator!

So I work in the machine learning field (however, not in speech processing) and what you propose would be very involved, although it is an area of current development.  (so there are post-doc positions available... how's your cv?)

Before we go further, let's briefly discuss machine learning. Neural networks, random forests, hidden Markov models, support vector machines... There are many computer programs that can make decisions and have a training feedback loop, ostensibly improving success (links, more links).

These algorithms each have strengths and Achilles' heels. Thus, algorithm selection is an important part of a solution.  In fact, I've even seen ~5% accuracy differences in the same algorithm implemented in different computer languages; significant when 95% accuracy is the typical goal!

I'd like to emphasize that current ML ability is different from what is anticipated.  My own experience suggests ML is currently "good" at reading big data sets and simplifying (classifying) these inputs.  Essentially, quickly digesting huge data sets and outputting a concise report.  Netflix suggestions may be somewhat relevant, but I have yet to see an online music portal that makes efficient suggestions.  Where we are at now is: if we have the data and the CPU cycles, why not try, 10% improvement is better than 0%.

Certainly advanced decision making is the academic goal of ML, but I do not think it is a contemporary reality.  Your goal is a perfect fit for graduate work.

Back to your intent, here's a simple flowchart of the algorithm as I see it:
  1. Capture and process [speech, lang1] into known library of [phrases, lang1].
  2. Translate [phrase, lang1] > [phrase, lang2]
  3. Output in [synth-speech, lang2]
Step 1
Looks like there are some good programs available for input. Most seem to have capture and recognition algorithms, and language libraries.  Most seem to use statistical algorithms for processing.

Step 2
A quick survey indicates that crowd-sourcing in conjunction with error-minimizing ML strategies is currently being used in [text, lang1] > [text, lang2] conversions (Google Translate, for example).  I suspect grammatical translation is beyond ML and brute-force solutions are our contemporary limit.  Grammatically accurate [lang1] > [lang2] conversion is a tricky step, because it must balance fidelity against transparency, which is a challenge inherent to translation.

Step 3
There are many programs that can output text to speech. I imagine there are free programs in most major languages, so this is solved.
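The three steps above can be sketched as a pipeline of interchangeable stages. This is only a rough outline, not any real package's API: `Audio`, `SpeechTranslator`, and the three stage names are made up for illustration, and each stage would wrap whatever recognizer, translator, or synthesizer you actually pick.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical type: raw audio samples.
using Audio = std::vector<float>;

// Hypothetical pipeline: each stage is a pluggable function, so one
// stage can be swapped out without touching the other two.
struct SpeechTranslator {
    std::function<std::string(const Audio&)> recognize;        // step 1: speech -> text, lang1
    std::function<std::string(const std::string&)> translate;  // step 2: text lang1 -> text lang2
    std::function<Audio(const std::string&)> synthesize;       // step 3: text lang2 -> speech

    Audio run(const Audio& speech) const {
        // All three stages must be set before the pipeline is used.
        assert(recognize && translate && synthesize);
        return synthesize(translate(recognize(speech)));
    }
};
```

Keeping the stages separate means you can start with existing software for all three and replace only the one you want to experiment on.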

It is important to recognize that software doesn't need to get it "right", just add accuracy, since 10% is better than nothing.  So, can you add value in step 1 or step 2?  Now the devil is in the details.

From a practical standpoint, I'd be very clear on defining your goals.  Is algorithm selection important, or is output important?  I think you could easily spend 2 years exploring a particular algorithm, with no guarantee of significant improvement.

I've actually been through this in the past!  This is OK, if your goal is the algorithm, but not OK if your goal is the output.

If you are dead set on NN, use the available software as a proven process flow, swap out the old ML algorithm for NN (some here), and when it all works, start testing/training.

Definitely a worthy project, I wish you success! 

BTW: post your work as an instructable....

Thanks, I will post it if any good comes out of it. The whole idea of not using standard algorithms is to utilize the unique ability of a neural network to learn and deviate. That way, instead of writing code for translation plus 100 grammar rules, let alone TTS and speech recognition, it would be more like actually teaching a language to the computer with enforced learning, which I have read up on for neural networks, using a combination of supervised and unsupervised learning to give the machine almost the illusion of understanding and translating the meaning. The result would be a voice-to-voice translation program, if a neural network can accomplish that. I have no clue whether it can, since I have not found any information about anyone trying to do this.

I remember there was work done "training" a CPU to text-talk back to humans. I think in 1990-2000'ish. Researchers created AI programs that responded to a text prompt, something like

>hi hal
HAL> hello, how are you?
>fine, how are you?
HAL> fine, thanks

In retrospect, I wouldn't be surprised if these were early neural net MLs.  The timeline fits contemporary ML.  I remember one TV sound bite had a PhD sitting in front of the terminal discussing how the CPU was learning (being trained) and would choose its own words to respond with.

You should be able to find some refs about this work- seems related to what I've outlined as step 2.

Definitely, you'll be able to find papers on training a translator.  The NN algorithm may not be documented, but it's just a different algorithm.  The discussion of creating / training will still be relevant.

If ya gotta ask in such general terms, it's beyond what you can do at this time. Get some programming training/experience and then give it a go.

I do have programming experience. I simply don't understand how to actually implement the concept of a modular system of weighted links. I don't know if that was proper English, but I just don't really get how to turn the abstract concept I read about into a set of classes in C++.

Besides the Web search suggestions I made, here's a bit of design advice, which is basically for a feed-forward back-prop net.

Write a class for a single neuron. It takes an input (or a list/vector of inputs), computes something from them, and provides an output. The neuron should have a weight associated with each input (so you'll need either two lists, or a list of pairs), and some internal parameters associated with the computation.
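A minimal sketch of such a neuron class might look like the following. The names (`Neuron`, `output`, `sigmoid`) and the choice of a logistic activation with a single bias as the "internal parameter" are assumptions for illustration; the description above does not prescribe them.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical single neuron: one weight per input plus a bias,
// squashed through a logistic (sigmoid) activation.
class Neuron {
public:
    Neuron(std::vector<double> weights, double bias)
        : weights_(std::move(weights)), bias_(bias) {}

    // Weighted sum of the inputs plus the bias, then the activation.
    double output(const std::vector<double>& inputs) const {
        assert(inputs.size() == weights_.size());
        double sum = bias_;
        for (std::size_t i = 0; i < inputs.size(); ++i) {
            sum += weights_[i] * inputs[i];
        }
        return sigmoid(sum);
    }

    static double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

private:
    std::vector<double> weights_;  // one weight per input
    double bias_;                  // the "internal parameter" of this sketch
};
```

Here the inputs are passed in as plain numbers; wiring neurons to each other is a separate step.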

Now you interconnect neurons by providing, for each one's inputs, a pointer to the corresponding neuron. When you query the final output neuron for its value, it does the calculation and asks its inputs for their outputs. That triggers the recursion.

You'll need to implement "external inputs" as special cases (subclasses) of neurons which each have a single input that isn't another neuron, but some other source.
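The pointer wiring and the external inputs can be sketched as below. This is a standalone illustration with assumed names (`Node`, `InputNode`, `connect`), and it takes one liberty: rather than making inputs a subclass of the neuron itself, both derive from a small common base class, which keeps the input class free of unused weights.

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Common interface: anything that can report an output value.
class Node {
public:
    virtual ~Node() = default;
    virtual double output() const = 0;
};

// "External input": its value comes from outside the network.
class InputNode : public Node {
public:
    explicit InputNode(double value = 0.0) : value_(value) {}
    void set(double value) { value_ = value; }
    double output() const override { return value_; }
private:
    double value_;
};

// A neuron keeps a list of (source pointer, weight) pairs.
class Neuron : public Node {
public:
    explicit Neuron(double bias = 0.0) : bias_(bias) {}

    // Non-owning pointer: the caller must keep `source` alive.
    void connect(const Node* source, double weight) {
        assert(source != nullptr);
        inputs_.emplace_back(source, weight);
    }

    // Asking for the output recursively asks every source for its output.
    double output() const override {
        double sum = bias_;
        for (const auto& link : inputs_) {
            sum += link.second * link.first->output();
        }
        return 1.0 / (1.0 + std::exp(-sum));
    }

private:
    std::vector<std::pair<const Node*, double>> inputs_;
    double bias_;
};
```

Note that plain recursion recomputes a shared neuron once per path that reaches it; for anything beyond a toy network you would probably cache each node's output per evaluation pass.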

For the back-propagation training, you'll also need member functions which either change the weights of a given neuron directly (because you've implemented back-prop as a factory), or which know how to do the variation directly.
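As a rough illustration of the "neuron knows how to vary its own weights" option, here is a single sigmoid neuron with a `train` member function. It is not full back-propagation: it implements the delta rule (one gradient-descent step on squared error), which is the update back-prop applies at the output layer. The class name, the learning-rate parameter, and the returned "upstream" terms are assumptions of this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical trainable neuron: sigmoid activation, weights start at zero.
class TrainableNeuron {
public:
    explicit TrainableNeuron(std::size_t n_inputs)
        : weights_(n_inputs, 0.0), bias_(0.0) {}

    double output(const std::vector<double>& in) const {
        assert(in.size() == weights_.size());
        double sum = bias_;
        for (std::size_t i = 0; i < in.size(); ++i) {
            sum += weights_[i] * in[i];
        }
        return 1.0 / (1.0 + std::exp(-sum));
    }

    // One gradient-descent step on E = (out - target)^2 / 2.
    // Returns delta * (pre-update weight) for each input: the error
    // term a multi-layer net would pass back to the upstream neuron.
    std::vector<double> train(const std::vector<double>& in,
                              double target, double rate) {
        const double out = output(in);
        // dE/dsum for a sigmoid unit with squared error.
        const double delta = (out - target) * out * (1.0 - out);
        std::vector<double> upstream(in.size());
        for (std::size_t i = 0; i < in.size(); ++i) {
            upstream[i] = delta * weights_[i];
            weights_[i] -= rate * delta * in[i];
        }
        bias_ -= rate * delta;
        return upstream;
    }

private:
    std::vector<double> weights_;
    double bias_;
};
```

In a multi-layer net, a hidden neuron would sum the upstream terms it receives from the neurons it feeds and use that sum in place of `(out - target)`.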

Try a Web search for "back propagation neural network C++". There's a whole lot of code out there; you just need to be specific in your search. You could also look for "perceptron C++" or "Hebbian neural network C++" or any of the other popular algorithms.

Thank you this is very helpful