Quick question: would a JavaScript machine learning library be useful to you? If so, how? I'm thinking basic classification (NN, DT, Bayes) and optimization (GA, NN, RHC, etc.) and clustering (EM, KM) functions.
(1) Modern classification algorithms like SVMs need some pretty hardcore math routines (SVMs require Quadratic Programming, which isn't trivial to implement correctly). Do you intend to implement these yourself? If so, that alone might be useful as a separate library, with the ML library built on top of it.
(2) I've been thinking about a JS distributed computing library for a long time -- sort of like Folding@Home, but instead of having to download a program, you just visit a website and let JS do the crunching (Ajax will pull and push data chunks). With modern JS engines, this has become more of a reality. So to bring it back to your question -- why not try to abstract as much of the math+algorithms from as many distributed computing projects as possible, and then build a generic JS library for doing distributed computation. You could have each distributed computing project as a benchmark -- i.e., start by implementing SETI@Home using your library, then move on to Folding@Home. I guarantee you won't be bored... ;)
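To make the pull/push idea a bit more concrete, here is a rough sketch of what the browser-side loop might look like. Everything in it is an assumption for illustration: `pullChunk` and `pushResult` stand in for the Ajax calls to some coordinating server, `processChunk` is a toy job (sum of squares), and `makeFakeServer` is an in-memory stand-in so the sketch runs without a network.

```javascript
// Toy job: sum of squares over one chunk of numbers (hypothetical workload).
function processChunk(chunk) {
  let total = 0;
  for (const x of chunk.data) total += x * x;
  return { id: chunk.id, result: total };
}

// Pull chunks until the server reports there is no work left,
// compute each one locally, and push the result back.
async function runWorker(pullChunk, pushResult, compute) {
  let done = 0;
  for (;;) {
    const chunk = await pullChunk(); // null means "no work left"
    if (chunk === null) break;
    await pushResult(compute(chunk));
    done += 1;
  }
  return done; // number of chunks processed by this visitor
}

// In-memory stand-in for the server; a real page would make Ajax calls here.
function makeFakeServer(chunks) {
  const queue = chunks.slice();
  const results = [];
  return {
    results,
    pullChunk: async () => (queue.length > 0 ? queue.shift() : null),
    pushResult: async (r) => { results.push(r); },
  };
}
```

A real version would also have to deal with visitors leaving mid-chunk and with untrusted results, which this sketch ignores.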
Good luck, and I'm really glad people are pushing the capabilities of JS these days.
On a lighter note, can you imagine the derisive laughter if you had suggested this 10 years ago? :)
this might be relevant to your (or our collective) interests- http://news.ycombinator.com/item?id=1645520 (MapRejuice - Distributed Client Side Computing (Node Knockout Entry))
No problem, I hope it helps you out or gives you some inspiration!
Since I don't know much about machine learning: what do you plan to build your JS library to do or support? Is it just for research, or are there ways to use it in a more everyday webapp?
Yes, these would be useful to me. What would the performance be like compared to other options like Java? You'd have to use the new typed arrays to have any hope of comparable performance, right?
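For anyone who hasn't used them, this is roughly what numeric code on typed arrays looks like. It is only a sketch: `dot` and `matVec` are hypothetical helper names, the flat row-major layout is one possible choice, and it makes no claim about actual speed relative to Java.

```javascript
// Dot product of two Float64Array vectors. A typed array holds
// fixed-size numeric values in one contiguous buffer.
function dot(a, b) {
  if (a.length !== b.length) throw new RangeError('length mismatch');
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Matrix-vector product, with the matrix stored row-major in a single
// flat Float64Array (an assumed layout, not a standard one).
function matVec(matrix, rows, cols, vec) {
  const out = new Float64Array(rows);
  for (let r = 0; r < rows; r++) {
    // subarray() returns a view onto the same buffer; nothing is copied.
    out[r] = dot(matrix.subarray(r * cols, (r + 1) * cols), vec);
  }
  return out;
}
```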
It would not be immediately useful to me personally, but it is something I've been intending to build over the coming winter (for educational purposes and for interface experimentation).
We're building out our REST API to allow you to create, train, (and re-train) your own SVM. In theory, you could use our API entirely on the client side, using JavaScript.
From my experience in seeing performance and the kind of tweaking we've had to do to be able to score 10K documents/s, you need some nitty-gritty C code that I don't think can run in a browser.
Assuming you mean Javascript-in-the-browser, then meh... (I'm sure that's not the answer you were looking for, but hear me out):
Why would this be useful? Machine learning generally needs two things that browsers aren't very good at dealing with:
1) Large amounts of data
2) Fast I/O to process that data.
Why would someone prefer to use a client library rather than a remote call to a high performance serverside library, which will give better results?
Having said that, there are a few very specialized areas where this might make sense. For example, a Javascript Haar classifier would be useful for machine vision in a browser.
You are right, there are many things that can be done with machine learning, like music classification by genre or artist, or a predictions API (mentioned in general terms on purpose, because you are limited to the datasets and problems you choose to work on). If a JavaScript library can be built that works smoothly and makes these things work better, I think it is definitely worth it. And you are right about the node.js thing.
I'd also point you to some interesting resources on concepts that you could easily implement in your library.
Like the NGD (Normalized Google Distance), just an idea, to make smarter tag clouds? http://www.complearn.org/
If you are planning to release this as a product, I doubt that it could gain traction, although the whole node.js thing makes me wonder whether everything is moving to the client, even heavy computational tasks such as ML or AI problems. If it is a project just for the sake of it or for fun, then it would be cool to see your implementation.
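For reference, NGD only needs hit counts, so the core of it is a few lines. A minimal sketch, assuming you already have the counts from somewhere (a search API, or your own tag database); the function name `ngd` and the choice to return Infinity for terms that never co-occur are my own.

```javascript
// Normalized Google Distance between two terms x and y.
//   fx, fy : number of pages (or documents) containing each term
//   fxy    : number of pages containing both terms
//   n      : total number of pages indexed
// NGD = (max(log fx, log fy) - log fxy) / (log n - min(log fx, log fy))
function ngd(fx, fy, fxy, n) {
  // Assumed convention: terms that never occur (or never co-occur)
  // are treated as infinitely far apart.
  if (fx <= 0 || fy <= 0 || fxy <= 0) return Infinity;
  const lx = Math.log(fx);
  const ly = Math.log(fy);
  const lxy = Math.log(fxy);
  return (Math.max(lx, ly) - lxy) / (Math.log(n) - Math.min(lx, ly));
}
```

For a tag cloud, "pages" could simply be your own posts: tags that always appear together get a distance of 0, and loosely related tags get larger values, which you could use to group or position them.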