People always say to work with people smarter than you, but I think a much better strategy is to work with people that are experts in areas linked to your own, but not equivalent to your own.
— John Myles White (@johnmyleswhite) August 8, 2018
you want to replace unlimited history with size-k history because you think it is a good enough approximation? by all means please do. but don't expect to magically capture also the long-range stuff. it doesn't work this way. it cannot work this way.
— (((ل()(ل() 'yoav)))) (@yoavgo) August 8, 2018
Also, many people seem to hold both of the following beliefs at the same time:
- ha cool we can do language models with feed-forward nets instead of RNNs!
- if we do LM well we will model all of language and achieve AGI!
It doesn't work this way. These are conflicting.
— (((ل()(ل() 'yoav)))) (@yoavgo) August 8, 2018
aaargh the "When recurrent models don't need to be recurrent" paper is so frustrating!
On the one hand it presents important technical results.
On the other, so many people interpret it as "yo lets replace all RNNs with FF nets". This is wrong. This is NOT the result.
— (((ل()(ل() 'yoav)))) (@yoavgo) August 8, 2018
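For readers who haven't followed the thread, the contrast Yoav is drawing is roughly the one below: a fixed-window feed-forward language model can, by construction, only condition on the last k tokens, while a recurrent model's state depends on the entire prefix. The paper's result concerns when the former approximates the latter, not a blanket license to swap them. This is only a minimal sketch (PyTorch assumed; class names, layer sizes, and the window size are illustrative, not from the paper):

```python
import torch
import torch.nn as nn

VOCAB, EMB, HID, K = 10_000, 128, 256, 5  # K = truncated history length

class WindowFFLM(nn.Module):
    """Feed-forward LM: conditions ONLY on the last K tokens."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.mlp = nn.Sequential(nn.Linear(K * EMB, HID), nn.Tanh(),
                                 nn.Linear(HID, VOCAB))

    def forward(self, last_k_tokens):           # (batch, K)
        e = self.emb(last_k_tokens).flatten(1)  # (batch, K * EMB)
        return self.mlp(e)                      # logits for the next token

class RNNLM(nn.Module):
    """Recurrent LM: the hidden state summarizes the entire prefix."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.rnn = nn.LSTM(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, full_prefix):             # (batch, T) for any T
        h, _ = self.rnn(self.emb(full_prefix))
        return self.out(h[:, -1])               # logits for the next token

# Anything more than K tokens back is invisible to WindowFFLM by construction;
# treating it as a stand-in for the RNN only makes sense when long-range
# influence on the next token decays, which is exactly Yoav's caveat.
```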
To some extent I feel the culture of optimizing for test results ("the tiger mom effect") spills over into academia—the need to achieve highest SOTA metrics, citation counts, etc. What do you think of a Kaggle where winner is not based entirely on performance but what they built?
— hardmaru (@hardmaru) August 8, 2018
Humans vs. machines is (often) not a very helpful way to think about AI https://t.co/zGxUlJ8xhs pic.twitter.com/GjtHlRBc1P
— Rachel Thomas (@math_rachel) August 7, 2018
That angst when reading such tweets, that overwhelming desire to comment, that stolen attention in the minutes of your day, they all go to a collective simulation for their benefit. Our stolen attention and rage fuel our collective thinking that is being used to benefit them -_-
— Smerity (@Smerity) August 7, 2018
The key advantage of deep learning is its reliance on global optimization -- it learns a hierarchy of features jointly, which solves the fundamental problem of information loss. That's also one of its main weaknesses: it makes DL extremely inefficient due to a lack of modularity.
— François Chollet (@fchollet) August 7, 2018
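A toy illustration of the trade-off Chollet describes (a sketch of my own, not code from the tweet; the module names are made up): the two stages below are optimized jointly through a single loss, so the lower stage learns whatever features make the upper stage's job easy, and for the same reason it cannot simply be unplugged and reused as an independent module.

```python
# Illustrative only: joint (end-to-end) optimization of a feature hierarchy.
import torch
import torch.nn as nn

features = nn.Sequential(nn.Linear(784, 256), nn.ReLU())  # "lower" stage
classifier = nn.Linear(256, 10)                            # "upper" stage

opt = torch.optim.SGD(list(features.parameters()) +
                      list(classifier.parameters()), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))   # fake batch
logits = classifier(features(x))
loss = loss_fn(logits, y)

# One backward pass updates BOTH stages: no information is lost to a hand-made
# interface between them, but the learned features end up co-adapted to this
# particular classifier and task, which is the lack of modularity in question.
loss.backward()
opt.step()
```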
I made a short summary and a map to explore last week's fascinating twitter megathread on Semantics and Meaning in NLP.
You can find it here https://t.co/A9tlpc90wM
(To the participants: please, don't hesitate to tell me if you have remarks on how I report your arguments) pic.twitter.com/bJQzkBkGnX
— Thomas Wolf (@Thom_Wolf) August 7, 2018
+1. Yesterday, I had a painful conv with some exec of super well-funded startup. He seemed so strongly convinced that dialog (chatbots for eg) can be “solved” using reinforcement learning. Any of my attempts to disabuse him of that notion made him more entrenched in that position https://t.co/1CIhtXbpRd
— Delip Rao (@deliprao) August 7, 2018
Just because it doesn't apply to every problem doesn't make it meaningless. It also doesn't apply to language understanding because we cannot simulate language. Which is why techniques that have this requirement probably won't get us there.
— Richard (@RichardSocher) August 7, 2018
"this" meaning sampling and simulating unlimited numbers of steps. My experience applying RL to wildfire management suggests that the number of required simulations is infeasibly large even for the massive farms Google is using, hence the claim is meaningless
— Thomas G. Dietterich (@tdietterich) August 7, 2018
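A back-of-the-envelope version of Dietterich's point (my numbers, not his): even a crude requirement of a few samples per distinct action sequence blows up exponentially with the planning horizon, so "just simulate and learn" stops being an answer for long-horizon problems that are expensive to simulate, or, as Socher notes for language, cannot be simulated at all.

```python
# Rough sample-complexity arithmetic, purely illustrative.
actions = 10         # plausible choices per step (tiny for dialog or wildfire)
horizon = 20         # decision steps per episode
samples_per_seq = 5  # naive coverage: a few rollouts per action sequence

rollouts = samples_per_seq * actions ** horizon
print(f"{rollouts:.2e} rollouts needed")   # 5.00e+20, far beyond any server farm

# Real RL needs far fewer samples than exhaustive coverage thanks to function
# approximation, but the exponential dependence on horizon is why "sample and
# simulate more" is not a free lunch.
```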