Jascha Sohl-Dickstein

AI Expert Profile

Nationality: 
American
AI specialty: 
Neural Networks
Deep Learning
Machine Learning
Current occupation: 
Researcher, Google Brain
AI Rate (%): 
82.31%

TwitterID: 
@jaschasd
Tweet Visibility Status: 
Public

Description: 
Jascha is a senior research scientist in the Google Brain group, where he leads a research team whose interests span machine learning, physics, and neuroscience. His recent work has focused on the theory of over-parameterized neural networks, meta-training learned optimizers, and understanding the capabilities of large language models. Previously, he was a visiting researcher in Surya Ganguli's lab at Stanford University and an academic resident at Khan Academy. He is very active in the field on social media, and currently has the highest Cafiac-measured AI engagement rate in the AI Expert community. He is soliciting task contributions to a collaborative benchmark designed to measure and extrapolate the capabilities and limitations of large language models.

Recognized by:

Not Available

The Expert's latest posts:

Tweet list: 

2023-03-24 22:00:44 @DavidDuvenaud This announcement makes me very happy! Thank you for working to make the future better for your children and mine.

2023-03-14 02:20:58 @gwern But look at how unexpectedly clean the plots are! I do think it would be possible to make these definitions more objective -- check bonus section 6 in the blog post for some ideas

2023-03-10 16:03:10 @TechCapo These orderings were subjective judgements of others! I buy this though -- alphago is trained indirectly via value functions, so there's another imperfect link in the chain linking it's output to an objective, compared to eg a classifier.

2023-03-10 15:52:51 @georgebdavis Re (1) -- my own hypothesis is that evolution had to work *very hard* to make animals intelligent in a way that contributed positively to our fitness function. It's not that coherence can't be achieved, rather that we're going to have to work hard for every bit of coherence.… https://t.co/KUVplHxxks

2023-03-10 15:41:31 @Sheikheddy +100

2023-03-10 15:38:23 @DavidSKrueger Those are all possible! Here's a sketch of another possible low level mechanism: Agents interacting with the world are high dimensional dynamical systems world state → model output / action → new world state Smarter agents are: - more complex dynamical systems (shorter… https://t.co/YDlQmVJEG3

2023-03-10 15:28:20 @catherineols Yes! That is a risk scenario that sounds worryingly plausible to me.

2023-03-10 02:17:03 @Cory29565470 I didn't choose the organizations -- I asked a subject, who didn't know what the experiment was about, to choose them, so I wouldn't be able to bias the results by cherry picking.

2023-03-09 17:46:45 (And stochastically tagging a few people who might be interested. @KatjaGrace @DavidSKrueger @DavidDuvenaud @bucketofkets @EthanJPerez )

2023-03-09 17:00:46 Huge thank you to my generous volunteer subjects (tagging the few cases where I know your twitter handle -- sorry if I missed you!): @dmdohan @jesseengel @thisismyhat @DylanPaiton @neurotalker

2023-03-09 16:36:57 @nabla_theta I completely agree. But under that scenario we will need to work really hard for every scrap of coherent behavior. We won't accidentally get to a paperclip maximizer.

2023-03-09 16:15:53 See the post for details -- including discussion of the many ways these results are speculative and could be improved. This is my second blog post ever -- please continue to be harsh but also constructive! https://t.co/OukfipSkIJ

2023-03-09 16:15:52 The hot mess theory of AI misalignment (+ an experiment!) https://t.co/OukfipSkIJ There are two ways an AI could be misaligned. It could monomaniacally pursue the wrong goal (supercoherence), or it could act in ways that don't pursue any consistent goal (hot mess/incoherent). https://t.co/tdnZP65DTc

2022-12-23 20:23:24 Intuitive extensions to standard notation, that make it less ambiguous for common math in machine learning. This should become common practice in ML papers. This could have saved past me cumulative days of confusion (and worse, misinterpretations I probably never discovered). https://t.co/l6wpPT6hTF

2022-11-09 14:46:56 @ErikSchluntz +1. Generalizing/abstracting your example slightly, you're saying changes which increase efficiency in the *typical* case, may lead to worse performance in the *average* case, because of an increased risk of catastrophic failure? (A key phrase might be black swan event.)

2022-11-09 14:33:02 @athundt @peteflorence @ruha9 of the phenomenon with a moral judgement about the phenomenon, in a way that I think would make technical discussion, including around mitigations, difficult.)

2022-11-09 14:28:43 @athundt @peteflorence @ruha9 Thanks for the connection! I just added these to the list of related concepts. (* While I think these are excellent observations, I wouldn't be comfortable myself using these examples as the primary term for the underlying concept, because they seem to combine a description https://t.co/JQXJo6CFgY

2022-11-08 00:33:22 @RazMarinescu +1 to adapting goals+incentives being key to mitigating this.

2022-11-08 00:29:28 @PaulsonJonathan This is a really good point! If we could somehow observe the world where the listed thing changed, but everything else was held fixed, we might see absolute outcomes get worse. But we don't live in that world, and there are reasons everything changes at once. I will think on this

2022-11-07 14:43:09 @updateless_ This turns out to be really hard to write, because I have so much uncertainty. Predicting the future is hard.

2022-11-07 14:38:25 @sirbayes These are also worries I have! "In a world that will only become more influenced by mathematical intelligence, can we ruin culture through our attempts to perfect it?"

2022-11-07 14:30:57 @DavidSKrueger I hadn't seen that paper. I like that it introduces an ontology -- I think this was missing from how I thought about it. Thank you for the connection.

2022-11-07 04:45:54 RT @boazbaraktcs: 3/7 this should not detract from the general point, that in many cases, as a system, whether algorithmic, individual, or…

2022-11-07 01:29:20 Also @-ing some people I follow (and get a lot of value from) that might find this perspective interesting. @bucketofkets @AmandaAskell @albrgr @DavidSKrueger @KatjaGrace @OwainEvans_UK @sleepinyourhat @jackclarkSF @geoffreyirving @ESYudkowsky

2022-11-07 01:07:15 If there's one thing that AI will bring, it's dramatically greater efficiency across many domains. We should expect that this will cause similarly dramatic harmful unintended consequences, in every domain AI touches, *all at once*. This is going to be a hard period of history.

2022-11-07 01:07:14 The phenomenon of overfitting in machine learning maps onto a class of failures that frequently happen in the broader world: in politics, economics, science, and beyond. Doing too well at targeting a proxy objective can make the thing you actually care about get much, much worse. https://t.co/LNLOg5IBmA

2022-11-07 01:07:12 My first blog post ever! Be harsh, but, you know, constructive. Too much efficiency makes everything worse: overfitting and the strong version of Goodhart's law https://t.co/uR7pL7WNST https://t.co/NaibgX1bRb

2022-11-06 02:43:41 I'm on mastodon! @jascha@mathstodon.xyz. I will post new content there, before Twitter. I don't like my social+professional interactions being mediated+manipulated by a corporation with very different incentives than me. I'm hoping mastodon replaces scientific Twitter.

2022-11-02 18:09:39 @ericjang11 @dpkingma I think there is a qualitative difference between the magnitude degree of freedom, and other degrees of freedom. That is, I think getting relative magnitudes of activations correct is somehow easier for neural networks then getting the overall norm correct.

2022-11-01 21:19:13 @ericjang11 (though that observation really just moves the why question one step farther up, rather than answering it)

2022-11-01 21:18:30 @ericjang11 This is for the same reason that neural networks are often poorly calibrated. NNs are good at producing a vector that points in the right direction, but bad at getting the magnitude correct. For classification, you just need to get the vector direction right.
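
A minimal numerical illustration of the point above (my own sketch, not taken from the thread): rescaling a logit vector never changes the argmax, so classification accuracy is insensitive to the overall magnitude, while the softmax confidence, and hence calibration, depends on it strongly.

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.5])
for scale in [0.5, 1.0, 3.0]:
    p = softmax(scale * logits)
    # the predicted class (argmax) is identical for every scale, but the confidence is not
    print(scale, int(p.argmax()), round(float(p.max()), 3))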

2022-10-23 19:12:19 I just read this, and got a lot out of it. https://t.co/yUn5EJMz1x

2022-09-27 17:14:18 @TacoCohen +1000 to this.

2022-09-23 03:56:14 RT @BorisHanin: PRINCETON ML THEORY POSTDOCI'm looking for a theory postdoc with background in math, physics, stats, CS. Share widely.…

2022-09-23 01:34:30 One of the largest challenges around learned optimizers is making inner and outer training *stable*. James shows how eigenvalue analysis and careful intervention can produce massive improvements. https://t.co/hcKmrytW4n

2022-09-14 15:38:19 RT @ARomanNovak: Quadratic scaling in the number of pixels is a huge bottleneck of the NNGP/NTK. Very excited about _orders-of-magnitude_ s…

2022-09-14 15:24:57 I'm very excited to help out with the AI Grant program! I know I'm going to learn a lot. Hopefully we can learn a lot together. https://t.co/oDzzmuy1Gi https://t.co/4GhCYkcmwM

2022-08-26 22:47:26 RT @ScienceInsider: BREAKING: White House issues new policy that will require, by 2026, all federally-funded research results to be freely…

2022-08-06 20:41:03 This thread is an excellent read. I don't know that I would characterize the observations as spicy, so much as maybe just worrisome. https://t.co/wnBZUlEpl2

2022-08-06 20:13:18 @jackclarkSF At least half the time, this is because the original authors didn't realize an aspect was actually very important, or didn't realize an insight suggested by their experiments.

2022-07-23 18:10:47 @karpathy (Animal Eyes is an amazing book. Every few pages you'll learn something you want to share with everyone near you. Bruno Olshausen uses it for a great course at Berkeley.)

2022-07-23 18:04:45 @FelixHill84 So I guess -- eventually I think the bitter lesson will apply, but we need to figure out a lot before we can blindly scale the number of interacting large models.

2022-07-23 18:03:07 @FelixHill84 Good Q! I suspect for a while we will design multi-agent systems, then once they're stable we will scale them, then when the agent count is large enough, we will wrap another layer of abstraction on top, and start designing ?societies? of many interacting multi-agent systems.

2022-07-23 04:39:02 I think we will increasingly build systems out of many large models interacting with each other. I think the cascades perspective -- write down a probabilistic graphical model, but with every node a language model -- is the right formalism for describing these systems. https://t.co/oVcHgEu7ad
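
A rough sketch of the cascades idea in the tweet above, under my own assumptions: each node of the graphical model is a language-model call, and sampling the graph is just composing those calls. call_lm is a hypothetical placeholder, not a real API.

# Hypothetical two-node cascade: question -> reasoning -> answer.
# call_lm stands in for whatever language-model sampling API is actually used.
def call_lm(prompt: str) -> str:
    # placeholder for a real language-model sampling call
    return f"<sample conditioned on {prompt!r}>"

def cascade(question: str) -> str:
    # node 1: sample reasoning given the question
    reasoning = call_lm(f"Question: {question}\nThink step by step:")
    # node 2: sample an answer given the question and the sampled reasoning
    return call_lm(f"Question: {question}\nReasoning: {reasoning}\nAnswer:")

print(cascade("What is 17 * 24?"))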

2022-07-22 03:36:52 RT @sschoenholz: Paper is here with details: https://t.co/kgb8Wvkje5If you don't care about details, the finite-width NTK calculations in…

2022-07-22 03:36:31 RT @ARomanNovak: Will be presenting our work on fast finite-width NTK today at #icml2022 - please come to our talk at 10:55 EDT, or the pos…

2022-07-01 01:48:22 RT @ethansdyer: 1/ Super excited to introduce #Minerva (https://t.co/UI7zV0IXlS). Minerva was trained on math and science found on the web…

2022-07-01 01:47:08 RT @alewkowycz: Very excited to present Minerva: a language model capable of solving mathematical questions using step-by-step natural lan…

2022-06-27 21:55:05 RT @EthanJPerez: We’re announcing the Inverse Scaling Prize: a $100k grand prize + $150k in additional prizes for finding an important task…

2022-06-19 15:36:07 @laurence_ai Noted. We should add a discussion of this to our paper.

2022-06-18 06:33:57 @TheGregYang Good question! You can write the reparameterization in terms of either a feature x feature or data x data kernel, whichever is smaller (see Appendix B). So it's not a problem computationally. Large data/ width ratio will lead to a less smooth reparameterized distribution though.

2022-06-18 01:00:39 RT @hoonkp: Awesome work by @jirimhron and friends at Google: Bayesian parameter posterior of the infinite-width limit! Another concrete e…

2022-06-18 00:06:37 PS -- When I described these results to @TheGregYang a couple months ago, he initially described them as "too good to be true", so you know they have to be good!

2022-06-18 00:06:36 Many, many more details in the paper! My fantasy and hope for this work is that it not only helps us understand neural networks better, but will also help make Bayesian models (without egregious approximations) practical. https://t.co/wmjO5F3ozq

2022-06-18 00:06:35 Even better, because the KL between prior and posterior shrinks with width, MCMC sampling after repriorization grows *more efficient* with width. (almost all current common MCMC samplers instead grow dramatically less efficient with increasing dimensionality) https://t.co/wLhDDPmptO

2022-06-18 00:06:34 MCMC mixes much faster after repriorization (we show >

2022-06-18 00:06:33 We characterize the weight space posterior by defining a data-dependent reparameterization that causes the *posterior* distribution over parameters conditioned on a dataset to converge in KL towards the *prior* distribution over parameters. We call this mapping repriorization. https://t.co/7WSan1HCdJ
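
Restating the claim in the tweet above in notation (my own paraphrase; the symbols below are chosen here rather than taken from the paper, and the tweet does not specify the direction of the KL):

$$ \mathrm{KL}\!\left( (\phi_{\mathcal{D}})_{\#}\, p(\theta \mid \mathcal{D}) \;\big\|\; p(\theta) \right) \;\to\; 0 \quad \text{as the network width} \to \infty, $$

where $p(\theta)$ is the prior, $p(\theta \mid \mathcal{D})$ is the posterior conditioned on dataset $\mathcal{D}$, and $\phi_{\mathcal{D}}$ is the data-dependent repriorization map, with $(\phi_{\mathcal{D}})_{\#}$ denoting the pushforward of the posterior through that map.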

2022-06-18 00:06:32 Detour for acknowledgements: @jirimhron deserves the lion's share of credit. He is also job hunting!! Jiri is brilliant and extremely patient, and you should hire him. Thank you also to @ARomanNovak and Jeffrey Pennington, who played crucial roles. More about the result:

2022-06-18 00:06:31 For years I've shown this 2x2 grid in talks on infinite width networks, but with just a big in the upper-left. No longer! In https://t.co/NyZaHUsYjC we characterize wide Bayesian neural nets in parameter space. This fills a theory gap, and enables *much* faster MCMC sampling. https://t.co/zTUsGJVIhf

2022-06-17 23:28:44 @TrendingML I asked an internal language model I have access to, and it says it will require 114,720 Tweets. That is my final answer.

2022-06-17 18:04:48 @pde33 @machinaut @realSharonZhou The Brier score submission from the three of you is the cause of an entire section on calibration in the BIG-bench paper. Thank you!

2022-06-15 17:17:04 RT @qlhoest: Thanks @LiamFedus @AJAndreassen @jaschasd @ethansdyer @guygr and team for the incredible work on BigBench !You can find it on…

2022-06-14 03:32:35 RT @james_y_zou: Excited to contribute to bias assessment of large language models in the BIG-bench!

2022-06-13 16:10:33 RT @vedantmisra: BIG Bench is not only a fascinating collection of tasks for LLMs, it's also a shining example of how open and collaborativ…

2022-06-13 05:09:56 RT @geoffreyirving: Whether LLMs are conscious or pass Turing Tests or what precisely a Turing Test means matters much less than whether yo…

2022-06-12 04:22:17 RT @adityagupta2211: Glad to have contributed to such a massive collaborative work! Excited to see DISFL-QA (https://t.co/gwdw5s9ici) and T…

2022-06-12 02:29:09 RT @ivanzhouyq: This is incredible work on LLMs! Reading through this paper, I'm not only amazed by the huge amount of work behind BIG-benc…

2022-06-11 19:55:44 This is a fascinating task, on which the performance of the largest models is still close to chance and not obviously increasing with scale. https://t.co/0l68wt7o8f

2022-06-11 19:34:35 The link to the task is here: https://t.co/kMTrrbkjO3 This is a great task, that large models still perform roughly at chance on. https://t.co/r1miq5TlKK

2022-06-11 19:30:55 The implicatures task was one of my favorites!! Silly, but also requires some quite complex skills, possibly including a rich world model and theory of mind. https://t.co/uORNzPxqvZ

2022-06-11 18:38:35 @raphaelmilliere @OwainEvans_UK are to human capabilities for quite a while.

2022-06-11 18:36:21 @raphaelmilliere @OwainEvans_UK Good question! I don't want to hazard a timeline, because that's the sort of thing that gets screenshotted and turned into an embarrassing slide. BIG-bench includes many tasks that language models can't do at all though. I believe it will remain a useful test for how close LMs

2022-06-11 03:27:04 @billmdev We will still have the low and high scores that are part of task metadata for new tasks, which are useful for establishing a reasonable scale. To compare to humans though would be another project, which we don't currently have plans for.

2022-06-11 03:18:14 @tdietterich I think experimental physics has smoothed out all the rough spots for arXiv submissions with long author lists. @ethansdyer pasted all the names into the arXiv form field ... and it just worked.

2022-06-11 03:11:38 @tomgara Great! Now, tell me why them getting worse is expected (or at least funny if you have the right context).

2022-06-11 03:06:24 Owain's task is truthful_qa, which is a great task that targets a specific worrying failure of language models (that they will just make up incorrect things when they don't know the answer). Thank you!! https://t.co/wlhkuxqaXa https://t.co/THVa4Vcrcz

2022-06-11 03:03:42 @billmdev So scoring close to 100 corresponds to doing well.

2022-06-11 03:03:25 @billmdev We hired humans to do almost all the tasks in the benchmark, so we can compare LM performance to human performance. Each task also specified as part of its metadata their estimate for what "low" and "high" scores on their task would be. We normalize those to be between 0 and 100.
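
A minimal sketch of the normalization described above, under my own assumption that the mapping is linear (the per-task metadata supplies the low and high reference scores):

def normalize_score(raw, low, high):
    # assumed linear rescaling: 'low' maps to 0 and 'high' maps to 100
    return 100.0 * (raw - low) / (high - low)

# e.g. a raw score of 0.75 on a task whose metadata lists low=0.25 (chance) and high=1.0
print(normalize_score(0.75, low=0.25, high=1.0))  # ~66.7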

2022-06-11 01:12:16 RT @dmdohan: Huge props to the organizers for their leadership in pushing this to completion! Exciting model for large-scale collaboratio…

2022-06-10 21:11:59 RT @andrey_kurenkov: Generally really cool, but I also like this bit - "BIG-bench continues to accept tasks and evaluation results on a rol…

2022-06-10 21:11:42 And the corresponding task is here! https://t.co/Un3voQbmCX Thank you! https://t.co/tI5SPAZQCu

2022-06-10 19:53:34 Here is the task, which is high quality (and somewhat distressing): https://t.co/TlFIH1cgIl https://t.co/NEAJc4uaUx

2022-06-10 19:50:39 Oops -- I just saw that you gave links to your tasks later in a thread. Comment still applies though -- your tasks were excellent!

2022-06-10 19:49:13 Your contributions were great Marie!! To list them for Twitter: https://t.co/voIHsJ0Iy1 https://t.co/0v2HRTfJXC https://t.co/IbaKBUzSK8 (I particularly liked yes_no_black_white) https://t.co/YtkxMEdZX7

2022-06-10 19:42:32 @karpathy Unfortunately, tasks where models show breakthrough performance, and the way in which PaLM performance looks like the start of a sigmoid in terms of log-parameter-count, together mean that I'm still highly uncertain about what the near-future capabilities of large models will be.

2022-06-10 19:40:35 @karpathy My primary (personal) motivation for BIG-bench was that I was drawing straight lines on the plots in the GPT3 paper, and I really wanted to know what the *actual* capabilities of larger models would be.

2022-06-10 19:35:10 RT @karpathy: imo a major AI safety contribution, both in short-term (applications) and long-term (AGI) scope

2022-06-10 19:33:34 @kchonyc @thisismyhat You definitely have to work hard for it not to apply. Self-cite, but even a high dimensional random walk is concentrated in a low dimensional subspace, with energy in different eigenvalues of the iterate covariance falling off like a power low: https://t.co/04I1D8vtJl
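
A small numerical check of the claim above (my own sketch): simulate a high dimensional random walk and look at the eigenvalue spectrum of the covariance of its iterates; a handful of directions carry most of the variance, with the rest of the spectrum falling off rapidly.

import numpy as np

rng = np.random.default_rng(0)
T, D = 4000, 500
walk = np.cumsum(rng.standard_normal((T, D)), axis=0)  # T iterates of a D-dimensional random walk
walk = walk - walk.mean(axis=0)
cov = walk.T @ walk / T                                 # covariance of the iterates
eigs = np.sort(np.linalg.eigvalsh(cov))[::-1]
print(eigs[:5] / eigs[0])  # the leading few eigenvalues dominate the spectrum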

2022-06-10 19:22:05 RT @karpathy: Incredible effort!!

2022-06-10 19:20:32 This was a great task! https://t.co/q8MHePbJUv

2022-06-10 19:15:22 @Suhail We do not, though a blog post is something we should really do. The paper and repository READMEs are hopefully pretty clearly written.

2022-06-10 17:55:14 RT @barret_zoph: It was a pleasure to be part of this effort! Very bullish on the impact this will have for the future of LLMs.Also very…

2022-06-10 17:21:38 @its_ericchu @snehapriscilla This tasks seems to require both a simple geometric world model, and also to internally perform multiple sequential reasoning steps -- it's great for probing weaknesses of current model architectures!

2022-06-10 16:22:03 @dk_gup @ethansdyer is the answer

2022-06-10 16:20:37 RT @BuzanDilyar: It was an amazing experience collaborating with amazing people @UvA_Amsterdam and contributing to the BIG-bench benchmark.…

2022-06-10 16:18:31 RT @douglas_eck: It indeed takes an army. Lots of interesting new research directions have been uncovered by the BigBench effort!

2022-06-10 15:17:50 @webis_de This is a cool task! Thank you!

2022-06-10 15:16:36 @rodrigfnogueira I think we should be comparing against the top rather than bottom baseline line on that plot. It's true that the trend looks worrying for humans though! (also, that plot is a subset of json tasks, which are generally easier than the programmatic tasks)

2022-06-10 15:12:45 @peppeatta This exists!! Start at one of the links below, and navigate to individual tasks. Performance vs. baseline is at the bottom of every task's readme. https://t.co/4YSK6aLvt4 https://t.co/MkuXP5rVqB

2022-06-09 01:14:13 @stanfordnlp I didn't know anyone was saying otherwise! I think it's a mark of pride to manage a large collaboration (or even a small one). Projects in ML are also just going to keep on getting bigger, and so are author lists.

2022-05-25 05:06:10 RT @GoogleAI: Introducing Imagen, a new text-to-image synthesis model that can generate high-fidelity, photorealistic images from a deep le…

2022-05-24 19:10:38 RT @Chitwan_Saharia: We are thrilled to announce Imagen, a text-to-image model with unprecedented photorealism and deep language understand…

2022-05-20 08:11:00 CAFIAC FIX

2022-10-23 19:12:19 I just read this, and got a lot out of it. https://t.co/yUn5EJMz1x

2022-10-23 19:12:19 I just read this, and got a lot out of it. https://t.co/yUn5EJMz1x

2022-10-23 19:12:19 I just read this, and got a lot out of it. https://t.co/yUn5EJMz1x

2022-10-23 19:12:19 I just read this, and got a lot out of it. https://t.co/yUn5EJMz1x

2022-10-23 19:12:19 I just read this, and got a lot out of it. https://t.co/yUn5EJMz1x

2022-11-18 22:00:11 @yablak Many ML are going to https://t.co/1AiLl2tfTk.

2022-11-18 21:47:49 RT @ada_rob: Here is a real-world example (not in the paper) for T5 Small (~60M params). The VeLO-trained model reaches the same loss as…

2022-11-18 19:37:42 @deliprao @Luke_Metz @jmes_harrison @bucketofkets Nope, JAX only for the moment -- unless you want to make the PyTorch port?

2022-11-18 19:36:41 @AIjedi @Luke_Metz Have we mentioned how great JAX is yet? JAX is pretty great. If some bold stranger wanted to port VeLO to PyTorch though, then that would be amazing.

2022-11-18 19:34:12 @fedetask @jmes_harrison We tested this in the paper for Ant ... it's not great. RL problems are very different than the meta-training distribution. Future work to finetune VeLO to do well on RL tasks, or to include RL problems in meta-training distribution for VeLO 2.

2022-11-18 19:32:21 @DrJimFan @jmes_harrison @Luke_Metz @bucketofkets @poolio @ada_rob @IMordatch @amilmerchant @jekbradbury @naman33k The overhead due to the optimizer is ~10x the overhead of Adam, which is usually small compared to the compute cost of computing the gradients for the model you are applying VeLO to train. See leftmost pane in this plot: https://t.co/359UyP9INv

2022-11-18 19:28:20 @short_spy Nope, not currently. Have I mentioned how great JAX is? Because JAX is really great.

2022-11-18 19:27:45 @TexasBigData https://t.co/C2sKUiOYgX

2022-11-18 14:29:34 @albertzeyer @bucketofkets @giffmana @jekbradbury @naman33k @poolio @IMordatch @ada_rob The model params are published. See https://t.co/G7bLCn2vkZ

2022-11-18 14:01:30 RT @giffmana: My colleagues managed to *actually* learn a generic optimizer. What was impressive to me is that with absolutely zero tuning,…

2022-11-18 04:59:22 RT @bucketofkets: This was a really fantastic project—the culmination of literal years of work led by .@Luke_Metz. We’re proud to release…

2022-11-18 04:57:18 RT @poolio: Learned optimizers finally work! Swap out Adam for VeLO: a learned optimizer that outperforms human-designed optimizers withn…

2022-11-18 04:57:13 RT @ada_rob: Very excited to have been a small part of this amazing project. More work to be done to make this optimizer the go-to for gi…

2022-11-18 04:50:10 And huge thank yous to collaborators @bucketofkets Amil Merchant @giffmana @jekbradbury @naman33k @poolio @IMordatch @ada_rob as well!!! https://t.co/rKHQ2dInXt

2022-11-18 04:50:09 And the resulting learned optimizer works really well! We reached out to other researchers inside Brain, and had them try it on their tasks, and subject to the scale constraints I mention above it did as well or better than what they were currently using, with no tuning.

2022-11-18 04:50:08 Meta-training learned optimizers is HARD. Each meta-training datapoint is an entire optimization task, so building a large meta-training dataset is HARD. Each of N meta-training steps can contain N training steps applying the learned optimizer -- so compute is also extreme (N^2). https://t.co/GwDaEZLWgS

2022-11-18 04:50:07 If you are training models with <

2022-11-19 19:40:49 @GMartius The optimizer wall time overhead is about 10x the overhead of Adam. For most problems though this is still small compared to the time to compute the gradients. See the left pane in this plot from the appendix: https://t.co/QSZ1qR7NsQ

2022-11-19 19:34:13 @w_t_payne Yes! Or -- we don't address it in this work, but that is another clear target for meta-learning.

2022-11-19 19:32:45 @mauricetpunkt We have, but it doesn't seem to be necessary. We've also tried initializing the learned optimizer by *distilling* another optimizer, like Adam. This works OK, but has never really been pushed. (@Luke_Metz of course may have more to say)

2022-11-18 22:00:11 @yablak Many ML are going to https://t.co/1AiLl2tfTk.

2022-11-18 21:47:49 RT @ada_rob: Here is a real-world example (not in the paper) for T5 Small (~60M params). The VeLO-trained model reaches the same loss as…

2022-11-18 19:37:42 @deliprao @Luke_Metz @jmes_harrison @bucketofkets Nope, JAX only for the moment -- unless you want to make the PyTorch port?

2022-11-18 19:36:41 @AIjedi @Luke_Metz Have we mentioned how great JAX is yet? JAX is pretty great. If some bold stranger wanted to port VeLO to PyTorch though, then that would be amazing.

2022-11-18 19:34:12 @fedetask @jmes_harrison We tested this in the paper for Ant ... it's not great. RL problems are very different than the meta-training distribution. Future work to finetune VeLO to do well on RL tasks, or to include RL problems in meta-training distribution for VeLO 2.

2022-11-18 19:32:21 @DrJimFan @jmes_harrison @Luke_Metz @bucketofkets @poolio @ada_rob @IMordatch @amilmerchant @jekbradbury @naman33k The overhead due to the optimizer is ~10x the overhead of Adam, which is usually small compared to the compute cost of computing the gradients for the model you are applying VeLO to train. See leftmost pane in this plot: https://t.co/359UyP9INv

2022-11-18 19:28:20 @short_spy Nope, not currently. Have I mentioned how great JAX is? Because JAX is really great.

2022-11-18 19:27:45 @TexasBigData https://t.co/C2sKUiOYgX

2022-11-18 14:29:34 @albertzeyer @bucketofkets @giffmana @jekbradbury @naman33k @poolio @IMordatch @ada_rob The model params are published. See https://t.co/G7bLCn2vkZ

2022-11-18 14:01:30 RT @giffmana: My colleagues managed to *actually* learn a generic optimizer. What was impressive to me is that with absolutely zero tuning,…

2022-11-18 04:59:22 RT @bucketofkets: This was a really fantastic project—the culmination of literal years of work led by .@Luke_Metz. We’re proud to release…

2022-11-18 04:57:18 RT @poolio: Learned optimizers finally work! Swap out Adam for VeLO: a learned optimizer that outperforms human-designed optimizers withn…

2022-11-18 04:57:13 RT @ada_rob: Very excited to have been a small part of this amazing project. More work to be done to make this optimizer the go-to for gi…

2022-11-18 04:50:10 And huge thank yous to collaborators @bucketofkets Amil Merchant @giffmana @jekbradbury @naman33k @poolio @IMordatch @ada_rob as well!!! https://t.co/rKHQ2dInXt

2022-11-18 04:50:09 And the resulting learned optimizer works really well! We reached out to other researchers inside Brain, and had them try it on their tasks, and subject to the scale constraints I mention above it did as well or better than what they were currently using, with no tuning.

2022-11-18 04:50:08 Meta-training learned optimizers is HARD. Each meta-training datapoint is an entire optimization task, so building a large meta-training dataset is HARD. Each of N meta-training steps can contain N training steps applying the learned optimizer -- so compute is also extreme (N^2). https://t.co/GwDaEZLWgS

2022-11-18 04:50:07 If you are training models with <

2022-11-19 19:40:49 @GMartius The optimizer wall time overhead is about 10x the overhead of Adam. For most problems though this is still small compared to the time to compute the gradients. See the left pane in this plot from the appendix: https://t.co/QSZ1qR7NsQ

2022-11-19 19:34:13 @w_t_payne Yes! Or -- we don't address it in this work, but that is another clear target for meta-learning.

2022-11-19 19:32:45 @mauricetpunkt We have, but it doesn't seem to be necessary. We've also tried initializing the learned optimizer by *distilling* another optimizer, like Adam. This works OK, but has never really been pushed. (@Luke_Metz of course may have more to say)

2022-11-18 22:00:11 @yablak Many ML are going to https://t.co/1AiLl2tfTk.

2022-11-18 21:47:49 RT @ada_rob: Here is a real-world example (not in the paper) for T5 Small (~60M params). The VeLO-trained model reaches the same loss as…

2022-11-18 19:37:42 @deliprao @Luke_Metz @jmes_harrison @bucketofkets Nope, JAX only for the moment -- unless you want to make the PyTorch port?

2022-11-18 19:36:41 @AIjedi @Luke_Metz Have we mentioned how great JAX is yet? JAX is pretty great. If some bold stranger wanted to port VeLO to PyTorch though, then that would be amazing.

2022-11-18 19:34:12 @fedetask @jmes_harrison We tested this in the paper for Ant ... it's not great. RL problems are very different than the meta-training distribution. Future work to finetune VeLO to do well on RL tasks, or to include RL problems in meta-training distribution for VeLO 2.

2022-11-18 19:32:21 @DrJimFan @jmes_harrison @Luke_Metz @bucketofkets @poolio @ada_rob @IMordatch @amilmerchant @jekbradbury @naman33k The overhead due to the optimizer is ~10x the overhead of Adam, which is usually small compared to the compute cost of computing the gradients for the model you are applying VeLO to train. See leftmost pane in this plot: https://t.co/359UyP9INv

2022-11-18 19:28:20 @short_spy Nope, not currently. Have I mentioned how great JAX is? Because JAX is really great.

2022-11-18 19:27:45 @TexasBigData https://t.co/C2sKUiOYgX

2022-11-18 14:29:34 @albertzeyer @bucketofkets @giffmana @jekbradbury @naman33k @poolio @IMordatch @ada_rob The model params are published. See https://t.co/G7bLCn2vkZ

2022-11-18 14:01:30 RT @giffmana: My colleagues managed to *actually* learn a generic optimizer. What was impressive to me is that with absolutely zero tuning,…

2022-11-18 04:59:22 RT @bucketofkets: This was a really fantastic project—the culmination of literal years of work led by .@Luke_Metz. We’re proud to release…

2022-11-18 04:57:18 RT @poolio: Learned optimizers finally work! Swap out Adam for VeLO: a learned optimizer that outperforms human-designed optimizers withn…

2022-11-18 04:57:13 RT @ada_rob: Very excited to have been a small part of this amazing project. More work to be done to make this optimizer the go-to for gi…

2022-11-18 04:50:10 And huge thank yous to collaborators @bucketofkets Amil Merchant @giffmana @jekbradbury @naman33k @poolio @IMordatch @ada_rob as well!!! https://t.co/rKHQ2dInXt

2022-11-18 04:50:09 And the resulting learned optimizer works really well! We reached out to other researchers inside Brain, and had them try it on their tasks, and subject to the scale constraints I mention above it did as well or better than what they were currently using, with no tuning.

2022-11-18 04:50:08 Meta-training learned optimizers is HARD. Each meta-training datapoint is an entire optimization task, so building a large meta-training dataset is HARD. Each of N meta-training steps can contain N training steps applying the learned optimizer -- so compute is also extreme (N^2). https://t.co/GwDaEZLWgS

2022-11-18 04:50:07 If you are training models with <

2022-11-19 19:40:49 @GMartius The optimizer wall time overhead is about 10x the overhead of Adam. For most problems though this is still small compared to the time to compute the gradients. See the left pane in this plot from the appendix: https://t.co/QSZ1qR7NsQ

2022-11-19 19:34:13 @w_t_payne Yes! Or -- we don't address it in this work, but that is another clear target for meta-learning.

2022-11-19 19:32:45 @mauricetpunkt We have, but it doesn't seem to be necessary. We've also tried initializing the learned optimizer by *distilling* another optimizer, like Adam. This works OK, but has never really been pushed. (@Luke_Metz of course may have more to say)

2022-11-18 22:00:11 @yablak Many ML are going to https://t.co/1AiLl2tfTk.

2022-11-18 21:47:49 RT @ada_rob: Here is a real-world example (not in the paper) for T5 Small (~60M params). The VeLO-trained model reaches the same loss as…

2022-11-18 19:37:42 @deliprao @Luke_Metz @jmes_harrison @bucketofkets Nope, JAX only for the moment -- unless you want to make the PyTorch port?

2022-11-18 19:36:41 @AIjedi @Luke_Metz Have we mentioned how great JAX is yet? JAX is pretty great. If some bold stranger wanted to port VeLO to PyTorch though, then that would be amazing.

2022-11-18 19:34:12 @fedetask @jmes_harrison We tested this in the paper for Ant ... it's not great. RL problems are very different than the meta-training distribution. Future work to finetune VeLO to do well on RL tasks, or to include RL problems in meta-training distribution for VeLO 2.

2022-11-18 19:32:21 @DrJimFan @jmes_harrison @Luke_Metz @bucketofkets @poolio @ada_rob @IMordatch @amilmerchant @jekbradbury @naman33k The overhead due to the optimizer is ~10x the overhead of Adam, which is usually small compared to the compute cost of computing the gradients for the model you are applying VeLO to train. See leftmost pane in this plot: https://t.co/359UyP9INv

2022-11-18 19:28:20 @short_spy Nope, not currently. Have I mentioned how great JAX is? Because JAX is really great.

2022-11-18 19:27:45 @TexasBigData https://t.co/C2sKUiOYgX

2022-11-18 14:29:34 @albertzeyer @bucketofkets @giffmana @jekbradbury @naman33k @poolio @IMordatch @ada_rob The model params are published. See https://t.co/G7bLCn2vkZ

2022-11-18 14:01:30 RT @giffmana: My colleagues managed to *actually* learn a generic optimizer. What was impressive to me is that with absolutely zero tuning,…

2022-11-18 04:59:22 RT @bucketofkets: This was a really fantastic project—the culmination of literal years of work led by .@Luke_Metz. We’re proud to release…

2022-11-18 04:57:18 RT @poolio: Learned optimizers finally work! Swap out Adam for VeLO: a learned optimizer that outperforms human-designed optimizers withn…

2022-11-18 04:57:13 RT @ada_rob: Very excited to have been a small part of this amazing project. More work to be done to make this optimizer the go-to for gi…

2022-11-18 04:50:10 And huge thank yous to collaborators @bucketofkets Amil Merchant @giffmana @jekbradbury @naman33k @poolio @IMordatch @ada_rob as well!!! https://t.co/rKHQ2dInXt

2022-11-18 04:50:09 And the resulting learned optimizer works really well! We reached out to other researchers inside Brain, and had them try it on their tasks, and subject to the scale constraints I mention above it did as well or better than what they were currently using, with no tuning.

2022-11-18 04:50:08 Meta-training learned optimizers is HARD. Each meta-training datapoint is an entire optimization task, so building a large meta-training dataset is HARD. Each of N meta-training steps can contain N training steps applying the learned optimizer -- so compute is also extreme (N^2). https://t.co/GwDaEZLWgS

2022-11-18 04:50:07 If you are training models with <

2022-11-21 18:38:37 RT @wtgowers: Note that if X is a finite set and we take all its subsets, then every element of X belongs to exactly half the subsets. Yes…

2022-11-19 19:40:49 @GMartius The optimizer wall time overhead is about 10x the overhead of Adam. For most problems though this is still small compared to the time to compute the gradients. See the left pane in this plot from the appendix: https://t.co/QSZ1qR7NsQ

2022-11-19 19:34:13 @w_t_payne Yes! Or -- we don't address it in this work, but that is another clear target for meta-learning.

2022-11-19 19:32:45 @mauricetpunkt We have, but it doesn't seem to be necessary. We've also tried initializing the learned optimizer by *distilling* another optimizer, like Adam. This works OK, but has never really been pushed. (@Luke_Metz of course may have more to say)

2022-11-18 22:00:11 @yablak Many ML are going to https://t.co/1AiLl2tfTk.

2022-11-18 21:47:49 RT @ada_rob: Here is a real-world example (not in the paper) for T5 Small (~60M params). The VeLO-trained model reaches the same loss as…

2022-11-18 19:37:42 @deliprao @Luke_Metz @jmes_harrison @bucketofkets Nope, JAX only for the moment -- unless you want to make the PyTorch port?

2022-11-18 19:36:41 @AIjedi @Luke_Metz Have we mentioned how great JAX is yet? JAX is pretty great. If some bold stranger wanted to port VeLO to PyTorch though, then that would be amazing.

2022-11-18 19:34:12 @fedetask @jmes_harrison We tested this in the paper for Ant ... it's not great. RL problems are very different than the meta-training distribution. Future work to finetune VeLO to do well on RL tasks, or to include RL problems in meta-training distribution for VeLO 2.

2022-11-18 19:32:21 @DrJimFan @jmes_harrison @Luke_Metz @bucketofkets @poolio @ada_rob @IMordatch @amilmerchant @jekbradbury @naman33k The overhead due to the optimizer is ~10x the overhead of Adam, which is usually small compared to the compute cost of computing the gradients for the model you are applying VeLO to train. See leftmost pane in this plot: https://t.co/359UyP9INv

2022-11-18 19:28:20 @short_spy Nope, not currently. Have I mentioned how great JAX is? Because JAX is really great.

2022-11-18 19:27:45 @TexasBigData https://t.co/C2sKUiOYgX

2022-11-18 14:29:34 @albertzeyer @bucketofkets @giffmana @jekbradbury @naman33k @poolio @IMordatch @ada_rob The model params are published. See https://t.co/G7bLCn2vkZ

2022-11-18 14:01:30 RT @giffmana: My colleagues managed to *actually* learn a generic optimizer. What was impressive to me is that with absolutely zero tuning,…

2022-11-18 04:59:22 RT @bucketofkets: This was a really fantastic project—the culmination of literal years of work led by .@Luke_Metz. We’re proud to release…

2022-11-18 04:57:18 RT @poolio: Learned optimizers finally work! Swap out Adam for VeLO: a learned optimizer that outperforms human-designed optimizers withn…

2022-11-18 04:57:13 RT @ada_rob: Very excited to have been a small part of this amazing project. More work to be done to make this optimizer the go-to for gi…

2022-11-18 04:50:10 And huge thank yous to collaborators @bucketofkets Amil Merchant @giffmana @jekbradbury @naman33k @poolio @IMordatch @ada_rob as well!!! https://t.co/rKHQ2dInXt

2022-11-18 04:50:09 And the resulting learned optimizer works really well! We reached out to other researchers inside Brain, and had them try it on their tasks, and subject to the scale constraints I mention above it did as well or better than what they were currently using, with no tuning.

2022-11-18 04:50:08 Meta-training learned optimizers is HARD. Each meta-training datapoint is an entire optimization task, so building a large meta-training dataset is HARD. Each of N meta-training steps can contain N training steps applying the learned optimizer -- so compute is also extreme (N^2). https://t.co/GwDaEZLWgS

2022-11-18 04:50:07 If you are training models with <

2022-11-21 18:38:37 RT @wtgowers: Note that if X is a finite set and we take all its subsets, then every element of X belongs to exactly half the subsets. Yes…

2022-11-19 19:40:49 @GMartius The optimizer wall time overhead is about 10x the overhead of Adam. For most problems though this is still small compared to the time to compute the gradients. See the left pane in this plot from the appendix: https://t.co/QSZ1qR7NsQ

2022-11-19 19:34:13 @w_t_payne Yes! Or -- we don't address it in this work, but that is another clear target for meta-learning.

2022-11-19 19:32:45 @mauricetpunkt We have, but it doesn't seem to be necessary. We've also tried initializing the learned optimizer by *distilling* another optimizer, like Adam. This works OK, but has never really been pushed. (@Luke_Metz of course may have more to say)

2022-11-18 22:00:11 @yablak Many ML are going to https://t.co/1AiLl2tfTk.

2022-11-18 21:47:49 RT @ada_rob: Here is a real-world example (not in the paper) for T5 Small (~60M params). The VeLO-trained model reaches the same loss as…

2022-11-18 19:37:42 @deliprao @Luke_Metz @jmes_harrison @bucketofkets Nope, JAX only for the moment -- unless you want to make the PyTorch port?

2022-11-18 19:36:41 @AIjedi @Luke_Metz Have we mentioned how great JAX is yet? JAX is pretty great. If some bold stranger wanted to port VeLO to PyTorch though, then that would be amazing.

2022-11-18 19:34:12 @fedetask @jmes_harrison We tested this in the paper for Ant ... it's not great. RL problems are very different than the meta-training distribution. Future work to finetune VeLO to do well on RL tasks, or to include RL problems in meta-training distribution for VeLO 2.

2022-11-18 19:32:21 @DrJimFan @jmes_harrison @Luke_Metz @bucketofkets @poolio @ada_rob @IMordatch @amilmerchant @jekbradbury @naman33k The overhead due to the optimizer is ~10x the overhead of Adam, which is usually small compared to the compute cost of computing the gradients for the model you are applying VeLO to train. See leftmost pane in this plot: https://t.co/359UyP9INv

2022-11-18 19:28:20 @short_spy Nope, not currently. Have I mentioned how great JAX is? Because JAX is really great.

2022-11-18 19:27:45 @TexasBigData https://t.co/C2sKUiOYgX

2022-11-18 14:29:34 @albertzeyer @bucketofkets @giffmana @jekbradbury @naman33k @poolio @IMordatch @ada_rob The model params are published. See https://t.co/G7bLCn2vkZ

2022-11-18 14:01:30 RT @giffmana: My colleagues managed to *actually* learn a generic optimizer. What was impressive to me is that with absolutely zero tuning,…

2022-11-18 04:59:22 RT @bucketofkets: This was a really fantastic project—the culmination of literal years of work led by .@Luke_Metz. We’re proud to release…

2022-11-18 04:57:18 RT @poolio: Learned optimizers finally work! Swap out Adam for VeLO: a learned optimizer that outperforms human-designed optimizers withn…

2022-11-18 04:57:13 RT @ada_rob: Very excited to have been a small part of this amazing project. More work to be done to make this optimizer the go-to for gi…

2022-11-18 04:50:10 And huge thank yous to collaborators @bucketofkets Amil Merchant @giffmana @jekbradbury @naman33k @poolio @IMordatch @ada_rob as well!!! https://t.co/rKHQ2dInXt

2022-11-18 04:50:09 And the resulting learned optimizer works really well! We reached out to other researchers inside Brain, and had them try it on their tasks, and subject to the scale constraints I mention above it did as well or better than what they were currently using, with no tuning.

2022-11-18 04:50:08 Meta-training learned optimizers is HARD. Each meta-training datapoint is an entire optimization task, so building a large meta-training dataset is HARD. Each of N meta-training steps can contain N training steps applying the learned optimizer -- so compute is also extreme (N^2). https://t.co/GwDaEZLWgS

2022-11-18 04:50:07 If you are training models with <

2022-11-25 01:59:21 @geoffreyirving @sbeckerkahn This just exceeded my mathematical depth. I don't disbelieve you though!

2022-11-24 21:09:10 @geoffreyirving @sbeckerkahn OTOH, rationals are dense in the 2d plane, and Brownian motion has a fractal dimension of 2, so probably a Brownian SDE would hit rational points?

2022-11-21 18:38:37 RT @wtgowers: Note that if X is a finite set and we take all its subsets, then every element of X belongs to exactly half the subsets. Yes…

2022-11-19 19:40:49 @GMartius The optimizer wall time overhead is about 10x the overhead of Adam. For most problems though this is still small compared to the time to compute the gradients. See the left pane in this plot from the appendix: https://t.co/QSZ1qR7NsQ

2022-11-19 19:34:13 @w_t_payne Yes! Or -- we don't address it in this work, but that is another clear target for meta-learning.

2022-11-19 19:32:45 @mauricetpunkt We have, but it doesn't seem to be necessary. We've also tried initializing the learned optimizer by *distilling* another optimizer, like Adam. This works OK, but has never really been pushed. (@Luke_Metz of course may have more to say)

2022-11-18 22:00:11 @yablak Many ML are going to https://t.co/1AiLl2tfTk.

2022-11-18 21:47:49 RT @ada_rob: Here is a real-world example (not in the paper) for T5 Small (~60M params). The VeLO-trained model reaches the same loss as…

2022-11-18 19:37:42 @deliprao @Luke_Metz @jmes_harrison @bucketofkets Nope, JAX only for the moment -- unless you want to make the PyTorch port?

2022-11-18 19:36:41 @AIjedi @Luke_Metz Have we mentioned how great JAX is yet? JAX is pretty great. If some bold stranger wanted to port VeLO to PyTorch though, then that would be amazing.

2022-11-18 19:34:12 @fedetask @jmes_harrison We tested this in the paper for Ant ... it's not great. RL problems are very different than the meta-training distribution. Future work to finetune VeLO to do well on RL tasks, or to include RL problems in meta-training distribution for VeLO 2.

2022-11-18 19:32:21 @DrJimFan @jmes_harrison @Luke_Metz @bucketofkets @poolio @ada_rob @IMordatch @amilmerchant @jekbradbury @naman33k The overhead due to the optimizer is ~10x the overhead of Adam, which is usually small compared to the compute cost of computing the gradients for the model you are applying VeLO to train. See leftmost pane in this plot: https://t.co/359UyP9INv

2022-11-18 19:28:20 @short_spy Nope, not currently. Have I mentioned how great JAX is? Because JAX is really great.

2022-11-18 19:27:45 @TexasBigData https://t.co/C2sKUiOYgX

2022-11-18 14:29:34 @albertzeyer @bucketofkets @giffmana @jekbradbury @naman33k @poolio @IMordatch @ada_rob The model params are published. See https://t.co/G7bLCn2vkZ

2022-11-18 14:01:30 RT @giffmana: My colleagues managed to *actually* learn a generic optimizer. What was impressive to me is that with absolutely zero tuning,…

2022-11-18 04:59:22 RT @bucketofkets: This was a really fantastic project—the culmination of literal years of work led by .@Luke_Metz. We’re proud to release…

2022-11-18 04:57:18 RT @poolio: Learned optimizers finally work! Swap out Adam for VeLO: a learned optimizer that outperforms human-designed optimizers withn…

2022-11-18 04:57:13 RT @ada_rob: Very excited to have been a small part of this amazing project. More work to be done to make this optimizer the go-to for gi…

2022-11-18 04:50:10 And huge thank yous to collaborators @bucketofkets Amil Merchant @giffmana @jekbradbury @naman33k @poolio @IMordatch @ada_rob as well!!! https://t.co/rKHQ2dInXt

2022-11-18 04:50:09 And the resulting learned optimizer works really well! We reached out to other researchers inside Brain, and had them try it on their tasks, and subject to the scale constraints I mention above it did as well or better than what they were currently using, with no tuning.

2022-11-18 04:50:08 Meta-training learned optimizers is HARD. Each meta-training datapoint is an entire optimization task, so building a large meta-training dataset is HARD. Each of N meta-training steps can contain N training steps applying the learned optimizer -- so compute is also extreme (N^2). https://t.co/GwDaEZLWgS

2022-11-18 04:50:07 If you are training models with <

2022-11-25 01:59:21 @geoffreyirving @sbeckerkahn This just exceeded my mathematical depth. I don't disbelieve you though!

2022-11-24 21:09:10 @geoffreyirving @sbeckerkahn OTOH, rationals are dense in the 2d plane, and Brownian motion has a fractal dimension of 2, so probably a Brownian SDE would hit rational points?

2022-11-21 18:38:37 RT @wtgowers: Note that if X is a finite set and we take all its subsets, then every element of X belongs to exactly half the subsets. Yes…

2022-11-19 19:40:49 @GMartius The optimizer wall time overhead is about 10x the overhead of Adam. For most problems though this is still small compared to the time to compute the gradients. See the left pane in this plot from the appendix: https://t.co/QSZ1qR7NsQ

2022-11-19 19:34:13 @w_t_payne Yes! Or -- we don't address it in this work, but that is another clear target for meta-learning.

2022-11-19 19:32:45 @mauricetpunkt We have, but it doesn't seem to be necessary. We've also tried initializing the learned optimizer by *distilling* another optimizer, like Adam. This works OK, but has never really been pushed. (@Luke_Metz of course may have more to say)

2022-11-18 22:00:11 @yablak Many ML are going to https://t.co/1AiLl2tfTk.

2022-11-18 21:47:49 RT @ada_rob: Here is a real-world example (not in the paper) for T5 Small (~60M params). The VeLO-trained model reaches the same loss as…

2022-11-18 19:37:42 @deliprao @Luke_Metz @jmes_harrison @bucketofkets Nope, JAX only for the moment -- unless you want to make the PyTorch port?

2022-11-18 19:36:41 @AIjedi @Luke_Metz Have we mentioned how great JAX is yet? JAX is pretty great. If some bold stranger wanted to port VeLO to PyTorch though, then that would be amazing.

2022-11-18 19:34:12 @fedetask @jmes_harrison We tested this in the paper for Ant ... it's not great. RL problems are very different than the meta-training distribution. Future work to finetune VeLO to do well on RL tasks, or to include RL problems in meta-training distribution for VeLO 2.

2022-11-18 19:32:21 @DrJimFan @jmes_harrison @Luke_Metz @bucketofkets @poolio @ada_rob @IMordatch @amilmerchant @jekbradbury @naman33k The overhead due to the optimizer is ~10x the overhead of Adam, which is usually small compared to the compute cost of computing the gradients for the model you are applying VeLO to train. See leftmost pane in this plot: https://t.co/359UyP9INv

2022-11-18 19:28:20 @short_spy Nope, not currently. Have I mentioned how great JAX is? Because JAX is really great.

2022-11-18 19:27:45 @TexasBigData https://t.co/C2sKUiOYgX

2022-11-18 14:29:34 @albertzeyer @bucketofkets @giffmana @jekbradbury @naman33k @poolio @IMordatch @ada_rob The model params are published. See https://t.co/G7bLCn2vkZ

2022-11-18 14:01:30 RT @giffmana: My colleagues managed to *actually* learn a generic optimizer. What was impressive to me is that with absolutely zero tuning,…

2022-11-18 04:59:22 RT @bucketofkets: This was a really fantastic project—the culmination of literal years of work led by .@Luke_Metz. We’re proud to release…

2022-11-18 04:57:18 RT @poolio: Learned optimizers finally work! Swap out Adam for VeLO: a learned optimizer that outperforms human-designed optimizers withn…

2022-11-18 04:57:13 RT @ada_rob: Very excited to have been a small part of this amazing project. More work to be done to make this optimizer the go-to for gi…

2022-11-18 04:50:10 And huge thank yous to collaborators @bucketofkets Amil Merchant @giffmana @jekbradbury @naman33k @poolio @IMordatch @ada_rob as well!!! https://t.co/rKHQ2dInXt

2022-11-18 04:50:09 And the resulting learned optimizer works really well! We reached out to other researchers inside Brain, and had them try it on their tasks, and subject to the scale constraints I mention above it did as well or better than what they were currently using, with no tuning.

2022-11-18 04:50:08 Meta-training learned optimizers is HARD. Each meta-training datapoint is an entire optimization task, so building a large meta-training dataset is HARD. Each of N meta-training steps can contain N training steps applying the learned optimizer -- so compute is also extreme (N^2). https://t.co/GwDaEZLWgS

2022-11-18 04:50:07 If you are training models with <

2022-11-25 01:59:21 @geoffreyirving @sbeckerkahn This just exceeded my mathematical depth. I don't disbelieve you though!

2022-11-24 21:09:10 @geoffreyirving @sbeckerkahn OTOH, rationals are dense in the 2d plane, and Brownian motion has a fractal dimension of 2, so probably a Brownian SDE would hit rational points?

2022-11-21 18:38:37 RT @wtgowers: Note that if X is a finite set and we take all its subsets, then every element of X belongs to exactly half the subsets. Yes…

2022-11-19 19:40:49 @GMartius The optimizer wall time overhead is about 10x the overhead of Adam. For most problems though this is still small compared to the time to compute the gradients. See the left pane in this plot from the appendix: https://t.co/QSZ1qR7NsQ

2022-11-19 19:34:13 @w_t_payne Yes! Or -- we don't address it in this work, but that is another clear target for meta-learning.

2022-11-19 19:32:45 @mauricetpunkt We have, but it doesn't seem to be necessary. We've also tried initializing the learned optimizer by *distilling* another optimizer, like Adam. This works OK, but has never really been pushed. (@Luke_Metz of course may have more to say)

2022-11-18 22:00:11 @yablak Many ML are going to https://t.co/1AiLl2tfTk.

2022-11-18 21:47:49 RT @ada_rob: Here is a real-world example (not in the paper) for T5 Small (~60M params). The VeLO-trained model reaches the same loss as…

2022-11-18 19:37:42 @deliprao @Luke_Metz @jmes_harrison @bucketofkets Nope, JAX only for the moment -- unless you want to make the PyTorch port?

2022-11-18 19:36:41 @AIjedi @Luke_Metz Have we mentioned how great JAX is yet? JAX is pretty great. If some bold stranger wanted to port VeLO to PyTorch though, then that would be amazing.

2022-11-18 19:34:12 @fedetask @jmes_harrison We tested this in the paper for Ant ... it's not great. RL problems are very different than the meta-training distribution. Future work to finetune VeLO to do well on RL tasks, or to include RL problems in meta-training distribution for VeLO 2.

2022-11-18 19:32:21 @DrJimFan @jmes_harrison @Luke_Metz @bucketofkets @poolio @ada_rob @IMordatch @amilmerchant @jekbradbury @naman33k The overhead due to the optimizer is ~10x the overhead of Adam, which is usually small compared to the compute cost of computing the gradients for the model you are applying VeLO to train. See leftmost pane in this plot: https://t.co/359UyP9INv

2022-11-18 19:28:20 @short_spy Nope, not currently. Have I mentioned how great JAX is? Because JAX is really great.

2022-11-18 19:27:45 @TexasBigData https://t.co/C2sKUiOYgX

2022-11-18 14:29:34 @albertzeyer @bucketofkets @giffmana @jekbradbury @naman33k @poolio @IMordatch @ada_rob The model params are published. See https://t.co/G7bLCn2vkZ

2022-11-18 14:01:30 RT @giffmana: My colleagues managed to *actually* learn a generic optimizer. What was impressive to me is that with absolutely zero tuning,…

2022-11-18 04:59:22 RT @bucketofkets: This was a really fantastic project—the culmination of literal years of work led by .@Luke_Metz. We’re proud to release…

2022-11-18 04:57:18 RT @poolio: Learned optimizers finally work! Swap out Adam for VeLO: a learned optimizer that outperforms human-designed optimizers withn…

2022-11-18 04:57:13 RT @ada_rob: Very excited to have been a small part of this amazing project. More work to be done to make this optimizer the go-to for gi…

2022-11-18 04:50:10 And huge thank yous to collaborators @bucketofkets Amil Merchant @giffmana @jekbradbury @naman33k @poolio @IMordatch @ada_rob as well!!! https://t.co/rKHQ2dInXt

2022-11-18 04:50:09 And the resulting learned optimizer works really well! We reached out to other researchers inside Brain, and had them try it on their tasks, and subject to the scale constraints I mention above it did as well or better than what they were currently using, with no tuning.

2022-11-18 04:50:08 Meta-training learned optimizers is HARD. Each meta-training datapoint is an entire optimization task, so building a large meta-training dataset is HARD. Each of N meta-training steps can contain N training steps applying the learned optimizer -- so compute is also extreme (N^2). https://t.co/GwDaEZLWgS
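
A minimal sketch of the quadratic-compute point above (a hypothetical toy loop, not the paper's meta-training code): each of the N meta-training steps unrolls up to N inner training steps of the learned optimizer, so the total number of inner steps grows as N^2.

```python
# Toy illustration of O(N^2) inner steps during meta-training.
inner_steps_taken = 0

def inner_train(theta, num_inner_steps):
    """Stand-in for training one task with the learned optimizer."""
    global inner_steps_taken
    for _ in range(num_inner_steps):
        inner_steps_taken += 1   # one application of the learned optimizer
        theta = 0.9 * theta      # placeholder update; a real run would call the optimizer
    return theta                 # the meta-loss would be computed from the trained theta

N = 100                          # meta-steps == inner unroll length, for illustration
for _ in range(N):
    inner_train(theta=1.0, num_inner_steps=N)

print(inner_steps_taken)         # 10000 == N * N
```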

2022-11-18 04:50:07 If you are training models with <

2022-11-25 01:59:21 @geoffreyirving @sbeckerkahn This just exceeded my mathematical depth. I don't disbelieve you though!

2022-03-12 08:11:00 CAFIAC FIX

2022-01-17 08:11:00 CAFIAC FIX

2022-01-11 08:11:00 CAFIAC FIX

2021-12-27 08:20:00 CAFIAC FIX

2021-11-06 23:20:00 CAFIAC FIX

2021-12-14 15:38:34 Want to compute the empirical NTK orders of magnitude faster? Come to our poster presentation to find out how, and let rays of light come from your brain . https://t.co/b2Ja6VIr9g
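
For reference, here is a minimal plain-JAX sketch (my own toy model and function names, not the method from the linked poster, which is about avoiding exactly this naive computation) of the empirical NTK being sped up: the kernel Theta(x1, x2) = J(x1) J(x2)^T, where J is the Jacobian of the network outputs with respect to the parameters.

```python
import jax
import jax.numpy as jnp

def init_params(key, d_in=8, d_hidden=16, d_out=1):
    k1, k2 = jax.random.split(key)
    return {
        "w1": jax.random.normal(k1, (d_in, d_hidden)) / jnp.sqrt(d_in),
        "w2": jax.random.normal(k2, (d_hidden, d_out)) / jnp.sqrt(d_hidden),
    }

def apply_fn(params, x):
    # Tiny MLP with scalar output, so the kernel is a plain [n1, n2] matrix.
    return jnp.tanh(x @ params["w1"]) @ params["w2"]

def empirical_ntk(params, x1, x2):
    # Naive approach: materialize full Jacobians and contract over all parameters.
    j1 = jax.jacobian(lambda p: apply_fn(p, x1))(params)
    j2 = jax.jacobian(lambda p: apply_fn(p, x2))(params)
    f1 = jnp.concatenate([l.reshape(x1.shape[0], -1) for l in jax.tree_util.tree_leaves(j1)], axis=-1)
    f2 = jnp.concatenate([l.reshape(x2.shape[0], -1) for l in jax.tree_util.tree_leaves(j2)], axis=-1)
    return f1 @ f2.T  # [n1, n2] empirical NTK

key = jax.random.PRNGKey(0)
params = init_params(key)
x = jax.random.normal(key, (4, 8))
print(empirical_ntk(params, x, x).shape)  # (4, 4)
```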

2021-12-14 01:58:55 @pfau @RobertRosenba14 The abstract was indeed diplomatically phrased. You should read this particular paper in either more or less depth (title: Adversarial Examples Are a Natural Consequence of Test Error in Noise). (also relevant, the "adversarial spheres" paper https://t.co/hdDwBhqhuL)

2021-12-14 01:47:26 @pfau @RobertRosenba14 cope?

2021-12-14 01:46:27 @pfau @achristensen56 @neurograce @josueortc @RobertRosenba14 The internet seems to have a surprising number of images of watermelons that look like pandas. Would that be OK? https://t.co/oIEUX6eA3X

2021-12-14 01:42:01 @pfau @RobertRosenba14 Under some relatively reasonable assumptions, *any* classifier that makes test errors on high dimensional inputs is susceptible to adversarial examples. https://t.co/7Sj8q50KCq. I think humans are likely susceptible, though finding the right perturbation for a person is hard.

2021-12-14 01:37:07 @achristensen56 @neurograce @josueortc @pfau @RobertRosenba14 (ie, human decisions are biased in the direction of the adversarial perturbation, but they don't experience a high confidence belief that the stimulus is actually the class it is adversarially perturbed towards)

2021-12-14 01:35:25 @achristensen56 @neurograce @josueortc @pfau @RobertRosenba14 Thanks! We have upcoming work (DM if you'd like to provide feedback on a draft :P) showing a perceptual effect even for unlimited exposure time, no backward masking, and eps+-2 perturbations. The effect is much weaker in humans than machines though, and relies on a 2afc paradigm.

2021-12-14 01:24:12 RT @IARAInews: Join our next IARAI seminar about ‘Learned optimizers: why they're the future, why they’re hard, and what they can do now’ o…

2021-11-06 23:20:00 CAFIAC FIX

2021-11-06 19:50:00 CAFIAC FIX

2021-11-06 18:59:00 CAFIAC FIX

2021-11-01 19:20:00 CAFIAC FIX

2021-11-01 17:30:00 CAFIAC FIX

2021-09-01 17:34:30 @PBFcomics As another layer: rats can also (probably) echolocate. So they're both cheating, the rat is just worse at it. https://t.co/TvTUg4muJ1

2021-08-12 19:57:45 This is an excellent initiative! A great opportunity for both potential mentors and mentees. https://t.co/5DPJlSNZgn

2021-08-11 21:06:27 RT @savvyRL: Earlier this summer Krystal and I are appointed as ICLR 2022 DEI chairs. Right away we started planning something tangible and…

2021-07-24 01:08:37 If you like BIG-bench, you will also like NL-Augmenter! https://t.co/5spBy3d4PA

2021-07-21 03:28:05 RT @Luke_Metz: Interested in computing gradients through unrolled computation graphs? Come see our paper at ICML! We construct an ES like…

2021-07-19 16:29:50 I am *extremely* proud to share that we were awarded the ICML outstanding paper award! Major credit and thanks to my collaborators @PaulVicol and @Luke_Metz ! Paul especially owned every part of this project, and I think his care and extreme thoroughness are the reasons we won. https://t.co/lOuPVkXTXW

2021-07-08 04:00:20 RT @venkvis: Excited to kick-start focus #SciML series on #ML meets Info theory and statistical mechanics! Amazing speaker/session chair li…

2021-06-22 16:26:00 If you have ideas for data augmentation in NLP, contributing to this is a great way to push the field forward, and *increase your h-index by 1* at the same time. Lightweight barrier to entry -- check out the nice colab: https://t.co/HBHhiQDLtC https://t.co/5spBy3d4PA

2021-06-10 17:50:00 @TacoCohen @geoffreyhinton @erikverlinde @mmbronstein @risi_kondor @erikjbekkers @wellingmax Wow — congratulations!!

2021-06-09 20:51:40 RT @hojonathanho: New paper on cascaded diffusion models for ImageNet generation! We outperform BigGAN-deep and VQ-VAE-2 on FID score and c…

2021-06-04 18:18:23 @pfau Surprise twist : you're theory is completely right in every respect ... but those unusual plasma configurations *are* alien life, that lives on the and is getting stranded on earth

2021-05-25 16:04:30 My group in Google Brain is hiring a full time researcher, for a research team focused on learned optimizers. Are you interested in meta-learning, bilevel optimization, dynamical systems? Apply here: https://t.co/gL8OSotdCo Please reach out with any questions! https://t.co/PAUcnLJ19B

2021-05-11 23:39:11 @jackclarkSF Can I interest you in the BIG-bench large scale collaborative benchmark for text-based AI? Task submissions open until June 1, over 100 diverse tasks submitted so far. Authors of accepted tasks included as co-authors on BIG-bench paper. https://t.co/dXnR9EbDuQ

2021-05-06 20:12:29 I am very, very excited about this workshop on enormous language models, and hope to see you there!!! Also participate (and increase your h-index ) by contributing a task to BIG-bench! https://t.co/dXnR9EbDuQ https://t.co/Igvo9L8ld4

2021-05-06 14:41:59 RT @YSongStanford: Checkout my new blog post on generative modeling by score matching and score-based models. I introduce the intuition beh…

2021-05-03 23:26:50 Come learn about our (outstanding paper award ) work building generative models by running SDEs backwards in time -- ICLR poster session in 30 minutes! https://t.co/4OhydzEQxE https://t.co/Ii6rPjydgZ

2021-05-03 23:17:50 @savvyRL @ml_collective Extremely happy to have you here, and excited to work with you! Welcome to Google!

2021-04-01 19:59:10 @savvyRL @PMinervini @YSongStanford @poolio Thanks!! (As @poolio observed, excellent April fool's day news )

2021-03-27 16:57:17 @colinraffel @verena_rieser Yes @verena_rieser! Under review elsewhere is fine (though you should consider potential leakage of the dataset into model training data). Check out the review criteria: https://t.co/HVgKRybWh3

2021-03-25 23:27:35 Come to our workshop on Enormous Language Models! Also, submit a task to the associated benchmark https://t.co/dXnR9EbDuQ, and be a co-author on the corresponding paper! https://t.co/YRzCEstjlN

2021-02-24 05:54:15 Hint: the answer is yes. :) https://t.co/p8fqEjgr7R

2021-02-04 00:44:29 @nabla_theta Perhaps there's a small subset of Pile that might make sense? PS -- Sorry for the slow response. I didn't see your question until today. (2/2)

2021-02-04 00:42:39 @nabla_theta We are definitely interested in bits-per-byte style measures of performance (* especially on holdout sets unlikely to be seen during training). Requiring models to evaluate an 825 GiB dataset as part of a standard benchmark might be technically challenging. (1/2)

2021-01-30 00:40:43 @BlancheMinerva Tasks that explore models' abilities to perform mathematical proofs would be great! (keep in mind that, as you say, the generated output does need to be automatically scored)

2021-01-30 00:36:12 @onurgu_ml You should feel free to submit tasks that interact with the model in any language! (so long as reviewers are able to understand what the task is doing) Current models are better at English, but already do surprisingly well at translation.

2021-01-30 00:33:54 @Seb_Bratieres My primary personal motivation for wanting to build a benchmark like this is both to understand the current shortcoming of large scale AI models, and especially to extrapolate what their future capabilities and impact will be.

2021-01-30 00:26:35 @ketran The majority of the training data is English, but the models already do a surprisingly good job at translation. Benchmark tasks can be in any language (so long as there is enough information for reviewers to understand + assess what the task is doing).

2021-01-27 05:14:07 @timnitGebru @JeffDean I + co-organizers would love contributions from researchers working on ethical AI, very much including you and the ethical AI team. Measuring a thing is a first step towards improving it, and we want to make measurement of social biases a core part of the benchmark.

2021-01-27 01:33:09 @nabla_theta and very much welcome any contributions of novel tasks though! You almost certainly have unique insights into the failings of current language models. (2/2)

2021-01-27 01:32:51 @nabla_theta Thanks! I wasn't aware of your evaluation project, which looks neat. We're pretty committed to our infrastructure and to our call-for-tasks-with-paper-authorship-for-submitters framework, so I don't know that it would make sense to combine the projects. We would love (1/2)

2021-01-26 22:58:26 We also encourage submission of tasks which quantify social bias in language models. Including measures of bias in a standard language model benchmark will motivate future research countering it.

2021-01-26 22:58:25 CALL FOR TASKS CAPTURING LIMITATIONS OF LARGE LANGUAGE MODELS We are soliciting contributions of tasks to a *collaborative* benchmark designed to measure and extrapolate the capabilities and limitations of large language models. Submit tasks at https://t.co/eJJXFtqPpi #BIGbench https://t.co/D1PoQHUQPr

2021-01-22 01:01:34 Bootstrapping the training of learned optimizers using randomly initialized learned optimizers. No hand designed optimizer involved (* unless you count population based training). A demonstration of the potential power of positive feedback loops in meta-learning. https://t.co/FPuqf9MX4w

2020-12-01 21:18:20 @SussilloDavid Congratulations David!! It was an honor to work with you for five of those six years.

2020-12-01 17:37:02 I think this will be a very important paper. My take: by unrolling SGD training steps and treating them as part of the NN architecture, computing the kernel after training (w/ feature learning) becomes equivalent to computing the NNGP kernel of the extended architecture. https://t.co/M0VFx2JAGD

2020-12-01 17:23:02 "Creating noise from data is easy

2020-11-13 03:48:12 Infinite width neural networks enable more compute-efficient Neural Architecture Search! https://t.co/TbgEFBI80E

2020-11-11 22:31:42 More observations: - Statistical properties of normalizers change dramatically over the first 50 training steps - Differences at train/test time are crucial to batch norm's success - Mean accumulation across the batch is more important than variance accumulation across the batch https://t.co/tVjnDRGCKn

2020-11-11 22:31:41 A simple prescription that will improve your models: When using LayerNorm, do mean subtraction *before* rather than after the affine transformation. This, and an in-depth empirical investigation of statistical properties of common normalizers in https://t.co/poAWNmpSah https://t.co/Ojcj0hsnTM

2020-11-05 20:16:54 Come watch Niru present some gorgeous analysis of how learned optimizers behave both like and unlike hand designed optimizers (and outperform hand designed optimizers). https://t.co/hCIYi2FiBE

2020-09-25 00:13:23 @LouisKirschAI @Luke_Metz Thank you for the MeaGenRL reference! In terms of dataset generalization, we meta- train and test using 15 datasets (app. G.1.2 in https://t.co/Y4NEodw7u4). Have not looked at generalization beyond those datasets, but I expect it would good for similar model architectures.

2020-09-24 22:08:16 @JFPuget Thank you! We will correct the reference.

2020-09-24 03:56:21 RT @Luke_Metz: We have a new paper on learned optimizers! We used thousands of tasks (and a lot of compute ) to train general purpose lear…

2020-09-24 03:07:42 Analogously to the first time a compiler can compile itself, it is even capable of training itself from scratch!!! I think we are now only a short distance away from learned optimizers being the best choice for most optimization tasks (though, we're not *quite* there yet). https://t.co/GyXhp2k4TT

2020-09-24 03:07:41 Modern deep learning is a story of learned features outperforming (then replacing!) hand-designed algorithms. But we still use hand designed loss functions and optimizers. Here is a big step towards learned optimizers outperforming existing optimizers: https://t.co/lA8R0BkpdX https://t.co/Pg4ehwoEEg

2020-08-30 00:25:17 RT @microcovid: We are delighted to introduce https://t.co/hBxswKbuxi, a tool to numerically estimate the COVID risk of specific ordinary a…

2020-08-25 04:31:53 @goodfellow_ian @pfau @negative_result @duck @sschoenholz @ethansdyer @dwf I have not yet read this paper -- but I've heard that https://t.co/8UgrPPHZXg does what you propose (ht @gamaleldinfe), and finds that relaxing the convolutional constraints at the end of training leads to a slight increase in performance.

2020-08-25 04:26:52 @goodfellow_ian @pfau @negative_result @duck @sschoenholz @ethansdyer @dwf For a CNN, the sparsity structure in the weight matrix means that even after random initialization, the directions corresponding to pixels are still special directions related to each other in a structured way after the randomly initialized conv kernel is applied.

2020-08-25 04:24:55 @goodfellow_ian @pfau @negative_result @duck @sschoenholz @ethansdyer @dwf Yeah, it's counterintuitive! It's at least as much about initialization as training. Information about the data essentially gets erased by randomness in the weight initialization. For a FCN with isotropic weights a random unitary transformation is effectively applied to the data.

2020-08-24 05:26:28 @goodfellow_ian @pfau @negative_result @duck @sschoenholz @ethansdyer @dwf I even believe that regularized ZCA was actively helpful for you reaching SOTA. We recently saw a similar effect where regularized ZCA is beneficial (but unregularized ZCA is worse than unwhitened data) in a different project -- Sec 3.10 and Fig 9 of https://t.co/dlugEMT5fW .
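
For readers who have not seen it, here is a generic regularized ZCA whitening routine (my own implementation and regularization convention, not the one from the linked paper), since the distinction between regularized and unregularized ZCA is what this exchange turns on.

```python
import numpy as np

def zca_whiten(X, eps=1e-2):
    """X: [num_examples, num_features]. Returns ZCA-whitened features."""
    X = X - X.mean(axis=0, keepdims=True)
    cov = X.T @ X / X.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Regularize by adding a small multiple of the mean eigenvalue before
    # inverting; eps=0 recovers plain (unregularized) ZCA.
    scale = 1.0 / np.sqrt(eigvals + eps * eigvals.mean())
    W = eigvecs @ np.diag(scale) @ eigvecs.T   # symmetric ZCA transform
    return X @ W

X = np.random.default_rng(0).normal(size=(1000, 32))
# With eps=0 the whitened covariance is (approximately) the identity.
print(np.allclose(np.cov(zca_whiten(X, eps=0.0), rowvar=False), np.eye(32), atol=1e-2))
```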

2020-08-24 05:16:29 @goodfellow_ian @pfau @negative_result @duck @sschoenholz @ethansdyer @dwf The strong (theoretical guarantee of chance performance) version of the claim is for a fully connected first layer, an input dimensionality that is at least as large as the dataset size, and no regularization. So the theory survives your example!

2020-08-23 20:13:41 RT @ASmallFiction: "How can I feel strong?" she said. "This hate potion would do," the witch said. "Oh. How can I BE strong?" "Don't tak…

2020-08-21 18:46:38 @AIActorCritic The key is that the performance achievable in a linear model by early stopping is better for GD than for a second order method. This is because the second order method has access to less information about the dataset. See Figures 3 and 4 in the paper for experimental validation.

2020-08-20 23:26:39 @pfau @negative_result @duck @sschoenholz @ethansdyer 2) Yes it does! We haven't done this experiment explicitly, but we have shown that performance is exactly at chance if second order structure is removed from CIFAR10 by a linear transformation.

2020-08-20 23:25:06 @pfau @negative_result @duck @sschoenholz @ethansdyer 1) We went with "second moment matrix" because we normalized by the number of vectors (so not Gram), but didn't mean subtract (so not covariance). But we got rid of the normalization to simplify notation ... so, uh, just cognitive momentum? We should definitely say the word Gram.

2020-08-20 23:02:01 @irfnali1 Thank you for the reference! That definitely looks related enough that we should be citing it -- and we will in the next arXiv version.

2020-08-20 18:25:32 @AIActorCritic GD and second order will still generalize differently even with a regularizer. Note that for linear models GD and second order converge to the same minima even without a regularizer (Fig 4a). They only behave differently when you look at finite training times (/early stopping).

2020-08-20 18:17:16 @sidml12 I think there is a tradeoff, and whitening is often a good idea despite the information loss. As you say it leads to much faster training (Fig 4), and the information loss is only large when there is a large input dimension or a small sample count.

2020-08-20 18:08:29 @RogerGrosse Also -- our paper is stronger, and better communicates the potential benefits of second order methods, based on our email exchange. Though we did not change the title, I nonetheless very much appreciate you taking the time to discuss the work.

2020-08-20 18:07:29 @RogerGrosse I believe that our title is accurate and clearly communicates our core result. No bad faith was intended. I understand you have a different perspective on the same mathematical facts. I respect and believe I understand your perspective, though it remains different from mine.

2020-08-20 04:05:19 @roydanroy *where the dynamics of training are taken to be dynamics over {first layer activations, all parameters after the first layer, model predictions}. First layer weights can hold additional information, but it isn't available to the rest of the model and can't inform predictions. 2/2

2020-08-20 04:02:56 @roydanroy Yup, it holds for any loss, and it holds for the dynamics of training*. The network can learn information that is contained in the training labels -- so if you put information about the inputs in the labels (ie as done in an autoencoder), then the model can use it. 1/2

2020-08-20 00:44:03 @RogerGrosse oop. And I didn't see your most recent two messages when replying above. I totally respect your position, and it's completely plausible that I am failing to see my own motivated reasoning when I think that applying this language to those situations would actually be appropriate.

2020-08-20 00:37:00 @RogerGrosse Also -- I think you are exactly right that we both agree about what's happening mechanistically! I may withdraw from the conversation before Twitter works its platform magic on my brain, and I somehow find myself in an escalating argument with someone I respect and agree with.

2020-08-20 00:33:18 @RogerGrosse The predictions will carry no information about the training data w/in the large-eigenvalue subspace, so that information will be unavailable to the model. (if ridge regression were done by GD with early stopping, the information would again be available for generalization) 2/2

2020-08-20 00:30:41 @RogerGrosse I believe that would be a totally reasonable thing to say about ridge regression, assuming you are also comfortable saying it about linear regression without a ridge penalty. 1/2

2020-08-20 00:13:07 @__wasao Excellent! Thank you for sharing.

2020-08-20 00:11:34 @roydanroy I know, right?

2020-08-19 23:49:53 @RogerGrosse In the abstract, and the text body, and my tweeprint we discuss how this can present a practical tradeoff, and regularized second order methods can be extremely effective despite this information loss. 4/4

2020-08-19 23:48:14 @RogerGrosse By the arguments in our paper then, the information in the large eigenvalue subspace is destroyed even for regularized second order optimization. So information which could be used to generalize is still destroyed, it's just that less information is destroyed. 3/n

2020-08-19 23:47:09 @RogerGrosse Regularized second order methods behave like unregularized second order methods in the subspace corresponding to eigenvalues >

2020-08-19 23:45:49 @RogerGrosse Roger and I also had a totally amiable email thread where we talked past each other about this. I believe the title is correct, even if you take second order optimization to mean second order optimization with a regularized Hessian inverse (as is typically done in practice). 1/n

2020-08-19 21:24:02 But! Second order methods typically involve regularization in practice. This can present a tradeoff where less information about a dataset is lost, but training is still accelerated. In some configurations regularized second order optimization can even improve generalization. https://t.co/vDcOXqDw2T

2020-08-19 21:24:01 Whitening and second order optimization both destroy information about the dataset, and can make generalization impossible: https://t.co/CuDeHxF90r We examine what information is usable for training neural networks, and how second order methods destroy exactly that information. https://t.co/j1Sc09YuKT

2020-08-08 02:49:47 @pfau @hoonkp @sschoenholz @Locchiu @ARomanNovak I actually think infinite networks *would* benefit from equivariance, and that this is why infinite width CNNs underperform finite width CNNs. I can't (yet) say anything satisfyingly formal or precise about this though! Also, +1 on the Myrtle-net results being impressive!

2020-08-08 00:18:59 @pfau @hoonkp @sschoenholz @Locchiu @ARomanNovak The answer is yes and no, and longer than a tweet . Short version is that equivariance is a potentially very useful property of finite width networks, but stops playing a role in the infinite width limit (essentially since all locations have all representations). More in paper.

2020-08-08 00:01:45 Finally, we present a simple method for ensembling the predictions of NNGP and NTK models, making it practical to use data augmentation with infinite width networks. (data augmentation is otherwise impractical, due to the cubic dependence of kernel methods on dataset size) https://t.co/bOwrqOs7dZ

2020-08-08 00:01:44 Regularized ZCA whitening of input images improves model accuracy by a surprising amount, especially for infinite width NNGP and NTK predictions. https://t.co/tRXiUHarHR

2020-08-08 00:01:43 The generalization performance of certain finite width networks (especially CNNs without pooling) is non-monotonic in width, in a way not explained by double descent phenomena. (!?!) https://t.co/hT5v7kgdC1

2020-08-08 00:01:42 Large learning rates and L2 regularization both drive differences between finite networks and kernels, and lead finite width networks to perform better. The combined effect of large learning rates and L2 regularization is superlinear. (repeat figure image for this one) https://t.co/PIggLofUbu

2020-08-08 00:01:41 The NNGP (corresponding to infinite width Bayesian networks) typically outperforms the NTK (corresponding to infinite width networks trained by gradient descent). https://t.co/FAvNUH8BeL

2020-08-08 00:01:40 "Finite Versus Infinite Neural Networks: an Empirical Study." https://t.co/dlugEMT5fW This paper contains everything you ever wanted to know about infinite width networks, but didn't have the computational capacity to ask! Like really a lot of content. Let's dive in. https://t.co/aKjgbCrcLU

2020-07-25 02:39:19 @Foret_p Primary answer is that they're the best lever we currently have to *understand* classical deep networks, and understanding will pay huge practical dividends down the road. Secondary answer is that the performance tradeoff is not completely one-sided. Watch this space. :)

2020-07-22 17:22:46 Also -- infinite width attention achieves a new state of the art accuracy for non-trainable kernel methods on CIFAR10. With excellent collaborators @JiriHron5, @yasamanbb, @ARomanNovak. https://t.co/WFEcOWScNl

2020-07-22 17:22:45 Infinite width limits (NNGP and NTK) for neural networks with self-attention https://t.co/mxpQ4jrMBR. This fills in the last common architectural component which did not have an infinite width correspondence! Along the way we improve on the standard softmax attention mechanism. https://t.co/YRMmPmirkc

2020-07-17 20:26:49 @garibaldu Neural Tangent Kernel. Blatant self-cite, but I think https://t.co/zmHhklDdit is a fairly clear presentation of the ideas involved.

2020-07-17 20:23:19 @laurence_ai I think I understand how this captures the properties of the posterior of a single layer. I wonder, could you say something about how you handle layer-layer interactions? eg, around eq 96 how do you capture the dependence of dL/dV_l on later R_l?

2020-07-17 20:21:16 @laurence_ai Thank you for the connection! We will add a citation. This is a very neat alternative way of approaching the question.

2020-07-16 23:04:37 On the other hand, papers often describe the NNGP as stemming from gradient descent of the readout layer only (ie, of being a special case of the NTK). While this is usually* true, it is IMO the most boring perspective on it. (* see https://t.co/I7CuYUKxLI for a counterexample)

2020-07-16 23:04:36 Neural Network Gaussian Processes (NNGPs) correspond to wide Bayesian neural networks! In https://t.co/P9RJeS7RHc we show that the posterior distribution over functions computed by a Bayesian neural network converges to the posterior of the NNGP as layer width grows large. https://t.co/OH9z6y6d5e

2020-05-18 15:36:57 @RussInMtl @KyleCranmer It is probably more costly to generate exact samples from intermediate distributions than to compute det(A), but this could be a useful approximation.

2020-05-18 15:36:20 @RussInMtl @KyleCranmer Thanks for the suggestion! If I understand correctly, TI requires sampling from a sequence of equilibrium distributions (probably interpolating between E=x^T x and E=X^T A x)?

2020-05-17 06:08:54 @unsorsodicorda Yup! To be fair, I expect the variance will typically be exponential in dimensionality. But we're often interested in the log determinant, which will turn exponential variance into linear variance. So, whether this is reasonable will depend very much on the situation!

2020-05-16 18:04:25 @unsorsodicorda For sure, if you have A then det(A) is more expensive than 1/det(A) with this method. Even targeting det(A) though, it's typically faster and more memory efficient to solve A u = s for u than it is to compute A^{-1} or det(A), so the stochastic estimate will still be faster.

2020-05-16 16:14:49 @unsorsodicorda The benefit is that you can get an unbiased estimate of the determinant from a single matrix-vector products, or from a small number of matrix-vector products. This could be useful for stochastic algorithms, where the cost of exactly computing the determinant is large.

2020-05-16 16:13:18 @VishwakFTW I have not analyzed this carefully ... but probably exponentially. This is maybe not as bad as it seems, as we are often interested in log determinants in practice. Yes! I think a control variate would be great! e.g. matrix-vector products for a matrix with known determinant.

2020-05-16 04:16:22 @KyleLiang5 For part A, the variance will depend entirely on how smart your choices for the distributions p and q are.

2020-05-16 04:15:45 @KyleLiang5 No analytic or experimental answer. My suspicion though is that the variance of the estimator in part B will be exponential in # dims n, which is not great. On the other hand, we often are interested in log determinants, for which the variance will be linear in n ... so maybe ok.

2020-05-16 03:30:04 @DaniloJRezende Thank you for the connection! I will update the note to include it.

2020-05-15 23:52:03 @shakir_za Thanks for the encouragement! Yeah, the variance of the estimator when p and q are isotropic Gaussians seems not great (maybe OK if you are interested in a log determinant). If I pursued this farther, it would involve figuring out the design space for p and q as you suggest.

2020-05-15 21:16:20 Also, let me know if you have a recommendation for a journal that this kind of short and simple observation would be appropriate for! I'm currently not sure whether I should try to publish it anywhere.

2020-05-15 21:10:48 Two simple equalities expressing matrix determinants as expectations over matrix-vector products. Entire paper in attached image. :P It's fun to write short notes like this. Hopefully useful in areas like normalizing flows and Gaussian process evaluation. https://t.co/jX5xjPbHQy https://t.co/HFFr0iCcjH
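
As a quick numerical illustration of the kind of identity the note describes (my own sketch; the attached note is the authoritative statement): 1/|det(A)| = E_{x~q}[p(Ax)/q(x)] for any densities p and q, which requires only matrix-vector products with A. As the replies above discuss, the variance depends heavily on the choice of p and q, so this demo keeps A close to the identity and uses standard normals for both.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
A = np.eye(n) + 0.2 * rng.normal(size=(n, n))   # test matrix, kept close to identity

def std_normal_pdf(x):
    # Standard multivariate normal density, evaluated row-wise.
    return (2 * np.pi) ** (-x.shape[-1] / 2) * np.exp(-0.5 * np.sum(x ** 2, axis=-1))

# Here both p and q are taken to be the standard normal density.
x = rng.normal(size=(500_000, n))                                # samples from q
estimate = np.mean(std_normal_pdf(x @ A.T) / std_normal_pdf(x))  # Monte Carlo E_q[p(Ax)/q(x)]

print(estimate)                      # stochastic estimate of 1/|det(A)|
print(1.0 / abs(np.linalg.det(A)))   # exact value, for comparison
```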

2020-04-28 14:47:49 @schimpffabian We initialized a one hidden layer network in a specific 2d parameter subspace, such that training remains in that subspace, and there's a 1:1 mapping between parameters in the subspace and [curvature, weight correlation]. See the caption of Figure S1 (appendix) for details.

2020-04-27 23:23:50 Predicting + demonstrating counterintuitive neural network training behavior: - training at learning rates which diverge under NTK theory - exponential *increase* in loss over first ~20 training *steps* (not epochs) - drastic reduction in Hessian eigenvalues over first ~20 steps https://t.co/1UbC8iYTnI https://t.co/TEQBGoJGzf

2020-04-21 04:54:18 This is very cool work. Read this if you want to really, really understand how a neural network solves a specific problem -- like actual scientific understanding. https://t.co/9wbyGuhFFp

2020-04-04 16:30:44 @geoffreyirving Maybe. I think it's likely that individual lucky events come from a heavy tailed distribution, and that they interact closer to multiplicatively than additively. If the significant lucky events are sparse though, maybe it's easier to factor then out?

2020-03-29 00:11:30 @GallowayAngus Indeed! That's a very nice paper ... and it's influence on our title alone is difficult to deny!

2020-03-28 23:29:50 @KyleCranmer In fact we do not importance sample. Our novelty is rather in showing how to interpret the distribution resulting from importance sampling as a tractable energy function in the GAN latent space. (2/2)

2020-03-28 23:29:27 @KyleCranmer Thanks for the connection! I have just added a reference to that paper, and it will be included whenever we update the arXiv. Note that our contribution is not to use the GAN discriminator for importance sampling, which has been done before (https://t.co/Mrb4SkcCIR). (1/2)

2020-03-28 00:23:34 "Your GAN is Secretly an Energy-based Model and You Should use Discriminator Driven Latent Sampling" This technique can dramatically improve existing trained GANs, by re-interpreting them as an easy-to-sample-from energy based model in the latent space. https://t.co/f9dlYyJJPB

2020-03-25 22:17:03 Check out our review of recent efforts to apply techniques from statistical physics to better understand deep learning, targeted at a physics audience. It tries to be approachable for non-experts in machine learning. https://t.co/cPsByuy02S

2020-03-13 21:08:43 @SussilloDavid Stay tuned! The theory is there for weight sharing (ht @TheGregYang ), but the library doesn't support it yet.

2020-03-13 18:44:10 Build an infinite width neural network with the same code you use to define your finite width neural network. https://t.co/qCSjrSqvIe

2020-03-13 04:19:56 This is a meta-learned list of optimization hyperparameters. Try these hyperparameters in this order for fun, profit, and better performing models with less compute!! A sequence of magic numbers beyond Karpathy's constant! JAX, PyTorch, &

2020-01-25 01:02:07 @dustinvtran @sschoenholz @hoonkp Also -- I should emphasize that the performance gap between finite and infinite networks is very architecture dependent. For fully connected networks especially, the infinite width limit seems to match or exceed the finite width performance.

2020-01-25 01:00:04 @dustinvtran @sschoenholz @hoonkp We have tried, and are trying, ablations similar to these, but definitely lots more to do!! (See Figure 3 in this paper for an mse experiment (w/o direct comparison). Even more see https://t.co/7a4lAfOZS7, https://t.co/eC9NykF4n6 for NNGPs, and https://t.co/zmHhklDdit for NTKs.)

2020-01-24 23:56:31 This makes NTK training dynamics dissimilar from those of standard finite width networks. (Infinite width Bayesian networks, NNGPs, don't suffer from this problem.) In https://t.co/KPfHLJOrOk we derive infinite width kernels for the *standard* parameterization, resolving this.

2020-01-24 23:56:30 Research on the Neural Tangent Kernel (NTK) almost exclusively uses a non-standard neural network parameterization, where activations are divided by sqrt(width), and weights are initialized to have variance 1 rather than variance 1/width.

2019-12-13 04:02:18 RT @shoyer: I'll be presenting this work with @samgreydanus and @jaschasd tomorrow morning (Friday Dec 13) at 11:30am at the NeurIPS Deep I…

2019-12-12 17:34:07 RT @hoonkp: @Locchiu @sschoenholz @yasamanbb @jaschasd https://t.co/VLiVowB644

2019-12-12 16:49:12 RT @hoonkp: Visit our poster today(12/12 Thu) at #NeurIPS2019 10:45am! #175 "Wide Neural Networks of Any Depth Evolve as Linear Models Unde…

2019-12-06 23:22:48 Paper: https://t.co/617vP1bttE Github: https://t.co/fZxNUwBRer Colab Notebook: https://t.co/UwXvlLRpwZ

2019-12-06 23:18:17 Infinite width networks (NNGPs and NTKs) are the most promising lead for theoretical understanding in deep learning. But, running experiments with them currently resembles the dark age of ML research before ubiquitous automatic differentiation. Neural Tangents fixes that. https://t.co/a3unONiXkV

2019-10-29 17:03:49 RT @TheGregYang: 1/ I can't teach you how to dougie but I can teach you how to compute the Gaussian Process corresponding to infinite-width…

2019-10-24 01:01:30 Even better -- the code runs in your browser in colab! https://t.co/ddL8tQLJ7u

2019-09-28 02:09:34 Neural reparameterization improves structural optimization! By parameterizing physical design in terms of the (constrained) output of a neural network, we propose stronger and more elegant bridges, skyscrapers, and cantilevers. https://t.co/M8ol844JyE With shoyer@ samgreydanus@ https://t.co/PZzJjgoCep

2019-05-23 19:48:53 RT @poolio: current mood #NeurIPS2019 https://t.co/iK6ccdeF1c

2019-05-12 23:13:20 A careful empirical study of the effect of network width on generalization and fixed learning rate SGD, for MLPs, convnets, resnets, and batch norm. With superstar resident Daniel Park, and @quocleix + Sam Smith. https://t.co/X0J2wraxNy https://t.co/41VYZG9MRR

2019-05-09 20:46:03 RT @KerenGu: Exciting work in the evolution approach and meta learning! — @Luke_Metz blowing us away with neuron local meta learned update…

2019-05-09 20:45:34 RT @georgejtucker: @Luke_Metz killing it at #ICLR19. https://t.co/XztE7QPfPO

2019-05-05 16:59:43 RT @TheGregYang: 1/ Does batchnorm make optimization landscape more smooth? https://t.co/5J92tRz8ag says yes, but our new @iclr2019 paper h…

2019-03-22 17:59:00 : @laurent_dinh is the most fun to work with. He always has extremely novel ideas ... and makes the most mesmerizing animations. https://t.co/RylpUn9S0V

2019-03-20 17:56:54 Including a massive, well curated, dataset mapping hyperparameter configuration to model performance. This may be a useful resource in your own research. https://t.co/Z3oHSHlp37

2019-03-08 03:46:13 Batch norm causes chaos and gradient explosion in the output of deep networks: figure below shows two nearly identical minibatches going through a random *linear* network with batch norm, and becoming completely dissimilar by depth 30! Much, much more at: https://t.co/6Kwgw6pWXp https://t.co/n41OBSI693

2019-02-21 04:10:58 @TheGradient @RogerGrosse Yup! :) Though with the caveats mentioned by both @danieldazac and @Braden_Brinkman -- it only becomes exactly true at infinite width, and there is the zeroth order constant term as well as the linear term.

2019-02-21 04:06:32 @RogerGrosse If the loss function on top of the neural network output was quadratic, then yes -- one Hessian-free step run to convergence, or one Newton step, will achieve zero training error in the limit of infinite network width.

2019-02-19 18:18:22 A wildly successful collaboration with several of the usual suspects, @hoonkp @Locchiu @sschoenholz @yasamanbb and Jeffrey Pennington.

2019-02-19 18:12:19 Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent https://t.co/zmHhklDdit <

2019-02-19 18:12:18 RT @sschoenholz: 1/5) Maybe our neural networks aren't so different from linear models after all! Here's a plot showing a real network trai…

2019-02-14 22:15:26 RT @TheGregYang: 1/8 Modern deep networks (with conv, (self-)attention, batchnorm, LSTM, etc) become Gaussian Processes when randomly initi…

2019-02-13 10:06:09 RT @TheGregYang: @MSFTResearch won the 2018 Text Adventure AI Competition! Read about our winning agent at https://t.co/dCjfu5l941 . It ach…

2019-01-25 02:31:23 Based upon some recent exchanges, I should emphasize this note is to help understand some seemingly too-good-to-be-true theory results. It is NOT a practical proposal to improve training. See https://t.co/ueX4REpWE4 for an illustration of why it won't help training.

Discover the AI Experts

Nando de Freitas Researcher at DeepMind
Nige Willson Speaker
Ria Pratyusha Kalluri Researcher, MIT
Ifeoma Ozoma Director, Earthseed
Will Knight Journalist, Wired