Would you make a Senior Applied Scientist at Nvidia?

#interviews



In the “Лиза” magazine, right after my favorite recipes, there is a quiz section. That's where this one turned up: would you make a Senior Applied Scientist? Each question is worth one point; the scoring is at the end.



Intro



1. Do you have experience dealing with super-large language models? Do you, like, do model parallelism at all?

2. Did you work with 70b models or only with 7b and 13b?

3. Do you have production experience with model alignment?

4. Okay, so you're saying you haven't started with DPO and RLHF stuff yet, right? (1 point for answering “no”)



NLP



5. Can you explain to me how self-attention works?

6. Now the same in mathematical terms

7. Can transformer inference be parallelized?

8. What’s the complexity of the self-attention operation?

9. After the self-attention, what happens in the transformer?

10. How many feed-forward layers are there in the transformer block?

11. What’s the dimension of the feed-forward layer?

12. So internally, it’s super wide. Do you know any reason why people design it like that?

13. Do you know this paper where people can edit the transformer memory? Have you heard this?

14. Basically the knowledge is stored in the weights of the transformer, right? So like, for example, the Eiffel Tower is in Paris, right? So this knowledge can be edited. So they find out where the memory is located. You know this paper?

15. Have you read about, like, Hopfield networks? (No) Yeah, this is called associative memory. So it's a Hopfield network. It's kind of like an MLP, a feed-forward network. Basically, that's where the memory happens. You can store these key-values.

16. Have you read the RETRO paper?

17. So have you done anything with RETRO before?

18. Do you know how this RETRO external information is fed into the language model?

19. Can you explain to me what's the difference between T5 and GPT?

20. How does the encoded information fit into the decoder in T5?

21. So, can you revisit the question about RETRO feeding the retrieved documents into the decoder?
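
Bonus, since questions 5–8 deserve more than a shrug: here's a minimal single-head self-attention sketch in numpy, i.e. softmax(QK^T / sqrt(d_k))·V. The function and variable names are mine, not anything from the interview, and it's an illustration rather than production code; the (n, n) score matrix is exactly where the O(n²·d) complexity from question 8 lives.

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Single-head self-attention: softmax(Q K^T / sqrt(d_k)) V."""
    Q = x @ Wq            # (n, d_k) queries
    K = x @ Wk            # (n, d_k) keys
    V = x @ Wv            # (n, d_v) values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # (n, n) -- the O(n^2 * d) part
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V    # (n, d_v)

# Toy usage: n = 4 tokens, model dim 8
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(x, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

After that, a real transformer block adds residual connections, layer norm, and a two-layer feed-forward net whose hidden width is typically 4x the model dimension, which is roughly what questions 9–12 were circling around.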



Coding



22. Let me first start with some easy questions. Can you explain to me what's the difference between variables on stack versus variables on heap?

23. It’s about memory allocation. So what's the main difference in how they are stored in memory?

24. So, have you done like programming? Anything apart from Python?

25. In Java memory management, do you know the few generations of variables in memory?

26. How does garbage collection work in Java?

27. How does a variable on a stack work?

28. How is it related to the scope of variables, e.g. global and local ones? Where are those allocated in memory?

29. Why does recursion use a stack?
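
Since questions 27–29 are the one legitimate CS stretch here, a small Python sketch of the point: the recursive version keeps its intermediate state in call frames (and CPython caps how many of those you get at the recursion limit), while the iterative version keeps an explicit stack as an ordinary list. The helper names and the toy tree are my own illustration, not anything from the interview.

```python
def depth_recursive(node):
    """Tree depth via recursion: every call pushes a new frame."""
    if node is None:
        return 0
    return 1 + max(depth_recursive(node["l"]), depth_recursive(node["r"]))

def depth_iterative(root):
    """Same computation with an explicit stack (a plain list): no frame growth."""
    best, stack = 0, [(root, 1)]
    while stack:
        node, d = stack.pop()
        if node is None:
            continue
        best = max(best, d)
        stack.append((node["l"], d + 1))
        stack.append((node["r"], d + 1))
    return best

# A degenerate "linked list" tree deeper than the default recursion limit (~1000)
deep = None
for _ in range(5000):
    deep = {"l": deep, "r": None}

print(depth_iterative(deep))   # 5000
try:
    depth_recursive(deep)
except RecursionError:
    print("ran out of call frames")
```

That's also the short answer to question 29: recursion needs somewhere to remember what to do after each call returns, and that somewhere is a stack of frames.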



Algorithms



30. (3 points) Describe a solution to the “8 queens” problem, in pseudocode (no need to write actual code)

31. (3 points) What’s the complexity of the algorithm?

32. (3 points) What’s the classic CS 101 algorithm for this problem?
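
For the record, here is the classic CS 101 answer that questions 30–32 were fishing for, or at least my guess at it: row-by-row backtracking with pruning on columns and diagonals. A Python sketch rather than pseudocode; the names are mine.

```python
def solve_n_queens(n=8):
    """Classic backtracking: place one queen per row, prune conflicts early."""
    solutions = []

    def place(row, cols, diag1, diag2, board):
        if row == n:
            solutions.append(board)
            return
        for col in range(n):
            # A square is safe only if its column and both diagonals are unused.
            if col in cols or (row - col) in diag1 or (row + col) in diag2:
                continue
            place(row + 1, cols | {col}, diag1 | {row - col}, diag2 | {row + col},
                  board + [col])

    place(0, set(), set(), set(), [])
    return solutions

print(len(solve_n_queens(8)))  # 92 solutions for the classic 8x8 board
```

Worst-case the search is exponential, on the order of N! branches before pruning kicks in, which I assume is the answer question 31 wanted.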



---



Total: 38 points max.



- If you scored 30+, welcome to the next round (contents unknown). React to the post with 🤓, let's see how many of us there are

- If you scored under 30, you're a nasty dumb normie freak and don't yet measure up to Senior Applied Scientist at Nvidia



P.S. By the way, I could have ended the interview right after question 24.