
June 24–25

PyCon Russia 2019


Anton Bragin, JetBrains

Jupyter Notebooks – There is a Better Way

Scientific notebooks, Jupyter included, are extremely popular nowadays among both Python newbies and amateur data scientists. An estimated six million notebook users speaks for itself. However, there are notebook haters who point out that notebook usage contradicts good programming practices, that results are hard to reproduce, and that interactive development with notebooks is spoiled by the lack of consistency between the notebook code and the running kernel state.

In my talk, I will review Jupyter pain points and the existing tools and approaches that can be used to fix or mitigate them.

The talk is recommended both for Jupyter fans who are ready to critically review the whole concept behind scientific notebooks and for notebook pessimists willing to give Jupyter a second chance, along with everybody who is interested in data analysis and presentation using Python.

Rishat Ibragimov, Yandex

Quantum computing with Python: learning by example

Many have heard about quantum computers and the fantastic opportunities they open up. But not many people know that the technology has reached such a level that today anyone can write a simple Python program and execute it on a real quantum machine. Let us examine the basics of quantum computing with code samples and learn to run programs on a local simulator and a remote quantum computer.
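As a rough illustration of what a local simulator does, here is a minimal single-qubit sketch in pure Python. It is not the API of any real quantum SDK; the names `H`, `apply_gate`, `probabilities`, and `measure` are made up for this example.

```python
import math
import random

# Hadamard gate: puts a basis state into an equal superposition.
S = 1 / math.sqrt(2)
H = [[S, S],
     [S, -S]]


def apply_gate(gate, state):
    """Multiply a 2x2 gate matrix by a 2-element state vector."""
    return [sum(gate[r][c] * state[c] for c in range(2)) for r in range(2)]


def probabilities(state):
    """Probability of each outcome is the squared amplitude magnitude."""
    return [abs(amplitude) ** 2 for amplitude in state]


def measure(state, shots, seed=None):
    """Sample measurement outcomes, as a simulator would for repeated runs."""
    rng = random.Random(seed)
    p_zero = probabilities(state)[0]
    counts = {"0": 0, "1": 0}
    for _ in range(shots):
        counts["0" if rng.random() < p_zero else "1"] += 1
    return counts


# Start in |0>, apply H, and sample the resulting superposition.
superposition = apply_gate(H, [1, 0])
counts = measure(superposition, shots=1000, seed=42)
```

Applying `H` twice returns the qubit to its starting state, which is an easy sanity check for a simulator like this.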

Alexander Khaerov, Chainstack

Goodbye Virtual Environments?

A Pythonista has more ways to manage dependencies than any other programmer. What happened, and how is this compatible with "The Zen of Python"? Should Python have only one tool? In the past year, two new contenders entered the scene to fix packaging for good: Pipenv and Poetry. What does PEP 517 (build isolation) bring us? This talk attempts to review all of this from a DevOps point of view and from real-world experience.

Nikita Grishko, Flo Health Inc

Evolution of dependency management

If you use Python, you most probably use virtual environments and pip to install packages. You may have a requirements.txt with all dependencies; you may even have two of them, for example, requirements-dev.txt. But what if I told you that this good old approach has obvious problems, and that there is more than one tool that tries to solve them?

My talk addresses the existing problems in managing dependencies. I will also tell you how developers have tried, and are still trying, to solve these issues, and we will take a closer look at a set of tools: pip-tools, pipenv, flit, poetry. Together we will decide whether these tools are worth a shot or whether it is all just madness that is better avoided.

Artem Korolev, Timur Kadyrov, Dentsu Aegis Russia

Creation of industrial datasets for deep learning approach

Using Python as a tool, we would like to talk about topics that are not covered by Coursera or by data science blogs on Medium:

— where a dataset comes from

— whether it is worth labeling pictures by hand or handing them to an assessor marketplace

— how to brief a team of assessors

— why some pictures seem to belong to the relevant class but are not eligible for the training dataset

— whether to use a picture with low resolution or bad cropping

— whether to wait for training to complete or to train the network iteratively

— etc.

The main goal of our talk is to show that deep learning is not only for big companies with data science departments. We will tell you how to start doing image recognition with zero experience and come back to the client with results within two weeks of the first tests, without using the Google, Azure, or AWS vision APIs.

Anton Patrushev, Spherical

Python & Rust: it’s more fun together

Stanislav Kirillov, CatBoost

CatBoost and Python: fast training and prediction of CatBoost models in Python applications

Gradient-boosted decision tree (GBDT) models solve many practical tasks. Training and prediction are often done with Python in both experimental and production settings. Understanding how input data is used inside GBDT libraries can give you clues to more efficient model training and prediction with low data transformation overhead.

Yandex lead developer Stanislav Kirillov will share his experience of writing a Cython wrapper for the CatBoost C++ code. He will explain how to prepare data so that it passes efficiently from Python to C++. This talk will give you tips and tricks for optimizing training and prediction time and will explain how to train CatBoost models on large datasets faster and with low memory overhead.

Sergei Borisov, Domclick.ru

Workshop «Testing asynchronous applications»

You all know about the importance of testing. I’m gonna show you how I test asynchronous apps, including the database and other infrastructure services, with a little help from Docker and pytest. During the master class we will package a basic web application into Docker and cover it with tests. Furthermore, we will compare how convenient unittest and pytest are to use, look at how some of the pytest plugins work, and learn to use mocks only when they are truly needed. Finally, we will try to beat all the slow tests. All you need to participate is Docker.

The master class is going to be particularly helpful for those who are just starting to write pytest tests. However, those who want to master fixtures and try new approaches to CI/CD are also welcome to come.
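To give a flavor of testing asynchronous code without mocks, here is a minimal sketch that pytest can collect with no plugins. `InMemoryUsers` and `register` are hypothetical stand-ins invented for this example; in the workshop setting the in-memory class would be replaced by a real database running in Docker.

```python
import asyncio


class InMemoryUsers:
    """Hypothetical stand-in for a real asynchronous database client."""

    def __init__(self):
        self._rows = {}

    async def save(self, user_id, name):
        await asyncio.sleep(0)  # yield to the event loop like real I/O would
        self._rows[user_id] = name

    async def get(self, user_id):
        await asyncio.sleep(0)
        return self._rows.get(user_id)


async def register(users, user_id, name):
    """Application code under test: store a cleaned-up user name once."""
    if await users.get(user_id) is not None:
        raise ValueError("user already exists")
    await users.save(user_id, name.strip())
    return await users.get(user_id)


def test_register_strips_whitespace():
    users = InMemoryUsers()
    assert asyncio.run(register(users, 1, "  Ada ")) == "Ada"


def test_register_rejects_duplicates():
    users = InMemoryUsers()
    asyncio.run(register(users, 1, "Ada"))
    try:
        asyncio.run(register(users, 1, "Ada"))
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for a duplicate user")
```

Plain `asyncio.run` keeps the sketch dependency-free; a pytest plugin for async tests would remove the boilerplate, at the cost of an extra dependency.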

Denis Kataev, Tinkoff.ru

How to write applications on SQLAlchemy

Zlata Obukhovskaya, evangelist of Moscow Python community

Structured concurrency. Is asynchronous Python considered harmful?

Structured concurrency is an approach to asynchronous programming, implemented by Nathaniel Smith in his project Trio. In this approach, structurally connected concurrent functions are encapsulated within some context, which makes it easier to handle various failure scenarios, clean up resources and propagate errors.

This concept was adopted by developers from other ecosystems (Kotlin, C, Swift), thus raising new questions:

— What if instead of a set of concurrent functions we want to run a graph of concurrent functions?

— How to pass data between concurrent contexts in a safe manner?

— How to supervise the lifecycles of concurrent graphs?

— How to execute a graph efficiently on one core? On multiple cores?

— How to make execution deterministic?

Meanwhile, these problems have been tackled more or less successfully in other programming languages.

Zlata Obukhovskaya, evangelist of Moscow Python community, will explain how these approaches can be applied to Python.

Alexander Artemenko, Yandex

Michael Foord, Python core developer

Process Engineering: A Golden Age for Software Engineering

Python and its ecosystem are now mature, and Python is one of the most popular programming languages in the world. From version control to continuous integration, IDEs to linters, documentation to deployment, virtualization to packaging, we have an unrivalled set of tools and practises available to us. Software Engineering is so much more than writing code and throwing it over a wall. In this talk we’ll take a slightly backward-facing look at how modern tooling fits into powerful workflows for building and creating projects and how that’s been shaped by the community. Process Engineering: software development for humans.

Dmitry Khodakov, Avito.ru

CPU bound tasks in Python web services

At Avito, we often have to apply machine learning models in real time and combine asynchronous calls with heavy CPU-bound operations. We are faced with the problems of scaling and of high resource consumption by the final service.

— What to do if you need to combine I/O-bound network operations and CPU-bound calculations? Let's talk about multiprocessing in Python and its friendship with asyncio.

— I'll show you how we accelerated one production system 50 times by abandoning pandas/numpy in favor of pure Python.
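One common way to combine the two kinds of work is to hand CPU-bound calls to a process pool from inside the event loop. The sketch below illustrates that general technique, not Avito's actual service; `score`, `handle_request`, and `main` are hypothetical names.

```python
import asyncio
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import Optional


def score(n: int) -> int:
    """Stand-in for a CPU-heavy model call (hypothetical workload)."""
    return sum(i * i for i in range(n))


async def handle_request(executor: Optional[Executor], n: int) -> int:
    # Simulated I/O-bound step, for example fetching features over the network.
    await asyncio.sleep(0.01)
    loop = asyncio.get_running_loop()
    # Offload the CPU-bound step so the event loop stays responsive.
    # Passing None falls back to the loop's default thread pool.
    return await loop.run_in_executor(executor, score, n)


async def main() -> list:
    # Worker processes sidestep the GIL for the CPU-bound part.
    with ProcessPoolExecutor(max_workers=2) as pool:
        requests = (handle_request(pool, n) for n in (10, 100, 1000))
        return await asyncio.gather(*requests)
```

When using the process pool, call `asyncio.run(main())` under an `if __name__ == "__main__":` guard so that worker processes can import the module safely.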

Dmitry Orlov, Edadil

Asynchronous RabbitMQ driver from the author

When I started writing an open-source library for communicating with RabbitMQ using asyncio, I imagined creating a simple and clear interface for everyone. Now it looks like a success, but it took a while. This story is about me going deep into the driver (pika), fixing random bugs, writing my own driver, and trying not to break the public API or "harm" users. I will also cover what asyncio is missing for implementing network libraries without pain, and the design problems of AMQP 0.9.

Nikita Levonovich, EscapeRoomMakers

MicroPython for arcade games and escape rooms

EscapeRoomMakers use MicroPython to develop the latest generation of escape rooms and arcade games. A modern escape room contains about 10 devices that communicate over a network with MQTT. Most of those devices are microcontrollers (ESP32) that control peripheral devices like MP3 players, relays, displays, diodes, buttons, etc. But a couple of years ago escape rooms had fewer devices, and the prevailing type of device was the Arduino, which could communicate over several protocols.

My talk is dedicated to how the architecture that EscapeRoomMakers use to solve such problems with MicroPython took shape. All examples in the talk can be run on the popular microcontrollers by Espressif Systems.

Raymond Hettinger, Python core developer

Build powerful, new data structures with Python's abstract base classes

  1. Learn how Python's abstract base classes work and why you would want them.
  2. Explore the rich ABCs for collections.
  3. Leverage that knowledge to build several powerful new data structures:

— Bit sets

— Binary tree list

— Persistent file dictionary

— SQL based persistent, concurrent dictionary

Talk includes:

— Summary notes

— Working code

— Instructions for building your own collections

— Instructions for building your own ABCs
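As an illustration of the first structure in the list above, a bit set can be sketched on top of `collections.abc.MutableSet`. This is not the speaker's code; the class name `BitSet` and its storage scheme are assumptions for this example. Only five methods are written by hand, and the ABC supplies operators such as `|`, `&`, and `<=` as mixins.

```python
from collections.abc import MutableSet


class BitSet(MutableSet):
    """Set of non-negative integers stored as the bits of a single int."""

    def __init__(self, iterable=()):
        self._bits = 0
        for value in iterable:
            self.add(value)

    def __contains__(self, value):
        return isinstance(value, int) and value >= 0 and bool(self._bits >> value & 1)

    def __iter__(self):
        # Walk the bits from least significant upward, yielding set positions.
        bits, index = self._bits, 0
        while bits:
            if bits & 1:
                yield index
            bits >>= 1
            index += 1

    def __len__(self):
        return bin(self._bits).count("1")

    def add(self, value):
        if not isinstance(value, int) or value < 0:
            raise ValueError("BitSet holds non-negative integers only")
        self._bits |= 1 << value

    def discard(self, value):
        if value in self:
            self._bits &= ~(1 << value)


evens = BitSet([0, 2, 4])
small = BitSet([0, 1, 2])
both = evens & small  # the `&` operator comes from the MutableSet mixin
```

Because the constructor accepts an iterable, the mixin operators return new `BitSet` instances rather than plain sets.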

Vasily Litvinov, Intel

Profiling Python and C for fun and profit, or Pandas, go fast!

A brief overview of the current state of profilers, followed by a focus on tools that can do mixed-mode profiling (showing Python and C/other native functions in one go). Their usage will be illustrated by analysing and speeding up some parts of Pandas.

Alexey Kuzmin, Domclick.ru

How to find and fix all the bottlenecks in Python

My name is Alexey. My services work fast, I never run out of memory and I don’t have to refactor my old code. Except that everything mentioned above is a huge lie. The truth is that throughout my career I’ve become extremely good at answering all the “why is it working slowly?”, “what’s wrong with that?” and “what should we do now?” questions. Now it’s time to share some of my life hacks and tools with you so you can save yourselves precious hours and avoid a nervous breakdown while debugging. As a result, you will leave with an understanding of which parts of your program are usually slow and how to find and fix them.
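A typical first step in finding slow spots is the standard-library profiler. The sketch below is a generic illustration, not necessarily one of the tools covered in the talk; `slow_join` and `profile` are hypothetical helper names.

```python
import cProfile
import io
import pstats


def slow_join(n):
    """Deliberately naive string building, used as the profiling target."""
    text = ""
    for i in range(n):
        text += str(i)
    return text


def profile(func, *args):
    """Run func under cProfile and return its result plus a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and keep only the top five entries.
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


result, report = profile(slow_join, 10_000)
```

Printing `report` shows call counts and timings per function, which is usually enough to decide where to look next.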

Antonio Cuni, core developer of PyPy

How PyPy can help with high-performance computing

Kirill Borisov, Booking.com

A Flat Too Smart: IoT + Python + whatever

Let's face it: we are caught in a vicious loop of earning money and spending it to have some respite from earning it. Each hour that you spend on this leaves you a little bit less of a person. To add insult to injury, that leaves you with less time to try something new, forcing you to console yourself with creating yet more legacy code in Python 2.

"Enough is enough!" With that thought in mind, I decided to fill my life with all the Python things trending on Twitter. Smart home! Python 3! Lambda! Async!

The idea is simple on the surface: create a simple data acquisition system for gathering meter readings and data from sensors using only Python (provided that makes sense) and other hyped-up things: MicroPython, asyncio, living in the cloud and a Telegram bot for good measure. Come and see what has been accomplished, what turned out to be a folly and what you can learn from this unholy mess.

Maxim Mazaev, CIAN

Travis Oliphant, Anaconda Inc

Brief review of Array computing in Python, where it is heading, and why it matters

Python has been used to find gravitational waves and image black holes and has become the de facto language for machine learning. In this talk, I describe how this happened through the efforts of cooperative, community-driven open source and dedicated volunteers. I will then describe how the big-data and deep learning communities have enhanced the landscape of array computing to make incredible computational capability accessible to Python users. Finally, I will provide some thoughts and perspective on what may be coming next and what it will enable.

Nikolay Markov, Aligned Research

Packaging Python code from A to Z

A huge fraction of developers don't really bother about code culture and packaging, especially since things like Docker are around. The infrastructure itself isn't a piece of cake either: we have "eggs", we have "wheels", we have "pipfiles" and "pyproject.toml"... All right, let's talk about some good practices of Python project structuring (from CLI up to docs), walk through the classic approach of manual packaging and take a look at the modern ecosystem around all that for everyone's favorite language.

Andrew Vlasovskih, JetBrains

What's Coming in Python 3.8 and What isn't

Gleb Ivashkevich, datarythmics

Julia, Python and machine learning

Progress in machine learning and deep learning is blazingly fast: new architectures keep emerging, and a shift in hardware for neural network training and inference is under way.

With these changes in the landscape, Python is challenged by younger and more suitable languages. We will discuss one of Python's contenders: Julia.

Julia was designed as a high-performance, JIT-compiled language for technical computing, and offers first-class metaprogramming facilities and interoperability with C and Python. We will explore the basics of Julia, what it offers for machine learning, and whether it is worth a try if you're a data scientist.

Artem Malyshev, drylabs.io

Domain driven design tools

There is essential complexity and there is accidental complexity. I'll explain how to organize the first and minimize the second. We'll talk about how to build your product around solving problems, not around the frameworks you use. You'll learn where typing and dataclasses are best applied. We'll consider where contract programming and pydantic are useful to us. We'll try libraries from the dry-python project. And of course, we will mention tests. Practice only. No UML included.

Ivan Tsyganov, Positive Technologies

(Un)safe dependencies

For the past seven years, the threat of "Using components with known vulnerabilities" has ranked ninth in the OWASP Top 10. We will consider the consequences of using outdated versions of libraries and of the interpreter itself. I'll show you how an attacker can exploit known vulnerabilities in Django and Django REST Framework, in the SQLAlchemy, lxml, PyYAML, and aiohttp-session libraries, and in the Python 2 and Python 3 interpreters themselves.

Adil Khashtamov, Playrix

ETL tools in the Python ecosystem

Any organization that produces data while operating will eventually reach the stage when the number of regular tasks increases and the dependencies among them become complicated. It is crucial to make those pipelines reliable and fault tolerant. In my talk I would like to present the ecosystem of tools and frameworks for building data pipelines in Python in order to collect, enrich and load data into data lakes and data warehouses. In particular, I will focus on Luigi, Airflow, Prefect and Celery and on how we deploy our data pipelines with Luigi to production at Playrix.
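The core problem these tools address, running tasks in dependency order, can be sketched with the standard library alone (assuming Python 3.9 or newer for `graphlib`). This is not the API of Luigi, Airflow, Prefect, or Celery; `run_pipeline`, `TASKS`, and `DEPENDENCIES` are hypothetical names.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks whose output it needs.
DEPENDENCIES = {
    "collect": set(),
    "enrich": {"collect"},
    "load": {"enrich"},
}

TASKS = {
    "collect": lambda: [1, 2, 3],                      # pretend extraction
    "enrich": lambda rows: [row * 10 for row in rows],  # pretend transformation
    "load": lambda rows: sum(rows),                    # pretend load step
}


def run_pipeline(tasks, dependencies):
    """Run every task after its dependencies, passing their results along."""
    results = {}
    for name in TopologicalSorter(dependencies).static_order():
        inputs = [results[dep] for dep in sorted(dependencies.get(name, ()))]
        results[name] = tasks[name](*inputs)
    return results


results = run_pipeline(TASKS, DEPENDENCIES)
```

Real pipeline frameworks add what this sketch leaves out: scheduling, retries, persistence of intermediate results, and monitoring.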