How a Kaggle Master uses Link (Jupyter Notebook/Lab extension)

This post is a translation of a post by Kim Chan-ran, a promising data scientist and Kaggle Master. Permission was obtained from the author, and you can see the original post here.

I’m writing this post to share my impressions of Kaggle X Link.

Link?

I’m sure many people out there build and use a JupyterLab environment for machine learning and artificial intelligence research and development. JupyterLab makes it easy to handle everything from bash to ipynb. Have you ever used a Jupyter extension? Extensions are commonly used when writing code in VS Code, but many people probably use JupyterLab itself without any extensions (I was one of them!). Link, which I’m introducing here today, is one of the representative JupyterLab extensions. Link is a development tool that really comes in handy when you deal with pipelines while working on ipynb files!

Link is an extension for JupyterLab, a popular platform for data scientists, which lets developers convert AI/ML models to readable pipelines and easily share them with collaborators. Link allows users to:
1. Build interactive pipelines
2. Facilitate collaboration and communication
3. Expedite the development cycle
4. Track every modification to pipelines and source code
5. Synchronize the execution environment

Kaggle?

Kaggle is a platform that brings together people from all around the world who are interested in machine learning, data science, and artificial intelligence. They take part in competitions built on real-world data science problems and public datasets from a wide range of companies and organizations. Kaggle is truly a best-in-class community with many members working for leading global companies. You can learn a lot just by interacting with members of the Kaggle community. The competition problems are quite challenging, the trends change quickly, and a lot of source code is shared. I know many of you are already well aware of the value of Kaggle, but here’s a quick summary! Please check it out and join Kaggle!

Opportunities to work with real datasets across multiple areas
Availability of a variety of baseline code
Code based on the latest trends
Understanding of different approaches
Global networking
Opportunity to achieve

Tips to raise your score

Want to score high on Kaggle and improve your rank? Here are some tips! These are the tips that resonated with me the most during an interview with the top-tier Kaggle Grandmaster “bestfitting.”

Good CV
Post a good resume
Learn from other competitions
Read related papers
Show your mental strength
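
If “Good CV” in the list above refers to cross-validation, as the abbreviation is commonly used on Kaggle (this reading is an assumption, since the list does not spell it out), the idea can be sketched with a plain k-fold split. The function name k_fold_indices and its parameters are hypothetical and chosen only for illustration; in practice most people would reach for a library splitter instead.

```python
import random


def k_fold_indices(n_samples, n_folds=5, seed=42):
    """Yield (train_indices, valid_indices) pairs for a shuffled k-fold split.

    Hypothetical helper for illustration; not taken from the original post.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the folds reproducible
    # Deal the shuffled indices into n_folds roughly equal groups.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        valid = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, valid


splits = list(k_fold_indices(n_samples=10, n_folds=5))
for fold_number, (train_idx, valid_idx) in enumerate(splits):
    print(fold_number, sorted(valid_idx))
```

Each sample lands in the validation set exactly once, so averaging the score over the folds gives a local estimate that can be compared against the leaderboard.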

Learn and experiment to participate in Kaggle competitions

In Kaggle, ipynb is the preferred file format, code is shared through notebooks, and many people grow together and compete based on shared EDA and baselines. Kaggle offers a unique culture of sharing and competition!

Just speed up calculating atomic distances

However, depending on the nature of the Kaggle competition, you may often find that you cannot train enough with only the GPU/TPU capacity that Kaggle provides for a one-week window. After that, you have to work in your local environment. When this happens to me, I always use JupyterLab.

In Kaggle, massive ipynb files are widely shared. This is because it’s easier to work on and view them in a web format. It seems to be a part of the Kaggle culture. If you’re downloading ipynb files and using them in your local environment, Link is the answer. It will help you learn and experiment more efficiently.

First of all, you can use Link to organize long code into pipelines. Check out the sample code at the link below.

[train] PyTorch-EffNetV2 baseline CV:0.49

If you look at the sample code here, you’ll notice that it is very long. (Of course, the code also includes simple sample data tests and visualizations for training.) If you create pipelines through Link, this code can be organized as shown below.

Can you see the overall structure of the code? Rather than just looking at the lengthy code, it’s so much easier to understand if you can read the code along with these pipelines.

Let’s create a node to demonstrate!

Give the component a name and click the check button on the right to create a single node.

When you create the next component, select its parent component, and the two will be connected in the pipeline.

Once you set the name and relationships of each component this way, your pipelines appear as a well-organized graph, like the one shown above!
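
To make the idea of named components and parent relationships concrete, here is a minimal sketch in plain Python. It is not Link’s API or internal format; the component names and the two helper functions are hypothetical. It only shows that once each component knows its parents, a valid execution order and the upstream dependencies of any node fall out of the graph.

```python
from graphlib import TopologicalSorter

# Hypothetical component names; each one maps to the set of its parents.
pipeline = {
    "load_data": set(),
    "preprocess": {"load_data"},
    "train": {"preprocess"},
    "evaluate": {"train"},
}


def execution_order(graph):
    """Return component names so that every parent comes before its children."""
    return list(TopologicalSorter(graph).static_order())


def ancestors(graph, name):
    """Collect every upstream component that `name` depends on."""
    seen = set()
    stack = list(graph[name])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(graph[parent])
    return seen


order = execution_order(pipeline)
print(order)
print(sorted(ancestors(pipeline, "train")))
```

Reading a long notebook as a graph like this is what makes the overall structure visible at a glance.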

If you build pipelines like this, you can utilize the various features Link offers.

First, click the desired node to move to the corresponding code block, which means you can use it as a bookmark! It makes navigating your code so much easier.

Next, you can easily execute any block you choose.

Colors indicate what has been executed, what is still running, what hasn’t been executed, and what has errors.

You can also cache execution results.

If you reuse cached execution results in cases such as preprocessing or retraining, you can save a lot of time and work efficiently because the same block doesn’t need to be re-executed.
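
The general idea behind caching execution results can be sketched as follows. This is only an illustration of the concept, not how Link implements it: the CachedPipeline class, its methods, and the "v1"/"v2" strings standing in for a block’s source code are all assumptions made for the example. A block is re-executed only when its own code or one of its parents changes.

```python
import hashlib


class CachedPipeline:
    """Hypothetical result cache keyed by a block's code and its parents."""

    def __init__(self):
        self.cache = {}      # cache key -> stored result
        self.run_count = {}  # block name -> number of real executions

    def _key(self, name, source, parent_keys):
        digest = hashlib.sha256()
        for part in (name, source, *parent_keys):
            digest.update(part.encode())
        return digest.hexdigest()

    def run(self, name, source, func, parents=()):
        """Run `func` on the parents' results unless a cached result exists.

        `parents` is a sequence of (key, result) pairs returned by earlier
        run() calls. Returns a (key, result) pair for this block.
        """
        parent_keys = [key for key, _ in parents]
        parent_results = [result for _, result in parents]
        key = self._key(name, source, parent_keys)
        if key not in self.cache:
            self.cache[key] = func(*parent_results)
            self.run_count[name] = self.run_count.get(name, 0) + 1
        return key, self.cache[key]


def preprocess():
    return [x * 2 for x in range(5)]


def train(data):
    return sum(data)


pipe = CachedPipeline()
for _ in range(2):  # the second pass is served entirely from the cache
    prep = pipe.run("preprocess", "v1", preprocess)
    model = pipe.run("train", "v1", train, parents=[prep])

print(model[1])
print(pipe.run_count)
```

In this sketch, changing only the training block’s source string would re-run training while the preprocessing result is still reused.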

Finally, code management and collaboration are possible. Link is also integrated with GitHub, so it really is very versatile!

For more details, please refer to the Link page on the MakinaRocks website!

» This article was originally published on our Medium blog and is now part of the MakinaRocks Blog. The original post remains accessible here.

MakinaRocks

2022-11-23
