[Solved] “File size limit Exceeded” error in Zeppelin (Amazon EMR Cluster)

When we import a JSON notebook into a Zeppelin server, the notebook file must not exceed 1 MB by default. The limit is easy to hit: when creating a copy of a Zeppelin notebook, we often forget to clear the output, and the stored results inflate the file size.
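
To see whether a notebook is likely to hit the limit before importing it, a quick size check can help. This is only a minimal sketch: the function name `check_notebook_size` and the file name in the usage comment are hypothetical, and 1024000 is the default limit discussed in this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a notebook file's size against a byte limit.
# Returns 0 if the file fits within the limit, 1 if it is larger.
check_notebook_size() {
  local file="$1"
  local limit="${2:-1024000}"   # default limit quoted in this post
  local size
  # wc -c counts bytes; tr strips the padding some platforms add
  size=$(wc -c < "$file" | tr -d ' ')
  if [ "$size" -gt "$limit" ]; then
    echo "too large: $size bytes (limit $limit)"
    return 1
  fi
  echo "ok: $size bytes (limit $limit)"
}

# Usage (hypothetical file name):
#   check_notebook_size my_notebook.json
```

If the check reports the file as too large, either clear the outputs in the notebook before exporting it, or raise the limit as described in the steps below.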

This tutorial shows how to increase the limit on JSON notebook imports, assuming Zeppelin is hosted on an Amazon EMR cluster. The same steps should also work on most other hosted servers.

  1. Open an SSH terminal on the cluster. (For Amazon EMR, the SSH link is next to ‘Master Public DNS’.)
  2. Now navigate to the Zeppelin configuration folder
    • [hadoop@ip-1-1-1-0 ~]$ cd /etc/zeppelin/conf/
  3. Create a copy of the zeppelin-site.xml.template file and name it zeppelin-site.xml
    • [hadoop@ip-1-1-1-0 ~]$ sudo cp zeppelin-site.xml.template zeppelin-site.xml
  4. If you do not define a new value in the zeppelin-site.xml file, the Zeppelin server will continue to use its default limits. For JSON notebooks, that is 1 MB
  5. To replace the default value, we will have to edit zeppelin-site.xml (here we will use vim in insert mode)
    • [hadoop@ip-1-1-1-0 ~]$ sudo vim zeppelin-site.xml
  6. The entire file will now be displayed in the SSH terminal.
  7. Press i to enter insert mode so that you can edit the file
  8. Find the properties “zeppelin.websocket.max.text.message.size” and “zeppelin.interpreter.output.limit”. They are usually around lines 239 and 322, though this can vary between Zeppelin versions. The default websocket message size is 1024000 (in bytes); the output limit may ship with a different default, so check the value in your file
  9. Change the values to something bigger (usually bigger than the size of your Zeppelin notebook)
  10. Now we will have to save the changes
    • For this press ‘Esc’
    • Press ‘:’
    • Then type ‘wq’
    • Then press ‘Enter’
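  11. After saving the changes, stop and start Zeppelin (on EMR versions after 5.30, use sudo systemctl stop zeppelin and sudo systemctl start zeppelin instead)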
    • [hadoop@ip-1-1-1-0 ~]$ sudo stop zeppelin
    • [hadoop@ip-1-1-1-0 ~]$ sudo start zeppelin
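
The manual vim edit in the steps above can also be scripted. The following is a rough sketch rather than an official Zeppelin tool: the function name `set_zeppelin_property` is hypothetical, the new value 10240000 is just an example, and it assumes each `<value>` tag sits on its own line shortly after the matching `<name>` tag, as in the stock template. Run it with enough privileges to write to the config file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set the <value> that follows a given <name>
# in a Zeppelin-style XML config file.
set_zeppelin_property() {
  local file="$1" name="$2" value="$3" tmp
  tmp=$(mktemp)
  awk -v name="$name" -v value="$value" '
    # Remember that we have just seen the <name> line we are looking for
    index($0, "<name>" name "</name>") { found = 1; print; next }
    # Rewrite the first <value> line that follows it
    found && /<value>.*<\/value>/ {
      sub(/<value>.*<\/value>/, "<value>" value "</value>")
      found = 0
    }
    { print }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Usage (path taken from the steps above; the value is an example):
#   set_zeppelin_property /etc/zeppelin/conf/zeppelin-site.xml \
#     zeppelin.websocket.max.text.message.size 10240000
#   set_zeppelin_property /etc/zeppelin/conf/zeppelin-site.xml \
#     zeppelin.interpreter.output.limit 10240000
```

Writing the result back with `cat` keeps the original file's ownership and permissions. Zeppelin still has to be stopped and started afterwards for the new values to take effect.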

Now you will be able to upload JSON notebooks that are larger than the 1 MB limit.

Got any more questions? Feel free to comment below.

Author: Vignesh Kumar Sivanadan
I am a data scientist at a leading data sciences company, and also a freelance Tableau developer and consultant on Upwork. These blog posts are my experiences.

1 Comment

  • Abhishek

    Use the following for EMR versions after 5.30 to stop and start the service:
    sudo systemctl stop zeppelin
    sudo systemctl start zeppelin

    Verify the status of the service via:
    sudo systemctl status zeppelin
