I've installed the aws command line on my Mac. It's super handy. However, the aws s3 command creates $folder$ files for every "directory" when a recursive copy is performed. It's super annoying.
For example, you could have a "directory" in S3 named "myfiles". When you download the objects with "myfiles" in the path you will end up with a file named "myfiles_$folder$".
Running aws --version returns this info:
aws-cli/1.10.6 Python/2.7.10 Darwin/14.5.0 botocore/1.3.28
I haven't found anything that explains how I can prevent those files from being created, so I've been doing manual cleanup afterwards. This is the command I run:
> rm $(find . -name '*$folder$')
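The cleanup can also be wrapped in a small function. This is only a sketch: clean_folder_markers is a made-up helper name (not part of the AWS CLI), and it assumes the marker files always end in "_$folder$" as in the "myfiles" example above. The single quotes matter, because otherwise the shell tries to expand $folder as a variable.

```shell
#!/bin/sh
# Sketch of a cleanup helper for the "<name>_$folder$" marker files.
# clean_folder_markers is a hypothetical name, not an AWS CLI feature.
clean_folder_markers() {
    dir="${1:-.}"    # directory to clean; defaults to the current directory
    # Single quotes keep the shell from treating $folder as a variable.
    # -print lists each marker file as it is deleted.
    find "$dir" -type f -name '*_$folder$' -print -delete
}
```

Running clean_folder_markers ./downloads would remove only the marker files under ./downloads and leave the real objects alone.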
Monday, March 7, 2016
Tuesday, January 12, 2016
Debugging a local Spark job using IntelliJ
A coworker was working on a local Spark job and shared how he set up his environment for debugging the job (which is basically the same as debugging any other remote process). These are the instructions I followed:
1. Create a remote debug configuration.
Go to IntelliJ's "Run | Edit Configurations" screen
Click on the "+" to "Add New Configuration"
Select "Remote"
2. Copy the command line argument to use and modify it however you see fit.
I'm using Java 8, so I used the example command line arguments from the top edit box. The only change I made was to set "suspend=y" so the Spark job would stop and wait for me to start my "Remote Debug" process.
This is what I used:
-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005
3. Export the command line argument as SPARK_JAVA_OPTS (Spark uses this value when you submit a Spark job).
I set SPARK_JAVA_OPTS like this:
export SPARK_JAVA_OPTS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005
4. Start the Spark job.
You should see your Spark job start up and then pause with the following line printed on the console:
Listening for transport dt_socket at address: 5005
5. In IntelliJ, create whatever breakpoints you want to use and start the remote debug configuration.
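If you switch ports or sometimes don't want the job to wait for the debugger, the option string from steps 2 and 3 can be built by a small helper. This is just a sketch: jdwp_opts is a made-up function name, and the defaults (port 5005, suspend=y) simply mirror the values used above.

```shell
#!/bin/sh
# Sketch: build the JDWP agent argument for a given port and suspend mode.
# jdwp_opts is a hypothetical helper, not something Spark or IntelliJ provides.
jdwp_opts() {
    port="${1:-5005}"    # port the JVM listens on for the debugger
    suspend="${2:-y}"    # y = wait for the debugger, n = start right away
    printf '%s\n' "-agentlib:jdwp=transport=dt_socket,server=y,suspend=${suspend},address=${port}"
}

# Export the options before submitting the job, as in step 3.
SPARK_JAVA_OPTS="$(jdwp_opts 5005 y)"
export SPARK_JAVA_OPTS
```

Calling jdwp_opts 5006 n instead would give a string that listens on port 5006 without pausing the job at startup.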