Nullable schema when reading Avro files with Spark
Versions
Spark - 2.2
Problem
If you read Avro files with Spark, you will notice that all fields in the resulting schema are optional.
For example, if you read an Avro file with a schema into a DataFrame
and then save the DataFrame back to an Avro file, the resulting schema will be:
There is one difference in the schema: the field id has become optional. This happens because of a single line, dataSchema = dataSchema.asNullable, in the source code of the DataSource class.
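The effect of that line can be approximated outside Spark's internals. The sketch below is only an illustration: the schema is hypothetical (only the field name id comes from the text above, and its LongType is an assumption), and the relaxed helper imitates asNullable for a flat schema by copying each field with nullable = true. It does not call Spark's own asNullable and does not descend into nested types.

```scala
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

object AsNullableSketch {
  // Hypothetical schema: `id` is required, `name` is optional.
  val original: StructType = StructType(Seq(
    StructField("id", LongType, nullable = false),
    StructField("name", StringType, nullable = true)
  ))

  // Imitates the effect of asNullable on a flat schema:
  // every top-level field is copied with nullable = true.
  def relaxed(schema: StructType): StructType =
    StructType(schema.fields.map(_.copy(nullable = true)))

  def main(args: Array[String]): Unit = {
    println(original("id").nullable)          // false
    println(relaxed(original)("id").nullable) // true
  }
}
```

After this transformation the information that id was required is gone, which is why the schema written back to Avro marks the field as optional.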
There is a JIRA ticket where this topic was discussed. This logic makes sense when you work with CSV files, where there is no way to predict whether a field is nullable or not.
Workaround
Idea: get the correct schema from one of the Avro files and then recreate the DataFrame based on that schema.
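A minimal sketch of this idea, assuming the Databricks spark-avro package (com.databricks.spark.avro) is on the classpath and that one sample Avro file is reachable on the local file system. The object and method names (RestoreAvroNullability, avroSchemaOf, readWithOriginalSchema) are hypothetical and chosen only for illustration.

```scala
import java.io.File

import com.databricks.spark.avro._
import org.apache.avro.Schema
import org.apache.avro.file.DataFileReader
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.StructType

object RestoreAvroNullability {
  // Read the writer schema stored in the header of one Avro data file.
  def avroSchemaOf(file: File): Schema = {
    val reader = new DataFileReader[GenericRecord](file, new GenericDatumReader[GenericRecord]())
    try reader.getSchema finally reader.close()
  }

  // Load the data with Spark (all fields nullable), then rebuild the
  // DataFrame on top of the schema taken from the sample file.
  def readWithOriginalSchema(spark: SparkSession, path: String, sample: File): DataFrame = {
    // Convert the Avro schema to a Spark schema, keeping the original nullability.
    val sparkSchema = SchemaConverters
      .toSqlType(avroSchemaOf(sample))
      .dataType
      .asInstanceOf[StructType]

    val relaxed = spark.read.avro(path)
    spark.createDataFrame(relaxed.rdd, sparkSchema)
  }
}
```

Two caveats apply. createDataFrame trusts the schema it is given, so this is only safe if the data really contains no nulls in the fields declared as required. For files on HDFS, the local File would have to be replaced by a Hadoop-aware input, which this sketch does not cover.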