Did you see this post on MaxInteractiveDataSize? It could be relevant to both parts of your question.
Regarding the error, some questions:
- Please click on the Status Icon and look at the Monitor: System screen. How much of your memory utilization comes from datasets and how much from tables?
- Was that memory utilization screencap from after the operation?
- Did the % of memory utilization change during the attempted operation?
- Which cluster were you using, Dev or Prod?
Following the logic of the post linked above, a maximum of 49% of your cluster's memory is set aside for datasets under default settings. That suggests that if your dataset is large enough (~28% of your cluster's total memory), your cluster could have been at 30% memory utilization before the operation and adding the dataset could still push past the limit.
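To make that arithmetic concrete, here is an illustrative sketch. The 49% dataset ceiling is from the linked post; the cluster size, the 30% utilization, and the 28% dataset size are hypothetical numbers standing in for your actual values.

```python
# Illustrative arithmetic only -- cluster_memory_gb is a made-up value;
# substitute your cluster's real total memory.
cluster_memory_gb = 64.0

dataset_ceiling_gb = 0.49 * cluster_memory_gb        # memory reserved for datasets
current_utilization_gb = 0.30 * cluster_memory_gb    # the ~30% you observed
new_dataset_gb = 0.28 * cluster_memory_gb            # a dataset ~28% of total memory

after_load_gb = current_utilization_gb + new_dataset_gb
fits = after_load_gb <= dataset_ceiling_gb
print(f"ceiling={dataset_ceiling_gb:.1f} GB, "
      f"after load={after_load_gb:.1f} GB, fits={fits}")
```

With these numbers, 30% + 28% = 58% of total memory, which exceeds the 49% ceiling, so the load would fail even though the cluster looked only 30% utilized beforehand.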
I've asked for support to look at your logs for the last day.
Regarding your other questions...
MaxInteractiveDataSize affects import but not modeling. Your modeling operations don't change the underlying data, so the data isn't growing. It also doesn't apply to operationalizing your insights: once you've figured out how to extract the data you need from these sources, you can apply the same steps to even larger data sources with high performance.
Regarding your unstructured data, how do you intend to model with it? Is it binary data to add as a blob, and apply a machine learning algorithm of some kind, or is it text data? The most common case is text data, where you will need to choose a delimiter to import into fields. You can use modeling operations to break those fields down further.
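To illustrate the two stages described above, here is a minimal sketch: the import delimiter splits each record into fields, and then modeling-style operations break a field down further. The record format and the pipe delimiter are made-up examples, not your actual data.

```python
# Hypothetical raw record -- fields separated by a pipe character.
record = "2023-04-01 12:30:05|server-7|GET /index.html 200"

# First pass: the import delimiter splits the record into fields.
timestamp, host, request = record.split("|")

# Second pass: break one field down further (here, on spaces).
method, path, status = request.split(" ")

print(timestamp, host, method, path, status)
```

The same idea applies whatever delimiter your data actually uses; the point is that the first split only needs to be good enough to get usable fields, since you can keep refining afterward.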
If you don't know the contents of the data, then your first step is to look at the raw data view to see if you can find a reasonable delimiter:
Once you select the file, click VIEW RAW DATA.
You will see the contents of the file.
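If eyeballing the raw data doesn't make the delimiter obvious, one rough heuristic is to count candidate characters per line and look for one that appears a consistent, nonzero number of times on every line. This is a hedged sketch, not a platform feature; the sample lines are made up, and in practice you would read the first few lines of your actual file.

```python
# Made-up sample lines standing in for the first few lines of your file.
sample_lines = [
    "alice;42;engineering",
    "bob;17;sales",
    "carol;99;support",
]

# Common delimiter candidates to test.
candidates = [",", ";", "\t", "|"]

for delim in candidates:
    counts = [line.count(delim) for line in sample_lines]
    # A good delimiter appears the same number of times on every line,
    # and at least once.
    consistent = len(set(counts)) == 1 and counts[0] > 0
    print(f"{delim!r}: counts={counts}, consistent={consistent}")
```

For these sample lines only the semicolon is consistent, which would make it the natural choice for the import delimiter.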
If the data is proprietary, we can continue this discussion over private messages, and I will update this thread with the parts I can share. In the meantime, please reply with answers to the questions above so we can determine whether the data sources were too large relative to the size of your cluster.